
ALINK (28): Feature Engineering (7), Feature Combination and Crossing (2): Cross Feature Prediction/Training (CrossFeaturePredictBatchOp)

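Cross Feature Prediction (CrossFeaturePredictBatchOp)

Java class name: com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp

Python class name: CrossFeaturePredictBatchOp

Description

The feature-column combination (cross) algorithm combines the selected discrete columns into a single column of vector type. CrossFeaturePredictBatchOp applies a model trained by CrossFeatureTrainBatchOp to produce that column.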
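To make the idea concrete, here is a minimal Python sketch of how one value from each selected column can be folded into a single index of a one-hot vector. It assumes a mixed-radix layout in which the first selected column varies fastest. The per-column vocabularies are hypothetical, chosen so that the indices agree with the execution results shown later (for example, ("2.0", "3.0", null) maps to index 22 in a vector of size 36); Alink's actual index assignment rule is not described in this document. The helper names are illustrative, not part of Alink.

```python
from typing import Dict, Hashable, List, Sequence

# Hypothetical per-column vocabularies (value -> index) for f0, f1, f2.
# None stands for a null value and is treated as a category of its own.
VOCABS: List[Dict[Hashable, int]] = [
    {"1.0": 0, "2.0": 1, "0.0": 2},
    {"1.0": 0, "3.0": 1, "0.0": 2},
    {1.0: 0, 0.0: 1, None: 2, 2.0: 3},
]


def cross_size(vocabs: Sequence[Dict[Hashable, int]]) -> int:
    """Length of the crossed vector: the product of the vocabulary sizes."""
    size = 1
    for vocab in vocabs:
        size *= len(vocab)
    return size


def cross_index(row: Sequence[Hashable], vocabs: Sequence[Dict[Hashable, int]]) -> int:
    """Fold one value per column into a single index (first column varies fastest)."""
    index, stride = 0, 1
    for value, vocab in zip(row, vocabs):
        index += vocab[value] * stride  # raises KeyError for a value not in the vocabulary
        stride *= len(vocab)
    return index


def cross_vector(row: Sequence[Hashable], vocabs: Sequence[Dict[Hashable, int]]) -> str:
    """Render the one-hot result in the "$size$index:value" style of the output."""
    return f"${cross_size(vocabs)}${cross_index(row, vocabs)}:1.0"


print(cross_vector(("2.0", "3.0", None), VOCABS))  # $36$22:1.0
```

Because the vector has one slot per value combination, its length grows multiplicatively with the number of distinct values per column, which is why crossing is normally applied to low-cardinality discrete columns.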

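Parameters

| Name | Display name | Description | Type | Required? | Default |
|---|---|---|---|---|---|
| outputCol | Output column name | Name of the output result column; required | String | Yes | - |
| numThreads | Number of threads | Number of threads the component uses when running multithreaded | Integer | No | 1 |
| modelStreamFilePath | Model stream file path | File path of the model stream | String | No | null |
| modelStreamScanInterval | Model path scan interval | Time interval between scans of the model path, in seconds | Integer | No | 10 |
| modelStreamStartTime | Model stream start time | Start time of the model stream. By default, reading starts from the current moment. Use the format yyyy-mm-dd hh:mm:ss.fffffffff; see Timestamp.valueOf(String s) for details | String | No | null |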
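The modelStreamStartTime value is a plain string that Java's Timestamp.valueOf must be able to parse. Below is a minimal Python sketch for producing and roughly checking such a string. The helper names are hypothetical, and the regular expression only approximates the rules of Timestamp.valueOf, which accepts zero to nine fractional-second digits and single-digit months and days.

```python
import re
from datetime import datetime

# Rough approximation of the layout accepted by java.sql.Timestamp.valueOf:
# yyyy-[m]m-[d]d hh:mm:ss[.f...], with at most 9 fractional digits.
_TIMESTAMP_RE = re.compile(r"^\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2}(\.\d{1,9})?$")


def format_start_time(dt: datetime) -> str:
    """Format a datetime as 'yyyy-mm-dd hh:mm:ss.ffffff' (microsecond precision)."""
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")


def looks_like_start_time(text: str) -> bool:
    """Return True if text roughly matches the expected timestamp layout."""
    return _TIMESTAMP_RE.match(text) is not None


start = format_start_time(datetime(2021, 3, 1, 8, 30, 0, 123456))
print(start)  # 2021-03-01 08:30:00.123456
```

Python's %f yields six fractional digits, which stays within the nine digits the documented format allows.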

Code Example

Python Code

from pyalink.alink import *
import pandas as pd

useLocalEnv(1)

df = pd.DataFrame([
["1.0", "1.0", 1.0, 1],
["1.0", "1.0", 0.0, 1],
["1.0", "0.0", 1.0, 1],
["1.0", "0.0", 1.0, 1],
["2.0", "3.0", None, 0],
["2.0", "3.0", 1.0, 0],
["0.0", "1.0", 2.0, 0],
["0.0", "1.0", 1.0, 0]])

data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")

# Train the cross-feature model on the three discrete columns
train = CrossFeatureTrainBatchOp().setSelectedCols(['f0', 'f1', 'f2']).linkFrom(data)

# Apply the model: f0, f1 and f2 are combined into the vector column "cross"
CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()

Java Code

import org.apache.flink.types.Row;
import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import org.junit.Test;
import java.util.Arrays;
import java.util.List;

public class CrossFeaturePredictBatchOpTest {
  @Test
  public void testCrossFeaturePredictBatchOp() throws Exception {
    // Same eight rows as the Python example: three discrete features and a label
    List<Row> df = Arrays.asList(
      Row.of("1.0", "1.0", 1.0, 1),
      Row.of("1.0", "1.0", 0.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("2.0", "3.0", null, 0),
      Row.of("2.0", "3.0", 1.0, 0),
      Row.of("0.0", "1.0", 2.0, 0),
      Row.of("0.0", "1.0", 1.0, 0)
    );
    BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
    // Train the cross-feature model on the selected columns
    BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
    // Combine f0, f1 and f2 into the vector column "cross" and print the result
    new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
  }
}

Execution Result

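| f0 | f1 | f2 | label | cross |
|---|---|---|---|---|
| 1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0 |
| 1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 2.0 | 3.0 | null | 0 | $36$22:1.0 |
| 2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0 |
| 0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0 |
| 0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0 |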
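Each value in the cross column is a sparse vector written as $size$index:value. In every row above the size is 36 and there is exactly one entry equal to 1.0, so each row is a one-hot encoding of its (f0, f1, f2) combination; 36 = 3 × 3 × 4, the number of distinct values of f0, f1 and f2 in the sample data when null in f2 is counted as a value. The Python sketch below parses such strings into a size and an index-to-value map and expands them into a dense list. It assumes that multiple entries, which do not occur in this output, would be separated by whitespace; parse_sparse and to_dense are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def parse_sparse(text: str) -> Tuple[int, Dict[int, float]]:
    """Parse a string such as "$36$22:1.0" into (size, {index: value})."""
    text = text.strip()
    if not text.startswith("$"):
        raise ValueError(f"not a sparse vector string: {text!r}")
    _, size_part, body = text.split("$", 2)
    entries: Dict[int, float] = {}
    for item in body.split():  # entries are assumed to be whitespace-separated
        index, value = item.split(":")
        entries[int(index)] = float(value)
    return int(size_part), entries


def to_dense(size: int, entries: Dict[int, float]) -> List[float]:
    """Expand the sparse representation into a plain list of length `size`."""
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value
    return dense


size, entries = parse_sparse("$36$22:1.0")
dense = to_dense(size, entries)
print(size, entries, dense.index(1.0))  # 36 {22: 1.0} 22
```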

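Cross Feature Training (CrossFeatureTrainBatchOp)

Java class name: com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp

Python class name: CrossFeatureTrainBatchOp

Description

The feature-column combination (cross) algorithm combines the selected discrete columns into a single column of vector type. CrossFeatureTrainBatchOp trains the model for the columns given in selectedCols; CrossFeaturePredictBatchOp then uses this model to build the combined column.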
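Judging from the example output, what the model must capture is, for each selected column, its set of distinct values: the vector size 36 equals 3 × 3 × 4 for f0, f1 and f2, with null in f2 counted as its own value. A minimal Python sketch of such a "fit" step follows. It assigns indices in first-seen order, which is a simplifying assumption; Alink's own ordering is not documented here, so individual indices from this sketch need not match the execution results, although the vector size does. fit_vocabularies and vector_size are hypothetical names.

```python
from typing import Dict, Hashable, List, Sequence

# The (f0, f1, f2) columns of the example data; None stands for null.
ROWS = [
    ("1.0", "1.0", 1.0),
    ("1.0", "1.0", 0.0),
    ("1.0", "0.0", 1.0),
    ("1.0", "0.0", 1.0),
    ("2.0", "3.0", None),
    ("2.0", "3.0", 1.0),
    ("0.0", "1.0", 2.0),
    ("0.0", "1.0", 1.0),
]


def fit_vocabularies(rows: Sequence[Sequence[Hashable]]) -> List[Dict[Hashable, int]]:
    """Give every distinct value of every column an index, in first-seen order."""
    n_cols = len(rows[0])
    vocabs: List[Dict[Hashable, int]] = [{} for _ in range(n_cols)]
    for row in rows:
        for col, value in enumerate(row):
            # A new value gets the next free index; a known value keeps its index.
            vocabs[col].setdefault(value, len(vocabs[col]))
    return vocabs


def vector_size(vocabs: Sequence[Dict[Hashable, int]]) -> int:
    """Number of possible value combinations: the product of the vocabulary sizes."""
    size = 1
    for vocab in vocabs:
        size *= len(vocab)
    return size


vocabs = fit_vocabularies(ROWS)
print([len(v) for v in vocabs], vector_size(vocabs))  # [3, 3, 4] 36
```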

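Parameters

| Name | Display name | Description | Type | Required? | Default |
|---|---|---|---|---|---|
| selectedCols | Selected column names | Names of the columns used in the computation | String[] | Yes | - |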

Code Example

Python Code

from pyalink.alink import *
import pandas as pd
useLocalEnv(1)
df = pd.DataFrame([
["1.0", "1.0", 1.0, 1],
["1.0", "1.0", 0.0, 1],
["1.0", "0.0", 1.0, 1],
["1.0", "0.0", 1.0, 1],
["2.0", "3.0", None, 0],
["2.0", "3.0", 1.0, 0],
["0.0", "1.0", 2.0, 0],
["0.0", "1.0", 1.0, 0]])
data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")
train = CrossFeatureTrainBatchOp().setSelectedCols(['f0','f1','f2']).linkFrom(data)
CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()

Java Code

import org.apache.flink.types.Row;
import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import org.junit.Test;
import java.util.Arrays;
import java.util.List;

public class CrossFeatureTrainBatchOpTest {
  @Test
  public void testCrossFeatureTrainBatchOp() throws Exception {
    // Same eight rows as the Python example: three discrete features and a label
    List<Row> df = Arrays.asList(
      Row.of("1.0", "1.0", 1.0, 1),
      Row.of("1.0", "1.0", 0.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("2.0", "3.0", null, 0),
      Row.of("2.0", "3.0", 1.0, 0),
      Row.of("0.0", "1.0", 2.0, 0),
      Row.of("0.0", "1.0", 1.0, 0)
    );
    BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
    // Train the cross-feature model on the selected columns
    BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
    // Combine f0, f1 and f2 into the vector column "cross" and print the result
    new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
  }
}

Execution Result

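| f0 | f1 | f2 | label | cross |
|---|---|---|---|---|
| 1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0 |
| 1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 2.0 | 3.0 | null | 0 | $36$22:1.0 |
| 2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0 |
| 0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0 |
| 0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0 |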
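The eight cross indices above (0, 9, 6, 6, 22, 4, 29, 2) are consistent with a mixed-radix layout, index = i0 + 3·i1 + 9·i2, where i0, i1 and i2 are per-column value indices for f0, f1 and f2 and the first column varies fastest. This is an observation about this sample, not a documented property of Alink. The Python sketch below decodes an index under that assumption and checks that each value always decodes to the same per-column index; decode_cross_index and is_consistent are hypothetical helpers.

```python
from typing import Dict, Hashable, List, Sequence


def decode_cross_index(index: int, cardinalities: Sequence[int]) -> List[int]:
    """Split a cross index into per-column indices, first column varying fastest."""
    digits = []
    for n in cardinalities:
        digits.append(index % n)
        index //= n
    if index != 0:
        raise ValueError("index is out of range for these cardinalities")
    return digits


def is_consistent(rows: Sequence[Sequence[Hashable]], indices: Sequence[int],
                  cardinalities: Sequence[int]) -> bool:
    """True if every column value always decodes to the same per-column index."""
    seen: List[Dict[Hashable, int]] = [{} for _ in cardinalities]
    for row, index in zip(rows, indices):
        for col, digit in enumerate(decode_cross_index(index, cardinalities)):
            if seen[col].setdefault(row[col], digit) != digit:
                return False
    return True


# (f0, f1, f2) and the cross index from each row of the execution result.
ROWS = [("1.0", "1.0", 1.0), ("1.0", "1.0", 0.0), ("1.0", "0.0", 1.0),
        ("1.0", "0.0", 1.0), ("2.0", "3.0", None), ("2.0", "3.0", 1.0),
        ("0.0", "1.0", 2.0), ("0.0", "1.0", 1.0)]
INDICES = [0, 9, 6, 6, 22, 4, 29, 2]

print(decode_cross_index(22, [3, 3, 4]))  # [1, 1, 2]
print(is_consistent(ROWS, INDICES, [3, 3, 4]))  # True
```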