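ALINK (28): Feature Engineering (7) Feature Combination and Crossing (2) Cross Feature Prediction/Training (CrossFeaturePredictBatchOp)
Cross Feature Prediction (CrossFeaturePredictBatchOp)
Java class name: com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp
Python class name: CrossFeaturePredictBatchOp
Overview
The feature-column crossing algorithm combines the selected discrete columns into a single column of vector type.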
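Parameters
| Name | Display name | Description | Type | Required? | Default |
| --- | --- | --- | --- | --- | --- |
| outputCol | Output column name | Name of the output result column (required) | String | ✓ | |
| numThreads | Number of threads | Number of threads the component uses | Integer | | 1 |
| modelStreamFilePath | Model stream file path | File path of the model stream | String | | null |
| modelStreamScanInterval | Model path scan interval | Interval between scans of the model path, in seconds | Integer | | 10 |
| modelStreamStartTime | Model stream start time | Start time of the model stream. By default, reading starts from the current moment. Use the format yyyy-mm-dd hh:mm:ss.fffffffff; see Timestamp.valueOf(String s) | String | | null |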
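The modelStreamStartTime format ends in a nine-digit fractional field, which is easy to get wrong when the string is built by hand. The following is a minimal sketch, not part of Alink, that formats a Python datetime into that layout. The helper name `to_model_stream_start_time` is hypothetical, and padding the microseconds with three zeros is an assumption made because Python's datetime carries no nanoseconds.

```python
from datetime import datetime


def to_model_stream_start_time(dt: datetime) -> str:
    """Format dt as 'yyyy-mm-dd hh:mm:ss.fffffffff' (9 fractional digits)."""
    # datetime only stores microseconds (6 digits), so the last three
    # digits of the nanosecond field are filled with zeros.
    return dt.strftime("%Y-%m-%d %H:%M:%S") + ".{:06d}000".format(dt.microsecond)


start = to_model_stream_start_time(datetime(2021, 6, 1, 8, 30, 0, 123456))
print(start)  # 2021-06-01 08:30:00.123456000
```

The resulting string would then be the value given to modelStreamStartTime.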
Code examples
Python code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Two string columns, one double column (with one null), and a label.
    df = pd.DataFrame([
        ["1.0", "1.0", 1.0, 1],
        ["1.0", "1.0", 0.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["2.0", "3.0", None, 0],
        ["2.0", "3.0", 1.0, 0],
        ["0.0", "1.0", 2.0, 0],
        ["0.0", "1.0", 1.0, 0]])

    data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")

    # Train the cross-feature model on the selected columns.
    train = CrossFeatureTrainBatchOp().setSelectedCols(['f0', 'f1', 'f2']).linkFrom(data)

    # Apply the model; the crossed feature is written to the "cross" column.
    result = CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()
    print(result)

Java code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
    import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    public class CrossFeaturePredictBatchOpTest {
        @Test
        public void testCrossFeaturePredictBatchOp() throws Exception {
            // Same eight rows as the Python example.
            List<Row> df = Arrays.asList(
                Row.of("1.0", "1.0", 1.0, 1),
                Row.of("1.0", "1.0", 0.0, 1),
                Row.of("1.0", "0.0", 1.0, 1),
                Row.of("1.0", "0.0", 1.0, 1),
                Row.of("2.0", "3.0", null, 0),
                Row.of("2.0", "3.0", 1.0, 0),
                Row.of("0.0", "1.0", 2.0, 0),
                Row.of("0.0", "1.0", 1.0, 0)
            );
            BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
            // Train the cross-feature model on the selected columns.
            BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
            // Apply the model and print the data with the new "cross" column.
            new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
        }
    }

Results
| f0 | f1 | f2 | label | cross |
| --- | --- | --- | --- | --- |
| 1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0 |
| 1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 2.0 | 3.0 | null | 0 | $36$22:1.0 |
| 2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0 |
| 0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0 |
| 0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0 |
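Each value in the cross column reads as a sparse vector of size 36 with a single 1.0 entry, and 36 equals 3 × 3 × 4, the number of distinct values in f0, f1, and f2 (counting null as a value). The sketch below is one way to reproduce those indices in plain Python; it is not Alink's implementation. The per-column index tables in `VOCABS` and the function names are assumptions, reverse-engineered from the result table, and a model trained on other data or in another run may assign indices differently.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Per-column value -> index tables, inferred from the sample result
# (assumption: Alink's trained model is free to order values differently).
VOCABS: List[Dict[Hashable, int]] = [
    {"1.0": 0, "2.0": 1, "0.0": 2},        # f0
    {"1.0": 0, "3.0": 1, "0.0": 2},        # f1
    {1.0: 0, 0.0: 1, None: 2, 2.0: 3},     # f2 (null is its own value)
]


def cross_index(row: Sequence[Hashable],
                vocabs: List[Dict[Hashable, int]]) -> Tuple[int, int]:
    """Return (vector size, active index) for one row of selected columns."""
    index, stride = 0, 1
    for value, vocab in zip(row, vocabs):
        # Mixed-radix encoding: the first column varies fastest.
        index += vocab[value] * stride
        stride *= len(vocab)
    return stride, index


def to_sparse_string(row: Sequence[Hashable],
                     vocabs: List[Dict[Hashable, int]] = VOCABS) -> str:
    """Render the one-hot cross feature as '$size$index:1.0'."""
    size, index = cross_index(row, vocabs)
    return "${}${}:1.0".format(size, index)


print(to_sparse_string(("2.0", "3.0", None)))  # $36$22:1.0
```

With these tables the index is i0 + 3·i1 + 9·i2, so the row ("2.0", "3.0", null) gives 1 + 3 + 18 = 22, which matches the fifth row of the table.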
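Cross Feature Training (CrossFeatureTrainBatchOp)
Java class name: com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp
Python class name: CrossFeatureTrainBatchOp
Overview
The feature-column crossing algorithm combines the selected discrete columns into a single column of vector type.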
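Parameters
| Name | Display name | Description | Type | Required? | Default |
| --- | --- | --- | --- | --- | --- |
| selectedCols | Selected column names | List of column names used in the computation | String[] | ✓ | |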
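To make the role of selectedCols concrete, here is a rough sketch of what a training step of this kind needs to capture, assuming it amounts to recording the distinct values of each selected column. This is an illustration in plain Python rather than Alink's code: `fit_vocabs` is a hypothetical helper, and numbering values in order of first appearance is an arbitrary choice that Alink's model does not necessarily share.

```python
from math import prod
from typing import Dict, Hashable, List, Sequence


def fit_vocabs(rows: Sequence[Sequence[Hashable]],
               selected: Sequence[int]) -> List[Dict[Hashable, int]]:
    """Collect one value -> index table per selected column position."""
    vocabs: List[Dict[Hashable, int]] = [dict() for _ in selected]
    for row in rows:
        for vocab, col in zip(vocabs, selected):
            # Assumption: a new value gets the next free index.
            vocab.setdefault(row[col], len(vocab))
    return vocabs


# The sample data used in this article: f0, f1, f2, label.
rows = [
    ("1.0", "1.0", 1.0, 1), ("1.0", "1.0", 0.0, 1),
    ("1.0", "0.0", 1.0, 1), ("1.0", "0.0", 1.0, 1),
    ("2.0", "3.0", None, 0), ("2.0", "3.0", 1.0, 0),
    ("0.0", "1.0", 2.0, 0), ("0.0", "1.0", 1.0, 0),
]

vocabs = fit_vocabs(rows, selected=[0, 1, 2])
sizes = [len(v) for v in vocabs]
print(sizes, prod(sizes))  # [3, 3, 4] 36
```

The product of the per-column sizes, 36, is the size of the vectors that the prediction operator writes to its output column for this data.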
Code examples
Python code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Two string columns, one double column (with one null), and a label.
    df = pd.DataFrame([
        ["1.0", "1.0", 1.0, 1],
        ["1.0", "1.0", 0.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["2.0", "3.0", None, 0],
        ["2.0", "3.0", 1.0, 0],
        ["0.0", "1.0", 2.0, 0],
        ["0.0", "1.0", 1.0, 0]])

    data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")

    # Train the cross-feature model on the selected columns.
    train = CrossFeatureTrainBatchOp().setSelectedCols(['f0', 'f1', 'f2']).linkFrom(data)

    # Apply the model; the crossed feature is written to the "cross" column.
    result = CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()
    print(result)

Java code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
    import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    public class CrossFeatureTrainBatchOpTest {
        @Test
        public void testCrossFeatureTrainBatchOp() throws Exception {
            // Same eight rows as the Python example.
            List<Row> df = Arrays.asList(
                Row.of("1.0", "1.0", 1.0, 1),
                Row.of("1.0", "1.0", 0.0, 1),
                Row.of("1.0", "0.0", 1.0, 1),
                Row.of("1.0", "0.0", 1.0, 1),
                Row.of("2.0", "3.0", null, 0),
                Row.of("2.0", "3.0", 1.0, 0),
                Row.of("0.0", "1.0", 2.0, 0),
                Row.of("0.0", "1.0", 1.0, 0)
            );
            BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
            // Train the cross-feature model on the selected columns.
            BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
            // Apply the model and print the data with the new "cross" column.
            new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
        }
    }

Results
| f0 | f1 | f2 | label | cross |
| --- | --- | --- | --- | --- |
| 1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0 |
| 1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0 |
| 2.0 | 3.0 | null | 0 | $36$22:1.0 |
| 2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0 |
| 0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0 |
| 0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0 |