MLlib: Multilayer Perceptron (MLP) Algorithm Principles and Spark MLlib Usage Examples (Scala/Java/Python)
阿新 • Published: 2019-01-07
Source: http://blog.csdn.net/liulingyuan6/article/details/53432429
Multilayer Perceptron
Algorithm overview:
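The multilayer perceptron classifier is based on the feedforward artificial neural network. It consists of multiple layers of nodes, and each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. Nodes in every other layer compute their output by taking a linear combination of their inputs with the node's weights w and bias b and then applying an activation function. The model is learned with backpropagation, using the logistic loss function and L-BFGS as the optimization routine. A multilayer perceptron classifier with K+1 layers can be written in matrix form as follows:
$$ y(x) = f_K(\dots f_2(w_2^T f_1(w_1^T x + b_1) + b_2) \dots + b_K) $$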
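Nodes in the intermediate layers use the sigmoid function:
$$ f(z_i) = \frac{1}{1 + e^{-z_i}} $$
Nodes in the output layer use the softmax function:
$$ f(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{N} e^{z_k}} $$
N, the number of nodes in the output layer, corresponds to the number of classes.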
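The forward pass described above can be sketched in plain Python. This is an illustrative sketch only, not Spark's implementation: the helper names (`sigmoid`, `softmax`, `affine`, `forward`) and the hand-picked toy weights are assumptions made for the example. Hidden layers apply the sigmoid and the output layer applies softmax, so the outputs can be read as class probabilities.

```python
import math

def sigmoid(z):
    """Element-wise logistic function used by the intermediate layers."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Softmax used by the output layer; subtracting max(z) avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    """Linear combination w^T x + b; each row of weights belongs to one node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(layers, x):
    """layers: list of (weights, bias) pairs; the last pair is the output layer."""
    for weights, bias in layers[:-1]:
        x = sigmoid(affine(weights, bias, x))
    weights, bias = layers[-1]
    return softmax(affine(weights, bias, x))

# Hypothetical toy network: 2 inputs -> 2 hidden nodes -> 3 classes.
toy_layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
]
probs = forward(toy_layers, [1.0, 2.0])
predicted_class = probs.index(max(probs))  # index of the most probable class
```

The predicted class is simply the index of the largest softmax output, which is why the output layer needs one node per class.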
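Parameters:
featuresCol:
Type: string.
Meaning: name of the features column.
labelCol:
Type: string.
Meaning: name of the label column.
layers:
Type: integer array.
Meaning: sizes of the layers, from the input layer to the output layer.
maxIter:
Type: integer.
Meaning: maximum number of iterations (>= 0).
predictionCol:
Type: string.
Meaning: name of the prediction column.
seed:
Type: long.
Meaning: random seed.
stepSize:
Type: double.
Meaning: step size used for each optimization iteration.
tol:
Type: double.
Meaning: convergence tolerance of the iterative algorithm.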
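The `layers` parameter also determines how many weights the optimizer has to fit. As a rough sketch, assuming every pair of adjacent layers is fully connected and each non-input node has one bias term, the count can be worked out as follows. The helper name `count_parameters` is hypothetical and not part of Spark's API.

```python
def count_parameters(layers):
    """Weights plus biases for a fully connected network with these layer sizes."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

layers = [4, 5, 4, 3]  # the layer sizes used in this article's examples
n_params = count_parameters(layers)
print(n_params)  # (4+1)*5 + (5+1)*4 + (4+1)*3 = 64
```

The first entry of `layers` has to match the number of features and the last entry the number of classes; only the intermediate sizes are free to tune.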
Examples:
Scala:
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

    // Assumes an existing SparkSession named `spark`.
    // Load the data stored in LIBSVM format as a DataFrame.
    val data = spark.read.format("libsvm")
      .load("data/mllib/sample_multiclass_classification_data.txt")
    // Split the data into train and test
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L)
    val train = splits(0)
    val test = splits(1)
    // specify layers for the neural network:
    // input layer of size 4 (features), two intermediate of size 5 and 4
    // and output of size 3 (classes)
    val layers = Array[Int](4, 5, 4, 3)
    // create the trainer and set its parameters
    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128)
      .setSeed(1234L)
      .setMaxIter(100)
    // train the model
    val model = trainer.fit(train)
    // compute accuracy on the test set
    val result = model.transform(test)
    val predictionAndLabels = result.select("prediction", "label")
    val evaluator = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
    println("Accuracy: " + evaluator.evaluate(predictionAndLabels))
Java:
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;

    // Assumes an existing SparkSession named spark.
    // Load training data
    String path = "data/mllib/sample_multiclass_classification_data.txt";
    Dataset<Row> dataFrame = spark.read().format("libsvm").load(path);
    // Split the data into train and test
    Dataset<Row>[] splits = dataFrame.randomSplit(new double[]{0.6, 0.4}, 1234L);
    Dataset<Row> train = splits[0];
    Dataset<Row> test = splits[1];
    // specify layers for the neural network:
    // input layer of size 4 (features), two intermediate of size 5 and 4
    // and output of size 3 (classes)
    int[] layers = new int[] {4, 5, 4, 3};
    // create the trainer and set its parameters
    MultilayerPerceptronClassifier trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128)
      .setSeed(1234L)
      .setMaxIter(100);
    // train the model
    MultilayerPerceptronClassificationModel model = trainer.fit(train);
    // compute accuracy on the test set
    Dataset<Row> result = model.transform(test);
    Dataset<Row> predictionAndLabels = result.select("prediction", "label");
    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy");
    System.out.println("Accuracy = " + evaluator.evaluate(predictionAndLabels));
Python:
    from pyspark.ml.classification import MultilayerPerceptronClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # Assumes an existing SparkSession named spark.
    # Load training data
    data = spark.read.format("libsvm")\
        .load("data/mllib/sample_multiclass_classification_data.txt")
    # Split the data into train and test
    splits = data.randomSplit([0.6, 0.4], 1234)
    train = splits[0]
    test = splits[1]
    # specify layers for the neural network:
    # input layer of size 4 (features), two intermediate of size 5 and 4
    # and output of size 3 (classes)
    layers = [4, 5, 4, 3]
    # create the trainer and set its parameters
    trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
    # train the model
    model = trainer.fit(train)
    # compute accuracy on the test set
    result = model.transform(test)
    predictionAndLabels = result.select("prediction", "label")
    evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
    print("Accuracy: " + str(evaluator.evaluate(predictionAndLabels)))
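To make the final evaluation step concrete, the sketch below shows what an accuracy metric over (prediction, label) pairs amounts to: the fraction of rows whose prediction equals the label. It is plain Python for illustration, not the Spark evaluator itself; the function name `accuracy` and the sample rows are hypothetical.

```python
def accuracy(rows):
    """rows: iterable of (prediction, label) pairs."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for prediction, label in rows if prediction == label)
    return correct / len(rows)

# Hypothetical predictions, not output from the Spark examples.
sample = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
print("Accuracy: " + str(accuracy(sample)))  # 3 of 4 correct -> 0.75
```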