多層感知機(MLP)演算法原理及Spark MLlib呼叫例項(Scala/Java/Python)
阿新 • • 發佈:2018-12-30
多層感知機
演算法簡介:
多層感知機是基於反向人工神經網路(feedforwardartificial neural network)。多層感知機含有多層節點,每層節點與網路的下一層節點完全連線。輸入層的節點代表輸入資料,其他層的節點通過將輸入資料與層上節點的權重w以及偏差b線性組合且應用一個啟用函式,得到該層輸出。多層感知機通過方向傳播來學習模型,其中我們使用邏輯損失函式以及L-BFGS。K+1層多層感知機分類器可以寫成矩陣形式如下:
中間層節點使用sigmoid方程:
輸出層使用softmax方程:
輸出層中N代表類別數目。
引數:
featuresCol:
型別:字串型。
含義:特徵列名。
labelCol:
型別:字串型。
含義:標籤列名。
layers:
型別:整數陣列型。
含義:層規模,包括輸入規模以及輸出規模。
maxIter:
型別:整數型。
含義:迭代次數(>=0)。
predictionCol:
型別:字串型。
含義:預測結果列名。
seed:
型別:長整型。
含義:隨機種子。
stepSize:
型別:雙精度型。
含義:每次迭代優化步長。
tol:
型別:雙精度型。
含義:迭代演算法的收斂性。
示例:
Scala:
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator // Load the data stored in LIBSVM format as a DataFrame. val data = spark.read.format("libsvm") .load("data/mllib/sample_multiclass_classification_data.txt") // Split the data into train and test val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L) val train = splits(0) val test = splits(1) // specify layers for the neural network: // input layer of size 4 (features), two intermediate of size 5 and 4 // and output of size 3 (classes) val layers = Array[Int](4, 5, 4, 3) // create the trainer and set its parameters val trainer = new MultilayerPerceptronClassifier() .setLayers(layers) .setBlockSize(128) .setSeed(1234L) .setMaxIter(100) // train the model val model = trainer.fit(train) // compute accuracy on the test set val result = model.transform(test) val predictionAndLabels = result.select("prediction", "label") val evaluator = new MulticlassClassificationEvaluator() .setMetricName("accuracy") println("Accuracy: " + evaluator.evaluate(predictionAndLabels))
Java:
import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.SparkSession; import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel; import org.apache.spark.ml.classification.MultilayerPerceptronClassifier; import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator; // Load training data String path = "data/mllib/sample_multiclass_classification_data.txt"; Dataset<Row> dataFrame = spark.read().format("libsvm").load(path); // Split the data into train and test Dataset<Row>[] splits = dataFrame.randomSplit(new double[]{0.6, 0.4}, 1234L); Dataset<Row> train = splits[0]; Dataset<Row> test = splits[1]; // specify layers for the neural network: // input layer of size 4 (features), two intermediate of size 5 and 4 // and output of size 3 (classes) int[] layers = new int[] {4, 5, 4, 3}; // create the trainer and set its parameters MultilayerPerceptronClassifier trainer = new MultilayerPerceptronClassifier() .setLayers(layers) .setBlockSize(128) .setSeed(1234L) .setMaxIter(100); // train the model MultilayerPerceptronClassificationModel model = trainer.fit(train); // compute accuracy on the test set Dataset<Row> result = model.transform(test); Dataset<Row> predictionAndLabels = result.select("prediction", "label"); MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() .setMetricName("accuracy"); System.out.println("Accuracy = " + evaluator.evaluate(predictionAndLabels));
Python:
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# Load training data
data = spark.read.format("libsvm")\
.load("data/mllib/sample_multiclass_classification_data.txt")
# Split the data into train and test
splits = data.randomSplit([0.6, 0.4], 1234)
train = splits[0]
test = splits[1]
# specify layers for the neural network:
# input layer of size 4 (features), two intermediate of size 5 and 4
# and output of size 3 (classes)
layers = [4, 5, 4, 3]
# create the trainer and set its parameters
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
# train the model
model = trainer.fit(train)
# compute accuracy on the test set
result = model.transform(test)
predictionAndLabels = result.select("prediction", "label")
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Accuracy: " + str(evaluator.evaluate(predictionAndLabels)))