
[Spark MLlib Crash Course] Models, Part 04: Naive Bayes (Python version)


Contents

  How Naive Bayes works

  Naive Bayes code (Spark Python)


How Naive Bayes works

  For details, see this blog post: http://www.cnblogs.com/itmorn/p/7905975.html
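
  As a quick companion to the linked post, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with additive (Laplace) smoothing. It is not Spark MLlib's implementation; the function names (train_multinomial_nb, predict_label), the smoothing default, and the four toy samples are all made up for illustration. Each class gets a log prior from its share of the training rows and a log likelihood per feature from smoothed feature totals; prediction picks the class with the highest summed log score.

```python
import math
from collections import defaultdict


def train_multinomial_nb(samples, smoothing=1.0):
    """Fit log priors and per-feature log likelihoods.

    samples: list of (label, feature_counts) pairs with non-negative counts.
    """
    num_features = len(samples[0][1])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * num_features)
    for label, features in samples:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_sums[label][i] += value

    total = float(len(samples))
    log_priors = {}
    log_likelihoods = {}
    for label, count in class_counts.items():
        log_priors[label] = math.log(count / total)
        # Additive smoothing keeps unseen features from getting probability 0.
        denom = sum(feature_sums[label]) + smoothing * num_features
        log_likelihoods[label] = [
            math.log((s + smoothing) / denom) for s in feature_sums[label]
        ]
    return log_priors, log_likelihoods


def predict_label(model, features):
    """Return the label with the highest log posterior score."""
    log_priors, log_likelihoods = model

    def score(label):
        weights = log_likelihoods[label]
        return log_priors[label] + sum(x * w for x, w in zip(features, weights))

    return max(log_priors, key=score)


# Toy data (hypothetical): label 0.0 favors feature 0, label 1.0 favors 1 and 2.
samples = [
    (0.0, [3.0, 0.0, 1.0]),
    (0.0, [2.0, 1.0, 0.0]),
    (1.0, [0.0, 3.0, 2.0]),
    (1.0, [1.0, 2.0, 3.0]),
]
model = train_multinomial_nb(samples)
print(predict_label(model, [2.0, 0.0, 0.0]))
print(predict_label(model, [0.0, 2.0, 2.0]))
```

  Working in log space avoids numeric underflow when many small probabilities are multiplied together.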


Naive Bayes code (Spark Python)

  

  Data used in the code: https://pan.baidu.com/s/1jHWKG4I (password: acq1)

# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

sc = SparkContext("local")

# Load and parse the data: convert every number to a float.
# The first number on each line is the label; the rest are the features.
def parsePoint(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
print(data.collect()[0])
# -0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.024....-0.864466507337306

parsedData = data.map(parsePoint)
print(parsedData.collect()[0])
# (-0.4307829,[-1.63735562648,-2.00621178481,-1.86242597251,-1.024....,-0.864466507337])

# Build the model
model = LinearRegressionWithSGD.train(parsedData, iterations=1000, step=0.1)

# Evaluate the model's error on the training data
valuesAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
MSE = valuesAndPreds \
    .map(lambda vp: (vp[0] - vp[1])**2) \
    .reduce(lambda x, y: x + y) / valuesAndPreds.count()
print("Mean Squared Error = " + str(MSE))
# Mean Squared Error = 6.32693963099

# Save and load the model
model.save(sc, "pythonLinearRegressionWithSGDModel")
sameModel = LinearRegressionModel.load(sc, "pythonLinearRegressionWithSGDModel")
print(sameModel.predict(parsedData.collect()[0].features))
# -1.86583391312
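
  The parsing and mean-squared-error steps in the listing above can be tried without a Spark cluster. The sketch below is a plain-Python stand-in, not Spark code: parse_point mirrors parsePoint but returns a tuple instead of a LabeledPoint, and the two input lines, the weights, and predict_linear are hypothetical values chosen only so the arithmetic is easy to follow.

```python
def parse_point(line):
    """Split 'label,f1 f2 ...' into (label, [features]) as floats."""
    values = [float(x) for x in line.replace(',', ' ').split()]
    return values[0], values[1:]


def mean_squared_error(pairs):
    """Average of (label - prediction)^2 over (label, prediction) pairs."""
    pairs = list(pairs)
    return sum((label - pred) ** 2 for label, pred in pairs) / len(pairs)


# Made-up lines in the same "label,feature feature ..." layout as lpsa.data.
lines = ["1.0,0.5 2.0", "3.0,1.5 -1.0"]
points = [parse_point(line) for line in lines]

# Hypothetical linear model: prediction = dot(weights, features).
weights = [1.0, 0.0]


def predict_linear(features):
    return sum(w * x for w, x in zip(weights, features))


values_and_preds = [(label, predict_linear(features)) for label, features in points]
mse = mean_squared_error(values_and_preds)
print("Mean Squared Error = " + str(mse))
```

  With these values the predictions are 0.5 and 1.5, so the squared errors are 0.25 and 2.25 and their mean is 1.25. This is the same sum-then-divide that the map, reduce, and count calls perform on the RDD.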
