Python Implementation of Naive Bayes Classification
阿新 · Published 2018-12-31
Bayes' theorem:
Conditional probability:
P(A|B) denotes the probability that event A occurs given that event B has already occurred; it is called the conditional probability of A given B.
Basic formula: P(A|B) = P(AB) / P(B)
Bayes' theorem: P(B|A) = P(A|B) · P(B) / P(A)
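A quick worked example with made-up numbers: suppose P(A) = 0.5, P(B) = 0.2, and P(A|B) = 0.8. Then

P(B|A) = P(A|B) · P(B) / P(A) = 0.8 × 0.2 / 0.5 = 0.32

so observing A raises the probability of B from 0.2 to 0.32.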
Naive Bayes classification:
Underlying assumption: given the class label, the attributes are conditionally independent of one another.
Basic idea: for an item to be classified, compute the probability of each class given that this item is observed, and assign the item to the class with the largest probability, as written out in the rule below.
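A sketch of the standard decision rule, with x1, …, xn denoting the item's attribute values and c ranging over the classes:

label = argmax over c of  P(c) · P(x1 | c) · P(x2 | c) · … · P(xn | c)

The independence assumption lets the joint likelihood P(x1, …, xn | c) factor into per-attribute terms, and the common denominator P(x1, …, xn) can be dropped because it is the same for every class.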
naiveBayes.py
# Naive Bayes classification: estimate the prior probability of each class
def classify(dataSet):
    numEntries = len(dataSet)
    # Count how many samples belong to each class (the label is the last column)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    # Prior probability of each class: count(class) / total number of samples
    prob = {}
    for key in labelCounts:
        prob[key] = float(labelCounts[key]) / numEntries
    return prob


# Predict the class of newObject with naive Bayes
def predict(prob, dataSet, features, newObject):
    numFeatures = len(dataSet[0]) - 1
    # Work on a copy so the caller's prior dictionary is not modified between calls
    posterior = dict(prob)
    # Multiply in the conditional probability P(feature_i = value | class) for each feature
    for i in range(numFeatures):
        for label in posterior:
            classRows = [example for example in dataSet if example[-1] == label]
            matches = [example for example in classRows
                       if example[i] == newObject[features[i]]]
            posterior[label] *= float(len(matches)) / len(classRows)
    # Return the class with the largest (unnormalised) posterior probability
    maxProb = -1.0
    bestLabel = None
    for label in posterior:
        if posterior[label] > maxProb:
            maxProb = posterior[label]
            bestLabel = label
    return bestLabel


def main():
    # Build a toy data set: two binary features and a yes/no class label
    def createDataSet():
        dataSet = [[1, 1, 'yes'],
                   [1, 1, 'yes'],
                   [1, 0, 'no'],
                   [0, 1, 'no'],
                   [0, 1, 'no']]
        features = ['no surfacing', 'flippers']
        return dataSet, features

    dataset, features = createDataSet()
    prob = classify(dataset)
    print(predict(prob, dataset, features, {'no surfacing': 1, 'flippers': 1}))
    print(predict(prob, dataset, features, {'no surfacing': 1, 'flippers': 0}))
    print(predict(prob, dataset, features, {'no surfacing': 0, 'flippers': 1}))
    print(predict(prob, dataset, features, {'no surfacing': 0, 'flippers': 0}))


if __name__ == '__main__':
    exit(main())
Output:
yes
no
no
no
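As a quick hand check of the first prediction, estimating every probability directly from the five training rows:

P(yes) · P(no surfacing=1 | yes) · P(flippers=1 | yes) = (2/5) · (2/2) · (2/2) = 0.4
P(no)  · P(no surfacing=1 | no)  · P(flippers=1 | no)  = (3/5) · (1/3) · (2/3) ≈ 0.133

Since 0.4 > 0.133, the item {'no surfacing': 1, 'flippers': 1} is classified as yes, which matches the first line of the output.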