
Python Implementation of Naive Bayes Classification

Bayes' theorem:

Conditional probability:

P(A|B) denotes the probability that event A occurs given that event B has already occurred; it is called the conditional probability of A given B.

Basic formula:

P(A|B) = P(AB) / P(B)

Bayes' theorem:

P(B|A) = P(A|B) * P(B) / P(A)
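
As a quick sanity check of the two formulas, take the toy data set used later in the post: 5 records, 2 labelled 'yes', 4 with flippers = 1, and 2 with both (this snippet is only an illustration, not part of naiveBayes.py):

p_yes = 2 / 5.0                # P(A): prior probability of the class 'yes'
p_flippers = 4 / 5.0           # P(B): probability that flippers = 1
p_both = 2 / 5.0               # P(AB): probability of 'yes' and flippers = 1 together

p_yes_given_flippers = p_both / p_flippers   # P(A|B) = P(AB) / P(B) = 0.5
p_flippers_given_yes = p_both / p_yes        # P(B|A) = P(AB) / P(A) = 1.0

# Bayes' theorem recovers P(A|B) from P(B|A):
print(p_flippers_given_yes * p_yes / p_flippers)   # 0.5, equal to p_yes_given_flippers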

Naive Bayes classification:

Underlying assumption: given the target value, the attributes are conditionally independent of one another.
Core idea: for an item to be classified, compute the probability of each class given that this item appears; whichever class has the largest probability is taken as the item's class. A hand-worked example of this decision rule follows below.
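
With the toy data set from naiveBayes.py and the query {'no surfacing': 1, 'flippers': 1}, the two class scores can be computed by hand (an illustration only, not part of the script):

score_yes = (2 / 5.0) * (2 / 2.0) * (2 / 2.0)   # P(yes) * P(x1=1|yes) * P(x2=1|yes) = 0.4
score_no = (3 / 5.0) * (1 / 3.0) * (2 / 3.0)    # P(no) * P(x1=1|no) * P(x2=1|no), about 0.133
print('yes' if score_yes > score_no else 'no')  # prints: yes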

naiveBayes.py

# Naive Bayes training step: compute the prior probability of each class
def classify(dataSet):
    numEntries = len(dataSet)

    # Count the number of examples in each class
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1

    # Convert the counts into prior probabilities
    prob = {}
    for key in labelCounts:
        prob[key] = float(labelCounts[key]) / numEntries

    return prob

# Predict the class of a new object with naive Bayes
def predict(prob, dataSet, features, newObject):
    numFeatures = len(dataSet[0]) - 1

    # Work on a copy so the caller's prior dictionary is not modified in place
    prob = dict(prob)

    # Number of training examples in each class: the denominator of P(feature value | class)
    classTotals = {}
    for example in dataSet:
        classTotals[example[-1]] = classTotals.get(example[-1], 0) + 1

    # Multiply each prior by the class-conditional probability P(feature value | class)
    for i in range(numFeatures):
        labelValues = [example[-1] for example in dataSet if example[i] == newObject[features[i]]]
        labelCounts = {}
        for currentLabel in labelValues:
            labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
        for val in prob:
            prob[val] *= float(labelCounts.get(val, 0)) / classTotals[val]

    # Return the class with the largest posterior score
    maxProb = -1.0
    label = None
    for val in prob:
        if prob[val] > maxProb:
            maxProb = prob[val]
            label = val
    return label


def main():
    # Build the toy data set (feature vectors plus a class label in the last column)
    def createDataSet():
        dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
        features = ['no surfacing', 'flippers']
        return dataSet, features

    dataset, features = createDataSet()

    prob = classify(dataset)

    print(predict(prob, dataset, features, {'no surfacing': 1, 'flippers': 1}))
    print(predict(prob, dataset, features, {'no surfacing': 1, 'flippers': 0}))
    print(predict(prob, dataset, features, {'no surfacing': 0, 'flippers': 1}))
    print(predict(prob, dataset, features, {'no surfacing': 0, 'flippers': 0}))

if __name__ == '__main__':
    exit(main())

Output:

yes
no
no
no
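
For comparison, the same toy data can be run through scikit-learn's BernoulliNB (a rough cross-check, assuming scikit-learn is installed): it applies Laplace smoothing by default, so its internal scores differ from the hand-rolled version, but on these four queries the predicted labels should still match.

from sklearn.naive_bayes import BernoulliNB

X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]]   # feature vectors from the data set above
y = ['yes', 'yes', 'no', 'no', 'no']           # class labels

clf = BernoulliNB()   # alpha=1.0 by default, i.e. Laplace smoothing
clf.fit(X, y)
print(clf.predict([[1, 1], [1, 0], [0, 1], [0, 0]]))   # expected: ['yes' 'no' 'no' 'no']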