A Code Example Implementing the Naive Bayes Algorithm (Python)

This article is my own original work and serves only as a record of my learning.

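Data: suppose the following is course data with three attributes: price A, class hours B, and sales C. The rows are the same seven samples used in the code further down.

Price A    Class hours B    Sales C
low        many             high
high       medium           high
low        few              high
low        medium           low
medium     medium           medium
high       many             high
low        few              medium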

The school now offers a new course with price A = high and class hours B = many, and we need to predict its sales.

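The task is to predict an outcome from known attributes, which is exactly what naive Bayes does. Most examples online simply call a library API, but implementing naive Bayes yourself is a better way to learn it. Bayes' formula is P(B|A)=P(A|B)P(B)/P(A). In this article it becomes P(C|AB)=P(AB|C)P(C)/P(AB)=P(A|C)P(B|C)P(C)/P(AB). The method is to compute this probability for C = low, medium, and high sales, compare the three values, and report the class with the largest one as the prediction. Since P(AB) is the same for all three classes, it can be left out of the comparison.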
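As a rough illustration of the formula, the sketch below works out P(A|C)P(B|C)P(C) for each sales level with exact fractions, on the seven sample rows used by the code in this article. The function name `class_scores` and the English labels are assumptions made for this example and are not part of the original code.

```python
from fractions import Fraction

# Sample rows (price, class hours, sales), taken from the lists in this article
# and written with English labels for illustration.
ROWS = [
    ("low", "many", "high"), ("high", "medium", "high"),
    ("low", "few", "high"), ("low", "medium", "low"),
    ("medium", "medium", "medium"), ("high", "many", "high"),
    ("low", "few", "medium"),
]


def class_scores(rows, price, hours):
    """Return {sales level: P(A|C) * P(B|C) * P(C)} as exact fractions."""
    scores = {}
    for level in ("low", "medium", "high"):
        in_class = [r for r in rows if r[2] == level]
        if not in_class:
            # No sample of this class: treat its score as 0.
            scores[level] = Fraction(0)
            continue
        p_c = Fraction(len(in_class), len(rows))
        p_a = Fraction(sum(r[0] == price for r in in_class), len(in_class))
        p_b = Fraction(sum(r[1] == hours for r in in_class), len(in_class))
        scores[level] = p_a * p_b * p_c
    return scores


scores = class_scores(ROWS, "high", "many")
for level, score in scores.items():
    print(level, score)
```

For price = high and class hours = many, only the high-sales class gets a nonzero score (1/2 * 1/2 * 4/7 = 1/7), because no high-priced sample course has low or medium sales.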

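Incidentally, "naive" means that A and B are treated as independent of each other (more precisely, conditionally independent given C, which is what allows P(AB|C) to be written as P(A|C)P(B|C)). In reality price and class hours are related to some degree, but naive Bayes treats them as independent in order to compute the probability for the sales prediction.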
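To see what the independence assumption does in practice, the sketch below compares P(A,B|C) counted directly with the product P(A|C)P(B|C), using only the four sample courses whose sales are high. The helper `joint_and_product` and the English labels are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

# (price, class hours) of the four sample courses whose sales are "high",
# taken from the lists in this article and written with English labels.
HIGH_SALES = [("low", "many"), ("high", "medium"), ("low", "few"), ("high", "many")]


def joint_and_product(rows, price, hours):
    """Return (P(A,B|C) counted directly, P(A|C) * P(B|C)) for one class."""
    n = len(rows)
    joint = Fraction(sum(r == (price, hours) for r in rows), n)
    p_a = Fraction(sum(r[0] == price for r in rows), n)
    p_b = Fraction(sum(r[1] == hours for r in rows), n)
    return joint, p_a * p_b


print(joint_and_product(HIGH_SALES, "high", "many"))    # the two estimates agree here
print(joint_and_product(HIGH_SALES, "high", "medium"))  # here they differ
```

With so few samples the two estimates can easily disagree. Naive Bayes accepts that error in exchange for only having to estimate one attribute at a time.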

My code is given below:

# coding=utf-8
# Naive Bayes by hand for two categorical attributes and one class (Python 3).

# Integer codes 0, 1, 2 for each category label.
PRICE_CODES = {"low": 0, "medium": 1, "high": 2}
TIME_CODES = {"few": 0, "medium": 1, "many": 2}
SALE_CODES = {"low": 0, "medium": 1, "high": 2}
SALE_LABELS = ["low", "medium", "high"]


def set_data(price, time, sale):
    """Encode the three label lists as lists of integer codes.

    An unknown label raises KeyError instead of being skipped silently,
    which would leave the three lists misaligned.
    """
    price_number = [PRICE_CODES[i] for i in price]
    time_number = [TIME_CODES[j] for j in time]
    sale_number = [SALE_CODES[k] for k in sale]
    return price_number, time_number, sale_number


def conditional(feature_number, sale_number, expected, sale_class):
    """Estimate P(feature == expected | sale == sale_class) by counting."""
    class_count = sale_number.count(sale_class)
    if class_count == 0:
        return 0.0  # a class with no samples would otherwise divide by zero
    hits = sum(1 for f, s in zip(feature_number, sale_number)
               if f == expected and s == sale_class)
    return hits / class_count


def naive_bs(price_number, time_number, sale_number, expected_price, expected_time):
    total = len(sale_number)
    # P(A|C) and P(B|C) for C = low, medium, high
    p_price = [conditional(price_number, sale_number, expected_price, c) for c in range(3)]
    p_time = [conditional(time_number, sale_number, expected_time, c) for c in range(3)]
    print("p(a|c):%s ,%s,%s" % tuple(p_price))
    print("p(b|c): %s,%s,%s" % tuple(p_time))
    # Score = P(A|C) * P(B|C) * P(C). P(AB) is identical for every C, so it is omitted.
    # The factor 1000 only rescales the scores and does not change which one is largest.
    final_list = [p_price[c] * p_time[c] * sale_number.count(c) / total * 1000
                  for c in range(3)]
    final_index = final_list.index(max(final_list))
    print(final_list)
    print("Predicted sales: %s" % SALE_LABELS[final_index])
    return final_index


if __name__ == "__main__":
    price = ["low", "high", "low", "low", "medium", "high", "low"]
    time = ["many", "medium", "few", "medium", "medium", "many", "few"]
    sale = ["high", "high", "high", "low", "medium", "high", "medium"]

    expected_price = "high"  # the new course has a high price
    expected_time = "many"   # the new course has many class hours
    price_number, time_number, sale_number = set_data(price, time, sale)
    print(price_number, time_number, sale_number)
    naive_bs(price_number, time_number, sale_number,
             PRICE_CODES[expected_price], TIME_CODES[expected_time])


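The code encodes the values of each of the three attributes as 0, 1, and 2. It assumes that the price, class-hours, and sales lists all have the same length. Real data will often not be this tidy, so it should be preprocessed first (mainly by handling missing values and outliers).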
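A minimal sketch of such a preprocessing step is shown below. It assumes that a row with a missing or unrecognised label should simply be dropped, which is only one possible policy. The function `clean_rows`, its parameters, and the English labels are hypothetical and not part of the original code.

```python
def clean_rows(price, time, sale, price_levels, time_levels, sale_levels):
    """Keep only complete rows whose labels are all recognised."""
    if not (len(price) == len(time) == len(sale)):
        raise ValueError("the three lists must have the same length")
    kept = [
        (p, t, s)
        for p, t, s in zip(price, time, sale)
        if p in price_levels and t in time_levels and s in sale_levels
    ]
    if not kept:
        return [], [], []
    # Turn the list of kept rows back into three aligned columns.
    cleaned_price, cleaned_time, cleaned_sale = (list(col) for col in zip(*kept))
    return cleaned_price, cleaned_time, cleaned_sale


levels = {"low", "medium", "high"}
hours = {"few", "medium", "many"}
raw_price = ["low", None, "high", "hgih"]   # one missing value, one typo
raw_time = ["many", "few", "", "medium"]    # one empty value
raw_sale = ["high", "low", "high", "medium"]
cleaned = clean_rows(raw_price, raw_time, raw_sale, levels, hours, levels)
print(cleaned)
```

Dropping whole rows keeps the three lists aligned, which the counting in the main code relies on.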

Below is the result of running it in Eclipse:

[0, 2, 0, 0, 1, 2, 0] [2, 1, 0, 1, 1, 2, 0] [2, 2, 2, 0, 1, 2, 1]
p(a|c):0.0 ,0.0,0.5
p(b|c): 0.0,0.0,0.5
[0.0, 0.0, 142.85714285714286]
Predicted sales: high
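The three numbers in the last list are unnormalized scores (P(A|C)P(B|C)P(C) multiplied by 1000), not probabilities. One way to turn them into posterior probabilities under the naive model is to divide each by their sum, which plays the role of P(AB) in the formula. The sketch below does this. The helper name `normalize` is made up for this example.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to 1; return them unchanged if all are 0."""
    total = sum(scores)
    if total == 0:
        return list(scores)
    return [s / total for s in scores]


final_list = [0.0, 0.0, 142.85714285714286]  # scores printed in the run above
posteriors = normalize(final_list)
print(posteriors)
```

For this run all the probability mass lands on the high-sales class, because the other two scores are exactly zero.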

This article is only a record of my own learning and may have many shortcomings.