Naive Bayes: A Worked Code Implementation (Python)
阿新 • Published: 2019-02-17
This article is my own original work and serves only as a record of my learning.
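Data: suppose the following is course data with three attributes: price A, class hours B, and sales C.
Price A | Hours B | Sales C |
low | many | high |
high | medium | high |
low | few | high |
low | medium | low |
medium | medium | medium |
high | many | high |
low | few | medium |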
The school now offers a new course with price A = high and class hours B = many, and we need to predict its sales.
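This is a prediction problem, which is exactly what naive Bayes is suited to. Most examples online call a library API directly, but I think it is still better to implement naive Bayes yourself. Bayes' formula is P(B|A) = P(A|B)P(B)/P(A). For this problem it becomes P(C|AB) = P(AB|C)P(C)/P(AB) = P(A|C)P(B|C)P(C)/P(AB). The method is to compute this probability for C = low, medium, and high sales, compare the three values, and report the class with the largest probability as the prediction. Because the denominator P(AB) is the same for every value of C, comparing the numerators P(A|C)P(B|C)P(C) is enough.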
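As a worked check of the formula, here are the three numerators for the new course (A = high, B = many), with every count read directly from the seven rows of course data. The common denominator P(AB) is left out, so the values are proportional to the posteriors rather than equal to them.

```latex
\begin{aligned}
P(C=\text{high}\mid A=\text{high},B=\text{many})
  &\propto P(A=\text{high}\mid C=\text{high})\,P(B=\text{many}\mid C=\text{high})\,P(C=\text{high})
   = \tfrac{2}{4}\cdot\tfrac{2}{4}\cdot\tfrac{4}{7} = \tfrac{1}{7}\approx 0.143 \\
P(C=\text{medium}\mid A=\text{high},B=\text{many})
  &\propto \tfrac{0}{2}\cdot\tfrac{0}{2}\cdot\tfrac{2}{7} = 0 \\
P(C=\text{low}\mid A=\text{high},B=\text{many})
  &\propto \tfrac{0}{1}\cdot\tfrac{0}{1}\cdot\tfrac{1}{7} = 0
\end{aligned}
```

The largest value belongs to C = high, so the prediction is high sales.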
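Incidentally, the word "naive" means that A and B are assumed to be independent of each other (more precisely, conditionally independent given C), which is what allows P(AB|C) to be written as P(A|C)P(B|C). In reality, price and class hours are related to some degree, but naive Bayes treats them as independent in order to compute the predicted sales probabilities.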
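To see how much the independence assumption simplifies things on this data, the following Python 3 sketch compares the joint estimate P(A, B | C) with the product P(A | C) P(B | C) for one combination. The helper `cond_prob` is a hypothetical name introduced for illustration; it just counts rows.

```python
# Rows copied from the course data: (price A, hours B, sales C).
rows = [
    ("low", "many", "high"),
    ("high", "medium", "high"),
    ("low", "few", "high"),
    ("low", "medium", "low"),
    ("medium", "medium", "medium"),
    ("high", "many", "high"),
    ("low", "few", "medium"),
]


def cond_prob(rows, sales, price=None, hours=None):
    """Estimate P(price, hours | sales) by counting; None means 'any value'."""
    in_class = [r for r in rows if r[2] == sales]
    hits = [
        r for r in in_class
        if (price is None or r[0] == price)
        and (hours is None or r[1] == hours)
    ]
    return len(hits) / len(in_class)


# P(A=low, B=few | C=high), estimated directly from the rows
joint = cond_prob(rows, "high", price="low", hours="few")
# P(A=low | C=high) * P(B=few | C=high), the naive Bayes factorisation
product = cond_prob(rows, "high", price="low") * cond_prob(rows, "high", hours="few")
print(joint, product)  # 0.25 0.125
```

The two numbers differ, which shows that the factorisation is an approximation on this small sample and not an exact identity.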
Here is my code:
# coding=utf-8
from __future__ import division, print_function


def set_data(price, time, sale):
    # Encode each label as an integer: 0, 1, or 2
    price_number = []
    time_number = []
    sale_number = []
    for i in price:
        if i == "low":
            price_number.append(0)
        elif i == "medium":
            price_number.append(1)
        elif i == "high":
            price_number.append(2)
    for j in time:
        if j == "few":
            time_number.append(0)
        elif j == "medium":
            time_number.append(1)
        elif j == "many":
            time_number.append(2)
    for k in sale:
        if k == "low":
            sale_number.append(0)
        elif k == "medium":
            sale_number.append(1)
        elif k == "high":
            sale_number.append(2)
    return price_number, time_number, sale_number


def naive_bs(price_number, time_number, sale_number, expected_price, expected_time):
    # Count the samples in each sales class; these counts give P(C)
    low_ex_sale = 0
    middle_ex_sale = 0
    high_ex_sale = 0
    for i in range(0, len(sale_number)):
        if sale_number[i] == 0:
            low_ex_sale = low_ex_sale + 1
        elif sale_number[i] == 1:
            middle_ex_sale = middle_ex_sale + 1
        elif sale_number[i] == 2:
            high_ex_sale = high_ex_sale + 1
    # Count the samples with the expected price in each sales class, for P(A|C)
    aa = 0
    bb = 0
    cc = 0
    for i in range(0, len(price_number)):
        if expected_price == price_number[i] and sale_number[i] == 0:
            aa = aa + 1
        elif expected_price == price_number[i] and sale_number[i] == 1:
            bb = bb + 1
        elif expected_price == price_number[i] and sale_number[i] == 2:
            cc = cc + 1
    # Note: these divisions fail if a sales class has no samples at all
    p_aa = aa / low_ex_sale
    p_bb = bb / middle_ex_sale
    p_cc = cc / high_ex_sale
    print("p(a|c):%s ,%s,%s" % (p_aa, p_bb, p_cc))
    # Count the samples with the expected class hours in each sales class, for P(B|C)
    aaa = 0
    bbb = 0
    ccc = 0
    for i in range(0, len(time_number)):
        if expected_time == time_number[i] and sale_number[i] == 0:
            aaa = aaa + 1
        elif expected_time == time_number[i] and sale_number[i] == 1:
            bbb = bbb + 1
        elif expected_time == time_number[i] and sale_number[i] == 2:
            ccc = ccc + 1
    p_aaa = aaa / low_ex_sale
    p_bbb = bbb / middle_ex_sale
    p_ccc = ccc / high_ex_sale
    print("p(b|c): %s,%s,%s" % (p_aaa, p_bbb, p_ccc))
    # P(A|C) * P(B|C) * P(C), scaled by 1000 for readability.
    # The denominator P(AB) is the same for every class, so it is omitted.
    final_low_p = p_aa * p_aaa * low_ex_sale / len(sale_number) * 1000
    final_midd_p = p_bb * p_bbb * middle_ex_sale / len(sale_number) * 1000
    final_high_p = p_cc * p_ccc * high_ex_sale / len(sale_number) * 1000
    final_list = [final_low_p, final_midd_p, final_high_p]
    final_index = final_list.index(max(final_list))
    print(final_list)
    if final_index == 0:
        print("Predicted sales: low")
    elif final_index == 1:
        print("Predicted sales: medium")
    else:
        print("Predicted sales: high")


if __name__ == "__main__":
    price = ["low", "high", "low", "low", "medium", "high", "low"]
    time = ["many", "medium", "few", "medium", "medium", "many", "few"]
    sale = ["high", "high", "high", "low", "medium", "high", "medium"]
    expected_price = "high"  # the new course has a high price
    expected_time = "many"   # the new course has many class hours
    if expected_price == "low":
        expected_price_id = 0
    elif expected_price == "medium":
        expected_price_id = 1
    else:
        expected_price_id = 2
    if expected_time == "few":
        expected_time_id = 0
    elif expected_time == "medium":
        expected_time_id = 1
    else:
        expected_time_id = 2
    price_number, time_number, sale_number = set_data(price, time, sale)
    print(price_number, time_number, sale_number)
    naive_bs(price_number, time_number, sale_number, expected_price_id, expected_time_id)
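The code encodes the values of each of the three features as 0, 1, and 2. It assumes that the price, class hours, and sales lists all have the same length. Real data is often not that clean (some fields may be missing, for example), so it should be preprocessed first, mainly to handle missing values and outliers.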
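One possible shape for that preprocessing step, as a minimal Python 3 sketch: it drops every row with a missing or unrecognised label and encodes the rest in one pass. The names `PRICE_CODES`, `HOURS_CODES`, `SALES_CODES`, and `clean_and_encode` are hypothetical, the sample `raw` rows are made up for illustration, and dropping rows is only one of several reasonable strategies.

```python
# Hypothetical label-to-code tables matching the 0/1/2 encoding described above.
PRICE_CODES = {"low": 0, "medium": 1, "high": 2}
HOURS_CODES = {"few": 0, "medium": 1, "many": 2}
SALES_CODES = {"low": 0, "medium": 1, "high": 2}


def clean_and_encode(records):
    """Keep only complete, recognised rows and encode them as integers.

    `records` is a list of (price, hours, sales) tuples. A field may be
    None or an unexpected label, in which case the whole row is dropped,
    so the three returned lists always have the same length.
    """
    price_number, time_number, sale_number = [], [], []
    for price, hours, sales in records:
        if price in PRICE_CODES and hours in HOURS_CODES and sales in SALES_CODES:
            price_number.append(PRICE_CODES[price])
            time_number.append(HOURS_CODES[hours])
            sale_number.append(SALES_CODES[sales])
    return price_number, time_number, sale_number


# Made-up rows for illustration only
raw = [
    ("low", "many", "high"),
    ("high", None, "high"),         # missing hours: dropped
    ("cheap", "few", "low"),        # unrecognised price label: dropped
    ("medium", "medium", "medium"),
]
print(clean_and_encode(raw))  # ([0, 1], [2, 1], [2, 1])
```

Working row by row instead of on three separate lists means a bad value can never shift one feature out of step with the others.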
Here is the output of running the script in Eclipse:
[0, 2, 0, 0, 1, 2, 0] [2, 1, 0, 1, 1, 2, 0] [2, 2, 2, 0, 1, 2, 1]
p(a|c):0.0 ,0.0,0.5
p(b|c): 0.0,0.0,0.5
[0.0, 0.0, 142.85714285714286]
Predicted sales: high
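The three printed scores are the numerators P(A|C)P(B|C)P(C) multiplied by 1000, not probabilities. As a small Python 3 sketch (not part of the original script), dividing each score by their sum turns them into posterior probabilities under the naive assumption. The variable names here are illustrative.

```python
# Scores from the run above, in the order low, medium, high.
scores = [0.0, 0.0, 142.85714285714286]

total = sum(scores)
# Dividing by the total plays the role of the denominator P(AB) in Bayes'
# formula (under the naive assumption) and cancels the factor of 1000.
# If every score were 0, total would be 0 and this would need special handling.
posteriors = [s / total for s in scores]

labels = ["low", "medium", "high"]
best = labels[posteriors.index(max(posteriors))]
print(posteriors, best)  # [0.0, 0.0, 1.0] high
```

A posterior of exactly 1.0 for "high" comes from the zero counts for the other two classes in such a small sample, so it should not be read as certainty.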
This article is only a record of my own learning and may have many shortcomings.