Naive Bayes - NumPy - Log-Likelihood
阿新 • Published 2017-08-22
From *Machine Learning in Action*.
To guard against underflow when many small factors are multiplied in a row (a product of many tiny numbers rounds to 0, or documents can no longer be classified correctly), the probabilities are handled in log space.
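A minimal demonstration of the underflow problem (the factor count and value are invented for illustration): multiplying 500 probabilities of 0.001 underflows float64 to exactly 0.0, while the sum of their logs stays comfortably representable.

```python
import numpy as np

# 500 probability factors of 0.001 each; the true product is 1e-1500,
# far below float64's smallest representable value (~5e-324).
probs = np.full(500, 1e-3)

product = np.prod(probs)         # underflows to 0.0
log_sum = np.sum(np.log(probs))  # = 500 * ln(1e-3) ~ -3453.9, no underflow

print(product)   # 0.0
print(log_sum)
```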
Training:
```python
import numpy as np

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)   # prior P(class = 1)
    p0Num = np.ones(numWords); p1Num = np.ones(numWords)  # word counts initialized to 1
    p0Denom = 2.0; p1Denom = 2.0                          # i.e. Laplace smoothing
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    p1Vect = np.log(p1Num / p1Denom)  # note: log-probabilities
    p0Vect = np.log(p0Num / p0Denom)  # note: log-probabilities
    # return the conditional log-probability vector for each class
    # and the prior probability of class 1
    return p0Vect, p1Vect, pAbusive
```
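As a quick sanity check, the trainer can be exercised on a tiny hand-made bag-of-words matrix. The function body is repeated below so this sketch runs on its own; the toy matrix and labels are invented for illustration.

```python
import numpy as np

def trainNB0(trainMatrix, trainCategory):  # same logic as the snippet above
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    p0Num = np.ones(numWords); p1Num = np.ones(numWords)
    p0Denom = 2.0; p1Denom = 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]; p1Denom += np.sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]; p0Denom += np.sum(trainMatrix[i])
    return np.log(p0Num / p0Denom), np.log(p1Num / p1Denom), pAbusive

# invented toy data: 4 documents over a 3-word vocabulary, labels 0/1
trainMatrix = np.array([[1, 0, 2], [0, 1, 0], [2, 0, 1], [0, 2, 0]])
p0V, p1V, pAb = trainNB0(trainMatrix, [0, 1, 0, 1])

print(pAb)          # 0.5: half the documents belong to class 1
print(np.exp(p1V))  # smoothed per-word frequencies for class 1
```

Exponentiating the returned vectors recovers the smoothed conditional probabilities, which is a convenient way to inspect what the model learned.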
Classification:
```python
def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    # note: the product of probabilities becomes a sum of log-probabilities
    p1 = sum(vec2Classify * p1Vec) + np.log(pClass1)
    p0 = sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

def testingNB():  # end-to-end demonstration
    listOPosts, listClasses = loadDataSet()       # load the data
    myVocabList = createVocabList(listOPosts)     # build the vocabulary
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(bagOfWord2VecMN(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(trainMat, listClasses)  # train
    # test
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = np.array(bagOfWord2VecMN(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
```
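The walkthrough above calls three helpers (`loadDataSet`, `createVocabList`, `bagOfWord2VecMN`) whose definitions are not shown in this post. Below is a hedged sketch of what they look like in *Machine Learning in Action*: a hard-coded toy dataset, a vocabulary built from the union of all words, and a bag-of-words count vector. The exact toy posts may differ from the book's; treat them as illustrative.

```python
def loadDataSet():
    # toy posts and labels (1 = abusive), in the spirit of the book's example
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]
    return postingList, classVec

def createVocabList(dataSet):
    vocabSet = set()
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # union of all words seen
    return list(vocabSet)

def bagOfWord2VecMN(vocabList, inputSet):
    # bag-of-words vector: count of each vocabulary word in the document
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec

posts, labels = loadDataSet()
vocab = createVocabList(posts)
vec = bagOfWord2VecMN(vocab, ['my', 'my', 'dog'])
```

With these in place, `testingNB()` runs end to end.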
Note: the lines marked "note" in the code above are where the product of probabilities in the formula is replaced by a sum of log-probabilities. It can be shown mathematically that this substitution never changes the classification result, and in practice it avoids the underflow caused by multiplying many probability factors that are far smaller than 1.
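The reason the substitution is safe: log is strictly increasing, so comparing summed log-probabilities picks the same winner as comparing the raw products. A small numeric check with made-up probability factors:

```python
import numpy as np

# invented per-word probabilities for two classes
probs_c1 = np.array([0.04, 0.02, 0.30])
probs_c0 = np.array([0.01, 0.05, 0.20])

raw_winner = int(np.prod(probs_c1) > np.prod(probs_c0))          # compare products
log_winner = int(np.sum(np.log(probs_c1)) > np.sum(np.log(probs_c0)))  # compare log-sums

print(raw_winner, log_winner)  # the same class wins either way
```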