Text Classification with scikit-learn
阿新 • Published: 2018-12-27
######################################################
# Multinomial Naive Bayes Classifier
# Assumes categories, vectorizer, fea_train, newsgroup_train and
# calculate_result are already defined.
print('*************************\nNaive Bayes\n*************************')
from sklearn.datasets import fetch_20newsgroups
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
# Reuse the vectorizer as-is; refitting it on the test set could change the feature space
fea_test = vectorizer.transform(newsgroups_test.data)

# Create the multinomial naive Bayes classifier
clf = MultinomialNB(alpha=0.01)
clf.fit(fea_train, newsgroup_train.target)
pred = clf.predict(fea_test)
calculate_result(newsgroups_test.target, pred)

# Notice that f1_score is not equal to 2*precision*recall/(precision+recall):
# m_precision and m_recall are already averages over the classes, whereas metrics.f1_score()
# computes a weighted average of the per-class F1 scores, i.e., it takes the number of samples in each class into account.
Note my last three comment lines, on why f1 ≠ 2 * (precision * recall) / (precision + recall).
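To make the arithmetic concrete, here is a minimal pure-Python sketch of support-weighted averaging. The toy labels and the helper names `per_class_scores` and `weighted_average` are made up for illustration and are not part of scikit-learn. The sketch assumes the averaged scores are support-weighted means of the per-class values, as the comment above describes.

```python
def per_class_scores(actual, pred):
    """Return {label: (precision, recall, f1, support)} for every true label."""
    scores = {}
    for label in sorted(set(actual)):
        tp = sum(1 for a, p in zip(actual, pred) if a == label and p == label)
        predicted = sum(1 for p in pred if p == label)
        support = sum(1 for a in actual if a == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores


def weighted_average(scores, index):
    """Average one metric over the classes, weighting each class by its support."""
    total = sum(s[3] for s in scores.values())
    return sum(s[index] * s[3] for s in scores.values()) / total


# Made-up toy labels, not 20 newsgroups output
actual = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
pred = [0, 0, 1, 2, 1, 1, 2, 2, 0, 0]

scores = per_class_scores(actual, pred)
m_precision = weighted_average(scores, 0)  # averaged precision
m_recall = weighted_average(scores, 1)     # averaged recall
weighted_f1 = weighted_average(scores, 2)  # average of the per-class F1 scores
harmonic = 2 * m_precision * m_recall / (m_precision + m_recall)

print('weighted f1: {0:.3f}'.format(weighted_f1))             # 0.589
print('harmonic mean of averages: {0:.3f}'.format(harmonic))  # 0.600
```

Here the averaged precision and recall are both 0.6, so their harmonic mean is 0.6, while the weighted average of the per-class F1 scores (0.5, 0.8 and 4/7) comes out lower. Averaging first and then combining is simply a different calculation from combining per class and then averaging.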