A Python-Based Sentiment Analysis Case Study

阿新 • Published: 2018-12-11

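Sentiment analysis, also known as orientation analysis or opinion mining, is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries emotional coloring. It can be further divided into sentiment polarity (orientation) analysis, sentiment intensity analysis, subjectivity/objectivity analysis, and so on.

The goal of sentiment polarity analysis is to judge whether a text is positive, negative, or neutral. In most application scenarios only two classes are used. For example, the words “喜愛” (like) and “厭惡” (loathe) carry opposite sentiment orientations.

Background: all reviews of one brand of red wine on JD.com (京東商城) are crawled, the positive reviews are separated from the negative ones, and feature words are extracted so that new reviews can be classified.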

Example 1 (positive review)

[image]

Example 2 (negative review)

[image]

Reading the text files

def text():
    """Read the positive and negative review files and return them as one string."""
    paths = ['E:/工作檔案/情感分析案例1/good.txt',
             'E:/工作檔案/情感分析案例1/bad.txt']
    content = ''
    for path in paths:
        # "with" closes each file even if reading fails
        with open(path, 'r', encoding='utf-8') as f:
            content += f.read()
    return content

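Using single words as features

def bag_of_words(words):
    # Map every item in words to True; duplicates collapse into one key
    return {word: True for word in words}

# text() returns a single string, so iterating over it yields one character at a time
print(bag_of_words(text()))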
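Because bag_of_words simply iterates over its argument, the type passed in decides what counts as a "word". The short sketch below uses a made-up review string and a hand-split token list (both are illustrative, not data from this article) to show the difference between passing a raw string and passing a list of tokens.

```python
def bag_of_words(words):
    # Same behavior as the article's bag_of_words: every item becomes a key mapped to True
    return {word: True for word in words}

sample = '紅酒不錯'          # hypothetical review text
tokens = ['紅酒', '不錯']    # hypothetical, hand-segmented tokens

char_features = bag_of_words(sample)   # iterating a str yields single characters
word_features = bag_of_words(tokens)   # iterating a list yields the tokens

print(sorted(char_features))
print(sorted(word_features))
```

If word-level features are wanted, the text would need to be segmented before it is handed to bag_of_words.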


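Using bigrams as features, keeping the 1000 top-ranked bigrams by chi-square statistic

from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

def bigram_words(words, score_fn=BigramAssocMeasures.chi_sq, n=1000):
    bigram_finder = BigramCollocationFinder.from_words(words)
    bigrams = bigram_finder.nbest(score_fn, n)   # the n best-scoring bigrams
    new_bigrams = [u + v for (u, v) in bigrams]  # join each pair into one string
    a = bag_of_words(words)
    b = bag_of_words(new_bigrams)
    a.update(b)  # merge dict b into dict a
    return a

print(bigram_words(text(), score_fn=BigramAssocMeasures.chi_sq, n=1000))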
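To make the shape of these features concrete, here is a dependency-free sketch that builds adjacent word pairs and joins each pair into one string, as bigram_words does with u+v. Note the swap: it ranks pairs by raw frequency, not by the chi-square score that NLTK computes. The helper top_bigrams and the token list are hypothetical.

```python
from collections import Counter

def top_bigrams(words, n=1000):
    # Count adjacent pairs, then keep the n most frequent ones.
    # Raw frequency is a stand-in here, NOT the chi-square ranking.
    pairs = Counter(zip(words, words[1:]))
    return [u + v for (u, v), _count in pairs.most_common(n)]

# Hypothetical, hand-segmented tokens
tokens = ['紅酒', '不錯', '紅酒', '不錯', '包裝', '很好']
print(top_bigrams(tokens, n=1))
```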


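Word segmentation and part-of-speech tagging with the jieba segmentation tool. jieba has three segmentation modes:

A. Accurate mode: tries to cut the sentence as precisely as possible and is suited to text analysis. This is the default mode.
B. Full mode: scans out every word that can be formed from the sentence. It is very fast but cannot resolve ambiguity.
C. Search-engine mode: building on accurate mode, it splits long words again to improve recall, which suits search-engine segmentation.

Note: when the HMM parameter of jieba.cut is set to True, jieba can also discover new words.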

import jieba

def read_file(filename):
    # Load the stop words into a set for fast lookup
    with open('E:/工作檔案/情感分析案例1/stop.txt', 'r', encoding='utf-8') as f:
        stop = set(line.strip() for line in f)
    result = []
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            s = line.split('\t')
            fenci = jieba.cut(s[0], cut_all=False)  # cut_all=False (the default) selects accurate mode
            result.append(list(set(fenci) - stop))  # unique tokens of this line, minus stop words
    return result
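The set difference inside read_file does two things at once: it removes stop words and it collapses repeated tokens, and the resulting order is arbitrary because sets are unordered. The sketch below uses a hand-written token list and stop-word list (both hypothetical) in place of jieba output and stop.txt to make that visible.

```python
tokens = ['這', '酒', '很', '好', '很', '好', '喝']  # hypothetical segmenter output
stop = ['這', '很']                                # hypothetical stop words

# Set difference: unique tokens minus stop words
kept = list(set(tokens) - set(stop))

print(sorted(kept))  # sorted only to get a stable display order
```

One consequence is that each token is counted at most once per review line, so the frequencies accumulated later are per-line occurrence counts rather than raw term counts.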

from nltk.probability import FreqDist, ConditionalFreqDist
from nltk.metrics import BigramAssocMeasures

Selecting the most informative features (the top number of them) by chi-square statistic

def jieba_feature(number):
    # Flatten the per-line token lists into one flat list per class
    posWords = [item for items in read_file('E:/工作檔案/情感分析案例1/good.txt') for item in items]
    negWords = [item for items in read_file('E:/工作檔案/情感分析案例1/bad.txt') for item in items]

    word_fd = FreqDist()  # frequency of every word across both classes
    cond_word_fd = ConditionalFreqDist()  # word frequencies within positive and within negative text

    for word in posWords:
        word_fd[word] += 1
        cond_word_fd['pos'][word] += 1

    for word in negWords:
        word_fd[word] += 1
        cond_word_fd['neg'][word] += 1

    pos_word_count = cond_word_fd['pos'].N()  # number of words in positive text
    neg_word_count = cond_word_fd['neg'].N()  # number of words in negative text
    total_word_count = pos_word_count + neg_word_count

    word_scores = {}  # maps each word to its information score

    for word, freq in word_fd.items():
        # Chi-square of the word against the positive class; other measures
        # such as mutual information could be computed here instead
        pos_score = BigramAssocMeasures.chi_sq(cond_word_fd['pos'][word], (freq, pos_word_count), total_word_count)
        neg_score = BigramAssocMeasures.chi_sq(cond_word_fd['neg'][word], (freq, neg_word_count), total_word_count)
        word_scores[word] = pos_score + neg_score  # a word's score is its positive chi-square plus its negative chi-square

    # Sort words by score in descending order. number is the feature
    # dimension and can be tuned until the result is best.
    best_vals = sorted(word_scores.items(), key=lambda item: item[1], reverse=True)[:number]
    best_words = set(w for w, s in best_vals)
    return {word: True for word in best_words}
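To see what the BigramAssocMeasures.chi_sq(n_ii, (n_ix, n_xi), n_xx) calls measure, the sketch below re-derives the chi-square statistic of a 2×2 contingency table in plain Python. The function name chi_sq_2x2 and the counts are illustrative assumptions. The formula is the textbook one and should agree with NLTK's result, though it is not copied from NLTK's source.

```python
def chi_sq_2x2(n_ii, n_ix, n_xi, n_xx):
    """Chi-square statistic for a 2x2 table of word occurrence vs. class.

    n_ii: occurrences of the word inside the class
    n_ix: occurrences of the word overall
    n_xi: total word occurrences inside the class
    n_xx: total word occurrences overall
    """
    n_oi = n_xi - n_ii                 # other words, inside the class
    n_io = n_ix - n_ii                 # this word, outside the class
    n_oo = n_xx - n_ii - n_oi - n_io   # other words, outside the class
    numerator = n_xx * (n_ii * n_oo - n_io * n_oi) ** 2
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    return numerator / denominator

# Hypothetical counts: a word seen 8 times in positive text and 2 times in
# negative text, with 100 word occurrences in each class.
pos_score = chi_sq_2x2(8, 10, 100, 200)
neg_score = chi_sq_2x2(2, 10, 100, 200)
print(pos_score, neg_score, pos_score + neg_score)
```

In this example the two scores come out equal. That is expected with exactly two classes, because both calls describe the same 2×2 table with its columns swapped, so summing them ranks words the same way as either score alone.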

Adjusting the settings to try four feature-selection approaches and compare their results

def build_features():
    # feature = bag_of_words(text())  # option 1: single words
    # feature = bigram(text(), score_fn=BigramAssocMeasures.chi_sq, n=500)  # option 2: bigrams (bigram() is not defined in this article)
    # feature = bigram_words(text(), score_fn=BigramAssocMeasures.chi_sq, n=500)  # option 3: single words plus bigrams
    feature = jieba_feature(300)  # option 4: jieba segmentation

    posFeatures = []
    for items in read_file('E:/工作檔案/情感分析案例1/good.txt'):
        a = {}
        for item in items:
            if item in feature:  # keep only the selected feature words
                a[item] = 'True'
        posWords = [a, 'pos']  # label positive text as "pos"
        posFeatures.append(posWords)
    negFeatures = []
    for items in read_file('E:/工作檔案/情感分析案例1/bad.txt'):
        a = {}
        for item in items:
            if item in feature:
                a[item] = 'True'
        negWords = [a, 'neg']  # label negative text as "neg"
        negFeatures.append(negWords)
    return posFeatures, negFeatures

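Getting the training data

Several modules need to be installed here: scipy, numpy, and sklearn. For scipy and numpy, visit http://www.lfd.uci.edu/~gohlke/pythonlibs, find scipy and numpy, and download the whl file for the matching version.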

import sklearn
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Note: score() is not defined anywhere in this article
print("BernoulliNB's accuracy is %f" % score(BernoulliNB()))
print("MultinomialNB's accuracy is %f" % score(MultinomialNB()))
print("LogisticRegression's accuracy is %f" % score(LogisticRegression()))
print("SVC's accuracy is %f" % score(SVC()))
print("LinearSVC's accuracy is %f" % score(LinearSVC()))
print("NuSVC's accuracy is %f" % score(NuSVC()))
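The score function called in the print statements is not shown in the article. The sketch below is one plausible shape for it, under several assumptions: the data is split into training and test sets by a hypothetical split_data helper, score takes the data explicitly instead of reading module-level variables, and the classifier is any object with train(labeled_featuresets) and classify_many(featuresets) methods, which is the interface NLTK's SklearnClassifier wrapper exposes. To keep the example dependency-free, a toy KeywordClassifier stands in for a real model.

```python
import random

def split_data(pos, neg, test_size, seed=0):
    """Shuffle each class and hold out test_size items per class (hypothetical helper)."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    train = pos[test_size:] + neg[test_size:]
    test = pos[:test_size] + neg[:test_size]
    return train, test

def score(classifier, train, test):
    """Train on train, then return the fraction of test items labelled correctly."""
    classifier.train(train)
    featuresets, gold = zip(*test)
    predicted = classifier.classify_many(featuresets)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)

class KeywordClassifier:
    """Toy stand-in, not a real model: counts which label each feature was seen with."""

    def train(self, labeled_featuresets):
        self.votes = {}
        for features, label in labeled_featuresets:
            for name in features:
                by_label = self.votes.setdefault(name, {})
                by_label[label] = by_label.get(label, 0) + 1
        return self

    def classify_many(self, featuresets):
        results = []
        for features in featuresets:
            tally = {'pos': 0, 'neg': 0}
            for name in features:
                for label, count in self.votes.get(name, {}).items():
                    tally[label] = tally.get(label, 0) + count
            results.append(max(tally, key=tally.get))
        return results

# Hypothetical featuresets in the [features, label] shape produced by build_features()
pos = [[{'不錯': 'True'}, 'pos'] for _ in range(10)]
neg = [[{'難喝': 'True'}, 'neg'] for _ in range(10)]
train, test = split_data(pos, neg, test_size=3)
accuracy = score(KeywordClassifier(), train, test)
print('accuracy is %f' % accuracy)
```

With NLTK and scikit-learn installed, the stand-in could be replaced by a wrapped estimator such as SklearnClassifier(BernoulliNB()), which provides the same two methods. The article's one-argument calls, such as score(BernoulliNB()), presumably do that wrapping inside score and read the training and test data from module-level variables.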
