Final Comprehensive Assignment: Word Frequency Statistics
阿新 • Published: 2018-06-20
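# 1. Read the whole text file.
with open('big.txt', mode='r', encoding='utf-8') as bigFile:
    bigText = bigFile.read()
print(bigText)

# 2. Replace punctuation and newlines with spaces so adjacent words do not merge.
replaceList = [',', '.', "'", '\n']
for c in replaceList:
    bigText = bigText.replace(c, ' ')
print(bigText)

# 3. Split into words. split() with no argument splits on any whitespace
#    and collapses runs of spaces, so no empty strings end up in the list.
bigList = bigText.split()
print(bigList)

# 4. Count how often each distinct word occurs.
bigSet = set(bigList)
bigDict = {}
for word in bigSet:
    bigDict[word] = bigList.count(word)
print(bigDict)
for d in bigDict:
    print(d, bigDict[d])

# 5. Sort the (word, count) pairs by count, highest first.
wordCountList = list(bigDict.items())
wordCountList.sort(key=lambda x: x[1], reverse=True)
print(wordCountList)

# 6. Print the 20 most frequent words (fewer if the text has fewer distinct words).
for item in wordCountList[:20]:
    print(item)

# 7. Write one "count word" line per word to bigCount.txt.
#    Mode 'a' appends, so re-running the script adds to the existing file.
with open('bigCount.txt', mode='a', encoding='utf-8') as bigCountFile:
    for word, count in wordCountList:
        bigCountFile.write(str(count) + ' ' + word + '\n')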
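In step 4 of the script above, bigList.count(word) rescans the whole list once per distinct word, which gets slow on a large big.txt. As a rough alternative sketch (not part of the original assignment), the standard library's collections.Counter can tally every word in a single pass. The function names count_words and top_words are hypothetical, and the lowercasing and the letters-only regular expression are assumptions that differ slightly from the replace-based cleanup above.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency.

    Assumption: a "word" is a run of ASCII letters; digits and
    punctuation act as separators.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def top_words(text, n=20):
    """Return the n most frequent (word, count) pairs, highest count first."""
    return count_words(text).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat. The dog sat, the end."
    for word, count in top_words(sample, 3):
        print(count, word)
```

To apply it to the assignment's data, pass the contents of big.txt to top_words instead of the sample string.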