Solution: "AGC048B Bracket Score"
阿新 • Published: 2020-11-13
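import jieba

# Noise words to drop from the ranking: common function words and other
# frequent terms that are not of interest.
excludes = {"什麼","一個","我們","那裡","你們","如今","說道","知道","起來","姑娘","這裡","出來","他們","眾人","自己",
            "一面","只見","怎麼","兩個","沒有","不是","不知","這個","聽見","這樣","進來","咱們","告訴","就是",
            "東西","襲人","回來","只是","大家","只得","老爺","丫頭","這些","不敢","出去","所以","不過","的話","不好",
            "姐姐","探春","鴛鴦","一時","不能","過來","心裡","如此","今日","銀子","幾個","答應","二人","還有","只管",
            "這麼","說話","一回","那邊","這話","外頭","打發","自然","今兒","罷了","屋裡","那些","聽說","小丫頭","不用","如何"}

# With no explicit path, the file is resolved relative to the current working
# directory (the script's own folder when the script is run from there).
# The file must be UTF-8 encoded. To convert it, open the .txt file in a text
# editor, choose "Save As", and select UTF-8 in the encoding option at the
# bottom right of the dialog.
with open("紅樓夢.txt", "r", encoding="utf-8") as f:
    txt = f.read()

# Use jieba to segment the full text of 紅樓夢 into a list of words.
words = jieba.lcut(txt)

# Empty dictionary that will map each word (name) to its count.
counts = {}
for word in words:
    if len(word) == 1:
        # Single-character tokens are mostly particles and the like; skip them.
        continue
    # Create the key with count 1 if it is new, otherwise add one to its count.
    counts[word] = counts.get(word, 0) + 1

# Remove each listed noise word if it appears among the segmented words.
# pop() with a default avoids a KeyError when a word is absent.
for word in excludes:
    counts.pop(word, None)

# Convert the {word: count} dictionary into a list of (word, count) pairs.
items = list(counts.items())

# Sort the list by count; reverse=True gives descending order.
items.sort(key=lambda x: x[1], reverse=True)

# Print the 20 most frequent words (fewer if the text has fewer distinct words).
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))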
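
The counting, filtering, and sorting steps above can also be written with collections.Counter from the Python standard library. The following is only a minimal sketch, assuming the text has already been segmented into a list of words (for example with jieba.lcut). The helper name top_words and the sample word list are hypothetical, introduced here purely for illustration.

```python
from collections import Counter


def top_words(words, excludes=frozenset(), n=20):
    """Return the n most frequent words as (word, count) pairs.

    Single-character tokens and anything in `excludes` are skipped.
    `top_words` is a hypothetical helper, not part of jieba.
    """
    counts = Counter(
        w for w in words
        if len(w) > 1 and w not in excludes
    )
    # most_common() sorts by count in descending order.
    return counts.most_common(n)


# Hypothetical pre-segmented input standing in for jieba.lcut(txt).
sample = ["寶玉", "黛玉", "寶玉", "的", "什麼", "黛玉", "寶玉"]

for word, count in top_words(sample, excludes={"什麼"}, n=2):
    print("{0:<10}{1:>5}".format(word, count))
```

Filtering while counting means excluded words never enter the dictionary, so no separate deletion pass is needed, and most_common(n) simply returns fewer pairs when there are fewer than n distinct words.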