For the assignment requirements, see: https://edu.cnblogs.com/campus/nenu/2020Fall/homework/11206

Word Frequency Count SPEC

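Lao Wu bragged in the dorm that he knows Robinson Crusoe inside out, bragged in front of the girls that he loves Wuthering Heights, Jane Eyre, and Gone with the Wind, and told you that he has read War and Peace from cover to cover. Yet he still has not passed CET-4. A few of you talked it over in private: how could the vocabulary of these great works possibly be smaller than that of CET-4? Everyone has heard that you are studying 《構建之法》, so they unanimously nominated you to write a program named wf that counts the vocabulary of an English work and reports how many times each word occurs, ready to be used to call Lao Wu's bluff.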

The desired behavior is shown below. The numbers in the examples are purely made up.

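Feature 1: small file input. To show that the program runs and that the results are real rather than a way of persecuting Lao Wu, ask him to type the command at the console himself.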

>type test.txt
My English  
is very very pool.

>wf -s test.txt
total 5

very    2
my      1
english 1
is      1
pool    1

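To assess Lao Wu's vocabulary rather than how much he has read, the total does not count the same word more than once: very, which appears 2 times, is counted 1 time.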
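
The difference between running words and distinct words can be sketched as follows. This is only an illustration: the function name count_words and the letters-only pattern are assumptions, not part of the spec.

```python
import re

def count_words(text):
    """Return (running_words, distinct_words) for an English text."""
    # Lowercase first so "Very" and "very" are the same word.
    words = re.findall(r"[a-z]+", text.lower())
    # A set keeps each word once, which is what the spec's total asks for.
    return len(words), len(set(words))

sample = "My English\nis very very pool."
running, distinct = count_words(sample)
print("running words:", running)
print("total (distinct):", distinct)
```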

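Because you have used the console and the command line, you already know that the ">" above is called the command prompt; it is part of the operating system, not part of your program.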

Completing this feature earns you +10 experience points.

Git code address: https://github.com/gongbaby/gong

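Difficulties: I had forgotten how to open a file and which arguments to use, and I also had trouble locating the file, which I struggled with for a long time. At first I used a relative path, but then I did not know how to handle Features 2 and 3. Another difficulty was splitting the text into words. I did not know how to do it, so I asked a senior from the previous year and learned about the split function.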
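
One way to handle both difficulties, opening a file from a known location and splitting it into words, might look like the sketch below. The helper read_words and its base_dir parameter are hypothetical names used for illustration, and UTF-8 input is assumed.

```python
from pathlib import Path

def read_words(file_name, base_dir="."):
    """Open file_name under base_dir and return its whitespace-separated words."""
    # Resolving to an absolute path avoids surprises when the script
    # is started from a different working directory.
    path = (Path(base_dir) / file_name).resolve()
    with open(path, "rt", encoding="utf-8") as handle:
        # str.split() with no argument splits on any run of whitespace,
        # including newlines, and never yields empty strings.
        return handle.read().split()
```

Calling split() without an argument differs from split(" "): the latter leaves newline characters attached to words and produces empty strings wherever two spaces appear in a row.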

Key point: used Counter from the collections module to count the words.
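
As a small illustration of what Counter provides, here is a sketch using the words of the sample file, already lowercased by hand:

```python
from collections import Counter

words = ["my", "english", "is", "very", "very", "pool"]
count = Counter(words)

# len() of a Counter is the number of distinct keys.
print("total", len(count))
# most_common() sorts by count, highest first; ties keep first-seen order.
for word, freq in count.most_common():
    print(word, freq)
```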

Code:

import collections
import re

def texto(file_dir):
    # Read the whole file and lowercase it so "My" and "my" are one word.
    with open(file_dir, 'rt') as file:
        text = file.read().lower()
    # \w+ matches runs of letters, digits, and underscores.
    patt = re.compile(r"\w+")
    count = collections.Counter(patt.findall(text))
    # total is the number of distinct words, so repeats are counted once.
    total = len(count)
    print("total", total)
    for key, value in count.most_common():
        print(key, value)
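
The post does not show how the -s option from the spec reaches this function. A minimal sketch, assuming the standard argparse module is used, could look like this; parse_args, small_file, and the optional positional path are hypothetical names chosen for illustration.

```python
import argparse

def parse_args(argv):
    """Parse wf's command line; argv excludes the program name."""
    parser = argparse.ArgumentParser(prog="wf")
    # -s FILE: count words in one small file, as in "wf -s test.txt".
    parser.add_argument("-s", dest="small_file", metavar="FILE")
    # Hypothetical positional argument for the other features.
    parser.add_argument("path", nargs="?")
    return parser.parse_args(argv)

args = parse_args(["-s", "test.txt"])
print(args.small_file)
```

In a real script the list would come from sys.argv[1:], and the result would decide which of the feature functions to call.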

Difficulty: building on Feature 1, Feature 2 needs only small changes.

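Key point: used regular expressions to match letters, digits, and symbols so that unwanted characters can be replaced with empty strings, and converted all words to lowercase.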
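
The idea of replacing unwanted characters with empty strings and then lowercasing can be sketched as below. The function name normalize is hypothetical, and keeping only letters and whitespace is an assumption made for this example.

```python
import re

def normalize(text):
    """Drop everything except letters and whitespace, then lowercase and split."""
    # [^A-Za-z\s] matches any character that is neither a letter nor
    # whitespace; replacing it with "" removes digits and punctuation.
    cleaned = re.sub(r"[^A-Za-z\s]", "", text)
    return cleaned.lower().split()

print(normalize('He said: "Stop!" 3 times. STOP, stop.'))
```

One consequence of this choice is that an apostrophe is removed without leaving a gap, so "don't" becomes "dont".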

Feature 2 code:

def texttw(file_dir_name):
    # The name is passed without its extension.
    file_dir = file_dir_name + ".txt"
    with open(file_dir, "r") as file:
        # Lowercase first, then take runs of letters, digits, '^' and '-'.
        word = re.findall(r'[a-z0-9^-]+', file.read().lower())
    # Count the same word list that total is based on.
    count = collections.Counter(word)
    # Distinct words only: each repeated word is counted once.
    total = len(count)
    print("total", total, end="")
    print(" words")
    for key, value in count.most_common(10):
        print(key, value)

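# Key points and difficulties: use os to list everything in the folder, and build each file's path from the folder and the file name (file_dir = "failpath" + file_name).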
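
A sketch of listing a folder and building full paths is shown below. The helper list_text_files is a hypothetical name, and restricting the result to .txt files is an assumption made for this example.

```python
import os

def list_text_files(folder):
    """Return full paths of the .txt files directly inside folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # os.listdir returns bare names, so join them with the folder
        # before opening; otherwise the file is looked up in the
        # current working directory.
        full_path = os.path.join(folder, name)
        if os.path.isfile(full_path) and name.endswith(".txt"):
            paths.append(full_path)
    return paths
```

Joining the folder and the file name is what makes the lookup work when the script and the texts live in different folders.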

Feature 3 code:

import os

def textth(file_folder):
    file_names = os.listdir(file_folder)
    # First list every entry in the folder.
    for file_name in file_names:
        print(file_name)
    for file_name in file_names:
        # Join the folder and the name so the file is found from any
        # working directory.
        file_dir = os.path.join(file_folder, file_name)
        with open(file_dir, 'rt') as file:
            text = file.read().lower()
        patt = re.compile(r"\w+")
        count = collections.Counter(patt.findall(text))
        print(file_dir)
        # Distinct words only: each repeated word is counted once.
        print("total", len(count))
        for key, value in count.most_common(10):
            print(key, value)

# Key point and difficulty: understanding redirection.
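
With redirection, the shell connects a file to the program's standard input, so the program reads sys.stdin instead of opening a file itself. The sketch below assumes an invocation such as wf < test.txt; count_stream and main are hypothetical names.

```python
import re
import sys
from collections import Counter

def count_stream(stream):
    """Count lowercase words read from any file-like object."""
    return Counter(re.findall(r"[a-z]+", stream.read().lower()))

def main(stream=None):
    # Fall back to standard input, which is where redirected text arrives.
    stream = sys.stdin if stream is None else stream
    count = count_stream(stream)
    print("total", len(count), "words")
    for word, freq in count.most_common(10):
        print(word, freq)
```

Taking the stream as a parameter means the same function can be exercised with an io.StringIO object, without typing anything at the console.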

Feature 4 code:

def textf(strTxt):
    # Remove punctuation only; split() already separates words on tabs
    # and newlines, so deleting those would glue neighbouring words together.
    regEx = re.compile(r'\.|-|;|\)|\(|\?|"')
    txtStr = re.sub(regEx, '', strTxt).lower().split()
    printsort(txtStr)
    return
def printsort(strList, isfile = True):
    strDict = { }
    # Count each word; "word" avoids shadowing the built-in str.
    for word in strList:
        strDict[word] = strDict.get(word, 0) + 1
    strDictSort = sorted(strDict.items(), key = lambda item : item[1], reverse = True)
    print("total %d words \n" % len(strDictSort))
    # Print at most the ten most frequent words.
    for word, num in strDictSort[:10]:
        print("{:5} {:5}".format(word, num))
    if(isfile == False):
        print("----")
    return

PSP

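Task	Estimated time/min	Actual time/min	Difference/min	Reason
Install PyCharm	30	40	10	360 kept intercepting the download and putting it somewhere I could not find, apparently mistaking it for a virus. The same thing happened later when I downloaded other software, so I uninstalled 360 and installed Huorong instead.
Feature 1	60	254	194	I read the whole assignment and really did not understand what it meant, so I worked through it step by step. Later on, indentation cost me a lot: I had not paid attention to it at first, which caused many errors.
Feature 2	60	45	-15	Building on Feature 1 made this relatively easy. The input I passed in was not very large, so Python handled it fine, and I only needed to handle the -s option.
Feature 3	60	236	176	At first I did not understand how the directory is passed in, or how to display the directory in the console. After looking at a classmate's implementation and asking classmates, I understood the process. I wasted time early on, and later also wasted a lot of time entering the file directory at the console: because the two were not in the same folder, I kept getting a file-not-found error.
Feature 4	60	126	66	I did not understand the concept of redirection at all, and ended up asking many classmates.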

宮立秋 20200917-2 Word Frequency Count
