Counting Words in a File with a Shell Script

阿新 • Published: 2019-02-06

1. Approach http://www.cnblogs.com/youxuguang/p/5917215.html

Method 1:

(1) cat file|sed 's/[,.:;/!?]/ /g'|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' # file is the file to process; in the sed expression there is a single space between the last two slashes (/ /).

(2) sed 's/[,.:;/!?]/ /g' file|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' # (1) and (2) produce the same result.
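To see what the sed step does on its own, before awk does any counting, here is a minimal sketch. The helper name strip_punct is made up for this illustration and does not come from the original post.

```shell
#!/bin/sh
# strip_punct is a hypothetical helper name used only in this sketch.
# It replaces each of the characters , . : ; / ! ? with a single space.
strip_punct() {
    sed 's/[,.:;/!?]/ /g'
}

printf 'hello world,hi girl;how old are you?\n' | strip_punct
```

Every listed punctuation mark becomes a space, so awk's default whitespace splitting then sees one word per field. The trailing ? also turns into a space, which awk ignores.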

Method 2:

(1) awk 'BEGIN{RS="[,.:;/!?]"}{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' file
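One caveat about Method 2: POSIX leaves the behavior unspecified when RS holds more than one character, and treating it as a regular expression is an extension found in gawk. A sketch of an awk-only variant that avoids RS entirely might look like the following. The helper name count_words_awk is hypothetical, and using gsub here is a swapped-in technique, not the original author's method.

```shell
#!/bin/sh
# count_words_awk is a hypothetical helper name used only in this sketch.
# Reads the files given as arguments, or stdin when there are none.
count_words_awk() {
    awk '{
        gsub(/[,.:;\/!?]/, " ")   # changing $0 makes awk split the fields again
        for (i = 1; i <= NF; i++) array[$i]++
    }
    END { for (i in array) print i, array[i] }' "$@"
}

# sort is added only to make the output order predictable.
printf 'how are you?\nand you?\n' | count_words_awk | sort
```

Because gsub rewrites $0, awk re-splits the record on whitespace, which gives the same effect as the sed step in Method 1 without a second process.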

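A note on -F',': this awk option sets the field separator to a comma, so each line is split into fields at commas. https://zhidao.baidu.com/question/586302142.html
NF stands for "number of fields": the total number of fields (loosely, words) in the current line (record).
$(NF-1) is therefore the second-to-last field.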
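A small sketch of how -F',', NF and $(NF-1) fit together. The helper name show_fields is invented for this example.

```shell
#!/bin/sh
# show_fields is a hypothetical helper name used only in this sketch.
# For each comma-separated line it prints: field count, second-to-last field, last field.
show_fields() {
    awk -F',' '{ print NF, $(NF-1), $NF }'
}

printf 'a,b,c,d\n' | show_fields    # prints: 4 c d
```

Note the parentheses: $(NF-1) selects a field, while $NF-1 would subtract 1 from the value of the last field.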

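sed 's/\%//g'
s means substitute and \% matches a percent sign, so s/\%// replaces % with nothing; the trailing g flag applies the replacement to every match on the line, not just the first.

In other words, it deletes all percent signs.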
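A quick way to try this out is sketched below. The helper name strip_percent is hypothetical. Since % is not a special character in a basic regular expression, the sketch uses the plain form s/%//g.

```shell
#!/bin/sh
# strip_percent is a hypothetical helper name used only in this sketch.
strip_percent() {
    sed 's/%//g'
}

# %% in a printf format string produces a literal % character.
printf 'cpu 50%% mem 7%%\n' | strip_percent    # prints: cpu 50 mem 7
```

Without the g flag only the first % on each line would be removed.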

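{for(i=1;i<=NF;i++)a[$i]++;}
NF is the number of fields in the current line (record), and $i is the i-th field. Suppose the text is "a b c d a b a": a appears 3 times, so a["a"]++ runs 3 times and a["a"] increases by 3. When the loop finishes, array a holds the number of times each distinct field has appeared in the lines read so far.

for (i in a) iterates over the subscripts (keys) of array a, assigning each one in turn to the variable i. In the example above the subscripts are a, b, c and d; awk does not guarantee any particular order, so they are assigned to i in an unspecified order.
print i"="a[i] prints the field i and its count a[i].
Written this way, the accumulated counts are printed once for every line read. To print only the final totals, move the printing into an END block, like this: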

awk '{for(i=1;i<=NF;i++)a[$i]++;}
END{for(i in a)print i" = "a[i]}' tongji.txt
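The "a b c d a b a" example from the explanation can be run directly. In this sketch the helper name count_fields is hypothetical, and sort is added only because for (i in a) does not promise any order.

```shell
#!/bin/sh
# count_fields is a hypothetical helper name used only in this sketch.
# Reads the files given as arguments, or stdin when there are none.
count_fields() {
    awk '{for(i=1;i<=NF;i++)a[$i]++}
         END{for(i in a)print i" = "a[i]}' "$@"
}

printf 'a b c d a b a\n' | count_fields | sort
# prints:
# a = 3
# b = 2
# c = 1
# d = 1
```

The END block runs once after all input has been read, so each word is printed a single time with its final count.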

2. Verification

$ cat file
hello world,hi girl;how old are you?
where are you from?
how are you?
i am fine!thinks.
and you?
http://www.cnblogs.com/youxuguang/

$ cat file|sed 's/[,.:;/!?]/ /g'|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}'
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1

$ sed 's/[,.:;/!?]/ /g' file|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}'
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1

$ awk 'BEGIN{RS="[,.:;/!?]"}{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' file
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1
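The listings above come out in whatever order awk happens to walk the array. If a ranking is wanted, one possible extension is to pipe the result through sort, as in this sketch. The helper name top_words and its two arguments are assumptions made for this example, not part of the original post.

```shell
#!/bin/sh
# top_words is a hypothetical helper: $1 = file to read, $2 = how many words to show.
top_words() {
    sed 's/[,.:;/!?]/ /g' "$1" |
        awk '{for(i=1;i<=NF;i++)array[$i]++} END{for(i in array) print i,array[i]}' |
        sort -k2,2nr -k1,1 |     # count descending, then word ascending for ties
        head -n "$2"
}

# Recreate the sample file from the verification section in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
hello world,hi girl;how old are you?
where are you from?
how are you?
i am fine!thinks.
and you?
http://www.cnblogs.com/youxuguang/
EOF

top_words "$tmp" 3
# prints:
# you 4
# are 3
# how 2
rm -f "$tmp"
```

The second sort key only breaks ties between words with equal counts, so the output stays the same from run to run.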
