Counting word frequency with a shell script
阿新 • Published: 2018-11-22
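#!/bin/bash
# Print the n most frequent words in a text file
help(){
    echo "This shell script reports the n most frequent words in a text file"
    echo "usage: sh $0 filename n"
    echo "filename is the text file to analyze; n is the number of words to report"
    echo "sh $0 englist_statment.txt 10"
}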
:<<EOF
First Flight
Mr. Johnson had never been up in an aerophane before and he had read a lot about air accidents, so one day when a friend offered to take him for a ride in his own small phane, Mr. Johnson was very worried about accepting. Finally, however, his friend persuaded him that it was very safe, and Mr. Johnson boarded the plane.
His friend started the engine and began to taxi onto the runway of the airport. Mr. Johnson had heard that the most dangerous part of a flight were the take-off and the landing, so he was extremely frightened and closed his eyes.
After a minute or two he opened them again, looked out of the window of the plane, and said to his friend, Look at those people down there. They look as small as ants, dont they?
Those are ants, answered his friend. Were still on the ground.
EOF
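# Both arguments are required. Note that [[ ]] needs bash, so "sh script" only works where sh is bash.
if [[ -z "$1" || -z "$2" ]]; then
    help
    exit 1
fi
if [[ -f "$1" ]]; then
    # One word per line, lowercase, count duplicates, sort by count (descending) then by word.
    # tr sets are character lists, not regexes, so the set is written A-Za-z without brackets.
    statis=$(tr -cs 'A-Za-z' '\n' < "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -k1,1nr -k2 | head -n "$2")
    echo "$statis"
else
    help
    exit 1
fi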
[shellscript]# sh statis_word.sh englist_statment.txt 5
10 the
6 and
6 his
5 a
5 friend
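To see what each stage of the pipeline contributes, the steps can be wrapped in a small function and run on an inline string. This is only a sketch, not the script above: the function name top_words and the sample sentence are made up for illustration, the sed stage is an extra guard against an empty first line, and awk is used only to strip the padding that uniq -c puts in front of the counts.

```shell
#!/bin/bash
# top_words N: read text on stdin, print the N most frequent words as "count word".
# (top_words is a hypothetical helper name, not part of the original script.)
top_words() {
    tr -cs 'A-Za-z' '\n' |      # every run of non-letters becomes one newline
        tr 'A-Z' 'a-z' |        # fold to lowercase so "The" and "the" match
        sed '/^$/d' |           # drop the empty line left by leading punctuation
        sort |                  # put identical words next to each other
        uniq -c |               # prefix each distinct word with its count
        sort -k1,1nr -k2 |      # highest count first, ties in alphabetical order
        head -n "$1" |          # keep the first N lines
        awk '{print $1, $2}'    # strip the leading padding added by uniq -c
}

printf '%s\n' 'The cat and the dog, and THE bird.' | top_words 2
# → 3 the
# → 2 and
```

Sorting before uniq -c matters because uniq only merges adjacent duplicate lines.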
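# If the script is not invoked correctly, it prints the help message
[shellscript]# sh statis_word.sh englist_statment.txt
This shell script reports the n most frequent words in a text file
usage: sh statis_word.sh filename n
filename is the text file to analyze; n is the number of words to report
sh statis_word.sh englist_statment.txt 10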
[shellscript]# tr --help
Usage: tr [OPTION]... SET1 [SET2]
Translate, squeeze, and/or delete characters from standard input,
writing to standard output.
  -c, -C, --complement    first complement SET1
  -d, --delete            delete characters in SET1, do not translate
  -s, --squeeze-repeats   replace each input sequence of a repeated character
                            that is listed in SET1 with a single occurrence
                            of that character
  -t, --truncate-set1     first truncate SET1 to length of SET2
      --help     display this help and exit
      --version  output version information and exit
SETs are specified as strings of characters. Most represent themselves.
Interpreted sequences are:
  \NNN            character with octal value NNN (1 to 3 octal digits)
  \\              backslash
  \a              audible BEL
  \b              backspace
  \f              form feed
  \n              new line
  \r              return
  \t              horizontal tab
  \v              vertical tab
  CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order
  [CHAR*]         in SET2, copies of CHAR until length of SET1
  [CHAR*REPEAT]   REPEAT copies of CHAR, REPEAT octal if starting with 0
  [:alnum:]       all letters and digits
  [:alpha:]       all letters
  [:blank:]       all horizontal whitespace
  [:cntrl:]       all control characters
  [:digit:]       all digits
  [:graph:]       all printable characters, not including space
  [:lower:]       all lower case letters
  [:print:]       all printable characters, including space
  [:punct:]       all punctuation characters
  [:space:]       all horizontal or vertical whitespace
  [:upper:]       all upper case letters
  [:xdigit:]      all hexadecimal digits
  [=CHAR=]        all characters which are equivalent to CHAR
Translation occurs if -d is not given and both SET1 and SET2 appear.
-t may be used only when translating. SET2 is extended to length of
SET1 by repeating its last character as necessary. Excess characters
of SET2 are ignored. Only [:lower:] and [:upper:] are guaranteed to
expand in ascending order; used in SET2 while translating, they may
only be used in pairs to specify case conversion. -s uses SET1 if not
translating nor deleting; else squeezing uses SET2 and occurs after
translation or deletion.
Report bugs to <bug-coreutils@gnu.org>.
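A few of the rules in this help text are easier to see on concrete input. The commands below are a small sketch that assumes GNU tr (the -t option in particular is not available in every tr implementation); the input strings are made up for illustration.

```shell
#!/bin/bash
# SET2 shorter than SET1: its last character is repeated, so b and c both map to y.
printf 'abc\n' | tr 'abc' 'xy'
# → xyy

# -t truncates SET1 to the length of SET2 instead, so c is left untouched.
printf 'abc\n' | tr -t 'abc' 'xy'
# → xyc

# [:upper:] and [:lower:] used as a pair perform case conversion.
printf 'Hello World\n' | tr '[:upper:]' '[:lower:]'
# → hello world

# [CHAR*] in SET2 pads with copies of CHAR up to the length of SET1.
printf 'a,b;c\n' | tr ',;' '[\n*]'
# → a
# → b
# → c
```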
tr -cs "[A-Z][a-z]" "[\n*]"
# Test what -c means. The file test0.sh contains uppercase letters, lowercase letters, and digits.
[shellscript]# more test0.sh
M C a b 8 6
[shellscript]# more test0.sh |tr -c "[A-Z]" "$"
$$M$C$$$$$$$$$$$
[shellscript]# more test0.sh |tr -c "[a-z]" "$"
$$$$$$$a$b$$$$$$
[shellscript]# more test0.sh |tr -c "[:digit:]" "$"
$$$$$$$$$$$8$6$$
As the output shows, -c means complement: every character that is not in SET1 is replaced with SET2.
-s squeezes each run of a repeated character down to a single occurrence.
[shellscript]# more test0.sh |tr -cs "[:digit:]" "$"
$8$6$[shellscript]#
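The same experiment can be reproduced without a test file by feeding a fixed string through printf. This is a minimal sketch: the variable s and the use of '#' as the replacement character (which avoids quoting a dollar sign) are choices made here for illustration.

```shell
#!/bin/bash
# The input has no trailing newline, so every output character comes from the string itself.
s='M C a b 8 6'

# -c: every character NOT in SET1 (here: not an uppercase letter) becomes '#'.
printf '%s' "$s" | tr -c 'A-Z' '#'; echo
# → M#C########

# -cs: same replacement, then each run of '#' is squeezed to a single one.
printf '%s' "$s" | tr -cs 'A-Z' '#'; echo
# → M#C#

# -s alone (no SET2): squeeze repeats of the characters listed in SET1.
printf 'aaa  bbb\n' | tr -s 'a '
# → a bbb
```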
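tr -cs "[a-z][A-Z]" "\n" therefore replaces every character that is not part of a word with a newline and then squeezes each run of newlines into one, so every word ends up on its own line. Note that tr sets are character lists rather than regular expressions, so the square brackets are themselves members of SET1; writing the set as A-Za-z keeps letters only.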
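A quick way to check how SET1 is interpreted is to run two spellings of the set on an input that contains square brackets. This is a minimal sketch; the sample string is made up, and the second command uses the bracket-free set A-Za-z for comparison.

```shell
#!/bin/bash
s='foo[1]=bar'

# With brackets in SET1, '[' and ']' are kept as if they were letters.
printf '%s\n' "$s" | tr -cs '[a-z][A-Z]' '\n'
# → foo[
# → ]
# → bar

# Without the brackets, only letters survive.
printf '%s\n' "$s" | tr -cs 'A-Za-z' '\n'
# → foo
# → bar
```

For plain English prose without brackets, such as the sample text in the script, both spellings produce the same word list.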