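Linux, Part 12: Common Statistics Commands, uniq
I. What uniq is for
Duplicate lines in a text file are usually not what we want, so they need to be removed. Linux has other commands that can remove duplicate lines, but I find uniq one of the more convenient. Keep the following two points in mind when using it: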
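1. uniq is usually combined with sort, because uniq does not detect repeated lines unless they are adjacent. Either sort the input first (sort file | uniq), or use sort -u without uniq.
2. When comparing by field, a field is a run of blanks (usually spaces and/or tabs) followed by non-blank characters. If both fields and characters are skipped (-f and -s), fields are skipped before characters.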
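The adjacency rule in point 1 can be seen with a tiny input. The sketch below uses a made-up three-line input (not the test file from this article) and forces the C locale so the sort order is predictable.

```shell
# Hypothetical input: "a" appears twice, but not on adjacent lines.
input=$(printf 'a\nb\na\n')

# uniq alone only compares neighbouring lines, so both "a" lines survive.
plain=$(printf '%s\n' "$input" | uniq)

# Sorting first brings the duplicates together so uniq can merge them.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

# sort -u sorts and de-duplicates in a single step.
sort_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

printf 'uniq:\n%s\n\nsort | uniq:\n%s\n\nsort -u:\n%s\n' "$plain" "$sorted" "$sort_u"
```

For plain de-duplication, sort | uniq and sort -u give the same result. uniq is still needed when you want its other options, such as -c or -d.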
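II. uniq options
$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
With no options, matching lines are merged to the first occurrence.
Mandatory arguments to long options are mandatory for short options too.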
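-c, --count //prefix each line with the number of times it occurred
-d, --repeated //only print duplicated lines, one per group
-D, --all-repeated //only print duplicated lines, but print every copy
-f, --skip-fields=N //skip the first N fields when comparing; -f 1 skips the first field
-i, --ignore-case //ignore case when comparing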
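-s, --skip-chars=N //similar to -f, but skips characters; -s 5 skips the first 5 characters
-u, --unique //only print lines that are not repeated; unlike MySQL's DISTINCT, a line that has duplicates is dropped entirely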
-z, --zero-terminated //end lines with 0 byte, not newline
-w, --check-chars=N //compare no more than the first N characters of each line
--help //display this help and exit
--version //output version information and exit
III. The test file uniqtest
this is a test
this is a test
this is a test
i am tank
i love tank
i love tank
this is a test
whom have a try
WhoM have a try
you  have a try
i want to abroad
those are good men
we are good men
(The "you" line has two spaces before "have"; this matters in the -f and -s examples.)
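To follow along, the test file can be recreated as below. This is a sketch under one assumption: the "you" line is written with two spaces before "have", which is what the note in the -f example later in the article implies.

```shell
# Recreate the sample file in the current directory, one printf argument per line.
# Assumption: two spaces before "have" in the "you" line.
printf '%s\n' \
  'this is a test' \
  'this is a test' \
  'this is a test' \
  'i am tank' \
  'i love tank' \
  'i love tank' \
  'this is a test' \
  'whom have a try' \
  'WhoM have a try' \
  'you  have a try' \
  'i want to abroad' \
  'those are good men' \
  'we are good men' > uniqtest

# Quick sanity check: total number of lines in the file.
wc -l < uniqtest
```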
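IV. Examples in detail
The example below shows one characteristic of uniq: when checking for duplicates it only compares adjacent lines. In real data, many duplicates will not be adjacent.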
$ uniq -c uniqtest
      3 this is a test
      1 i am tank
      2 i love tank
      1 this is a test //a duplicate of the first line, but not adjacent to it
      1 whom have a try
      1 WhoM have a try
      1 you  have a try
      1 i want to abroad
      1 those are good men
      1 we are good men
Sorting first solves the problem seen in the previous example:
$ sort uniqtest | uniq -c
      1 WhoM have a try
      1 i am tank
      2 i love tank
      1 i want to abroad
      4 this is a test
      1 those are good men
      1 we are good men
      1 whom have a try
      1 you  have a try
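A common follow-up to sort | uniq -c is to sort again by the count, which turns the output into a frequency ranking. The sketch below uses made-up input rather than the test file. The awk step is only there to normalise the count padding, which differs between uniq implementations.

```shell
# Hypothetical input: "b" three times, "a" twice, "c" once, in mixed order.
ranked=$(printf 'b\na\nb\nc\nb\na\n' | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn)

# Full ranking, most frequent line first.
printf '%s\n' "$ranked"

# Most frequent entry as "count value", with the padding stripped.
top=$(printf '%s\n' "$ranked" | awk 'NR == 1 { print $1, $2 }')
echo "most frequent: $top"
```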
uniq -d prints only the duplicated lines, one per group:
$ uniq -d -c uniqtest
      3 this is a test
      2 i love tank
uniq -D prints only the duplicated lines, and prints every copy of them. It cannot be combined with -c.
$ uniq -D uniqtest
this is a test
this is a test
this is a test
i love tank
i love tank
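Here "those are good men" occurs only once yet is reported as repeated. That is because -f 1 skips the first field, so the comparison starts from the second field.
$ uniq -f 1 -c uniqtest
      3 this is a test
      1 i am tank
      2 i love tank
      1 this is a test
      2 whom have a try //why is the next line not counted as a duplicate? (it has an extra space before "have")
      1 you  have a try
      1 i want to abroad
      2 those are good men //counted as a duplicate of the following line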
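The effect of the extra space can be reproduced in isolation. After -f 1 skips the first field, what remains still begins with the blanks that separate it from the next field, so one space versus two makes the lines differ. The sketch below uses two-line inputs made up for illustration. Squeezing repeated spaces with tr -s is one possible workaround, not something the original article prescribes.

```shell
# Same separator (one space) in both lines: the remainders match, one line is printed.
same=$(printf 'whom have a try\nWhoM have a try\n' | uniq -f 1 | wc -l | tr -d ' ')

# Two spaces in the second line: "  have a try" differs from " have a try".
differ=$(printf 'whom have a try\nyou  have a try\n' | uniq -f 1 | wc -l | tr -d ' ')

# Squeezing runs of spaces first makes the remainders identical again.
squeezed=$(printf 'whom have a try\nyou  have a try\n' | tr -s ' ' | uniq -f 1 | wc -l | tr -d ' ')

echo "single space: $same line(s), double space: $differ line(s), after tr -s: $squeezed line(s)"
```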
Ignore case when comparing:
$ uniq -i -c uniqtest
      3 this is a test
      1 i am tank
      2 i love tank
      1 this is a test
      2 whom have a try //one line is in upper case, the other in lower case
      1 you  have a try
      1 i want to abroad
      1 those are good men
      1 we are good men
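Skip the first 4 characters when comparing. "whom have a try" and "you  have a try" are then treated as the same, because "you" plus one space is exactly 4 characters and both lines continue with " have a try".
$ uniq -s 4 -c uniqtest
      3 this is a test
      1 i am tank
      2 i love tank
      1 this is a test
      3 whom have a try //compare with the previous example: the "you" line is merged as well
      1 i want to abroad
      1 those are good men
      1 we are good men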
Print only the lines that are not repeated; any line that has adjacent duplicates is dropped entirely:
$ uniq -u uniqtest
i am tank
this is a test
whom have a try
WhoM have a try
you  have a try
i want to abroad
those are good men
we are good men
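Since -u is easy to confuse with plain de-duplication, here is a side-by-side sketch of the default behaviour, -d and -u on the same made-up input (not the test file).

```shell
# Hypothetical, already-sorted input: "a" and "c" are duplicated, "b" is not.
data=$(printf 'a\na\nb\nc\nc\n')

# Default: one copy of every distinct adjacent group.
all=$(printf '%s\n' "$data" | uniq)

# -d: one copy of each group that occurs more than once.
dups=$(printf '%s\n' "$data" | uniq -d)

# -u: only the lines that occur exactly once.
singles=$(printf '%s\n' "$data" | uniq -u)

printf 'default: %s\n-d: %s\n-u: %s\n' \
  "$(echo $all)" "$(echo $dups)" "$(echo $singles)"
```

The -d and -u outputs together make up the default output: every group is either repeated or not.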
Only the first 2 characters of each line are compared, so "i am tank" and "i love tank" are treated as the same.
$ uniq -w 2 -c uniqtest
      3 this is a test
      3 i am tank
      1 this is a test
      1 whom have a try
      1 WhoM have a try
      1 you  have a try
      1 i want to abroad
      1 those are good men
      1 we are good men
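-w can also be combined with -s to compare only a window of characters in the middle of each line. The sketch below assumes GNU uniq (the -w option is a GNU extension) and uses made-up records with a 4-character "idN " prefix.

```shell
# Compare only the first 2 characters: the two "i ..." lines merge.
first_two=$(printf 'i am tank\ni love tank\nthis is a test\n' | uniq -w 2)
printf '%s\n' "$first_two"

# Hypothetical records: skip the 4-character "idN " prefix, then compare
# the next 5 characters only ("apple", "apple", "berry").
window=$(printf 'id1 apple pie\nid2 apple tart\nid3 berry jam\n' | uniq -s 4 -w 5)
printf '%s\n' "$window"
```

In both cases uniq keeps the first line of each group it merges.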