An Introduction to Text Processing Tools

阿新 • Published: 2018-08-18

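This article introduces a set of everyday text processing tools: cat, tac, rev, head, tail, cut, wc, sort, uniq, and diff.
When the foundation is weak, everything built on it shakes.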

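The cat command

Purpose: display the contents of a text file
Options:

-A: show non-printing characters: $ at each line end, ^I for tabs, and ^M for the carriage return found in Windows files (spaces are printed unchanged)
-E: show $ at the end of each line
-n: number every output line
-b: number only non-empty lines
-s: squeeze consecutive blank lines into one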

-A

Create a file on Windows with the following content:
aa
bb
cc
Upload it to Linux with the rz command:
[root@centos7 data ]#cat -A test.txt 
aa^M$                                    ## ^M is the carriage return from the Windows line ending; $ marks the end of the line, the same marker -E prints
bb^M$
cc[root@centos7 data ]#
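Background: Windows ends each line with a carriage return followed by a line feed (\r\n), while Linux uses a line feed alone (\n). The leftover \r is displayed as ^M. If ^M shows up in a file on Linux, the file was almost certainly edited on Windows. Checking for it is especially useful when troubleshooting a script that refuses to run.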
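
The check described above can be scripted. The following is a minimal sketch, assuming GNU cat and grep; the helper names has_crlf and strip_cr and the sample file are made up for illustration. It counts the lines that end in a carriage return and then prints a cleaned copy.

```shell
# Build a small file with Windows (CRLF) line endings for the demo.
sample=$(mktemp)
printf 'aa\r\nbb\r\ncc\r\n' > "$sample"

# has_crlf FILE: print how many lines end in a carriage return.
has_crlf() {
    grep -c "$(printf '\r')\$" "$1"
}

# strip_cr FILE: write a copy without carriage returns to standard output.
strip_cr() {
    tr -d '\r' < "$1"
}

has_crlf "$sample"
strip_cr "$sample" | cat -A    # every line now ends in a plain $
```

Redirect the output of strip_cr to a new file to keep the converted version.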

Creating files with cat

[root@centos7 data ]#cat > f1         ## redirect what is typed on standard input into f1
a
b
c                             ##   press Ctrl+D to end the input and exit
[root@centos7 data ]#cat f1
a
b
c

[root@centos7 data ]#cat > f2 <<EOF
> XIN
> YUANHONLI
> HAH
> EOF                                  ## the closing word must match the one after <<; any other word can be used instead of EOF
[root@centos7 data ]#cat f2
XIN
YUANHONLI
HAH
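
One detail of the here-document form is worth a short sketch: whether the delimiter word is quoted decides if variables in the body are expanded. The variable name and the temporary file below are assumptions made for the demo.

```shell
name=XIN
out=$(mktemp)

# Unquoted delimiter: $name is expanded before the text reaches cat.
cat > "$out" <<EOF
hello $name
EOF
cat "$out"

# Quoted delimiter: the body is written literally, $name included.
cat > "$out" <<'END'
hello $name
END
cat "$out"
```

The quoted form is the safer choice when the body is itself a script or a config file that contains dollar signs.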

tac

tac is cat spelled backwards.
Purpose: print a file in reverse line order, last line first

[root@centos7 data ]#cat test
aa 
bb
cc 
dd 
[root@centos7 data ]#tac test
dd 
cc 
bb
aa

rev

Purpose: reverse the characters of each line

[root@centos7 data ]#cat test 
aabbcc
[root@centos7 data ]#rev test
ccbbaa
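
As a quick sketch of how the two commands differ, and what happens when they are chained, here they are applied to input generated with printf (the sample strings are arbitrary):

```shell
# tac reverses the order of the lines; rev reverses each line's characters.
printf 'aa\nbb\ncc\n' | tac
printf 'aabbcc\n' | rev

# Chained together they reverse both the line order and every line.
printf 'ab\ncd\n' | tac | rev
```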

head

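Purpose: display the first # lines of a file (10 by default)
Syntax: head [OPTION]... [FILE]...
Options:
-c #: output the first # bytes
-n #: output the first # lines
-#: shorthand for -n #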

[root@centos7 data ]#head -c 3 test
aab[root@centos7 data ]#

[root@centos7 data ]#head -n 3 /etc/fstab 

#
# /etc/fstab

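Using head -c to generate a random string (requirement: upper- and lowercase letters and digits, 10 characters long)
[root@centos7 data ]#cat /dev/urandom          ## /dev/urandom is a device that produces an endless stream of pseudo-random bytes
[root@centos7 data ]#tr -dc '[:alnum:]' < /dev/urandom |head -c 10    ## tr deletes every character that is not a letter or digit, then head takes the first 10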
si8eE8JYSI[root@centos7 data ]#
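
The pipeline above can be wrapped in a small function. This is a sketch rather than a vetted password generator; the name random_string and the default length are assumptions.

```shell
# random_string N: print N random characters drawn from A-Z, a-z and 0-9.
# LC_ALL=C makes tr treat its input as raw bytes instead of locale text.
random_string() {
    LC_ALL=C tr -dc '[:alnum:]' < /dev/urandom | head -c "${1:-10}"
    echo    # end with a newline so the prompt starts on its own line
}

random_string       # 10 characters by default
random_string 16
```

The trailing echo avoids the situation seen in the transcript, where the prompt is printed right after the random string.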

tail

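Purpose: display the last # lines of a file (10 by default)
Syntax: tail [OPTION]... [FILE]...
Options:
-c #: output the last # bytes
-n #: output the last # lines
-#: shorthand for -n #
-f: follow the file descriptor and print data as it is appended; commonly used to watch logs
-F: follow the file name (equivalent to --follow=name --retry)
tailf behaves like tail -f

Difference between -f and -F: -f keeps following the file it originally opened, even after that file is renamed or deleted, so it never notices a new file created under the same name. -F follows the name: when the file is deleted or rotated, tail reports that it has become inaccessible, keeps retrying, and resumes output once a file with that name exists again. This makes -F the better choice for logs that are rotated.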

[root@centos7 data ]#cat test 
aabbcc
[root@centos7 data ]#tail -c 2 test
c                                        ## only one visible character: the trailing newline counts as one of the 2 bytes

[root@centos7 data ]#cat f1
a
b
c

[root@centos7 data ]#tail -c 1 f1

[root@centos7 data ]#tail -c 2 f1
c                                          ## same as above: the trailing newline is also a byte
[root@centos7 data ]#tail -c 3 f1

c
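
The byte counting can be made visible with od, and head and tail can be chained to pull out a range of lines. A minimal sketch, in which the helper name line_range is made up and the file is assumed to have at least LAST lines:

```shell
f=$(mktemp)
printf 'a\nb\nc\n' > "$f"

# The last byte is the newline, so the last 2 bytes are "c" plus "\n".
tail -c 2 "$f" | od -c

# line_range FIRST LAST FILE: print lines FIRST..LAST by chaining head and tail.
line_range() {
    head -n "$2" "$3" | tail -n "$(( $2 - $1 + 1 ))"
}
line_range 2 3 "$f"
```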

cut

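Purpose: extract specific columns from each line, split on a delimiter
Syntax: cut [OPTION]... [FILE]...
Options:
-d delimiter: set the field delimiter (a single character; Tab by default)
-f fields: select fields
    #: field number #
    #,#[,#]: several separate fields, e.g. 1,3,6
    #-#: a continuous range of fields, e.g. 1-6
    mixed: 1-3,7
-c: select by character position
--output-delimiter=string: set the delimiter used in the output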

Extract the UID of every user from /etc/passwd:
[root@centos7 data ]#cut -d: -f3 /etc/passwd
0
1
2
3
4
5
6
7
8
11
12

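Extract the disk usage percentages
[root@centos7 data ]#df |tr -s ' ' |cut -d " " -f5|cut -d% -f1      ## the columns are separated by runs of spaces, so squeeze each run into one space with tr -s before cutting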
Use
7
0
0
2
0
1
16
1
100
Alternatively, let tr squeeze the spaces and replace them with % in one step, then cut directly:
[root@centos7 data ]#df |tr -s ' ' % |cut -d% -f5
Use
7
0
0
2
0
1
16
1
100

Custom output delimiter
[root@centos7 data ]#cut -d: -f1,3 --output-delimiter=+  /etc/passwd
root+0
bin+1
daemon+2
adm+3

[root@centos7 data ]#cut -d: -f1,3 --output-delimiter===  /etc/passwd
root==0
bin==1
daemon==2
adm==3
lp==4
sync==5
shutdown==6

Selecting by character position
[root@centos7 data ]#cut -c1-3 /etc/passwd
roo
bin
dae
adm
lp:
syn
shu
hal
mai
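
The field options can be tried without touching system files. The sketch below runs cut on two passwd-style records held in a variable; --output-delimiter is a GNU cut option, and the " -> " separator is an arbitrary choice.

```shell
# Two passwd-style records used as sample input.
records='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Fields 1 and 3 (name and UID); the input delimiter is reused on output.
printf '%s\n' "$records" | cut -d: -f1,3

# The same fields joined with a multi-character output delimiter.
printf '%s\n' "$records" | cut -d: -f1,3 --output-delimiter=' -> '

# A range mixed with a single field.
printf '%s\n' "$records" | cut -d: -f1-3,7

# cut always prints fields in input order, whatever order -f lists them in.
echo 'a:b:c' | cut -d: -f3,1
```

Because cut cannot reorder columns, swapping fields needs another tool.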

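Extracting the IP address: first pick out the second line of the output (the one containing inet), then cut the column
CentOS6:
[root@CentOS6 ~ ]#ifconfig eth0 |grep -w "inet"|tr -s " " :|cut -d: -f4   ## on CentOS 6 the spaces are turned into colons, so : serves as the delimiter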
192.168.64.128

CentOS7:
[root@centos7 data ]#ifconfig ens33 |grep -w "inet" |tr -s " " |cut -d" " -f3
192.168.64.134

Extracting the major version number of CentOS:
[root@centos7 data ]#cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
[root@CentOS6 ~ ]#cat /etc/centos-release 
CentOS release 6.10 (Final)

[root@CentOS6 ~ ]#tr -dc "[:digit:]." < /etc/centos-release |cut -d. -f1        ## delete everything except digits and dots
6
[root@centos7 data ]#tr -dc "[:digit:]." < /etc/centos-release |cut -d. -f1
7
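
The same pipeline works as a filter on any release string. In this sketch the function name major_version is an assumption, and the two inputs are the release lines shown above.

```shell
# major_version: read a release string on stdin and print its major version.
# Caveat: a digit anywhere before the version number would end up in the result.
major_version() {
    tr -dc '[:digit:].' | cut -d. -f1
}

echo 'CentOS Linux release 7.5.1804 (Core)' | major_version
echo 'CentOS release 6.10 (Final)'          | major_version
```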

wc

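Short for word count.
Purpose: count the lines, words, bytes, and characters in a file (note that bytes and characters are not the same: in a multibyte encoding such as UTF-8 one character can occupy several bytes)
Options:
-l: count lines only
-w: count words only
-c: count bytes only
-m: count characters only
-L: print the length of the longest line

By default wc prints the line, word, and byte counts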
[root@centos7 data ]#cat test 
aabbcc
[root@centos7 data ]#wc test
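1 1 7 test                              ## 7 bytes, because the trailing newline is counted
The result includes the file name, which gets in the way of later arithmetic. Feeding the file through a pipe prints the numbers only: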
[root@centos7 data ]#cat test|wc 
      1       1       7

[root@centos7 data ]#cat /etc/issue
\S
Kernel \r on an \m

[root@centos7 data ]#wc -l /etc/issue
3 /etc/issue
[root@centos7 data ]#cat /etc/issue|wc -l
3
[root@centos7 data ]#cat /etc/issue|wc -w
6
[root@centos7 data ]#cat /etc/issue|wc -c
23
[root@centos7 data ]#cat /etc/issue|wc -m
23

Count the current logins (who prints one line per login session)
[root@centos7 data ]#who
root     :0           2018-08-18 10:50 (:0)
root     pts/0        2018-08-18 10:52 (:0)
root     pts/1        2018-08-18 10:55 (192.168.64.1)
root     pts/2        2018-08-18 13:26 (192.168.64.1)
root     pts/3        2018-08-18 14:30 (192.168.64.1)
[root@centos7 data ]#who |wc -l
5
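
When the count feeds a calculation, redirecting the file into wc has the same effect as the cat pipe and saves a process. A small sketch with a throwaway file:

```shell
f=$(mktemp)
printf 'aabbcc\n' > "$f"

# Reading from standard input makes wc print the number without a file name.
bytes=$(wc -c < "$f")
lines=$(wc -l < "$f")

# Arithmetic expansion tolerates the padding spaces some wc versions print.
echo "bytes per line: $(( bytes / lines ))"
```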

sort

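Purpose: sort the lines of a file, optionally by a chosen column split on a chosen delimiter
Syntax: sort [options] file(s)
Options:
-t: set the field delimiter, like -d in cut
-k #: sort by column # (the key runs from that column to the end of the line; write -k#,# to limit it to one column)
-n: compare numerically; the default is lexicographic order
-r: reverse the order
-R: random order (identical lines stay together)
-u: drop duplicate lines from the output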

Extract the first and third columns of /etc/passwd and sort them numerically by UID
[root@centos7 data ]#cut -d: -f1,3 /etc/passwd |sort -t: -k2 -n
root:0
bin:1
daemon:2
adm:3
lp:4
sync:5
shutdown:6
halt:7
mail:8
operator:11
games:12
ftp:14
rpcuser:29
rpc:32
ntp:38

Reverse order
[root@centos7 data ]#cut -d: -f1,3 /etc/passwd |sort -t: -k2 -nr
nfsnobody:65534
xin:1000
polkitd:999
libstoragemgmt:998
colord:997
saslauth:996
setroubleshoot:995
chrony:994
unbound:993
gluster:992
geoclue:991
gnome-initial-setup:990
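
To see what -n changes, the sketch below sorts a few made-up name:UID pairs first as text and then as numbers. The key is written as -k2,2 so that it covers column 2 only.

```shell
# name:UID pairs used as sample input.
pairs='games:12
root:0
bin:1
ftp:14
daemon:2'

# Without -n the UIDs compare as text, so 12 and 14 come before 2.
printf '%s\n' "$pairs" | LC_ALL=C sort -t: -k2,2

# With -n they compare as numbers.
printf '%s\n' "$pairs" | sort -t: -k2,2n

# Largest UID first, keeping only the top entry.
printf '%s\n' "$pairs" | sort -t: -k2,2nr | head -n1
```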

Removing duplicates
[root@centos7 data ]#cat f1
aa
aa
bb
bb
cc
dd
[root@centos7 data ]#sort -u f1
aa
bb
cc
dd

Drawing a student number at random
[root@centos7 data ]#seq 72 |sort -R|head -n1
14
[root@centos7 data ]#seq 72 |sort -R|head -n1
40
[root@centos7 data ]#seq 72 |sort -R|head -n1
67

uniq

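Purpose: collapse adjacent duplicate lines into one
Syntax: uniq [OPTION]... [FILE]...

Options:
-c: prefix each line with the number of times it occurred
-d: print only the lines that are repeated
-u: print only the lines that are not repeated
Usually combined with sort:
sort userlist.txt | uniq -c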

[root@centos7 data ]#cat f1
aa
aa
bb
aa
bb
bb
cc
dd
dd
[root@centos7 data ]#uniq f1           ## by default uniq only collapses duplicates that are adjacent
aa
bb
aa
bb
cc
dd

[root@centos7 data ]#sort f1|uniq       ## sort first, then uniq removes every duplicate
aa
bb
cc
dd

[root@centos7 data ]#sort f1|uniq -c    ## count how many times each line occurs
      3 aa
      3 bb
      1 cc
      2 dd

Count how many times each word occurs in an English document, then list the 3 most frequent words
[root@centos7 data ]#cat f1
aa
aa
bb yy
aa www
bb
bb  zzz
cc  yy
dd  ww
dd
[root@centos7 data ]#tr -s " " "\n" < f1 |sort|uniq -c
      3 aa
      3 bb
      1 cc
      2 dd
      1 ww
      1 www
      2 yy
      1 zzz
[root@centos7 data ]#tr -s " " "\n" < f1 |sort|uniq -c|sort -nr|head -n3
      3 bb
      3 aa
      2 yy
[root@centos7 data ]#
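
The word-frequency pipeline can be packaged as a reusable filter. The following is a sketch with two deliberate departures from the transcript: it splits on any whitespace rather than on spaces only, and it breaks ties alphabetically so the result is deterministic. The name top_words is an assumption.

```shell
# top_words N: read text on stdin and print the N most frequent words.
top_words() {
    tr -s '[:space:]' '\n' |          # one word per line
        grep -v '^$' |                # drop the empty line a leading blank leaves
        LC_ALL=C sort | uniq -c |     # count identical words
        LC_ALL=C sort -k1,1nr -k2 |   # by count (descending), then by word
        head -n "${1:-3}"
}

printf 'aa\naa\nbb yy\naa www\nbb\nbb  zzz\ncc  yy\ndd  ww\ndd\n' | top_words 3
```

With this tie-break, words with equal counts appear in alphabetical order, so the ranking differs slightly from the plain sort -nr output above.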

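How to get the intersection of two files, i.e. the lines they have in common (this requires that neither file contains duplicate lines of its own)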
[root@centos7 data ]#cat f1 
aa
bb yy
aa www
bb
bb  zzz
cc  yy
dd  ww
dd
[root@centos7 data ]#cat f2
aa
bb yy
bb
cc  yy
zz
sss
[root@centos7 data ]#cat f1 f2|sort |uniq -d
aa
bb
bb yy
cc  yy
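Or with grep (each line of f1 is used as a pattern that may match anywhere in a line of f2; add -F -x to require exact whole-line matches):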
[root@centos7 data ]#grep -f f1 f2
aa
bb yy
bb
cc  yy
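
Both approaches have a pitfall: uniq -d is fooled by a line repeated inside one file, and plain grep -f matches substrings. The sketch below shows one way around each, using made-up sample files; common_lines is a hypothetical helper name.

```shell
a=$(mktemp); b=$(mktemp)
printf 'aa\nbb yy\naa www\nbb\ndd\ndd\n' > "$a"
printf 'aa\nbb yy\nbb\nzz\ndd extra\n'   > "$b"

# Deduplicate each file first, so a line repeated inside one file
# cannot be mistaken for a line shared by both.
common_lines() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
common_lines "$a" "$b"

# grep variant: -F fixed strings, -x whole-line match, -f patterns from a file.
grep -Fx -f "$a" "$b"
```

Here dd occurs twice in the first file and dd extra contains dd, yet neither is reported as common.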

Given an httpd access log, find out how many IPs accessed the server and how many requests each one made, then list the 10 IPs with the most requests
[root@centos7 data ]#cat access_log 
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
[root@centos7 data ]#cut -d" " -f1 access_log |sort |uniq -c
   2000 192.168.32.17
      5 192.168.32.5
   1100 192.168.32.7
[root@centos7 data ]#cut -d" " -f1 access_log |sort |uniq -c|sort -nr|head 
   2000 192.168.32.17
   1100 192.168.32.7
      5 192.168.32.5
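
The first part of the question, the number of distinct IPs, can be answered directly with sort -u and wc -l. A sketch on a three-line log in the same format (the extra address and timestamps are made-up sample data):

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.32.7 - - [30/Jul/2018:10:15:34 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.5 - - [30/Jul/2018:10:15:35 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
192.168.32.7 - - [30/Jul/2018:10:15:36 +0800] "GET / HTTP/1.0" 403 4961 "-" "ApacheBench/2.3"
EOF

# Number of distinct client addresses.
cut -d' ' -f1 "$log" | sort -u | wc -l

# Requests per address, busiest first.
cut -d' ' -f1 "$log" | sort | uniq -c | sort -nr
```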

diff

Purpose: show the differences between two files
Options:
-u: print the differences in unified format

[root@centos7 data ]#cat f1 
aa
bb yy
a www
bb
bb  zzz
cc  yy
dd  ww
dd
[root@centos7 data ]#cat f2
aa
bb yy
bb
cc  yy
zz
sss
[root@centos7 data ]#diff -u f1 f2
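--- f1  2018-08-18 16:45:43.484986457 +0800   ## lines starting with - come from the first file
+++ f2  2018-08-18 16:31:53.353991142 +0800   ## lines starting with + come from the second file
@@ -1,8 +1,6 @@                               ## range compared: lines 1-8 of f1 and lines 1-6 of f2
 aa                                           ## a leading space marks a line common to both files
 bb yy
-a www                                        ## f1 has the extra line a www; removing it brings f1 closer to f2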
 bb
-bb  zzz
 cc  yy
-dd  ww
-dd
+zz                                           ## f2 has the extra line zz, which f1 lacks
+sss

Tip: vimdiff f1 f2 opens both files side by side with the differences highlighted
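
In scripts, the exit status of diff is often more useful than its output. A minimal sketch with two throwaway files; the line counting at the end is a rough heuristic that would miss changed lines which themselves begin with - or +.

```shell
old=$(mktemp); new=$(mktemp)
printf 'aa\nbb yy\na www\nbb\n' > "$old"
printf 'aa\nbb yy\nbb\nzz\n'    > "$new"

# diff exits with 0 when the files are identical, 1 when they differ,
# and 2 on trouble, so it can drive an if statement directly.
if diff -u "$old" "$new" > /dev/null; then
    echo "identical"
else
    echo "different"
fi

# Count removed and added lines, skipping the ---/+++ header lines.
diff -u "$old" "$new" | grep -c '^-[^-]'
diff -u "$old" "$new" | grep -c '^+[^+]'
```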

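Exercises

Find the IPv4 address of this machine in the output of ifconfig <interface name>

CentOS6:
[root@CentOS6 ~ ]#ifconfig eth0 | grep -w "inet" |tr -s " " :|cut -d: -f4   ## on CentOS 6 the spaces are turned into colons, so : serves as the delimiter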
192.168.64.128

CentOS7:
[root@centos7 data ]#ifconfig ens33 | grep -w "inet" |tr -s " " |cut -d" " -f3
192.168.64.134

Find the highest usage percentage among the disk partitions

[root@centos7 data ]#df|grep "/dev/sd" |tr -s " " %|cut -d% -f5|sort -nr|head -1
16

Find the user name, UID, and shell of the user with the largest UID

[root@centos7 data ]#cut -d: -f1,3,7 /etc/passwd|sort -t: -k2 -nr|head -1
nfsnobody:65534:/sbin/nologin

Or:
[root@centos7 data ]#sort -t: -k3 -nr /etc/passwd|head -1|cut -d: -f1,3,7
nfsnobody:65534:/sbin/nologin

Show the permissions of /tmp in numeric form

[root@centos7 data ]#stat /tmp |grep "Access: (" |cut -d"(" -f2 |cut -d"/" -f1
1777
[root@centos7 data ]#stat /tmp |grep "Access: (" |cut -d"(" -f2 |head -c 4
1777[root@centos7 data ]#
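
As an alternative to cutting the human-readable output, GNU stat can print the octal mode on its own through a format string. This sketch assumes GNU coreutils; mode_of is a hypothetical helper name.

```shell
# GNU stat prints just the octal permission bits with the %a format.
stat -c '%a' /tmp

# The same for any path, wrapped in a small helper.
mode_of() {
    stat -c '%a' "$1"
}
d=$(mktemp -d)
chmod 750 "$d"
mode_of "$d"
```

The cut-based answers above remain a good exercise; the format string is simply less sensitive to changes in the layout of stat's default output.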

Count the connections from each remote host IP currently connected to this machine, sorted from most to fewest

[root@centos7 data ]#netstat -nt |tr -s " " : |cut -d: -f6|sort|uniq -c|sort -nr
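
One refinement worth sketching: netstat -nt prints two header lines, and the pipeline above counts them as well. The version below filters for tcp rows first. It runs on sample text laid out like netstat -nt output (the addresses are made up), handles IPv4 only, and uses the hypothetical name remote_counts.

```shell
# Sample text in the layout printed by netstat -nt.
sample='Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.64.134:22       192.168.64.1:52011      ESTABLISHED
tcp        0     36 192.168.64.134:22       192.168.64.1:52012      ESTABLISHED
tcp        0      0 192.168.64.134:22       192.168.64.2:40100      ESTABLISHED'

# Keep only the tcp rows so the header lines are not counted, then take
# the remote address, which is field 6 once the spaces become colons.
remote_counts() {
    grep '^tcp ' | tr -s ' ' : | cut -d: -f6 | sort | uniq -c | sort -nr
}
printf '%s\n' "$sample" | remote_counts
```

On a live system the input would come from netstat -nt instead of the sample variable.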
