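Automated inspection of multiple servers with shell scripts
Our operations team supports a project with twenty-odd servers (sometimes more). The daily performance inspection consists of checking whether each server's CPU, memory and disk space are within normal ranges. Because the same work is repeated every day or at every fixed interval, I wrote an automatic inspection script for Linux servers. Scheduling it at a fixed time in crontab is enough to take this repetitive manual work off our hands.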
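Environment:
The servers on my project are mainly Linux and AIX, about 30 in total. At present they are inspected twice a week, by logging in to each server by hand and running the relevant commands to check its performance figures.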
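Approach:
1. All servers are on the same LAN and can reach one another.
2. Pick one server with relatively good performance or a light workload to act as the inspection server.
3. The inspection server inspects all the other servers and records the results locally.
4. Each server's result is named by time and IP to tell them apart, and at the end all results are compressed into one archive.
5. Maintenance staff then only need to fetch this archive periodically and read the results, instead of logging in to every server and typing the same commands.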
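Steps 4 and 5 can be sketched as a few small helpers. This is a minimal sketch, not the author's code: it assumes a config file with one `name=ip` entry per line (the format script 1 reads from linuxconfig.txt with `cut -d '=' -f2`), the function names `read_hosts`, `result_name` and `pack_results` are hypothetical, and it uses tar instead of the zip that script 1 uses.

```bash
#!/bin/bash
# Hypothetical helpers illustrating steps 4-5: one result file per host,
# named by IP and time, bundled into a single archive per run.

# Print the IP of every usable "name=ip" line read from stdin.
# Blank lines and lines starting with '#' are skipped.
read_hosts() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '' | '#'*) continue ;;
    esac
    printf '%s\n' "${line#*=}"   # drop everything up to the first '='
  done
}

# Build a result file name from an IP and a timestamp string.
result_name() {
  printf '%s-%s.txt\n' "$1" "$2"
}

# Pack every file in directory $1 into the gzip-compressed tarball $2.
pack_results() {
  tar -czf "$2" -C "$1" .
}
```

A run could loop over `read_hosts < config/linuxconfig.txt`, write each host's output to `result_name "$ip" "$(date +%Y%m%d-%H%M%S)"`, and finish with `pack_results`. A crontab schedule such as `0 8 * * 1,4` (Monday and Thursday at 08:00) would match the twice-weekly routine described above.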
Implementation scripts
Script 1
#!/bin/bash
# Use the C locale so that `set` splits the date output into
# $1=weekday $2=month $3=day $4=time (e.g. Mon Mar 5 10:00:00 ...)
set -- $(LC_ALL=C date)
path="/home/check"
mkdir -p $path/log
echo "start running" | tee -a $path/log/$1-$2-$3.log
if [ ! -d /home/check/result/$1-$2-$3 ]; then
mkdir -p /home/check/result/$1-$2-$3
echo `date +"%Y/%m/%d-%H:%M:%S"` "create " "$1-$2-$3" "directory success "|tee -a $path/log/$1-$2-$3.log
fi
echo `date +"%Y/%m/%d-%H:%M:%S"` "starting reading linuxconfig.txt " |tee -a $path/log/$1-$2-$3.log
# each line of the config file is "name=ip"; the IP is the part after '='
cat "$path"/config/linuxconfig.txt| while read line; do
ip=`echo $line |cut -d '=' -f2`
echo `date +"%Y/%m/%d-%H:%M:%S"` "check LINUX " $ip " starting " |tee -a $path/log/$1-$2-$3.log
# feed the login and the commands to telnet, pausing so the remote side can keep up
(
sleep 1
#echo account
echo root
sleep 1
#echo password
echo root
sleep 3
echo "free -k"
echo ""
echo "df -k"
echo ""
#memory_used_rate
echo "ps -ef| grep java"
echo ""
echo "netstat -an|egrep -n '80|22|21|23|9043|9044|45331|45332|39194|19195'"
echo ""
#echo "ifconfig -a "
echo "/sbin/ip ad"
echo ""
echo " tail -2000 /var/log/messages | grep -v snmp |grep -i error "
echo ""
echo "/bin/dmesg |grep -i error"
echo ""
echo "top -b -n1|sed -n '1,5p'"
echo ""
echo "/usr/bin/vmstat 1 3"
echo ""
# exit must be sent last; any command after it would never run
echo "exit"
sleep 5
)|telnet $ip >/home/check/result/$1-$2-$3/$ip-$1-$2-$3-$4.txt
echo `date +"%Y/%m/%d-%H:%M:%S"` "check LINUX " $ip " end" |tee -a $path/log/$1-$2-$3.log
echo "" | tee -a $path/log/$1-$2-$3.log
done
echo `date +"%Y/%m/%d-%H:%M:%S"` "end reading linuxconfig.txt " |tee -a $path/log/$1-$2-$3.log
echo `date +"%Y/%m/%d-%H:%M:%S"` "starting reading AIXconfig.txt " | tee -a $path/log/$1-$2-$3.log
cat "$path"/config/AIXconfig.txt| while read line; do
ip=`echo $line |cut -d '=' -f2`
echo `date +"%Y/%m/%d-%H:%M:%S"` "check IBM AIX " $ip " starting " |tee -a $path/log/$1-$2-$3.log
(
sleep 1
#echo account
echo root
sleep 1
#echo password
echo root
sleep 5
echo ""
#echo "df -k"
echo "df -g"
echo ""
#memory_used_rate
echo "ps -ef| grep java"
echo ""
echo "netstat -an|egrep -n '80|22|21|23|9043|9044|45331|45332|39194|19195'"
echo ""
echo "ifconfig -a"
echo ""
# topas is a full-screen interactive monitor and would swallow the input that
# follows; vmstat gives a one-off snapshot instead
echo "vmstat 1 3"
echo ""
echo "exit"
sleep 5
)|telnet $ip >/home/check/result/$1-$2-$3/$ip-$1-$2-$3-$4.txt
echo `date +"%Y/%m/%d-%H:%M:%S"` "check IBM AIX " $ip " end " |tee -a $path/log/$1-$2-$3.log
echo "" | tee -a $path/log/$1-$2-$3.log
done
echo `date +"%Y/%m/%d-%H:%M:%S"` "end reading AIXconfig.txt " | tee -a $path/log/$1-$2-$3.log
zip -r /home/check/result/$1-$2-$3/$1-$2-$3.zip /home/check/result/$1-$2-$3/*
echo "End running "
Note: this script performs the inspection over Telnet, so the Telnet service must be enabled on every inspected server.
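Telnet sends the root password and all output in clear text. If key-based SSH login is available (an assumption; it is not part of the original setup), the same command batch can be sent over SSH in a single connection per host. In this sketch `remote_commands`, `build_remote_script` and `check_host` are hypothetical names, and the command list is a shortened copy of the Linux part of script 1.

```bash
#!/bin/bash
# Sketch: run script 1's Linux checks over SSH instead of Telnet.
# Assumes key-based root login has been set up (e.g. with ssh-copy-id).

# The checks to run remotely, one command per line.
remote_commands() {
  cat <<'EOF'
free -k
df -k
ps -ef | grep java
/sbin/ip addr
top -b -n 1 | head -5
/usr/bin/vmstat 1 3
EOF
}

# Turn the command list into a script that prints a header before each command.
build_remote_script() {
  local cmd
  remote_commands | while IFS= read -r cmd; do
    printf 'echo %q\n' "===== $cmd ====="   # %q quotes the header safely
    printf '%s\n' "$cmd"
  done
}

# Run the whole batch on host $1 over one SSH connection, saving output to $2.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_host() {
  build_remote_script |
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" 'sh -s' > "$2" 2>&1
}
```

For example, `check_host 10.0.0.1 "10.0.0.1-$(date +%F).txt"` (IP hypothetical). AIX hosts would need their own command list, since `df -g` and the other AIX commands differ.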
Script 2
#!/bin/bash
#admin:spirits
#*********** CPU check *************
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server hardware inspection started..."
top -bn 6 >>top
grep -n "%id" top >> newtop
grep -n "zombie" top >> insisttop
# idle % from samples 4-6; the parsing assumes the older top format "Cpu(s): ... 96.9%id,"
top1=`cat newtop | awk '{print $5}' | sed -n 4p | sed 's/%//g' |sed 's/id,//g'`
top2=`cat newtop | awk '{print $5}' | sed -n 5p | sed 's/%//g' |sed 's/id,//g'`
top3=`cat newtop | awk '{print $5}' | sed -n 6p | sed 's/%//g' |sed 's/id,//g'`
# zombie count from the "Tasks:" line of the 2nd sample
top4=`cat insisttop | awk '{print $10}' | sed -n 2p | sed 's/%//g' |sed 's/id,//g'`
#echo "top4:$top4"
if [ "$top4" -gt 0 ]
then
echo "`date '+%Y-%m-%d %H:%M:%S'` Zombie processes found on the collection server; the script will kill their parent processes. To check manually, run top, then ps -A -o stat,ppid,pid,cmd | grep -e '^[Zz]' to confirm the zombies are gone." >> ./newreport.txt
# NOTE: column 2 is the PPID, so this kills the zombies' parents, which reaps the zombies
ps -A -o stat,ppid,pid,cmd | grep -e '^[Zz]' | awk '{print $2}' | xargs kill -9
else
echo "`date '+%Y-%m-%d %H:%M:%S'` No zombie processes on the collection server, running normally."
fi
# integer part of each idle value (95.3 -> 95); taking the first two characters breaks on 5.3 or 100.0
a=${top1%%.*}
b=${top2%%.*}
c=${top3%%.*}
echo "top1: $a"
echo "top2: $b"
echo "top3: $c"
if [ $a -lt 20 ]&&[ $b -lt 20 ]&&[ $c -lt 20 ] ; then
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server CPU usage abnormal: idle values from top are $top1,$top2,$top3, below the threshold of 20. Please handle promptly!" >> ./newreport.txt
else
echo "CPU usage normal."
fi
rm -rf top
rm -rf newtop
rm -rf insisttop
#*************** Memory check ***********
# free GB: row 3 is the "-/+ buffers/cache" line of older free (on newer free, row 3 is Swap)
free1=`free -g | awk '{print $4}' | sed -n 3p | sed 's/%//g' |sed 's/t//g'`
total=`free -g | awk '{print $2}' | sed -n 2p | sed 's/%//g' |sed 's/t//g'`
canshu=0.2   # threshold ratio: alert when free memory <= 20% of total
tempd=`echo $total $canshu |awk '{print $1*$2}'`
biaozhun=${tempd%.*}
if [ $free1 -le $biaozhun ] ; then
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server memory usage too high: free -g reports $free1 GB free, at or below the threshold of $biaozhun GB. Please handle promptly!" >> ./newreport.txt
else
echo "Memory usage normal."
fi
#************** Filesystem check **********
df1=`df -h | awk '{print $5}' | sed -n 2p | sed 's/%//g'`
df2=`df -h | awk '{print $5}' | sed -n 3p | sed 's/%//g'`
df3=`df -h | awk '{print $5}' | sed -n 4p | sed 's/%//g'`
df4=`df -h | awk '{print $5}' | sed -n 5p | sed 's/%//g'`
df5=`df -h | awk '{print $5}' | sed -n 6p | sed 's/%//g'`
if [ $df1 -gt 90 ]||[ $df2 -gt 90 ]||[ $df3 -gt 90 ]||[ $df4 -gt 90 ]||[ $df5 -gt 90 ] ; then
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server disk usage too high! df -h values: $df1,$df2,$df3,$df4,$df5; the threshold is 90 and at least one value exceeds it. Please handle promptly!" >> ./newreport.txt
else
echo "Disk usage normal."
fi
#********************* Disk I/O check ***************
iostat -x 2 5 >>iostat.txt
# $11 = svctm, $12 = %util in the older iostat -x layout; rows 16-19, 25-28 and 34-37
# are the devices of the 2nd, 3rd and 4th reports on a host with four block devices
scvtm1="`cat iostat.txt | awk '{print $11}' | sed -n 16p | sed 's/%//g'`"
scvtm2="`cat iostat.txt | awk '{print $11}' | sed -n 17p | sed 's/%//g'`"
scvtm3="`cat iostat.txt | awk '{print $11}' | sed -n 18p | sed 's/%//g'`"
scvtm4="`cat iostat.txt | awk '{print $11}' | sed -n 19p | sed 's/%//g'`"
scvtm13="`cat iostat.txt | awk '{print $11}' | sed -n 25p | sed 's/%//g'`"
scvtm6="`cat iostat.txt | awk '{print $11}' | sed -n 26p | sed 's/%//g'`"
scvtm7="`cat iostat.txt | awk '{print $11}' | sed -n 27p | sed 's/%//g'`"
scvtm8="`cat iostat.txt | awk '{print $11}' | sed -n 28p | sed 's/%//g'`"
scvtm9="`cat iostat.txt | awk '{print $11}' | sed -n 34p | sed 's/%//g'`"
scvtm10="`cat iostat.txt | awk '{print $11}' | sed -n 35p | sed 's/%//g'`"
scvtm11="`cat iostat.txt | awk '{print $11}' | sed -n 36p | sed 's/%//g'`"
scvtm12="`cat iostat.txt | awk '{print $11}' | sed -n 37p | sed 's/%//g'`"
util1="`cat iostat.txt | awk '{print $12}' | sed -n 16p | sed 's/%//g'`"
util2="`cat iostat.txt | awk '{print $12}' | sed -n 17p | sed 's/%//g'`"
util3="`cat iostat.txt | awk '{print $12}' | sed -n 18p | sed 's/%//g'`"
util4="`cat iostat.txt | awk '{print $12}' | sed -n 19p | sed 's/%//g'`"
util5="`cat iostat.txt | awk '{print $12}' | sed -n 25p | sed 's/%//g'`"
util6="`cat iostat.txt | awk '{print $12}' | sed -n 26p | sed 's/%//g'`"
util7="`cat iostat.txt | awk '{print $12}' | sed -n 27p | sed 's/%//g'`"
util8="`cat iostat.txt | awk '{print $12}' | sed -n 28p | sed 's/%//g'`"
util9="`cat iostat.txt | awk '{print $12}' | sed -n 34p | sed 's/%//g'`"
util10="`cat iostat.txt | awk '{print $12}' | sed -n 35p | sed 's/%//g'`"
util11="`cat iostat.txt | awk '{print $12}' | sed -n 36p | sed 's/%//g'`"
util12="`cat iostat.txt | awk '{print $12}' | sed -n 37p | sed 's/%//g'`"
#*********** max svctm of 1/2/3/4 ****************
maxa=`echo "$scvtm1 $scvtm2 $scvtm3 $scvtm4" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#*********** max svctm of 13/6/7/8 ***************
maxb=`echo "$scvtm13 $scvtm6 $scvtm7 $scvtm8" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#*********** max svctm of 9/10/11/12 *************
maxc=`echo "$scvtm9 $scvtm10 $scvtm11 $scvtm12" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#*********** max util of 1/2/3/4 *****************
maxd=`echo "$util1 $util2 $util3 $util4" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#*********** max util of 5/6/7/8 *****************
maxe=`echo "$util5 $util6 $util7 $util8" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#*********** max util of 9/10/11/12 **************
maxf=`echo "$util9 $util10 $util11 $util12" | awk '{for(i=1;i<=NF;i++)$i>a?a=$i:a}END{print a}'`
#****************** Evaluate ************************
# integer part of each maximum (taking only the first character made "-ge 15" impossible)
m=${maxa%%.*}
n=${maxb%%.*}
h=${maxc%%.*}
k=${maxd%%.*}
l=${maxe%%.*}
o=${maxf%%.*}
if [ $m -ge 15 ]&&[ $k -ge 99 ]&&[ $k -lt 100 ]&&[ $n -ge 15 ]&&[ $l -ge 99 ]&&[ $l -lt 100 ]&&[ $h -ge 15 ]&&[ $o -ge 99 ]&&[ $o -lt 100 ]
then
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server disk I/O bottleneck detected. Please handle promptly!" >> ./newreport.txt
else
echo "Disk I/O normal."
fi
rm -rf ./iostat.txt
#******************** Network connectivity check **********************
# packet-loss percentage from the summary line of a 5-packet ping
network1=`ping -s 4096 -c 5 135.0.51.15 | awk '{print $6}' | sed -n 9p | sed 's/%//g' |sed 's/t//g'`
if [ $network1 -gt 0 ]
then
echo "`date '+%Y-%m-%d %H:%M:%S'` The network between the database server and the target IP is unstable: ping reports $network1% packet loss, above the threshold of 0. The system is at risk; please handle promptly!" >> ./newreport.txt
else
echo "Network connectivity normal."
fi
echo "`date '+%Y-%m-%d %H:%M:%S'` Database server hardware inspection finished."
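Script 2 picks values by fixed line numbers and column positions, which shift between versions of top, free, df and iostat. One alternative, sketched here under the assumption that `df -P` and `vmstat` are available, is to find values by column name or by scanning every line. The function names `max_disk_use`, `int_part` and `avg_cpu_idle` are hypothetical, and the thresholds stay with the caller.

```bash
#!/bin/bash
# Sturdier parsing helpers for script 2 (hypothetical names).

# Highest Use% across all filesystems, from `df -P` output on stdin.
max_disk_use() {
  awk 'NR > 1 { sub(/%/, "", $5); if ($5 + 0 > max) max = $5 + 0 } END { print max + 0 }'
}

# Integer part of a decimal string, e.g. "93.7" -> "93".
int_part() {
  printf '%s\n' "${1%%.*}"
}

# Average CPU idle (integer %) from `vmstat 1 3` output on stdin.
# The "id" column is located by name in the header line.
avg_cpu_idle() {
  awk '/ id / && !col { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
       col && $1 ~ /^[0-9]+$/ { sum += $col; n++ }
       END { if (n) printf "%d\n", sum / n; else print "NA" }'
}
```

Typical calls would be `df -P | max_disk_use` and `vmstat 1 3 | avg_cpu_idle`. Note that the first vmstat sample is an average since boot, so it is included in the mean here.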
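Script 3
Operations staff who need to know a server's resource usage can check it with the following script; for multiple servers, it can be run together with Ansible.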
#!/bin/bash
phy_cpu=$(cat /proc/cpuinfo | grep "physical id"|sort | uniq | wc -l)
logic_cpu_num=$(cat /proc/cpuinfo | grep "processor"| wc -l)
cpu_core_num=$(cat /proc/cpuinfo | grep "cores"|uniq|awk -F: '{print $2}')
cpu_freq=$(cat /proc/cpuinfo | grep MHz | uniq | awk -F: '{print $2}')
system_core=$(uname -r)
system_version=$(cat /etc/redhat-release)
system_hostname=$(hostname | awk '{print $1}')
systemc_envirement_variables=$(env | grep PATH)
mem_free=$(grep MemFree /proc/meminfo)
disk_usage=$(df -h)
system_uptime=$(uptime)
system_load=$(cat /proc/loadavg)
system_ip=$(ifconfig | grep "inet"|grep -v "127.0.0.1"|awk -F: '{print $1}'|awk 'NR==1{print}'| awk '{print $2}') # modified by me
mem_info=$(/usr/sbin/dmidecode | grep -A 16 "Memory Device"|grep -E "Size|Locator"|grep -v Bank)
mem_total=$(grep MemTotal /proc/meminfo)
day01=$(date +%Y)
day02=$(date +%m)
day03=$(date +%d)
path=inspection.txt
echo -e " " > $path
echo -e "System inspection report for $day01-$day02-$day03" >> $path
echo -e "Hostname:\t$system_hostname" >> $path
echo -e "Server IP:\t$system_ip" >> $path
echo -e "Kernel:\t$system_core" >> $path
echo -e "OS version:\t$system_version" >> $path
echo -e "Disk usage:\n$disk_usage" >> $path
echo -e "CPU cores:\t$cpu_core_num" >> $path
echo -e "Physical CPUs:\t$phy_cpu" >> $path
echo -e "Logical CPUs:\t$logic_cpu_num" >> $path
echo -e "Environment variables (PATH):\t$systemc_envirement_variables" >> $path
echo -e "CPU frequency:\t$cpu_freq" >> $path
echo -e "Memory modules:\n$mem_info" >> $path
echo -e "Total memory:\t$mem_total" >> $path
echo -e "Free memory:\t$mem_free" >> $path
echo -e "Time / uptime / logged-in users / load average over the last 1, 5, 15 minutes:\t$system_uptime" >> $path
echo -e "Load average over 1/5/15 minutes / running tasks at sampling time / total active tasks / most recent PID:\t$system_load" >> $path
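Script 3 records only MemFree, which leaves out memory the kernel can reclaim from caches and so overstates how much is really in use. A sketch of a usage percentage based on MemAvailable (reported by kernels since 3.14), falling back to MemFree + Buffers + Cached on older kernels; `mem_used_pct` is a hypothetical helper that reads /proc/meminfo-format text on stdin.

```bash
#!/bin/bash
# Percentage of memory in use, from /proc/meminfo-format text on stdin.
# Prefers MemAvailable; falls back to MemFree + Buffers + Cached.
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2; have_avail = 1 }
       /^MemFree:/      { free = $2 }
       /^Buffers:/      { buffers = $2 }
       /^Cached:/       { cached = $2 }
       END {
         if (!have_avail) avail = free + buffers + cached
         if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
       }'
}
```

On a live host this would be called as `mem_used_pct < /proc/meminfo`, and the result could be added to the report with another `echo -e` line.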
Script 4
#!/bin/bash
#set -x
#date: 2012-02-25
#version: 2.0
export LC_ALL="en_US.UTF-8"
server_info(){
echo ====================================================
#echo ======Time======
#date
echo ======1 hostname======
/bin/hostname
echo ======2 IP MASK======
/sbin/ifconfig eth0|grep "inet addr:"|awk '{print $2,"/ "$4}'
echo ======3 Gateway======
cat /etc/sysconfig/network|grep GATEWAY|awk -F "=" '{print $2}'
echo ======4 Product Name======
dmidecode | grep -A10 "System Information$" |grep "Product Name:"|awk '{print $3,$4,$5}'
##echo ======Host SN======
##dmidecode | grep -A10 "System Information$" |grep "Serial Number:"|awk '{print "SN:",$3}'
echo ======5 CPU ======
cat /proc/cpuinfo|grep "name"|cut -d: -f2 |awk '{print "*"$1,$2,$3,$4}'|uniq -c
echo ======6 Physical memory number======
dmidecode | grep -A 16 "Memory Device$" |grep Size:|grep -v "No Module Installed"|awk '{print "*" $2,$3}'|uniq -c
echo ======7 System version ======
cat /etc/issue | head -1
echo =========================================================
}
OS_info(){
echo ==========================================================
echo ======1 kernel version ======
uname -a
echo ======2 running day ======
/usr/bin/uptime |awk '{print $3,$4}'
echo ==========================================================
}
performance_info(){
echo ==========================================================
echo ======1 CPU idle ======
top -b -n 1 |grep 'C[Pp][Uu]' |grep id|awk '{print $5}'|awk -F "%" '{print $1}'
#cpu_total=$(cat /proc/stat | grep 'cpu ' | awk '{print $2+$3+$4+$5+$6+$7+$8}')
#cpu_idle=$(cat /proc/stat | grep 'cpu ' |awk '{print $5}')
#cpu_use=`expr 100-"$cpu_idle/$cpu_total*100"|bc -l`
#echo $cpu_total
#echo $cpu_idle
#echo $cpu_use
echo ======2 memory used ======
#free -m |grep Mem|awk '{print $2,$3}'
mem_total=$(free -m |grep Mem|awk '{print $2}')
mem_used=$(free -m |grep Mem|awk '{print $3}')
mem_rate=$(echo "scale=2; $mem_used*100/$mem_total" | bc)
echo $mem_rate
echo ======3 swap used ======
#free -m |grep Swap|awk '{print $2,$3}'
Swap_total=$(free -m |grep Swap|awk '{print $2}')
Swap_used=$(free -m |grep Swap|awk '{print $3}')
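if [ "$Swap_total" -gt 0 ]; then
Swap_rate=$(echo "scale=2; $Swap_used*100/$Swap_total" | bc)
else
Swap_rate=0   # no swap configured; avoid dividing by zero
fi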
echo $Swap_rate
echo ======4 top pic ======
top -b -n 1|head -25
echo ==========================================================
}
sec_info(){
echo ======1 user load ======
w
echo ======2 file used ======
df -ah
echo ======3 dmesg error======
dmesg |grep fail
dmesg |grep error
echo ======4 lastlog======
lastlog
}
system_hardware_config(){
echo ===========================disk====================================
df -H |awk "{OFS=\"\t\"}{ print \$1,\$2,\$3,\$4,\$5,\$6}"
echo ===========================free====================================
free |head -1 |awk "{OFS=\"\t\"} {print \$1,\$2,\$6}"
free -m |awk "BEGIN{OFS=\"\t\"}{if (NR==2 ||NR==4 )print \$2,\$3,\$7}"
}
report=$(/bin/hostname)-$(date +%F)
server_info >> "$report"
OS_info >> "$report"
performance_info >> "$report"
sec_info >> "$report"
system_hardware_config >> "$report"
echo "run Ok"
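The commented-out lines in performance_info start on a /proc/stat approach to CPU usage, which avoids depending on top's output format. A working version needs two samples, because the counters are cumulative since boot. This is a sketch; `cpu_usage_between` and `sample_cpu_usage` are hypothetical names, and only the first eight counters (user through steal) are summed, because guest time is already included in user time.

```bash
#!/bin/bash
# CPU usage (%) between two "cpu ..." lines taken from /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice
cpu_usage_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x); split(b, y)
    for (i = 2; i <= 9; i++) { ta += x[i]; tb += y[i] }   # user .. steal
    idle = (y[5] + y[6]) - (x[5] + x[6])                  # idle + iowait
    total = tb - ta
    if (total > 0) printf "%.1f\n", (total - idle) * 100 / total
    else print "0.0"
  }'
}

# Take two samples one second apart on the local host (Linux only).
sample_cpu_usage() {
  local s1 s2
  s1=$(grep '^cpu ' /proc/stat)
  sleep 1
  s2=$(grep '^cpu ' /proc/stat)
  cpu_usage_between "$s1" "$s2"
}
```

Inside performance_info, `sample_cpu_usage` could replace the top-based line under "CPU idle" when a usage figure rather than an idle figure is wanted.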
Everyday Linux inspection commands
hostname
uname -a
netstat -rn
ifconfig -a
cat /etc/sysconfig/hwconf
cat /proc/meminfo
cat /proc/cpuinfo
cat /proc/swaps
sfdisk -g
df -k
dmesg
more /var/log/boot.log
more /var/log/messages
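Run by hand, these commands produce one long unlabelled scroll. A minimal sketch of running such a checklist in one go, with a header per command and a note when a command is missing or fails (some entries, such as /etc/sysconfig/hwconf or `sfdisk -g`, may not exist on newer distributions); `run_checklist` is a hypothetical name.

```bash
#!/bin/bash
# Run each argument as a shell command under a header line; a failing or
# missing command is reported and the checklist carries on.
run_checklist() {
  local cmd
  for cmd in "$@"; do
    printf '===== %s =====\n' "$cmd"
    bash -c "$cmd" 2>&1 || printf '!! exited with status %s\n' "$?"
  done
}
```

For example: `run_checklist hostname 'uname -a' 'netstat -rn' 'df -k' 'cat /proc/meminfo' > "$(hostname)-$(date +%F).txt"`.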
Daily inspection script for Linux servers
1. Run periodically on each server to be inspected:
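#!/bin/sh
cd /home/wjlcn/monitor/check/ || exit 1
# write the first header only after cd, so it lands in the same dc1.txt as the rest
echo "------------ daily check begin -----------------" >>dc1.txt
date=`date +%c`
filename=`hostname`_check_`date +%Y%m%d`.txt
echo "-----------sar -ru 10 3----------------" >>dc1.txt
# the 21,25 line range depends on the sar version and output layout; check it against your own output
sar -ru 10 3 |sed -n '21,25p' >>dc1.txt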
echo "------------top -d 1 -n 1 -------------" >>dc1.txt
/usr/bin/top -b -d 1 -n 1 |sed -n '1,10p' |awk '{print $9,$12}' >top1.txt
sed '1,7d' top1.txt >>dc1.txt
echo "------------free -m ----------------" >>dc1.txt
free -m >>dc1.txt
echo "--------------df -h ---------------" >>dc1.txt
df -h >>dc1.txt
echo "---------- tripwire --check ----------">> dc1.txt
/usr/sbin/tripwire --check|sed -n '10p;18p;33,37p' >>dc1.txt
echo $date >>$filename
cat dc1.txt >>$filename
echo $date >>$filename
echo "--------------- the end ---------------" >>$filename
rm dc1.txt top1.txt
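The `sed -n '21,25p'` range only fits one particular layout of sar output. A sketch that keeps the summary rows by their label instead, plus a ps-based replacement for the top/sed/awk pipeline; it assumes sar is run with `LC_ALL=C` so the label is the English `Average:`, and that ps is the procps version (for `--sort`). Both function names are hypothetical.

```bash
#!/bin/bash
# Keep only the "Average:" summary rows from sar output on stdin.
sar_averages() {
  awk '$1 == "Average:"'
}

# Top N (default 3) processes by CPU, with their %CPU and command name.
top_cpu_procs() {
  ps -eo pcpu,comm --sort=-pcpu | head -n "$(( ${1:-3} + 1 ))"
}
```

In the daily script these could be used as `LC_ALL=C sar -ru 10 3 | sar_averages >> dc1.txt` and `top_cpu_procs 3 >> dc1.txt`.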
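2. Upload periodically to an FTP server
#!/bin/sh
# This way, all servers can be inspected from the FTP server alone
cd /home/itownet/monitor/check
LOGFILE=ftp.log
# -i turns off per-file prompting so mput can upload several files unattended
ftp -i -n >>$LOGFILE <<EOF
open IP
user user password
binary
cd test/pcreport
mput *.txt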
bye
EOF
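The here-document keeps the FTP password in the script in clear text. A sketch using curl instead, assuming curl is installed and the credentials live in ~/.netrc (a line such as `machine <ftp-host> login <user> password <password>`, readable only by its owner). `upload_reports` and the DRY_RUN switch are hypothetical; test/pcreport is the remote directory from the original.

```bash
#!/bin/bash
# Upload every *.txt report in directory $1 to ftp://$2/$3/ with curl.
# Credentials come from ~/.netrc (--netrc). Set DRY_RUN=1 to only print.
upload_reports() {
  local dir=$1 host=$2 remote_dir=$3 f
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue            # the glob matched nothing
    local cmd=(curl --silent --show-error --netrc --ftp-create-dirs
               -T "$f" "ftp://$host/$remote_dir/")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || echo "upload failed: $f" >&2
    fi
  done
}
```

Because the URL ends with `/`, curl keeps each local file name on the server. A dry run such as `DRY_RUN=1 upload_reports /home/itownet/monitor/check <ftp-host> test/pcreport` shows the commands before anything is sent.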
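File description
These shell scripts aim to provide a fairly automated way of inspecting a large number of Linux servers. They consist of three files: shellsh.sh, checksh.sh and file.txt. All three are required; put them in the same directory and run them as root.
Usage:
Put the IP address and password of every server to be inspected into file.txt, one IP and its password per line. Then run:
./shellsh.sh file.txt 192.168.182.143 123456
Here file.txt can be renamed; 192.168.182.143 is the IP of the server where you want the inspection logs stored, and 123456 is that server's password.
Results:
After the run, a directory named GatherLogDirectory is created under /tmp on 192.168.182.143. It holds the inspection logs of the inspected servers, each named after the server's IP, e.g. 192.168.182.146.log. On every inspected server two directories are created: CheckScript, which contains the checksh.sh script, and LocalServerLogDirectory, which holds the log that checksh.sh produces on that server.
Test results:
I have only tested this on three Linux virtual machines (Ubuntu, Red Hat and Kali). It ran normally and took about 3 minutes per server on average.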
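Since every line of file.txt must hold exactly an IP and a password, a quick format check before a run of several minutes per server can save a wasted pass. A sketch, assuming passwords contain no spaces (the same assumption shellsh.sh makes when it splits lines with awk); `validate_host_list` is a hypothetical name, and the IP test only checks the dotted-quad shape, not the 0-255 range.

```bash
#!/bin/bash
# Check that each non-blank, non-comment line of host list $1 is "IP password".
# Bad lines are reported on stderr; the exit status is 1 if any line is bad.
validate_host_list() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }
    NF != 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      printf "line %d: expected \"IP password\": %s\n", NR, $0 > "/dev/stderr"
      bad = 1
    }
    END { exit bad + 0 }' "$1"
}
```

For example: `validate_host_list file.txt && ./shellsh.sh file.txt 192.168.182.143 123456`.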
cat shellsh.sh
#!/bin/bash
login_info=$1
gather_server_ip=$2
gather_server_password=$3
grep_ip=`ifconfig | grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o | sed -e '2,5d'`
GatherPath="/tmp/GatherLogDirectory"
CheckScriptPath="/tmp/CheckScript"
if [ $# -ne 3 ]; then
echo -e "Wrong number of parameters!\n"
echo -e "Usage: $0 login_info gather_server_ip gather_server_password\n"
echo -e "For example: $0 IpAndPassword.txt $grep_ip 123456\n"
exit 1
fi
if [ ! -d "$GatherPath" ];then
mkdir -p "$GatherPath"
echo -e "The log's path is: $GatherPath"
fi
cat $login_info | while read line
do
server_ip=`echo $line|awk '{print $1}'`
server_password=`echo $line|awk '{print $2}'`
login_server_command="ssh -o StrictHostKeyChecking=no root@$server_ip"
scp_gather_server_checksh="scp checksh.sh root@$server_ip:$CheckScriptPath"
# 1) log in and create the script directory on the inspected server
/usr/bin/expect<<EOF
set timeout 20
spawn $login_server_command
expect {
"*yes/no" { send "yes\r"; exp_continue }
"*password:" { send "$server_password\r" }
}
expect {
"Permission denied" { exit 1 }
"#" { send "mkdir -p $CheckScriptPath\r" }
}
expect "#" { send "exit\r" }
expect eof
EOF
# 2) copy checksh.sh to the inspected server
/usr/bin/expect<<EOF
set timeout 20
spawn $scp_gather_server_checksh
expect {
"*yes/no" { send "yes\r"; exp_continue }
"*password:" { send "$server_password\r" }
"Connection refused" { exit 1 }
}
expect {
"Permission denied" { exit 1 }
eof
}
EOF
# 3) run checksh.sh remotely; it copies its log back to the gather server
/usr/bin/expect<<EOF
set timeout 120
spawn $login_server_command
expect {
"*yes/no" { send "yes\r"; exp_continue }
"*password:" { send "$server_password\r" }
}
expect {
"Permission denied" { exit 1 }
"#" { send "cd $CheckScriptPath && ./checksh.sh $gather_server_ip $gather_server_password\r" }
}
expect "#" { send "exit\r" }
expect eof
EOF
done
cat checksh.sh
#!/bin/bash
########################################################################################
#Function:
#This script checks the system information, disk information, performance, etc. of the
#server
#
#Author:
#By Jack Wang
#
#Company:
#ShaanXi Great Wall Information Co.,Ltd.
########################################################################################
########################################################################################
#
#GatherServerIpAddress is the IP address of the server that gathers the checking logs
#GatherServerPassword is the root password of that server
#
########################################################################################
GatherServerIpAddress=$1
GatherServerPassword=$2
########################################################################################
#GetTheIpCommand holds this server's IP address, which is also used as the log name
########################################################################################
GetTheIpCommand=`ifconfig | grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o | sed -e '2,5d'`
########################################################################################
#LogName is the log name in the form <IP>-<YYYYmmdd>
########################################################################################
LogName=`ifconfig|grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o|sed -e '2,5d'`-`date +%Y%m%d`
########################################################################################
#
#GatherServerLogPath is the log directory on the gathering server
#LocalServerLogPath is the local log directory
#
########################################################################################
GatherServerLogPath="/tmp/GatherLogDirectory"
LocalServerLogPath="/tmp/LocalServerLogDirectory"
########################################################################################
#LinuxOsInformation is a function used to collect OS information
########################################################################################
LinuxOsInformation(){
Hostname=`hostname`
UnameA=`uname -a`
OsVersion=`cat /etc/issue | sed '2,4d'`
Uptime=`uptime|awk '{print $3}'|awk -F "," '{print $1}'`
ServerIp=`ifconfig|grep "inet"|sed '2,4d'|awk -F ":" '{print $2}'|awk '{print $1}'`
ServerNetMask=`ifconfig|grep "inet"|sed '2,4d'|awk -F ":" '{print $4}'|awk '{print $1}'`
ServerGateWay=`netstat -r|grep "default"|awk '{print $2}'`
SigleMemoryCapacity=`dmidecode|grep -P -A5 "Memory\s+Device"|grep "Size"|grep -v "Range"|grep '[0-9]'|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
MaximumMemoryCapacity=`dmidecode -t 16|grep "Maximum Capacity"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
NumberOfMemorySlots=`dmidecode -t 16|grep "Number Of Devices"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
MemoryTotal=`cat /proc/meminfo|grep "MemTotal"|awk '{printf("MemTotal:%1.0fGB\n",$2/1024/1024)}'|awk -F ":" '{print $2}'`
PhysicalMemoryNumber=`dmidecode|grep -A16 "Memory Device"|grep "Size:"|grep -v "No Module Installed"|grep -v "Range Size:"|wc -l`
ProductName=`dmidecode|grep -A10 "System Information"|grep "Product Name"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
SystemCPUInfomation=`cat /proc/cpuinfo|grep "name"|cut -d: -f2|awk '{print "*"$1,$2,$3,$4}'|uniq -c|sed 's/^[ \t]*//g'`
echo -e "Hostname|$Hostname\nUnamea|$UnameA\nOsVersion|$OsVersion\nUptime|$Uptime\nServerIp|$ServerIp\nServerNetMask|$ServerNetMask\nServerGateWay|$ServerGateWay\nSigleMemoryCapacity|$SigleMemoryCapacity\nMaximumMemoryCapacity|$MaximumMemoryCapacity\nNumberOfMemorySlots|$NumberOfMemorySlots\nMemoryTotal|$MemoryTotal\nPhysicalMemoryNumber|$PhysicalMemoryNumber\nProductName|$ProductName\nSystemCPUInformation|$SystemCPUInfomation"
}
PerformanceInfomation (){
CPUIdle=`top -d 2 -n 1 -b|grep C[Pp][Uu]|grep id|awk '{print $5}'|awk -F "%" '{print $1}'`
CPUloadAverage=`top -d 2 -n 1 -b|grep "load average:"|awk -F ":" '{print $5}'|sed 's/^[ \t]*//g'`
ProcessNumbers=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $3}'`
ProcessRunning=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $8}'`
ProcessSleeping=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $11}'`
ProcessStoping=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $16}'`
ProcessZombie=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $21}'`
UserSpaceCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $4}'`
SystemSpaceCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $8}'`
ChangePriorityCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $12}'`
WaitingCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $19}'`
HardwareIRQCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $23}'`
SoftwareIRQCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $27}'`
MemUsed=`top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $11}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
MemFreeP=`top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $16}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
MemBuffersP=`top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $22}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheCachedP=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $24}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheTotal=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $4}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheUsed=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $14}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheFree=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $18}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
echo -e "CPUIdle|$CPUIdle\nCPUloadAverage|$CPUloadAverage\nProcessNumbers|$ProcessNumbers\nProcessRunning|$ProcessRunning\nProcessSleeping|$ProcessSleeping\nProcessStoping|$ProcessStoping\nProcessZombie|$ProcessZombie\nUserSpaceCPU|$UserSpaceCPU\nSystemSpaceCPU|$SystemSpaceCPU\nChangePriorityCPU|$ChangePriorityCPU\nWaitingCPU|$WaitingCPU\nHardwareIRQCPU|$HardwareIRQCPU\nSoftwareIRQCPU|$SoftwareIRQCPU\nMemUsed|$MemUsed\nMemFreeP|$MemFreeP\nMemBuffersP|$MemBuffersP\nCacheCachedP|$CacheCachedP\nCacheTotal|$CacheTotal\nCacheUsed|$CacheUsed\nCacheFree|$CacheFree\n"
}
OprateSystemSec () {
echo '======================UserLogin======================'
w
echo '======================FileUsed======================='
df -ah
echo '======================dmesgError====================='
dmesg | grep error
echo '======================dmesgFail======================'
dmesg | grep Fail
echo '======================BootLog========================'
grep -v "OK" /var/log/boot.log | sed '1,6d'
echo '======================route -n======================='
route -n
echo '======================iptables -L===================='
iptables -L
echo '======================netstat -lntp=================='
netstat -lntp
echo '======================netstat -antp=================='
netstat -antp
echo '======================netstat -s====================='
netstat -s
echo '======================last==========================='
last
echo '======================du -sh /etc/==================='
du -sh /etc/
echo '======================du -sh /boot/=================='
du -sh /boot/
echo '======================du -sh /dev/==================='
du -sh /dev/
echo '======================df -h=========================='
df -h
echo '======================mount | column -t=============='
mount | column -t
}
TopAndVmstat(){
top -d 2 -n 1 -b
vmstat 1 10
}
CheckGatherLog(){
if [ -f "$LocalServerLogPath/$GetTheIpCommand.log" ];then
rm -rf $LocalServerLogPath/$GetTheIpCommand.log
fi
if [ ! -x "$LocalServerLogPath" ];then
mkdir "$LocalServerLogPath"
fi
if [ ! -f "$LocalServerLogPath/$GetTheIpCommand.log" ];then
touch $LocalServerLogPath/$GetTheIpCommand.log
LinuxOsInformation>>$LocalServerLogPath/$GetTheIpCommand.log
PerformanceInfomation>>$LocalServerLogPath/$GetTheIpCommand.log
OprateSystemSec>>$LocalServerLogPath/$GetTheIpCommand.log
TopAndVmstat>>$LocalServerLogPath/$GetTheIpCommand.log
fi
}
CheckGatherLog
SCP_LOG_TO_GATHER_SERVER="scp $LocalServerLogPath/$GetTheIpCommand.log root@$GatherServerIpAddress:$GatherServerLogPath"
/usr/bin/expect<<EOF
set timeout 50
spawn $SCP_LOG_TO_GATHER_SERVER
expect {
"*yes/no*" { send "yes\r"; exp_continue }
"*password:" { send "$GatherServerPassword\r" }
}
expect "100%"
expect eof
EOF
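checksh.sh finds its own IP by grepping ifconfig output and deleting lines 2-5, which depends on the interface order and on the ifconfig version (newer net-tools print `inet 10.0.0.1` rather than `inet addr:10.0.0.1`). A sketch of a sturdier lookup based on the one-line-per-address output of `ip -4 -o addr show`; `first_ipv4` is a hypothetical name, and it simply returns the first non-loopback address.

```bash
#!/bin/bash
# Print the first non-loopback IPv4 address from `ip -4 -o addr show`
# output on stdin. Each line looks like: "2: eth0    inet 192.168.1.5/24 brd ...".
first_ipv4() {
  awk '$3 == "inet" && $2 != "lo" { split($4, a, "/"); print a[1]; exit }'
}
```

On a live host: `GetTheIpCommand=$(ip -4 -o addr show | first_ipv4)`.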
# Format of file.txt
cat file.txt
192.168.182.143 123456
192.168.182.129 123456
192.168.182.146 123456
Note: 192.168.182.143 is the IP of a server to be inspected, and 123456 is that server's password.
cat check_linux.sh
#!/bin/bash
check_process(){
tolprocess=`ps auxf|grep 'DisplayMa[nager]'|wc -l`
#if [ "$tolprocess" -lt "1" ];then
if [ "$tolprocess" -ge "1" ];then
echo 'process ok'
else
echo 'fail'
fi
}
check_log(){
if [ -e /etc/syslog-ng/syslog-ng.conf ];then
conlog=`cat '/etc/syslog-ng/syslog-ng.conf'|grep "10.70.72.253"|wc -l`
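if [ "$conlog" -ge "1" ];then
echo 'syslog-ng ok'
else
echo 'syslog-ng: remote log host 10.70.72.253 not configured'
fi
elif [ -e /etc/syslog.conf ];then
conlog=`cat '/etc/syslog.conf'|grep "10.70.72.253"|wc -l`
if [ "$conlog" -ge "1" ];then
echo 'syslog ok'
else
echo 'syslog: remote log host 10.70.72.253 not configured'
fi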
else
echo 'log not find or error'
fi
}
check_cpuidle(){
mincpu=`sar -u 2 10|grep all|awk '{print $NF}'|sort -nr|tail -1`
if [ $(echo "${mincpu} < 20" | bc) = 1 ];then
#if [ "$mincpu" -le "20" ];then
echo 'cpu idle is less than 20% ,please check'
else
echo 'cpu idle is more than 20%, it is ok '
fi
}
check_mem(){
vmstat 2 10
}
check_disk(){
chkdsk=`fdisk -l|egrep 'failed|unsynced|unavailable'|wc -l`
if [ "$chkdsk" -ge "1" ];then
echo 'fdisk check found errors, please check your disk '
else
echo 'fdisk check ok '
fi
}
check_io(){
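# sort -n | tail -1 picks the highest value across devices and samples
# ($NF = %util; $(NF-2) = await in the older "await svctm %util" column layout)
util=`sar -d 2 10|egrep -v 'x86|^$|await'|awk '{print $NF}'|sort -n|tail -1`
await=`sar -d 2 10|egrep -v 'x86|^$|await'|awk '{print $(NF-2)}'|sort -n|tail -1`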
if [ $(echo "${util} < 80" | bc) = 1 ] && [ $(echo "${await} < 100" | bc) = 1 ] ;then
echo 'disk io check is fine'
else
echo 'disk io use too high '
fi
}
check_swap(){
tolswap=`cat /proc/meminfo|grep SwapTotal|awk '{print $2}'`
#awk '/SwapTotal/{total=$2}/SwapFree/{free=$2}END{print (total-free)/1024}' /proc/meminfo
useswap=`awk '/SwapTotal/{total=$2}/SwapFree/{free=$2}END{print (total-free)}' /proc/meminfo `
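# no swap configured: skip, otherwise the division below fails
if [ "$tolswap" -eq 0 ];then
echo 'no swap configured'
return
fi
util=`awk 'BEGIN{printf "%.1f\n",'$useswap'/'$tolswap'}'`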
if [ $(echo "${util} < 0.3" | bc) = 1 ] || [ $(echo "${useswap} < 1024" | bc) = 1 ] ;then
echo 'swap use is ok '
else
echo "useswap: $useswap kb, swap util is $util"
fi
}
check_dmesg(){
chkdm=`dmesg |egrep 'scsi reset|file system full'|wc -l`
if [ "$chkdm" -ge "1" ];then
echo 'dmesg check found errors '
else
echo 'dmesg test ok '
fi
}
check_boot(){
chkdm=`cat /var/log/boot.msg|egrep 'scsi reset|file system full'|wc -l`
if [ "$chkdm" -ge "1" ];then
echo 'boot check found errors '
else
echo 'boot check fine '
fi
}
check_inode(){
maxinode=`df -i|awk '{print $5}'|egrep -v 'IUse|-' |sed 's/%//g'|sort -nr|head -1`
if [ $(echo "${maxinode} < 80" | bc) = 1 ];then
echo 'inode check ok '
else
echo 'inode used more than 80% '
fi
}
check_df(){
dfuse=`df -HT|awk '{print $6}'|grep -v Use|sed 's/%//g'|sort -nr|head -1`
if [ $(echo "${dfuse} < 80" | bc) = 1 ];then
echo 'disk used is less than 80% ,it is ok !'
elif [ $(echo "${dfuse} >= 80" | bc) = 1 ] && [ $(echo "${dfuse} < 90" | bc) = 1 ];then
echo 'warning , disk used more than 80% and less than 90% '
else
echo ' Critical, disk used more than 90% '
fi
}
echo '################### check process ###################'
check_process
echo '################### check syslog ####################'
check_log
echo '################### check cpuidle ###################'
check_cpuidle
echo '################### echo memory stat ################'
check_mem
echo '################### check fdisk #####################'
check_disk
echo '################### check io used ###################'
check_io
echo '################### check swap used #################'
check_swap
echo '################### check dmesg #####################'
check_dmesg
echo '################### check inode #####################'
check_inode
echo '################### check disk used #################'
check_df
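Threshold checks such as those in check_df and check_inode are easy to get wrong at the boundaries: a value of exactly 80 must fall into exactly one band. One way to keep them consistent is a single classifier with inclusive lower bounds. This sketch, `classify_usage`, is hypothetical and defaults to the script's 80/90 thresholds.

```bash
#!/bin/bash
# Classify a usage percentage $1 (integer or decimal) as ok / warning / critical.
# Lower bounds are inclusive: warning from $2 (default 80), critical from $3 (default 90).
classify_usage() {
  awk -v v="$1" -v w="${2:-80}" -v c="${3:-90}" 'BEGIN {
    v += 0; w += 0; c += 0          # force numeric comparison
    if (v >= c) print "critical"
    else if (v >= w) print "warning"
    else print "ok"
  }'
}
```

For example, `classify_usage "$(df -P | awk 'NR>1 {sub(/%/,"",$5); print $5}' | sort -n | tail -1)"` classifies the fullest filesystem.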