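Shell-Bash: Counting Message Lines by Date and Time
Requirement: for a single file, or for the files under a specific directory, count the number of Kafka messages per hour or per day within a given time range.
Sample data line: 2018-03-28T03:00:02 ...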
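Analysis:
Parameters: 1. The absolute path of the file to process
2. A specific date, or a start date and an end date
3. A specific hour, or a start hour and an end hour
4. The statistics dimension (hour or day)
Logic: 1. Output logic -- write to a file and print to the screen
2. Counting logic -- grep + wc -l
3. Counting over a range of days
4. Counting over a range of hours
5. Choosing between the functions for 3 and 4 according to the statistics dimension
6. File handling by type (a single file, or the files under a specific directory)
7. Validating and initializing the dates entered by the user
8. Validating and initializing the hours entered by the user
9. Helper functions: trim, zero-padding, a custom LOGGER (for checking each function's arguments), and output file name formatting
Extensions: 1. Add a directory parameter so that the script is no longer tied to a single directory.
2. Support both absolute and relative paths.
3. Split very large files by line before counting.
4. Support time ranges given as timestamps.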
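The counting logic (item 2 above) can be sketched on its own. This is a minimal illustration, not the script's actual function: `count_lines` is a hypothetical helper, `grep -c` stands in for `grep | wc -l`, and anchoring the pattern with `^` assumes every line starts with the timestamp, as in the sample line.

```shell
#!/bin/bash
# Hypothetical helper: count lines in FILE whose timestamp starts with DATE,
# or with DATE plus "T" plus HOUR when an hour is given.
count_lines() {
    local file=$1 date=$2 hour=${3:-}
    # ${hour:+T$hour} expands to e.g. "T03" only when hour is non-empty.
    grep -c -- "^${date}${hour:+T$hour}" "$file"
}

# Throwaway sample data in the format of the sample line (contents are made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
2018-03-28T03:00:02 message a
2018-03-28T03:15:40 message b
2018-03-28T04:01:00 message c
2018-03-29T03:00:00 message d
EOF

count_lines "$sample" 2018-03-28       # whole day: prints 3
count_lines "$sample" 2018-03-28 03    # single hour: prints 2
```

Note that `grep -c` prints 0 but exits with status 1 when nothing matches, which matters if the caller runs under `set -e`.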
Shell notes so far:
1. To turn on debugging for the whole script, declare #!/bin/bash -xv on the first line.
2. To get the directory the script lives in, use $( cd "$(dirname "$0")" && pwd ) .
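3. Option parsing with getopt: -o gives the short option list and -l the long option list. An option without a trailing colon takes no value; an option followed by a colon requires one. Each short option is a single character, so multi-letter entries such as dd: are not read as one option; the script below only acts on the long forms.
OPTS=`getopt -o "s:,e:,dd:,df:,sh:,eh:,dh:,hd,h" -l "start-date:,end-date:,designed-date:,designed-file:,start-hour:,end-hour:,designed-hour:,hour-dimension,help" -- "$@"`
eval set -- "$OPTS"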
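A reduced version of this parsing loop may make the pattern easier to follow. It is a sketch that assumes the enhanced getopt from util-linux is installed; `parse_args` is a hypothetical wrapper and only two of the script's long options are kept.

```shell
#!/bin/bash
# Hypothetical wrapper around getopt; assumes util-linux (enhanced) getopt.
parse_args() {
    local opts
    # "start-date:" requires a value, "hour-dimension" and "help" take none.
    opts=$(getopt -o "h" -l "start-date:,hour-dimension,help" -- "$@") || return 1
    # Replace the positional parameters with getopt's normalized output.
    eval set -- "$opts"
    start_date=""
    hour_dimension="False"
    while true; do
        case "$1" in
            --start-date )     start_date="$2"; shift 2 ;;
            --hour-dimension ) hour_dimension="True"; shift ;;
            -h | --help )      echo "usage: see the USAGE text"; return 0 ;;
            -- )               shift; break ;;
            * )                break ;;
        esac
    done
}

parse_args --start-date=2017-10-25 --hour-dimension
echo "start_date=$start_date hour_dimension=$hour_dimension"
```

getopt normalizes both `--start-date=2017-10-25` and `--start-date 2017-10-25` into the same two-word form, which is why the case branch can always read the value from `$2`.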
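4. A trim function implemented in the shell: %% removes the longest matching suffix and ## removes the longest matching prefix. In a pattern, \t matches a literal t rather than a tab, so the version below uses the [[:space:]] character class instead.
trimmed=""
function trim(){
trimmed=$1
# Strip the leading run of whitespace (spaces, tabs, newlines)
trimmed="${trimmed#"${trimmed%%[![:space:]]*}"}"
# Strip the trailing run of whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
}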
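The difference between the shortest-match operators (#, %) and the longest-match operators (##, %%) is easiest to see on a path. The sketch below uses a made-up file name under the directory the script reads from, then trims whitespace with the same operators.

```shell
#!/bin/bash
# Hypothetical path, used only to show what each operator removes.
path="/data/rawdata/kafka/topic.2018-03-28.txt"

name=${path##*/}     # longest prefix matching "*/" removed  -> topic.2018-03-28.txt
dir=${path%/*}       # shortest suffix matching "/*" removed -> /data/rawdata/kafka
first=${name%%.*}    # longest suffix matching ".*" removed  -> topic
stem=${name%.*}      # shortest suffix matching ".*" removed -> topic.2018-03-28

# Whitespace trimming built from the same operators.
raw=$'\t  hello world \t'
lead=${raw%%[![:space:]]*}        # everything before the first non-space character
trimmed=${raw#"$lead"}
trail=${trimmed##*[![:space:]]}   # everything after the last non-space character
trimmed=${trimmed%"$trail"}

printf '%s\n' "$name" "$dir" "$first" "$stem" "[$trimmed]"
```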
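5. For numeric comparison, use (( )) with symbolic operators (>, >=, <, <=, ==, !=), or [ ] with letter operators (-eq, -ne, -lt, -le, -gt, -ge). Inside (( )), a single = is an assignment, not a comparison.
6. For string comparison, use [ ] with -z, -n, =, ==, >, <, where > and < compare in ASCII order. Inside [ ], > and < must be escaped as \> and \< (otherwise they are treated as redirections), or [[ ]] can be used instead.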
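The two kinds of comparison give different answers for the same values, which a short sketch makes concrete. The variable names are arbitrary.

```shell
#!/bin/bash
a=9
b=10

# Numeric comparison: 9 is less than 10.
if (( a < b )); then echo "(( )): $a < $b"; fi
if [ "$a" -lt "$b" ]; then echo "[ ]: $a -lt $b"; fi

# String comparison: "9" sorts after "10" because '9' comes after '1'.
if [[ "$a" > "$b" ]]; then echo "[[ ]]: '$a' sorts after '$b'"; fi
# The same test with [ ] needs the operator escaped.
if [ "$a" \> "$b" ]; then echo "[ ]: '$a' sorts after '$b'"; fi

# -z is true for an empty string, -n for a non-empty one.
empty=""
if [ -z "$empty" ]; then echo "empty string detected"; fi
if [ -n "$a" ]; then echo "a is non-empty"; fi
```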
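7. To get the date i days after a given date as a formatted string, use $(date +%Y-%m-%d --date="${from_date} + ${i} day") . This relies on GNU date.
8. To get the timestamp of a given date, use $(date -d "$to_date" +%s) ; the timestamps can then be compared as integers.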
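Combined, the two date idioms give a bounded loop over a date range. The sketch below assumes GNU date; `date_range` is a hypothetical helper, and `-u` is added here only so that daylight-saving changes cannot affect the comparison.

```shell
#!/bin/bash
# Hypothetical helper: print every day from FROM to TO inclusive,
# stopping after MAX_DAYS steps even if TO is never reached.
date_range() {
    local from=$1 to=$2 max_days=${3:-30}
    local to_ts day i
    to_ts=$(date -u -d "$to" +%s) || return 1
    for ((i = 0; i <= max_days; i++)); do
        # GNU date understands relative expressions such as "+ 2 day".
        day=$(date -u +%Y-%m-%d --date="${from} + ${i} day")
        echo "$day"
        # Compare as epoch seconds rather than as strings.
        if [ "$(date -u -d "$day" +%s)" -eq "$to_ts" ]; then
            break
        fi
    done
}

date_range 2017-10-25 2017-10-27
```

The upper bound on the loop plays the same role as ALLOWED_MAX_PERIOD in the script: an end date earlier than the start date cannot cause an endless loop.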
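9. The current function name is available as $FUNCNAME, and the call stack of function names as the array ${FUNCNAME[@]}, for example another_func test_func main. This is mainly useful for locating log messages.
10. Declare local variables inside a function with local, for example local designed_date .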
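A small logger shows how FUNCNAME and local fit together. This is a sketch: unlike the script's info function, it reads the caller's name from ${FUNCNAME[1]} instead of taking it as an argument, and `count_day` is a made-up caller.

```shell
#!/bin/bash
IS_INFO_OUT=1

info() {
    # FUNCNAME[0] is "info" itself; FUNCNAME[1] is the function that called it.
    if [ "$IS_INFO_OUT" -eq 1 ]; then
        echo "[[ Function=${FUNCNAME[1]} --> Info=$* ]]"
    fi
}

count_day() {
    # local keeps designed_date scoped to this function and the functions it calls.
    local designed_date=$1
    info "designed_date:$designed_date"
}

count_day 2017-10-25
# After the call returns, designed_date is unset in the global scope.
echo "global designed_date is '${designed_date:-}'"
```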
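11. Test with the examples listed in the Example section of USAGE. The Example section should therefore be reasonably complete, and keeping the test cases there makes it easy to add more at any time.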
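12. The USAGE variable shown by help is filled from a here-document, << EOF ... EOF. It tells the shell that the lines that follow are the input of the command (here, read) until a line containing only EOF is reached, after which normal parsing of the script resumes.
read -d '' USAGE << EOF
## fill in the actual usage text here
EOF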
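A complete, runnable version of this idiom might look as follows. The usage text is shortened and the `|| true` is an addition: read returns a non-zero status here because it reaches the end of the input without finding the NUL delimiter, which would abort a script running under `set -e`.

```shell
#!/bin/bash
# -d '' makes read consume the whole here-document instead of a single line;
# -r keeps backslashes in the text as they are.
read -r -d '' USAGE <<EOF || true
USAGE
    $0 [--designed-date=yyyy-mm-dd] [--help]
EXAMPLE
    $0 --designed-date=2017-10-25
EOF

echo "$USAGE"
```

Because the EOF marker is unquoted, $0 is expanded inside the text. Writing <<'EOF' instead would keep it literal.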
Appendix: the shell script:
#!/bin/bash
read -d '' USAGE << EOF
USAGE
$0 [--start-date=yyyy-mm-dd --end-date=yyyy-mm-dd] [--designed-date=yyyy-mm-dd] [--start-hour=HH --end-hour=HH] [--designed-hour=HH] [--designed-file=absolute path] [--hour-dimension] [--help]
DESCRIPTION
This script counts lines between start-date & start-hour and end-date & end-hour,
or between designed-date & start-hour and designed-date & end-hour,
or at designed-date & designed-hour.
Note: this script cannot count a range longer than 30 days.
The following arguments are optional.
--start-date
The beginning of the date window to be counted; used together with --start-hour, --end-date and --end-hour.
If not specified, it is set to today by default.
--end-date
The end of the date window to be counted; used together with --start-date, --start-hour and --end-hour.
If not specified, it is set to today by default.
--start-hour
The beginning of the hour window to be counted; used together with (--start-date, --end-date) or --designed-date, and --end-hour.
If not specified, it is set to 0 by default.
--end-hour
The end of the hour window to be counted; used together with --start-hour and (--start-date, --end-date) or --designed-date.
If not specified, it is set to the current hour by default.
--designed-date
The designed date to be counted.
If not specified, use --start-date and --end-date as default.
--designed-hour
The designed hour to be counted.
If not specified, use --start-hour and --end-hour as default.
--designed-file
The absolute path of the file to be counted.
If not specified, all files matching *.txt* under the absolute path /data/rawdata/kafka/ will be checked.
--hour-dimension
Count per hour instead of per day.
--help
Display this HELP information
EXAMPLE:
$0 --designed-date=2017-10-25
$0 --designed-date=2017-10-25 --designed-hour=06 --hour-dimension
$0 --designed-date=2017-10-25 --designed-file=/data/123.txt
$0 --designed-date=2017-10-25 --designed-hour=06 --designed-file=/data/123.txt --hour-dimension
$0 --start-date=2017-10-25 --end-date=2017-10-26
$0 --start-date=2017-10-25 --end-date=2017-10-26 --start-hour=06 --end-hour=07 --hour-dimension
$0 --start-date=2017-10-25 --end-date=2017-10-26 --designed-file=/data/123.txt
$0 --start-date=2017-10-25 --end-date=2017-10-26 --start-hour=06 --end-hour=07 --designed-file=/data/123.txt --hour-dimension
EOF
start_date=""
end_date=""
designed_date=""
designed_file=""
start_hour=0
end_hour=0
designed_hour=0
hour_dimension="False"
OPTS=`getopt -o "s:,e:,dd:,df:,sh:,eh:,dh:,hd,h" -l "start-date:,end-date:,designed-date:,designed-file:,start-hour:,end-hour:,designed-hour:,hour-dimension,help" -- "$@"`
eval set -- "$OPTS"
while true
do
case "$1" in
--start-date ) start_date="$2"; shift 2 ;;
--end-date ) end_date="$2"; shift 2 ;;
--designed-date ) designed_date="$2"; shift 2 ;;
--designed-file ) designed_file="$2"; shift 2 ;;
--start-hour ) start_hour="$2"; shift 2 ;;
--end-hour ) end_hour="$2"; shift 2 ;;
--designed-hour ) designed_hour="$2"; shift 2 ;;
--hour-dimension ) hour_dimension="True"; shift ;;
--help )
echo "$USAGE"
shift ;
exit 0 ;;
-- ) shift; break ;;
* ) break ;;
esac
done
now_date=$(date "+%Y-%m-%d");
now_hour=$(date "+%H");
now_time=`date "+%Y-%m-%d %H:%M:%S"`
IS_FILE_OUT=1
ALLOWED_MAX_PERIOD=30
DIR="$( cd "$( dirname "$0" )" && pwd )"
echo "Today is $now_date,now hour is $now_hour"
echo "Script directory:$DIR"
num=""
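function add_prefix_zero(){
# Accept only non-empty strings made up entirely of digits
case "$1" in
''|*[!0-9]*) echo "Error:not a number."; exit 1;;
esac
# 10# forces base 10 so that 08 and 09 are not rejected as invalid octal
num=$((10#$1))
if [ $num -ge 0 ] && [ $num -le 9 ] ;then
num="0$num"
fi
}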
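trimmed=""
function trim(){
trimmed=$1
# Strip the leading run of whitespace (spaces, tabs, newlines)
trimmed="${trimmed#"${trimmed%%[![:space:]]*}"}"
# Strip the trailing run of whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
}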
IS_INFO_OUT=0
function info(){
if [ $IS_INFO_OUT -eq 1 ] ; then
echo "[[ Function=$1 --> Info=$2 ]]"
fi
}
saved_filename=""
function format_filename(){
info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4"
local from_date=$1
local to_date=$2
local from_hour=$3
local to_hour=$4
# The md5 of the run time keeps output file names from different runs apart
local time_hash=`echo $now_time | md5sum | cut -d' ' -f 1`
if [ -z "$designed_file" ] ; then
saved_filename="$from_date-$from_hour--$to_date-$to_hour-$time_hash"
else
local file_prefix=${designed_file##*/}
saved_filename="${file_prefix%%\.txt}-$from_date-$from_hour--$to_date-$to_hour-$time_hash"
fi
trim "$saved_filename"
saved_filename="$trimmed.txt"
info "$FUNCNAME" "saved_filename:$saved_filename"
}
function out_sum_date_hour(){
info "$FUNCNAME" "designed_file:$1 designed_date:$2 designed_hour:$3"
local designed_file=$1
local designed_date=$2
local designed_hour=$3
local output_message=""
# ${designed_hour:+...} expands only when an hour was actually passed in;
# the grep pattern is "<date>T" for a whole day and "<date>T<hour>" for one hour
output_message="Filename:${designed_file} Date:${designed_date} ${designed_hour:+Hour:$designed_hour} Count:$(grep -- "${designed_date}T${designed_hour}" "${designed_file}" | wc -l)"
echo $output_message
# Also append the result to the output file unless file output is switched off
if [ 0 -ne $IS_FILE_OUT ] ; then
echo $output_message >> "$DIR/$saved_filename"
fi
}
function file_from_to_date(){
info "$FUNCNAME" "from_date:$1 to_date:$2 designed_file:$3"
local from_date=$1
local to_date=$2
local designed_file=$3
local to_timestamp=`date -d "$to_date" +%s`
local someday=""
local timestamp=0
for ((i=0; i<=$ALLOWED_MAX_PERIOD; i++))
do
someday=$(date +%Y-%m-%d --date="${from_date} + ${i} day")
out_sum_date_hour $designed_file $someday
timestamp=`date -d "$someday" +%s`
if [ $timestamp -eq $to_timestamp ] ; then
break
fi
done
}
function file_from_to_date_hour(){
info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4 designed_file:$5"
local from_date=$1
local to_date=$2
local from_hour=$3
local to_hour=$4
local designed_file=$5
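local to_timestamp=`date -d "$to_date" +%s`
local someday=""
local timestamp=0
local i j
for ((i=0; i<=$ALLOWED_MAX_PERIOD; i++))
do
someday=$(date +%Y-%m-%d --date="${from_date} + ${i} day")
timestamp=`date -d "$someday" +%s`
# 10# forces base 10 so that hours such as 08 and 09 are not rejected as invalid octal
for ((j=10#$from_hour; j<=10#$to_hour; j++))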
do
add_prefix_zero $j
out_sum_date_hour $designed_file $someday $num
done
if [ $timestamp -eq $to_timestamp ] ; then
break
fi
done
}
function judge_date_hour(){
info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4 designed_file:$5"
local from_date=$1
local to_date=$2
local from_hour=$3
local to_hour=$4
local designed_file=$5
if [ "$hour_dimension" == "True" ] ; then
file_from_to_date_hour $from_date $to_date $from_hour $to_hour $designed_file
else
file_from_to_date $from_date $to_date $designed_file
fi
}
function from_to_date_hour(){
info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4"
local from_date=$1
local to_date=$2
local from_hour=$3
local to_hour=$4
if [ -n "$designed_file" ] ; then
judge_date_hour $from_date $to_date $from_hour $to_hour "$designed_file"
else
cd /data/rawdata/kafka/ || exit 1
# Glob directly instead of parsing the output of ls
for file in *.txt*
do
# Skip directories and an unmatched, literal *.txt* pattern
if [ -f "$file" ]; then
judge_date_hour $from_date $to_date $from_hour $to_hour "$file"
fi
done
fi
}
function judge_default_date(){
info "$FUNCNAME" "start_date:$start_date end_date:$end_date"
if [ ! -z $start_date ] || [ ! -z $end_date ] ; then
if [ -z $start_date ] ; then
start_date=$now_date
fi
if [ -z $end_date ] ; then
end_date=$now_date
fi
return 1
else
return 0
fi
}
function judge_default_hour(){
info "$FUNCNAME" "start_hour:$start_hour end_hour:$end_hour"
if [ ! 0 -eq $start_hour ] || [ ! 0 -eq $end_hour ] ; then
if [ 0 -eq $start_hour ] ; then
start_hour=0
fi
if [ 0 -eq $end_hour ] ; then
end_hour=$now_hour
fi
return 1
else
return 0
fi
}
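### $start_date and $end_date cannot coexist with $designed_date
if { [ -n "$start_date" ] || [ -n "$end_date" ]; } && [ -n "$designed_date" ] ; then
echo "Error: --start-date/--end-date cannot be combined with --designed-date." >&2
exit 1
fi
### $start_hour and $end_hour cannot coexist with $designed_hour
if { [ ! 0 -eq $start_hour ] || [ ! 0 -eq $end_hour ]; } && [ ! 0 -eq $designed_hour ] ; then
echo "Error: --start-hour/--end-hour cannot be combined with --designed-hour." >&2
exit 1
fi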
if [ -z $start_date ] && [ -z $end_date ] && [ -z $designed_date ] ; then
start_date=$now_date
end_date=$now_date
fi
if [ 0 -eq $start_hour ] && [ 0 -eq $end_hour ] && [ 0 -eq $designed_hour ] ; then
start_hour=0
end_hour=$now_hour
fi
judge_default_date
if [ $? -eq 1 ] ; then
judge_default_hour
if [ $? -eq 1 ] ; then
format_filename $start_date $end_date $start_hour $end_hour
from_to_date_hour $start_date $end_date $start_hour $end_hour
else
format_filename $start_date $end_date $designed_hour $designed_hour
from_to_date_hour $start_date $end_date $designed_hour $designed_hour
fi
else
judge_default_hour
if [ $? -eq 1 ] ; then
format_filename $designed_date $designed_date $start_hour $end_hour
from_to_date_hour $designed_date $designed_date $start_hour $end_hour
else
format_filename $designed_date $designed_date $designed_hour $designed_hour
from_to_date_hour $designed_date $designed_date $designed_hour $designed_hour
fi
fi
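One detail of hour handling is easy to trip over: in bash arithmetic a number with a leading zero is read as octal, so values such as 08 and 09 (which `date +%H` produces) are rejected. The sketch below shows one way to zero-pad an hour safely; `pad_hour` is a hypothetical helper, not a function from the script.

```shell
#!/bin/bash
# Hypothetical helper: print HOUR as two digits, reading it as decimal
# even when it has a leading zero.
pad_hour() {
    case "$1" in
        '' | *[!0-9]* ) echo "Error: not a number: '$1'" >&2; return 1 ;;
    esac
    # 10#$1 forces base 10; a bare $((08)) fails with "value too great for base".
    printf '%02d\n' "$((10#$1))"
}

for h in 7 08 9 23; do
    pad_hour "$h"
done
```

The same `10#` prefix works inside `for (( ... ))` loop headers, which is where an hour taken from the command line or from `date +%H` typically ends up.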