Shell-Bash: Counting Message Lines by Date and Time

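Requirement: for a given file, or for every file under a specific directory, count the Kafka messages that fall within a time range, either per hour or per day.

Sample data line: 2018-03-28T03:00:02 ...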

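Analysis:

Parameters: 1. The absolute path of the file to count

            2. A designated date, or a start date and an end date

            3. A designated hour, or a start hour and an end hour

            4. The counting dimension (hour or day)

Logic: 1. Output: write to a file and print to the screen

       2. Counting: grep + wc -l

       3. Counting over a range of days

       4. Counting over a range of hours

       5. Choosing function 3 or 4 according to the counting dimension

       6. File handling (a single file, or the files under a specific directory)

       7. Validating and initializing the dates entered by the user

       8. Validating and initializing the hours entered by the user

       9. Helper functions: trim, zero-padding, a custom logger (to inspect each function's arguments), and output filename formatting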
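The counting step (item 2 of the logic above) comes down to matching a date prefix, or a date-and-hour prefix, in each line. The following is a minimal sketch, assuming every line contains a timestamp shaped like 2018-03-28T03:00:02; the function name count_lines and the sample data are made up for illustration, and grep -c is used here in place of grep piped into wc -l.

```shell
#!/bin/bash
# count_lines FILE DATE [HOUR]
# Print how many lines of FILE contain DATE, or DATE followed by "T" and HOUR.
count_lines() {
    local file=$1 date=$2 hour=$3
    # ${hour:+T$hour} expands to nothing when no hour is given.
    grep -c -- "${date}${hour:+T$hour}" "$file"
}

# Hypothetical sample data for the demonstration.
sample=$(mktemp)
cat > "$sample" << 'EOF'
2018-03-28T03:00:02 message a
2018-03-28T03:15:40 message b
2018-03-28T04:01:00 message c
2018-03-29T03:00:00 message d
EOF

count_lines "$sample" 2018-03-28       # per-day count
count_lines "$sample" 2018-03-28 03    # per-hour count
```

With the sample data above, the per-day call counts three lines and the per-hour call counts two.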

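Possible extensions: 1. Add a directory parameter so the script is no longer tied to a single directory.

                     2. Accept both absolute and relative paths.

                     3. Split very large files by line before counting.

                     4. Support time ranges given as Unix timestamps.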
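For very large files, one alternative to splitting is to read the file only once instead of running grep once per hour. The sketch below is not what the script in this post does: it swaps in awk, assumes the timestamp sits at the very start of each line, and uses a hypothetical function name hourly_counts.

```shell
#!/bin/bash
# hourly_counts FILE
# Print "yyyy-mm-ddTHH count" for every hour that occurs in FILE, in one pass.
hourly_counts() {
    # The first 13 characters of a line are the date, "T" and the hour.
    awk '{ counts[substr($0, 1, 13)]++ }
         END { for (k in counts) print k, counts[k] }' "$1" | sort
}

# Hypothetical sample data for the demonstration.
big=$(mktemp)
printf '%s\n' \
    "2018-03-28T03:00:02 message a" \
    "2018-03-28T03:15:40 message b" \
    "2018-03-28T04:01:00 message c" > "$big"

hourly_counts "$big"
```

The trade-off is that hours with no messages do not appear in the output at all, whereas the grep-per-hour approach reports them with a count of 0.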

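Shell notes so far:

1. To enable debugging for the whole script, declare #!/bin/bash -xv on the first line (-x traces each command before it runs, -v echoes each line as it is read).

2. To get the directory that contains the script: $( cd "$(dirname "$0")" && pwd ).

3. Option definitions for getopt: -o gives the short option list and -l the long option list. Inside the quotes, an option without a trailing colon takes no value, and an option followed by a colon requires one.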

# getopt short options are single characters, so only -h is declared with -o;
# every other option is available in its long form only.
OPTS=$(getopt -o "h" -l "start-date:,end-date:,designed-date:,designed-file:,start-hour:,end-hour:,designed-hour:,hour-dimension,help" -- "$@")

eval set -- "$OPTS"
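To make the colon rule concrete, here is a small sketch that parses two value-taking options and one flag. It assumes the enhanced getopt from util-linux (the BSD getopt shipped with macOS does not support -l); the helper name parse_args is hypothetical and only a subset of the script's options is declared.

```shell
#!/bin/bash
# parse_args: hypothetical helper that fills three global variables.
parse_args() {
    start_date="" designed_hour="" hour_dimension="False"
    local opts
    # "start-date:" and "designed-hour:" require a value; "hour-dimension" and "help" take none.
    opts=$(getopt -o "h" -l "start-date:,designed-hour:,hour-dimension,help" -- "$@") || return 1
    eval set -- "$opts"
    while true; do
        case "$1" in
            --start-date )     start_date=$2; shift 2 ;;
            --designed-hour )  designed_hour=$2; shift 2 ;;
            --hour-dimension ) hour_dimension="True"; shift ;;
            -h | --help )      echo "usage: see the USAGE text"; return 0 ;;
            -- )               shift; break ;;
            * )                break ;;
        esac
    done
}

# Both "--opt=value" and "--opt value" are accepted for long options.
parse_args --start-date=2017-10-25 --designed-hour 06 --hour-dimension
echo "$start_date $designed_hour $hour_dimension"
```

getopt normalizes the arguments, so the case statement only ever sees each option and its value as separate words, followed by a closing "--".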

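4. Implementing a trim function in the shell: %% removes the longest match from the right-hand end and ## removes the longest match from the left-hand end (the single forms % and # remove the shortest match).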

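trimmed=""
function trim(){
    # Strip leading and trailing whitespace from $1; the result is left in $trimmed.
    # A pattern of "\ " removes only a single space and "\t" matches a literal "t",
    # so the inner expansions below select the whole whitespace run instead.
    trimmed=$1
    trimmed=${trimmed#"${trimmed%%[![:space:]]*}"}
    trimmed=${trimmed%"${trimmed##*[![:space:]]}"}
}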
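The difference between the longest-match and shortest-match forms is easiest to see on a path. A short sketch follows; the file name is hypothetical.

```shell
#!/bin/bash
path="/data/rawdata/kafka/topic.txt.gz"   # hypothetical file name

echo "${path##*/}"   # longest "*/" removed from the left:   topic.txt.gz
echo "${path#*/}"    # shortest "*/" removed from the left:  data/rawdata/kafka/topic.txt.gz
echo "${path%%.*}"   # longest ".*" removed from the right:  /data/rawdata/kafka/topic
echo "${path%.*}"    # shortest ".*" removed from the right: /data/rawdata/kafka/topic.txt
```

The script below relies on the first form, ${designed_file##*/}, to turn a full path into a bare file name.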

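5. For numeric comparison with symbols, use (( )) and the operators >, >=, <, <=, == and != (a single = inside (( )) is an assignment); with letter operators, use [ ] and -eq, -le, -ge, -lt, -gt, -ne.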
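A short sketch of both styles, including one pitfall that matters for hours: inside (( )) a number with a leading zero is read as octal, so 08 and 09 are rejected unless base 10 is forced with the 10# prefix. The variable names are only illustrative.

```shell
#!/bin/bash
start_hour=6
end_hour=7

# Arithmetic context: C-style operators, and variables need no "$".
if (( start_hour <= end_hour )); then
    echo "range ok (arithmetic)"
fi

# test / [ ]: letter operators; operands are best quoted.
if [ "$start_hour" -le "$end_hour" ]; then
    echo "range ok (test)"
fi

# Leading zeros: (( 08 == 8 )) is an error because 08 is not valid octal.
hour="08"
if (( 10#$hour == 8 )); then
    echo "hour is 8"
fi
```

The [ ] form does not have this problem, because test always reads its operands as decimal integers.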

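6. For string tests, use [ ] with -z, -n, = or ==. The operators > and < compare strings character by character (ASCII order in [ ], the current locale in [[ ]]); inside [ ] they must be escaped as \> and \<, otherwise the shell treats them as redirections.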
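The following sketch shows the string tests side by side. It also illustrates why yyyy-mm-dd dates can be compared as plain strings, and why numbers should not be. The variable names are illustrative only.

```shell
#!/bin/bash
designed_file=""
if [ -z "$designed_file" ]; then
    echo "no file given"
fi

# Zero-padded ISO dates sort the same way as strings and as dates.
a="2018-03-28"
b="2018-03-29"
if [[ "$a" < "$b" ]]; then
    echo "a is before b ([[ ]])"
fi
if [ "$a" \< "$b" ]; then
    echo "a is before b ([ ] with escaped <)"
fi

# As strings, "10" sorts before "9" because "1" comes before "9".
if [ "10" \< "9" ]; then
    echo "string comparison says 10 < 9"
fi
if (( 10 > 9 )); then
    echo "numeric comparison says 10 > 9"
fi
```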

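7. To get the date i days after a given date as a formatted string (GNU date): $(date +%Y-%m-%d --date="${from_date} + ${i} day") .

8. To get the Unix timestamp of a given date: $(date -d "$to_date" +%s) ; this is what makes dates easy to compare.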
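Items 7 and 8 combine naturally into a loop over a date range. A minimal sketch, assuming GNU date (the -d and --date options are not available in BSD date); the function name list_days is hypothetical.

```shell
#!/bin/bash
# list_days FROM TO
# Print every date from FROM to TO inclusive, one per line (yyyy-mm-dd).
list_days() {
    local from_date=$1 to_date=$2
    local from_ts to_ts day ts i=0
    from_ts=$(date -d "$from_date" +%s) || return 1
    to_ts=$(date -d "$to_date" +%s) || return 1
    if (( from_ts > to_ts )); then
        echo "start date is after end date" >&2
        return 1
    fi
    while true; do
        day=$(date +%Y-%m-%d --date="${from_date} + ${i} day")
        echo "$day"
        # Compare timestamps, so the loop also ends if TO was not zero-padded.
        ts=$(date -d "$day" +%s)
        if (( ts >= to_ts )); then
            break
        fi
        i=$((i + 1))
    done
}

list_days 2018-02-27 2018-03-01
```

Because the offset is resolved by date itself, month ends and leap years need no special handling: the call above prints 2018-02-27, 2018-02-28 and 2018-03-01.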

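9. The name of the current function is in $FUNCNAME; the whole call stack is in the array ${FUNCNAME[@]}, for example another_func test_func main. This is mainly useful for locating log output.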
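A small sketch of a logger built on FUNCNAME. The function names are taken from the example stack above; log is a hypothetical helper.

```shell
#!/bin/bash
log() {
    # FUNCNAME[0] is "log" itself, FUNCNAME[1] is the function that called it.
    echo "[${FUNCNAME[1]}] $*"
}

another_func() {
    # ${FUNCNAME[*]} joins the whole call stack, innermost function first.
    log "stack: ${FUNCNAME[*]}"
}

test_func() {
    another_func
}

test_func
```

Using FUNCNAME[1] inside the logger means callers do not have to pass their own name, unlike the info function in the script below, which receives "$FUNCNAME" as its first argument.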

10. Declare variables that are local to a function with local, for example local designed_date .
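Two behaviours of local are worth seeing in a sketch: it shadows a global of the same name, and when declaration and command substitution share one line, the exit status reported is that of local rather than of the command. All names below are illustrative.

```shell
#!/bin/bash
counter=0
bump_global() { counter=$((counter + 1)); }                    # changes the global
bump_local()  { local counter=100; counter=$((counter + 1)); } # changes only its own copy

bump_global
bump_local
echo "counter=$counter"

# "local out=$(cmd)" reports the status of "local", which hides a failing cmd.
hidden()  { local out=$(false); echo "status=$?"; }
# Declaring first and assigning afterwards keeps the status of cmd.
visible() { local out; out=$(false); echo "status=$?"; }

hidden
visible
```

Here counter ends up as 1, hidden reports status 0 and visible reports status 1.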

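11. Test with the examples written in the EXAMPLE section of USAGE. The examples therefore need to be reasonably comprehensive, and keeping them in EXAMPLE makes it easy to add more at any time.

12. The USAGE variable shown by --help is filled with a here-document, << EOF ... EOF. It tells the shell that the following lines, up to the line containing only EOF, are the standard input of the command (here, read); after that line, normal parsing resumes.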

read -d '' USAGE << EOF
## fill in the actual usage text here

EOF
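Two details of this idiom are easy to miss, and the sketch below shows both. With an unquoted delimiter the shell expands $variables inside the text, while quoting the delimiter keeps the text literal. Also, read -d '' reaches end of input without finding its delimiter, so it returns a non-zero status even though the variable is filled; the || true guards against that when the script runs under set -e. The variable names are hypothetical.

```shell
#!/bin/bash
name="count_lines.sh"   # hypothetical script name

# Unquoted delimiter: $name is expanded inside the here-document.
read -r -d '' EXPANDED << EOF || true
Usage: $name [--help]
EOF

# Quoted delimiter: the text is taken literally.
read -r -d '' LITERAL << 'EOF' || true
Usage: $name [--help]
EOF

echo "$EXPANDED"
echo "$LITERAL"
```

In the script below the delimiter is unquoted on purpose, so that $0 in the usage text is replaced by the name the script was invoked with.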

Appendix: the shell script:

#!/bin/bash
read -d '' USAGE << EOF
USAGE
    $0 [--start-date=yyyy-mm-dd --end-date=yyyy-mm-dd] [--designed-date=yyyy-mm-dd] [--start-hour=HH --end-hour=HH] [--designed-hour=HH] [--designed-file=absolute path] [--hour-dimension] [--help]


DESCRIPTION
    This script counts lines between start-date&start-hour and end-date&end-hour,
    or between designed-date&start-hour and designed-date&end-hour,
    or at designed-date&designed-hour.
    Note: this script cannot count over more than 30 days.
    The following arguments are optional.


    --start-date
        The beginning of the date window in which lines are counted; used together with --start-hour, --end-date, --end-hour.
        If not specified, it is set to today by default.
    --end-date
        The end of the date window in which lines are counted; used together with --start-hour, --start-date, --end-hour.
        If not specified, it is set to today by default.
    --start-hour
        The beginning of the hour window in which lines are counted; used together with (--start-date, --end-date) or --designed-date, and --end-hour.
        If not specified, it is set to 0 by default.
    --end-hour
        The end of the hour window in which lines are counted; used together with --start-hour and (--start-date, --end-date) or --designed-date.
        If not specified, it is set to the current hour by default.
    --designed-date
        The designated date to be counted.
        If not specified, --start-date and --end-date are used.
    --designed-hour
        The designated hour to be counted.
        If not specified, --start-hour and --end-hour are used.
    --designed-file
        The absolute path of the file to be counted.
        If not specified, all files under the absolute path /data/rawdata/kafka/ will be checked.
    --hour-dimension
        Count per hour instead of per day.
    --help
        Display this HELP information


EXAMPLE:
    $0 --designed-date=2017-10-25
    $0 --designed-date=2017-10-25 --designed-hour=06 --hour-dimension
    $0 --designed-date=2017-10-25 --designed-file=/data/123.txt
    $0 --designed-date=2017-10-25 --designed-hour=06 --designed-file=/data/123.txt --hour-dimension
    $0 --start-date=2017-10-25 --end-date=2017-10-26
    $0 --start-date=2017-10-25 --end-date=2017-10-26 --start-hour=06  --end-hour=07  --hour-dimension
    $0 --start-date=2017-10-25 --end-date=2017-10-26 --designed-file=/data/123.txt
    $0 --start-date=2017-10-25 --end-date=2017-10-26 --start-hour=06  --end-hour=07  --designed-file=/data/123.txt --hour-dimension
EOF


start_date=""
end_date=""
designed_date=""
designed_file=""
start_hour=0
end_hour=0
designed_hour=0
hour_dimension="False"


# getopt short options are single characters, so only -h is declared with -o;
# every other option is available in its long form only.
OPTS=$(getopt -o "h" -l "start-date:,end-date:,designed-date:,designed-file:,start-hour:,end-hour:,designed-hour:,hour-dimension,help" -- "$@") || { echo "$USAGE" >&2; exit 1; }
eval set -- "$OPTS"


while true
do
    case "$1" in
        --start-date ) start_date="$2"; shift 2 ;;
        --end-date ) end_date="$2"; shift 2 ;;
        --designed-date ) designed_date="$2"; shift 2 ;;
        --designed-file ) designed_file="$2"; shift 2 ;;
        --start-hour ) start_hour="$2"; shift 2 ;;
        --end-hour ) end_hour="$2"; shift 2 ;;
        --designed-hour ) designed_hour="$2"; shift 2 ;;
        --hour-dimension ) hour_dimension="True"; shift ;;
        -h | --help )
            echo "$USAGE"
            exit 0 ;;
        -- ) shift; break ;;
        * ) break ;;
    esac
done




now_date=$(date "+%Y-%m-%d");
now_hour=$(date "+%H");
now_time=`date "+%Y-%m-%d %H:%M:%S"`
IS_FILE_OUT=1
ALLOWED_MAX_PERIOD=30
DIR="$( cd "$( dirname "$0"  )" && pwd  )"
echo "Today is $now_date,now hour is $now_hour"
echo "Script directory:$DIR"


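num=""
function add_prefix_zero(){
    # Zero-pad $1 to two digits; the result is left in $num.
    case "$1" in
        ''|*[!0-9]*) echo "Error:not a number." >&2; exit 1;;
    esac
    # 10# forces base 10, so inputs such as 08 and 09 are not parsed as octal,
    # and an already padded value such as 06 is not padded a second time.
    printf -v num '%02d' "$((10#$1))"
}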
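trimmed=""
function trim(){
    # Strip leading and trailing whitespace from $1; the result is left in $trimmed.
    # A pattern of "\ " removes only a single space and "\t" matches a literal "t",
    # so the inner expansions below select the whole whitespace run instead.
    trimmed=$1
    trimmed=${trimmed#"${trimmed%%[![:space:]]*}"}
    trimmed=${trimmed%"${trimmed##*[![:space:]]}"}
}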


IS_INFO_OUT=0
function info(){
    # Debug logger: prints only when IS_INFO_OUT is set to 1.
    if [ $IS_INFO_OUT -eq 1 ] ; then
        echo "[[ Function=$1 --> Info=$2 ]]"
    fi
}


saved_filename=""
function format_filename(){
    info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4"
    local from_date=$1
    local to_date=$2
    local from_hour=$3
    local to_hour=$4
    # Hash of the start time; keeps the output files of different runs apart.
    local suffix
    suffix=$(echo "$now_time" | md5sum | cut -d' ' -f 1)
    if [ -z "$designed_file" ] ; then
        saved_filename="$from_date-$from_hour--$to_date-$to_hour-$suffix"
    else
        # Base name of the input file, without a trailing .txt.
        local file_prefix=${designed_file##*/}
        saved_filename="${file_prefix%.txt}-$from_date-$from_hour--$to_date-$to_hour-$suffix"
    fi
    trim "$saved_filename"
    saved_filename="$trimmed.txt"
    info "$FUNCNAME" "saved_filename:$saved_filename"
}




function out_sum_date_hour(){
    info "$FUNCNAME" "designed_file:$1 designed_date:$2 designed_hour:$3"
    local designed_file=$1
    local designed_date=$2
    local designed_hour=$3
    # ":+" rather than "+": a local variable is always set, even when empty,
    # so only ":+" drops the hour part when no hour was passed in.
    local pattern="${designed_date}${designed_hour:+T$designed_hour}"
    local count
    count=$(grep -c -- "$pattern" "$designed_file")
    local output_message="Filename:${designed_file}  Date:${designed_date}  ${designed_hour:+Hour:$designed_hour} Count:${count}"
    # Always print to the screen; append to the result file unless IS_FILE_OUT is 0.
    echo "$output_message"
    if [ 0 -ne $IS_FILE_OUT ] ; then
        echo "$output_message" >> "$DIR/$saved_filename"
    fi
}


function file_from_to_date(){
    info "$FUNCNAME" "from_date:$1 to_date:$2 designed_file:$3"
    local from_date=$1
    local to_date=$2
    local designed_file=$3
    local to_timestamp someday timestamp i
    to_timestamp=$(date -d "$to_date" +%s)
    # Walk forward one day at a time, stopping at to_date or after ALLOWED_MAX_PERIOD days.
    for ((i=0; i<=ALLOWED_MAX_PERIOD; i++))
    do
        someday=$(date +%Y-%m-%d --date="${from_date} + ${i} day")

        out_sum_date_hour "$designed_file" "$someday"

        timestamp=$(date -d "$someday" +%s)
        if [ "$timestamp" -eq "$to_timestamp" ] ; then
            break
        fi
    done
}


function file_from_to_date_hour(){
    info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4 designed_file:$5"
    local from_date=$1
    local to_date=$2
    local from_hour=$3
    local to_hour=$4
    local designed_file=$5
    local to_timestamp someday timestamp i j
    to_timestamp=$(date -d "$to_date" +%s)
    for ((i=0; i<=ALLOWED_MAX_PERIOD; i++))
    do
        someday=$(date +%Y-%m-%d --date="${from_date} + ${i} day")
        timestamp=$(date -d "$someday" +%s)
        # 10# forces base 10 so that hours such as 08 and 09 are not read as octal.
        for ((j=10#$from_hour; j<=10#$to_hour; j++))
        do
            add_prefix_zero "$j"
            out_sum_date_hour "$designed_file" "$someday" "$num"
        done
        if [ "$timestamp" -eq "$to_timestamp" ] ; then
            break
        fi
    done
}


function judge_date_hour(){
    info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4 designed_file:$5"
    local from_date=$1
    local to_date=$2
    local from_hour=$3
    local to_hour=$4
    local designed_file=$5
    # Per-hour counting when --hour-dimension was given, per-day counting otherwise.
    if [ "$hour_dimension" == "True" ] ; then
        file_from_to_date_hour "$from_date" "$to_date" "$from_hour" "$to_hour" "$designed_file"
    else
        file_from_to_date "$from_date" "$to_date" "$designed_file"
    fi
}


function from_to_date_hour(){
    info "$FUNCNAME" "from_date:$1 to_date:$2 from_hour:$3 to_hour:$4"
    local from_date=$1
    local to_date=$2
    local from_hour=$3
    local to_hour=$4
    local file
    if [ -n "$designed_file" ] ; then
        judge_date_hour "$from_date" "$to_date" "$from_hour" "$to_hour" "$designed_file"
    else
        cd /data/rawdata/kafka/ || exit 1
        # Use a glob instead of parsing ls output, so names with spaces stay intact.
        for file in *.txt*
        do
            if [ -f "$file" ]; then
                judge_date_hour "$from_date" "$to_date" "$from_hour" "$to_hour" "$file"
            fi
        done
    fi
}




function judge_default_date(){
    # Returns 1 when a date range (start/end) is in use, 0 when --designed-date is in use.
    info "$FUNCNAME" "start_date:$start_date end_date:$end_date"
    if [ -n "$start_date" ] || [ -n "$end_date" ] ; then
        if [ -z "$start_date" ] ; then
            start_date=$now_date
        fi
        if [ -z "$end_date" ] ; then
            end_date=$now_date
        fi
        return 1
    else
        return 0
    fi
}


function judge_default_hour(){
    # Returns 1 when an hour range (start/end) is in use, 0 when --designed-hour is in use.
    # An hour of 0 doubles as "not specified".
    info "$FUNCNAME" "start_hour:$start_hour end_hour:$end_hour"
    if [ ! 0 -eq "$start_hour" ] || [ ! 0 -eq "$end_hour" ] ; then
        if [ 0 -eq "$end_hour" ] ; then
            end_hour=$now_hour
        fi
        return 1
    else
        return 0
    fi
}


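### $start_date and $end_date can not coexist with $designed_date
if { [ -n "$start_date" ] || [ -n "$end_date" ]; } && [ -n "$designed_date" ] ; then
    echo "Error: --designed-date cannot be combined with --start-date or --end-date." >&2
    exit 1
fi
### $start_hour and $end_hour can not coexist with $designed_hour
if { [ ! 0 -eq "$start_hour" ] || [ ! 0 -eq "$end_hour" ]; } && [ ! 0 -eq "$designed_hour" ] ; then
    echo "Error: --designed-hour cannot be combined with --start-hour or --end-hour." >&2
    exit 1
fi


### No date option at all: count today only.
if [ -z "$start_date" ] && [ -z "$end_date" ] && [ -z "$designed_date" ] ; then
    start_date=$now_date
    end_date=$now_date
fi


### No hour option at all: count from hour 0 up to the current hour.
if [ 0 -eq "$start_hour" ] && [ 0 -eq "$end_hour" ] && [ 0 -eq "$designed_hour" ] ; then
    start_hour=0
    end_hour=$now_hour
fi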




### Dispatch: a return value of 1 means "range given", 0 means "designated value given".
judge_default_date
if [ $? -eq 1 ] ; then
    judge_default_hour
    if [ $? -eq 1 ] ; then
        # date range + hour range
        format_filename "$start_date" "$end_date" "$start_hour" "$end_hour"
        from_to_date_hour "$start_date" "$end_date" "$start_hour" "$end_hour"
    else
        # date range + designated hour
        format_filename "$start_date" "$end_date" "$designed_hour" "$designed_hour"
        from_to_date_hour "$start_date" "$end_date" "$designed_hour" "$designed_hour"
    fi
else
    judge_default_hour
    if [ $? -eq 1 ] ; then
        # designated date + hour range
        format_filename "$designed_date" "$designed_date" "$start_hour" "$end_hour"
        from_to_date_hour "$designed_date" "$designed_date" "$start_hour" "$end_hour"
    else
        # designated date + designated hour
        format_filename "$designed_date" "$designed_date" "$designed_hour" "$designed_hour"
        from_to_date_hour "$designed_date" "$designed_date" "$designed_hour" "$designed_hour"
    fi
fi