HiBench Hadoop Benchmark: Study Notes
I. Preparation
OS: CentOS, 64-bit
JDK 1.7
Maven
Hadoop 2.6
HiBench 4.0, downloaded from GitHub
Run bin/build-all.sh to compile everything (per the HiBench documentation; since Spark is not installed, running run-all.sh directly reports errors, which can be ignored).
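For reference, the whole preparation boils down to a few commands. A minimal sketch, assuming the intel-hadoop GitHub repository and a "HiBench-4.0" release tag (both names are assumptions; check the repository you actually cloned):

git clone https://github.com/intel-hadoop/HiBench.git   # assumed repo URL
cd HiBench
git checkout HiBench-4.0      # assumed name of the 4.0 release tag
bin/build-all.sh              # Maven build of all workloads (needs JDK and Maven on PATH)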
II. Build and Run
1. The benchmarks.lst file under conf selects the workloads to test, one per line; delete everything except wordcount (see the example after this list).
2. The languages.lst file under conf selects the framework type, one per line; comment out spark/java and the other entries with '#', keeping only mapreduce.
3. Run run-all.sh to get the results.
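For illustration, the two files end up looking roughly like this (apart from wordcount and mapreduce, which the notes above confirm, the exact set of spark entries is an assumption and varies by HiBench version):

# conf/benchmarks.lst -- keep only the workload under test
wordcount

# conf/languages.lst -- '#' comments out a framework; keep only mapreduce
#spark/java
#spark/scala
#spark/python
mapreduce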
III. Exploration
1. Copy run-all.sh to run-wordcount.sh and delete the section that reads benchmarks.lst, setting the benchmark variable directly to wordcount so that only the wordcount test runs.
2. Reading the script shows that it in turn calls the prepare.sh under workloads/wordcount to generate the input data, and then the corresponding run.sh to execute the test (see the sketch after this list).
3. workload-functions.sh under the functions directory provides common helpers such as gen_report().
4. The structure is complex and the functionality rich.
5. Peeling out the essential common functions layer by layer (timestamp(), dir_size(), gen_report(), and so on), I reorganized them into a single script, bench-wordcount.sh.
6. Running bench-wordcount.sh passes; with that, the basic flow of a Hadoop wordcount benchmark is clear.
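As mentioned in item 2, the per-workload scripts can also be driven by hand instead of going through run-all.sh. A minimal sketch, assuming the HiBench 4.0 layout (the exact sub-paths under workloads/wordcount are assumptions; check your checkout):

cd HiBench
workloads/wordcount/prepare/prepare.sh      # generates the wordcount input data on HDFS
workloads/wordcount/mapreduce/bin/run.sh    # runs the MapReduce wordcount job and reports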
The consolidated bench-wordcount.sh:

#!/bin/bash
# bench-wordcount.sh: standalone Hadoop wordcount benchmark distilled from HiBench
PS4="#: "    # trace prompt shown when running with 'bash -x'

function timestamp() {
    # current time in milliseconds
    sec=`date +%s`
    nanosec=`date +%N`
    tmp=`expr $sec \* 1000`
    msec=`expr $nanosec / 1000000`
    echo `expr $tmp + $msec`
}

# hadoop fs -ls /HiBench/Wordcount/Input/part-m-00000
# hadoop fs -du -s /HiBench/Wordcount/Input/part-m-00000
# 32849623  /HiBench/Wordcount/Input/part-m-00000
function dir_size() {
    # pick the numeric field (the byte count) out of "fs -du -s" output
    for item in $1; do
        if [[ $item =~ ^[0-9]+$ ]]; then
            echo $item
        fi
    done
}

function gen_report() {
    # dump the result to the report file
    local workload_name=$1
    local start=$2
    local end=$3
    local size=$4
    local report_file=$5
    which bc > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "\"bc\" utility missing. Please install it to generate a proper report." >&2
        return 1
    fi
    local duration=$(echo "scale=3;($end-$start)/1000" | bc)
    local tput=`echo "$size/$duration" | bc`
    nodes=2                                   # number of worker nodes in this cluster
    if [ $nodes -eq 0 ]; then nodes=1; fi
    local tput_node=`echo "$tput/$nodes" | bc`
    echo "${REPORT_TITLE}" > $report_file
    local report_line=$(printf "${REPORT_COLUMN_FORMATS}" ${workload_name} $(date +%F) $(date +%T) $size $duration $tput $tput_node)
    echo "${report_line}" >> ${report_file}
}

#--------------------------------------------------------------
# environment settings
#--------------------------------------------------------------
REPORT_COLUMN_FORMATS="%-12s %-10s %-8s %-20s %-20s %-20s %-20s\n"
REPORT_TITLE="Type         Date       Time     Input_data_size      Duration(s)          Throughput(bytes/s)  Throughput/node"
REPORT_FILE="./bench-report.txt"
HADOOP_HOME="/home/hadoop/cloud/hadoop"
HDFS_HOST="hdfs://cld-srv-01:9000"
JARS="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"
INPUT_DIR=$HDFS_HOST/HiBench/Wordcount/Input
OUTPUT_DIR=$HDFS_HOST/HiBench/Wordcount/Output

#--------------------------------------------------------------
# prepare the test data
#--------------------------------------------------------------
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/etc/hadoop fs -rm -r -skipTrash $INPUT_DIR
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/etc/hadoop jar $JARS randomtextwriter \
    -D mapreduce.randomtextwriter.totalbytes=3200000000 \
    -D mapreduce.job.maps=12 \
    -D mapreduce.job.reduces=6 \
    -D mapreduce.output.fileoutputformat.compress=false \
    $INPUT_DIR
SIZE_INFO=`$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/etc/hadoop fs -du -s $INPUT_DIR`
SIZE=`dir_size "$SIZE_INFO"`
echo $SIZE

#--------------------------------------------------------------
# run the benchmark
#--------------------------------------------------------------
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/etc/hadoop fs -rm -r -skipTrash $OUTPUT_DIR
START_TIME=`timestamp`
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/etc/hadoop jar $JARS wordcount \
    -D mapreduce.output.fileoutputformat.compress=false \
    -D mapreduce.job.maps=12 \
    -D mapreduce.job.reduces=6 \
    $INPUT_DIR \
    $OUTPUT_DIR
END_TIME=`timestamp`

gen_report "HadoopWordCount" $START_TIME $END_TIME $SIZE $REPORT_FILE
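To try the script (after pointing HADOOP_HOME and HDFS_HOST at your own cluster):

chmod +x bench-wordcount.sh
./bench-wordcount.sh
cat bench-report.txt    # header line plus a single result line: Type, Date, Time,
                        # Input_data_size, Duration(s), Throughput(bytes/s), Throughput/node

Note that gen_report overwrites the report file each run, so bench-report.txt always holds only the most recent result.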