對HAWQ進行TPC-DS測試
hawk跑TPC-DS
建立資料夾,把TPC工具放入
cd /tpcds/v2.1.0/tools/
./dsdgen -DIR /opt/3t_data -SCALE 3000-parallel 20 -child 20 -TERMINATE N
[[email protected] /]# mkdir tpcds_3t [[email protected] /]# ls bin boot cgroups_test dev etc hadoop home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp tpcds_3t usr var [[email protected] /]# cd tpcds_3t [[email protected] tpcds_3t]# ls DSTools.zip [[email protected] tpcds_3t]# |
解壓工具包,進入tools編譯
[[email protected] tpcds_3t]# unzip DSTools.zip ----- [[email protected] tpcds_3t]# ls DSTools.zip TPCDSVersion1.3.1 [[email protected] tpcds_3t]# cd TPCDSVersion1.3.1/ [[email protected] TPCDSVersion1.3.1]# ls answer_sets dbgen2 query_templates query_variants specification tools [[email protected] TPCDSVersion1.3.1]# cd tools [[email protected] tools]# make |
[[email protected] tools]# ./dsqgen –help |
多執行緒生成資料,後臺執行
nohup ./dsdgen -DIR /opt/3t_data -SCALE 3000 -parallel 30 -child 1 -TERMINATE N & |
檢視後臺程序
Jobs –l |
修改query_template下query1-99模板,在行尾加define _END = "";
#!/bin/bash COUNTER=1 while [ $COUNTER -lt 100 ] do echo $COUNTER echo "define _END = \"\";">>query$COUNTER.tpl COUNTER=`expr $COUNTER + 1` done |
生成查詢語句
./dsqgen -output_dir /opt/tpc_3t_queries/ -input /tpcds_3t/TPCDSVersion1.3.1/query_templates/templates.lst -scale 3000 -dialect ansi -directory /tpcds_3t/TPCDSVersion1.3.1/query_templates -rngseed 05092045000 |
[[email protected] tools]# su gpadmin [[email protected] tools]$ psql psql (8.2.15) Type "help" for help. gpadmin=# |
gpadmin=# create database tpcds_3t; CREATE DATABASE gpadmin=# \l List of databases Name | Owner | Encoding | Access privileges -----------+---------+----------+------------------- gpadmin | gpadmin | UTF8 | postgres | gpadmin | UTF8 | template0 | gpadmin | UTF8 | template1 | gpadmin | UTF8 | tpcds | gpadmin | UTF8 | tpch | gpadmin | UTF8 | (6 rows) gpadmin=# \c tpcds You are now connected to database "tpcds" as user "gpadmin". tpcds=# |
生成表
tpcds=# \d List of relations Schema | Name | Type | Owner | Storage --------+-----------------------+-------+---------+------------- public | customer_address | table | gpadmin | append only public | customer_demographics | table | gpadmin | append only public | date_dim | table | gpadmin | append only public | dbgen_version | table | gpadmin | append only public | income_band | table | gpadmin | append only public | inventory | table | gpadmin | append only public | item | table | gpadmin | append only public | promotion | table | gpadmin | append only public | reason | table | gpadmin | append only public | ship_mode | table | gpadmin | append only public | store_returns | table | gpadmin | append only public | store_sales | table | gpadmin | append only public | time_dim | table | gpadmin | append only public | warehouse | table | gpadmin | append only public | web_page | table | gpadmin | append only public | web_site | table | gpadmin | append only (16 rows) |
拷貝yaml檔案到資料路徑
[[email protected] ds_data]# pwd /opt/ds_data [[email protected] ds_data]# ls –s |
批量修改yaml檔案(資料庫名、埠號,資料路徑,資料檔名等)
[[email protected] ds_data]# sed -i 's/5432/5430/g' *.yaml |
載入表
[[email protected] ds_data]# gpload -f call_center.yaml 2016-05-06 16:14:39|INFO|gpload session started 2016-05-06 16:14:39 2016-05-06 16:14:39|INFO|setting schema 'public' for table 'call_center' 2016-05-06 16:14:39|INFO|started gpfdist -p 8081 -P 8082 -f "data1g/call_center.dat" -t 30 2016-05-06 16:14:46|INFO|running time: 6.75 seconds 2016-05-06 16:14:46|INFO|rows Inserted = 6 2016-05-06 16:14:46|INFO|rows Updated = 0 2016-05-06 16:14:46|INFO|data formatting errors = 0 2016-05-06 16:14:46|INFO|gpload succeeded [[email protected] ds_data]# |
批量載入指令碼
#!/bin/bash for f in *.yaml do gpload -f $f done |
載入後查看錶大小
select relname, pg_size_pretty(pg_relation_size(relname)) from pg_stat_user_tables where schemaname = 'public' order by pg_relation_size(relname) desc; |
生成99條sql的日誌檔案
COUNTER=1 while [ $COUNTER -lt 100 ] do echo $COUNTER touch query$COUNTER.log chown gpadmin query$COUNTER.log COUNTER=`expr $COUNTER + 1` done |
在每一條sql之前加入\timing
執行sql批處理
time for f in query* do log=${f}".log" echo $log psql -d tpcds -f $f > $log; done |
[[email protected] query_templates]$ ./sql.sh |
合併測試結果
[[email protected] query_templates]# cat query*.log > 1g_result.log |
執行完成後清除快取
free –m echo 3 > /proc/sys/vm/drop_caches |
表載入,載入機發送速率約120MB,接收速率約50MB(這樣至少要8個小時,為什麼不切割加?)