Hive: running a query's jobs in parallel
阿新 • Published: 2019-02-13
The hive.exec.parallel parameter controls whether the different jobs generated for a single SQL statement may run concurrently; it defaults to false.
Below is a test of this parameter.
Test SQL:
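To turn it on for a session, set it together with its companion parameter hive.exec.parallel.thread.number, which caps how many jobs may run at once (its usual default is 8):

```sql
-- Allow independent jobs from one query to run concurrently
set hive.exec.parallel=true;
-- Upper bound on the number of jobs launched at the same time (default: 8)
set hive.exec.parallel.thread.number=8;
```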
select r1.a
from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1
join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2
on (r1.a=r2.b);
set hive.exec.parallel=false;
With the parameter set to false, the three jobs run one after another:
hive> set hive.exec.parallel=false;
hive> select r1.a
    > from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1 join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2 on (r1.a=r2.b);
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 397778060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2001905, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001905
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001905
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 1
2012-09-07 17:55:40,854 Stage-1 map = 0%, reduce = 0%
2012-09-07 17:55:55,663 Stage-1 map = 14%, reduce = 0%
2012-09-07 17:56:00,506 Stage-1 map = 56%, reduce = 0%
2012-09-07 17:56:10,254 Stage-1 map = 100%, reduce = 0%
2012-09-07 17:56:19,871 Stage-1 map = 100%, reduce = 29%
2012-09-07 17:56:30,000 Stage-1 map = 100%, reduce = 75%
2012-09-07 17:56:34,799 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201208241319_2001905
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 3578060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2002054, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2002054
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2002054
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 1
2012-09-07 17:56:43,343 Stage-4 map = 0%, reduce = 0%
2012-09-07 17:56:48,124 Stage-4 map = 50%, reduce = 0%
2012-09-07 17:56:55,816 Stage-4 map = 100%, reduce = 0%
Ended Job = job_201208241319_2002054
Launching Job 3 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 596) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2002120, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2002120
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2002120
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2012-09-07 17:57:12,641 Stage-2 map = 0%, reduce = 0%
2012-09-07 17:57:19,571 Stage-2 map = 50%, reduce = 0%
2012-09-07 17:57:25,199 Stage-2 map = 100%, reduce = 0%
2012-09-07 17:57:29,210 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201208241319_2002120
OK
abcdefghijk_0
abcdefghijk_1
abcdefghijk_2
abcdefghijk_3
abcdefghijk_4
abcdefghijk_5
abcdefghijk_6
abcdefghijk_7
abcdefghijk_8
abcdefghijk_9
Time taken: 135.944 seconds
Notice, however, that the two subqueries are independent of each other, so their jobs can run in parallel:
hive> set hive.exec.parallel=true;
hive> select r1.a
    > from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1 join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2 on (r1.a=r2.b);
Total MapReduce jobs = 3
Launching Job 1 out of 3
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 397778060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 3578060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2001452, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001452
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001452
Starting Job = job_201208241319_2001453, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001453
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001453
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 1
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 1
2012-09-07 17:52:10,558 Stage-4 map = 0%, reduce = 0%
2012-09-07 17:52:10,588 Stage-1 map = 0%, reduce = 0%
2012-09-07 17:52:22,827 Stage-1 map = 14%, reduce = 0%
2012-09-07 17:52:22,8
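Conceptually, Hive compiles the query into a DAG of stages: Stage-1 and Stage-4 compute the two subqueries, and Stage-2 joins their results, so it must wait for both. With hive.exec.parallel=true, every stage whose dependencies are satisfied is launched at once; with false, stages launch one at a time. A minimal sketch of that scheduling difference (plain Python with illustrative stage names, not Hive's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Stage dependency graph for the test query (illustrative):
# Stage-1 and Stage-4 have no prerequisites; Stage-2 needs both.
deps = {"Stage-1": [], "Stage-4": [], "Stage-2": ["Stage-1", "Stage-4"]}

def run_dag(deps, parallel):
    """Return the launch order as waves of stage names."""
    done, order = set(), []
    while len(done) < len(deps):
        # stages whose prerequisites have all finished
        ready = sorted(s for s in deps
                       if s not in done and all(d in done for d in deps[s]))
        # hive.exec.parallel=true launches every ready stage together;
        # false launches only one per wave
        batch = ready if parallel else ready[:1]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            list(pool.map(lambda s: None, batch))  # stand-in for job submission
        done.update(batch)
        order.append(batch)
    return order

print(run_dag(deps, parallel=False))  # one stage per wave, three waves
print(run_dag(deps, parallel=True))   # Stage-1 and Stage-4 share a wave
```

In the logs above this shows up as Job 1 and Job 2 being launched back to back before either finishes, with their Stage-1/Stage-4 progress lines interleaved.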