Using union all and multi-insert appropriately in Hive
阿新 • Published: 2019-01-27
A union all over the same table is much faster than a multi-insert,
because Hive optimizes this kind of union all so that the source table is scanned only once:
http://www.apacheserver.net/How-is-Union-All-optimized-in-Hive-at229466.htm
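A multi-insert also scans the source only once, but because it has to insert into several partitions, it does a lot of extra work, which makes it take far longer.
So when developing, test and experiment more rather than assuming!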
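To make the "scan the source once" idea concrete, here is a minimal Python model, not Hive code: a shared scan reads each source row once and tests it against every branch predicate, while a naive plan re-reads the whole source for each branch. The row layout, the `Branch` and `CountingSource` names, and the sample data are hypothetical; the model does not capture the partition-writing overhead the author blames for the multi-insert slowdown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, str]


@dataclass(frozen=True)
class Branch:
    """Hypothetical model of one SELECT branch: a constant type label plus its WHERE predicate."""
    label: str
    predicate: Callable[[Row], bool]


class CountingSource:
    """Stands in for the source table and counts every row read from it."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.reads = 0

    def scan(self) -> Iterator[Row]:
        for row in self._rows:
            self.reads += 1
            yield row


def shared_scan(source: CountingSource, branches: List[Branch]) -> List[Tuple[str, Row]]:
    """Single pass: each row is read once and tested against every branch."""
    out: List[Tuple[str, Row]] = []
    for row in source.scan():
        for branch in branches:
            if branch.predicate(row):
                out.append((branch.label, row))
    return out


def scan_per_branch(source: CountingSource, branches: List[Branch]) -> List[Tuple[str, Row]]:
    """Naive plan: one full scan of the source for every branch."""
    out: List[Tuple[str, Row]] = []
    for branch in branches:
        for row in source.scan():
            if branch.predicate(row):
                out.append((branch.label, row))
    return out


# Hypothetical sample rows; login_date assumed to be a 'yyyy-MM-dd' string.
rows = [
    {"popt_id": "a", "login_date": "2012-03-01", "apptypeid": "1"},
    {"popt_id": "b", "login_date": "2012-05-05", "apptypeid": "2"},
    {"popt_id": "c", "login_date": "2012-06-01", "apptypeid": "3"},
]
branches = [
    Branch("m3_login", lambda r: "2012-02-01" <= r["login_date"] < "2012-05-01"),
    Branch("mn_login", lambda r: "2012-05-01" <= r["login_date"] <= "2012-05-09"),
]

once_src, naive_src = CountingSource(rows), CountingSource(rows)
once = shared_scan(once_src, branches)
naive = scan_per_branch(naive_src, branches)
print(once_src.reads, naive_src.reads)  # 3 6
```

Both plans return the same rows; only the number of source reads differs, and in the shared scan it no longer grows with the number of branches. The model says nothing about write-side costs.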
Test table lxw_test3 holds roughly 1.2 billion records.
Union all: took about 7 minutes
create table lxw_test5 as
select type,popt_id,login_date
from (
    select 'm3_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-02-01' and login_date<'2012-05-01'
    union all
    select 'mn_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-05-01' and login_date<='2012-05-09'
    union all
    select 'm3_g_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1'
    union all
    select 'm3_l_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2'
    union all
    select 'm3_s_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3'
    union all
    select 'm3_o_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4'
    union all
    select 'mn_g_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1'
    union all
    select 'mn_l_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2'
    union all
    select 'mn_s_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3'
    union all
    select 'mn_o_login' as type,popt_id,login_date
    from lxw_test3
    where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4'
) x
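The ten branches of this query can be modelled the same way to see what ends up in lxw_test5. Because union all keeps every branch's output without deduplicating, a login that satisfies both a date-only branch and an apptypeid branch (for example `m3_login` and `m3_g_login`) produces two result rows. This Python sketch assumes login_date is a 'yyyy-MM-dd' string, so lexicographic comparison matches date order as with the string literals in the query; `BRANCHES`, `with_app` and `types_for` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def m3(row: Row) -> bool:
    """Date window of the m3_* branches: 2012-02-01 <= login_date < 2012-05-01."""
    return "2012-02-01" <= row["login_date"] < "2012-05-01"


def mn(row: Row) -> bool:
    """Date window of the mn_* branches: 2012-05-01 <= login_date <= 2012-05-09."""
    return "2012-05-01" <= row["login_date"] <= "2012-05-09"


def with_app(window: Predicate, app_id: str) -> Predicate:
    """Date window AND apptypeid = app_id, as in the *_g/_l/_s/_o branches."""
    return lambda row: window(row) and row["apptypeid"] == app_id


# Branches in the same order as the union all query above.
BRANCHES: List[Tuple[str, Predicate]] = [
    ("m3_login", m3),
    ("mn_login", mn),
    ("m3_g_login", with_app(m3, "1")),
    ("m3_l_login", with_app(m3, "2")),
    ("m3_s_login", with_app(m3, "3")),
    ("m3_o_login", with_app(m3, "4")),
    ("mn_g_login", with_app(mn, "1")),
    ("mn_l_login", with_app(mn, "2")),
    ("mn_s_login", with_app(mn, "3")),
    ("mn_o_login", with_app(mn, "4")),
]


def types_for(row: Row) -> List[str]:
    """Type labels of every branch that keeps this row; one output row per label."""
    return [label for label, pred in BRANCHES if pred(row)]


print(types_for({"popt_id": "u1", "login_date": "2012-03-15", "apptypeid": "1"}))
# ['m3_login', 'm3_g_login']
```

So the output table can hold more rows than the matching input rows, which is worth keeping in mind when comparing row counts between the two approaches.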
Multi-insert: took about 25 minutes
from lxw_test3
insert overwrite table lxw_test6 partition (flag = '1')
select 'm3_login' as type,popt_id,login_date
where login_date>='2012-02-01' and login_date<'2012-05-01'
insert overwrite table lxw_test6 partition (flag = '2')
select 'mn_login' as type,popt_id,login_date
where login_date>='2012-05-01' and login_date<='2012-05-09'
insert overwrite table lxw_test6 partition (flag = '3')
select 'm3_g_login' as type,popt_id,login_date
where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1'
insert overwrite table lxw_test6 partition (flag = '4')
select 'm3_l_login' as type,popt_id,login_date
where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2'
insert overwrite table lxw_test6 partition (flag = '5')
select 'm3_s_login' as type,popt_id,login_date
where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3'
insert overwrite table lxw_test6 partition (flag = '6')
select 'm3_o_login' as type,popt_id,login_date
where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4'
insert overwrite table lxw_test6 partition (flag = '7')
select 'mn_g_login' as type,popt_id,login_date
where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1'
insert overwrite table lxw_test6 partition (flag = '8')
select 'mn_l_login' as type,popt_id,login_