Using union all and multi insert sensibly in Hive

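Running union all over the same table is much faster than a multi insert.
The reason is that Hive itself optimizes this kind of union all so that the source table is scanned only once:
http://www.apacheserver.net/How-is-Union-All-optimized-in-Hive-at229466.htm
A multi insert also scans the source table only once, but because it has to insert into multiple partitions it does a lot of additional work, which makes it take much longer.
I hope everyone tests and experiments a lot during development!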
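
One way to follow that advice before running a long job is to inspect the query plan with Hive's EXPLAIN statement. The sketch below is only an illustration: it uses a shortened, two-branch version of the queries from this post against the same lxw_test3 and lxw_test6 tables, and the exact plan output differs between Hive versions, so treat it as a starting point rather than a definitive check.

```sql
-- Sketch: plan for a reduced (two-branch) union all over lxw_test3.
-- In the output, look at how many TableScan operators read lxw_test3
-- and how many stages the job is split into.
EXPLAIN
SELECT type, popt_id, login_date
FROM (
    SELECT 'm3_login' AS type, popt_id, login_date
    FROM lxw_test3
    WHERE login_date >= '2012-02-01' AND login_date < '2012-05-01'
    UNION ALL
    SELECT 'mn_login' AS type, popt_id, login_date
    FROM lxw_test3
    WHERE login_date >= '2012-05-01' AND login_date <= '2012-05-09'
) x;

-- Sketch: plan for the equivalent reduced multi insert.
-- Compare the extra stages that write and move data for each partition.
EXPLAIN
FROM lxw_test3
INSERT OVERWRITE TABLE lxw_test6 PARTITION (flag = '1')
SELECT 'm3_login' AS type, popt_id, login_date
WHERE login_date >= '2012-02-01' AND login_date < '2012-05-01'
INSERT OVERWRITE TABLE lxw_test6 PARTITION (flag = '2')
SELECT 'mn_login' AS type, popt_id, login_date
WHERE login_date >= '2012-05-01' AND login_date <= '2012-05-09';
```

EXPLAIN only prints the plan and does not run the job, so it is cheap to try on both forms. The actual timings still have to be measured on real data, as in the test below.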

lxw_test3: about 1.2 billion records
Union all: took about 7 minutes

    create table lxw_test5 as
    select type,popt_id,login_date
    from (
            select 'm3_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-02-01' and login_date<'2012-05-01'
            union all
            select 'mn_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-05-01' and login_date<='2012-05-09'
            union all
            select 'm3_g_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1'
            union all
            select 'm3_l_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2'
            union all
            select 'm3_s_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3'
            union all
            select 'm3_o_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4'
            union all
            select 'mn_g_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1'
            union all
            select 'mn_l_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2'
            union all
            select 'mn_s_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3'
            union all
            select 'mn_o_login' as type,popt_id,login_date
            from lxw_test3
            where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4'
    ) x

Multi insert: took about 25 minutes:

    from lxw_test3
    insert overwrite table lxw_test6 partition (flag = '1')
    select 'm3_login' as type,popt_id,login_date
    where login_date>='2012-02-01' and login_date<'2012-05-01'
    insert overwrite table lxw_test6 partition (flag = '2')
    select 'mn_login' as type,popt_id,login_date
    where login_date>='2012-05-01' and login_date<='2012-05-09'
    insert overwrite table lxw_test6 partition (flag = '3')
    select 'm3_g_login' as type,popt_id,login_date
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1'
    insert overwrite table lxw_test6 partition (flag = '4')
    select 'm3_l_login' as type,popt_id,login_date
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2'
    insert overwrite table lxw_test6 partition (flag = '5')
    select 'm3_s_login' as type,popt_id,login_date
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3'
    insert overwrite table lxw_test6 partition (flag = '6')
    select 'm3_o_login' as type,popt_id,login_date
    where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4'
    insert overwrite table lxw_test6 partition (flag = '7')
    select 'mn_g_login' as type,popt_id,login_date
    where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1'
    insert overwrite table lxw_test6 partition (flag = '8')
    select 'mn_l_login' as type,popt_id,login_