
Hive Optimization Techniques and Usage Tips

Parts of this article are drawn from:

http://www.atatech.org/article/detail/5617/0

I. Introduction to UDFs

1. Basic UDFs

(1) SHOW FUNCTIONS: lists all available functions; useful for discovering unfamiliar ones. To see what a specific function does:

     DESCRIBE FUNCTION <function_name>;

(2) A IS NULL

     A IS NOT NULL

(3) A LIKE B: ordinary SQL pattern matching, e.g. like 'a%'

     A RLIKE B: matches B as a regular expression

     A REGEXP B: same as RLIKE, regular-expression matching

(4) round(double a): rounds to the nearest integer

(5) rand(), rand(int seed): returns a random number uniformly distributed in (0,1)

(6) COALESCE(pv, 0): turns rows where pv is NULL into 0; very handy
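These compose freely in a select list. A quick sanity-check sketch (it assumes a one-row dual table like the one used later in this article):

SELECT round(3.45),              -- 3: rounds to the nearest integer
       rand(7),                  -- a seeded random number in (0,1)
       COALESCE(NULL, 0),        -- 0: NULL replaced by the fallback
       'abc' RLIKE '^a.*'        -- true: regular-expression match
FROM dual;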

2. Date Functions

(1) datediff(string enddate, string startdate):

     returns the difference in days between enddate and startdate, e.g. datediff('2009-03-01','2009-02-27') = 2

(2) date_add(string startdate, int days):

     adds days days to startdate: date_add('2008-12-31', 1) = '2009-01-01'

(3) date_sub(string startdate, int days):

     subtracts days days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'

(4) date_format(date, date_pattern)

     CREATE TEMPORARY FUNCTION date_format AS 'com.taobao.hive.udf.UDFDateFormat';

     Formats the date/time value date according to the format string and returns the result:

     date_format('2010-10-10','yyyy-MM-dd','yyyyMMdd')

(5) str_to_date(str, format)

     Converts a string to a date:

CREATE TEMPORARY FUNCTION str_to_date AS 'com.taobao.hive.udf.UDFStrToDate';

      str_to_date('09/01/2009','MM/dd/yyyy')
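A combined sketch of the built-in date functions above (again assuming a one-row dual table):

SELECT datediff('2009-03-01', '2009-02-27'),   -- 2
       date_add('2008-12-31', 1),              -- '2009-01-01'
       date_sub('2008-12-31', 1)               -- '2008-12-30'
FROM dual;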

3. String Functions

(1) length(string A): returns the length of the string

(2) concat(string A, string B...):

     concatenates strings, e.g. concat('foo','bar') = 'foobar'. Note that this function accepts any number of arguments

(3) substr(string A, int start), substring(string A, int start):

     returns the substring starting at start, e.g. substr('foobar', 4) = 'bar'

(4) substring(string A, int start, int len):

     returns a substring of limited length, e.g. substr('foobar', 4, 1) = 'b'

(5) split(string str, string pat):

     splits str using pat as a regular expression and returns the resulting array, e.g. split('foobar','o')[2] = 'bar'.

(6) getkeyvalue(str, param):

     extracts the value for the given key from a key-value string (UDFKeyValue):

     CREATE TEMPORARY FUNCTION getkeyvalue  AS 'com.taobao.hive.udf.UDFKeyValue';
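A combined sketch of the built-in string functions (dual assumed as before):

SELECT length('foobar'),        -- 6
       concat('foo', 'bar'),    -- 'foobar'
       substr('foobar', 4, 1),  -- 'b'
       split('foobar', 'o')[2]  -- 'bar' (index 1 holds the empty string between the two o's)
FROM dual;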

4. Custom Functions

(1) row_number

CREATE TEMPORARY FUNCTION row_number AS 'com.taobao.ad.data.search.udf.UDFrow_number';
-- distribute by + sort by send all rows of one (ip,uid) pair to the same reducer
-- in logtime order, so row_number can number the records within each group
select ip,uid,row_number(ip,uid) from (
	select ip,uid,logtime from atpanel
	distribute by ip,uid
	sort by ip,uid,logtime desc
) a

(2) Splitting key-value pairs

CREATE TEMPORARY FUNCTION ExplodeEX AS 'com.taobao.hive.udtf.UDTFExplodeEX';
-- explode the 'a-1|b-2' string into one row per k-v pair, then split each pair
select
	split(kvs,'-')[0] as key,
	split(kvs,'-')[1] as value
from ( select 'a-1|b-2' as kv from dual ) t
lateral view explode (split(kv,'\\|')) result as kvs

II. New Hive Features

1. Support for COUNT(*) and multi-column COUNT DISTINCT queries

   select count(distinct col1, col2) from table_name;
   select count(*) from table_name;

2. Option to run Hive in local mode

   Set mapred.job.tracker=local to enable local execution mode

3. Enhanced column-rename syntax

   Adds the ALTER TABLE table_name CHANGE old_name new_name syntax.

4. UNIQUE JOIN support HIVE-591

   select .. from JOINTABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where ....

5. Syntax for checking table and partition status HIVE-667

   Use the show table_name syntax to check the status of tables and partitions, including size and creation/access timestamps.

6. STRUCT support at table creation time

7. Hint for choosing the driving table

8. The /*+ STREAMTABLE(tb_alias) */ hint, to designate the driving table of a join:

   SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a

   JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)

   With this hint, the previous default of streaming the rightmost table no longer applies.

9. Left Semi-Join HIVE-870

   Left Semi-Join efficiently implements the semantics of IN/EXISTS subqueries. Given the following SQL:

(1) SELECT a.key, a.value FROM a WHERE a.key in (SELECT b.key FROM b);

   Before Left Semi-Join, Hive expressed this semantics as:

   SELECT t1.key, t1.value FROM a t1

   left outer join (SELECT distinct key from b) t2

   on t1.key = t2.key where t2.key is not null;

(2) With Left Semi-Join it can be written as:

   SELECT a.key, a.val FROM a LEFT SEMI JOIN b on (a.key = b.key)

   This saves at least one MapReduce pass. Note that the join condition of a Left Semi-Join must be an equality.

10. Skew Join optimization HIVE-964 (data skew)

   Optimizes skewed join keys into a map join. Set hive.optimize.skewjoin=true to optimize skewed data. Note that Skew Join optimization requires an extra map-join stage and does not save the shuffle cost.
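A minimal way to enable it (hive.skewjoin.key, the row count beyond which a key is treated as skewed, is a standard parameter; the threshold and table names here are illustrative):

set hive.optimize.skewjoin=true;   -- re-run skewed keys as a follow-up map join
set hive.skewjoin.key=100000;      -- keys with more rows than this count as skewed
select a.uid, b.age
from big_table_a a
join big_table_b b on a.uid = b.uid;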

11. Sorted merge (map) join HIVE-1194

   (join tables pre-sorted on the key)

   If the tables in a MapJoin are all sorted, this feature lets the join proceed without scanning the entire tables, which greatly accelerates it. Enable it with hive.optimize.bucketmapjoin.sortedmerge=true for a large performance gain.
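Both sides must be bucketed and sorted on the join key with the same bucket count for the sorted-merge path to apply. A sketch with illustrative table names:

create table src_a (key int, value string)
clustered by (key) sorted by (key) into 32 buckets;
create table src_b (key int, value string)
clustered by (key) sorted by (key) into 32 buckets;

set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
select /*+ MAPJOIN(b) */ a.key, a.value, b.value
from src_a a join src_b b on (a.key = b.key);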

12. ALTER TABLE support for changing a partition's InputFormat/OutputFormat

   This allows subsequent table partitions to be stored compressed (e.g. as SequenceFile) without modifying older partitions, i.e. a transparent switch to a compressed format.
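For example, to have partitions created from now on stored as SequenceFile while old partitions stay readable in their original format (table name illustrative):

-- affects metadata for new partitions only; existing partitions keep their old format
ALTER TABLE page_view SET FILEFORMAT SEQUENCEFILE;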

13. Concurrent submission of independent MR stages HIVE-549

   Previously Hive submitted MR jobs strictly in sequence. With this enhancement, MR stages with no dependencies between them (for example, the subqueries of a UNION ALL) can be submitted concurrently, which in some cases shortens the response time of a single HQL command. The parameters governing concurrent submission are:

   hive.exec.parallel[=false]

   hive.exec.parallel.thread.number[=8]
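A sketch of the typical use case (older Hive requires the UNION ALL to sit inside a subquery; table names illustrative):

set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;  -- upper bound on concurrently submitted jobs
select * from (
	select uid from pv_log      -- these two scans have no dependency on each other,
	union all                   -- so their MR jobs can now be submitted concurrently
	select uid from click_log
) t;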

14. Sorted Group By HIVE-931

   (pre-processing of intermediate tables)

   A group by on already-sorted columns no longer requires an extra MR pass, which improves execution efficiency.

15. UDTF support

   A UDTF (User Defined Table Function) is a kind of UDF that can return multiple records. This change allows many Transform scripts to be replaced with more generic, more efficient, and more user-friendly UDTF implementations. A UDTF produces a 1:n output and can be used for row-to-column transposition and the like.

   UDTFs do not support selects that mix UDTF and plain columns, nesting, or GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY in the same subquery.

UDTFs can be combined with LATERAL VIEW.
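The built-in explode is the canonical UDTF; combined with LATERAL VIEW it turns an array column into rows, one per element (a sketch over a literal array, dual assumed):

select v                                              -- yields three rows: 'a', 'b', 'c'
from (select array('a','b','c') as arr from dual) t
lateral view explode(t.arr) ex as v;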

16. Dynamic partitions

   Set hive.exec.dynamic.partition=true to enable the DP feature. Usage:

   INSERT OVERWRITE TABLE tbl partition (col1[=value][, col2[=value] …])

   Running with hive.exec.dynamic.partition.mode = nonstrict carries some risk, including small files and accidental data overwrites. The default partition name is controlled by:

   hive.exec.default.partition.name
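A minimal dynamic-partition insert looks like this (table and column names are illustrative):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic

-- the trailing select column feeds the dynamic partition column dt
insert overwrite table pv_part partition (dt)
select uid, pageid, dt from pv_staging;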

17. Enforced sort on insert HIVE-1193

   Just enable the hive.enforce.sorting option. This feature is very useful for Sorted merge bucket (map) join

18. View support

   Useful for column-level access control. View creation syntax:

CREATE VIEW [IF NOT EXISTS] view_name
[ (column_name [COMMENT column_comment], … ) ]
[COMMENT 'view_comment']
AS SELECT …
[ ORDER BY …  LIMIT … ]

19. Cartesian-product join support (1.0 feature)

   SELECT a.*, b.* FROM a CROSS JOIN b

III. Summary of Hive Optimization Techniques

1. Multi-table join optimization, code structure:

   select .. from JOINTABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where ....

Multi-table joins with identical join conditions are optimized into a single job, as in the sketch below.
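For example (a sketch; b.key2 is an illustrative second key): both ON clauses of the first query join on the same key of b, so it compiles into one MR job, while the second query needs two:

-- one job: every join uses the same key
select a.val, b.val, c.val
from a join b on (a.key = b.key1)
       join c on (c.key = b.key1);

-- two jobs: the second join uses a different key of b
select a.val, b.val, c.val
from a join b on (a.key = b.key1)
       join c on (c.key = b.key2);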

2. Left Semi-Join efficiently implements IN/EXISTS subquery semantics

   SELECT a.key, a.value FROM a WHERE a.key in (SELECT b.key FROM b);

(1) Before Left Semi-Join, Hive expressed this as:

   SELECT t1.key, t1.value FROM a t1

   left outer join (SELECT distinct key from b) t2 on t1.key = t2.key

   where t2.key is not null;

(2) With Left Semi-Join this becomes:

   SELECT a.key, a.val FROM a LEFT SEMI JOIN b on (a.key = b.key)

   This saves at least one MR pass. Note that the join condition of a Left Semi-Join must be an equality.

3. Pre-sorting to reduce data scanned by map join and group by HIVE-1194

(1) Pre-sort important result tables: just enable the hive.enforce.sorting option

(2) If the tables in a MapJoin are all sorted on the join key, the join need not scan the entire tables, which greatly accelerates it. Enable this with

     hive.optimize.bucketmapjoin.sortedmerge=true for a large performance gain.

set hive.mapjoin.cache.numrows=10000000;
set hive.mapjoin.size.key=100000;
insert overwrite table pv_users
select /*+ MAPJOIN(pv) */ pv.pageid, u.age
from page_view pv
join user u on (pv.userid = u.userid);

(3) Sorted Group By HIVE-931

    A group by on already-sorted columns needs no extra MR pass, improving execution efficiency.

4. One-pass PV/UV computation framework

(1) Batch submission of multiple MR jobs

     hive.exec.parallel[=false]

     hive.exec.parallel.thread.number[=8]

(2) One-pass computation framework combined with multi group by

     With small data volumes, multiple UNIONs are optimized into a single job;

     conversely, for heavy computations, enable batch MR submission to relieve the pressure;

     use a two-stage group by to solve count distinct data skew, as in the example below

set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=2;
from (
	select
		yw_type,
		sum(case when log_type='pv' then ct end) as pv,
		sum(case when log_type='pv' then 1 end) as uv,
		sum(case when log_type='click' then ct end) as ipv,
		sum(case when log_type='click' then 1 end) as ipv_uv
	from (
		-- stage 1: aggregate per uid first, so no single key floods one reducer
		select
			yw_type,log_type,uid,count(1) as ct
		from (
			select 'total' yw_type,'pv' log_type,uid from pv_log
			union all
			select 'cat' yw_type,'click' log_type,uid from click_log
		) t group by yw_type,log_type,uid
	) t group by yw_type
) t
insert overwrite table tmp_1
select pv,uv,ipv,ipv_uv
where yw_type='total'

insert overwrite table tmp_2
select pv,uv,ipv,ipv_uv
where yw_type='cat';


5. Controlling the number of maps and reduces in Hive

(1) Merge small files

set mapred.max.split.size=100000000;
set mapred.min.split.size.per.node=100000000;
set mapred.min.split.size.per.rack=100000000;
set hive.input.format=
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    The hive.input.format setting above merges small input files: anything larger than the 128 MB block size is split at 128 MB; pieces between 100 MB and 128 MB are split at 100 MB; and everything under 100 MB (small files plus the remainders left over from splitting large files) is merged. In the original example this ultimately produced 74 splits.
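Output-side merging is also available, so downstream jobs see fewer small files (the hive.merge.* parameters are standard; the size value is illustrative):

set hive.merge.mapfiles=true;            -- merge small files produced by map-only jobs
set hive.merge.mapredfiles=true;         -- merge small files produced by map-reduce jobs
set hive.merge.size.per.task=256000000;  -- target size of each merged file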

(2) Increase the number of map/reduce tasks for time-consuming jobs

    set mapred.reduce.tasks=10;

6. Using random numbers to reduce data skew

   Joins between large tables are prone to data skew caused by null keys

select
	a.uid
from big_table_a a
left outer join big_table_b b
on b.uid = case when a.uid is null or length(a.uid)=0
		then concat('rd_sid',rand()) else a.uid end;
-- null/empty keys are rewritten to distinct random values, so they spread
-- across reducers instead of piling up on one; they still match nothing, as intended


IV. Small Tips

1. Null handling: replace \N in result tables with the empty string

   ALTER TABLE a SET SERDEPROPERTIES('serialization.null.format' = '');

2. Avoid brute-force partition scans

   today's full data = yesterday's full data + today's increment

   30-day data = previous 30-day data - day-31 data + today's data

   Applicable scenario: stable requirements that need access to 30 days or a year of data. A sketch of the incremental pattern follows.
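A sketch of the "today's full = yesterday's full + today's increment" pattern, with hypothetical tables user_full/user_incr partitioned by dt (the FULL OUTER JOIN lets today's increment override yesterday's value for updated keys):

insert overwrite table user_full partition (dt='20121001')
select coalesce(i.uid, f.uid) as uid,
       coalesce(i.attr, f.attr) as attr   -- increment wins over yesterday's value
from (select uid, attr from user_full where dt='20120930') f
full outer join
     (select uid, attr from user_incr where dt='20121001') i
on f.uid = i.uid;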

3. Use dynamic partitioning to reduce job execution time

V. Finding Inefficient Code via JobTracker Metadata

1. Missing ON conditions or too many partitions scanned

   For UV computation, refer to the one-pass PV/UV framework above; tasks with a missing ON clause or partition filter just need the offending query fixed

select
	id as 天網id,prgname as 任務路徑,viewname as 顯示名稱,job_id ,job_name,job_value,
	length(trim(inputdir))-length(replace(trim(inputdir),',',''))+1 as pathcnt
from (
	select
		t1.id,t1.prgname,t1.viewname,
		t3.job_id,t3.job_name ,
		t3.job_value,
		DBMS_LOB.SUBSTR(t3.job_value,4000) as inputdir
	from(
		 select
			id,prgname,paravalue,viewname from dwa.etl_task_program t
		 where priority in('xx','xxx') -- enter your own business baseline ids here
			and appflag=0
	) t1,
	dwa.hdp_job_map t2,
	dwa.hdp_job_conf t3
	where t1.id = t2.id
		and t2.job_id = t3.job_id
		and t2.gmtdate = trunc(sysdate-1)
		and t3.gmtdate = trunc(sysdate-1)
		and t3.job_name = 'mapred.input.dir'
)
where length(trim(inputdir))-length(replace(trim(inputdir),',','')) > 10;

2. The same table scanned multiple times in a single script

   Try to read all the data you need in one pass

select sky_id as 天網id,viewname as 天網顯示名稱,
	tab_name as 被掃描表,on_duty as 負責人,count(1) as  掃描次數 
from(
    select distinct a.tab_name,c.sql_id,a.sub_sql_id,c.sky_id,e.viewname,e.on_duty
    from dwa.meta_tab a,
    dwa.meta_sqlsub b,
    (select * from 
        (select sky_id,sql_id,sql_src,
	    row_number() over(partition by sky_id,length(sql_src) order by sql_id) rn 
	 from dwa.meta_sqlfull
    )where rn=1) c,
    dwa.meta_col d,dwa.etl_task_program e
    where e.priority  in('xx','xxx') -- enter your own business baseline ids here
        and e.appflag=0 and e.id=c.sky_id
        and a.sub_sql_id=b.sub_sql_id and a.tab_id=d.tab_id and a.sub_sql_id=d.sub_sql_id and b.sqlfull_id=c.sql_id 
        and a.tab_name not like '%-%'  and b.sql_type='select'
    order by c.sky_id,c.sql_id,a.sub_sql_id
)group by sky_id,viewname,tab_name,on_duty
having count(1) >1
order by cnt desc;

3. Too many jobs

   Read the required data in a single pass where possible

   Use UNION to merge tasks

   LEFT OUTER JOINs with identical ON conditions are merged into a single job

 SELECT /*+ parallel(t,32) */ 
	groupname,
	id,
	BIZ_SORTID,
	ON_DUTY,
	PRGNAME,
	job_cnt,
	JOB_TOTAL_MAPS,
	JOB_TOTAL_REDUCES,
	TOTAL_TIME,
	HDFS_BYTES_READ,
	HDFS_BYTES_WRITTEN,
	TOTAL_MAP_TIME,
	TOTAL_REDUCE_TIME,
	MAP_INPUT_RECORDS,
	MAP_OUTPUT_RECORDS,
	REDUCE_INPUT_RECORDS,
	REDUCE_OUTPUT_RECORDS,
	time,
	row_number() over(partition by groupname order by TIME desc) rn_time,
	row_number() over(partition by groupname order by TOTAL_MAP_TIME+TOTAL_REDUCE_TIME desc) rn_slots
from(
	select 
	  DWA.ETL_TASK_BASELINE.name as  groupname,
	  DWA.HDP_JOB_MAP.ID,
	  DWA.ETL_TASK_PROGRAM.BIZ_SORTID,
	  DWA.ETL_TASK_PROGRAM.ON_DUTY,
	  DWA.ETL_TASK_LOG.PRGNAME,
	  count(DWA.HDP_JOB_MAP.job_id) job_cnt,  -- number of jobs for the Skynet task
	  sum(DWA.HDP_JOB_STAT.JOB_TOTAL_MAPS) JOB_TOTAL_MAPS,
	  sum(DWA.HDP_JOB_STAT.JOB_TOTAL_REDUCES) JOB_TOTAL_REDUCES,
	  sum(DWA.HDP_JOB_STAT.TOTAL_TIME) TOTAL_TIME,
	  sum(DWA.HDP_JOB_STAT.HDFS_BYTES_READ) HDFS_BYTES_READ,
	  sum(DWA.HDP_JOB_STAT.HDFS_BYTES_WRITTEN) HDFS_BYTES_WRITTEN,
	  sum(DWA.HDP_JOB_STAT.TOTAL_MAP_TIME) TOTAL_MAP_TIME,
	  sum(DWA.HDP_JOB_STAT.TOTAL_REDUCE_TIME) TOTAL_REDUCE_TIME,
	  sum(DWA.HDP_JOB_STAT.MAP_INPUT_RECORDS) MAP_INPUT_RECORDS,
	  sum(DWA.HDP_JOB_STAT.MAP_OUTPUT_RECORDS) MAP_OUTPUT_RECORDS, --new
	  sum(DWA.HDP_JOB_STAT.REDUCE_INPUT_RECORDS) REDUCE_INPUT_RECORDS,
	  sum(DWA.HDP_JOB_STAT.REDUCE_OUTPUT_RECORDS) REDUCE_OUTPUT_RECORDS, --new
	  trunc((DWA.ETL_TASK_LOG.edate-DWA.ETL_TASK_LOG.sdate)*24*60) time
	FROM
	  DWA.HDP_JOB_MAP,
	  DWA.ETL_TASK_PROGRAM,
	  DWA.ETL_TASK_LOG,
	  DWA.HDP_JOB_STAT,
	  DWA.ETL_TASK_BASELINE
	WHERE
	  ( DWA.HDP_JOB_STAT.JOB_ID=DWA.HDP_JOB_MAP.JOB_ID  )
	  AND  ( DWA.HDP_JOB_MAP.ID=DWA.ETL_TASK_LOG.ID  )
	  AND  ( DWA.ETL_TASK_LOG.ID=DWA.ETL_TASK_PROGRAM.ID  )
	  AND  ( DWA.ETL_TASK_PROGRAM.BASELINE_ID=DWA.ETL_TASK_BASELINE.ID  )
	  AND  
	  (
	   ( ( DWA.HDP_JOB_STAT.GMTDATE ) = trunc(sysdate)  )
	   AND
	   ( ( DWA.HDP_JOB_MAP.GMTDATE ) = trunc(sysdate)  )
	   AND
	   ( ( DWA.ETL_TASK_LOG.GMTDATE ) = trunc(sysdate)  )
	   AND DWA.ETL_TASK_PROGRAM.priority  in('xx','xxx') -- enter your own business baseline ids here
	  )
	GROUP BY
	  DWA.ETL_TASK_BASELINE.name,
	  DWA.HDP_JOB_MAP.ID, 
	  DWA.ETL_TASK_PROGRAM.BIZ_SORTID, 
	  DWA.ETL_TASK_PROGRAM.ON_DUTY, 
	  DWA.ETL_TASK_LOG.PRGNAME,
	  (DWA.ETL_TASK_LOG.edate-DWA.ETL_TASK_LOG.sdate)*24*60
 ) t
 where time is not null and job_cnt > 10; -- job-count threshold, adjust as needed

4. Too many tables in FROM (node in-degree too high)

select sky_id as 天網id,viewname as 顯示名稱,
	sum(cnt) as 來源表使用次數,count(cnt) as 來源表個數 
from(
	select sky_id,viewname,tab_name,on_duty,count(1) cnt 
	from(
		select distinct a.tab_name,c.sql_id,a.sub_sql_id,c.sky_id,e.viewname,e.on_duty
		from dwa.meta_tab a,dwa.meta_sqlsub b,
		(
			select * 
			from(
				select sky_id,sql_id,sql_src,
				row_number() over(partition by sky_id,length(sql_src) order by sql_id) rn 
				from dwa.meta_sqlfull)
			where rn=1
		) c,
		dwa.meta_col d,dwa.etl_task_program e
		where e.priority  in('xx','xxx') -- enter your own business baseline ids here
			and e.appflag=0 and e.id=c.sky_id
			and a.sub_sql_id=b.sub_sql_id 
			and a.tab_id=d.tab_id and a.sub_sql_id=d.sub_sql_id and b.sqlfull_id=c.sql_id 
			and a.tab_name not like '%-%'  and b.sql_type='select'
		order by c.sky_id,c.sql_id,a.sub_sql_id
	)
	group by sky_id,viewname,tab_name,on_duty
	order by cnt desc
) 
group by sky_id,viewname
order by sum(cnt) desc;

5. Job skew

Null-value handling methods:

(1) Filter the nulls out directly

(2) Append a random number to null keys so they spread across different reducers

Method 1 costs 2 jobs; method 2 costs 1 job. The following query surfaces skewed jobs by comparing average and maximum job times/records:

select   a11.GMTDATE as  任務執行日期,
   a11.GROUP_NAME  as 業務線名稱,
   a11.ID as 天網id,
   a11.SORT_ID as 雲梯優先順序,
   a11.NAME as 天網顯示名稱,
   a11.JOB_ID as job_id,
   a11.KEY_FLAG  是否關鍵節點任務,
   a11.USER_NAME  使用者名稱,
   sum(a11.JOB_AVG_TIME)  WJXBFS1,
   sum(a11.JOB_MAX_TIME)  WJXBFS2,
   sum(a11.JOB_AVG_RECORDS)  WJXBFS3,
   sum(a11.JOB_MAX_RECORDS)  WJXBFS4
from   DWA.VIEW_HDP_JOB_STAT   a11
where gmtdate=date'2012-09-27'
	and group_name in ('xxxxx')
--the business line name is the 'project' field in the Skynet task configuration
group by   a11.GMTDATE,
   a11.GROUP_NAME,
   a11.ID,
   a11.SORT_ID,
   a11.NAME,
   a11.JOB_ID,
   a11.KEY_FLAG,
   a11.USER_NAME ;

6. Extracting and merging tasks with identical input bytes

   For tasks reading the same source data, identify the identical jobs and merge them

drop table gv_job_mapinput;
create table gv_job_mapinput as
select 
	id,prgname,job_id,MAP_INPUT_BYTES
from 
(
select 
 DWA.ETL_TASK_BASELINE.name groupname,
  DWA.HDP_JOB_MAP.ID,
  DWA.ETL_TASK_PROGRAM.BIZ_SORTID,
  DWA.ETL_TASK_PROGRAM.ON_DUTY,
  DWA.ETL_TASK_LOG.PRGNAME,
  DWA.HDP_JOB_MAP.job_id,  -- job id of the Skynet task
  sum(DWA.HDP_JOB_STAT.JOB_TOTAL_MAPS) JOB_TOTAL_MAPS,
  sum(DWA.HDP_JOB_STAT.JOB_TOTAL_REDUCES) JOB_TOTAL_REDUCES,
  sum(DWA.HDP_JOB_STAT.TOTAL_TIME) TOTAL_TIME,
  sum(DWA.HDP_JOB_STAT.HDFS_BYTES_READ) HDFS_BYTES_READ,
  sum(DWA.HDP_JOB_STAT.HDFS_BYTES_WRITTEN) HDFS_BYTES_WRITTEN,
  sum(DWA.HDP_JOB_STAT.TOTAL_MAP_TIME) TOTAL_MAP_TIME,
  sum(DWA.HDP_JOB_STAT.TOTAL_REDUCE_TIME) TOTAL_REDUCE_TIME,
  sum(DWA.HDP_JOB_STAT.MAP_INPUT_RECORDS) MAP_INPUT_RECORDS,
  sum(DWA.HDP_JOB_STAT.MAP_INPUT_BYTES) MAP_INPUT_BYTES,
  sum(DWA.HDP_JOB_STAT.MAP_OUTPUT_RECORDS) MAP_OUTPUT_RECORDS, --new
  sum(DWA.HDP_JOB_STAT.REDUCE_INPUT_RECORDS) REDUCE_INPUT_RECORDS,
  sum(DWA.HDP_JOB_STAT.REDUCE_OUTPUT_RECORDS) REDUCE_OUTPUT_RECORDS, --new
  trunc((DWA.ETL_TASK_LOG.edate-DWA.ETL_TASK_LOG.sdate)*24*60) time
FROM
  DWA.HDP_JOB_MAP,
  DWA.ETL_TASK_PROGRAM,
  DWA.ETL_TASK_LOG,
  DWA.HDP_JOB_STAT,
  DWA.ETL_TASK_BASELINE
WHERE
  ( DWA.HDP_JOB_STAT.JOB_ID=DWA.HDP_JOB_MAP.JOB_ID  )
  AND  ( DWA.HDP_JOB_MAP.ID=DWA.ETL_TASK_LOG.ID  )
  AND  ( DWA.ETL_TASK_LOG.ID=DWA.ETL_TASK_PROGRAM.ID  )
  AND  ( DWA.ETL_TASK_PROGRAM.BASELINE_ID=DWA.ETL_TASK_BASELINE.ID  )
  AND  
  (
   ( ( DWA.HDP_JOB_STAT.GMTDATE ) = trunc(sysdate)  )
   AND
   ( ( DWA.HDP_JOB_MAP.GMTDATE ) = trunc(sysdate)  )
   AND
   ( ( DWA.ETL_TASK_LOG.GMTDATE ) = trunc(sysdate)  )
   AND
   DWA.ETL_TASK_PROGRAM.priority  in('xx','xxx')
  -- enter your own business baseline ids here
  )
GROUP BY
  DWA.ETL_TASK_BASELINE.name,
  DWA.HDP_JOB_MAP.ID, 
  DWA.ETL_TASK_PROGRAM.BIZ_SORTID, 
  DWA.ETL_TASK_PROGRAM.ON_DUTY, 
  DWA.ETL_TASK_LOG.PRGNAME,
  DWA.HDP_JOB_MAP.job_id,
  (DWA.ETL_TASK_LOG.edate-DWA.ETL_TASK_LOG.sdate)*24*60
  )
order by MAP_INPUT_RECORDS desc ,job_id;
 
select * from gv_job_mapinput 
where id in (
	select id from 
	(select id,prgname,count(job_id) cnt from gv_job_mapinput group by id,prgname)
	where cnt =1 
)
order by MAP_INPUT_BYTES desc;

7. Multiple tasks sharing a single common parent task

drop table gvora_view_relation;
create table gvora_view_relation as 
select a.id,a.viewname,a.on_duty,a.sourceid,a.priority,a.parentid,
	b.viewname parentviewname,b.on_duty pon_duty,b.sourceid psourceid,b.priority p_priority 
from(
	select a.id,b.viewname,b.on_duty,b.sourceid,b.priority,a.parentid from 
	dwa.etl_task_relation a,
	dwa.etl_task_program b
	where a.id=b.id
) a,
dwa.etl_task_program b
where a.parentid=b.id;

select a.id as 天網id,a.viewname as 顯示名稱,rudu,chudu 
from(
	select id,viewname,count(1) rudu from gvora_view_relation
	where priority  in('xx','xxx')
	-- enter your own business baseline ids here
	group by id,viewname
) a,
(
	select parentid,parentviewname,count(1) chudu from gvora_view_relation
	where priority  in('xx','xxx')
	 -- enter your own business baseline ids here
	group by parentid,parentviewname
) b
where a.id=b.parentid
order by rudu + chudu desc;