Hive Built-in Functions

阿新 • Published: 2018-07-10


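Definitions:

UDF (User-Defined Function): a user-defined function that processes data one row at a time.

UDTF (User-Defined Table-Generating Function): handles cases where one input row must produce multiple output rows (a one-to-many mapping).

UDAF (User-Defined Aggregation Function): a user-defined aggregate function that operates on multiple input rows and produces a single output row.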
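The difference between the three kinds is mostly about how many rows go in and how many come out. The following is a minimal sketch in plain Python, not Hive code; the sample rows and the function names (udf_upper, udtf_split, udaf_count) are made up for illustration only.

```python
# Plain-Python model of the three function kinds; this is not Hive code.
rows = ["a,b", "c"]  # hypothetical single-column table


def udf_upper(value):
    # UDF: one input row -> one output value
    return value.upper()


def udtf_split(value):
    # UDTF: one input row -> zero or more output rows
    return value.split(",")


def udaf_count(values):
    # UDAF: many input rows -> one output row
    return len(values)


udf_result = [udf_upper(r) for r in rows]
udtf_result = [out for r in rows for out in udtf_split(r)]
udaf_result = udaf_count(rows)
print(udf_result, udtf_result, udaf_result)
```

The UDF keeps the row count unchanged, the UDTF grows it, and the UDAF collapses it to a single value.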

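Usage:

1. A UDF can be used directly in a select statement to format the query results before they are output.

2. Keep the following points in mind when writing a UDF:

a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.

b) It must implement an evaluate method.

c) The evaluate method can be overloaded.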

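Hive local mode:

Most Hadoop jobs need the full scalability that Hadoop provides in order to process large data sets. Sometimes, however, the input to a Hive query is very small. In that case, the overhead of launching the tasks for the query can far exceed the actual execution time of the job. For most such cases, Hive can use local mode to run all tasks on a single machine, which noticeably shortens execution time for small data sets.

In other words, operations on small amounts of data run locally, which is much faster than submitting the job to the cluster.

Set the following parameter to enable Hive's local mode (the default is false):

hive> set hive.exec.mode.local.auto=true;

A job actually runs in local mode only when all of the following conditions are met:
1. The job's input size is smaller than hive.exec.mode.local.auto.inputbytes.max (default 128MB).
2. The job's number of map tasks is smaller than hive.exec.mode.local.auto.tasks.max (default 4; in Hive 0.9.0 and later this setting was replaced by hive.exec.mode.local.auto.input.files.max).
3. The job has 0 or 1 reduce tasks.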
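The three conditions above can be read as a single predicate. The sketch below is plain Python, not Hive's actual implementation; the function name can_run_locally and its parameter names are hypothetical, and the defaults simply mirror the values quoted above.

```python
# Sketch of the local-mode eligibility check described above (not Hive source code).
def can_run_locally(input_bytes, map_tasks, reduce_tasks,
                    auto_enabled=True,
                    max_input_bytes=128 * 1024 * 1024,  # inputbytes.max default
                    max_tasks=4):                        # tasks.max default
    return (auto_enabled
            and input_bytes < max_input_bytes   # condition 1: small input
            and map_tasks < max_tasks           # condition 2: few map tasks
            and reduce_tasks in (0, 1))         # condition 3: 0 or 1 reducers


print(can_run_locally(10 * 1024 * 1024, map_tasks=2, reduce_tasks=1))
print(can_run_locally(10 * 1024 * 1024, map_tasks=2, reduce_tasks=3))
```

A small job with one reducer qualifies; the same job with three reducers does not, even though its input is tiny.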

Built-in Hive functions:

1. String length function: length

Syntax: length(string A)
Returns: int
Example:

hive> select length('abcedfg') from dual;
7

2. String reverse function: reverse

Syntax: reverse(string A)
Returns: string
Description: returns string A reversed.
Example:

hive> select reverse('abcedfg') from dual;
gfdecba

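3. String concatenation function: concat

Syntax: concat(string A, string B…)
Returns: string
Description: returns the concatenation of the input strings; any number of input strings is supported.
Example:

hive> select concat('abc','def','gh') from dual;
abcdefgh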

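4. String concatenation with separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B…)
Returns: string
Description: returns the concatenation of the input strings, with SEP inserted as the separator between them.
Example:

hive> select concat_ws(',','abc','def','gh') from dual;
abc,def,gh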

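5. Substring functions: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)
Returns: string
Description: returns the part of string A from position start to the end. Positions start at 1; a negative start counts from the end of the string (same as Oracle).
Example:

hive> select substr('abcde',3) from dual;
cde
hive> select substring('abcde',3) from dual;
cde
hive> select substr('abcde',-1) from dual;
e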
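Because the start position is 1-based and may be negative, it can help to see the index arithmetic spelled out. This is a plain-Python model of the two cases shown above, not Hive code; the helper name hive_substr is hypothetical, and a start of 0 or a position beyond the string length is deliberately not modeled.

```python
# Plain-Python model of substr(A, start) for the cases shown above (not Hive code).
def hive_substr(s, start):
    if start > 0:
        return s[start - 1:]   # 1-based position -> 0-based Python index
    if start < 0:
        return s[start:]       # negative position counts from the end
    raise ValueError("start == 0 is not modeled in this sketch")


print(hive_substr("abcde", 3))
print(hive_substr("abcde", -1))
```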

6. String case conversion

Uppercase functions: upper, ucase

Lowercase functions: lower, lcase

Syntax: lower(string A), lcase(string A)
Returns: string
Description: returns string A converted to lowercase.
Example:

hive> select lower('abSEd') from dual;
absed
hive> select lcase('abSEd') from dual;
absed

7. Whitespace trimming functions

Left trim function: ltrim

Right trim function: rtrim

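8. Regular expression replace function: regexp_replace

Syntax: regexp_replace(string A, string B, string C)
Returns: string
Description: replaces every part of string A that matches the Java regular expression B with C. Note that escape characters are required in some cases.
Example:

hive> select regexp_replace('foobar', 'oo|ar', '') from dual;
fb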
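To see why the result is fb, the same replacement can be traced in Python. This is only an illustration: Hive uses Java regular expressions, and Python's re module is assumed to behave the same way for this simple alternation pattern.

```python
import re

# 'oo|ar' matches "oo" and then "ar"; both are replaced with the empty string,
# leaving only the unmatched characters "f" and "b".
matches = re.findall("oo|ar", "foobar")
result = re.sub("oo|ar", "", "foobar")
print(matches, result)
```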

9. Regular expression extract function: regexp_extract

Syntax: regexp_extract(string subject, string pattern, int index)
Returns: string
Description: matches subject against the regular expression pattern and returns the capture group selected by index (index 0 returns the whole match). Note that escape characters are required in some cases.
Example:

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from dual;
foothebar
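The index argument corresponds to a capture group number. A rough Python equivalent of the three calls above is sketched below; the helper name extract_group is hypothetical, Python's re module stands in for Java's regex engine, and what Hive returns when nothing matches is not modeled here.

```python
import re


# Plain-Python model of regexp_extract(subject, pattern, index); not Hive code.
def extract_group(subject, pattern, index):
    match = re.search(pattern, subject)
    if match is None:
        return None  # no-match behaviour in Hive is not modeled in this sketch
    return match.group(index)  # group 0 is the whole match


for i in (1, 2, 0):
    print(i, extract_group("foothebar", "foo(.*?)(bar)", i))
```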

10. URL parsing functions: parse_url, parse_url_tuple (UDTF)

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract]). parse_url_tuple works like parse_url(), but it can extract several parts at once and return them together.
Returns: string
Description: returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
Example:

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from dual;
facebook.com
hive> select parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2');
v1 v2
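For readers unsure which piece of the URL each part name refers to, the same example URL can be taken apart with Python's standard urllib.parse. This is only an analogy to show the parts; it is not how Hive implements parse_url, and the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1"
parts = urlsplit(url)

host = parts.hostname       # corresponds to HOST
path = parts.path           # corresponds to PATH
query = parts.query         # corresponds to QUERY
ref = parts.fragment        # corresponds to REF
protocol = parts.scheme     # corresponds to PROTOCOL

# QUERY:k1 and QUERY:k2 pick single keys out of the query string.
params = parse_qs(query)
k1, k2 = params["k1"][0], params["k2"][0]
print(host, path, query, ref, protocol, k1, k2)
```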

11. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)
Returns: string
Description: parses the JSON string json_string and returns the content selected by path. If the input JSON string is invalid, NULL is returned.
Example:

hive> select get_json_object('{"store":
    > {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    > "bicycle":{"price":19.95,"color":"red"}
    > },
    > "email":"amy@only_for_json_udf_test.net",
    > "owner":"amy"
    > }
    > ','$.owner') from dual;
amy
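The path argument walks down the JSON document from the root ($). The sketch below models only simple dotted paths such as $.owner in plain Python; the helper name get_json_path is hypothetical, array subscripts are not handled, and unlike Hive (which returns a string) it returns the native Python value.

```python
import json

doc = """{"store":
  {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
   "bicycle":{"price":19.95,"color":"red"}
  },
  "email":"amy@only_for_json_udf_test.net",
  "owner":"amy"
}"""


# Plain-Python model of get_json_object for dotted paths only; not Hive code.
def get_json_path(json_string, path):
    try:
        node = json.loads(json_string)
    except ValueError:
        return None  # invalid JSON -> NULL, as described above
    if not path.startswith("$."):
        return None  # only "$.a.b" style paths are modeled here
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_json_path(doc, "$.owner"))
print(get_json_path(doc, "$.store.bicycle.color"))
```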

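12. Set lookup function: find_in_set

Syntax: find_in_set(string str, string strList)
Returns: int
Description: returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found. (The list must be comma-separated; otherwise 0 is returned.)
Example:

hive> select find_in_set('ab','ef,ab,de') from dual;
2
hive> select find_in_set('at','ef,ab,de') from dual;
0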
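The returned position is 1-based, with 0 reserved for "not found". A plain-Python model of that behavior is sketched below; it is not Hive code, and it reuses the function name only for readability.

```python
# Plain-Python model of find_in_set(str, strList); not Hive code.
def find_in_set(item, str_list):
    if "," in item:
        return 0  # an item that itself contains a comma can never match
    parts = str_list.split(",")
    # list.index is 0-based, so add 1 to get the 1-based position
    return parts.index(item) + 1 if item in parts else 0


print(find_in_set("ab", "ef,ab,de"))
print(find_in_set("at", "ef,ab,de"))
```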

13. Expanding a collection into rows: explode (posexplode is available as of Hive 0.13.0)

Description: turns the elements of an input array or map into separate output rows.
Syntax: explode(array (or map))
Example:

hive> select explode(split(concat_ws(',','1','2','3','4','5','6','7','8','9'),',')) from test.dual;
1
2
3
4
5
6
7
8
9

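14. Expanding one row into many: lateral view

Description: lateral view is used together with UDTFs such as json_tuple, parse_url_tuple and explode (often combined with split, which produces the array to explode). It breaks one row of data into multiple rows, and the expanded rows can then be aggregated.
Example:

Suppose we have a table pageAds with two columns: the first is pageid (string) and the second is adid_list, an array of ad IDs: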

string pageid	Array<int> adid_list
"front_page"	[1, 2, 3]
"contact_page"	[3, 4, 5]

The goal is to count how many times each ad ID appears across all pages.

First, split out the ad IDs:

SELECT pageid, adid 
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

The result is:

string pageid	int adid
"front_page"	1
"front_page"	2
"front_page"	3
"contact_page"	3
"contact_page"	4
"contact_page"	5

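To explain: the table name follows from, and lateral view explode(...) is appended after the table name. The exploded result must be given an alias; here the virtual table is aliased adTable and its single column is aliased adid. The pageid in the select list is a column of the original table.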
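The explode step, and the per-ad count that motivates it, can be modeled in a few lines of plain Python. This is not Hive code; the page_ads list simply restates the sample pageAds table above, and collections.Counter stands in for a GROUP BY with count.

```python
from collections import Counter

# The sample pageAds table from above: (pageid, adid_list)
page_ads = [
    ("front_page", [1, 2, 3]),
    ("contact_page", [3, 4, 5]),
]

# LATERAL VIEW explode(adid_list): one output row per (pageid, adid) pair
exploded = [(pageid, adid) for pageid, adid_list in page_ads for adid in adid_list]

# Counting occurrences of each ad ID over the exploded rows
ad_counts = Counter(adid for _, adid in exploded)

print(exploded)
print(sorted(ad_counts.items()))
```

Ad 3 is the only ID that appears on both pages, so it is the only one with a count of 2.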

A query with multiple lateral views looks like this:

select * from exampletable lateral view explode(col1) mytable1 as mycol1 lateral view explode(mycol1) mytable2 as mycol2;
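In the query above, the second lateral view explodes the column produced by the first one, so col1 has to be an array of arrays. A plain-Python model with nested loops is sketched below; it is not Hive code, and the sample contents of exampletable are invented for illustration.

```python
# Hypothetical contents of exampletable: col1 is an array of arrays.
example_table = [{"col1": [[1, 2], [3]]}]

# The first lateral view yields mycol1 (each inner array);
# the second lateral view explodes mycol1 into mycol2 (each element).
rows = [
    (mycol1, mycol2)
    for row in example_table
    for mycol1 in row["col1"]
    for mycol2 in mycol1
]
print(rows)
```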

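Example: extracting one row of data into several columns of a new table:

http_referer holds the request path together with its parameters, in which illegal characters have been escaped with \. The path is parsed into the address, the query conditions and so on, and the results are stored in a new table: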

drop table if exists t_ods_tmp_referurl;
create table t_ods_tmp_referurl as
SELECT a.*, b.*
FROM ods_origin_weblog a LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, "\"", ""), 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id;

Copy the table and truncate the time to the day:

drop table if exists t_ods_tmp_detail;
create table t_ods_tmp_detail as
select b.*, substring(time_local,0,10) as daystr,
substring(time_local,11) as tmstr,
substring(time_local,5,2) as month,
substring(time_local,8,2) as day,
substring(time_local,11,2) as hour
from t_ods_tmp_referurl b;
