Hive Built-in Functions

阿新 • Published: 2018-12-31

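• Regular expression extraction function: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return value: string
Description: Matches the string subject against the regular expression pattern and returns the capture group selected by index. Index 0 returns the entire match.
Examples:
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from dual;
foothebar
Note: some characters need to be escaped in certain cases. In the query below, the equals sign is escaped with a double backslash (\\=), following Java regular expression escaping rules.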

select data_field,
  regexp_extract(data_field, '.*?bgStart\\=([^&]+)', 1) as aaa,
  regexp_extract(data_field, '.*?contentLoaded_headStart\\=([^&]+)', 1) as bbb,
  regexp_extract(data_field, '.*?AppLoad2Req\\=([^&]+)', 1) as ccc
from pt_nginx_loginlog_st
where pt = '2012-03-26'
limit 2;

• URL parsing function: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return value: string
Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. With QUERY, the optional keyToExtract selects the value of a single query parameter.
Examples:
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from dual;
facebook.com
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') from dual;
v1

• JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)
Return value: string
Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.
Example:
hive> select get_json_object('{"store":
    > {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    > "bicycle":{"price":19.95,"color":"red"}
    > },
    > "email":"[email protected]_for_json_udf_test.net",
    > "owner":"amy"
    > }
    > ','$.owner') from dual;
amy

• Space string function: space
Syntax: space(int n)
Return value: string
Description: Returns a string consisting of n spaces.
Examples:
hive> select space(10) from dual;
hive> select length(space(10)) from dual;
10

• String repetition function: repeat
Syntax: repeat(string str, int n)
Return value: string
Description: Returns str repeated n times.
Example:
hive> select repeat('abc',5) from dual;
abcabcabcabcabc

• First-character ASCII function: ascii
Syntax: ascii(string str)
Return value: int
Description: Returns the ASCII code of the first character of str.
Example:
hive> select ascii('abcde') from dual;
97

• Left-padding function: lpad
Syntax: lpad(string str, int len, string pad)
Return value: string
Description: Left-pads str with pad until the result is len characters long.
Example:
hive> select lpad('abc',10,'td') from dual;
tdtdtdtabc
Note: unlike GP and Oracle, the pad argument cannot be omitted.

• Right-padding function: rpad
Syntax: rpad(string str, int len, string pad)
Return value: string
Description: Right-pads str with pad until the result is len characters long.
Example:
hive> select rpad('abc',10,'td') from dual;
abctdtdtdt

• String splitting function: split
Syntax: split(string str, string pat)
Return value: array
Description: Splits str around pat (pat is treated as a regular expression) and returns the resulting array of strings.
Example:
hive> select split('abtcdtef','t') from dual;
["ab","cd","ef"]

• Set lookup function: find_in_set
Syntax: find_in_set(string str, string strList)
Return value: int
Description: Returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found.
Examples:
hive> select find_in_set('ab','ef,ab,de') from dual;
2
hive> select find_in_set('at','ef,ab,de') from dual;
0

Part 7: Aggregate Functions
• Count function: count
• Sum function: sum
• Average function: avg
• Minimum function: min
• Maximum function: max

• Count function: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return value: bigint
Description: count(*) returns the number of rows retrieved, including rows that contain NULL values; count(expr) returns the number of rows for which expr is not NULL; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values of the given expressions.
Examples:
hive> select count(*) from dual;
20
hive> select count(distinct t) from dual;
10

• Sum function: sum
Syntax: sum(col), sum(DISTINCT col)
Return value: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.
Examples:
hive> select sum(t) from dual;
100
hive> select sum(distinct t) from dual;
70

• Average function: avg
Syntax: avg(col), avg(DISTINCT col)
Return value: double
Description: avg(col) returns the average of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col.
Examples:
hive> select avg(t) from dual;
50
hive> select avg(distinct t) from dual;
30

• Minimum function: min
Syntax: min(col)
Return value: double
Description: Returns the minimum value of col in the result set.
Example:
hive> select min(t) from dual;
20

• Maximum function: max
Syntax: max(col)
Return value: double
Description: Returns the maximum value of col in the result set.
Example:
hive> select max(t) from dual;
120

Part 8: Complex Type Constructors
• Map constructor: map
• Struct constructor: struct
• Array constructor: array

• Map constructor: map
Syntax: map(key1, value1, key2, value2, ...)
Description: Builds a map from the given key/value pairs.
Example:
hive> create table alex_test as select map('100','tom','200','mary') as t from dual;
hive> describe alex_test;
t map<string,string>
hive> select t from alex_test;
{"100":"tom","200":"mary"}

• Struct constructor: struct
Syntax: struct(val1, val2, val3, ...)
Description: Builds a struct from the given arguments.
Example:
hive> create table alex_test as select struct('tom','mary','tim') as t from dual;
hive> describe alex_test;
t struct<col1:string,col2:string,col3:string>
hive> select t from alex_test;
{"col1":"tom","col2":"mary","col3":"tim"}

• Array constructor: array
Syntax: array(val1, val2, ...)
Description: Builds an array from the given arguments.
Example:
hive> create table alex_test as select array("tom","mary","tim") as t from dual;
hive> describe alex_test;
t array<string>
hive> select t from alex_test;
["tom","mary","tim"]

Part 9: Complex Type Access
• Array access: A[n]
• Map access: M[key]
• Struct access: S.x

• Array access: A[n]
Syntax: A[n]
Operand types: A is an array, n is an int
Description: Returns the element at index n of array A. Array indexes start at 0. For example, if A is an array with the value ['foo', 'bar'], then A[0] returns 'foo' and A[1] returns 'bar'.
Example:
hive> create table alex_test as select array("tom","mary","tim") as t from dual;
hive> select t[0],t[1],t[2] from alex_test;
tom mary tim

• Map access: M[key]
Syntax: M[key]
Operand types: M is a map, key is one of the map's keys
Description: Returns the value stored in map M under the given key. For example, if M is a map with the value {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, then M['all'] returns 'foobar'.
Example:
hive> create table alex_test as select map('100','tom','200','mary') as t from dual;
hive> select t['200'],t['100'] from alex_test;
mary tom

• Struct access: S.x
Syntax: S.x
Operand types: S is a struct
Description: Returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field of the struct.
Example:
hive> create table alex_test as select struct('tom','mary','tim') as t from dual;
hive> describe alex_test;
t struct<col1:string,col2:string,col3:string>
hive> select t.col1,t.col3 from alex_test;
tom tim

Part 10: Complex Type Size Functions
• Map size function: size(Map<K,V>)
• Array size function: size(Array<T>)
• Type conversion function: cast

• Map size function: size(Map<K,V>)
Syntax: size(Map<K,V>)
Return value: int
Description: Returns the number of entries in the map.
Example:
hive> select size(map('100','tom','101','mary')) from dual;
2

• Array size function: size(Array<T>)
Syntax: size(Array<T>)
Return value: int
Description: Returns the number of elements in the array.
Example:
hive> select size(array('100','101','102','103')) from dual;
4

• Type conversion function: cast
Syntax: cast(expr as <type>)
Return value: <type>
Description: Converts the result of the expression expr to the type <type>.
Example:
hive> select cast(1 as bigint) from dual;
1
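The string-function entries earlier in this article (regexp_extract, lpad, rpad, find_in_set) can be mirrored outside Hive to check one's reading of them. The following is a rough Python sketch, not Hive's implementation: the function names simply reuse the Hive names, and the behavior is reconstructed from the descriptions and examples in the article. Two details are assumptions the article does not state: returning None when the pattern does not match, and cutting the string down to len when it is already longer than len.

```python
import re
from typing import Optional


def regexp_extract(subject: str, pattern: str, index: int) -> Optional[str]:
    """Return capture group `index` of the first match (0 = whole match)."""
    match = re.search(pattern, subject)
    if match is None:
        return None  # assumption: the article does not cover the no-match case
    return match.group(index)


def lpad(s: str, length: int, pad: str) -> str:
    """Left-pad `s` with repetitions of `pad` up to `length` characters."""
    if len(s) >= length:
        return s[:length]  # assumption: longer inputs are cut down to `length`
    missing = length - len(s)
    return (pad * missing)[:missing] + s


def rpad(s: str, length: int, pad: str) -> str:
    """Right-pad `s` with repetitions of `pad` up to `length` characters."""
    if len(s) >= length:
        return s[:length]  # assumption: longer inputs are cut down to `length`
    missing = length - len(s)
    return s + (pad * missing)[:missing]


def find_in_set(s: str, str_list: str) -> int:
    """Return the 1-based position of `s` in a comma-separated list, or 0."""
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


# The same inputs as the hive> examples in the article.
print(regexp_extract("foothebar", "foo(.*?)(bar)", 1))  # the
print(regexp_extract("foothebar", "foo(.*?)(bar)", 0))  # foothebar
print(lpad("abc", 10, "td"))                            # tdtdtdtabc
print(rpad("abc", 10, "td"))                            # abctdtdtdt
print(find_in_set("ab", "ef,ab,de"))                    # 2
print(find_in_set("at", "ef,ab,de"))                    # 0
```

Because the padding string is repeated and then cut to the number of missing characters, a two-character pad such as 'td' can end on a partial repetition, which is why the lpad result starts with 'tdtdtdt'.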
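The NULL and DISTINCT rules given for count, sum, and avg in Part 7 can be traced by hand on a tiny column. The Python sketch below does that for a made-up column t (the values are hypothetical, with None standing in for NULL). Skipping NULLs in sum and avg is standard SQL aggregate behavior rather than something the article spells out. The sketch only illustrates the rules; it says nothing about how Hive computes aggregates.

```python
from typing import Dict, List, Optional


def aggregate(column: List[Optional[float]]) -> Dict[str, float]:
    """Apply the Part 7 aggregate rules to one column of values.

    Assumes the column holds at least one non-NULL value.
    """
    non_null = [v for v in column if v is not None]  # NULLs are ignored
    distinct = set(non_null)                         # distinct non-NULL values
    return {
        "count(*)": len(column),            # every row, NULL or not
        "count(t)": len(non_null),          # rows where t is not NULL
        "count(distinct t)": len(distinct),
        "sum(t)": float(sum(non_null)),
        "sum(distinct t)": float(sum(distinct)),
        "avg(t)": sum(non_null) / len(non_null),
        "avg(distinct t)": sum(distinct) / len(distinct),
    }


# Hypothetical column t: five rows, one NULL, the value 10 repeated three times.
t = [10, 10, 10, None, 50]
for name, value in aggregate(t).items():
    print(name, "=", value)
```

With this column, count(*) is 5 while count(t) is 4, and the repeated 10s make sum(distinct t) (60.0) and avg(distinct t) (30.0) differ from sum(t) (80.0) and avg(t) (20.0).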
