Hive Learning Notes (12): Converting Between Wide Tables and Tall Tables
Requirement: this is the story of converting a wide table into a tall table and a tall table back into a wide table, which is much like a row/column transposition. Rows to columns: multiple rows of a field are merged into one column, usually with collect_set + concat_ws. Columns to rows: one column of a field is split into multiple rows, usually with explode.
Wide table to tall table:
1. Original wide-table data: cust_id1, jijin_bal, baoxian_bal, cunkuan_bal
Target tall-table data:
cust_id1, fund (jijin), bal
cust_id1, insurance (baoxian), bal
cust_id1, deposit (cunkuan), bal
Method: concat_ws + lateral view explode + split -- does this count as columns-to-rows?? In effect it turns the wide table into a tall table. Reference: https://www.cnblogs.com/foolangirl/p/14145147.html
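A minimal sketch of that concat_ws + lateral view explode + split pattern, assuming the sc SparkSession created further down in these notes and a hypothetical wide table test_youhua.hengbiao(id int, jijin_bal float, baoxian_bal float, cunkuan_bal float); the table name and its columns are made up for illustration and are not created anywhere in these notes.

# Hypothetical wide table test_youhua.hengbiao: tag each balance column with its
# product name, glue the tags together with concat_ws, then split the string and
# explode it back out into one row per product.
sc.sql('''
select id
      ,split(kv, ':')[0] as prod_nm
      ,split(kv, ':')[1] as bal
from (
    select id
          ,concat_ws(',',
                concat('jijin:',   cast(jijin_bal   as string)),
                concat('baoxian:', cast(baoxian_bal as string)),
                concat('cunkuan:', cast(cunkuan_bal as string))) as kvs
    from test_youhua.hengbiao
) t
lateral view explode(split(kvs, ',')) tmp as kv
''').show()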
Tall table to wide table
2. Original tall-table data:
cust_id1, fund, bal
cust_id1, insurance, bal
cust_id1, deposit, bal
Target wide-table data: cust_id1, jijin_bal, baoxian_bal, cunkuan_bal
Method 1: case when
Method 2: first convert to a map: cust1, fund:bal, insurance:bal, deposit:bal; then inline
Reference for converting to a map: https://www.jianshu.com/p/02c2b8906893
Reference for explode / inline: https://blog.csdn.net/huobumingbai1234/article/details/80559944 !!! Pay attention to how explode and inline handle the map type
import pyspark
from pyspark.sql import SparkSession

sc = SparkSession.builder.master("local") \
    .appName('hive_col_row') \
    .config('spark.executor.memory', '2g') \
    .config('spark.driver.memory', '2g') \
    .enableHiveSupport() \
    .getOrCreate()
sc.sql(''' create table test_youhua.zongbiao(id int, prod_nm string, bal float) ''')
sc.sql('''
insert overwrite table test_youhua.zongbiao
values (1,'jijin',1.1),(1,'baoxian',1.2),(1,'cunkuan',1.3),
       (2,'jijin',2.67),(2,'baoxian',2.34),(2,'cunkuan',2.1)
''')
sc.sql(''' select * from test_youhua.zongbiao ''').show()
+---+-------+----+
| id|prod_nm| bal|
+---+-------+----+
|  1|  jijin| 1.1|
|  1|baoxian| 1.2|
|  1|cunkuan| 1.3|
|  2|  jijin|2.67|
|  2|baoxian|2.34|
|  2|cunkuan| 2.1|
+---+-------+----+
Method 1: tall table to wide table with case when
sc.sql('''
select id
      ,max(case when prod_nm='jijin'   then bal else 0 end) as jijin_bal
      ,max(case when prod_nm='baoxian' then bal else 0 end) as baoxian_bal
      ,max(case when prod_nm='cunkuan' then bal else 0 end) as cunkuan_bal
from test_youhua.zongbiao
group by id
''').show()
+---+---------+-----------+-----------+
| id|jijin_bal|baoxian_bal|cunkuan_bal|
+---+---------+-----------+-----------+
|  1|      1.1|        1.2|        1.3|
|  2|     2.67|       2.34|        2.1|
+---+---------+-----------+-----------+
Method 2: first convert to a map: cust1, fund:bal, insurance:bal, deposit:bal; then inline
Reference for converting to a map: https://blog.csdn.net/huobumingbai1234/article/details/80559944
Reference for inline: https://blog.csdn.net/weixin_42003671/article/details/88132666
inline does not support map: https://www.jianshu.com/p/02c2b8906893
lateral view inline is similar in function to lateral view explode
!!! Pay attention to how explode and inline handle the map type; see the sketch below
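A small sketch of what inline actually expects, assuming the sc SparkSession from above; the literal struct values are made up for illustration. inline takes an array of structs and turns each struct into a row, which is why it cannot be pointed at a map column directly; for a map column, explode is the one to use.

# inline works on array<struct>, not map: each struct becomes one output row.
sc.sql('''
select t1.prod_nm, t1.bal
from (select 1 as id) dummy
lateral view inline(array(named_struct('prod_nm', 'jijin',   'bal', 1.1),
                          named_struct('prod_nm', 'baoxian', 'bal', 1.2))) t1 as prod_nm, bal
''').show()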
sc.sql('''
select id
      ,str_to_map(concat_ws(',', collect_set(concat_ws(':', prod_nm, cast(bal as string)))))
from test_youhua.zongbiao
group by id
''').show()
The result: the data has been converted into map format:
1  {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
2  {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}
# This does not work: inline operates on structs, so it cannot be applied to the map here. explode and inline are both column-to-row functions that break the map field apart, which just turns the map back into a tall table.
sc.sql('''
select map_tmp_tbl.id, c1, c2
from (
    select id
          ,str_to_map(concat_ws(',', collect_set(concat_ws(':', prod_nm, cast(bal as string))))) as map_col
    from test_youhua.zongbiao
    group by id
) as map_tmp_tbl
lateral view explode(map_tmp_tbl.map_col) t1 as c1, c2
''').show()
+---+-------+----+
| id|     c1|  c2|
+---+-------+----+
|  1|  jijin| 1.1|
|  1|cunkuan| 1.3|
|  1|baoxian| 1.2|
|  2|cunkuan| 2.1|
|  2|baoxian|2.34|
|  2|  jijin|2.67|
+---+-------+----+
Just selecting the values out of map_col by key works, doesn't it?? The conversion to a wide table succeeds!!
sc.sql('''
select map_tmp_tbl.id
      ,map_col['jijin']   as jijin_bal
      ,map_col['baoxian'] as baoxian_bal
      ,map_col['cunkuan'] as cunkuan_bal
from (
    select id
          ,str_to_map(concat_ws(',', collect_set(concat_ws(':', prod_nm, cast(bal as string))))) as map_col
    from test_youhua.zongbiao
    group by id
) as map_tmp_tbl
''').show()
+---+---------+-----------+-----------+
| id|jijin_bal|baoxian_bal|cunkuan_bal|
+---+---------+-----------+-----------+
|  1|      1.1|        1.2|        1.3|
|  2|     2.67|       2.34|        2.1|
+---+---------+-----------+-----------+
So this is what "it is best to store it as a map type" really means: once the data is in a map, you can build the wide table later by simply taking the value for each key, and build the tall table with explode. E.g. map_tmp_tbl here is stored as a map type, and you can see that explode turns it into a tall table, while taking the value for each key turns it into a wide table.
tips
The map conversion above used:
str_to_map(concat_ws(',', collect_set(concat_ws(':', prod_nm, cast(bal as string)))))
Note that collect_set deduplicates the collected elements; if you do not want deduplication, use collect_list instead. str_to_map also deduplicates: if the input contains duplicate key-value pairs, only one of them is kept. If you really do need to keep duplicate key-value pairs, you can build the string with something like the following instead:
regexp_replace(concat('{"', cast(concat_ws(',', collect_list(concat_ws(':', prod_nm, cast(bal as string)))) as string), '"}'), ',', '","') as map_col
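Before moving on to that regexp_replace variant, a minimal sketch of the collect_set vs collect_list difference, run against inline data rather than test_youhua.zongbiao (which has no duplicate rows, so it would not show the effect):

# collect_set drops duplicate elements, collect_list keeps them all.
sc.sql('''
select collect_set(prod_nm)  as set_col
      ,collect_list(prod_nm) as list_col
from (select 'jijin' as prod_nm union all
      select 'jijin'            union all
      select 'baoxian') t
''').show(truncate=False)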
But the result of this regexp_replace trick is a JSON string, not a map!!!
sc.sql('''
select id
      ,regexp_replace(regexp_replace(concat('{"', concat_ws(',', collect_list(concat_ws(':', prod_nm, cast(bal as string)))), '"}'), ',', '","'), ':', '":"') as map_col
from test_youhua.zongbiao
group by id
''').show()
-- The output: this time it is not in map format, so it has to be read as a JSON string:
1  {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
2  {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}
sc.sql('''
select map_tmp_tbl.id
      ,get_json_object(map_col, '$.jijin')
      ,get_json_object(map_col, '$.baoxian')
      ,get_json_object(map_col, '$.cunkuan')
from (
    select id
          ,regexp_replace(regexp_replace(concat('{"', concat_ws(',', collect_list(concat_ws(':', prod_nm, cast(bal as string)))), '"}'), ',', '","'), ':', '":"') as map_col
    from test_youhua.zongbiao
    group by id
) as map_tmp_tbl
''').show()
+---+---------------------------------+-----------------------------------+-----------------------------------+
| id|get_json_object(map_col, $.jijin)|get_json_object(map_col, $.baoxian)|get_json_object(map_col, $.cunkuan)|
+---+---------------------------------+-----------------------------------+-----------------------------------+
|  1|                              1.1|                                1.2|                                1.3|
|  2|                             2.67|                               2.34|                                2.1|
+---+---------------------------------+-----------------------------------+-----------------------------------+
Parsing the JSON string works too!
Reference: Hive Learning Notes (5): Use the JSON format when table fields change frequently
So storing the data as a map up front actually has quite a few benefits: first, adding fields is easy; second, converting between wide and tall tables is convenient.