A small Hive query for two-dimensional cross-tab analysis

 

1. Work out the column and row dimensions you need

Column dimension: each week

Row dimension: grade + subject + class type

2. Aggregate the data in ascending week order (that is, along the column dimension) to produce a list

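Use concat_ws together with collect_list (collect_set deduplicates before aggregating, so two weeks with the same value would collapse into one element and shift the positions after it). The order of the collected elements is not guaranteed.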
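
To make the difference concrete, here is a minimal sketch against the same tmp table used by the query further down. The aliases rates_all and rates_distinct are made up for illustration and are not part of the original query.

```sql
-- collect_list keeps every value; collect_set drops duplicates first.
-- Neither guarantees element order, so the joined string can differ between runs.
select
  term,
  kemu,
  course_applicable_user_type,
  concat_ws(',', collect_list(cast(lesson_valid_rate as string))) as rates_all,
  concat_ws(',', collect_set(cast(lesson_valid_rate as string)))  as rates_distinct
from tmp
group by term, kemu, course_applicable_user_type;
```

If two weeks share the same rate, rates_distinct ends up with fewer elements than there are weeks, which is why the positional lookup in step 3 relies on collect_list.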

sort_array sorts in ascending order only. To get descending order, add a helper column in the subquery and sort on that instead.
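
One possible reading of the "helper column" idea is a derived sort key whose ascending order is the descending order of the week number. The sketch below assumes lesson_order is an integer between 1 and 99; desc_key and hs_desc are hypothetical names introduced here.

```sql
-- Descending variant: sort on a reversed key instead of lesson_order itself.
select
  term,
  kemu,
  course_applicable_user_type,
  regexp_replace(
    concat_ws(',',
      sort_array(
        collect_list(
          concat_ws(':',
            lpad(cast(desc_key as string), 2, '0'),   -- zero-pad so the string sort matches numeric order
            cast(lesson_valid_rate as string)
          )
        )
      )
    ),
    '\\d\\d:', ''                                     -- strip the "NN:" sort keys again
  ) as hs_desc
from
(
  select
    term,
    kemu,
    course_applicable_user_type,
    lesson_valid_rate,
    99 - lesson_order as desc_key                     -- week 1 -> 98, week 2 -> 97, ...
  from tmp
) t
group by term, kemu, course_applicable_user_type;
```

Because the largest week gets the smallest key, the ascending sort puts the latest week first.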

3. Take the list elements one by one

These are the metric values in ascending week order.

  select
    term,
    kemu,
    course_applicable_user_type,
    -- split() returns a 0-based array: element 0 is the earliest lesson_order
    split(hs, ',')[0] as lesson_order1,
    split(hs, ',')[1] as lesson_order2,
    split(hs, ',')[2] as lesson_order3,
    split(hs, ',')[3] as lesson_order4,
    split(hs, ',')[4] as lesson_order5
  from
  (
    select
      term,
      kemu,
      course_applicable_user_type,
      -- Unsorted variants (element order is not guaranteed):
      -- concat_ws(',', collect_list(cast(lesson_order as string))) as lesson_order_set,
      -- concat_ws(',', collect_list(cast(lesson_valid_rate as string))) as index_amount_set,
      -- Build "NN:rate" strings, sort them, join them with commas,
      -- then strip the "NN:" sort keys so only the rates remain.
      -- Zero-padding makes the string sort match numeric order ("02" < "10");
      -- this assumes lesson_order has at most two digits.
      regexp_replace(
        concat_ws(',',
          sort_array(
            collect_list(
              concat_ws(':',
                case
                  when length(cast(lesson_order as string)) = 1
                    then concat('0', cast(lesson_order as string))
                  else cast(lesson_order as string)
                end,
                cast(lesson_valid_rate as string)
              )
            )
          )
        ),
        '\\d\\d:', ''
      ) as hs
    from
    (
      select
        term,
        kemu,
        course_applicable_user_type,
        lesson_order,
        lesson_valid_rate
      from tmp
    ) t
    group by term, kemu, course_applicable_user_type
  ) t1
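
To see why the query zero-pads lesson_order before sorting, the sketch below runs the sort with and without padding on three invented rows. The sample values, the inline tmp definition, and the aliases unpadded and padded are assumptions for illustration only. It uses lpad as a shorthand for the case expression above, and it assumes a Hive version that allows common table expressions and select without a from clause.

```sql
-- Invented sample: one group with lessons 2, 10 and 1 (rates kept as strings).
with tmp as (
  select 'g1' as term, 'math' as kemu, 'a' as course_applicable_user_type,
         2 as lesson_order, '0.80' as lesson_valid_rate
  union all
  select 'g1', 'math', 'a', 10, '0.70'
  union all
  select 'g1', 'math', 'a', 1, '0.90'
)
select
  -- Plain string sort: "10:0.70" sorts before "1:0.90" because '0' < ':'.
  sort_array(collect_list(
    concat_ws(':', cast(lesson_order as string), lesson_valid_rate)
  )) as unpadded,
  -- Zero-padded keys sort as 01, 02, 10, which matches numeric order.
  sort_array(collect_list(
    concat_ws(':', lpad(cast(lesson_order as string), 2, '0'), lesson_valid_rate)
  )) as padded
from tmp
group by term, kemu, course_applicable_user_type;
```

In the unpadded column lesson 10 lands in front of lessons 1 and 2, so the positional split would put its rate under lesson_order1. The padded column keeps lessons 1, 2, 10 in that order.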