Hive Practice Exercises

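1. The log format is as follows:

    pin|-|request_tm|-|url|-|sku_id|-|amount

    The field delimiter is '|-|'.

    Sample data:

    Assume the local data file is sample.txt. Load it into table t_sample in the Hive database test, then compute the total amount spent by each user. Show the full procedure, including the table definition.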

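-- First attempt: declare the table directly on the raw '|-|' delimiter.
-- With the default SerDe, "fields terminated by" honours only a single
-- character, so the columns are not split as intended.
create external table t_sample
(pin string,
request_tm string,
url string,
sku_id string,
amount string)
row format delimited fields terminated by '|-|';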

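    So clean the data first: replace the '|-|' delimiter with a tab ('\t') and write the result to the local file jd.txt. (If the cleaning code uses a regex-based split() method, the '|' characters must be escaped.)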
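The cleaning step can be done with any scripting tool. Below is a minimal Python sketch, assuming the raw file is sample.txt and the cleaned file is jd.txt in the working directory; the function names are hypothetical and not part of the original exercise. Python's str.split takes the separator literally, so no escaping is needed there. The regex variant shows the escaping that split() implementations based on regular expressions (Java's String.split, for example) require.

```python
import re

RAW_DELIM = "|-|"   # delimiter used in the raw log
HIVE_DELIM = "\t"   # single-character delimiter that Hive can split on


def convert_line(line: str) -> str:
    """Replace every '|-|' in one log line with a tab."""
    # str.split treats the separator literally, so '|' needs no escaping here.
    return HIVE_DELIM.join(line.split(RAW_DELIM))


def convert_line_regex(line: str) -> str:
    """Same conversion through a regex; re.escape protects the '|' characters."""
    return re.sub(re.escape(RAW_DELIM), HIVE_DELIM, line)


def convert_file(src: str = "sample.txt", dst: str = "jd.txt") -> None:
    """Rewrite src into dst line by line (both file names are assumptions)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(convert_line(line.rstrip("\n")) + "\n")
```

Calling convert_file() with the two paths produces a tab-delimited copy that matches a table declared with fields terminated by "\t".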

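-- Work in the test database, as the task requires.
use test;

-- Drop any earlier t_sample so the name can be reused.
drop table if exists t_sample;

create external table t_sample
(pin string,
request_tm string,
url string,
sku_id int,
amount double)
row format delimited fields terminated by "\t";

-- The file loaded here must be the cleaned, tab-delimited version of the data.
load data local inpath '/opt/module/datas/sample.txt' into table t_sample;

select * from t_sample;

-- Total amount spent by each user.
select pin,sum(amount) as total_amount from t_sample group by pin;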

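2. The order detail table ord_det has the columns order_id (order number), sku_id (product ID), sale_qtty (quantity sold) and dt (date partition). Compute the top 100 products by sales quantity on January 1, 2016, sorted by quantity in descending order.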

    Sample data (order_id, sku_id, sale_qtty):

123456    111111    100

234567    222222    200

345678    333333    300

456789    444444    400

567890    555555    500

    Table definition, load and query:

create table ord_det(order_id string,sku_id string,sale_qtty int)
partitioned by (dt string)
row format delimited fields terminated by "\t";

load data local inpath '/opt/module/datas/ord_det.txt' into table ord_det partition(dt='20160101');

-- Sum the quantity per product for that day, then keep the 100 largest totals.
select sku_id,sum(sale_qtty) sale_count from ord_det where dt="20160101" group by sku_id order by sale_count desc limit 100;
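To sanity-check the query on a small file before running it on the cluster, the same aggregation can be reproduced locally. This is only a plain-Python sketch of the group by / order by / limit logic using the five sample rows from this exercise; the function name top_skus is hypothetical.

```python
from collections import Counter

# (order_id, sku_id, sale_qtty) rows copied from the sample data of this exercise
rows = [
    ("123456", "111111", 100),
    ("234567", "222222", 200),
    ("345678", "333333", 300),
    ("456789", "444444", 400),
    ("567890", "555555", 500),
]


def top_skus(rows, n=100):
    """Sum sale_qtty per sku_id and return the n largest totals, descending."""
    totals = Counter()
    for _order_id, sku_id, qty in rows:
        totals[sku_id] += qty          # group by sku_id, sum(sale_qtty)
    # order by sale_count desc limit n
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With these rows, top_skus(rows, 3) keeps the products 555555, 444444 and 333333, in that order.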

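3. Scenario: analysing student scores for the city of Beijing (the data volume is very large).

Score record format: year, school, grade, name, subject, score. Sample data: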

2013,北大,1,裘容絮,語文,97

2013,北大,1,慶眠拔,語文,52

2013,北大,1,烏灑籌,語文,85

2012,清華,0,欽堯,英語,61

2015,北理工,3,冼殿,物理,81

2016,北科,4,況飄索,化學,92

2014,北航,2,孔須,數學,70

2012,清華,0,王脊,英語,59

2014,北航,2,方部盾,數學,49

2018,北航,2,東門雹,數學,77

2018,北大,1,裘容絮,語文,97

2018,北大,1,慶眠拔,語文,52

2013,北大,1,烏灑籌,語文,85

2017,清華,0,欽堯,英語,61

2015,北理工,3,冼殿,物理,81

2017,北科,4,況飄索,化學,92

2014,北航,2,孔須,數學,70

2018,清華,0,王脊,英語,59

2014,北航,2,方部盾,數學,49

2014,北航,2,東門雹,數學,77

... ...

Questions:

(1) How would you design the table that stores this data? Write the CREATE TABLE statement:

create table score_ori
(year int,
school string,
class string,
name string,
subject string,
score double)
row format delimited fields terminated by ",";

    Load the data:

load data local inpath "/opt/module/datas/score.txt" into table score_ori;

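    Enable dynamic partitioning (nonstrict mode is needed because the insert below supplies no static partition value):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;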

    Create the partitioned table:

create table score_partition
(school string,
class string,
name string,
subject string,
score double)
partitioned by (year string)
row format delimited fields terminated by "\t";

    Populate it with insert ... select (the partition column year comes last in the select list):

insert into table score_partition partition (year) select school,class,name,subject,score,year from score_ori;
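With dynamic partitioning, Hive takes the partition value from the trailing column of the select list and writes each row under a directory named after it, such as year=2018. The routing can be pictured with a small plain-Python sketch; the function name is hypothetical and the rows are a handful taken from the sample data above.

```python
from collections import defaultdict

# (year, school, class, name, subject, score): a few rows from the sample data
score_ori = [
    (2013, "北大", "1", "裘容絮", "語文", 97.0),
    (2012, "清華", "0", "欽堯", "英語", 61.0),
    (2018, "北航", "2", "東門雹", "數學", 77.0),
    (2018, "北大", "1", "裘容絮", "語文", 97.0),
]


def route_to_partitions(rows):
    """Group rows by partition directory name, mimicking a dynamic-partition insert."""
    partitions = defaultdict(list)
    for year, school, cls, name, subject, score in rows:
        # The select list ends with year, so year decides the target partition;
        # the remaining columns are what gets stored inside it.
        partitions[f"year={year}"].append((school, cls, name, subject, score))
    return dict(partitions)
```

For these four rows the result has three partitions (year=2012, year=2013, year=2018), and the two 2018 rows land together.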

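(2) For this year, select the top three scoring subjects for each school and each grade.

select *
from(
select
school,
class,
subject,
score,
-- row_number() restarts for each (school, class, subject) group,
-- so this keeps the three highest scores within every subject.
row_number() over(partition by school,class,subject order by score desc) rank_code
from score_partition
where year="2018"
) t
where t.rank_code <= 3;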
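The effect of row_number() over (partition by ... order by score desc) followed by a filter on the rank can be sketched in plain Python. This is an illustration of the semantics only, not how Hive executes it; the function name and the rows in the usage note are made up for the example.

```python
from collections import defaultdict


def top_n_per_group(rows, n=3):
    """rows: (school, class, subject, score) tuples.

    Returns (school, class, subject, score, rank_code) for rank_code <= n,
    numbering rows inside each (school, class, subject) group by score, descending.
    """
    groups = defaultdict(list)
    for school, cls, subject, score in rows:
        groups[(school, cls, subject)].append(score)   # partition by

    result = []
    for (school, cls, subject), scores in groups.items():
        ordered = sorted(scores, reverse=True)          # order by score desc
        for rank_code, score in enumerate(ordered, start=1):
            if rank_code > n:                           # where rank_code <= n
                break
            result.append((school, cls, subject, score, rank_code))
    return result
```

Given four hypothetical scores 90, 80, 70 and 60 in one group, the first three are kept with rank_code 1 to 3. Like row_number(), enumerate gives tied scores distinct numbers; rank() or dense_rank() would treat ties differently.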

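(3) For this year, list the grade 1 students at Tsinghua (清華) whose total score is greater than 200, together with the number of such students.

select
school,class,name,
sum(score) as total_score,
-- the window function is evaluated on the grouped, filtered rows,
-- so nct is the number of students who pass the having condition
count(1) over (partition by school,class) nct
from
score_partition
where
year="2018" and school="清華" and class="1"
group by
school,class,name
having
sum(score)>200;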
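The order of operations in this query (sum per student, filter on the total, then count what is left) can be checked with a short plain-Python sketch. The function name and the rows in the usage note are hypothetical.

```python
from collections import defaultdict


def students_over(rows, threshold=200):
    """rows: (name, score) tuples for one school and grade.

    Returns ({name: total_score}, student_count) for totals above threshold.
    """
    totals = defaultdict(int)
    for name, score in rows:
        totals[name] += score                       # group by name, sum(score)
    qualified = {name: total for name, total in totals.items()
                 if total > threshold}              # having sum(score) > threshold
    return qualified, len(qualified)                # count over the filtered rows
```

For example, with made-up rows where one student totals 250 and another 150, only the first is returned and the count is 1.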

4. There is a very large table, TRLOG, with data like the following:

PLATFORM        USER_ID            CLICK_TIME                    CLICK_URL

WEB                12332321        2013-03-21 13:48:31.324        /home/

WEB                12332321        2013-03-21 13:48:32.954        /selectcat/er/

WEB                12332321        2013-03-21 13:48:46.365        /er/viewad/12.html

WEB                12332321        2013-03-21 13:48:53.651        /er/viewad/13.html

    Create the source table:

CREATE TABLE trlog
(platform string,
user_id int,
click_time string,
click_url string)
row format delimited fields terminated by "\t";

    Load the data:

load data local inpath "/opt/module/datas/log.txt" into table trlog;

    Create the result table:

CREATE TABLE allog
(platform string,
user_id int,
seq int,
from_url string,
to_url string)
row format delimited fields terminated by "\t";

    Populate it with insert ... select:

insert into table allog
select
platform,
user_id,
-- number each user's clicks in time order
row_number() over(partition by user_id order by click_time) seq,
-- the previous click of the same user; NULL for the first click
lag(click_url,1) over(partition by user_id order by click_time) as from_url,
click_url as to_url
from
trlog;

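    Result (note that the to_url column shown here matches lead(click_url, 1, 'Exit!') rather than click_url as written in the statement above):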

select * from allog;

+-----------+-----------+------+---------------------+---------------------+--+

| platform  |  user_id  | seq  |      from_url       |       to_url        |

+-----------+-----------+------+---------------------+---------------------+--+

| WEB       | 12332321  | 1    | NULL                | /selectcat/er/      |

| WEB       | 12332321  | 2    | /home/              | /er/viewad/12.html  |

| WEB       | 12332321  | 3    | /selectcat/er/      | /er/viewad/13.html  |

| WEB       | 12332321  | 4    | /er/viewad/12.html  | Exit!               |

+-----------+-----------+------+---------------------+---------------------+--+
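To see how lag and lead shape the from_url and to_url columns, here is a plain-Python sketch over the four sample clicks. It compares two variants: to_url taken from the current click, and to_url taken from the next click with 'Exit!' as the default. The function names are hypothetical; the second variant is the one that reproduces the table above.

```python
# (click_time, click_url) for user 12332321, copied from the sample data
clicks = [
    ("2013-03-21 13:48:31.324", "/home/"),
    ("2013-03-21 13:48:32.954", "/selectcat/er/"),
    ("2013-03-21 13:48:46.365", "/er/viewad/12.html"),
    ("2013-03-21 13:48:53.651", "/er/viewad/13.html"),
]


def with_lag(clicks):
    """(seq, from_url, to_url): from_url = lag(click_url, 1), to_url = click_url."""
    # Fixed-width timestamps sort correctly as plain strings.
    urls = [url for _time, url in sorted(clicks)]
    return [(i + 1, urls[i - 1] if i > 0 else None, urls[i])
            for i in range(len(urls))]


def with_lag_and_lead(clicks, default="Exit!"):
    """(seq, from_url, to_url): from_url = lag(..., 1), to_url = lead(..., 1, default)."""
    urls = [url for _time, url in sorted(clicks)]
    last = len(urls) - 1
    return [(i + 1,
             urls[i - 1] if i > 0 else None,
             urls[i + 1] if i < last else default)
            for i in range(len(urls))]
```

with_lag pairs each click with the one before it, so the first row is (1, None, '/home/'). with_lag_and_lead skips the current click, so the first row is (1, None, '/selectcat/er/') and the last row ends in 'Exit!'.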