Hive Practice Exercises
1. The log format is:
pin|-|request_tm|-|url|-|sku_id|-|amount
The field delimiter is '|-|'.
Sample data:
Assume the local data file is sample.txt. Load it into table t_sample in Hive's test database, then compute each user's total purchase amount. Show the full procedure, including the table schema.
A first attempt declares the raw delimiter directly (note that Hive's default SerDe only honors a single-character field delimiter, so '|-|' will not actually split the fields correctly):
create external table t_sample
(pin string,
request_tm string,
url string,
sku_id string,
amount string)
row format delimited fields terminated by '|-|';
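As an alternative to cleaning the file, Hive ships a contrib MultiDelimitSerDe (since 0.14) that can parse multi-character delimiters directly. A sketch, assuming the contrib jar is on the classpath (the class was relocated to org.apache.hadoop.hive.serde2 in later Hive versions, so adjust to your release); the table name t_sample_multi is illustrative:

```sql
-- sketch: parse the multi-character '|-|' delimiter directly (hypothetical table name)
create external table t_sample_multi
(pin string,
request_tm string,
url string,
sku_id string,
amount string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="|-|");
```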
First clean the data by converting the delimiter to '\t' and saving the result to the local file jd.txt (note that Hive's split() takes a regular expression, so the delimiter must be escaped there as well, e.g. split(line, '\\|-\\|')).
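The cleaning step happens outside Hive; a minimal sketch with GNU sed (file names as in the text, with a one-line demo input standing in for the real log):

```shell
# demo input: one log line in the original '|-|' format
printf 'u1|-|2021-01-01 12:00:00|-|/home/|-|100|-|9.9\n' > sample.txt
# replace every multi-character '|-|' delimiter with a tab
sed 's/|-|/\t/g' sample.txt > jd.txt
```

After this, jd.txt is tab-delimited and ready for a plain `fields terminated by '\t'` table.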
After cleaning, rebuild the table against the tab-delimited file (drop the earlier definition first, since it reuses the name) and load the cleaned jd.txt rather than the raw sample.txt:
drop table if exists t_sample;
create external table t_sample
(pin string,
request_tm string,
url string,
sku_id int,
amount double)
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/jd.txt' into table t_sample;
select * from t_sample;
select pin, sum(amount) as total_amount from t_sample group by pin;
2. Given an order-detail table ord_det (order_id: order number, sku_id: product ID, sale_qtty: sale quantity, dt: date partition), compute the Top 100 products by sales volume on 2016-01-01, sorted by volume in descending order.
Sample data:
123456 111111 100
234567 222222 200
345678 333333 300
456789 444444 400
567890 555555 500
DDL:
create table ord_det(order_id string,sku_id string,sale_qtty int)
partitioned by (dt string)
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/ord_det.txt' into table ord_det partition(dt='20160101');
select sku_id,sum(sale_qtty) sale_count from ord_det where dt="20160101" group by sku_id order by sale_count desc limit 100;
3. Scenario: analyzing student scores for Beijing (a very large data set).
Score record format: year, school, grade, name, subject, score. Sample data:
2013,北大,1,裘容絮,語文,97
2013,北大,1,慶眠拔,語文,52
2013,北大,1,烏灑籌,語文,85
2012,清華,0,欽堯,英語,61
2015,北理工,3,冼殿,物理,81
2016,北科,4,況飄索,化學,92
2014,北航,2,孔須,數學,70
2012,清華,0,王脊,英語,59
2014,北航,2,方部盾,數學,49
2018,北航,2,東門雹,數學,77
2018,北大,1,裘容絮,語文,97
2018,北大,1,慶眠拔,語文,52
2013,北大,1,烏灑籌,語文,85
2017,清華,0,欽堯,英語,61
2015,北理工,3,冼殿,物理,81
2017,北科,4,況飄索,化學,92
2014,北航,2,孔須,數學,70
2018,清華,0,王脊,英語,59
2014,北航,2,方部盾,數學,49
2014,北航,2,東門雹,數學,77
... ...
Questions:
(1) How should a table storing this data be designed? Write the DDL:
create table score_ori
(year int,
school string,
class string,
name string,
subject string,
score double)
row format delimited fields terminated by ",";
Load the data:
load data local inpath "/opt/module/datas/score.txt" into table score_ori;
Enable dynamic partitioning (nonstrict mode is also required here, because the insert below supplies no static partition):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Create the partitioned table:
create table score_partition
(school string,
class string,
name string,
subject string,
score double)
partitioned by (year string)
row format delimited fields terminated by "\t";
Populate it via query (the dynamic partition column year must come last in the select list):
insert into table score_partition partition (year) select school,class,name,subject,score,year from score_ori;
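To confirm the dynamic-partition insert worked, the generated partitions can be listed; a routine sanity check, not part of the original exercise:

```sql
-- one partition per distinct year in score_ori is expected
show partitions score_partition;
```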
(2) For the current year, select the top three scores in each subject for every school and grade.
select *
from (
    select
        school,
        class,
        subject,
        score,
        row_number() over (partition by school, class, subject order by score desc) rank_code
    from score_partition
    where year = '2018'
) t
where t.rank_code <= 3;
(3) For the current year, find the Tsinghua (清華) grade-1 students whose total score exceeds 200, together with the number of such students.
select
    school, class, name,
    sum(score) as total_score,
    count(1) over (partition by school, class) nct
from score_partition
where year = '2018' and school = '清華' and class = '1'
group by school, class, name
having sum(score) > 200;
4. There is a very large table, TRLOG, with data as follows:
PLATFORM USER_ID CLICK_TIME CLICK_URL
WEB 12332321 2013-03-21 13:48:31.324 /home/
WEB 12332321 2013-03-21 13:48:32.954 /selectcat/er/
WEB 12332321 2013-03-21 13:48:46.365 /er/viewad/12.html
WEB 12332321 2013-03-21 13:48:53.651 /er/viewad/13.html
Create the source table:
CREATE TABLE trlog
(platform string,
user_id int,
click_time string,
click_url string)
row format delimited fields terminated by "\t";
Load the data:
load data local inpath "/opt/module/datas/log.txt" into table trlog;
Target table for the transformed click paths:
CREATE TABLE allog
(platform string,
user_id int,
seq int,
from_url string,
to_url string)
row format delimited fields terminated by "\t";
Populate it via query. Each output row describes one page visit: from_url is the previous page (lag(), NULL for the first click) and to_url is the next page (lead(), defaulting to 'Exit!' for the last click), which matches the result shown below:
insert into table allog
select
    platform,
    user_id,
    row_number() over (partition by user_id order by click_time) seq,
    lag(click_url, 1) over (partition by user_id order by click_time) as from_url,
    lead(click_url, 1, 'Exit!') over (partition by user_id order by click_time) as to_url
from trlog;
Result:
select * from allog;
+-----------+-----------+------+---------------------+---------------------+--+
| platform | user_id | seq | from_url | to_url |
+-----------+-----------+------+---------------------+---------------------+--+
| WEB | 12332321 | 1 | NULL | /selectcat/er/ |
| WEB | 12332321 | 2 | /home/ | /er/viewad/12.html |
| WEB | 12332321 | 3 | /selectcat/er/ | /er/viewad/13.html |
| WEB | 12332321 | 4 | /er/viewad/12.html | Exit! |
+-----------+-----------+------+---------------------+---------------------+--+