Movie Ratings Project (Hive)
Three data files are available:
1. users.dat
Data format: 2::M::56::16::70072
Fields: UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String
Field meanings: user ID, gender, age, occupation, zip code
2. movies.dat
Data format: 2::Jumanji (1995)::Adventure|Children's|Fantasy
Fields: MovieID BigInt, Title String, Genres String
Field meanings: movie ID, movie title, movie genres
3. ratings.dat
Data format: 1::1193::5::978300760
Fields: UserID BigInt, MovieID BigInt, Rating Double, Timestamped String
Field meanings: user ID, movie ID, rating, rating timestamp
-- Show the current database in the prompt
set hive.cli.print.current.db=true;
-- Let Hive run small jobs in local mode automatically
set hive.exec.mode.local.auto=true;
set hive.mapred.mode=nonstrict;
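Requirements:
Data requirements:
(1) Write a shell script to clean the data. (With an ordinary delimited table Hive accepts only a single-character field delimiter, so it can parse ':' but not '::'. Creating the table the ordinary way therefore does not work, and the data needs one simple cleaning pass. Note that after this pass a movie title that itself contains ':' is split into extra fields.)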
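#!/bin/bash
echo "Wait for a moment"
cd /home/movetest/ml-1m || exit 1
# Replace every "::" with ":" in each .dat file, in place
for i in *.dat
do
    echo "$i"
    sed -i "s/::/:/g" "$i"
done
echo "have finished!"
Or:
#!/bin/bash
echo "Wait for a moment"
cd /home/movetest/ml-1m || exit 1
# Rewrite only the files that actually contain "::"
sed -i "s/::/:/g" $(grep -rl "::" ./)
echo "have finished!"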
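Before running the cleaning pass on the real files, the substitution can be tried on a couple of sample rows. The sketch below is only an illustration: `collapse_delimiter` is a hypothetical helper name, and the rows are sample data in the formats described above, not output from the author's environment.

```bash
#!/bin/bash
# Hypothetical helper: rewrite every "::" on stdin as ":".
collapse_delimiter() {
  sed 's/::/:/g'
}

# Illustrative sample rows (one ratings row, one movies row whose title has its own colon).
printf '%s\n' \
  '1::1193::5::978300760' \
  '260::Star Wars: Episode IV - A New Hope (1977)::Action|Adventure' |
  collapse_delimiter |
  awk -F ':' '{ print NF " fields: " $0 }'   # count ":"-separated fields per row
```

The field count makes the side effect visible: the movies row ends up with four ':'-separated fields instead of three, because the colon inside the title is now indistinguishable from the delimiter.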
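(2) Load the data in a form that Hive can parse directly
Note: handle the delimiter when creating the tables
Hive requirements:
1. Create the tables correctly, load the data (three tables, three data files), and verify the result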
create table users(UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties('input.regex'='(.*)::(.*)::(.*)::(.*)::(.*)','output.format.string'='%1$s %2$s %3$s %4$s %5$s')
stored as textfile;
load data local inpath '/home/movetest/ml-1m/users.dat' INTO TABLE users;
select * from users limit 10;

create table movies(MovieID BigInt, Title String, Genres String)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties('input.regex'='(.*)::(.*)::(.*)','output.format.string'='%1$s %2$s %3$s')
stored as textfile;
load data local inpath '/home/movetest/ml-1m/movies.dat' INTO TABLE movies;
select * from movies limit 10;

create table ratings(UserID BigInt, MovieID BigInt, Rating Double, Timestamped String)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties('input.regex'='(.*)::(.*)::(.*)::(.*)','output.format.string'='%1$s %2$s %3$s %4$s')
stored as textfile;
load data local inpath '/home/movetest/ml-1m/ratings.dat' INTO TABLE ratings;
select * from ratings limit 10;
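The input.regex patterns capture each field as one group, so the data files do not have to be modified. As a rough illustration outside Hive, the same three-group pattern can be applied with sed. This is a sketch only: `split_movie_line` is a hypothetical helper, the sample row is illustrative, and it assumes sed's extended regular expressions behave like the Java regex for this simple pattern.

```bash
#!/bin/bash
# Hypothetical helper: apply the three-group movies pattern to each stdin line
# and print the captured groups separated by tabs.
split_movie_line() {
  tab=$(printf '\t')
  sed -E "s/^(.*)::(.*)::(.*)\$/\\1${tab}\\2${tab}\\3/"
}

# The single colon inside the title is left alone; only "::" separates fields.
printf '%s\n' '260::Star Wars: Episode IV - A New Hope (1977)::Action|Adventure' |
  split_movie_line
```

Because the row contains exactly two "::" sequences, the three groups can only match in one way, and the title keeps its own colon.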
2. Find the 10 movies that were rated the most times, with their rating counts (movie title, rating count)
Analysis:
Tables: movies, ratings
Required fields: title, count(userid)
select a.title, count(b.userid) counts
from movies a join ratings b
on a.movieid = b.movieid
group by a.title,b.movieid
order by counts desc
limit 10
;
American Beauty (1999) 3428
Star Wars 2991
Star Wars 2990
Star Wars 2883
Jurassic Park (1993) 2672
Saving Private Ryan (1998) 2653
Terminator 2 2649
Matrix, The (1999) 2590
Back to the Future (1985) 2583
Silence of the Lambs, The (1991) 2578
Time taken: 125.833 seconds, Fetched: 10 row(s)
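3. Find the 10 highest-rated movies among male users and among female users separately (gender, movie title, rating)
Analysis:
Tables: users, movies, ratings
Required fields: gender, title, avg(rating)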
select a.gender,b.title,avg(c.rating) avgs,count(c.rating) counts
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'F'
group by a.gender,b.movieid,b.title
order by avgs desc
limit 10
;
select collect_set(a.gender),collect_set(b.title),avg(c.rating) avgs,count(c.rating) counts
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'F'
group by b.movieid,b.title
having counts >= 60
order by avgs desc
limit 10
;
["F"] ["Close Shave, A (1995)"] 4.644444444444445 180
["F"] ["Wrong Trousers, The (1993)"] 4.588235294117647 238
["F"] ["Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)"] 4.572649572649572 117
["F"] ["Wallace & Gromit"] 4.563106796116505 103
["F"] ["Schindler's List (1993)"] 4.56260162601626 615
["F"] ["Shawshank Redemption, The (1994)"] 4.539074960127592 627
["F"] ["Grand Day Out, A (1992)"] 4.537878787878788 132
["F"] ["To Kill a Mockingbird (1962)"] 4.536666666666667 300
["F"] ["Creature Comforts (1990)"] 4.513888888888889 72
["F"] ["Usual Suspects, The (1995)"] 4.513317191283293 413
The gender and title columns come back as arrays. Improved version, taking the first element of each:
select collect_set(a.gender)[0],collect_set(b.title)[0],avg(c.rating) avgs,count(c.rating) counts
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'F'
group by b.movieid,b.title
having counts >= 60
order by avgs desc
limit 10
;
F Close Shave, A (1995) 4.644444444444445 180
F Wrong Trousers, The (1993) 4.588235294117647 238
F Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) 4.572649572649572 117
F Wallace & Gromit 4.563106796116505 103
F Schindler's List (1993) 4.56260162601626 615
F Shawshank Redemption, The (1994) 4.539074960127592 627
F Grand Day Out, A (1992) 4.537878787878788 132
F To Kill a Mockingbird (1962) 4.536666666666667 300
F Creature Comforts (1990) 4.513888888888889 72
F Usual Suspects, The (1995) 4.513317191283293 413
select collect_set(a.gender)[0],collect_set(b.title)[0],avg(c.rating) avgs,count(c.rating) counts
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'M'
group by b.movieid,b.title
having counts >= 60
order by avgs desc
limit 10
;
M Sanjuro (1962) 4.639344262295082 61
M Godfather, The (1972) 4.583333333333333 1740
M Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) 4.576628352490421 522
M Shawshank Redemption, The (1994) 4.560625 1600
M Raiders of the Lost Ark (1981) 4.520597322348094 1942
M Usual Suspects, The (1995) 4.518248175182482 1370
M Star Wars 4.495307167235495 2344
M Schindler's List (1993) 4.49141503848431 1689
M Paths of Glory (1957) 4.485148514851486 202
M Wrong Trousers, The (1993) 4.478260869565218 644
Combine the two results with union all:
select x.* from(
select collect_set(a.gender)[0],collect_set(b.title)[0],avg(c.rating) avgs
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'F'
group by b.movieid,b.title
having count(c.rating) >= 60
order by avgs desc
limit 10)x
union all
select y.* from(
select collect_set(a.gender)[0],collect_set(b.title)[0],avg(c.rating) avgs
from ratings c
join users a on c.userid = a.userid
join movies b on c.movieid = b.movieid
where a.gender = 'M'
group by b.movieid,b.title
having count(c.rating) >= 60
order by avgs desc
limit 10)y;
M Sanjuro (1962) 4.639344262295082
M Godfather, The (1972) 4.583333333333333
M Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) 4.576628352490421
M Shawshank Redemption, The (1994) 4.560625
M Raiders of the Lost Ark (1981) 4.520597322348094
M Usual Suspects, The (1995) 4.518248175182482
M Star Wars 4.495307167235495
M Schindler's List (1993) 4.49141503848431
M Paths of Glory (1957) 4.485148514851486
M Wrong Trousers, The (1993) 4.478260869565218
F Close Shave, A (1995) 4.644444444444445
F Wrong Trousers, The (1993) 4.588235294117647
F Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) 4.572649572649572
F Wallace & Gromit 4.563106796116505
F Schindler's List (1993) 4.56260162601626
F Shawshank Redemption, The (1994) 4.539074960127592
F Grand Day Out, A (1992) 4.537878787878788
F To Kill a Mockingbird (1962) 4.536666666666667
F Creature Comforts (1990) 4.513888888888889
F Usual Suspects, The (1995) 4.513317191283293
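4. Find the average rating of the movie with movieid = 2116 for each age group (the data contains only 7 distinct age values, so simply group by those 7) (age group, rating)
Analysis:
Tables: users, ratings
Required fields: age, avg(rating)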
select u.age,avg(r.rating)
from ratings r
join users u on u.userid = r.userid
where r.movieid = 2116
group by u.age
order by u.age;
1 3.2941176470588234
18 3.3580246913580245
25 3.436548223350254
35 3.2278481012658227
45 2.8275862068965516
50 3.32
56 3.5
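5. For the woman who likes movies the most (the one with the most ratings), find the 10 movies she rated highest and their average ratings (viewer, movie title, rating)
(From this question on, the queries use the column names uid, mid, sex and zcode and a joined view named film_view; the statements that create them are not shown in this post.)
(1) Find the woman who likes movies the most (the one with the most ratings)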
select a.uid from
(select uid ,count(*)c from film_view where sex='F' group by uid
order by c desc limit 1)a;
+--------+
| a.uid |
+--------+
| 1150 |
+--------+
(2) Find the 10 movies that she rated highest
select u.uid,r.title,r.rating from film_view r
join
(select a.uid from
(select uid ,count(*)c from film_view where sex='F' group by uid
order by c desc limit 1)a)u
on r.uid = u.uid
order by r.rating desc limit 10;
Rewritten as:
select a.uid,r.title,r.rating from film_view r
join
(select uid ,count(*)c from film_view where sex='F' group by uid
order by c desc limit 1)a
on r.uid = a.uid
order by r.rating desc limit 10;
+--------+----------------------------------------------------+-----------+
| u.uid | r.title | r.rating |
+--------+----------------------------------------------------+-----------+
| 1150 | Close Shave, A (1995) | 5.0 |
| 1150 | Night on Earth (1991) | 5.0 |
| 1150 | Trust (1990) | 5.0 |
| 1150 | Rear Window (1954) | 5.0 |
| 1150 | Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) | 5.0 |
| 1150 | Being John Malkovich (1999) | 5.0 |
| 1150 | Roger & Me (1989) | 5.0 |
| 1150 | It Happened One Night (1934) | 5.0 |
| 1150 | Crying Game, The (1992) | 5.0 |
| 1150 | Duck Soup (1933) | 5.0 |
+--------+----------------------------------------------------+-----------+
(3) Find the average rating of those 10 movies (viewer, movie title, rating)
-- Large table joined to small table, elapsed time: 188 s
select aa.uid,bb.* from
(select f.title,avg(f.rating)avgrate from film_view f
group by f.title)bb
join
(select u.uid,r.title,r.rating from film_view r
join
(select a.uid from
(select uid ,count(*)c from film_view where sex='F' group by uid
order by c desc limit 1)a)u
on r.uid = u.uid
order by r.rating desc limit 10)aa
on aa.title = bb.title;
+---------+----------------------------------------------------+---------------------+
| aa.uid | bb.title | bb.avgrate |
+---------+----------------------------------------------------+---------------------+
| 1150 | Being John Malkovich (1999) | 4.125390450691656 |
| 1150 | Close Shave, A (1995) | 4.52054794520548 |
| 1150 | Crying Game, The (1992) | 3.7314890154597236 |
| 1150 | Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) | 4.4498902706656915 |
| 1150 | Duck Soup (1933) | 4.21043771043771 |
| 1150 | It Happened One Night (1934) | 4.280748663101604 |
| 1150 | Night on Earth (1991) | 3.747422680412371 |
| 1150 | Rear Window (1954) | 4.476190476190476 |
| 1150 | Roger & Me (1989) | 4.0739348370927315 |
| 1150 | Trust (1990) | 4.188888888888889 |
+---------+----------------------------------------------------+---------------------+
-- Small table joined to large table, elapsed time: 236 s, same result
select aa.uid,bb.* from
(select u.uid,r.title,r.rating from film_view r
join
(select a.uid from
(select uid ,count(*)c from film_view where sex='F' group by uid
order by c desc limit 1)a)u
on r.uid = u.uid
order by r.rating desc limit 10)aa
join
(select f.title,avg(f.rating)avgrate from film_view f
group by f.title)bb
on aa.title = bb.title;
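6. Find the 10 best movies of the year that has the most good movies (rating >= 4.0)
(1) Extract the release year; it sits inside the last 6 characters of the title, for example "(1995)"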
select mid,title,substring(title,-5,4)year from movies limit 5;
+------+-------+
| mid | _c1 |
+------+-------+
| 1 | 1995 |
| 2 | 1995 |
| 3 | 1995 |
| 4 | 1995 |
| 5 | 1995 |
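The expression substring(title,-5,4) starts 5 characters from the end of the title and takes 4 characters, which skips the closing parenthesis. The same window can be reproduced outside Hive as a quick check. This is a minimal sketch: `extract_year` is a hypothetical helper, and it assumes every title ends with "(YYYY)" with no trailing whitespace.

```bash
#!/bin/bash
# Hypothetical helper: print the 4 characters that precede the final ")" of each line.
# awk's substr is 1-based, so the window starts at length - 4.
extract_year() {
  awk '{ print substr($0, length($0) - 4, 4) }'
}

# Titles taken from the samples shown in this post.
printf '%s\n' 'Jumanji (1995)' 'Erin Brockovich (2000)' | extract_year
```

If a title carried a trailing space, the window would shift by one character and no longer return a clean year, so it can be worth checking a few extracted values before grouping on them.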
(2) Join the movies and ratings tables
create view moive_6_v as
select r.rating,m.* from ratings r
join
(select mid,title,substring(title,-5,4)year from movies)m
on r.mid = m.mid;
select * from moive_6_v limit 5;
+-----------+--------+-----------------------------------------+---------+
| r.rating | m.mid | m.title | m.year |
+-----------+--------+-----------------------------------------+---------+
| 5.0 | 1193 | One Flew Over the Cuckoo's Nest (1975) | 1975 |
| 3.0 | 661 | James and the Giant Peach (1996) | 1996 |
| 3.0 | 914 | My Fair Lady (1964) | 1964 |
| 4.0 | 3408 | Erin Brockovich (2000) | 2000 |
| 5.0 | 2355 | Bug's Life, A (1998) | 1998 |
+-----------+--------+-----------------------------------------+---------+
(3) Find the year with the most movies whose average rating is at least 4
create view moive_6_v_a as
select f.year,f.title,avg(f.rating) avgr from moive_6_v f
group by f.year,f.title;
select m.year,count(*)n from moive_6_v_a m
where m.avgr >= 4
group by m.year
order by n desc
limit 5;
+---------+-----+
| m.year | n |
+---------+-----+
| 1998 | 27 |
| 1995 | 25 |
| 1996 | 24 |
| 1999 | 20 |
| 1994 | 20 |
+---------+-----+
(4) Find the 10 best movies of that year
select rr.title,rr.year,rr.avgrate,rr.cc from
(select mm.title,mm.year,avg(rating)avgrate,count(*)cc
from
(select r.rating,m.* from ratings r
join
(select mid,title,substring(title,-5,4)year from movies)m
on r.mid = m.mid)mm
group by mm.year,mm.title having cc >=50)rr
join
(select m.year,count(*)n from moive_6_v_a m
where m.avgr >= 4
group by m.year
order by n desc
limit 1)yy
on rr.year = yy.year
order by rr.avgrate desc
limit 10;
+---------------------------------------------+----------+---------------------+--------+
| rr.title | rr.year | rr.avgrate | rr.cc |
+---------------------------------------------+----------+---------------------+--------+
| Saving Private Ryan (1998) | 1998 | 4.337353938937053 | 2653 |
| Celebration, The (Festen) (1998) | 1998 | 4.3076923076923075 | 117 |
| Central Station (Central do Brasil) (1998) | 1998 | 4.283720930232558 | 215 |
| 42 Up (1998) | 1998 | 4.2272727272727275 | 88 |
| American History X (1998) | 1998 | 4.2265625 | 640 |
| Run Lola Run (Lola rennt) (1998) | 1998 | 4.224813432835821 | 1072 |
| Shakespeare in Love (1998) | 1998 | 4.127479949345715 | 2369 |
| After Life (1998) | 1998 | 4.088235294117647 | 102 |
| Get Real (1998) | 1998 | 4.088235294117647 | 68 |
| Elizabeth (1998) | 1998 | 4.029850746268656 | 938 |
+---------------------------------------------+----------+---------------------+--------+
7. Among the movies released in 1997, find the 10 highest-rated Comedy movies
(1) Find the movies released in 1997
select title,rating,genres from film_view
where substring(title,-5,4)=1997
limit 10;
(2) Find the Comedy movies released in 1997
select title,rating,genres from film_view
where substring(title,-5,4)=1997 and
(lcase(genres) like '%comedy%')
limit 10;
+---------------------------------------+---------+------------------------------------------------+
| title | rating | genres |
+---------------------------------------+---------+------------------------------------------------+
| Hercules (1997) | 4.0 | Adventure|Animation|Children's|Comedy|Musical |
| As Good As It Gets (1997) | 5.0 | Comedy|Drama |
| Full Monty, The (1997) | 2.0 | Comedy |
| Beverly Hills Ninja (1997) | 3.0 | Action|Comedy |
| Men in Black (1997) | 3.0 | Action|Adventure|Comedy|Sci-Fi |
| Liar Liar (1997) | 3.0 | Comedy |
| Love and Death on Long Island (1997) | 3.0 | Comedy|Drama |
| Grosse Pointe Blank (1997) | 3.0 | Comedy|Crime |
| Men in Black (1997) | 4.0 | Action|Adventure|Comedy|Sci-Fi |
| Billy's Hollywood Screen Kiss (1997) | 4.0 | Comedy|Romance |
+---------------------------------------+---------+------------------------------------------------+
(3) Take the 10 highest rated
select mm.* ,f.genres from
(select m.title,avg(m.rating)avgrate,count(*)cc from
(select title,rating,genres from film_view
where substring(title,-5,4)=1997 and
(lcase(genres) like '%comedy%'))m
group by m.title having cc >= 50
order by avgrate desc
limit 10)mm
join movies f
on mm.title = f.title
order by mm.avgrate desc;
+----------------------------------------------------+---------------------+--------+---------------------------------+
| mm.title | mm.avgrate | mm.cc | f.genres |
+----------------------------------------------------+---------------------+--------+---------------------------------+
| Life Is Beautiful (La Vita è bella) (1997) | 4.329861111111111 | 1152 | Comedy|Drama |
| Big One, The (1997) | 4.0 | 102 | Comedy|Documentary |
| As Good As It Gets (1997) | 3.9501404494382024 | 1424 | Comedy|Drama |
| Full Monty, The (1997) | 3.872393661384487 | 1199 | Comedy |
| My Life in Pink (Ma vie en rose) (1997) | 3.825870646766169 | 201 | Comedy|Drama |
| Grosse Pointe Blank (1997) | 3.813380281690141 | 1136 | Comedy|Crime |
| Men in Black (1997) | 3.739952718676123 | 2538 | Action|Adventure|Comedy|Sci-Fi |
| Austin Powers: International Man of Mystery (1997) | 3.7103734439834026 | 1205 | Comedy |
| Billy's Hollywood Screen Kiss (1997) | 3.6710526315789473 | 76 | Comedy|Romance |
| Liar Liar (1997) | 3.5 | 666 | Comedy |
+----------------------------------------------------+---------------------+--------+---------------------------------+
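8. Find the 5 highest-rated movies in each genre in this rating database (genre, movie title, average rating)
Difficulty: taking the top 5 within each genre
(1) Explode the genres column into one row per genre
Create a new movies table: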
create table newmovies(mid int, title string,genres array<string>)row format
delimited fields terminated by '\t' collection items terminated by ','stored as textfile;
Insert the data:
insert into table newmovies select mid,title,split(genres,'\\|') from movies;
Explode:
create table nnmovies(mid int, title string, genres string)row
format delimited fields terminated by '\t';
insert into table nnmovies select mid, title, tpf.key from newmovies t
lateral view explode(t.genres) tpf as key;
(Exploding a map column: select id,name, tpf.mykey as key, tpf.myvalue as value
from cdt t lateral view explode(t.piaofang) tpf as mykey, myvalue;)
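What split plus lateral view explode does to one movie row can be pictured with a small shell equivalent: one input row with a '|'-separated genres field becomes one output row per genre. This is a sketch for intuition only; `explode_genres` is a hypothetical helper and the tab-separated layout is an assumption that mirrors the nnmovies table above.

```bash
#!/bin/bash
# Hypothetical helper: read tab-separated (mid, title, genres) rows and
# emit one (mid, title, genre) row for every "|"-separated genre.
explode_genres() {
  awk -F '\t' '{
    n = split($3, g, "[|]")          # bracket expression keeps "|" literal
    for (i = 1; i <= n; i++) print $1 "\t" $2 "\t" g[i]
  }'
}

# Sample row in the movies.dat format described at the top of this post.
printf '%s\t%s\t%s\n' 2 'Jumanji (1995)' "Adventure|Children's|Fantasy" | explode_genres
```

A movie with three genres therefore contributes to three genre groups in the later averages, which is why the same title can appear under both Action and Adventure in the results.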
(2) Join into a view
create view film_view3 as
(select r.*,m.title,m.genres
from ratings r
join nnmovies m on r.mid = m.mid);
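(3) The 5 highest-rated movies in each genre (genre, movie title, average rating)
<1> Create a view with the average rating of each movie within each genre
create view movie_rate as select a.mid,a.title,a.genres,avg(rating)rate
from film_view3 a group by a.genres,a.mid,a.title;
<2> Use the row_number function to number the movies within each genre
create view movie_rate_order as
select t.*,row_number() over (distribute by genres sort by rate desc) rn
from movie_rate t order by t.genres,t.rate desc;
<3> Use the per-group row number to keep the top 5 (only 10 result rows are shown)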
select m.* from movie_rate_order m where rn <6
order by m.genres,m.rate desc limit 10;
+--------+----------------------------------------------------+------------+--------------------+-------+
| m.mid | m.title | m.genres | m.rate | m.rn |
+--------+----------------------------------------------------+------------+--------------------+-------+
| 2905 | Sanjuro (1962) | Action | 4.608695652173913 | 1 |
| 2019 | Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) | Action | 4.560509554140127 | 2 |
| 858 | Godfather, The (1972) | Action | 4.524966261808367 | 3 |
| 1198 | Raiders of the Lost Ark (1981) | Action | 4.477724741447892 | 4 |
| 260 | Star Wars: Episode IV - A New Hope (1977) | Action | 4.453694416583082 | 5 |
| 3172 | Ulysses (Ulisse) (1954) | Adventure | 5.0 | 1 |
| 2905 | Sanjuro (1962) | Adventure | 4.608695652173913 | 2 |
| 1198 | Raiders of the Lost Ark (1981) | Adventure | 4.477724741447892 | 3 |
| 260 | Star Wars: Episode IV - A New Hope (1977) | Adventure | 4.453694416583082 | 4 |
| 1204 | Lawrence of Arabia (1962) | Adventure | 4.401925391095066 | 5 |
+--------+----------------------------------------------------+------------+--------------------+-------+
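The row_number() over (distribute by ... sort by ... desc) pattern amounts to: sort rows by group and by score descending, restart a counter whenever the group changes, and keep rows whose counter is at most N. A minimal shell sketch of that idea follows; `top_n_per_group` is a hypothetical helper, and the input layout (group, title, score separated by tabs) is an assumption made for the example. The sample values are copied from the result table above.

```bash
#!/bin/bash
# Hypothetical helper: keep the top N rows per group from tab-separated
# (group, title, score) input and append the per-group row number.
top_n_per_group() {
  n=$1
  tab=$(printf '\t')
  LC_ALL=C sort -t "$tab" -k1,1 -k3,3nr |
    awk -F '\t' -v n="$n" '{
      if ($1 != prev) { rn = 0; prev = $1 }   # new group: restart numbering
      rn++
      if (rn <= n) print $0 "\t" rn
    }'
}

printf '%s\t%s\t%s\n' \
  Action 'Raiders of the Lost Ark (1981)' 4.477724741447892 \
  Adventure 'Lawrence of Arabia (1962)' 4.401925391095066 \
  Action 'Sanjuro (1962)' 4.608695652173913 \
  Adventure 'Ulysses (Ulisse) (1954)' 5.0 \
  Action 'Godfather, The (1972)' 4.524966261808367 |
  top_n_per_group 2
```

Changing the first sort key changes what "per group" means, which is the same choice as the column named in distribute by.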
9. Find the highest-rated movie genre of each year (year, genre, rating)
(1) Create a view that carries the year and the genre
create view movie_y_g as
(select r.*,m.title,m.genres,substring(m.title,-5,4)year
from ratings r
join nnmovies m on r.mid = m.mid);
(2) Create the rating view
create view movie_y_g_r as
select m.year,m.genres,avg(m.rating)rate,count(*)cc from movie_y_g m
group by m.year,m.genres having cc >= 50
order by m.year,rate desc;
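(3) Add row_number to the (year, genre) rows. Note that the window below is partitioned by genres, so the result lists the best year for each genre; partitioning by year instead would give the best genre for each year.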
create view movie_y_g_r_l as
select f.*,row_number() over(distribute by genres sort by rate desc)rn
from movie_y_g_r f order by f.genres,f.rate desc;
(4) Take the first row of each group
select mm.* from movie_y_g_r_l mm
where mm.rn < 2
order by mm.year;
+----------+--------------+---------------------+--------+--------+
| mm.year | mm.genres | mm.rate | mm.cc | mm.rn |
+----------+--------------+---------------------+--------+--------+
| 1927 | Comedy | 4.368932038834951 | 206 | 1 |
| 1931 | Drama | 4.387453874538745 | 271 | 1 |
| 1939 | Children's | 4.182008368200837 | 1912 | 1 |
| 1941 | Film-Noir | 4.395973154362416 | 1043 | 1 |
| 1942 | Romance | 4.412822049131217 | 1669 | 1 |
| 1949 | Mystery | 4.452083333333333 | 480 | 1 |
| 1949 | Thriller | 4.452083333333333 | 480 | 1 |
| 1952 | Musical | 4.2836218375499335 | 751 | 1 |
| 1961 | Western | 4.404651162790698 | 215 | 1 |
| 1962 | Adventure | 4.3997821350762525 | 918 | 1 |
| 1963 | Sci-Fi | 4.334664005322688 | 1503 | 1 |
| 1963 | War | 4.425109064469219 | 2063 | 1 |
| 1972 | Crime | 4.4660907127429805 | 2315 | 1 |
| 1974 | Horror | 4.021985343104597 | 1501 | 1 |
| 1977 | Fantasy | 4.453694416583082 | 2991 | 1 |
| 1977 | Action | 4.303571428571429 | 3584 | 1 |
| 1981 | Documentary | 4.274193548387097 | 62 | 1 |
| 1993 | Animation | 4.0367534456355285 | 1306 | 1 |
+----------+--------------+---------------------+--------+--------+
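10. Find the highest-rated movie in each region (zip code) and store the result in HDFS (region, movie title, rating)
(1) Inner-join the ratings, users and movies tables and create a view for later use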
create view film_view2 as
(select r.*,u.zcode,m.title,m.genres
from ratings r
join users u on r.uid = u.uid
join movies m on r.mid = m.mid);
(2) Compute the average rating by region and movie title
create view movie_z_r as
select m.zcode,m.title,avg(m.rating)rate,count(*)cc
from film_view2 m
group by m.zcode,m.title having cc >= 5
order by m.zcode,rate desc;
(3) Add row numbers
create view movie_z_r_l as
select f.*,row_number() over(distribute by zcode sort by rate desc)rn
from movie_z_r f
order by f.zcode,f.rate desc;
(4) Take the top row of each region
create view movie_z_r_l_m as
select * from movie_z_r_l
where rn < 2
order by zcode;
+----------------------+--------------------------------------------+---------------------+-------------------+-------------------+
| movie_z_r_l_m.zcode | movie_z_r_l_m.title | movie_z_r_l_m.rate | movie_z_r_l_m.cc | movie_z_r_l_m.rn |
+----------------------+--------------------------------------------+---------------------+-------------------+-------------------+
| 01002 | Star Wars: Episode IV - A New Hope (1977) | 4.4 | 5 | 1 |
| 01060 | American Beauty (1999) | 4.8 | 5 | 1 |
| 02115 | Shawshank Redemption, The (1994) | 4.8 | 5 | 1 |
| 02134 | Star Wars: Episode IV - A New Hope (1977) | 4.6 | 5 | 1 |
| 02135 | Princess Bride, The (1987) | 4.6 | 5 | 1 |
+----------------------+--------------------------------------------+---------------------+-------------------+-------------------+
(5) Store the result in HDFS
insert overwrite directory '/movie/' select * from movie_z_r_l_m;