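Analyzing One Million Movie Ratings with pandas
阿新 • Published: 2019-02-12
Analyzing Movie Data with pandas
Life is short, use Python.
This post uses Python for data analysis. pandas is one of the key Python packages for data analysis; other important ones include numpy and matplotlib.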
Install pandas (the command is the same on Linux, Mac, and Windows):
pip install pandas
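Download the data archive and unzip it. It contains the following four files:
- users.dat: user data
- movies.dat: movie data
- ratings.dat: rating data
- README: description of the files
The README describes the format of the source files:
- users.dat (UserID::Gender::Age::Occupation::Zip-code)
- movies.dat (MovieID::Title::Genres)
- ratings.dat (UserID::MovieID::Rating::Timestamp)
Notes: Occupation is the user's occupation, Zip-code is the postal code, Timestamp is the time of the rating, and Genres lists the movie's genres (see the README for more details).
Fields within each record are separated by ::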
Environment:
- OS: Windows
- Language: Python 3.4
- Editor: Jupyter
Read the data with pandas.
Import the required packages:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
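Read the data. Define the column names first, because the source files have no header row, only records whose fields are separated by '::'.
user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']  # column names for the users table
Load the data, adjusting the path to wherever your files are stored.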
users = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\users.dat', sep='::', header=None, names=user_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
if __name__ == '__main__':
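The warning above can be ignored. Because '::' is a multi-character separator, pandas treats it as a regular expression and loads the data with the Python engine instead of the C engine. As the message says, passing engine='python' avoids the warning (search online for more details).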
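As a minimal sketch of avoiding that warning, the snippet below passes engine='python' explicitly. It reads the first five user records from an in-memory string rather than the real file so it runs anywhere. The dtype for zip is an addition of this sketch, not part of the original code; it is meant to keep leading zeros in zip codes such as 02460.

```python
import io
import pandas as pd

# First five records of users.dat, inlined so the sketch runs without the file.
sample = """1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
"""

user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']

users = pd.read_csv(
    io.StringIO(sample),
    sep='::',             # multi-character separator, parsed as a regex
    engine='python',      # stated explicitly, so no ParserWarning is raised
    header=None,
    names=user_names,
    dtype={'zip': str},   # assumption of this sketch: keep zip codes as text
)
print(users)
```

To read the real file, replace io.StringIO(sample) with the path to users.dat.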
Check how many records there are,
and show the first five rows.
print(len(users))
users.head()
6040
index | user_id | gender | age | occupation | zip |
---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 |
1 | 2 | M | 56 | 16 | 70072 |
2 | 3 | M | 25 | 15 | 55117 |
3 | 4 | M | 45 | 7 | 02460 |
4 | 5 | M | 25 | 20 | 55455 |
Read the movies and ratings data in the same way.
ratings_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\ratings.dat', sep='::', header=None, names=ratings_names)
movies_names = ['movie_id', 'title', 'genres']
movies = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\movies.dat', sep='::', header=None, names=movies_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
from ipykernel import kernelapp as app
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
Loading takes a little while because the ratings file has about one million rows.
Take a look at the ratings and movies tables.
print(len(ratings))
ratings.head()
1000209
index | user_id | movie_id | rating | timestamp |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
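The timestamp column is easier to read as a date. The sketch below converts the first rating's timestamp, assuming the values are seconds since the Unix epoch in UTC (check the README to confirm); the variable names are illustrative only.

```python
from datetime import datetime, timezone

# Timestamp of the first row in ratings (user 1, movie 1193).
ts = 978300760

# Assumption: the value counts seconds since 1970-01-01 00:00:00 UTC.
rated_at = datetime.fromtimestamp(ts, tz=timezone.utc)
print(rated_at.isoformat())  # 2000-12-31T22:12:40+00:00
```

For the whole column, pd.to_datetime(ratings['timestamp'], unit='s') performs the same conversion in one call.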
print(len(movies))
movies.head()
3883
index | movie_id | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Animation|Children’s|Comedy |
1 | 2 | Jumanji (1995) | Adventure|Children’s|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy|Drama |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
There are just over one million ratings.
Merge the three tables into a single table, data.
data = pd.merge(pd.merge(users, ratings), movies)
print(len(data))
data.head()
1000209
index | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1 | 2 | M | 56 | 16 | 70072 | 1193 | 5 | 978298413 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
2 | 12 | M | 25 | 12 | 32793 | 1193 | 4 | 978220179 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
3 | 15 | M | 25 | 7 | 22903 | 1193 | 4 | 978199279 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
4 | 17 | M | 50 | 1 | 95350 | 1193 | 5 | 978158471 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
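pd.merge with no key arguments joins on whatever column names the two tables share (user_id for users and ratings, then movie_id for the result and movies). A small sketch with made-up frames shows the same join with the keys spelled out. The toy data and the explicit on= and how= arguments are illustrative, not from the original code.

```python
import pandas as pd

# Tiny stand-ins for the three tables (hypothetical data).
users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
ratings = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [10, 20, 10],
    'rating': [5, 3, 4],
})
movies = pd.DataFrame({'movie_id': [10, 20], 'title': ['Movie A', 'Movie B']})

# Same join as pd.merge(pd.merge(users, ratings), movies), with explicit keys.
data = (
    users.merge(ratings, on='user_id', how='inner')
         .merge(movies, on='movie_id', how='inner')
)
print(data)
```

Because every rating refers to a known user and a known movie, the merged table has one row per rating, which is why len(data) matches len(ratings) in the post.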
Show all the ratings given by the user with id 1.
data[data.user_id==1]
index | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1725 | 1 | F | 1 | 10 | 48067 | 661 | 3 | 978302109 | James and the Giant Peach (1996) | Animation|Children’s|Musical |
2250 | 1 | F | 1 | 10 | 48067 | 914 | 3 | 978301968 | My Fair Lady (1964) | Musical|Romance |
2886 | 1 | F | 1 | 10 | 48067 | 3408 | 4 | 978300275 | Erin Brockovich (2000) | Drama |
4201 | 1 | F | 1 | 10 | 48067 | 2355 | 5 | 978824291 | Bug’s Life, A (1998) | Animation|Children’s|Comedy |
5904 | 1 | F | 1 | 10 | 48067 | 1197 | 3 | 978302268 | Princess Bride, The (1987) | Action|Adventure|Comedy|Romance |
8222 | 1 | F | 1 | 10 | 48067 | 1287 | 5 | 978302039 | Ben-Hur (1959) | Action|Adventure|Drama |
8926 | 1 | F | 1 | 10 | 48067 | 2804 | 5 | 978300719 | Christmas Story, A (1983) | Comedy|Drama |
10278 | 1 | F | 1 | 10 | 48067 | 594 | 4 | 978302268 | Snow White and the Seven Dwarfs (1937) | Animation|Children’s|Musical |
11041 | 1 | F | 1 | 10 | 48067 | 919 | 4 | 978301368 | Wizard of Oz, The (1939) | Adventure|Children’s|Drama|Musical |
12759 | 1 | F | 1 | 10 | 48067 | 595 | 5 | 978824268 | Beauty and the Beast (1991) | Animation|Children’s|Musical |
13819 | 1 | F | 1 | 10 | 48067 | 938 | 4 | 978301752 | Gigi (1958) | Musical |
14006 | 1 | F | 1 | 10 | 48067 | 2398 | 4 | 978302281 | Miracle on 34th Street (1947) | Drama |
14386 | 1 | F | 1 | 10 | 48067 | 2918 | 4 | 978302124 | Ferris Bueller’s Day Off (1986) | Comedy |
15859 | 1 | F | 1 | 10 | 48067 | 1035 | 5 | 978301753 | Sound of Music, The (1965) | Musical |
16741 | 1 | F | 1 | 10 | 48067 | 2791 | 4 | 978302188 | Airplane! (1980) | Comedy |
18472 | 1 | F | 1 | 10 | 48067 | 2687 | 3 | 978824268 | Tarzan (1999) | Animation|Children’s |
18914 | 1 | F | 1 | 10 | 48067 | 2018 | 4 | 978301777 | Bambi (1942) | Animation|Children’s |
19503 | 1 | F | 1 | 10 | 48067 | 3105 | 5 | 978301713 | Awakenings (1990) | Drama |
20183 | 1 | F | 1 | 10 | 48067 | 2797 | 4 | 978302039 | Big (1988) | Comedy|Fantasy |
21674 | 1 | F | 1 | 10 | 48067 | 2321 | 3 | 978302205 | Pleasantville (1998) | Comedy |
22832 | 1 | F | 1 | 10 | 48067 | 720 | 3 | 978300760 | Wallace & Gromit: The Best of Aardman Animatio… | Animation |
23270 | 1 | F | 1 | 10 | 48067 | 1270 | 5 | 978300055 | Back to the Future (1985) | Comedy|Sci-Fi |
25853 | 1 | F | 1 | 10 | 48067 | 527 | 5 | 978824195 | Schindler’s List (1993) | Drama|War |
28157 | 1 | F | 1 | 10 | 48067 | 2340 | 3 | 978300103 | Meet Joe Black (1998) | Romance |
28501 | 1 | F | 1 | 10 | 48067 | 48 | 5 | 978824351 | Pocahontas (1995) | Animation|Children’s|Musical|Romance |
28883 | 1 | F | 1 | 10 | 48067 | 1097 | 4 | 978301953 | E.T. the Extra-Terrestrial (1982) | Children’s|Drama|Fantasy|Sci-Fi |
31152 | 1 | F | 1 | 10 | 48067 | 1721 | 4 | 978300055 | Titanic (1997) | Drama|Romance |
32698 | 1 | F | 1 | 10 | 48067 | 1545 | 4 | 978824139 | Ponette (1996) | Drama |
32771 | 1 | F | 1 | 10 | 48067 | 745 | 3 | 978824268 | Close Shave, A (1995) | Animation|Comedy|Thriller |
33428 | 1 | F | 1 | 10 | 48067 | 2294 | 4 | 978824291 | Antz (1998) | Animation|Children’s |
34073 | 1 | F | 1 | 10 | 48067 | 3186 | 4 | 978300019 | Girl, Interrupted (1999) | Drama |
34504 | 1 | F | 1 | 10 | 48067 | 1566 | 4 | 978824330 | Hercules (1997) | Adventure|Animation|Children’s|Comedy|Musical |
34973 | 1 | F | 1 | 10 | 48067 | 588 | 4 | 978824268 | Aladdin (1992) | Animation|Children’s|Comedy|Musical |
36324 | 1 | F | 1 | 10 | 48067 | 1907 | 4 | 978824330 | Mulan (1998) | Animation|Children’s |
36814 | 1 | F | 1 | 10 | 48067 | 783 | 4 | 978824291 | Hunchback of Notre Dame, The (1996) | Animation|Children’s|Musical |
37204 | 1 | F | 1 | 10 | 48067 | 1836 | 5 | 978300172 | Last Days of Disco, The (1998) | Drama |
37339 | 1 | F | 1 | 10 | 48067 | 1022 | 5 | 978300055 | Cinderella (1950) | Animation|Children’s|Musical |
37916 | 1 | F | 1 | 10 | 48067 | 2762 | 4 | 978302091 | Sixth Sense, The (1999) | Thriller |
40375 | 1 | F | 1 | 10 | 48067 | 150 | 5 | 978301777 | Apollo 13 (1995) | Drama |
41626 | 1 | F | 1 | 10 | 48067 | 1 | 5 | 978824268 | Toy Story (1995) | Animation|Children’s|Comedy |
43703 | 1 | F | 1 | 10 | 48067 | 1961 | 5 | 978301590 | Rain Man (1988) | Drama |
45033 | 1 | F | 1 | 10 | 48067 | 1962 | 4 | 978301753 | Driving Miss Daisy (1989) | Drama |
45685 | 1 | F | 1 | 10 | 48067 | 2692 | 4 | 978301570 | Run Lola Run (Lola rennt) (1998) | Action|Crime|Romance |
46757 | 1 | F | 1 | 10 | 48067 | 260 | 4 | 978300760 | Star Wars: Episode IV - A New Hope (1977) | Action|Adventure|Fantasy|Sci-Fi |
49748 | 1 | F | 1 | 10 | 48067 | 1028 | 5 | 978301777 | Mary Poppins (1964) | Children’s|Comedy|Musical |
50759 | 1 | F | 1 | 10 | 48067 | 1029 | 5 | 978302205 | Dumbo (1941) | Animation|Children’s|Musical |
51327 | 1 | F | 1 | 10 | 48067 | 1207 | 4 | 978300719 | To Kill a Mockingbird (1962) | Drama |
52255 | 1 | F | 1 | 10 | 48067 | 2028 | 5 | 978301619 | Saving Private Ryan (1998) | Action|Drama|War |
54908 | 1 | F | 1 | 10 | 48067 | 531 | 4 | 978302149 | Secret Garden, The (1993) | Children’s|Drama |
55246 | 1 | F | 1 | 10 | 48067 | 3114 | 4 | 978302174 | Toy Story 2 (1999) | Animation|Children’s|Comedy |
56831 | 1 | F | 1 | 10 | 48067 | 608 | 4 | 978301398 | Fargo (1996) | Crime|Drama|Thriller |
59344 | 1 | F | 1 | 10 | 48067 | 1246 | 4 | 978302091 | Dead Poets Society (1989) | Drama |
Mean rating of each movie, by gender.
mean_ratings_by_gender = data.pivot_table(values='rating',index='title',columns='gender', aggfunc='mean')
mean_ratings_by_gender.head(10)  # show the first 10 rows
title | F | M |
---|---|---|
$1,000,000 Duck (1971) | 3.375000 | 2.761905 |
‘Night Mother (1986) | 3.388889 | 3.352941 |
‘Til There Was You (1997) | 2.675676 | 2.733333 |
‘burbs, The (1989) | 2.793478 | 2.962085 |
…And Justice for All (1979) | 3.828571 | 3.689024 |
1-900 (1994) | 2.000000 | 3.000000 |
10 Things I Hate About You (1999) | 3.646552 | 3.311966 |
101 Dalmatians (1961) | 3.791444 | 3.500000 |
101 Dalmatians (1996) | 3.240000 | 2.911215 |
12 Angry Men (1957) | 4.184397 | 4.328421 |
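To see what pivot_table is doing here, the sketch below builds the same kind of table from a made-up five-row frame and compares it with an equivalent groupby followed by unstack. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical ratings: two movies rated by women (F) and men (M).
data = pd.DataFrame({
    'title':  ['Movie A', 'Movie A', 'Movie A', 'Movie B', 'Movie B'],
    'gender': ['F', 'F', 'M', 'F', 'M'],
    'rating': [4, 5, 3, 2, 4],
})

# One row per title, one column per gender, each cell the mean rating.
pivot = data.pivot_table(values='rating', index='title',
                         columns='gender', aggfunc='mean')

# The same result: group by both keys, average, then move gender to columns.
grouped = data.groupby(['title', 'gender'])['rating'].mean().unstack('gender')

print(pivot)
```

Movie A has female ratings 4 and 5, so its F cell is 4.5; its single male rating of 3 gives 3.0 in the M cell.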
Add a column to mean_ratings_by_gender holding the difference between the female and male mean ratings.
mean_ratings_by_gender['diff'] = mean_ratings_by_gender.F - mean_ratings_by_gender.M
mean_ratings_by_gender.head()
title | F | M | diff |
---|---|---|---|
$1,000,000 Duck (1971) | 3.375000 | 2.761905 | 0.613095 |
‘Night Mother (1986) | 3.388889 | 3.352941 | 0.035948 |
‘Til There Was You (1997) | 2.675676 | 2.733333 | -0.057658 |
‘burbs, The (1989) | 2.793478 | 2.962085 | -0.168607 |
…And Justice for All (1979) | 3.828571 | 3.689024 | 0.139547 |
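Which movies show the largest gap between male and female ratings (rated high by men and low by women, or the reverse)?
mean_ratings_by_gender.sort_values(by='diff',ascending=True).head()
# rated higher by men than by women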
title | F | M | diff |
---|---|---|---|
Tigrero: A Film That Was Never Made (1994) | 1.0 | 4.333333 | -3.333333 |
Neon Bible, The (1995) | 1.0 | 4.000000 | -3.000000 |
Enfer, L’ (1994) | 1.0 | 3.750000 | -2.750000 |
Stalingrad (1993) | 1.0 | 3.593750 | -2.593750 |
Killer: A Journal of Murder (1995) | 1.0 | 3.428571 | -2.428571 |
mean_ratings_by_gender.sort_values(by='diff',ascending=False).head()
# rated higher by women than by men
title | F | M | diff |
---|---|---|---|
James Dean Story, The (1957) | 4.000000 | 1.000000 | 3.000000 |
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919) | 4.000000 | 1.000000 | 3.000000 |
Country Life (1994) | 5.000000 | 2.000000 | 3.000000 |
Babyfever (1994) | 3.666667 | 1.000000 | 2.666667 |
Woman of Paris, A (1923) | 5.000000 | 2.428571 | 2.571429 |
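Several of the extremes above have a mean of exactly 1.0 or 5.0 for one gender, which may mean that only a handful of people rated those titles. One possible refinement, sketched here on made-up data, is to keep only titles with a minimum number of ratings before computing the gap. The threshold MIN_RATINGS and the toy frame are assumptions of this sketch, not part of the original analysis.

```python
import pandas as pd

# Hypothetical ratings: "A" has four ratings, "B" only two.
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'rating': [4, 3, 5, 4, 1, 5],
})

MIN_RATINGS = 3  # hypothetical cutoff; choose one that suits the real data

# Titles that reach the cutoff.
counts = data.groupby('title').size()
popular = counts.index[counts >= MIN_RATINGS]

# Mean rating by gender, restricted to those titles, plus the F - M gap.
by_gender = data.pivot_table(values='rating', index='title',
                             columns='gender', aggfunc='mean')
by_gender = by_gender.loc[popular]
by_gender['diff'] = by_gender['F'] - by_gender['M']

print(by_gender.sort_values(by='diff'))
```

Here "B" would have shown a gap of -4.0 from just two ratings, and the cutoff removes it.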
Number of ratings per movie.
total_rating_by_title = data.groupby('title').size()
total_rating_by_title  # index: movie title; values: number of ratings
title
$1,000,000 Duck (1971) 37
'Night Mother (1986) 70
'Til There Was You (1997) 52
'burbs, The (1989) 303
...And Justice for All (1979) 199
1-900 (1994) 2
10 Things I Hate About You (1999) 700
101 Dalmatians (1961) 565
101 Dalmatians (1996) 364
12 Angry Men (1957) 616
13th Warrior, The (1999) 750
187 (1997) 55
2 Days in the Valley (1996) 286
20 Dates (1998) 139
20,000 Leagues Under the Sea (1954) 575
200 Cigarettes (1999) 181
2001: A Space Odyssey (1968) 1716
2010 (1984) 470
24 7: Twenty Four Seven (1997) 5
24-hour Woman (1998) 9
28 Days (2000) 505
3 Ninjas: High Noon On Mega Mountain (1998) 47
3 Strikes (2000) 4
301, 302 (1995) 9
39 Steps, The (1935) 253
400 Blows, The (Les Quatre cents coups) (1959) 187
42 Up (1998) 88
52 Pick-Up (1986) 140
54 (1998) 259
7th Voyage of Sinbad, The (1958) 258
...
Wrongfully Accused (1998) 123
Wyatt Earp (1994) 270
X-Files: Fight the Future, The (1998) 996
X-Men (2000) 1511
X: The Unknown (1956) 12
Xiu Xiu: The Sent-Down Girl (Tian yu) (1998) 69
Yankee Zulu (1994) 2
Yards, The (1999) 77
Year My Voice Broke, The (1987) 27
Year of Living Dangerously (1982) 391
Year of the Horse (1997) 4
Yellow Submarine (1968) 399
Yojimbo (1961) 215
You Can't Take It With You (1938) 77
You So Crazy (1994) 13
You've Got Mail (1998) 838
Young Doctors in Love (1982) 79
Young Frankenstein (1974) 1193
Young Guns (1988) 562
Young Guns II (1990) 369
Young Poisoner's Handbook, The (1995) 79
Young Sherlock Holmes (1985) 379
Young and Innocent (1937) 10
Your Friends and Neighbors (1998) 109
Zachariah (1971) 2
Zed & Two Noughts, A (1985) 29
Zero Effect (1998) 301
Zero Kelvin (Kjærlighetens kjøtere) (1995) 2
Zeus and Roxanne (1997) 23
eXistenZ (1999) 410
dtype: int64
The 10 movies with the most ratings.
top_10_total_rating = total_rating_by_title.sort_values(ascending=False).head(10)
top_10_total_rating
title
American Beauty (1999) 3428
Star Wars: Episode IV - A New Hope (1977) 2991
Star Wars: Episode V - The Empire Strikes Back (1980) 2990
Star Wars: Episode VI - Return of the Jedi (1983) 2883
Jurassic Park (1993) 2672
Saving Private Ryan (1998) 2653
Terminator 2: Judgment Day (1991) 2649
Matrix, The (1999) 2590
Back to the Future (1985) 2583
Silence of the Lambs, The (1991) 2578
dtype: int64
As you can see, the most-rated movies are generally ones we know well, so they can reasonably be regarded as popular movies.
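Counting rows per title and sorting by count can also be written with value_counts, which returns the counts already sorted in descending order. The sketch below checks on a made-up column that both routes agree; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical merged table with only the title column.
data = pd.DataFrame({'title': ['A', 'B', 'A', 'C', 'A', 'B']})

# Route used in the post: group, count, then sort.
by_groupby = data.groupby('title').size().sort_values(ascending=False)

# Shortcut: value_counts counts and sorts in one step.
by_value_counts = data['title'].value_counts()

print(by_value_counts.head(10))
```

On the real data, data['title'].value_counts().head(10) should list the same ten movies as top_10_total_rating.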
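Now let's look at the 10 movies with the highest mean rating (note: the maximum rating is 5.0).
# groupby(...).mean() returns a Series, which sort_values() can sort directly (in recent pandas, pivot_table returns a DataFrame here)
mean_ratings_by_title = data.groupby('title')['rating'].mean()
top_10_mean_ratings = mean_ratings_by_title.sort_values(ascending=False).head(10)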
top_10_mean_ratings
title
Gate of Heavenly Peace, The (1995) 5.0
Lured (1947) 5.0
Ulysses (Ulisse) (1954) 5.0
Smashing Time (1967) 5.0
Follow the Bitch (1998) 5.0
Song of Freedom (1936) 5.0
Bittersweet Motel (2000) 5.0
Baby, The (1973) 5.0
One Little Indian (1973) 5.0
Schlafes Bruder (Brother of Sleep) (1995) 5.0
Name: rating, dtype: float64
Mean ratings of the 10 most-rated movies.
mean_ratings_by_title[top_10_total_rating.index]
title
American Beauty (1999) 4.317386
Star Wars: Episode IV - A New Hope (1977) 4.453694
Star Wars: Episode V - The Empire Strikes Back (1980) 4.292977
Star Wars: Episode VI - Return of the Jedi (1983) 4.022893
Jurassic Park (1993) 3.763847
Saving Private Ryan (1998) 4.337354
Terminator 2: Judgment Day (1991) 4.058513
Matrix, The (1999) 4.315830
Back to the Future (1985) 3.990321
Silence of the Lambs, The (1991) 4.351823
Name: rating, dtype: float64
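So the 10 most-rated movies do not rank particularly high by mean rating, and several of the top-rated movies are ones most of us have never heard of. Is something wrong with the data? Not really.
Suppose a poor film is seen by very few people, and those few give it high marks. The result is a movie with very few ratings but a very high mean rating.
If you are not convinced, look at the data: here is the number of ratings for each of the 10 top-rated movies.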
total_rating_by_title[top_10_mean_ratings.index]
title
Gate of Heavenly Peace, The (1995) 3
Lured (1947) 1
Ulysses (Ulisse) (1954) 1
Smashing Time (1967) 2
Follow the Bitch (1998) 1
Song of Freedom (1936) 1
Bittersweet Motel (2000) 1
Baby, The (1973) 1
One Little Indian (1973) 1
Schlafes Bruder (Brother of Sleep) (1995) 1
dtype: int64
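One way to keep the count and the mean side by side, so that a high mean from a single rating is easy to spot, is to aggregate both at once. This is a sketch on made-up data; the frame and the threshold MIN_RATINGS are hypothetical.

```python
import pandas as pd

# Hypothetical ratings: one obscure title with a single 5, one popular title.
data = pd.DataFrame({
    'title':  ['Obscure', 'Popular', 'Popular', 'Popular', 'Popular'],
    'rating': [5, 4, 5, 4, 3],
})

# Number of ratings and mean rating per title in a single table.
stats = data.groupby('title')['rating'].agg(['count', 'mean'])

MIN_RATINGS = 3  # hypothetical cutoff for a trustworthy mean

# Rank by mean rating, considering only titles with enough ratings.
ranked = stats[stats['count'] >= MIN_RATINGS].sort_values('mean', ascending=False)

print(stats)
print(ranked)
```

"Obscure" has the highest mean (5.0) but only one rating, so it drops out of the ranking once the cutoff is applied.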
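Now let's recompute the top 10 among popular movies, where a popular movie is taken to be one with more than 1,000 ratings.
Select the popular movies: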
hot_movie = total_rating_by_title[total_rating_by_title>1000]
print(len(hot_movie))
hot_movie
207
title
2001: A Space Odyssey (1968) 1716
Abyss, The (1989) 1715
African Queen, The (1951) 1057
Air Force One (1997) 1076
Airplane! (1980) 1731
Aladdin (1992) 1351
Alien (1979) 2024
Aliens (1986) 1820
Amadeus (1984) 1382
American Beauty (1999) 3428
American Pie (1999) 1389
American President, The (1995) 1033
Animal House (1978) 1207
Annie Hall (1977) 1334
Apocalypse Now (1979) 1176
Apollo 13 (1995) 1251
Arachnophobia (1990) 1367
Armageddon (1998) 1110
As Good As It Gets (1997) 1424
Austin Powers: International Man of Mystery (1997) 1205
Austin Powers: The Spy Who Shagged Me (1999) 1434
Babe (1995) 1751
Back to the Future (1985) 2583
Back to the Future Part II (1989) 1158
Back to the Future Part III (1990) 1148
Batman (1989) 1431
Batman Returns (1992) 1031
Beauty and the Beast (1991) 1060
Beetlejuice (1988) 1495
Being John Malkovich (1999) 2241
...
Superman (1978) 1222
Talented Mr. Ripley, The (1999) 1331
Taxi Driver (1976) 1240
Terminator 2: Judgment Day (1991) 2649
Terminator, The (1984) 2098
Thelma & Louise (1991) 1417
There's Something About Mary (1998) 1371
This Is Spinal Tap (1984) 1118
Thomas Crown Affair, The (1999) 1089
Three Kings (1999) 1021
Time Bandits (1981) 1010
Titanic (1997) 1546
Top Gun (1986) 1010
Total Recall (1990) 1996
Toy Story (1995) 2077
Toy Story 2 (1999) 1585
True Lies (1994) 1400
Truman Show, The (1998) 1005
Twelve Monkeys (1995) 1511
Twister (1996) 1110
Untouchables, The (1987) 1127
Usual Suspects, The (1995) 1783
Wayne's World (1992) 1120
When Harry Met Sally... (1989) 1568
Who Framed Roger Rabbit? (1988) 1799
Willy Wonka and the Chocolate Factory (1971) 1313
Witness (1985) 1046
Wizard of Oz, The (1939) 1718
X-Men (2000) 1511
Young Frankenstein (1974) 1193
dtype: int64
# mean ratings of the popular movies
hot_movie_mean_rating = mean_ratings_by_title[hot_movie.index]
print(len(hot_movie_mean_rating))
hot_movie_mean_rating
207
title
2001: A Space Odyssey (1968) 4.068765
Abyss, The (1989) 3.683965
African Queen, The (1951) 4.251656
Air Force One (1997) 3.588290
Airplane! (1980) 3.971115
Aladdin (1992) 3.788305
Alien (1979) 4.159585
Aliens (1986) 4.125824
Amadeus (1984) 4.251809
American Beauty (1999) 4.317386
American Pie (1999) 3.709863
American President, The (1995) 3.793804
Animal House (1978) 4.053024
Annie Hall (1977) 4.141679
Apocalypse Now (1979) 4.243197
Apollo 13 (1995) 4.073541
Arachnophobia (1990) 3.002926
Armageddon (1998) 3.191892
As Good As It Gets (1997) 3.950140
Austin Powers: International Man of Mystery (1997) 3.710373
Austin Powers: The Spy Who Shagged Me (1999) 3.388424
Babe (1995) 3.891491
Back to the Future (1985) 3.990321
Back to the Future Part II (1989) 3.343696
Back to the Future Part III (1990) 3.242160
Batman (1989) 3.600978
Batman Returns (1992) 2.976722
Beauty and the Beast (1991) 3.885849
Beetlejuice (1988) 3.567893
Being John Malkovich (1999) 4.125390
...
Superman (1978) 3.536825
Talented Mr. Ripley, The (1999) 3.503381
Taxi Driver (1976) 4.183871
Terminator 2: Judgment Day (1991) 4.058513
Terminator, The (1984) 4.152050
Thelma & Louise (1991) 3.680311
There's Something About Mary (1998) 3.904449
This Is Spinal Tap (1984) 4.179785
Thomas Crown Affair, The (1999) 3.641873
Three Kings (1999) 3.807052
Time Bandits (1981) 3.694059
Titanic (1997) 3.583441
Top Gun (1986) 3.686139
Total Recall (1990) 3.682365
Toy Story (1995) 4.146846
Toy Story 2 (1999) 4.218927
True Lies (1994) 3.634286
Truman Show, The (1998) 3.861692
Twelve Monkeys (1995) 3.945731
Twister (1996) 3.173874
Untouchables, The (1987) 4.007986
Usual Suspects, The (1995) 4.517106
Wayne's World (1992) 3.600893
When Harry Met Sally... (1989) 4.073342
Who Framed Roger Rabbit? (1988) 3.679822
Willy Wonka and the Chocolate Factory (1971) 3.861386
Witness (1985) 3.996176
Wizard of Oz, The (1939) 4.247963
X-Men (2000) 3.820649
Young Frankenstein (1974) 4.250629
Name: rating, dtype: float64
# top 10 movies by mean rating among those with more than 1,000 ratings
top_10_rating_movie = hot_movie_mean_rating.sort_values(ascending=False).head(10)
top_10_rating_movie
title
Shawshank Redemption, The (1994) 4.554558
Godfather, The (1972) 4.524966
Usual Suspects, The (1995) 4.517106
Schindler's List (1993) 4.510417
Raiders of the Lost Ark (1981) 4.477725
Rear Window (1954) 4.476190
Star Wars: Episode IV - A New Hope (1977) 4.453694
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) 4.449890
Casablanca (1942) 4.412822
Sixth Sense, The (1999) 4.406263
Name: rating, dtype: float64
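# Run this magic in IPython (or Jupyter) only; omit it elsewhere.
# It sits on its own line because text after a line magic is passed to the magic as arguments.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(1, 11)
y = top_10_rating_movie.values
name = top_10_rating_movie.index
# Plot the ratings
plt.plot(x, y, 'r-o')
# Annotate each point with its movie title
for i in range(10):
    plt.text(x[i], y[i], name[i])
# Set the axis ranges
plt.xlim(0, 15)
plt.ylim(4.4, 4.56)
# Set the axis labels
#plt.xlabel('Rank')
#plt.ylabel('Rating')
#plt.show()  # call this when not using IPython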
That chart is ugly, so here is a better one:
import matplotlib.pyplot as plt
import numpy as np
plt.rcdefaults()
people = name
y_pos = np.arange(len(people))
performance = y
# No xerr is passed: these means have no error estimate to draw.
plt.barh(y_pos, performance, align='center', alpha=0.4)
plt.yticks(y_pos, people)
#plt.xlabel('Rating')
#plt.title('Rank')
#plt.show()  # call this when not using IPython
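For readers who want to reproduce the bar chart without loading the dataset, here is a self-contained sketch that hard-codes the top-10 titles and mean ratings listed earlier in this post. The Agg backend, the axis limits, and the labels are choices of this sketch rather than the original code.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Top 10 movies by mean rating among those with more than 1,000 ratings.
titles = [
    'Shawshank Redemption, The (1994)',
    'Godfather, The (1972)',
    'Usual Suspects, The (1995)',
    "Schindler's List (1993)",
    'Raiders of the Lost Ark (1981)',
    'Rear Window (1954)',
    'Star Wars: Episode IV - A New Hope (1977)',
    'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)',
    'Casablanca (1942)',
    'Sixth Sense, The (1999)',
]
mean_ratings = [4.554558, 4.524966, 4.517106, 4.510417, 4.477725,
                4.476190, 4.453694, 4.449890, 4.412822, 4.406263]

fig, ax = plt.subplots(figsize=(10, 4))
positions = list(range(len(titles)))
ax.barh(positions, mean_ratings, align='center', alpha=0.4)
ax.set_yticks(positions)
ax.set_yticklabels(titles)
ax.invert_yaxis()        # put the highest-rated movie at the top
ax.set_xlim(4.3, 4.6)    # zoom in; the values differ only in the second decimal
ax.set_xlabel('Mean rating')
ax.set_title('Top 10 popular movies by mean rating')
```

Call fig.savefig with a file name of your choice to write the image, or drop the matplotlib.use line and call plt.show() in an interactive session.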