
Notes on "Python for Data Analysis": Chapter 2, the MovieLens 1M Dataset

A note up front:

All the data used in the examples can be downloaded from GitHub as a single archive.
The address is: [http://github.com/pydata/pydata-book](http://github.com/pydata/pydata-book)

One more thing worth stating:

I am using Python 2.7. Some of the code in the book contains errors, so I debugged it until it ran on my own 2.7 setup.
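For readers on Python 3 with a recent pandas, here is a minimal sketch of the loading step, assuming the same `::`-delimited layout as `users.dat`. The two sample rows are made up for illustration and are not taken from the dataset. The data is read from an in-memory string, so no local path is needed.

```python
import io
import pandas as pd

# Made-up rows in the same '::'-delimited layout as users.dat (illustrative only)
sample = "1::F::25::4::12345\n2::M::35::7::54321\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A separator longer than one character needs the Python parsing engine
users = pd.read_csv(io.StringIO(sample), sep='::', header=None,
                    names=unames, engine='python')
print(users)
```

Passing `engine='python'` explicitly avoids the warning pandas would otherwise emit when it falls back from the C parser for a multi-character separator.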

    # coding: utf-8
    import pandas as pd
    # Column names for the three '::'-delimited MovieLens files
    unames = ['user_id','gender','age','occupation','zip']
    # Raw strings keep the backslashes in the Windows paths literal;
    # a multi-character separator needs the Python parsing engine
    users = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\users.dat', sep='::', header=None, names=unames, engine='python')
    rnames = ['user_id','movie_id','rating','timestamp']
    ratings = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\ratings.dat', sep='::', header=None, names=rnames, engine='python')
    mnames = ['movie_id','title','genres']
    movies = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\movies.dat', sep='::', header=None, names=mnames, engine='python')
    
    users[:5]
    ratings[:5]
    movies[:5]
    
    ratings
    
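    # Join the three tables on their shared key columns (user_id, then movie_id)
    data = pd.merge(pd.merge(ratings, users), movies)
    data.ix[0]
    # Mean rating of each film, split by gender
    mean_rating = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
    mean_rating[:5]
    # Number of ratings each title received
    ratings_by_title = data.groupby('title').size()
    ratings_by_title[:10]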
    
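    # Keep only titles with at least 250 ratings
    active_titles = ratings_by_title.index[ratings_by_title >= 250]
    active_titles
    
    mean_rating = mean_rating.ix[active_titles]
    mean_rating
    
    # Films rated highest by female viewers
    top_female_rating = mean_rating.sort_index(by='F', ascending=False)
    top_female_rating[:10]
    
    # Difference in mean rating between male and female viewers
    mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
    sorted_by_diff = mean_rating.sort_index(by='diff')
    # Smallest 'diff' first: films preferred by women
    sorted_by_diff[:15]
    
    # Reversed order: films preferred by men
    sorted_by_diff[::-1][:15]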
    
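    # Standard deviation of ratings per title, a measure of viewer disagreement
    ratings_std_by_title = data.groupby('title')['rating'].std()
    # Filter the standard deviations (not the rating counts) down to active titles
    ratings_std_by_title = ratings_std_by_title.ix[active_titles]
    ratings_std_by_title.order(ascending=False)[:10]
    ratings_std_by_title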
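The listing above relies on `.ix`, `sort_index(by=...)` and `Series.order()`, all of which were removed in later pandas releases. As a rough sketch for readers on a current pandas, the same steps can be written with `.iloc`/`.loc` and `sort_values`. The three small tables and the `min_ratings` threshold below are made up for illustration so that the example runs without the dataset.

```python
import pandas as pd

# Tiny made-up tables standing in for users, movies and ratings (illustrative only)
users = pd.DataFrame({'user_id': [1, 2, 3, 4],
                      'gender': ['F', 'M', 'F', 'M']})
movies = pd.DataFrame({'movie_id': [10, 20, 30],
                       'title': ['Movie A', 'Movie B', 'Movie C']})
ratings = pd.DataFrame({'user_id':  [1, 2, 3, 4, 1, 2, 3],
                        'movie_id': [10, 10, 10, 10, 20, 20, 30],
                        'rating':   [5, 3, 4, 2, 2, 4, 5]})

data = pd.merge(pd.merge(ratings, users), movies)
first_row = data.iloc[0]                       # replaces data.ix[0]

mean_rating = data.pivot_table(values='rating', index='title',
                               columns='gender', aggfunc='mean')
ratings_by_title = data.groupby('title').size()

min_ratings = 2   # hypothetical threshold for the toy data; the listing uses 250
active_titles = ratings_by_title.index[ratings_by_title >= min_ratings]
mean_rating = mean_rating.loc[active_titles]   # replaces .ix[active_titles]

# sort_values(by=...) replaces sort_index(by=...)
top_female_rating = mean_rating.sort_values(by='F', ascending=False)

mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
sorted_by_diff = mean_rating.sort_values(by='diff')

ratings_std_by_title = data.groupby('title')['rating'].std()
ratings_std_by_title = ratings_std_by_title.loc[active_titles]
top_std = ratings_std_by_title.sort_values(ascending=False)  # replaces .order()

print(sorted_by_diff)
print(top_std)
```

With the toy data, "Movie C" has a single rating and is dropped by the threshold, which mirrors how the 250-rating cutoff removes sparsely rated titles in the real dataset.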

