User-Based Collaborative Filtering Movie Recommendation (user-CF) in Python

Collaborative filtering comes in two main forms, item-based and user-based. This post builds user-based recommendations from movie rating data.

The work has three parts: 1. read the data; 2. build the user-to-user similarity matrix; 3. make recommendations.
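Steps 2 and 3 can be summarised as two formulas. This is a sketch assuming the standard user-CF cosine similarity over the sets of rated movies, which is what `user_similarity` is meant to compute. Here N(u) is the set of movies user u rated, U(i) is the set of users who rated movie i, and S(u, K) is the K users most similar to u (K plays the role of the argument `N` of `Recommend`). A neighbour's rating r is weighted by ln r, as `math.log(score)` does in the code:

```latex
w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}

p(u, i) = \sum_{v \in S(u,K) \cap U(i)} w_{uv} \ln r_{vi}, \qquad i \notin N(u)
```

Because ln 1 = 0, a neighbour's 1-star rating contributes nothing to p(u, i).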

Looking at the data: u.data

Only the first three columns are used: the user ID user_id, the movie ID item_id, and the user's rating of the movie, score.

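From this file we build an item-to-user inverted table, used to compute the user-to-user similarity matrix, and a user-to-item table, used to make recommendations.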

~/workspace/jupyter_project/recommendation$ head ./data/u.data
196 242 3   881250949
186 302 3   891717742
22  377 1   878887116
244 51  2   880606923
166 346 1   886397596
298 474 4   884182806
115 265 2   881171488
253 465 5   891628467
305 451 3   886324817
6   86  3   883603013
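A minimal sketch of the two tables described above, run on the first three rows of the sample. It parses an in-memory string rather than the file, and `SAMPLE` and `build_tables` are hypothetical names. Like `read_data`, it assumes the fields are tab-separated.

```python
# Three rows copied from the u.data sample above; fields are tab-separated:
# user_id, item_id, score, timestamp
SAMPLE = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"


def build_tables(lines):
    """Hypothetical helper: build item -> {user: score} and user -> {item: score}."""
    item_users = {}  # inverted table, used for the user-user similarity matrix
    user_items = {}  # user table, used at recommendation time
    for line in lines:
        user, item, score = line.split("\t")[:3]
        item_users.setdefault(item, {})[user] = int(score)
        user_items.setdefault(user, {})[item] = int(score)
    return item_users, user_items


item_users, user_items = build_tables(SAMPLE.splitlines())
print(item_users)  # {'242': {'196': 3}, '302': {'186': 3}, '377': {'22': 1}}
print(user_items)  # {'196': {'242': 3}, '186': {'302': 3}, '22': {'377': 1}}
```

Both tables hold the same ratings; only the key order (item first or user first) differs.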

Looking at the data: u.item

Only the first two columns are used: the first is the movie ID item_id, and the second is the movie title.

This file is used mainly to display the recommendation results.

~/workspace/jupyter_project/recommendation$ head ./data/u.item
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
6|Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)|01-Jan-1995||http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|0|0
8|Babe (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Babe%20(1995)|0|0|0|0|1|1|0|0|1|0|0|0|0|0|0|0|0|0|0
9|Dead Man Walking (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Dead%20Man%20Walking%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
10|Richard III (1995)|22-Jan-1996||http://us.imdb.com/M/title-exact?Richard%20III%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0
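Similarly, a minimal sketch of the id-to-title lookup built from u.item, using two of the rows above trimmed to their first few fields (`SAMPLE` and `parse_titles` are hypothetical names). Like `read_data`, it splits on `|` and decodes the bytes as ISO-8859-1:

```python
# Two rows from the u.item sample above, trimmed to their first fields
SAMPLE = b"1|Toy Story (1995)|01-Jan-1995||\n2|GoldenEye (1995)|01-Jan-1995||\n"


def parse_titles(raw: bytes) -> dict:
    """Hypothetical helper: map item_id -> movie title."""
    titles = {}
    # Decode as Latin-1, matching the encoding used by read_data for u.item
    for line in raw.decode("ISO-8859-1").splitlines():
        item, name = line.split("|")[:2]
        titles[item] = name
    return titles


titles = parse_titles(SAMPLE)
print(titles["2"])  # GoldenEye (1995)
```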

The code is as follows:

# coding: utf-8
import math


# Step 1: read the data
def read_data(udata, uitem):
    user_movies = {}  # item -> {user: score}, used to build the similarity matrix
    user_item = {}    # user -> {item: score}, used at recommendation time
    movies = {}       # item -> movie title, used to display the results
    with open(udata) as f:
        for line in f:
            user, item, score = line.split("\t")[:3]
            user_movies.setdefault(item, {})[user] = int(score)
            user_item.setdefault(user, {})[item] = int(score)
    # u.item is not valid UTF-8, so it is decoded as ISO-8859-1 (Latin-1)
    with open(uitem, encoding="ISO-8859-1") as f:
        for line in f:
            item, name = line.split("|")[:2]
            movies[item] = name
    return user_movies, movies, user_item


# Step 2: build the user-user similarity matrix
def user_similarity(user_movies):
    C = {}  # C[u][v]: number of movies rated by both u and v
    N = {}  # N[u]: number of movies rated by u
    for item, user_score in user_movies.items():
        for user in user_score:
            N.setdefault(user, 0)
            N[user] += 1
            C.setdefault(user, {})
            for user2 in user_score:
                if user == user2:
                    continue
                C[user].setdefault(user2, 0)
                C[user][user2] += 1
    W = {}  # final similarity matrix
    for user, related in C.items():
        W.setdefault(user, {})
        for user2, count in related.items():
            # Cosine similarity: normalise by sqrt(N[u] * N[v]), not N[u] * N[u]
            W[user][user2] = count / math.sqrt(N[user] * N[user2])
    return W


# Step 3: recommend
def Recommend(user, user_item, W, N, M):
    # N: use the N users most similar to `user`
    # M: return the top M recommendations
    rank = {}  # item -> accumulated recommendation score
    for user2, w_score in sorted(W[user].items(), key=lambda x: x[1], reverse=True)[:N]:
        for item, score in user_item[user2].items():
            if item in user_item[user]:
                continue  # skip movies the user has already rated
            # Sum contributions from all neighbours instead of overwriting
            rank.setdefault(item, 0)
            rank[item] += w_score * math.log(score)
    return sorted(rank.items(), key=lambda x: x[1], reverse=True)[:M]


if __name__ == "__main__":
    print("# Loading data")
    user_movies, movies, user_item = read_data("./data/u.data", "./data/u.item")
    print("# Computing the similarity matrix")
    W = user_similarity(user_movies)
    print("# Computing recommendations")
    result = Recommend("1", user_item, W, 2, 10)
    print("# Results")
    print("You may also like:")
    for item, score in result:
        print(movies[item])
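To check the similarity and scoring logic by hand, here is a small self-contained sketch on invented toy data. Users A, B, C and items a to e are made up, and `user_similarity`/`recommend` here are simplified re-implementations, not the author's code. It normalises by sqrt(N[u] * N[v]) and sums each neighbour's contribution per item:

```python
import math

# Invented toy ratings: user -> {item: score}
user_item = {
    "A": {"a": 5, "b": 3, "c": 4},
    "B": {"a": 4, "c": 5, "d": 3},
    "C": {"b": 2, "d": 4, "e": 5},
}

# Inverted table: item -> {user: score}
user_movies = {}
for u, items in user_item.items():
    for i, s in items.items():
        user_movies.setdefault(i, {})[u] = s


def user_similarity(user_movies):
    """Cosine similarity between users over the sets of movies they rated."""
    C, N = {}, {}
    for users in user_movies.values():
        for u in users:
            N[u] = N.get(u, 0) + 1
            row = C.setdefault(u, {})
            for v in users:
                if u != v:
                    row[v] = row.get(v, 0) + 1
    return {u: {v: c / math.sqrt(N[u] * N[v]) for v, c in row.items()}
            for u, row in C.items()}


def recommend(user, user_item, W, n_neighbours, n_items):
    """Sum w_uv * ln(r_vi) over the top neighbours, for items `user` has not rated."""
    rank = {}
    top = sorted(W[user].items(), key=lambda x: x[1], reverse=True)[:n_neighbours]
    for v, w in top:
        for item, score in user_item[v].items():
            if item not in user_item[user]:
                rank[item] = rank.get(item, 0.0) + w * math.log(score)
    return sorted(rank.items(), key=lambda x: x[1], reverse=True)[:n_items]


W = user_similarity(user_movies)
result = recommend("A", user_item, W, 2, 10)
print([item for item, _ in result])  # ['d', 'e']
```

Here W["A"]["B"] = 2/3 (A and B share a and c) and W["A"]["C"] = 1/3 (they share only b). Movie d is rated by both neighbours, so its score is 2/3 · ln 3 + 1/3 · ln 4 ≈ 1.19, ahead of e at 1/3 · ln 5 ≈ 0.54. If each contribution overwrote the previous one instead of being summed, d would keep only C's 0.46 and drop below e.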
