Recommender Systems: Neighborhood-Based Algorithms
阿新 • Published: 2019-01-04
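I have recently been reading Xiang Liang's "Recommender System Practice" (《推薦系統實踐》). The book gives only code fragments rather than complete programs, so I took its snippets and, following the descriptions in the text, reconstructed part of the code.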
UserCF algorithm (user-based collaborative filtering):
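Let N(u) be the set of items user u has given positive feedback on, and N(v) the corresponding set for user v. The interest similarity of u and v is computed with cosine similarity:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$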
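As a worked example of the cosine similarity $w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$, here is a minimal sketch that stores each N(u) as a Python set. The helper `cosine_similarity` and the four toy users are hypothetical illustrations, not code from the book.

```python
import math

def cosine_similarity(items_u, items_v):
    """w_uv = |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|); 0.0 if either set is empty."""
    if not items_u or not items_v:
        return 0.0
    common = len(items_u & items_v)
    return common / math.sqrt(len(items_u) * len(items_v))

# Hypothetical toy data: user -> set of items with positive feedback, i.e. N(u)
N = {
    'A': {'a', 'b', 'd'},
    'B': {'a', 'c'},
    'C': {'b', 'e'},
    'D': {'c', 'd', 'e'},
}

print(round(cosine_similarity(N['A'], N['B']), 4))  # 1 / sqrt(3 * 2) -> 0.4082
print(round(cosine_similarity(N['A'], N['D']), 4))  # 1 / sqrt(3 * 3) -> 0.3333
```

Users B and C share no items, so their similarity is 0; on real data most user pairs look like this, which is why the code further down avoids comparing every pair.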
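Once the interest similarities between users are known, UserCF recommends to user u the items liked by the K users whose interests are most similar to u's. User u's interest in item i is measured as:

$$p(u,i) = \sum_{v \in S(u,K) \cap N(i)} w_{uv}\, r_{vi}$$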
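where $S(u,K)$ is the set of the K users most similar to u, $N(i)$ is the set of users who have acted on item i, $w_{uv}$ is the interest similarity between users u and v, and $r_{vi}$ is user v's interest in item i; with single-type implicit feedback, $r_{vi} = 1$.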
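To make $p(u,i) = \sum_{v \in S(u,K) \cap N(i)} w_{uv} r_{vi}$ concrete, here is a small sketch that scores a single (user, item) pair. It assumes `train` maps each user to `{item: r_vi}` and `W` maps each user to `{other_user: w_uv}` (the dict-of-dicts layout the post's code uses); the helper name `interest` and the toy numbers are hypothetical.

```python
import math

def interest(user, item, train, W, K=3):
    """p(u, i): sum of w_uv * r_vi over v in S(u, K) that have acted on item i."""
    # S(u, K): the K users with the highest similarity to `user`
    neighbours = sorted(W.get(user, {}).items(), key=lambda kv: kv[1], reverse=True)[:K]
    score = 0.0
    for v, w_uv in neighbours:
        r_vi = train[v].get(item)  # None means v is not in N(i)
        if r_vi is not None:
            score += w_uv * r_vi
    return score

# Hypothetical implicit-feedback data (r_vi = 1) and user A's precomputed similarities
train = {
    'A': {'a': 1, 'b': 1, 'd': 1},
    'B': {'a': 1, 'c': 1},
    'C': {'b': 1, 'e': 1},
    'D': {'c': 1, 'd': 1, 'e': 1},
}
W = {'A': {'B': 1.0 / math.sqrt(6), 'C': 1.0 / math.sqrt(6), 'D': 1.0 / 3}}

print(round(interest('A', 'c', train, W, K=3), 4))  # w_AB + w_AD -> 0.7416
print(round(interest('A', 'c', train, W, K=2), 4))  # D falls outside S(A, 2) -> 0.4082
```

Shrinking K removes user D from the neighbourhood, so item c loses D's contribution; K therefore trades coverage of candidate items against relying only on the closest neighbours.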
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 12:46:42 2017
@author: lanlandetian
"""
import math
import operator
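'''
# Naive version kept for reference: it compares every pair of users,
# which is O(|U|^2) even though most pairs share no items.
# W is the similarity matrix
def UserSimilarity(train):
    W = dict()
    for u in train.keys():
        W.setdefault(u, {})
        for v in train.keys():
            if u == v:
                continue
            W[u][v] = len(set(train[u]) & set(train[v]))
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W
'''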
def UserSimilarity(train):
    # build inverse table for item_users: item -> set of users who acted on it
    item_users = dict()
    for u, items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)
    # calculate co-rated items between users
    # C[u][v]: number of items both u and v acted on; N[u]: number of items u acted on
    C = dict()
    N = dict()
    for i, users in item_users.items():
        for u in users:
            N.setdefault(u, 0)
            N[u] += 1
            C.setdefault(u, {})
            for v in users:
                if u == v:
                    continue
                C[u].setdefault(v, 0)
                C[u][v] += 1
    # calculate final similarity matrix W
    # (a new dict rather than C.copy(): a shallow copy would share the inner dicts with C)
    W = dict()
    for u, related_users in C.items():
        W[u] = dict()
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
def Recommend(user, train, W, K=3):
    rank = dict()
    interacted_items = train[user]
    # the K users most similar to `user`, i.e. S(u, K); empty if user has no neighbours
    for v, wuv in sorted(W.get(user, {}).items(), key=operator.itemgetter(1),
                         reverse=True)[0:K]:
        for i, rvi in train[v].items():
            # we should filter items user interacted before
            if i in interacted_items:
                continue
            rank.setdefault(i, 0)
            rank[i] += wuv * rvi
    return rank
def Recommendation(users, train, W, K=3):
    result = dict()
    for user in users:
        rank = Recommend(user, train, W, K)
        R = sorted(rank.items(), key=operator.itemgetter(1), reverse=True)
        result[user] = R
    return result
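The commented-out version at the top of this code compares every pair of users, while UserSimilarity first builds an item-to-users inverse table, so only users who share at least one item are ever paired. On sparse data most pairs share nothing, so this skips a lot of useless work. The self-contained sketch below, with hypothetical toy data and illustrative helper names (`naive_similarity`, `inverted_similarity`), checks that both strategies produce the same cosine similarities.

```python
import math
from itertools import combinations

def naive_similarity(train):
    """Compare every pair of users: O(|U|^2) intersections, most of them empty."""
    W = {}
    for u, v in combinations(train, 2):
        common = len(set(train[u]) & set(train[v]))
        if common:
            w = common / math.sqrt(len(train[u]) * len(train[v]))
            W.setdefault(u, {})[v] = w
            W.setdefault(v, {})[u] = w
    return W

def inverted_similarity(train):
    """Build item -> users first, so only users sharing an item are paired."""
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)
    C = {}  # C[u][v] = |N(u) & N(v)|
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    row = C.setdefault(u, {})
                    row[v] = row.get(v, 0) + 1
    return {u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in row.items()}
            for u, row in C.items()}

# Hypothetical implicit-feedback data
train = {
    'A': {'a': 1, 'b': 1, 'd': 1},
    'B': {'a': 1, 'c': 1},
    'C': {'b': 1, 'e': 1},
    'D': {'c': 1, 'd': 1, 'e': 1},
}
print(naive_similarity(train) == inverted_similarity(train))  # True
```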
Improved user similarity (UserCF_IIF algorithm):
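Two users' behavior on unpopular items says more about how similar their interests are than their behavior on popular items. The improved user similarity is therefore:

$$w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log\left(1 + |N(i)|\right)}}{\sqrt{|N(u)|\,|N(v)|}}$$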
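In this formula, the term $\frac{1}{\log(1 + |N(i)|)}$ penalizes the influence of popular items in $N(u) \cap N(v)$ on the similarity between u and v.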
The code is similar to UserCF.
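A minimal sketch of the UserCF_IIF similarity, assuming the same `{user: {item: rating}}` layout as the UserCF code. Compared with UserSimilarity, the only change is that each co-rated item i adds $1/\log(1+|N(i)|)$ to the pair's count instead of 1. The function name `user_similarity_iif` and the toy data are hypothetical.

```python
import math

def user_similarity_iif(train):
    """UserCF_IIF: each co-rated item i contributes 1 / log(1 + |N(i)|) instead of 1."""
    # Inverse table: item -> set of users who acted on it, so |N(i)| = len(users)
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)

    C = {}  # C[u][v]: popularity-penalised count of items shared by u and v
    for users in item_users.values():
        penalty = 1.0 / math.log(1 + len(users))  # popular items get a small weight
        for u in users:
            row = C.setdefault(u, {})
            for v in users:
                if u != v:
                    row[v] = row.get(v, 0.0) + penalty

    # Same normalisation as plain cosine similarity: sqrt(|N(u)| * |N(v)|)
    return {u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in row.items()}
            for u, row in C.items()}

# Hypothetical toy data
train = {
    'A': {'a': 1, 'b': 1, 'd': 1},
    'B': {'a': 1, 'c': 1},
    'C': {'b': 1, 'e': 1},
    'D': {'c': 1, 'd': 1, 'e': 1},
}
W = user_similarity_iif(train)
print(round(W['A']['B'], 4))  # 0.3716, versus 0.4082 without the IIF penalty
```

Because every user keeps an entry in the result (possibly an empty dict), a Recommend-style lookup of `W[user]` will not raise KeyError for users with no neighbours.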
ItemCF algorithm:
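Let N(i) be the set of users who like item i, so that $|N(i) \cap N(j)|$ counts the users who like both i and j. The similarity between items i and j is:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$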
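After obtaining the item similarities, ItemCF computes user u's interest in item j as:

$$p(u,j) = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$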
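where $N(u)$ is the set of items user u likes, $S(j,K)$ is the set of the K items most similar to item j, $w_{ji}$ is the similarity between items j and i, and $r_{ui}$ is user u's interest in item i (1 for implicit feedback).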
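A direct transcription of $p(u,j) = \sum_{i \in N(u) \cap S(j,K)} w_{ji} r_{ui}$ might look like the sketch below, taking S(j,K) as the K highest-weighted neighbours of j in a similarity dict `W`. Note that the Recommend function in this post works in the other direction: for each item i the user has interacted with, it takes the top K neighbours of i. A top-K list is not symmetric (j can be among i's top K while i is not among j's), so the two can give different scores. The function name `item_interest` and the similarity values are hypothetical.

```python
def item_interest(user_items, j, W, K=3):
    """p(u, j): sum of w_ji * r_ui over i in N(u) ∩ S(j, K).

    user_items: {item: r_ui} for one user u.
    W: {item: {other_item: similarity}}.
    """
    # S(j, K): the K items most similar to j
    top_k = sorted(W.get(j, {}).items(), key=lambda kv: kv[1], reverse=True)[:K]
    # keep only neighbours the user has actually interacted with (i in N(u))
    return sum(w_ji * user_items[i] for i, w_ji in top_k if i in user_items)

# Hypothetical similarities of candidate item 'x' to other items
W = {'x': {'a': 0.8, 'b': 0.5, 'c': 0.2}}
user_items = {'a': 1, 'c': 1}  # items user u has interacted with, r_ui = 1

print(item_interest(user_items, 'x', W, K=2))  # 'c' is outside S(x, 2) -> 0.8
print(item_interest(user_items, 'x', W, K=3))  # 0.8 + 0.2 -> 1.0
```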
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 13:09:26 2017
@author: lanlandetian
"""
import math
import operator
def ItemSimilarity(train):
    # calculate co-rated users between items
    # traverse the user-item table: C[i][j] counts users who acted on both i and j,
    # N[i] counts users who acted on item i
    C = dict()
    N = dict()
    for u, items in train.items():
        for i in items:
            N.setdefault(i, 0)
            N[i] += 1
            C.setdefault(i, {})
            for j in items:
                if i == j:
                    continue
                C[i].setdefault(j, 0)
                C[i][j] += 1
    # calculate final similarity matrix W (a new dict, so C is left unchanged)
    W = dict()
    for i, related_items in C.items():
        W[i] = dict()
        for j, cij in related_items.items():
            W[i][j] = cij / math.sqrt(N[i] * N[j])
    return W
def Recommend(user_id, train, W, K=3):
    rank = dict()
    ru = train[user_id]
    for i, pi in ru.items():
        # the K items most similar to each item i the user has interacted with
        for j, wij in sorted(W[i].items(), key=operator.itemgetter(1),
                             reverse=True)[0:K]:
            # skip items the user has already interacted with
            if j in ru:
                continue
            rank.setdefault(j, 0)
            rank[j] += pi * wij
    return rank
# Variant that also records the reason for each recommendation:
# rank[j].reason[i] stores how much item i in the user's history
# contributed to the score of recommended item j.
#class Node:
#    def __init__(self):
#        self.weight = 0
#        self.reason = dict()
#
#def Recommend(user_id, train, W, K=3):
#    rank = dict()
#    ru = train[user_id]
#    for i, pi in ru.items():
#        for j, wij in sorted(W[i].items(),
#                             key=operator.itemgetter(1), reverse=True)[0:K]:
#            if j in ru:
#                continue
#            if j not in rank:
#                rank[j] = Node()
#            rank[j].reason.setdefault(i, 0)
#            rank[j].weight += pi * wij
#            rank[j].reason[i] = pi * wij
#    return rank
def Recommendation(users, train, W, K=3):
    result = dict()
    for user in users:
        rank = Recommend(user, train, W, K)
        R = sorted(rank.items(), key=operator.itemgetter(1), reverse=True)
        result[user] = R
    return result
Improved item similarity (ItemCF_IUF):
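Active users should contribute less to item similarity than inactive users, so an IUF (Inverse User Frequency) term is added to correct the item similarity formula:

$$w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{1}{\log\left(1 + |N(u)|\right)}}{\sqrt{|N(i)|\,|N(j)|}}$$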
The code is similar to ItemCF.
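A minimal sketch of the ItemCF_IUF similarity, assuming the same `{user: {item: rating}}` layout as ItemSimilarity. The only change is that each user u adds $1/\log(1+|N(u)|)$ to every item pair they co-rated instead of 1, so heavy users count for less. The function name `item_similarity_iuf` and the toy data are hypothetical.

```python
import math

def item_similarity_iuf(train):
    """ItemCF_IUF: each user u contributes 1 / log(1 + |N(u)|) instead of 1."""
    C = {}  # C[i][j]: activity-penalised count of users who acted on both i and j
    N = {}  # N[i]: number of users who acted on item i, i.e. |N(i)|
    for items in train.values():
        if not items:
            continue  # log(1 + 0) = 0 would cause a division by zero
        penalty = 1.0 / math.log(1 + len(items))  # active users get a small weight
        for i in items:
            N[i] = N.get(i, 0) + 1
            row = C.setdefault(i, {})
            for j in items:
                if i != j:
                    row[j] = row.get(j, 0.0) + penalty
    return {i: {j: c / math.sqrt(N[i] * N[j]) for j, c in row.items()}
            for i, row in C.items()}

# Hypothetical toy data
train = {
    'A': {'a': 1, 'b': 1, 'd': 1},
    'B': {'a': 1, 'c': 1},
    'C': {'b': 1, 'e': 1},
    'D': {'c': 1, 'd': 1, 'e': 1},
}
W = item_similarity_iuf(train)
print(round(W['a']['c'], 4))  # shared only by B (2 items) -> 0.4551
print(round(W['a']['d'], 4))  # shared only by A (3 items) -> 0.3607
```

Both pairs are co-rated by exactly one user and all four items have two users, so the difference comes entirely from the IUF weight: the less active user B makes a and c look more similar than the more active user A makes a and d.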
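In addition, the book represents the dataset as a dict. In my GitHub repository I therefore implemented the whole workflow of the algorithms, including data loading and the final cross-validation.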
GitHub repository:
https://github.com/1092798448/RecSys.git
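The data-loading and cross-validation code lives in the repository above and is not reproduced here. As a rough, hypothetical sketch of those two steps, the helpers below (`split_data` and `to_dict`, both illustrative names) randomly split a list of (user, item) records into M folds and convert records into the `{user: {item: 1}}` dict layout the code above expects. Reusing the same `seed` for every fold k makes the M test sets disjoint and together cover all records, which is what M-fold cross-validation needs.

```python
import random

def split_data(records, M, k, seed):
    """Assign each (user, item) record to one of M folds at random; fold k is the test set."""
    rng = random.Random(seed)  # same seed -> same fold assignment for every k
    train, test = [], []
    for user, item in records:
        if rng.randrange(M) == k:
            test.append((user, item))
        else:
            train.append((user, item))
    return train, test

def to_dict(records):
    """Convert (user, item) pairs to {user: {item: 1}} (implicit feedback, r = 1)."""
    data = {}
    for user, item in records:
        data.setdefault(user, {})[item] = 1
    return data

# Hypothetical records
records = [('A', 'a'), ('A', 'b'), ('A', 'd'), ('B', 'a'), ('B', 'c'),
           ('C', 'b'), ('C', 'e'), ('D', 'c'), ('D', 'd'), ('D', 'e')]

M, seed = 4, 42
for k in range(M):
    train, test = split_data(records, M, k, seed)
    print(k, len(train), len(test))

print(to_dict([('A', 'a'), ('A', 'b'), ('B', 'a')]))  # {'A': {'a': 1, 'b': 1}, 'B': {'a': 1}}
```

Each fold's training list would be passed through `to_dict` before calling UserSimilarity or ItemSimilarity, and the held-out pairs used to score the resulting recommendations.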