Recommender Systems: Neighborhood-Based Algorithms

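I have recently been reading Xiang Liang's "Recommender System Practice" (《推薦系統實踐》). The book gives only code snippets rather than complete programs, so I reconstructed part of the code, starting from those snippets and following the book's descriptions.
UserCF algorithm (user-based collaborative filtering):
Let N(u) denote the set of items on which user u has given positive feedback, and N(v) the corresponding set for user v. The Jaccard similarity is then: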

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$
The cosine similarity is:
$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$
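As a quick illustration of the two formulas above, here is a minimal sketch on a made-up toy dataset. The `train` dict and the helper names `jaccard` and `cosine` are assumptions for illustration and do not come from the book.

```python
import math

# Hypothetical toy data: user -> set of items with positive feedback, i.e. N(u).
train = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
}


def jaccard(nu, nv):
    """|N(u) & N(v)| / |N(u) | N(v)|; defined here as 0.0 when both sets are empty."""
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def cosine(nu, nv):
    """|N(u) & N(v)| / sqrt(|N(u)| * |N(v)|); defined here as 0.0 if either set is empty."""
    if not nu or not nv:
        return 0.0
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


w_jaccard = jaccard(train["A"], train["B"])  # 1 shared item out of 4 distinct items
w_cosine = cosine(train["A"], train["B"])    # 1 / sqrt(3 * 2)
print(w_jaccard, w_cosine)
```

Users A and B share one item, so both measures are positive, while B and C share none and get a similarity of 0 under either formula.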
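Once the interest similarity between users is known, UserCF recommends to a user the items liked by the K users whose interests are most similar to theirs. The following formula measures user u's interest in item i under UserCF: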
$$p(u,i) = \sum_{v \in S(u,K) \cap N(i)} w_{uv}\, r_{vi}$$

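Here, S(u,K) is the set of the K users whose interests are closest to those of user u, and N(i) is the set of users who have interacted with item i. With implicit feedback, $r_{vi} = 1$.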
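To make the formula concrete, the sketch below evaluates p(u,i) literally for a single user and item. The similarity values in `w_A`, the toy `train` dict, and the function name `interest` are all made up for illustration.

```python
# Hypothetical similarities w_uv between user "A" and the other users.
w_A = {"B": 0.5, "C": 0.3, "D": 0.1}

# Hypothetical data: user -> {item: r_vi}, with r_vi = 1 for implicit feedback.
train = {
    "A": {"a": 1},
    "B": {"a": 1, "c": 1},
    "C": {"c": 1, "e": 1},
    "D": {"e": 1},
}


def interest(u_sims, train, item, K):
    """Sum w_uv * r_vi over the users v that are in both S(u, K) and N(item)."""
    # S(u, K): the K users most similar to u.
    neighbours = sorted(u_sims.items(), key=lambda kv: kv[1], reverse=True)[:K]
    # Restrict to N(item): neighbours who actually interacted with the item.
    return sum(w * train[v][item] for v, w in neighbours if item in train[v])


print(interest(w_A, train, "c", K=2))  # B and C both liked "c"
print(interest(w_A, train, "e", K=2))  # only C counts; D is outside the top 2
```

With K = 2, user D is not in S(A, K), so D's feedback on item "e" is ignored; raising K to 3 would add D's contribution.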
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 12:46:42 2017

@author: lanlandetian
"""
import math
import operator


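'''
# Naive O(|U|^2) version, kept for reference only.
# W is the similarity matrix
def UserSimilarity(train):
    W = dict()
    for u in train.keys():
        W[u] = dict()  # initialise the row before assigning W[u][v]
        for v in train.keys():
            if u == v:
                continue
            # train[u] is a dict of items, so intersect the key sets
            W[u][v] = len(set(train[u]) & set(train[v]))
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W
'''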
def UserSimilarity(train):
    # build inverse table for item_users
    item_users = dict()
    for u,items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    #calculate co-rated items between users
    C = dict()
    N = dict()
    for i,users in item_users.items():
        for u in users:
            N.setdefault(u,0)
            N[u] += 1
            C.setdefault(u,{})
            for v in users:
                if u == v:
                    continue
                C[u].setdefault(v,0)
                C[u][v] += 1

    #calculate final similarity matrix W
    W = C.copy()
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W


def Recommend(user,train,W,K = 3):
    rank = dict()
    interacted_items = train[user]
    for v, wuv in sorted(W[user].items(), key = operator.itemgetter(1), \
                         reverse = True)[0:K]:
        for i, rvi in train[v].items():
            #we should filter items user interacted before
            if i in interacted_items:
                continue
            rank.setdefault(i,0)
            rank[i] += wuv * rvi
    return rank


def Recommendation(users, train, W, K = 3):
    result = dict()
    for user in users:
        rank = Recommend(user,train,W,K)
        R = sorted(rank.items(), key = operator.itemgetter(1), \
                   reverse = True)
        result[user] = R
    return result

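Improved user similarity (UserCF_IIF algorithm):
Two users both acting on an unpopular item says more about the similarity of their interests than both acting on a popular one. The improved user-similarity formula is therefore: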

$$w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log(1 + |N(i)|)}}{\sqrt{|N(u)|\,|N(v)|}}$$
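In this formula, the term $\frac{1}{\log(1 + |N(i)|)}$ penalizes the influence of popular items on the similarity.
The code is similar to that of UserCF.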
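Since the post does not list the UserCF_IIF code, here is a minimal sketch of how the formula could be implemented with the same inverted-index approach as UserCF. The function name `user_similarity_iif` and the toy data are assumptions for illustration, not the author's code.

```python
import math


def user_similarity_iif(train):
    """UserCF-IIF similarity; `train` maps user -> {item: rating}."""
    # Inverted index: item -> set of users who interacted with it, i.e. N(i).
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)

    C = {}  # C[u][v]: sum over co-rated items of 1 / log(1 + |N(i)|)
    N = {}  # N[u]: number of items user u interacted with
    for i, users in item_users.items():
        # Popular items (large |N(i)|) contribute less to the similarity.
        penalty = 1.0 / math.log(1 + len(users))
        for u in users:
            N[u] = N.get(u, 0) + 1
            C.setdefault(u, {})
            for v in users:
                if u != v:
                    C[u][v] = C[u].get(v, 0.0) + penalty

    # Normalise by sqrt(|N(u)| * |N(v)|), as in the cosine similarity.
    return {u: {v: cuv / math.sqrt(N[u] * N[v]) for v, cuv in related.items()}
            for u, related in C.items()}


# Hypothetical toy data: item "a" is more popular than item "b".
toy_train = {"A": {"a": 1, "b": 1}, "B": {"a": 1, "b": 1}, "C": {"a": 1}}
W_iif = user_similarity_iif(toy_train)
print(W_iif["A"])
```

The only change relative to plain UserCF is that each co-rated item adds `penalty` instead of 1 to `C[u][v]`.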

ItemCF algorithm:
Let N(i) denote the set of users who have interacted with item i. The similarity between item i and item j is then

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$
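Once the item similarities are known, ItemCF computes user u's interest in item i with the following formula:
$$p(u,i) = \sum_{j \in N(u) \cap S(i,K)} w_{ij}\, r_{uj}$$
Here, N(u) is the set of items user u has interacted with, S(i,K) is the set of the K items most similar to item i, and $r_{uj}$ is user u's interest in item j.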

The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 13:09:26 2017

@author: lanlandetian
"""

import math
import operator


def ItemSimilarity(train):
    #calculate co-rated users between items
    #walk the user-item table and count item co-occurrences
    C =dict()
    N = dict()
    for u,items in train.items():
        for i in items:
            N.setdefault(i,0)
            N[i] += 1
            C.setdefault(i,{})
            for j in items:
                if i == j:
                    continue
                C[i].setdefault(j,0)
                C[i][j] += 1

    #calculate final similarity matrix W
    W = C.copy()
    for i,related_items in C.items():
        for j,cij in related_items.items():
            W[i][j] = cij / math.sqrt(N[i] * N[j])
    return W


def Recommend(user_id,train, W,K = 3):
    rank = dict()
    ru = train[user_id]
    for i,pi in ru.items():
        for j,wij in sorted(W[i].items(), \
                           key = operator.itemgetter(1), reverse = True)[0:K]:
            if j in ru:
                continue
            rank.setdefault(j,0)
            rank[j] += pi * wij
    return rank


#class Node:
#    def __init__(self):
#        self.weight = 0
#        self.reason = dict()
#    
#def Recommend(user_id,train, W,K =3):
#    rank = dict()
#    ru = train[user_id]
#    for i,pi in ru.items():
#        for j,wij in sorted(W[i].items(), \
#                           key = operator.itemgetter(1), reverse = True)[0:K]:
#            if j in ru:
#                continue
#            if j not in rank:
#                rank[j] = Node()
#            rank[j].reason.setdefault(i,0)
#            rank[j].weight += pi * wij
#            rank[j].reason[i] = pi * wij
#    return rank

def Recommendation(users, train, W, K = 3):
    result = dict()
    for user in users:
        rank = Recommend(user,train,W,K)
        R = sorted(rank.items(), key = operator.itemgetter(1), \
                   reverse = True)
        result[user] = R
    return result

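Improved item similarity (ItemCF_IUF):
Active users should contribute less to item similarity than inactive users, so an IUF (Inverse User Frequency) term is added to correct the item-similarity formula: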

$$w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{1}{\log(1 + |N(u)|)}}{\sqrt{|N(i)|\,|N(j)|}}$$
The code is similar to that of ItemCF.
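As the IUF variant is not listed in the post, the following is a minimal sketch of one way to implement the formula, modelled on the ItemCF code. The function name `item_similarity_iuf` and the toy data are assumptions for illustration, not the author's code.

```python
import math


def item_similarity_iuf(train):
    """ItemCF-IUF similarity; `train` maps user -> {item: rating}."""
    C = {}  # C[i][j]: sum over common users of 1 / log(1 + |N(u)|)
    N = {}  # N[i]: number of users who interacted with item i
    for u, items in train.items():
        if not items:
            continue  # a user with no items contributes nothing (and log(1) = 0)
        # Active users (large |N(u)|) contribute less to the similarity.
        penalty = 1.0 / math.log(1 + len(items))
        for i in items:
            N[i] = N.get(i, 0) + 1
            C.setdefault(i, {})
            for j in items:
                if i != j:
                    C[i][j] = C[i].get(j, 0.0) + penalty

    # Normalise by sqrt(|N(i)| * |N(j)|), as in plain ItemCF.
    return {i: {j: cij / math.sqrt(N[i] * N[j]) for j, cij in related.items()}
            for i, related in C.items()}


# Hypothetical toy data: user "B" is more active than user "A".
toy_train = {"A": {"a": 1, "b": 1}, "B": {"a": 1, "b": 1, "c": 1}}
W_iuf = item_similarity_iuf(toy_train)
print(W_iuf["a"])
```

The resulting matrix has the same dict-of-dicts shape as the one returned by `ItemSimilarity`, so it could be passed to the same `Recommend` function.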

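In addition, the book represents the dataset with dicts, so I implemented the whole algorithm pipeline on GitHub, including data loading and the final cross-validation.
The GitHub URL is: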
https://github.com/1092798448/RecSys.git
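For readers who only want the shape of the dict-based dataset, here is a minimal sketch of turning (user, item) records into the `train`/`test` dicts used by the functions above. It is not taken from the repository; the function name `split_data`, its parameters, and the toy records are assumptions for illustration.

```python
import random


def split_data(records, M, k, seed=0):
    """Split (user, item) pairs into train/test dicts.

    Each record is assigned a random fold in [0, M); fold k becomes the test set.
    """
    rng = random.Random(seed)  # fixed seed so every k sees the same fold assignment
    train, test = {}, {}
    for user, item in records:
        target = test if rng.randrange(M) == k else train
        target.setdefault(user, {})[item] = 1  # implicit feedback: r = 1
    return train, test


# Hypothetical records; real data would be read from a ratings file.
records = [("A", "a"), ("A", "b"), ("B", "a"), ("B", "c"), ("C", "b"), ("C", "e")]
train, test = split_data(records, M=3, k=0)
print(train, test)
```

Calling `split_data` for k = 0, ..., M-1 with the same seed gives M train/test pairs, over which evaluation metrics can be averaged for cross-validation.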