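Item-based (and user-based) recommendation algorithms
MapReduce
Three small demos implemented on the MapReduce framework: wordcount, an item-based recommendation algorithm (itemCF), and a user-based recommendation algorithm (userCF)
Code: https://github.com/marvelousgirl/mapreduce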
itemCF steps:
step1: Build the rating matrix from the list of user behavior records
map input: key: LongWritable, the starting byte offset of each line; value: Text, userID,itemID,score
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score, userID3_score, …>
reduce output: key: Text, itemID; value: Text, userID1_score,userID2_score,userID3_score
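The key/value flow of step1 can be sketched in plain Python, with no Hadoop involved. This is only an in-memory illustration under assumptions: the function names (`map_step1`, `reduce_step1`, `run_step1`) and the sample records are hypothetical and are not taken from the repository, and a dictionary stands in for the shuffle phase.

```python
from collections import defaultdict

def map_step1(offset, line):
    """Emit (itemID, 'userID_score') for one 'userID,itemID,score' record."""
    user_id, item_id, score = line.strip().split(",")
    yield item_id, f"{user_id}_{score}"

def reduce_step1(item_id, values):
    """Join every 'userID_score' of one item into a single matrix row."""
    return item_id, ",".join(sorted(values))

def run_step1(lines):
    groups = defaultdict(list)  # stands in for the shuffle phase
    offset = 0                  # rough stand-in for the LongWritable key
    for line in lines:
        for key, value in map_step1(offset, line):
            groups[key].append(value)
        offset += len(line) + 1
    return dict(reduce_step1(k, v) for k, v in groups.items())

# Hypothetical behavior log: userID,itemID,score
behavior_log = ["A,1,1", "B,1,2", "A,2,3", "C,2,5"]
rating_rows = run_step1(behavior_log)
print(rating_rows)  # {'1': 'A_1,B_2', '2': 'A_3,C_5'}
```

Each entry of `rating_rows` is one row of the rating matrix, keyed by itemID, in the same text format as the reduce output above.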
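step2: Use the rating matrix from step1 to build the item-to-item similarity matrix; cosine similarity is the similarity measure used here
In addition, the rating matrix also serves as the cache, which is loaded in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, itemID; value: Text, itemID1_sim
reduce input: key: Text, itemID; value: Text, <itemID1_sim,itemID3_sim,…>
reduce output: key: Text, itemID; value: Text, itemID1_sim,itemID3_sim,itemID5_sim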
step3: Transpose the rating matrix
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score,itemID3_score,…>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID2_score
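The transpose in step3 amounts to swapping the row key and the column key of every cell. A small in-memory sketch, assuming hypothetical function names and sample data (sorting the values in the reducer is only there to make the output deterministic):

```python
from collections import defaultdict

def map_step3(item_id, row):
    """Turn each 'userID_score' cell of an item row into (userID, 'itemID_score')."""
    for cell in row.split(","):
        user_id, score = cell.split("_")
        yield user_id, f"{item_id}_{score}"

def reduce_step3(user_id, values):
    """Join all 'itemID_score' cells of one user into a transposed row."""
    return user_id, ",".join(sorted(values))

# Hypothetical rating matrix in the step1 output format (itemID -> row)
rating_rows = {"1": "A_1,B_2", "2": "A_3,C_5"}

groups = defaultdict(list)  # stands in for the shuffle phase
for item_id, row in rating_rows.items():
    for key, value in map_step3(item_id, row):
        groups[key].append(value)

transposed_rows = dict(reduce_step3(k, v) for k, v in groups.items())
print(transposed_rows)  # {'A': '1_1,2_3', 'B': '1_2', 'C': '2_5'}
```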
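step4: Multiply the item-to-item similarity matrix by the transposed rating matrix
This time the transposed rating matrix serves as the cache, which is loaded in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID itemID1_sim,itemID3_sim,itemID5_sim
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score,…>
reduce output: key: Text, itemID; value: Text, userID1_score,userID2_score,userID3_score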
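step5: Using the rating matrix, drop the items each user has already interacted with
This time the rating matrix serves as the cache, which is loaded in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score (same format as the step4 output)
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score, itemID3_score,…>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID5_score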
userCF:
The logic is the same as itemCF; the difference is that userID, rather than itemID, is used as the row key of the matrices
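As an illustration of that difference, the first map step could key on userID instead of itemID, so that every later step sees a matrix whose rows are users. This is a sketch under assumptions: the function names and sample records are hypothetical and not taken from the repository.

```python
from collections import defaultdict

def map_step1_usercf(offset, line):
    """Emit (userID, 'itemID_score') so that each matrix row is a user."""
    user_id, item_id, score = line.strip().split(",")
    yield user_id, f"{item_id}_{score}"

def reduce_step1_usercf(user_id, values):
    """Join every 'itemID_score' of one user into a single matrix row."""
    return user_id, ",".join(sorted(values))

# Hypothetical behavior log: userID,itemID,score
behavior_log = ["A,1,1", "B,1,2", "A,2,3", "C,2,5"]

groups = defaultdict(list)  # stands in for the shuffle phase
for offset, line in enumerate(behavior_log):  # offset value is unused here
    for key, value in map_step1_usercf(offset, line):
        groups[key].append(value)

user_rows = dict(reduce_step1_usercf(k, v) for k, v in groups.items())
print(user_rows)  # {'A': '1_1,2_3', 'B': '1_2', 'C': '2_5'}
```

With rows keyed by user, the cosine step would then compare users with each other instead of items.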