Using SVD Matrix Factorization for k-Fold Cross-Validation and Top-N Recommendation
阿新 • Published: 2019-02-19
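If you have not read the previous post on using the Surprise project, please read it first. This post uses Surprise, an open-source project hosted on GitHub.
The previous post covered installation and the attributes of some of its methods. This post uses SVD as an example and walks through concrete demo code.
First, let's look at how to load a local dataset with Surprise and run k-fold cross-validation. A glance at the API shows that this is very simple, which says a lot about how capable Surprise is. Here is the code: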
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 7 13:09:08 2017

@author: Jipon
"""
from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf
from surprise import dataset

# Load a local dataset and run 3-fold cross-validation.
# Each line has the form "user item rating", with fields separated by a space.
reader = dataset.Reader(line_format='user item rating', sep=' ')
data = Dataset.load_from_file('C:\\Users\\Jipon\\Desktop\\surprise\\train.txt', reader)
# Use 3 folds; without this line the default is 5 folds.
data.split(n_folds=3)

# We'll use the famous SVD algorithm.
algo = SVD()

# Evaluate performances of our algorithm on the dataset.
# Note: evaluate(), print_perf() and data.split() belong to the Surprise 1.0.x
# API used in this post; later releases provide
# surprise.model_selection.cross_validate instead.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
print_perf(perf)
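The code above loads a local dataset and evaluates SVD on it (links to the dataset and code are at the end of this post). The evaluation metrics are RMSE and MAE. Surprise's built-in metrics do not include precision and recall; if you need those two, you can define them yourself. See the official documentation for details.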
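To make the evaluation step more concrete, here is a minimal pure-Python sketch of what k-fold cross-validation with RMSE and MAE amounts to. It is not Surprise's implementation: the helper names `k_fold_indices`, `rmse`, and `mae` are hypothetical, and the rating pairs are made up for illustration.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for fold in range(n_folds):
        # Every n_folds-th shuffled index, starting at `fold`, forms the test part.
        test_idx = indices[fold::n_folds]
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        yield train_idx, test_idx


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


# Toy (true, estimated) ratings, made up for illustration only.
toy_pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0), (3.0, 3.0)]
toy_rmse = rmse(toy_pairs)
toy_mae = mae(toy_pairs)

# Split 10 hypothetical rating rows into 3 folds.
folds = list(k_fold_indices(n_samples=10, n_folds=3))
```

In a real run, the model is trained on each `train_idx` part, predictions are made for the matching `test_idx` part, and the per-fold RMSE and MAE are averaged.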
When building recommender systems, we often produce Top-N recommendations. The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 8 13:27:08 2017

@author: Jipon
"""
from collections import defaultdict

from surprise import SVD
from surprise import Dataset
from surprise import dataset


def get_top_n(predictions, n=10):
    '''Return the top-N recommendation for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendation to output for each user. Default
            is 10.

    Returns:
        A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user (missing keys default to an empty list).
    top_n = defaultdict(list)
    # uid is the user id, iid is the item id, true_r is the true rating,
    # and est is the rating estimated by the factorization.
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n


# Load the dataset.
reader = dataset.Reader(line_format='user item rating', sep=' ')
data = Dataset.load_from_file('C:\\Users\\Jipon\\Desktop\\surprise\\train.txt', reader)
trainset = data.build_full_trainset()
algo = SVD()
# Note: train() is the Surprise 1.0.x method name used in this post;
# later releases call it fit().
algo.train(trainset)

# Recommend the Top-N items that are not in the training set.
# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

top_n = get_top_n(predictions, n=2)

# Print the recommended items for each user.
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])
Experimental results: [output screenshot not available]
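Since the ranking step itself does not depend on Surprise, its behaviour can be checked on hand-made input. The sketch below repeats the `get_top_n` logic and feeds it toy predictions. This is not the author's output: the `Prediction` namedtuple is a hypothetical stand-in for Surprise's prediction tuples, and all ids and scores are invented.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Surprise's prediction tuples, with the same field
# order that get_top_n unpacks: uid, iid, true rating, estimate, details.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def get_top_n(predictions, n=10):
    """Group predictions by user and keep the n highest estimates per user."""
    top_n = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        top_n[uid].append((iid, est))
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n


# Toy predictions, made up for illustration only.
toy_predictions = [
    Prediction("u1", "i1", None, 3.2, {}),
    Prediction("u1", "i2", None, 4.6, {}),
    Prediction("u1", "i3", None, 4.1, {}),
    Prediction("u2", "i1", None, 2.0, {}),
    Prediction("u2", "i4", None, 3.9, {}),
]

toy_top_n = get_top_n(toy_predictions, n=2)
for uid, user_ratings in toy_top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])
```

With this toy input, user `u1` gets `i2` and `i3`, and user `u2` gets `i4` and `i1`, each list ordered by descending estimate.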
You can then use the recommended Top-N items to compute precision and recall. Isn't that simple?
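One possible way to compute precision and recall from Top-N lists is sketched below. It assumes you also have, for each user, a set of "relevant" items taken from held-out data (for example, items the user rated highly in a test split). The function name `precision_recall_at_n` and the toy data are hypothetical and not part of Surprise or the original code.

```python
def precision_recall_at_n(top_n, relevant_items):
    """Compute per-user precision and recall of Top-N recommendations.

    top_n:          {uid: [(iid, est), ...]} as returned by get_top_n.
    relevant_items: {uid: set of iids} from held-out data (an assumption here).
    """
    precisions, recalls = {}, {}
    for uid, recs in top_n.items():
        recommended = {iid for iid, _ in recs}
        relevant = relevant_items.get(uid, set())
        hits = len(recommended & relevant)
        # Precision: share of recommended items that are relevant.
        precisions[uid] = hits / len(recommended) if recommended else 0.0
        # Recall: share of relevant items that were recommended.
        recalls[uid] = hits / len(relevant) if relevant else 0.0
    return precisions, recalls


# Toy data, made up for illustration only.
toy_top_n = {
    "u1": [("i2", 4.6), ("i3", 4.1)],
    "u2": [("i4", 3.9), ("i1", 2.0)],
}
toy_relevant = {
    "u1": {"i2", "i5"},
    "u2": {"i4", "i1"},
}

precisions, recalls = precision_recall_at_n(toy_top_n, toy_relevant)
mean_precision = sum(precisions.values()) / len(precisions)
mean_recall = sum(recalls.values()) / len(recalls)
```

Note that the Top-N lists above are built from items outside the training set, so the relevant items must come from data that was held out from training, otherwise every hit count would be zero.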
Link to the code and dataset used above:
https://github.com/Jipon/SVDTest