SVD, SVD++, and Asymmetric SVD, with an Example
阿新 • Published: 2019-02-02
This post collects reference material on SVD, SVD++, and Asymmetric SVD, and walks through an example of running SVD with the surprise library.
1. Resources
The two papers above both use the Netflix data as their data set, which contains nothing but users' ratings of movies.
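For orientation, the three models named in the title are commonly written as below. This is a summary in the notation of Koren's 2008 KDD paper "Factorization meets the neighborhood", not something taken from this post itself, so treat it as a reading aid rather than a definition. Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$, $q_i$, $x_j$, $y_j$ are latent factor vectors, $R(u)$ is the set of items rated by user $u$, and $N(u)$ is the set of items for which $u$ provided implicit feedback.

```latex
\begin{aligned}
b_{ui} &= \mu + b_u + b_i \\
\text{SVD:}\quad \hat{r}_{ui} &= b_{ui} + q_i^{\top} p_u \\
\text{Asymmetric SVD:}\quad \hat{r}_{ui} &= b_{ui} + q_i^{\top}\Bigl( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, x_j
  + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Bigr) \\
\text{SVD++:}\quad \hat{r}_{ui} &= b_{ui} + q_i^{\top}\Bigl( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Bigr)
\end{aligned}
```

With rating-only data, the mere fact that a user rated an item can serve as the implicit signal, so $N(u)$ can be taken to be the set of rated items.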
2. An example using SVD:
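import zipfile
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate
Download and unzip the data
# Unzip ml-100k.zip into the current working directory
with zipfile.ZipFile('D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k.zip', 'r') as zf:
    zf.extractall()
Read the data
u_data = 'D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k/u.data'
# Prepare the data to be used in Surprise: each line of u.data holds
# "user item rating timestamp", separated by tabs
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(u_data, reader=reader)
Split the data into 5 parts: 4 parts are used for training and the remaining part for cross-validation, which produces a report with measures such as RMSE and MAE. If the number of folds is not given explicitly, the data is split into 5 parts by default as well.
Take 10-fold cross-validation as an example: the data set is split into ten parts, and in turn nine of them are used for training and one for validation. The mean of the 10 results serves as the estimate of the algorithm's accuracy. The whole 10-fold procedure is usually repeated several times and averaged again, for example 10 runs of 10-fold cross-validation, to get a somewhat more precise estimate.
# Choose the algorithm
algo = SVD()
Train
# Run 5-fold cross-validation, reporting the RMSE and MAE scores
# (cross_validate replaces the data.split() + evaluate() calls of older Surprise releases)
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
# Retrieve the trainset.
trainset = data.build_full_trainset()
# fit() replaces the older train()
algo.fit(trainset)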
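The k-fold procedure described above can be sketched in plain Python, without Surprise. This is only an illustration of the idea, not how the library implements it: `k_fold_indices`, `rmse`, `mae`, and `cross_validate_mean_baseline` are hypothetical helper names, the ratings are made up, and a trivial "predict the training mean" model stands in for SVD.

```python
import math
import random


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and yield (train_idx, test_idx) for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k nearly equal-sized parts
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


def cross_validate_mean_baseline(ratings, k=5):
    """Hypothetical stand-in model: always predict the training-set mean."""
    scores = []
    for train, test in k_fold_indices(len(ratings), k):
        mean = sum(ratings[i] for i in train) / len(train)
        pairs = [(ratings[i], mean) for i in test]
        scores.append((rmse(pairs), mae(pairs)))
    # Average each measure over the k folds.
    avg_rmse = sum(s[0] for s in scores) / k
    avg_mae = sum(s[1] for s in scores) / k
    return avg_rmse, avg_mae


toy_ratings = [4, 3, 5, 1, 2, 4, 4, 3, 5, 2]  # made-up ratings
avg_rmse, avg_mae = cross_validate_mean_baseline(toy_ratings, k=5)
print(f"mean RMSE={avg_rmse:.3f}, mean MAE={avg_mae:.3f}")
```

Every rating lands in the test fold exactly once, so each of the k scores is computed on data the model did not see during that round.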
Predict.
# Predict the rating of one item for one user
# (raw ids are strings because they were read from a file)
userid = str(196)
itemid = str(302)
actual_rating = 4
print(algo.predict(userid, itemid, actual_rating))
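As a rough illustration of the estimate behind the prediction call above, the sketch below follows the biased matrix-factorization rule that the Surprise documentation gives for its SVD algorithm: global mean plus user bias plus item bias plus the dot product of the two factor vectors. For an unknown user or item the corresponding terms are treated as zero, and the result is clipped to the rating scale. `predict_rating` is a hypothetical helper, not a Surprise function, and all parameter values are made up.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, low=1.0, high=5.0):
    """Biased-MF estimate mu + b_u + b_i + q_i . p_u, clipped to [low, high].

    Pass None for b_u/p_u (unknown user) or b_i/q_i (unknown item); the
    corresponding terms are then skipped.
    """
    est = mu
    if b_u is not None:
        est += b_u
    if b_i is not None:
        est += b_i
    if p_u is not None and q_i is not None:
        est += sum(p * q for p, q in zip(p_u, q_i))
    return min(high, max(low, est))


# Made-up parameters for one user and one item with 3 latent factors.
mu = 3.5
b_u, p_u = 0.25, [0.5, -0.5, 1.0]
b_i, q_i = -0.5, [1.0, 0.5, 0.25]

known = predict_rating(mu, b_u, b_i, p_u, q_i)        # both user and item known
cold_user = predict_rating(mu, None, b_i, None, q_i)  # user never seen in training
print(known, cold_user)
```

In the real model these parameters are learned from the training set; the sketch only shows how they combine at prediction time.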
3. The complete code:
import zipfile
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate
# Unzip ml-100k.zip (uncomment on the first run)
# with zipfile.ZipFile('D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k.zip', 'r') as zf:
#     zf.extractall()
u_data = 'D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k/u.data'
# Read data into a list of strings (only for inspecting the raw file; not used below)
with open(u_data) as f:
    all_lines = f.readlines()
# Prepare the data to be used in Surprise
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(u_data, reader=reader)
# Choose the algorithm
algo = SVD()
# Run 5-fold cross-validation, reporting the RMSE and MAE scores
# (cross_validate replaces the data.split() + evaluate() calls of older Surprise releases)
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
# Retrieve the trainset.
trainset = data.build_full_trainset()
# fit() replaces the older train()
algo.fit(trainset)
# Predict the rating of one item for one user
userid = str(196)
itemid = str(302)
actual_rating = 4
print(algo.predict(userid, itemid, actual_rating))
The end.