SVD, SVD++, and Asymmetric SVD, with an Example

This post collects resources on SVD, SVD++, and Asymmetric SVD, and walks through an example that implements SVD with the surprise library.

1. Resources

The two papers above use the Netflix data set, which contains nothing but users' ratings of movies.

2. An example using SVD:

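import zipfile
# Note: `evaluate` (like `data.split()` and `algo.train()` used below) belongs to
# older Surprise releases; newer releases replace them with
# `surprise.model_selection.cross_validate` and `algo.fit()`.
from surprise import Reader, Dataset, SVD, evaluate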

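Download and unzip the data

# Unzip ml-100k.zip (a context manager avoids shadowing the zipfile module
# and closes the archive automatically)
with zipfile.ZipFile('D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k.zip', 'r') as archive:
    archive.extractall()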

Read the data

u_data = 'D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k/u.data'

# Prepare the data to be used in Surprise
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(u_data, reader=reader)
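
To make the `line_format` and `sep` arguments more concrete, here is a minimal standard-library sketch of how one tab-separated line maps onto the four fields. The `Rating` tuple, the helper `parse_rating_line`, and the sample lines are illustrative assumptions and are not part of Surprise.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user: str
    item: str
    rating: float
    timestamp: str


def parse_rating_line(line: str, sep: str = "\t") -> Rating:
    """Split one 'user item rating timestamp' line into typed fields."""
    user, item, rating, timestamp = line.rstrip("\n").split(sep)
    # Ids are kept as strings; only the rating is converted to a number.
    return Rating(user, item, float(rating), timestamp)


# Illustrative lines in the u.data layout (values assumed for the example).
sample_lines = ["196\t242\t3\t881250949\n", "186\t302\t3\t891717742\n"]
ratings = [parse_rating_line(line) for line in sample_lines]
print(ratings[0])
```

Keeping the ids as strings mirrors the prediction step later in this post, which passes `str(196)` and `str(302)` rather than integers.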

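Split the data into 5 folds: 4 are used for training and the remaining one for cross-validation, which produces report metrics such as RMSE and MAE. If you do not split the data explicitly, it is still split into 5 folds by default.

For example, in 10-fold cross-validation the data set is divided into ten parts; in turn, nine parts are used for training and one for validation, and the mean of the ten results serves as an estimate of the algorithm's accuracy. In practice the whole procedure is usually repeated several times and averaged (for example, 10 runs of 10-fold cross-validation) to get a slightly more precise estimate.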
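
The k-fold procedure described here can be sketched with the standard library alone. This is only an illustration of the idea, not how Surprise implements its folds; the names `k_fold_indices`, `cross_validate_score`, and `score_fn` are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate_score(n_samples, score_fn, n_folds=5, seed=0):
    """Average a per-fold score, where score_fn(train_idx, test_idx) -> float."""
    return mean(score_fn(tr, te) for tr, te in k_fold_indices(n_samples, n_folds, seed))


splits = list(k_fold_indices(10, n_folds=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Repeating the procedure with different `seed` values and averaging the results corresponds to the "several runs of k-fold cross-validation" mentioned above.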

# Split the dataset into 5 folds and choose the algorithm
data.split(n_folds=5)
algo = SVD()

Training

# Train and test reporting the RMSE and MAE scores
evaluate(algo, data, measures=['RMSE', 'MAE'])

# Retrieve the trainset.
trainset = data.build_full_trainset()
algo.train(trainset)
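
For reference, the two reported measures can be computed by hand as follows. This is a minimal sketch with made-up ratings; the function names `rmse` and `mae` and the sample values are assumptions for illustration and are not Surprise's own accuracy functions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of the error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical true ratings and model estimates.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

Lower is better for both measures, and RMSE is never smaller than MAE on the same predictions.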

Prediction.

# Predict a certain item
userid = str(196)
itemid = str(302)
actual_rating = 4
print(algo.predict(userid, itemid, actual_rating))
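
The third argument to `predict` is the known true rating; it is carried along for reporting and does not change the estimate. As a rough sketch of what a biased SVD model computes for one user and item, the estimate is the global mean plus the user and item biases plus the dot product of the two factor vectors, clipped to the rating scale. The function `predict_rating` and all parameter values below are hypothetical; a trained model would learn them from the data.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, rating_scale=(1, 5)):
    """Estimate mu + b_u + b_i + dot(q_i, p_u), clipped to the rating scale."""
    estimate = mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))
    low, high = rating_scale
    return min(high, max(low, estimate))


# Hypothetical learned parameters for one user and one item.
mu = 3.5            # global mean rating
b_u = 0.25          # user bias
b_i = -0.5          # item bias
p_u = [0.5, 1.0]    # user factors
q_i = [1.0, 0.5]    # item factors

estimate = predict_rating(mu, b_u, b_i, p_u, q_i)
print(estimate)  # 4.25
```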

3. The complete code:

import zipfile
# Written against an older Surprise release (evaluate / split / train API).
from surprise import Reader, Dataset, SVD, evaluate

# Unzip ml-100k.zip (uncomment on the first run)
# with zipfile.ZipFile('D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k.zip', 'r') as archive:
#     archive.extractall()

u_data = 'D:/LiangYiHuai/kaggle/music-recommendation-data/ml-100k/u.data'

# Read data into an array of strings (optional; Surprise reads the file itself below)
with open(u_data) as f:
    all_lines = f.readlines()

# Prepare the data to be used in Surprise
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(u_data, reader=reader)

# Split the dataset into 5 folds and choose the algorithm
data.split(n_folds=5)
algo = SVD()

# Train and test reporting the RMSE and MAE scores
evaluate(algo, data, measures=['RMSE', 'MAE'])

# Retrieve the trainset.
trainset = data.build_full_trainset()
algo.train(trainset)

# Predict a certain item
userid = str(196)
itemid = str(302)
actual_rating = 4
print(algo.predict(userid, itemid, actual_rating))

The end.