Surprise: A Python Library for Recommendation Algorithms
Surprise is an easy-to-use Python scikit for recommender systems.
- Documentation: https://surprise.readthedocs.io/en/stable/
- Installation: pip install surprise
- Installation may fail with: error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools"
- If that happens, download the build tools from this page and set up the environment: https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
1 Getting Started
1.1 Basic usage
Automatic cross-validation
from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset
data = Dataset.load_builtin('ml-100k')
# Instantiate the SVD algorithm
algo = SVD()
# 5-fold cross-validation, printing the results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip... Done!
Dataset ml-100k has been saved to C:\Users\Administrator/.surprise_data/ml-100k
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).
Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 Mean Std
RMSE (testset) 0.9429 0.9262 0.9353 0.9313 0.9427 0.9357 0.0065
MAE (testset) 0.7420 0.7306 0.7381 0.7343 0.7415 0.7373 0.0044
Fit time 6.75 6.65 6.81 6.97 6.79 6.79 0.10
Test time 0.29 0.28 0.31 0.24 0.28 0.28 0.03
{'fit_time': (6.748954772949219, 6.648886442184448, 6.814781904220581, 6.970685958862305, 6.785797357559204),
 'test_mae': array([0.74200524, 0.73058076, 0.73807502, 0.73425662, 0.74150664]),
 'test_rmse': array([0.94290798, 0.92623843, 0.9352968 , 0.93130338, 0.94273246]),
 'test_time': (0.2868227958679199, 0.2778284549713135, 0.3148069381713867, 0.23685264587402344, 0.28182458877563477)}
Train-test split and the fit() method
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
# Load the data
data = Dataset.load_builtin('ml-100k')
# Hold out 25% of the data for testing
trainset, testset = train_test_split(data, test_size=.25)
# Instantiate the algorithm
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)
# RMSE
accuracy.rmse(predictions)
RMSE: 0.9392
0.9391726088618421
Train on a whole trainset and the predict() method
Instead of running cross-validation, we can also simply fit the algorithm on the whole dataset. This is done with the build_full_trainset() method, which builds a trainset object:
from surprise import KNNBasic
from surprise import Dataset
# Load the data
data = Dataset.load_builtin('ml-100k')
# Build a trainset from the whole dataset
trainset = data.build_full_trainset()
# Instantiate the collaborative filtering algorithm and train it
algo = KNNBasic()
algo.fit(trainset)
Computing the msd similarity matrix...
Done computing similarity matrix.
<surprise.prediction_algorithms.knns.KNNBasic at 0x1f5faae4278>
uid = str(196)  # raw user id
iid = str(302)  # raw item id
# Predict the rating of user 196 for item 302
pred = algo.predict(uid, iid, r_ui=4, verbose=True)
user: 196 item: 302 r_ui = 4.00 est = 4.06 {'actual_k': 40, 'was_impossible': False}
1.2 Use a custom dataset
Algorithm class | Description
---|---
random_pred.NormalPredictor | Predicts a random rating based on the distribution of the training set.
baseline_only.BaselineOnly | Predicts the baseline estimate for a given user and item.
knns.KNNBasic | A basic collaborative filtering algorithm.
knns.KNNWithMeans | Collaborative filtering that takes each user's mean rating into account.
knns.KNNBaseline | Collaborative filtering that takes a baseline rating into account.
matrix_factorization.SVD | The SVD algorithm.
matrix_factorization.SVDpp | The SVD++ algorithm, i.e. LFM + SVD.
matrix_factorization.NMF | Collaborative filtering based on non-negative matrix factorization.
slope_one.SlopeOne | A simple yet accurate collaborative filtering algorithm.
co_clustering.CoClustering | Collaborative filtering based on co-clustering.

Similarity measure | Description
---|---
cosine | Compute the cosine similarity between all pairs of users (or items).
msd | Compute the Mean Squared Difference similarity between all pairs of users (or items).
pearson | Compute the Pearson correlation coefficient between all pairs of users (or items).
pearson_baseline | Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items), using baselines for centering instead of means.

Accuracy metric | Description
---|---
rmse | Compute RMSE (Root Mean Squared Error).
mae | Compute MAE (Mean Absolute Error).
fcp | Compute FCP (Fraction of Concordant Pairs).
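For example, any algorithm class from the first table can be combined with a similarity measure from the second and evaluated with the metrics from the third. A minimal sketch (the choice of KNNWithMeans with pearson similarity is just an illustration):

from surprise import KNNWithMeans
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the built-in movielens-100k dataset
data = Dataset.load_builtin('ml-100k')
# User-based CF that centers each user's ratings, with pearson similarity
algo = KNNWithMeans(sim_options={'name': 'pearson', 'user_based': True})
# Evaluate with the three accuracy metrics listed above
cross_validate(algo, data, measures=['RMSE', 'MAE', 'FCP'], cv=3, verbose=True)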
import os
from surprise import BaselineOnly  # baseline estimates for a given user and item
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
# Path to the ml-100k ratings file
file_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.data')
# Each line is 'user item rating timestamp', separated by '\t'
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(file_path, reader=reader)
# Evaluate with cross-validation
cross_validate(BaselineOnly(), data, verbose=True)
# Hold out 25% of the data for testing
trainset, testset = train_test_split(data, test_size=.25)
blo = BaselineOnly()
blo.fit(trainset)
blo.predict(196, 302, 4, verbose=True)
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Evaluating RMSE, MAE of algorithm BaselineOnly on 5 split(s).
Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 Mean Std
RMSE (testset) 0.9470 0.9402 0.9467 0.9442 0.9418 0.9440 0.0027
MAE (testset) 0.7528 0.7427 0.7502 0.7480 0.7474 0.7482 0.0034
Fit time 0.23 0.28 0.33 0.25 0.24 0.27 0.04
Test time 0.28 0.32 0.25 0.19 0.24 0.26 0.04
Estimating biases using als...
3.8799791205908227
import pandas as pd
from surprise import NormalPredictor
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate
# Build a toy ratings DataFrame
ratings_dict = {'itemID': [1, 1, 1, 2, 2],
                'userID': [9, 32, 2, 45, 'user_foo'],
                'rating': [3, 2, 4, 3, 1]}
df = pd.DataFrame(ratings_dict)
# Ratings are on a scale from 1 to 5
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], reader)
print(type(data))
# Evaluate
cross_validate(NormalPredictor(), data, cv=3)
<class 'surprise.dataset.DatasetAutoFolds'>
{'fit_time': (0.0, 0.0, 0.0),
'test_mae': array([1.82749483, 1.36961054, 1.08665964]),
'test_rmse': array([2.42042007, 1.3756825 , 1.08665964]),
'test_time': (0.0, 0.0009999275207519531, 0.0)}
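The Dataset built from the DataFrame can be used like any other. Continuing from the snippet above, a minimal sketch that fits on the full trainset and predicts a rating; note that raw ids keep the types they had in the DataFrame:

from surprise import KNNBasic

# Build a trainset from all five ratings and fit a basic CF model
trainset = data.build_full_trainset()
algo = KNNBasic()
algo.fit(trainset)
# Raw ids keep their DataFrame types: 'user_foo' is a string, item ids are ints
pred = algo.predict('user_foo', 1, verbose=True)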
1.3 Use cross-validation iterators
For cross-validation, the cross_validate() function does all the hard work for us. For finer control, we can also instantiate a cross-validation iterator and make predictions on each split using the iterator's split() method and the algorithm's test() method.
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import KFold
# Load the data
data = Dataset.load_builtin('ml-100k')
# 3-fold cross-validation iterator
kf = KFold(n_splits=3)
algo = SVD()
for trainset, testset in kf.split(data):
    # Train and test
    algo.fit(trainset)
    predictions = algo.test(testset)
    # Evaluate; verbose=True prints the result
    accuracy.rmse(predictions, verbose=True)
RMSE: 0.9460
RMSE: 0.9494
RMSE: 0.9457
The movielens-100K dataset already provides 5 predefined train/test file pairs (u1.base, u1.test ... u5.base, u5.test).
Surprise can handle this case by using a surprise.model_selection.split.PredefinedKFold object:
import os
from surprise import SVD
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import PredefinedKFold
files_dir = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/')
reader = Reader('ml-100k')
# [(u1.base, u1.test), (u2.base, u2.test), ... (u5.base, u5.test)]
train_file = files_dir + 'u%d.base'
test_file = files_dir + 'u%d.test'
folds_files = [(train_file % i, test_file % i) for i in (1, 2, 3, 4, 5)]
data = Dataset.load_from_folds(folds_files, reader=reader)
pkf = PredefinedKFold()
algo = SVD()
for trainset, testset in pkf.split(data):
    algo.fit(trainset)
    predictions = algo.test(testset)
    accuracy.rmse(predictions, verbose=True)
    # print(predictions)
1.4 Tune algorithm parameters with GridSearchCV
The cross_validate() function reports the cross-validation accuracy measures for a single set of parameters.
If you want to know which parameter combination yields the best results, the GridSearchCV class does the job.
Given a dict of parameters, this class exhaustively tries all combinations and reports the best parameters for each accuracy measure (averaged over the different splits).
from surprise import SVD
from surprise import Dataset
from surprise.model_selection import GridSearchCV
data = Dataset.load_builtin('ml-100k')
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)
print(gs.best_score['rmse'])   # best RMSE score
print(gs.best_params['rmse'])  # parameters that gave the best RMSE
import pandas as pd
results_df = pd.DataFrame.from_dict(gs.cv_results)
results_df
0.9642869135146698
{'reg_all': 0.4, 'n_epochs': 10, 'lr_all': 0.005}
 | mean_fit_time | mean_test_mae | mean_test_rmse | mean_test_time | param_lr_all | param_n_epochs | param_reg_all | params | rank_test_mae | rank_test_rmse | split0_test_mae | split0_test_rmse | split1_test_mae | split1_test_rmse | split2_test_mae | split2_test_rmse | std_fit_time | std_test_mae | std_test_rmse | std_test_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.496731 | 0.806236 | 0.997472 | 0.511683 | 0.002 | 5 | 0.4 | {'reg_all': 0.4, 'n_epochs': 5, 'lr_all': 0.002} | 7 | 7 | 0.807283 | 0.997869 | 0.807397 | 0.999162 | 0.804027 | 0.995386 | 0.034429 | 0.001562 | 0.001567 | 0.062379 |
1 | 1.403456 | 0.782359 | 0.974123 | 0.497358 | 0.005 | 5 | 0.4 | {'reg_all': 0.4, 'n_epochs': 5, 'lr_all': 0.005} | 2 | 2 | 0.783014 | 0.974045 | 0.784169 | 0.976183 | 0.779894 | 0.972142 | 0.003297 | 0.001806 | 0.001650 | 0.062849 |
2 | 2.811914 | 0.786120 | 0.978227 | 0.492694 | 0.002 | 10 | 0.4 | {'reg_all': 0.4, 'n_epochs': 10, 'lr_all': 0.002} | 4 | 4 | 0.786966 | 0.978410 | 0.787427 | 0.979666 | 0.783967 | 0.976606 | 0.003398 | 0.001534 | 0.001256 | 0.064478 |
3 | 2.794590 | 0.773040 | 0.964287 | 0.537333 | 0.005 | 10 | 0.4 | {'reg_all': 0.4, 'n_epochs': 10, 'lr_all': 0.005} | 1 | 1 | 0.773070 | 0.963666 | 0.775167 | 0.966541 | 0.770884 | 0.962653 | 0.008335 | 0.001749 | 0.001647 | 0.012490 |
4 | 1.410455 | 0.814898 | 1.003614 | 0.484698 | 0.002 | 5 | 0.6 | {'reg_all': 0.6, 'n_epochs': 5, 'lr_all': 0.002} | 8 | 8 | 0.816255 | 1.004197 | 0.815952 | 1.005326 | 0.812487 | 1.001319 | 0.005308 | 0.001710 | 0.001687 | 0.060862 |
5 | 1.470082 | 0.793487 | 0.983101 | 0.542994 | 0.005 | 5 | 0.6 | {'reg_all': 0.6, 'n_epochs': 5, 'lr_all': 0.005} | 5 | 5 | 0.794289 | 0.983202 | 0.795240 | 0.985284 | 0.790931 | 0.980816 | 0.023524 | 0.001848 | 0.001825 | 0.058165 |
6 | 2.980475 | 0.796703 | 0.986454 | 0.527671 | 0.002 | 10 | 0.6 | {'reg_all': 0.6, 'n_epochs': 10, 'lr_all': 0.002} | 6 | 6 | 0.797903 | 0.986878 | 0.797934 | 0.988105 | 0.794272 | 0.984379 | 0.087440 | 0.001719 | 0.001550 | 0.018768 |
7 | 2.823572 | 0.784945 | 0.974213 | 0.494693 | 0.005 | 10 | 0.6 | {'reg_all': 0.6, 'n_epochs': 10, 'lr_all': 0.005} | 3 | 3 | 0.785202 | 0.973794 | 0.787113 | 0.976659 | 0.782519 | 0.972187 | 0.003396 | 0.001884 | 0.001850 | 0.057241 |
# Use the estimator that corresponds to the best parameters
algo = gs.best_estimator['rmse']
algo.fit(data.build_full_trainset())
<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1f58bf98940>
algo.predict(193, 302, 4, verbose=True)
user: 193 item: 302 r_ui = 4.00 est = 3.53 {'was_impossible': False}
Prediction(uid=193, iid=302, r_ui=4, est=3.52986, details={'was_impossible': False})
1.5 Command line usage
Usage from the command line:
surprise -algo SVD -params "{'n_epochs': 5, 'verbose': True}" -load-builtin ml-100k -n-folds 3
surprise -h
2 Using prediction algorithms
Surprise provides a number of built-in algorithms. All algorithms derive from the AlgoBase base class, where some key methods are implemented (e.g. predict, fit and test). The list and details of the available prediction algorithms can be found in the prediction_algorithms package documentation.
Each algorithm is part of the global Surprise namespace, so you only need to import its name from the surprise package:
from surprise import KNNBasic
algo = KNNBasic()
Some of these algorithms can use baseline estimates, and some can use a similarity measure.
2.1 Baselines estimates configuration
Baselines are estimated by minimizing the following regularized squared error (where μ is the overall mean rating and b_u, b_i are the user and item biases):

\sum_{r_{ui} \in R_{train}} \left(r_{ui} - (\mu + b_u + b_i)\right)^2 + \lambda \left(b_u^2 + b_i^2\right)
Baselines can be estimated in two different ways:
- Using Stochastic Gradient Descent (SGD).
- Using Alternating Least Squares (ALS).
print('Using ALS')
bsl_options = {'method': 'als',
               'n_epochs': 5,
               'reg_u': 12,
               'reg_i': 5
               }
algo = BaselineOnly(bsl_options=bsl_options)
Using ALS
print('Using SGD')
bsl_options = {'method': 'sgd',
               'learning_rate': .00005,
               }
algo = BaselineOnly(bsl_options=bsl_options)
Using SGD
bsl_options = {'method': 'als',
               'n_epochs': 20,
               }
sim_options = {'name': 'pearson_baseline'}
algo = KNNBasic(bsl_options=bsl_options, sim_options=sim_options)
2.2 Similarity measure configuration
Many algorithms use a similarity measure to estimate ratings. They are configured in the same way as baseline estimates: just pass a sim_options argument when creating the algorithm. This argument is a dictionary with the following (all optional) keys:
'name': the name of the similarity, as defined in the similarities module. Default is 'MSD'.
'user_based': whether similarities are computed between users or between items. This has a huge impact on the performance of a prediction algorithm. Default is True.
'min_support': the minimum number of common items (when 'user_based' is True) or the minimum number of common users (when 'user_based' is False) for the similarity not to be zero. Simply put, if |I_uv| < min_support then sim(u, v) = 0. The same goes for items. (A sketch using min_support follows the examples below.)
'shrinkage': the shrinkage parameter to apply (only relevant for pearson_baseline similarity). Default is 100.
sim_options = {'name': 'cosine',
               'user_based': False  # compute similarities between items
               }
algo = KNNBasic(sim_options=sim_options)
sim_options = {'name': 'pearson_baseline',
               'shrinkage': 0  # no shrinkage
               }
algo = KNNBasic(sim_options=sim_options)
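min_support can be combined with the other keys. A minimal sketch of item-based MSD similarity where similarities backed by fewer than 5 common users are set to zero:

sim_options = {'name': 'msd',
               'user_based': False,  # item-item similarities
               'min_support': 5      # sim is 0 if fewer than 5 common users
               }
algo = KNNBasic(sim_options=sim_options)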
3 How to build your own prediction algorithm
Creating your own prediction algorithm with Surprise is pretty simple: an algorithm is nothing more than a class derived from AlgoBase that has an estimate method.
This is the method called by predict(): it takes an inner user id and an inner item id, and returns the estimated rating.
from surprise import AlgoBase
from surprise import Dataset
from surprise.model_selection import cross_validate
import numpy as np
class MyOwnAlgorithm(AlgoBase):

    def __init__(self):
        AlgoBase.__init__(self)

    def fit(self, trainset):
        AlgoBase.fit(self, trainset)
        self.the_mean = np.mean([r for (_, _, r) in
                                 self.trainset.all_ratings()])
        return self

    def estimate(self, u, i):
        sum_means = self.trainset.global_mean
        div = 1
        if self.trainset.knows_user(u):
            sum_means += np.mean([r for (_, r) in self.trainset.ur[u]])
            div += 1
        if self.trainset.knows_item(i):
            sum_means += np.mean([r for (_, r) in self.trainset.ir[i]])
            div += 1
        return sum_means / div
data = Dataset.load_builtin('ml-100k')
algo = MyOwnAlgorithm()
cross_validate(algo, data, verbose=True)
Evaluating RMSE, MAE of algorithm MyOwnAlgorithm on 5 split(s).
Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 Mean Std
RMSE (testset) 1.0179 1.0165 1.0175 1.0216 1.0156 1.0178 0.0021
MAE (testset) 0.8380 0.8356 0.8376 0.8414 0.8364 0.8378 0.0020
Fit time 0.04 0.06 0.06 0.07 0.08 0.06 0.01
Test time 2.94 2.86 2.95 3.05 3.05 2.97 0.07
{'fit_time': (0.03598380088806152,
0.06396150588989258,
0.05696725845336914,
0.06996297836303711,
0.07695245742797852),
'test_mae': array([0.83803386, 0.83556254, 0.83764556, 0.84141284, 0.83639388]),
'test_rmse': array([1.01792507, 1.01651414, 1.0175074 , 1.02157154, 1.01555266]),
'test_time': (2.9401426315307617,
2.862196445465088,
2.9531378746032715,
3.045079231262207,
3.051081657409668)}
When the prediction is impossible:
from surprise import PredictionImpossible
class MyOwnAlgorithm(AlgoBase):

    def __init__(self, sim_options={}, bsl_options={}):
        AlgoBase.__init__(self, sim_options=sim_options,
                          bsl_options=bsl_options)

    def fit(self, trainset):
        AlgoBase.fit(self, trainset)
        self.bu, self.bi = self.compute_baselines()
        self.sim = self.compute_similarities()
        return self

    def estimate(self, u, i):
        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible('User and/or item is unknown.')
        # Compute similarities between u and v, where v describes all other
        # users that have rated item i
        neighbors = [(v, self.sim[u, v]) for (v, r) in self.trainset.ir[i]]
        # Sort the neighbors by similarity
        neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)
        print('The 3 nearest neighbors of user', str(u), 'are:')
        for v, sim_uv in neighbors[:3]:
            print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))
        # Fall back to the baseline estimate as the returned prediction
        return self.trainset.global_mean + self.bu[u] + self.bi[i]
4 prediction_algorithms package
https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html
5 The model_selection package
https://surprise.readthedocs.io/en/stable/model_selection.html
Cross-validation iterators (they must be instantiated before use; a usage sketch follows the list below)
KFold A basic cross-validation iterator.
RepeatedKFold Repeated KFold cross validator.
ShuffleSplit A basic cross-validation iterator with random trainsets and testsets.
LeaveOneOut Cross-validation iterator where each user has exactly one rating in the testset.
PredefinedKFold A cross-validation iterator for when a dataset has been loaded with the load_from_folds method.
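For instance, ShuffleSplit draws random train/test splits and is used exactly like KFold above; a minimal sketch:

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import ShuffleSplit

data = Dataset.load_builtin('ml-100k')
# 3 random splits, each holding out 20% of the ratings for testing
ss = ShuffleSplit(n_splits=3, test_size=.2)
algo = SVD()
for trainset, testset in ss.split(data):
    algo.fit(trainset)
    predictions = algo.test(testset)
    accuracy.rmse(predictions, verbose=True)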
Cross validation
surprise.model_selection.validation.cross_validate(algo, data, measures=[u'rmse', u'mae'], cv=None, return_train_measures=False, n_jobs=1, pre_dispatch=u'2*n_jobs', verbose=False)
Parameter search
surprise.model_selection.search.RandomizedSearchCV(algo_class, param_distributions, n_iter=10, measures=[u'rmse', u'mae'], cv=None, refit=False, return_train_measures=False, n_jobs=1, pre_dispatch=u'2*n_jobs', random_state=None, joblib_verbose=0)
surprise.model_selection.search.GridSearchCV(algo_class, param_grid, measures=[u'rmse', u'mae'], cv=None, refit=False, return_train_measures=False, n_jobs=1, pre_dispatch=u'2*n_jobs', joblib_verbose=0)
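RandomizedSearchCV is used like GridSearchCV, except that it samples n_iter parameter combinations from param_distributions rather than trying every combination. A minimal sketch (the candidate values below are illustrative):

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import RandomizedSearchCV

data = Dataset.load_builtin('ml-100k')
# Candidate values to sample from
param_distributions = {'n_epochs': [5, 10, 20],
                       'lr_all': [0.002, 0.005, 0.01],
                       'reg_all': [0.2, 0.4, 0.6]}
rs = RandomizedSearchCV(SVD, param_distributions, n_iter=5,
                        measures=['rmse', 'mae'], cv=3)
rs.fit(data)
print(rs.best_score['rmse'])
print(rs.best_params['rmse'])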
6 similarities module
https://surprise.readthedocs.io/en/stable/similarities.html#
cosine: Compute the cosine similarity between all pairs of users (or items).
msd: Compute the Mean Squared Difference similarity between all pairs of users (or items).
pearson: Compute the Pearson correlation coefficient between all pairs of users (or items).
pearson_baseline: Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items) using baselines for centering instead of means.
7 accuracy module
https://surprise.readthedocs.io/en/stable/accuracy.html
rmse: Compute RMSE (Root Mean Squared Error).
mae: Compute MAE (Mean Absolute Error).
fcp: Compute FCP (Fraction of Concordant Pairs).
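All three functions take the list of Prediction objects returned by test(); a minimal sketch:

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=.25)
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)
# Each function prints its value when verbose=True and also returns it
accuracy.rmse(predictions, verbose=True)
accuracy.mae(predictions, verbose=True)
accuracy.fcp(predictions, verbose=True)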
8 dataset module
https://surprise.readthedocs.io/en/stable/dataset.html
Dataset.load_builtin: Load a built-in dataset.
Dataset.load_from_file: Load a dataset from a (custom) file.
Dataset.load_from_folds: Load a dataset where folds (for cross-validation) are predefined by some files.
Dataset.folds: Generator function to iterate over the folds of the Dataset.
DatasetAutoFolds.split: Split the dataset into folds for future cross-validation.
9 Trainset class
https://surprise.readthedocs.io/en/stable/trainset.html
.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, offset, raw2inner_id_users, raw2inner_id_items)
global_mean
The mean of all ratings μ.
all_items()
Generator function to iterate over all items.
Yields: Inner id of items.
all_ratings()
Generator function to iterate over all ratings.
Yields: A tuple (uid, iid, rating) where ids are inner ids (see this note).
all_users()
Generator function to iterate over all users.
Yields: Inner id of users.
build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings r_ui where the user u is known, the item i is known, but the rating r_ui is not in the trainset. As r_ui is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings global_mean.
Parameters: fill (float) – The value to fill unknown ratings. If None the global mean of all ratings global_mean will be used.
Returns: A list of tuples (uid, iid, fill) where ids are raw ids.
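A common use of build_anti_testset() is to predict all the ratings that are missing from the trainset, e.g. as candidates for top-N recommendation; a minimal sketch:

from surprise import SVD
from surprise import Dataset

data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)
# All (user, item) pairs with no rating in the trainset; r_ui is filled
# with the global mean since the true rating is unknown
anti_testset = trainset.build_anti_testset()
predictions = algo.test(anti_testset)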
build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
global_mean
Return the mean of all ratings.
It’s only computed once.
knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
Parameters: iid (int) – The (inner) item id. See this note.
Returns: True if item is part of the trainset, else False.
knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
Parameters: uid (int) – The (inner) user id. See this note.
Returns: True if user is part of the trainset, else False.
to_inner_iid(riid)
Convert an item raw id to an inner id.
Parameters: riid (str) – The item raw id.
Returns: The item inner id.
Return type: int
Raises: ValueError – When item is not part of the trainset.
to_inner_uid(ruid)
Convert a user raw id to an inner id.
Parameters: ruid (str) – The user raw id.
Returns: The user inner id.
Return type: int
Raises: ValueError – When user is not part of the trainset.
to_raw_iid(iiid)
Convert an item inner id to a raw id.
Parameters: iiid (int) – The item inner id.
Returns: The item raw id.
Return type: str
Raises: ValueError – When iiid is not an inner id.
to_raw_uid(iuid)
Convert a user inner id to a raw id.
Parameters: iuid (int) – The user inner id.
Returns: The user raw id.
Return type: str
Raises: ValueError – When iuid is not an inner id
Users and items have raw ids and inner ids. Some methods will use/return raw ids (e.g. the predict() method), while others will use/return inner ids.
Raw ids are the ids as defined in the rating file or in the pandas DataFrame. They can be strings or numbers. Note that if the ratings were read from a file, which is the standard scenario, they are represented as strings. It is important to know this if you are using, e.g., predict() or other methods that accept raw ids as parameters.
When a trainset is created, each raw id is mapped to a unique integer called the inner id, which is much more suitable for Surprise to manipulate. Conversion between raw and inner ids can be done with the to_inner_uid(), to_inner_iid(), to_raw_uid() and to_raw_iid() methods of the trainset.
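A minimal sketch of converting between raw and inner ids with these methods (raw ids from the ml-100k files are strings):

from surprise import Dataset

data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
# Raw ids are strings here; inner ids are contiguous integers starting at 0
inner_uid = trainset.to_inner_uid('196')
inner_iid = trainset.to_inner_iid('302')
# And back to raw ids
print(trainset.to_raw_uid(inner_uid), trainset.to_raw_iid(inner_iid))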
10 Reader
https://surprise.readthedocs.io/en/stable/reader.html
surprise.reader.Reader(name=None, line_format=u'user item rating', sep=None, rating_scale=(1, 5), skip_lines=0)
name (string, optional) – If specified, a Reader for one of the built-in datasets is returned and any other parameter is ignored. Accepted values are 'ml-100k', 'ml-1m', and 'jester'. Default is None.
line_format (string) – The fields names, in the order at which they are encountered on a line. Please note that line_format is always space-separated (use the sep parameter). Default is 'user item rating'.
sep (char) – the separator between fields. Example: ';'.
rating_scale (tuple, optional) – The rating scale used for every rating. Default is (1, 5).
skip_lines (int, optional) – Number of lines to skip at the beginning of the file. Default is 0.
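For example, a custom ratings file with a header line, a ';' separator and ratings from 0 to 10 could be read like this (the file name and rating scale are hypothetical):

from surprise import Dataset
from surprise import Reader

# Hypothetical file: one 'user;item;rating' triple per line, first line is a header
reader = Reader(line_format='user item rating', sep=';',
                rating_scale=(0, 10), skip_lines=1)
data = Dataset.load_from_file('ratings.csv', reader=reader)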
11 evaluate module
https://surprise.readthedocs.io/en/stable/evaluate.html
1 surprise.evaluate.GridSearch(algo_class, param_grid, measures=[u'rmse', u'mae'], n_jobs=1, pre_dispatch=u'2*n_jobs', seed=None, ver