On Hyperparameter Tuning with GridSearchCV
阿新 • Published: 2019-02-08
The first tool to introduce is scikit-learn's model-selection API, GridSearchCV.
Section 1: Usage of GridSearchCV
Its signature looks like this (note: GridSearchCV now lives in `sklearn.model_selection`; the old `sklearn.grid_search` module is deprecated):

```python
sklearn.model_selection.GridSearchCV(
    estimator,                  # the model to be tuned, e.g. an SVC or a boosting model
    param_grid,                 # dict of hyperparameters to search over
    scoring=None,               # evaluation metric, e.g. scoring='roc_auc' for AUC
    fit_params=None,
    n_jobs=1,                   # number of jobs to run in parallel
    iid=True,
    refit=True,
    cv=None,                    # cross-validation strategy, e.g. cv=5 for 5-fold CV
    verbose=0,
    pre_dispatch='2*n_jobs',
    error_score='raise'
)
```
A complete example (updated to Python 3 syntax):

```python
# -*- coding: utf-8 -*-
import pandas as pd
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

iris = datasets.load_iris()

# hyperparameters to search over
parameters = {
    'kernel': ('linear', 'rbf'),
    'C': [1, 2, 4],
    'gamma': [0.125, 0.25, 0.5, 1, 2, 4]
}

svr = svm.SVC()  # the model
clf = GridSearchCV(svr, parameters, n_jobs=4)
clf.fit(iris.data, iris.target)  # fit the model

# clf.cv_results_ holds the log of the parameter search
cv_result = pd.DataFrame.from_dict(clf.cv_results_)
cv_result.to_csv('./data/cv_result.csv')

print('The parameters of the best model are: ')
print(clf.best_params_)  # print the best parameters found

y_pred_array = clf.predict(iris.data)  # predictions
print(classification_report(y_true=iris.target, y_pred=y_pred_array))
```

Here `print(clf.best_params_)` reports the best parameters found:
```
{'kernel': 'linear', 'C': 2, 'gamma': 0.125}
```
and `print(classification_report(y_true=iris.target, y_pred=y_pred_array))` prints the following report:
```
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       1.00      0.94      0.97        50
          2       0.94      1.00      0.97        50

avg / total       0.98      0.98      0.98       150
```
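Beyond `best_params_`, the fitted GridSearchCV object exposes a few other useful attributes. A minimal sketch (using a deliberately small grid so the search runs quickly):

```python
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# a small grid: 2 kernels x 2 values of C = 4 candidate settings
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 2]}
clf = GridSearchCV(svm.SVC(), parameters, cv=5)
clf.fit(iris.data, iris.target)

print(clf.best_score_)                 # mean cross-validated score of the best setting
print(clf.best_estimator_)             # the refitted best model (refit=True by default)
print(len(clf.cv_results_['params']))  # number of candidate settings tried
```

Because `refit=True` by default, `clf.predict` delegates to `best_estimator_`, so the grid-search object can be used directly as the final model.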
Section 2: Tuning LightGBM with GridSearchCV
LGBMRegressor implements the scikit-learn estimator interface, and its configurable parameters are listed below (this reflects an older lightgbm release; check the signature of your installed version, as some of these parameters have since been removed or renamed):

```python
lightgbm.sklearn.LGBMRegressor(
    boosting_type='gbdt',        # 'gbdt', 'dart', 'goss' or 'rf'
    num_leaves=31,               # maximum number of leaves per tree
    max_depth=-1,                # no depth limit when -1
    learning_rate=0.1,
    n_estimators=10,             # number of boosting rounds
    max_bin=255,                 # max number of bins for feature discretization
    subsample_for_bin=50000,     # number of samples used to construct the bins
    objective='regression',
    min_split_gain=0,            # minimum gain required to make a split
    min_child_weight=5,          # minimum sum of hessian in a leaf
    min_child_samples=10,        # minimum number of samples in a leaf
    subsample=1,                 # row subsampling ratio
    subsample_freq=1,            # perform row subsampling every k iterations
    colsample_bytree=1,          # column subsampling ratio per tree
    reg_alpha=0,                 # L1 regularization
    reg_lambda=0,                # L2 regularization
    seed=0,
    nthread=-1,
    silent=True,
    huber_delta=1.0,             # parameter of the Huber loss
    gaussian_eta=1.0,
    fair_c=1.0,                  # parameter of the Fair loss
    poisson_max_delta_step=0.7,
    drop_rate=0.1,               # DART only: dropout rate
    skip_drop=0.5,               # DART only: probability of skipping dropout
    max_drop=50,                 # DART only: max number of dropped trees
    uniform_drop=False,          # DART only
    xgboost_dart_mode=False      # DART only
)
```