Hyperparameter Selection, Grid Search, and Cross-Validation

阿新 • Published: 2019-01-12

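Hyperparameter Selection

1. What the hyperparameters are

Hyperparameters are the counterpart of parameters. Parameters are the values a model learns and updates during training through BP (backpropagation), such as the weight matrices and biases. Hyperparameters have to be chosen by the practitioner and cannot be learned.
Common hyperparameters include the model itself (SVM, Softmax, multi-layer neural network, ...), the optimization algorithm (Adam, SGD, ...), the learning rate (each optimizer also has its own hyperparameters, such as beta1 and beta2, but the common practice is to keep their default values rather than tune them), the choice of regularization term (L0, L1, L2), the regularization coefficient, the dropout probability, and so on.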

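2. Determining the tuning range

There are many kinds of hyperparameters and each can vary over a wide range, so a few simple tests are needed first to narrow down the range to tune.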

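2.1. Model

The choice of model depends largely on the actual problem, but any model must pass a few basic tests.
First, the model must run correctly, i.e. the code must be written correctly. One way to check is to estimate the loss of the first epoch and compare the estimate with the actual value. Note that the regularization coefficient must be set to 0 for this check, because the loss contributed by the regularization term is hard to estimate.
Second, the model must be able to overfit a small dataset, i.e. reach a loss close to 0 and an accuracy close to 1. If it cannot, try a different or a more complex model.
Finally, if val_acc and acc are nearly equal while both remain unsatisfactory, the model may not be complex enough, and a more complex model is worth trying.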
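The first-epoch loss check can be made concrete for a softmax classifier. The sketch below assumes a classifier whose initial scores are all close to zero (as with small random weights), in which case every class gets a probability of about 1/C and the cross-entropy loss should start near ln(C). The class count and the random scores are illustrative assumptions, not values from the article.

```python
import math
import random


def softmax_cross_entropy(logits, label):
    """Cross-entropy loss of one sample, given raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


num_classes = 10  # assumed number of classes
rng = random.Random(0)

# Near-zero scores mimic a freshly initialised network with tiny weights.
losses = []
for _ in range(1000):
    logits = [rng.gauss(0.0, 1e-3) for _ in range(num_classes)]
    label = rng.randrange(num_classes)
    losses.append(softmax_cross_entropy(logits, label))

observed_loss = sum(losses) / len(losses)
expected_loss = math.log(num_classes)  # -ln(1/C), regularization set to 0

print(f"expected {expected_loss:.4f}, observed {observed_loss:.4f}")
```

If the loss observed at the start of training is far from this estimate, a bug in the loss computation or the initialization is a likely cause.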

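2.2. Learning rate

The loss barely changes: the learning rate is too low.
The loss oscillates strongly or overflows: the learning rate is too high.
These two rules give a rough range for the learning rate.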
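Both symptoms are easy to reproduce on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2 with three learning rates; the function, the starting point and the 0.9 threshold in `diagnose` are arbitrary choices made for illustration.

```python
def run_gradient_descent(learning_rate, steps=50, w0=5.0):
    """Minimise f(w) = w**2 and return the loss after every step."""
    w = w0
    losses = [w * w]
    for _ in range(steps):
        grad = 2.0 * w            # derivative of w**2
        w -= learning_rate * grad
        losses.append(w * w)
    return losses


def diagnose(losses):
    """Apply the two rules of thumb to a loss curve."""
    first, last = losses[0], losses[-1]
    if last > first:
        return "too high"         # the loss grew: the steps overshoot
    if last > 0.9 * first:
        return "too low"          # the loss barely moved
    return "ok"


for lr in (1e-4, 0.1, 1.1):
    print(lr, diagnose(run_gradient_descent(lr)))
```

On a real model the curves are noisier, so the thresholds would need to be looser, but the same two observations bracket the usable range.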

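2.3. Regularization coefficient

val_acc and acc differ substantially: the regularization coefficient is too small.
The loss gradually increases: the regularization coefficient is too large.
These two rules give a rough range for the regularization coefficient.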

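3. Cross-validation

Split the training set once more into a training set and a validation set. Train the model on the training set and evaluate it on the validation set to determine the hyperparameters, choosing the ones that give the best result on the validation set.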
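A minimal sketch of this procedure, under assumptions: the data is a hypothetical 1-D toy set (y = 3x plus noise), the model is a ridge regression with a closed-form slope, and the hyperparameter being chosen is the regularization coefficient `lam`. None of these come from the article; they only make the split-train-validate loop concrete.

```python
import random


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle the samples and split off a validation set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def fit_ridge_slope(samples, lam):
    """Closed-form 1-D ridge regression without intercept: y ~ w * x."""
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return sxy / (sxx + lam)


def mean_squared_error(samples, w):
    return sum((y - w * x) ** 2 for x, y in samples) / len(samples)


# Hypothetical toy data: y = 3x plus Gaussian noise.
rng = random.Random(1)
data = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))

train_set, val_set = train_val_split(data)

# Train on the training set, score on the validation set, keep the best.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_errors = {
    lam: mean_squared_error(val_set, fit_ridge_slope(train_set, lam))
    for lam in candidates
}
best_lam = min(val_errors, key=val_errors.get)
print("best lam:", best_lam, "validation MSE:", val_errors[best_lam])
```

The validation set is never used for fitting, so its error is an estimate of how each candidate would behave on unseen data.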

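3.1. Coarse tuning first, then fine tuning

First run a coarse search with few, widely spaced values to find the approximate range for fine tuning. Then run a fine search with many, closely spaced values inside that smaller range.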
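The two passes might look like the sketch below. `validation_score` is a hypothetical stand-in for "train the model and score it on the validation set"; it is constructed to peak at a learning rate of 0.03 so that the result can be checked.

```python
import math


def validation_score(learning_rate):
    """Hypothetical validation score that peaks at learning_rate = 0.03."""
    return -(math.log10(learning_rate) - math.log10(0.03)) ** 2


def best_of(candidates):
    return max(candidates, key=validation_score)


# Coarse pass: few candidates, widely spaced (one per decade).
coarse = [10.0 ** e for e in range(-5, 1)]
coarse_best = best_of(coarse)

# Fine pass: many candidates, closely spaced, around the coarse winner.
low, high = coarse_best / 10.0, coarse_best * 10.0
fine = [low * (high / low) ** (i / 20) for i in range(21)]
fine_best = best_of(fine)

print("coarse best:", coarse_best)
print("fine best:", fine_best)
```

The coarse pass costs 6 evaluations and the fine pass 21, far fewer than covering the whole range at the fine spacing.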

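3.2. Try tuning in log space

That is, generate the candidate values at random in log space rather than in the original space. This is commonly used for the learning rate, the regularization coefficient, and similar hyperparameters. The rationale is that the order of magnitude (the exponent) of such a hyperparameter has the stronger effect on the result: values of the same order of magnitude, even when far apart in the original domain, change the result less than values of different orders of magnitude.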
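A short sketch of the difference, assuming a search range of 1e-6 to 1e-1 (an illustrative range, not one given in the article). Sampling the exponent uniformly spreads the candidates evenly over the decades, while sampling the value uniformly puts almost all of them in the top decade.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 exponent is uniform between low and high."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


def fraction_below(samples, threshold):
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
log_samples = [sample_log_uniform(1e-6, 1e-1, rng) for _ in range(1000)]
linear_samples = [rng.uniform(1e-6, 1e-1) for _ in range(1000)]

# In expectation 3 of the 5 decades lie below 1e-3, so about 60% of the
# log-space samples fall there, against about 1% of the linear samples.
print("log space:   ", fraction_below(log_samples, 1e-3))
print("linear space:", fraction_below(linear_samples, 1e-3))
```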

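3.3. Search parameter values at random rather than on a grid

[Figure: random layout]

Random search reveals trends more readily. The figure shows that random search can expose that the result varies more strongly along one dimension, which gives a clear trend.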
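One common way to explain this effect is to count how many different values of each hyperparameter a fixed budget of trials actually tests. The sketch below compares a 3 x 3 grid with 9 random points in the unit square; the two unnamed hyperparameters and the budget of 9 trials are assumptions for illustration.

```python
import itertools
import random

# Grid layout: 3 values per axis, 9 trials in total.
grid_axis = [0.0, 0.5, 1.0]
grid_points = list(itertools.product(grid_axis, grid_axis))

# Random layout: the same budget of 9 trials.
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(9)]


def distinct_values(points, dim):
    """Number of different values tried along one hyperparameter."""
    return len({p[dim] for p in points})


print("grid:  ", distinct_values(grid_points, 0), "distinct values per axis")
print("random:", distinct_values(random_points, 0), "distinct values per axis")
```

If only one of the two hyperparameters matters much, the grid spends its 9 trials on 3 useful values, whereas the random layout tries a new value of the important hyperparameter in every trial.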

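Grid search

Grid search has a grand-sounding name, but put simply: you manually list all the values of every model parameter you want to vary, and the program exhaustively runs every combination for you. For decision trees, the maximum tree depth is often the parameter to tune; for AdaBoost, it is the number of weak classifiers.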
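The exhaustive enumeration can be sketched in a few lines with `itertools.product`. The `evaluate` function is a hypothetical placeholder for training and scoring a model, and the parameter names and values are illustrative assumptions.

```python
import itertools


def evaluate(max_depth, n_estimators):
    """Hypothetical score standing in for 'train and validate a model'."""
    return -abs(max_depth - 4) - abs(n_estimators - 100) / 50


# Manually specified values for every parameter to vary.
param_grid = {"max_depth": [2, 4, 6], "n_estimators": [50, 100, 200]}

names = list(param_grid)
results = []
# Every combination of the listed values is tried exactly once.
for values in itertools.product(*(param_grid[name] for name in names)):
    params = dict(zip(names, values))
    results.append((evaluate(**params), params))

best_score, best_params = max(results, key=lambda item: item[0])
print(len(results), "combinations tried")
print("best:", best_params, "score:", best_score)
```

The number of combinations is the product of the list lengths, which is why the grid grows quickly as more parameters are added.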

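Scoring method

To determine which of the manually specified values of the searched parameters is best, a suitable scoring method is needed. The choice depends on the actual problem and may be accuracy, f1-score, f-beta, precision, recall, and so on.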
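Why the choice of score matters can be seen on a small imbalanced example. The sketch below computes the listed metrics by hand for binary labels; the label vectors are made up for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Precision, recall and F-beta for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Made-up imbalanced data: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_fbeta(y_true, y_pred)
print("accuracy:", acc)
print("precision:", precision, "recall:", recall, "f1:", f1)
```

Here accuracy is 0.7 although only one of the three positives is found, so a search scored by accuracy and a search scored by f1 could prefer different parameter settings.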

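Cross-validation

With a good scoring method in hand, can a single run show that one parameter combination is better than another? Clearly that is not rigorous, which is why cross-validation exists. K-fold cross-validation serves as the example here: the data is split into K folds, each fold is used once as the validation set while the remaining K-1 folds are used for training, and the K scores are averaged.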
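K-fold cross-validation can be sketched without any library. The version below splits the indices into contiguous folds without shuffling, and scores a deliberately trivial majority-class predictor; the labels and the predictor are assumptions chosen to keep the example small.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Made-up binary labels for 10 samples.
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]


def score_fold(train, val):
    """Accuracy of predicting the majority class of the training fold."""
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in val) / len(val)


folds = list(k_fold_indices(len(labels), 3))
fold_scores = [score_fold(train, val) for train, val in folds]
mean_score = sum(fold_scores) / len(fold_scores)

print("fold scores:", fold_scores)
print("mean score:", mean_score)
```

Every sample is used for validation exactly once, and the mean over the folds is the number compared between parameter combinations.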

Combining grid search with cross-validation gives GridSearchCV

class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')

Note: scoring defaults to None, and cv defaults to None.

Parameters:

estimator : estimator object.

This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.

param_grid : dict or list of dictionaries

Dictionary with parameters names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.

scoring : string, callable, list/tuple, dict or None, default: None

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.

If None, the estimator’s default scorer (if available) is used.

fit_params : dict, optional

Parameters to pass to the fit method.

Deprecated since version 0.19: fit_params as a constructor argument was deprecated in version 0.19 and will be removed in version 0.21. Pass fit parameters to the fit method instead.

n_jobs : int, default=1

Number of jobs to run in parallel.

pre_dispatch : int, or string, optional

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs

An int, giving the exact number of total jobs that are spawned

A string, giving an expression as a function of n_jobs, as in '2*n_jobs'

iid : boolean, default=True

If True, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds.

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 3-fold cross validation,

integer, to specify the number of folds in a (Stratified)KFold,

An object to be used as a cross-validation generator.

An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

refit : boolean, or string, default=True

Refit an estimator using the best found parameters on the whole dataset.

For multiple metric evaluation, this needs to be a string denoting the scorer that is used to find the best parameters for refitting the estimator at the end.

The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance.

Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set and all of them will be determined w.r.t. this specific scorer.

See scoring parameter to know more about multiple metric evaluation.

verbose : integer

Controls the verbosity: the higher, the more messages.

error_score : 'raise' (default) or numeric

Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

return_train_score : boolean, optional

If False, the cv_results_ attribute will not include training scores.

Current default is 'warn', which behaves as True in addition to raising a warning when a training score is looked up. That default will be changed to False in 0.21. Computing training scores is used to get insights on how different parameter settings impact the overfitting/underfitting trade-off. However computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance.

Attributes:

cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

For instance, the table given below

param_kernel  param_gamma  param_degree  split0_test_score  ...  rank_t...
'poly'        --           2             0.8                ...  2
'poly'        --           3             0.7                ...  4
'rbf'         0.1          --            0.8                ...  3
'rbf'         0.2          --            0.9                ...  1

will be represented by a cv_results_ dict of:
{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...)
'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                            mask = [ True  True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
                             mask = [False False  True  True]...),
'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score'   : [0.81, 0.7, 0.7],
'std_train_score'    : [0.03, 0.03, 0.04],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}
NOTE

The key 'params' is used to store a list of parameter settings dicts for all the parameter candidates.

The mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.

For multi-metric evaluation, the scores for all the scorers are available in the cv_results_ dict at the keys ending with that scorer's name ('_<scorer_name>') instead of '_score' shown above. ('split0_test_precision', 'mean_train_precision' etc.)

best_estimator_ : estimator or dict

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.

See refit parameter for more information on allowed values.

best_score_ : float

Mean cross-validated score of the best_estimator

For multi-metric evaluation, this is present only if refit is specified.

best_params_ : dict

Parameter setting that gave the best results on the hold out data.

For multi-metric evaluation, this is present only if refit is specified.

best_index_ : int

The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.

The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).

For multi-metric evaluation, this is present only if refit is specified.

scorer_ : function or a dict

Scorer function used on the held out data to choose the best parameters for the model.

For multi-metric evaluation, this attribute holds the validated scoring dict which maps the scorer key to the scorer callable.

n_splits_ : int

The number of cross-validation splits (folds/iterations).

The scoring criteria that correspond to the scoring parameter are attached here.

Examples

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters)
>>> clf.fit(iris.data, iris.target)
...                             
GridSearchCV(cv=None, error_score=...,
       estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                     decision_function_shape='ovr', degree=..., gamma=...,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=...,
                     verbose=False),
       fit_params=None, iid=..., n_jobs=1,
       param_grid=..., pre_dispatch=..., refit=..., return_train_score=...,
       scoring=..., verbose=...)
>>> sorted(clf.cv_results_.keys())
...                             
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
 'mean_train_score', 'param_C', 'param_kernel', 'params',...
 'rank_test_score', 'split0_test_score',...
 'split0_train_score', 'split1_test_score', 'split1_train_score',...
 'split2_test_score', 'split2_train_score',...
 'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]
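
The signature and the printed output above come from an older scikit-learn release. In more recent releases the fit_params and iid constructor arguments are no longer available and the default number of folds is no longer 3, so a hedged variant of the same example passes scoring and cv explicitly and avoids the removed arguments. The choice of accuracy and of 5 folds is an assumption made for this sketch.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10]}

search = GridSearchCV(
    svm.SVC(),
    param_grid,
    scoring="accuracy",       # explicit instead of the estimator's default scorer
    cv=5,                     # explicit number of folds
    return_train_score=True,  # also record scores on the training folds
)
search.fit(iris.data, iris.target)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))

# One row per parameter combination, as described for cv_results_ above.
for params, mean_score in zip(search.cv_results_["params"],
                              search.cv_results_["mean_test_score"]):
    print(params, round(mean_score, 3))
```

Because refit defaults to True, `search.predict` can be called directly afterwards and uses the estimator refitted with the best parameters on the whole dataset.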
