Grid Search for Tensorflow Deep Learning
Grid Search is the classic tuning method used in the Hyperparameter Tuning stage, also known as Exhaustive Search. Before tuning you need to pick the algorithm, decide which parameters will be tuned, and fix the candidate values for each of them. The Grid Search algorithm then exhaustively tries every combination in this parameter space and picks the best-performing one.
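As a concrete illustration (a minimal sketch, not part of the original code), the idea is simply to enumerate every combination of a parameter grid; here itertools.product is used on the same toy kernel/C grid that appears in the sklearn example below:

from itertools import product

# Enumerate every combination of a small, hypothetical parameter grid.
param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 10]}
keys = list(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]
print(combinations)
# [{'kernel': 'linear', 'C': 1}, {'kernel': 'linear', 'C': 10},
#  {'kernel': 'rbf', 'C': 1}, {'kernel': 'rbf', 'C': 10}]

Each of these dictionaries would then be used to train and score one model, and the best score wins.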
Outline:
This article uses the Boston house price prediction problem from Sklearn as the dataset.
First we take the easy route and implement it with Sklearn's built-in GridSearchCV.
Then we implement it again under Tensorflow, to show how Grid Search is done in a Deep Learning setting.
Boston House Price Problem:
One of the classic datasets, built into sklearn.datasets, so we can load it directly:
from sklearn import datasets

boston = datasets.load_boston()
X = boston["data"]
Y = boston["target"]
print(X.shape)
print(Y.shape)
From the shapes we can see there are 506 rows of data with 13 columns (attributes), and the task is to predict the house price (a regression problem). To keep things simple we will not dig into the meaning of each attribute.
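If you do want a quick look at the attributes, the Bunch object returned by load_boston() also carries the feature names (purely optional):

# Optional peek at the raw data; feature_names is part of the Bunch
# object returned by load_boston().
print(boston["feature_names"])   # 'CRIM', 'ZN', ... 13 attribute names
print(X[0], Y[0])                # first sample and its target price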
Grid Search in Sklearn:
sklearn.model_selection.GridSearchCV implements exactly this. In the example below we pick SVR as the model to predict the Boston house prices. Two parameters are tuned: the kernel type (linear or RBF) and C (1 or 10). 5-fold cross-validation (cv=5) is used to evaluate each model. The code:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

model = SVR(gamma='scale')
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
reg = GridSearchCV(model, parameters, cv=5)
reg.fit(X, Y)
sorted(reg.cv_results_.keys())
print(reg.cv_results_)
The printed results show the score of the model built from each parameter combination; in the end the linear kernel with C=1 wins.
{...
 'params': [{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'}, {'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}],
 'split0_test_score': array([ 0.77285459, 0.12029639, 0.77953306, -0.04157249]),
 'split1_test_score': array([ 0.72771739, -0.08134385, 0.72810716, 0.01592944]),
 'split2_test_score': array([ 0.56131914, -0.79967714, 0.63566857, -0.38338425]),
 'split3_test_score': array([0.15056451, 0.09037651, 0.02786433, 0.25941567]),
 'split4_test_score': array([ 0.08212844, -0.90391602, -0.07224368, -0.62731013]),
 'mean_test_score': array([ 0.45953725, -0.31399285, 0.42049685, -0.15515943]),
 'std_test_score': array([0.289307 , 0.44498376, 0.36516833, 0.31248031]),
 'rank_test_score': array([1, 4, 2, 3]),
 'split0_train_score': array([0.70714979, 0.39723582, 0.70149448, 0.70558716]),
 'split1_train_score': array([0.68986786, 0.39850963, 0.68696465, 0.68704436]),
 'split2_train_score': array([0.62838757, 0.37872469, 0.64670086, 0.66406787]),
 'split3_train_score': array([0.82850586, 0.38276233, 0.82941506, 0.73598928]),
 'split4_train_score': array([0.69005814, 0.29652628, 0.69148868, 0.64436246]),
 'mean_train_score': array([0.70879385, 0.37075175, 0.71121274, 0.68741023]),
 'std_train_score': array([0.06558667, 0.03791872, 0.06197584, 0.03190121])}
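Instead of reading the winner out of cv_results_ by hand, the fitted GridSearchCV object also exposes it directly (using the reg object from above):

print(reg.best_params_)   # {'C': 1, 'kernel': 'linear'}
print(reg.best_score_)    # mean cross-validated score of the winner (~0.46 in this run)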
As for why SVR was chosen instead of a neural network, which would have made a more direct comparison with the Tensorflow code in the next section: arbitrary combinations of neural network parameters can easily fail to converge, or even push the cost function to NaN and throw an exception. So...
Grid Search in Tensorflow Deep Learning:
In the example below we write our own loop that walks through every parameter combination, builds a model for each, and evaluates its performance. First we define two helper functions. The first fixes the parameter scope: model_configs generates the configuration list:
def model_configs():
    # define scope of configs
    learning_rate = [0.0001, 0.01]
    layer1_nodes = [16, 32]
    layer2_nodes = [8, 4]
    # create configs
    configs = list()
    for i in learning_rate:
        for j in layer1_nodes:
            for k in layer2_nodes:
                cfg = [i, j, k]
                configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs
The second is add_layer, a function that adds a hidden layer to the neural network in Tensorflow:
def add_layer(name1, inputs, in_size, out_size, activation_function=None):
    # Xavier-initialized weight matrix, one variable per layer name
    Weights = tf.get_variable(name1, [in_size, out_size],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs
Finally, the main control flow: it iterates over every parameter combination in the configuration list and builds a Tensorflow neural network for each. Training uses MSE as the cost function and the Adam optimizer. Each trained model is then evaluated with MSE on a 20% held-out test set.
import tensorflow as tf
from sklearn.model_selection import train_test_split

cfg_list = model_configs()
error_list = []

for cfg in cfg_list:
    # unzip hyperparameters
    learning_rate, layer1_nodes, layer2_nodes = cfg
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)

    # define model (fresh graph for every configuration)
    tf.reset_default_graph()
    tf_x = tf.placeholder(tf.float32, [None, 13])
    tf_y = tf.placeholder(tf.float32, [None, 1])   # regression target, so float32
    l1 = add_layer('l1', tf_x, 13, layer1_nodes, activation_function=tf.nn.relu)
    l2 = add_layer('l2', l1, layer1_nodes, layer2_nodes, activation_function=tf.nn.relu)
    pred = add_layer('out', l2, layer2_nodes, 1, activation_function=tf.nn.relu)

    with tf.name_scope('loss'):
        loss = tf.losses.mean_squared_error(tf_y, pred)
        tf.summary.scalar("loss", tensor=loss)
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess = tf.Session()
    sess.run(init_op)
    for j in range(0, 10000):
        sess.run(train_op, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})
        cost_ = sess.run(loss, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})

    # evaluate this configuration on the held-out test set
    test_loss = sess.run(loss, feed_dict={tf_x: X_test, tf_y: y_test.reshape([y_test.shape[0], 1])})
    print('test loss: %.2f' % test_loss)
    error_list.append(test_loss)
    sess.close()

print(cfg_list)
print(error_list)
[[0.0001, 16, 8], [0.0001, 16, 4], [0.0001, 32, 8], [0.0001, 32, 4], [0.01, 16, 8], [0.01, 16, 4], [0.01, 32, 8], [0.01, 32, 4]]
[659.03925, 15.627606, 34.55378, 598.14703, 579.9314, 10.684119, 25.026648, 103.17941]
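To pick the winner out of these lists programmatically, numpy's argmin is enough (reusing the cfg_list and error_list printed above):

import numpy as np

best_idx = int(np.argmin(error_list))            # index of the lowest test MSE
print('Best config:', cfg_list[best_idx], 'MSE: %.2f' % error_list[best_idx])
# With the run above this picks [0.01, 16, 4] at MSE 10.68.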
Finally, my personal feeling is that naive Grid Search is not well suited to Deep Learning problems. First, look at how much is already fixed in the example: the model family, the number of layers, the ReLU activation, and the Adam optimizer; beyond that there are still plenty of other knobs such as the initialization scheme, the choice of cost function, and regularization. In practice the Data Scientist has to design the optimization plan for the Deep Learning model, fix most of the parameters, and only then run Grid Search over the remaining few. Moreover, with this many parameter dimensions, enumerating every combination is simply infeasible, so smarter tuning strategies are needed (random search, for example, as sketched below).
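One common alternative is random search, which samples a fixed number of combinations from the same scopes instead of enumerating all of them. A minimal sketch, reusing the model_configs() grid defined above:

import random

random.seed(42)                                        # for reproducibility
sampled_cfgs = random.sample(model_configs(), k=4)     # try only 4 of the 8 combinations
for cfg in sampled_cfgs:
    learning_rate, layer1_nodes, layer2_nodes = cfg
    # ...build, train and evaluate the model exactly as in the loop above...
    print('Trying config:', cfg)

With larger grids the savings become substantial, and the sampling can also draw from continuous ranges (for example, learning rates on a log scale) instead of a fixed list.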