NTU, Hung-yi Lee
1:Regression-Case Study
Why does the loss function include a regularization term only for w, and not for b?
Because b only shifts the function up or down (a horizontal offset); it has almost no effect on how smooth the function is, so penalizing it does not help.
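A quick numerical check of this point (a sketch; the linear model and the specific values are illustrative assumptions): the output's sensitivity to an input perturbation depends only on w, not on b.

```python
# Sensitivity of a linear model f(x) = w*x + b to an input perturbation d:
# |f(x + d) - f(x)| = |w| * |d|, independent of b, so only w controls smoothness.
def sensitivity(w, b, x, d):
    f = lambda t: w * t + b
    return abs(f(x + d) - f(x))

print(sensitivity(w=3.0, b=0.0, x=1.0, d=0.1))    # depends only on w and d
print(sensitivity(w=3.0, b=100.0, x=1.0, d=0.1))  # same value for any b
```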
1-Regression Demo
This trick will be explained in detail in the AdaGrad section: a small learning rate needs many iterations to reach the optimum, while a large learning rate may oscillate wildly and also fail to reach it. One tuning trick is to give w and b their own customized (per-parameter) learning rates.
import numpy as np

lr = 1
lr_b = 0
lr_w = 0
# ... inside the gradient-descent loop, after computing b_grad and w_grad ...
# accumulate the squared gradients (AdaGrad)
lr_b = lr_b + b_grad ** 2
lr_w = lr_w + w_grad ** 2
# update parameters with per-parameter learning rates
b = b - lr / np.sqrt(lr_b) * b_grad
w = w - lr / np.sqrt(lr_w) * w_grad
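The fragment above can be completed into a runnable sketch; the toy data, the gradient expressions for a least-squares loss, and the iteration count below are my assumptions, not from the lecture.

```python
import numpy as np

# Fit y = w*x + b on toy data with AdaGrad-style per-parameter learning rates.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0          # ground truth: w = 2, b = 1

b, w = 0.0, 0.0
lr = 1.0
lr_b, lr_w = 0.0, 0.0

for _ in range(10000):
    err = y - (w * x + b)              # residuals of the current fit
    b_grad = -2.0 * err.sum()          # dL/db for squared-error loss
    w_grad = -2.0 * (err * x).sum()    # dL/dw for squared-error loss
    lr_b += b_grad ** 2                # accumulate squared gradients
    lr_w += w_grad ** 2
    b -= lr / np.sqrt(lr_b) * b_grad   # customized learning rate for b
    w -= lr / np.sqrt(lr_w) * w_grad   # customized learning rate for w

print(round(w, 2), round(b, 2))
```

With a single global learning rate of 1, this problem diverges; the per-parameter scaling is what makes the large base rate usable.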
2:Where does the error come from?
The error comes from two sources: error due to "bias" and error due to "variance".
A simple model corresponds to a small model set (which may not even contain the true target function): large bias, small variance.
A complex model corresponds to a large model set (which probably does contain the target function): small bias, large variance.
If the error mainly comes from large variance, the model is overfitting;
if the error mainly comes from large bias, the model is underfitting.
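This diagnosis can be sketched numerically (a hedged example; the cubic target, noise level, and polynomial degrees are assumptions): a too-simple model cannot even fit the training data, while a too-complex one fits it almost perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: x ** 3 - x            # assumed target function
x_train = np.linspace(-1, 1, 15)
x_test = np.linspace(-1, 1, 100)
y_train = true_f(x_train) + rng.normal(0, 0.1, x_train.shape)  # noisy samples
y_test = true_f(x_test)

errors = {}
for deg in (1, 10):                      # simple vs. complex model set
    coef = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    errors[deg] = (train_mse, test_mse)
    print(deg, round(train_mse, 4), round(test_mse, 4))
```

The degree-1 fit leaves a large training error (large bias, underfitting); the degree-10 fit drives the training error close to zero but its predictions vary strongly with the particular noisy sample (large variance).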
What to do with large bias?
1. Diagnosis:
(1) If your model cannot even fit the training examples, then you have large bias. ----> Underfitting.
(2) If you can fit the training data but have a large error on the testing data, then you probably have large variance. ----> Overfitting.
2. For large bias, redesign your model:
(1) Add more features as input;
(2) Use a more complex model.
What to do with large variance?
1. More data (very effective, but not always practical). You can also generate training data yourself, e.g. by flipping images or adding noise.
2. Regularization (encourages smaller parameters, making the curve smoother), but it may shrink your model set so that it no longer contains the target function, which can hurt bias.
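The "generate training data yourself" idea can be sketched as follows (a minimal example; the fake image array and noise level are assumptions): each original example yields extra training examples via flipping and noise.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((4, 28, 28))         # 4 fake grayscale "images"

flipped = images[:, :, ::-1]             # horizontal flip of each image
noisy = images + rng.normal(0, 0.05, images.shape)  # small additive noise

# Three times as many training examples as before.
augmented = np.concatenate([images, flipped, noisy])
print(augmented.shape)
```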
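The shrinking effect of regularization on the parameters can be shown with ridge regression in closed form (a sketch; the data and penalty values are assumptions): a larger penalty gives smaller weights, hence a smoother function.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([3.0, -2.0, 0.5, 1.0, -1.5]) + rng.normal(0, 0.1, 20)

def ridge(X, y, lam):
    # Closed-form L2-regularized solution: w = (X^T X + lam*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    print(lam, round(float(np.linalg.norm(ridge(X, y, lam))), 3))
```

The weight norm decreases monotonically as the penalty grows; with a very large penalty the weights are pushed toward zero, and the model set effectively no longer contains the target function (the bias cost mentioned above).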