Understanding learning rate and weight decay in Caffe

阿新 • Published: 2019-01-05

The learning rate is a parameter that determines how much an update step influences the current value of the weights. Weight decay, by contrast, is an additional term in the weight update rule that causes the weights to decay exponentially to zero if no other update is scheduled.

So let's say that we have a cost or error function $E(w)$ that we want to minimize. Gradient descent tells us to modify the weights $w$ in the direction of steepest descent in $E$:

$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$, where $\eta$ is the learning rate, and if it's large you will have a correspondingly large modification of the weights $w_i$ (in general it shouldn't be too large, otherwise you'll overshoot the local minimum in your cost function).
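The effect of the learning rate can be seen in a minimal sketch. The toy cost $E(w) = (w - 3)^2$, the function names, and the two learning rates below are assumptions chosen for illustration; they do not come from Caffe.

```python
def gradient_descent(grad, w0, lr, steps):
    """Repeatedly apply w <- w - lr * grad(w) and return every visited value."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad(w)
        history.append(w)
    return history


# Hypothetical toy cost E(w) = (w - 3)^2; its gradient is 2 * (w - 3),
# and its minimum sits at w = 3.
def grad_E(w):
    return 2.0 * (w - 3.0)


small = gradient_descent(grad_E, w0=0.0, lr=0.1, steps=50)
large = gradient_descent(grad_E, w0=0.0, lr=1.1, steps=50)

print("lr=0.1 final w:", small[-1])  # settles close to the minimum at 3
print("lr=1.1 final w:", large[-1])  # every step overshoots further away
```

With the small rate each step shrinks the distance to the minimum. With the large rate each step jumps past the minimum and lands further away than it started, which is the overshooting described above.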

In order to effectively limit the number of free parameters in your model so as to avoid over-fitting, it is possible to regularize the cost function. An easy way to do that is by introducing a zero-mean Gaussian prior over the weights, which is equivalent to changing the cost function to $\tilde{E}(w) = E(w) + \frac{\lambda}{2} w^2$. In practice this penalizes large weights and effectively limits the freedom in your model. The regularization parameter $\lambda$ determines how you trade off the original cost $E$ against the penalty on large weights.
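As a quick check of what the penalty does to the gradient, the sketch below compares finite-difference gradients of a cost with and without the regularization term. The toy cost `E`, the target values, and the helper names are hypothetical, and $w^2$ is read here as the sum of squared weights.

```python
def E(w):
    """Hypothetical unregularized cost: squared distance to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def E_reg(w, lam):
    """Regularized cost: E(w) + (lam / 2) * sum_i w_i^2."""
    return E(w) + 0.5 * lam * sum(wi ** 2 for wi in w)


def numeric_grad(f, w, i, eps=1e-6):
    """Central finite-difference estimate of the partial derivative df/dw_i."""
    up = list(w)
    up[i] += eps
    down = list(w)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


w = [0.3, 1.5, -0.7]
lam = 0.1

for i in range(len(w)):
    g_plain = numeric_grad(E, w, i)
    g_reg = numeric_grad(lambda v: E_reg(v, lam), w, i)
    # The difference between the two gradients should match lam * w_i.
    print(i, g_reg - g_plain, lam * w[i])
```

The two printed columns should agree up to floating-point error: the penalty adds $\lambda w_i$ to each partial derivative, so larger weights are pushed back toward zero more strongly.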

Applying gradient descent to this new cost function we obtain:

$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i} - \eta \lambda w_i$. The new term $-\eta \lambda w_i$ coming from the regularization causes the weight to decay in proportion to its size.
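The update with weight decay can be sketched as follows. The function name `sgd_step` and the numeric values are assumptions for illustration. This is plain gradient descent and not Caffe's solver code, which also applies momentum.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One update: w <- w - lr * grad - lr * weight_decay * w."""
    return w - lr * grad - lr * weight_decay * w


lr = 0.1            # learning rate (eta)
weight_decay = 0.5  # regularization parameter (lambda)
w = 2.0

# With a zero gradient (no other update), only the decay term acts, so each
# step multiplies the weight by the constant factor (1 - lr * weight_decay).
for t in range(1, 6):
    w = sgd_step(w, grad=0.0, lr=lr, weight_decay=weight_decay)
    print(t, w, 2.0 * (1 - lr * weight_decay) ** t)
```

The two printed columns track each other: after $t$ steps the weight equals its initial value times $(1 - \eta\lambda)^t$, which is the exponential decay to zero mentioned at the start. In Caffe, the corresponding solver settings are `base_lr` and `weight_decay`.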
