Caffe中learning rate 和 weight decay 的理解
The learning rate is a parameter that determines how much an updating step influences the current value of the weights. While weight decay is an additional term in the weight update rule that causes the weights to exponentially decay to zero, if no other update is scheduled.
So let's say that we have a cost or error function E
In order to effectively limit the number of free parameters in your model so as to avoid over-fitting, it is possible to regularize the cost function. An easy way to do that is by introducing a zero mean Gaussian prior over the weights, which is equivalent
to changing the cost function to E
Applying gradient descent to this new cost function we obtain:
wi←wi−η∂E∂