
Continuing training from an existing model in Caffe

1. Resuming training from a snapshot

Caffe supports continuing training on top of an existing model (for example, someone else's pretrained model). Below is the example that ships with Caffe:

caffe-master0818\examples\imagenet\resume_training.sh

#!/usr/bin/env sh

./build/tools/caffe train \
    --solver=models/bvlc_reference_caffenet/solver.prototxt \
    --snapshot=models/bvlc_reference_caffenet/caffenet_train_10000.solverstate.h5
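
The same resume pattern works for your own training runs. A minimal sketch with placeholder paths (my_solver.prototxt and my_net_iter_10000.* are just stand-ins for your own files): --snapshot restores the full solver state, while --weights only copies trained weights for fine-tuning.

#!/usr/bin/env sh
# Resume an interrupted run: --snapshot restores the solver state
# (weights, momentum history and current iteration) from a .solverstate file.
./build/tools/caffe train \
    --solver=path/to/my_solver.prototxt \
    --snapshot=path/to/my_net_iter_10000.solverstate

# To fine-tune from trained weights instead (fresh solver state, iteration 0),
# pass the .caffemodel via --weights rather than --snapshot.
./build/tools/caffe train \
    --solver=path/to/my_solver.prototxt \
    --weights=path/to/my_net_iter_10000.caffemodel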

2. Caffe also supports training with several successive learning-rate drops

For example, caffe-master0818\examples\cifar10\train_full.sh:

#!/usr/bin/env sh

TOOLS=./build/tools

$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver.prototxt

# reduce learning rate by factor of 10 (the lr1 solver config carries the lowered rate)
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver_lr1.prototxt \
    --snapshot=examples/cifar10/cifar10_full_iter_60000.solverstate.h5

# reduce learning rate by factor of 10 again (the lr2 solver config carries the lowered rate)
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver_lr2.prototxt \
    --snapshot=examples/cifar10/cifar10_full_iter_65000.solverstate.h5

Another model trained the same way:
#!/usr/bin/env sh

TOOLS=./build/tools

$TOOLS/caffe train \
  --solver=examples/cifar10/cifar10_quick_solver.prototxt

# reduce learning rate by factor of 10 after 8 epochs
$TOOLS/caffe train \
  --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
  --snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5

This way you do not have to stop and lower the learning rate by hand every time.

For large models, dropping the learning rate more than once matters. Experiments show that once the loss stops decreasing at the current learning rate, dropping it again can push the loss down further.

If the learning rate is too large, the optimizer overshoots and never settles at the minimum; if it is too small, it cannot escape local optima. So start with a relatively large learning rate to avoid getting stuck in a local optimum early on.

-------------------------------------------------------------------------------------------------------------------------------------------

3. Configuring the drops in the solver file

Of course, the learning-rate drops can also be set up in the solver configuration file.

http://caffe.berkeleyvision.org/tutorial/solver.html (the official Caffe solver tutorial):

To use a learning rate policy like this, you can put the following lines somewhere in your solver prototxt file:

base_lr: 0.01     # begin training at a learning rate of 0.01 = 1e-2

lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
                  # by a factor of gamma every stepsize iterations

gamma: 0.1        # drop the learning rate by a factor of 10
                  # (i.e., multiply it by a factor of gamma = 0.1)

stepsize: 100000  # drop the learning rate every 100K iterations

max_iter: 350000  # train for 350K iterations total

momentum: 0.9

Under the above settings, we'll always use momentum μ = 0.9. We'll begin training at a base_lr of α = 0.01 = 10⁻² for the first 100,000 iterations, then multiply the learning rate by gamma (γ) and train at α′ = αγ = (0.01)(0.1) = 0.001 = 10⁻³ for iterations 100K–200K, then at α′′ = 10⁻⁴ for iterations 200K–300K, and finally train until iteration 350K (since we have max_iter: 350000) at α′′′ = 10⁻⁵.

Note that the momentum setting μ effectively multiplies the size of your updates by a factor of 1/(1 − μ) after many iterations of training, so if you increase μ, it may be a good idea to decrease α accordingly (and vice versa).

For example, with μ = 0.9, we have an effective update size multiplier of 1/(1 − 0.9) = 10. If we increased the momentum to μ = 0.99, we've increased our update size multiplier to 100, so we should drop α (base_lr) by a factor of 10.
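
The 1/(1 − μ) factor comes from summing the geometric series of momentum contributions; a quick sketch, assuming the gradient g stays roughly constant over many iterations:

% SGD with momentum accumulates updates V_t = \mu V_{t-1} + \alpha g_t (g_t: gradient).
% If g_t \approx g is roughly constant, the update size approaches a fixed point:
\[
  V_\infty \approx \alpha g \sum_{k=0}^{\infty} \mu^{k}
           = \frac{\alpha g}{1 - \mu},
  \qquad
  \mu = 0.9 \Rightarrow \frac{1}{1 - 0.9} = 10,
  \quad
  \mu = 0.99 \Rightarrow \frac{1}{1 - 0.99} = 100 .
\]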

Note also that the above settings are merely guidelines, and they’re definitely not guaranteed to be optimal (or even work at all!) in every situation. If learning diverges (e.g., you start to see very large or NaN or inf loss values or outputs), try dropping the base_lr (e.g., base_lr: 0.001) and re-training, repeating this until you find a base_lr value that works.
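
With a step policy like the one above, a single training command covers all of the drops, and a snapshot is only needed if the run gets interrupted: the restored solver state includes the current iteration, so the schedule continues where it left off. A minimal sketch, again with placeholder paths (step_solver.prototxt and the snapshot name are just examples):

#!/usr/bin/env sh
# Single run: the solver's lr_policy "step" lowers the learning rate
# automatically every `stepsize` iterations, with no extra solver files needed.
./build/tools/caffe train \
    --solver=path/to/step_solver.prototxt

# If the run is interrupted, resume from the most recent snapshot; the
# restored iteration count keeps the learning-rate schedule consistent.
./build/tools/caffe train \
    --solver=path/to/step_solver.prototxt \
    --snapshot=path/to/my_net_iter_150000.solverstate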
