Machine Learning Parameter Settings and Pretrained Model Setup
When working with TensorLayer you run into a large number of parameter settings. A typical set of flag definitions looks like this:
task = 'dcgan'
flags = tf.app.flags
flags.DEFINE_string('task', 'dcgan', 'this task name')
flags.DEFINE_integer("epoch", 200, "Epoch to train [100]")
flags.DEFINE_float("learning_rate", 0.0002, "Learning rate for adam [0.0002]")
flags.DEFINE_float("beta1", 0.5, "Momentum term of adam [0.5]")
flags.DEFINE_float("weight_decay", 1e-5, "Weight decay for l2 loss")
flags.DEFINE_float("pool_size", 50, 'size of image buffer that stores previously generated images, default: 50')
flags.DEFINE_integer("train_size", 3000, "The size of train images [np.inf]")
flags.DEFINE_integer("batch_size", 1, "The number of batch images [1] if we use InstanceNormLayer !")
flags.DEFINE_integer("image_size", 256, "The size of image to use (will be center cropped) [256]")
flags.DEFINE_integer("gf_dim", 32, "Size of generator filters in first layer")
flags.DEFINE_integer("df_dim", 64, "Size of discriminator filters in first layer")
# flags.DEFINE_integer("class_embedding_size", 5, "Size of class embedding")
flags.DEFINE_integer("output_size", 256, "The size of the output images to produce [64]")
flags.DEFINE_integer("sample_size", 64, "The number of sample images [64]")
flags.DEFINE_integer("c_dim", 3, "Dimension of image color. [3]")
flags.DEFINE_integer("sample_step", 500, "The interval of generating sample. [500]")
flags.DEFINE_integer("save_step", 200, "The interval of saving checkpoints. [200]")
flags.DEFINE_string("dataset_dir", "spring2snow", "The name of dataset [horse2zebra, apple2orange, sunflower2daisy and etc]")
flags.DEFINE_string("checkpoint_dir", "/home/liuwenjie/deep_save/{}/ckpt".format(task), "Directory name to save the checkpoints [checkpoint]")
flags.DEFINE_string("sample_dir", '/home/liuwenjie/deep_save/{}/samples'.format(task), "Directory name to save the image samples [samples]")
flags.DEFINE_string("direction", "forward", "The direction of generator [forward, backward]")
flags.DEFINE_string("test_dir", "/home/liuwenjie/deep_save/{}/test".format(task), "The directory for test results")
flags.DEFINE_boolean("is_train", True, "True for training, False for testing [False]")
flags.DEFINE_boolean("is_crop", False, "True for cropping the input, False otherwise [False]")
# flags.DEFINE_boolean("visualize", False, "True for visualizing, False for nothing [False]")
FLAGS = flags.FLAGS
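In the TF1-era tf.app.flags API, the values above are parsed from sys.argv when tf.app.run() hands control to a main() function. A minimal sketch of how the block above is typically consumed (the main body here is illustrative, not from the original):

def main(_):
    # All flag values are available as attributes once sys.argv has been parsed.
    print('task: %s, epochs: %d, lr: %g' % (FLAGS.task, FLAGS.epoch, FLAGS.learning_rate))

if __name__ == '__main__':
    # Any flag can be overridden on the command line, e.g.:
    #   python train.py --task=dcgan_spring2winter --learning_rate=0.0001
    tf.app.run()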
After working through a few networks, I found that most of them need more or less these same definitions. Here is a flag-by-flag walkthrough:
1. task: (dcgan_spring2winter_18.3.1) The task name, usually formatted as {model name}-{data name}-{time}, so that every run gets its own checkpoints and sample images and parallel tasks never overwrite each other.
2. epoch: (200) Number of full passes over the dataset for this task.
3. learning_rate: (0.0002) A common starting learning rate is 0.01; once the loss stops falling, drop it by an order of magnitude (see the sketch after this list).
4. beta1: (0.5) Momentum term passed to Adam, usually set around 0.9 or 0.5.
5. weight_decay: (1e-5) L2-regularization weight; often left unset.
6. pool_size: (50) Size of the image buffer that stores previously generated images.
7. train_size: (len < train_dataset) Number of training images to use, generally smaller than the dataset itself; clamp it at training time with something like min(min(len(train_A), len(train_label)), train_size) to avoid out-of-range errors (see the sketch after this list).
8. batch_size: (1-10) Number of images fed to the network at once.
9. gf_dim / df_dim: (32 / 64) Number of filters in the first convolutional layer of the generator and of the discriminator, respectively.
10. output_size: (256) Side length of the output images to produce.
11. sample_size: (64) Number of sample images.
12. c_dim: (3) Number of image channels.
13. sample_step: (500) How many steps between saved samples.
14. save_step: (200) How many steps between checkpoint saves.
15. dataset_dir: (spring2snow) Name of the dataset.
16. checkpoint_dir: (/home/liuwenjie/deep_save/{}/ckpt) Where checkpoints are saved.
17. sample_dir: (/home/liuwenjie/deep_save/{}/samples) Folder where samples are saved.
18. direction: (forward, backward) Direction in which the generator translates images, not the direction of propagation.
19. test_dir: (/home/liuwenjie/deep_save/{}/test) Keep test output somewhere easy to find.
20. is_train: True for training, False for testing.
21. is_crop: Whether the input images should be cropped.
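Two of the conventions above, the train_size clamp from item 7 and the plateau-based learning-rate drop from item 3, look roughly like this in practice. This is a minimal sketch: train_A, train_label and train_one_epoch are hypothetical placeholders, not names from the original code.

# Clamp the usable number of training pairs so indexing never runs past
# either list (train_A / train_label are hypothetical dataset lists).
n_train = min(min(len(train_A), len(train_label)), FLAGS.train_size)

# Drop the learning rate by an order of magnitude when the loss plateaus.
lr = FLAGS.learning_rate
best_loss, patience = float('inf'), 0
for epoch in range(FLAGS.epoch):
    epoch_loss = train_one_epoch(lr)  # hypothetical one-epoch training loop
    if epoch_loss < best_loss - 1e-4:
        best_loss, patience = epoch_loss, 0
    else:
        patience += 1
    if patience >= 5:  # no improvement for 5 epochs in a row
        lr, patience = lr * 0.1, 0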
In machine-learning work you usually need a pretrained model.
The most common baseline models are ResNet, VGG16 and GoogLeNet, all trained on ImageNet. Whether the problem is classification, segmentation or detection, you will almost always start from one of these three; without one it is hard to reach a reasonably good result. Taking VGG16 in TensorLayer as an example, here is how to load the parameters into each layer step by step:
#! /usr/bin/python
# -*- coding: utf-8 -*-
"""
VGG-16 for ImageNet.
Introduction
----------------
VGG is a convolutional neural network model proposed by K. Simonyan and A. Zisserman
from the University of Oxford in the paper “Very Deep Convolutional Networks for
Large-Scale Image Recognition” . The model achieves 92.7% top-5 test accuracy in ImageNet,
which is a dataset of over 14 million images belonging to 1000 classes.
Download Pre-trained Model
----------------------------
- Model weights in this example - vgg16_weights.npz : http://www.cs.toronto.edu/~frossard/post/vgg16/
- Caffe VGG 16 model : https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md
- Tool to convert the Caffe models to TensorFlow's : https://github.com/ethereon/caffe-tensorflow
Note
------
- For simplified CNN layer see "Convolutional layer (Simplified)"
in read the docs website.
- When feeding other images to the model be sure to properly resize or crop them
beforehand. Distorted images might end up being misclassified. One way of safely
feeding images of multiple sizes is by doing center cropping, as shown in the
following snippet:
>>> image_h, image_w, _ = np.shape(img)
>>> shorter_side = min(image_h, image_w)
>>> scale = 224. / shorter_side
>>> image_h, image_w = np.ceil([scale * image_h, scale * image_w]).astype('int32')
>>> img = imresize(img, (image_h, image_w))
>>> crop_x = (image_w - 224) // 2
>>> crop_y = (image_h - 224) // 2
>>> img = img[crop_y:crop_y + 224, crop_x:crop_x + 224, :]
"""
import os
import time
import numpy as np
from scipy.misc import imread, imresize
import tensorflow as tf
import tensorlayer as tl
from tensorlayer.layers import *
# try:
# from tensorlayer.models.imagenet_classes import *
# except Exception as e:
# raise Exception(
# "{} / download the file from: https://github.com/zsdonghao/tensorlayer/tree/master/example/data".format(e)
# )
def conv_layers(net_in):
    with tf.name_scope('preprocess'):
        # Notice that we include a preprocessing layer that takes the RGB image
        # with pixels values in the range of 0-255 and subtracts the mean image
        # values (calculated over the entire ImageNet training set).
        mean = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
        net_in.outputs = net_in.outputs - mean
    # conv1
    net = Conv2dLayer(net_in, act=tf.nn.relu, shape=[3, 3, 3, 64], strides=[1, 1, 1, 1], padding='SAME', name='conv1_1')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 64, 64], strides=[1, 1, 1, 1], padding='SAME', name='conv1_2')
    net = PoolLayer(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', pool=tf.nn.max_pool, name='pool1')
    # conv2
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 64, 128], strides=[1, 1, 1, 1], padding='SAME', name='conv2_1')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 128, 128], strides=[1, 1, 1, 1], padding='SAME', name='conv2_2')
    net = PoolLayer(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', pool=tf.nn.max_pool, name='pool2')
    # conv3
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 128, 256], strides=[1, 1, 1, 1], padding='SAME', name='conv3_1')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 256, 256], strides=[1, 1, 1, 1], padding='SAME', name='conv3_2')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 256, 256], strides=[1, 1, 1, 1], padding='SAME', name='conv3_3')
    net = PoolLayer(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', pool=tf.nn.max_pool, name='pool3')
    # conv4
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 256, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv4_1')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 512, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv4_2')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 512, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv4_3')
    net = PoolLayer(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', pool=tf.nn.max_pool, name='pool4')
    # conv5
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 512, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv5_1')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 512, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv5_2')
    net = Conv2dLayer(net, act=tf.nn.relu, shape=[3, 3, 512, 512], strides=[1, 1, 1, 1], padding='SAME', name='conv5_3')
    net = PoolLayer(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', pool=tf.nn.max_pool, name='pool5')
    return net
def conv_layers_simple_api(net_in):
    with tf.name_scope('preprocess'):
        # Notice that we include a preprocessing layer that takes the RGB image
        # with pixels values in the range of 0-255 and subtracts the mean image
        # values (calculated over the entire ImageNet training set).
        mean = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
        net_in.outputs = net_in.outputs - mean
    # conv1
    net = Conv2d(net_in, 64, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv1_1')
    net = Conv2d(net, n_filter=64, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv1_2')
    net = MaxPool2d(net, filter_size=(2, 2), strides=(2, 2), padding='SAME', name='pool1')
    # conv2
    net = Conv2d(net, n_filter=128, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv2_1')
    net = Conv2d(net, n_filter=128, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv2_2')
    net = MaxPool2d(net, filter_size=(2, 2), strides=(2, 2), padding='SAME', name='pool2')
    # conv3
    net = Conv2d(net, n_filter=256, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv3_1')
    net = Conv2d(net, n_filter=256, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv3_2')
    net = Conv2d(net, n_filter=256, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv3_3')
    net = MaxPool2d(net, filter_size=(2, 2), strides=(2, 2), padding='SAME', name='pool3')
    # conv4
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv4_1')
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv4_2')
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv4_3')
    net = MaxPool2d(net, filter_size=(2, 2), strides=(2, 2), padding='SAME', name='pool4')
    # conv5
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv5_1')
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv5_2')
    net = Conv2d(net, n_filter=512, filter_size=(3, 3), strides=(1, 1), act=tf.nn.relu, padding='SAME', name='conv5_3')
    net = MaxPool2d(net, filter_size=(2, 2), strides=(2, 2), padding='SAME', name='pool5')
    return net
def fc_layers(net):
    net = FlattenLayer(net, name='flatten')
    net = DenseLayer(net, n_units=4096, act=tf.nn.relu, name='fc1_relu')
    net = DenseLayer(net, n_units=4096, act=tf.nn.relu, name='fc2_relu')
    # The last layer outputs raw logits; softmax is applied outside this function.
    net = DenseLayer(net, n_units=1000, act=tf.identity, name='fc3_relu')
    return net
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
# y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')
net_in = InputLayer(x, name='input')
# net_cnn = conv_layers(net_in) # professional CNN APIs
net_cnn = conv_layers_simple_api(net_in) # simplified CNN APIs
net = fc_layers(net_cnn)
y = net.outputs
probs = tf.nn.softmax(y)
y1 = net_cnn.outputs
# y_op = tf.argmax(tf.nn.softmax(y), 1)
# cost = tl.cost.cross_entropy(y, y_, name='cost')
# correct_prediction = tf.equal(tf.cast(tf.argmax(y, 1), tf.float32), tf.cast(y_, tf.float32))
# acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tl.layers.initialize_global_variables(sess)
net.print_params()
net.print_layers()
tl.files.maybe_download_and_extract(
    'vgg16_weights.npz', '/home/liuwenjie/premodels', 'http://www.cs.toronto.edu/~frossard/vgg16/', expected_bytes=553436134
)
npz = np.load(os.path.join('/home/liuwenjie/premodels', 'vgg16_weights.npz'))
params = []
for val in sorted(npz.items()):
    print("  Loading params %s" % str(val[1].shape))
    params.append(val[1])
tl.files.assign_params(sess, params[0:26], net_cnn)  # 13 conv layers x (W, b) = 26 arrays; the fc weights are skipped
img1 = imread('/home/liuwenjie/liuwenjie/tensorflow_workplace/image2012_1.jpg', mode='RGB') # test data in github
img1 = imresize(img1, (224, 224))
prob = sess.run(probs, feed_dict={x: [img1]})[0]  # the 1st run takes extra time to compile the graph
# start_time = time.time()
# prob = sess.run(probs, feed_dict={x: [img1]})[0]
# print(" End time : %.5ss" % (time.time() - start_time))
# preds = (np.argsort(prob)[::-1])[0:5]
# for p in preds:
# print(p, prob[p])
ys = sess.run(y1, feed_dict={x: [img1]})[0]
print(prob)
print(ys)
Here I adapted the VGG16 parameter-loading code found online. As you can see, the preprocessing stage subtracts the per-channel ImageNet mean from the input image. If you only need the convolutional layers and not the fully connected ones, assign only the conv weights from the npz file to net_cnn (the first 26 arrays in sorted order, i.e. W and b for the 13 conv layers). Since the fully connected layers account for over 70% of all parameters, loading the DenseLayer weights is not really advisable for semantic segmentation work. After this series of steps, run one test image through the network with the dense layers attached; if a result is printed, the pretrained model was loaded successfully.
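A quick way to double-check that assign_params really copied the weights is to compare one tensor in the graph against the array on disk. This sketch reuses sess, net_cnn and npz from the listing above, and assumes the conv1_1_W key naming used by that vgg16_weights.npz file:

# net_cnn.all_params[0] is conv1_1's kernel W; after assign_params it should
# match the corresponding array stored in vgg16_weights.npz.
loaded_w = sess.run(net_cnn.all_params[0])
print('conv1_1 W matches npz:', np.allclose(loaded_w, npz['conv1_1_W']))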