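Stanford CS231n Assignment Code (Chinese Localization): Assignment 2 Q5

What's this TensorFlow business?

Written by: 土豆, MoreZheng, SlyneD
Proofread by: 碧海聽滔, Molly
Chief proofreader and reviewer: 寒小陽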

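In the previous assignments you wrote a lot of code to implement many neural-network features. Dropout, Batch Norm, and 2D convolutions are some of the workhorses of deep learning in computer vision. You have also worked hard to make your code efficient and vectorized.

For the last part of this assignment, though, we leave that earlier code behind and move to one of two popular deep learning frameworks. This notebook uses TensorFlow (a separate notebook covers PyTorch).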

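What is TensorFlow?

TensorFlow is a system for executing computational graphs over Tensor objects, with native support for automatic backpropagation through its Variables. The n-dimensional array Tensors we work with in it are analogous to numpy's ndarray.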

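Why TensorFlow?

  • Our code will run on GPUs, so training will be much faster. Writing your own modules that run on GPUs, however, is beyond the scope of this class.
  • We want you to use these frameworks for your projects, so you can experiment more efficiently than if you wrote every basic feature by hand.
  • We want you to stand on the shoulders of giants! TensorFlow and PyTorch are both excellent frameworks that will make your life a lot easier, and now that you understand how they work inside, you are free to use them.
  • We want you to be able to write the kind of deep learning code that is used in academia or industry.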

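How will I learn TensorFlow?

TensorFlow has many excellent tutorials available, including those from Google themselves.

This notebook will also walk you through much of what you need to train models in TensorFlow. If you want to learn more or need further details, see the end of this notebook, which lists links to some helpful tutorials.

Load the data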

import tensorflow as tf
import numpy as np
import math
import timeit
import matplotlib.pyplot as plt
%matplotlib inline
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
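
The mean-image normalization in get_CIFAR10_data can be tried in isolation. The following is a minimal sketch on random stand-in data (the array X_all and the 6/2 split are made up for illustration, not part of the assignment); it shows that the mean is computed from the training split only and then subtracted from every split.

```python
import numpy as np

# Hypothetical stand-in for CIFAR-10: 8 random "images" of shape 32x32x3.
rng = np.random.RandomState(0)
X_all = rng.uniform(0, 255, size=(8, 32, 32, 3))

# Split first, then compute the mean image from the training split only.
X_tr, X_va = X_all[:6].copy(), X_all[6:].copy()
mean_image = np.mean(X_tr, axis=0)           # one mean value per pixel and channel

# The same training mean is subtracted from every split.
X_tr -= mean_image
X_va -= mean_image

print(mean_image.shape)                      # (32, 32, 3)
print(np.allclose(X_tr.mean(axis=0), 0.0))   # True
```

After the subtraction the training split is centered at zero per pixel, while the validation split is only approximately centered, which is the intended behavior.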

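Example Model

Some practical notes

Our image data format is N x H x W x C, where

  • N is the number of data points
  • H is the height of each image in pixels
  • W is the width of each image in pixels
  • C is the number of channels (usually 3: R, G, B)

This is the right way to represent the data for operations such as 2D convolution, which need to know which pixels are spatially next to each other. When we feed image data into fully connected affine layers, however, we want each data example to be represented by a single vector, and keeping the data split into channels, rows, and columns is no longer useful.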
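
As a small illustration of the flattening described above, here is a NumPy sketch (the batch size of 4 and the array name images are arbitrary choices for the example): each N x H x W x C image becomes one row vector of length H*W*C.

```python
import numpy as np

N, H, W, C = 4, 32, 32, 3
images = np.arange(N * H * W * C, dtype=np.float32).reshape(N, H, W, C)

# Keep the batch dimension and collapse height, width, and channels.
flat = images.reshape(N, -1)
print(flat.shape)  # (4, 3072)

# Row i of the flattened array holds exactly the pixels of image i.
print(np.array_equal(flat[1], images[1].ravel()))  # True
```

In the TensorFlow code of this notebook the same step is written as tf.reshape(h1, [-1, 5408]).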

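The example model itself

The first step in training your own model is defining its architecture.
Here is an example of a convolutional neural network defined in TensorFlow. Try to understand what each line is doing, remembering that each line builds on the one before it. We are not training anything yet; that comes later. For now, we want you to understand how everything is set up.

In this example you will see 2D convolutional layers, ReLU activations, and fully connected (linear) layers. You will also see how the hinge loss function and the Adam optimizer are used.

Make sure you understand why the parameters of the linear layer are 5408 and 10.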
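
One way to check the 5408 is to compute the convolution output size directly. The helper below is a sketch written for this note (conv_output_size is not a TensorFlow function); it uses the usual formula (W - F + 2P) / S + 1 with the division rounded down, which matches what 'VALID' padding does when the filter does not tile the input evenly.

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

# Example model: 32x32 input, 7x7 filter, no padding, stride 2, 32 filters.
side = conv_output_size(32, 7, 0, 2)
num_filters = 32
flat_features = side * side * num_filters
print(side, flat_features)  # 13 5408
```

The 10 is simply the number of CIFAR-10 classes, so the linear layer maps 5408 features to 10 scores.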

TensorFlow details

In TensorFlow, as in our earlier notebooks, we first initialize our variables and then define our model.

# clear old variables
tf.reset_default_graph()

# setup input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on the batch size fed in

X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

def simple_model(X,y):
    # define our weights (e.g. init_two_layer_convnet)
    # setup variables
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    W1 = tf.get_variable("W1", shape=[5408, 10]) 
    b1 = tf.get_variable("b1", shape=[10])

    # define our graph (e.g. two_layer_convnet)

    # We need tf.nn.conv2d here; reading the official documentation carefully is recommended:
    # tf.nn.conv2d()  https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
    # conv2d(input,filter,strides,padding,use_cudnn_on_gpu=None,data_format=None,name=None)
    # input : [batch, in_height, in_width, in_channels]
    # filter/kernel: [filter_height, filter_width, in_channels, out_channels]
    # strides: a 1-D tensor of length 4 giving the stride of the sliding window along each dimension;
    # for horizontal and vertical strides this is usually strides = [1,stride,stride,1]
    # padding: 'VALID' or 'SAME'
    # data_format: layout of the input data, 'NHWC' by default

    # Output size formula: (W-F+2P)/S + 1
    # W: image width  32
    # F: filter width  7
    # P: amount of padding  0
    # padding='VALID' means no padding; padding='SAME' pads rows and columns automatically so that,
    # at stride 1, the output feature map has the same spatial size as the input feature map
    # S: stride  2

    a1 = tf.nn.conv2d(X, Wconv1, strides=[1,2,2,1], padding='VALID') + bconv1
    # (W-F+2P)/S + 1 = (32 - 7 + 2*0)/2 + 1 = 13 (the division is rounded down)
    # so the output feature map has 13 * 13 * 32 = 5408 values (Wconv1 has 32 out channels, i.e. 32 filters)

    h1 = tf.nn.relu(a1) # apply the ReLU activation to every unit of a1
    h1_flat = tf.reshape(h1,[-1,5408])  # reshape h1, flattening the feature maps to batchsize * 5408
    y_out = tf.matmul(h1_flat,W1) + b1  # compute the output logits: y_out
    return y_out

y_out = simple_model(X,y)

# define our loss

total_loss = tf.losses.hinge_loss(tf.one_hot(y,10),logits=y_out)
mean_loss = tf.reduce_mean(total_loss) # average the loss

# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4) # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)

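TensorFlow supports many other layer types, loss functions, and optimizers, which you will meet in the experiments later on. The official API documentation is a useful resource whenever any of the arguments above is unclear.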
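
To get a feel for the hinge loss used in the example model, here is a plain-Python sketch. It assumes the element-wise formulation max(0, 1 - t*s), where the 0/1 one-hot labels are mapped to t in {-1, +1} and the result is averaged over all entries; this is how tf.losses.hinge_loss is understood to behave, but treat that as an assumption and check the API documentation. The helper names one_hot and hinge_loss are made up for this note.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def hinge_loss(labels01, logits):
    """Mean over all entries of max(0, 1 - t * s), with t = 2 * label - 1."""
    total, count = 0.0, 0
    for row_t, row_s in zip(labels01, logits):
        for t, s in zip(row_t, row_s):
            total += max(0.0, 1.0 - (2.0 * t - 1.0) * s)
            count += 1
    return total / count

# Two examples, three classes (toy numbers).
logits = [[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]]
labels = [one_hot(0, 3), one_hot(2, 3)]
loss = hinge_loss(labels, logits)
print(loss)
```

Only the entries that violate the margin contribute: here the score 0.5 for a wrong class in the first example and the score 0.0 for a wrong class in the second.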

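Training the model on one epoch

We have defined the operations of the graph above. To execute the computations defined in a TensorFlow graph, we first need to create a tf.Session object. A session encapsulates the state of the TensorFlow runtime. For more information, see the TensorFlow Getting started guide.

We can also specify a device, such as /cpu:0 or /gpu:0. See the TensorFlow guide on device placement for details on this kind of operation.

Below you should see a validation loss of around 0.4 to 0.6 and an accuracy of 0.30 to 0.35.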

def run_model(session, predict, loss_val, Xd, yd,
              epochs=1, batch_size=64, print_every=100,
              training=None, plot_losses=False):

    '''
    run_model drives the whole training loop. It takes a session; calling session.run(variables)
    returns the value of each entry in variables.
    In training mode (training is not None), the training argument is the train_step defined earlier;
    running it through session.run performs backpropagation and updates the model parameters.
    When training is None we only make predictions on the validation (or test) set, so no
    backpropagation is needed and train_step is not added to variables.
    '''
    # have tensorflow compute accuracy
    correct_prediction = tf.equal(tf.argmax(predict,1), y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indices
    train_indicies = np.arange(Xd.shape[0])
    np.random.shuffle(train_indicies)

    training_now = training is not None

    # setting up variables we want to compute (and optimizing)
    # if we have a training function, add that to things we compute
    # (use the loss_val argument rather than the global mean_loss)
    variables = [loss_val,correct_prediction,accuracy]
    if training_now:
        variables[-1] = training

    # counter
    iter_cnt = 0
    for e in range(epochs):
        # keep track of losses and accuracy
        correct = 0
        losses = []
        # make sure we iterate over the dataset once
        for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
            # generate indices for the batch
            start_idx = (i*batch_size)%Xd.shape[0]
            idx = train_indicies[start_idx:start_idx+batch_size]

            # create a feed dictionary for this batch
            feed_dict = {X: Xd[idx,:],
                         y: yd[idx],
                         is_training: training_now }
            # get batch size
            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions
            # and (if given) perform a training step
            loss, corr, _ = session.run(variables,feed_dict=feed_dict)

            # aggregate performance stats
            losses.append(loss*actual_batch_size)
            correct += np.sum(corr)

            # print every now and then
            if training_now and (iter_cnt % print_every) == 0:
                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\
                      .format(iter_cnt,loss,np.sum(corr)/actual_batch_size))
            iter_cnt += 1
        total_correct = correct/Xd.shape[0]
        total_loss = np.sum(losses)/Xd.shape[0]
        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\
              .format(total_loss,total_correct,e+1))
        if plot_losses:
            plt.plot(losses)
            plt.grid(True)
            plt.title('Epoch {} Loss'.format(e+1))
            plt.xlabel('minibatch number')
            plt.ylabel('minibatch loss')
            plt.show()
    return total_loss,total_correct

with tf.Session() as sess:
    with tf.device("/cpu:0"): #"/cpu:0" or "/gpu:0" 
        sess.run(tf.global_variables_initializer())
        print('Training')
        run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step,True)
        print('Validation')
        run_model(sess,y_out,mean_loss,X_val,y_val,1,64)
Training
Iteration 0: with minibatch training loss = 14.5 and accuracy of 0.078
Iteration 100: with minibatch training loss = 0.89 and accuracy of 0.34
Iteration 200: with minibatch training loss = 0.678 and accuracy of 0.33
Iteration 300: with minibatch training loss = 0.832 and accuracy of 0.16
Iteration 400: with minibatch training loss = 0.524 and accuracy of 0.33
Iteration 500: with minibatch training loss = 0.487 and accuracy of 0.44
Iteration 600: with minibatch training loss = 0.467 and accuracy of 0.33
Iteration 700: with minibatch training loss = 0.399 and accuracy of 0.41
Epoch 1, Overall loss = 0.771 and accuracy of 0.31

[Figure: Epoch 1 Loss, minibatch loss plotted against minibatch number]

Validation
Epoch 1, Overall loss = 0.472 and accuracy of 0.373
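
The batching logic inside run_model can be studied without TensorFlow. The sketch below uses only the standard library; the function minibatch_indices and the constant per-batch loss of 0.5 are hypothetical, chosen to show two details of the loop above: the last batch may be smaller than batch_size, and each batch loss is weighted by its actual size before averaging over the epoch.

```python
import math
import random

def minibatch_indices(num_examples, batch_size, seed=0):
    """Yield shuffled index lists that together cover the dataset exactly once."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    num_batches = int(math.ceil(num_examples / batch_size))
    for i in range(num_batches):
        start = i * batch_size
        yield indices[start:start + batch_size]

batches = list(minibatch_indices(num_examples=1000, batch_size=64))
sizes = [len(b) for b in batches]
print(len(batches), sizes[-1])  # 16 40

# Weight each batch loss by its size so the short last batch is not over-counted.
batch_losses = [0.5] * len(batches)   # hypothetical per-batch mean losses
epoch_loss = sum(l * n for l, n in zip(batch_losses, sizes)) / 1000
print(epoch_loss)
```

With 1000 validation examples and a batch size of 64, there are 15 full batches and one final batch of 40 examples.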

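Training a specific model

In this section we specify a model for you to construct. The goal here is not good performance (that comes later) but to get comfortable reading the TensorFlow documentation and configuring your own model.
Using the code above as guidance, and the corresponding TensorFlow documentation, build a model with the following architecture:

  • 7x7 convolutional layer with 32 filters and a stride of 1
  • ReLU activation layer
  • Spatial Batch Normalization layer (trainable parameters, with centering and scale)
  • 2x2 max pooling layer with a stride of 2
  • Affine layer with 1024 output units
  • ReLU activation layer
  • Affine layer from 1024 input units to 10 outputs

The convolution, activation, and fully connected layers here are similar to the code above.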
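
Before writing the TensorFlow code, it can help to trace the shapes and count the trainable parameters of this architecture by hand. The following sketch does that in plain Python; conv_out is a helper defined here for illustration, and the dictionary keys are just labels.

```python
def conv_out(w, f, s, p=0):
    """Output size of a conv or pooling window: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

side = conv_out(32, 7, 1)      # 26 after the 7x7 stride-1 convolution
side = conv_out(side, 2, 2)    # 13 after the 2x2 stride-2 max pooling
flat = side * side * 32        # features entering the first affine layer

params = {
    "conv (7x7x3x32 weights + 32 biases)": 7 * 7 * 3 * 32 + 32,
    "batch norm (gamma + beta)": 32 + 32,
    "affine 1 (flat x 1024 + 1024)": flat * 1024 + 1024,
    "affine 2 (1024 x 10 + 10)": 1024 * 10 + 10,
}
total = sum(params.values())
print(flat, total)  # 5408 5553866
```

Almost all of the parameters sit in the first affine layer, which is one reason the later suggestions mention global average pooling.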

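Batch normalization here mainly relies on two functions:
tf.nn.moments() computes the mean and variance.
tf.nn.batch_normalization() applies batch norm to the data using a precomputed mean and variance.

Note that the beta and gamma from the lecture slides correspond to offset and scale in tf.nn.batch_normalization; the documentation explains this in detail.
It is worth noting that at test time the mean and variance we use are not those of the current test batch. They should instead be accumulated step by step over the training set during training. The accumulation here uses a decay, so that the mean and variance of each new batch nudge the global mean and variance a little.
Because we update the global mean and variance, we also need to add the following two operations: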

tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_mean)
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_variance)

Our train_step also needs a small modification:

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)

from tensorflow.python.training import moving_averages
from tensorflow.python.ops import control_flow_ops
# clear old variables
tf.reset_default_graph()

# define our input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on the batch size fed in
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

# define model
def complex_model(X,y,is_training):
    # parameters (constants)
    MOVING_AVERAGE_DECAY = 0.9997
    BN_DECAY = MOVING_AVERAGE_DECAY
    BN_EPSILON = 0.001

    # 7x7 Convolutional Layer with 32 filters and stride of 1
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    h1 = tf.nn.conv2d(X, Wconv1, strides=[1,1,1,1], padding='VALID') + bconv1
    # ReLU Activation Layer
    a1 = tf.nn.relu(h1)  # a1 has shape [batch_size, 26, 26, 32]
    # Spatial Batch Normalization Layer (trainable parameters, with scale and centering)
    # for so-called "global normalization", used with convolutional filters with shape [batch, height, width, depth],
    # pass axes=[0,1,2]
    axis = list(range(len(a1.get_shape()) - 1))  # axis = [0,1,2]
    mean, variance = tf.nn.moments(a1, axis) # mean, variance for each feature map

    params_shape = a1.get_shape()[-1:]   # last dimension only: channel or depth
    # each feature map should have one beta and one gamma
    beta = tf.get_variable('beta',
                         params_shape,
                         initializer=tf.zeros_initializer)

    gamma = tf.get_variable('gamma',
                          params_shape,
                          initializer=tf.ones_initializer)

    # mean and variance during training are recorded and saved as moving_mean and moving_variance
    # moving_mean and moving_variance are used as mean and variance in testing.
    moving_mean = tf.get_variable('moving_mean',
                                params_shape,
                                initializer=tf.zeros_initializer,
                                trainable=False)
    moving_variance = tf.get_variable('moving_variance',
                                    params_shape,
                                    initializer=tf.ones_initializer,
                                    trainable=False)

    # update variable by variable * decay + value * (1 - decay)
    update_moving_mean = moving_averages.assign_moving_average(moving_mean,
                                                               mean, BN_DECAY)
    update_moving_variance = moving_averages.assign_moving_average(
        moving_variance, variance, BN_DECAY)
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_mean)
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_variance)

    mean, variance = control_flow_ops.cond(
        is_training, lambda: (mean, variance),
        lambda: (moving_mean, moving_variance))


    a1_b = tf.nn.batch_normalization(a1, mean, variance, beta, gamma, BN_EPSILON)
    # 2x2 Max Pooling layer with a stride of 2
    m1 = tf.nn.max_pool(a1_b, ksize=[1,2,2,1], strides = [1,2,2,1], padding='VALID')
    # shape of m1 should be batchsize * 26/2 * 26/2 * 32 = batchsize * 5408
    # Affine layer with 1024 output units
    m1_flat = tf.reshape(m1, [-1, 5408])
    W1 = tf.get_variable("W1", shape=[5408, 1024]) 
    b1 = tf.get_variable("b1", shape=[1024])
    h2 = tf.matmul(m1_flat,W1) + b1 
    # ReLU Activation Layer
    a2 = tf.nn.relu(h2)
    # Affine layer from 1024 input units to 10 outputs
    W2 = tf.get_variable("W2", shape=[1024, 10])
    b2 = tf.get_variable("b2", shape=[10])
    y_out = tf.matmul(a2,W2) + b2
    return y_out


y_out = complex_model(X,y,is_training)
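
The batch normalization part of complex_model can be mirrored in NumPy to see what each piece does. This is a sketch under stated assumptions, not the TensorFlow implementation: the function spatial_batchnorm is made up for this note, and the decay of 0.9 is chosen only so the moving averages move visibly (the model above uses 0.9997).

```python
import numpy as np

def spatial_batchnorm(x, gamma, beta, moving_mean, moving_var,
                      training, decay=0.9, eps=1e-3):
    """Normalize an NHWC array per channel; returns (out, moving_mean, moving_var)."""
    if training:
        # Statistics over batch, height, and width: one value per channel.
        mean = x.mean(axis=(0, 1, 2))
        var = x.var(axis=(0, 1, 2))
        # variable * decay + value * (1 - decay), as in the model above.
        moving_mean = moving_mean * decay + mean * (1 - decay)
        moving_var = moving_var * decay + var * (1 - decay)
    else:
        # At test time the accumulated statistics are used instead.
        mean, var = moving_mean, moving_var
    out = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return out, moving_mean, moving_var

rng = np.random.RandomState(0)
x = rng.randn(16, 26, 26, 32) * 3.0 + 5.0
C = x.shape[-1]
out, mm, mv = spatial_batchnorm(x, np.ones(C), np.zeros(C),
                                np.zeros(C), np.ones(C), training=True)
print(out.shape)                                             # (16, 26, 26, 32)
print(np.allclose(out.mean(axis=(0, 1, 2)), 0.0, atol=1e-6))  # True
```

In training mode every channel of the output has roughly zero mean and unit variance, while the moving statistics have taken one small step from their initial values toward the batch statistics.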

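To make sure you are doing the right thing, use the following tool to check the dimensionality of your output. It should be 64 x 10, because our batch size is 64 and the final affine layer outputs 10 units, one for each of our 10 classes.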

# Now we're going to feed a random batch into the model 
# and make sure the output is the right size
x = np.random.randn(64, 32, 32,3)
with tf.Session() as sess:
    with tf.device("/cpu:0"): #"/cpu:0" or "/gpu:0"
        tf.global_variables_initializer().run()

        ans = sess.run(y_out,feed_dict={X:x,is_training:True})
        %timeit sess.run(y_out,feed_dict={X:x,is_training:True})
        print(ans.shape)
        print(np.array_equal(ans.shape, np.array([64, 10])))

Out:

10 loops, best of 3: 118 ms per loop
(64, 10)
True

You should see the following from the run above

(64, 10)

True

GPU!

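Now we are going to try the model under the GPU device. The rest of the code stays unchanged, and all our variables and operations will be computed using accelerated code paths. If there is no GPU, however, we get a Python exception and have to rebuild our graph. On a dual-core CPU you might see around 50-80 ms per batch; on Google Cloud GPUs it should be around 2-5 ms per batch.

Author's note: the code below was run on a CPU, so the timings shown are CPU timings. Readers using a GPU can disregard the per-batch times reported below.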

try:
    with tf.Session() as sess:
        with tf.device("/cpu:0") as dev: # can be "/cpu:0" or "/gpu:0"
            tf.global_variables_initializer().run()

            ans = sess.run(y_out,feed_dict={X:x,is_training:True})
            %timeit sess.run(y_out,feed_dict={X:x,is_training:True})
except tf.errors.InvalidArgumentError:
    print("no gpu found, please use Google Cloud if you want GPU acceleration")    
    # rebuild the graph
    # trying to start a GPU throws an exception 
    # and also trashes the original graph
    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, [None, 32, 32, 3])
    y = tf.placeholder(tf.int64, [None])
    is_training = tf.placeholder(tf.bool)
    y_out = complex_model(X,y,is_training)

Out:

10 loops, best of 3: 115 ms per loop

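You should observe that even a simple forward pass like this is significantly faster on the GPU. So for the rest of the assignment (and when you go on to build the models for assignment 3 and your project), you should use the GPU device. TensorFlow, however, defaults to a GPU device if one is available and falls back to the CPU otherwise, so from now on we can skip the device specification.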

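Train the model

Now that you have seen how to define a model and run a forward pass, let's train the complex model you created above for one epoch on the training set.

Make sure you understand how each TensorFlow function used below corresponds to what you implemented in your custom neural network.

First, create an RMSprop optimizer (with a learning rate of 1e-3) and a cross-entropy loss function. See the TensorFlow documentation for more information.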

# Inputs
#     y_out: is what your model computes
#     y: is your TensorFlow variable with label information
# Outputs
#    mean_loss: a TensorFlow variable (scalar) with numerical loss
#    optimizer: a TensorFlow optimizer
# This should be ~3 lines of code!
total_loss = tf.nn.softmax_cross_entropy_with_logits(logits=y_out, labels=tf.one_hot(y,10))
mean_loss = tf.reduce_mean(total_loss)

# define our optimizer
optimizer = tf.train.RMSPropOptimizer(1e-3) # select optimizer and set learning rate

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)
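
For reference, the softmax cross-entropy loss used above can be written out in a few lines of plain Python. This is a sketch for intuition only; the function softmax_cross_entropy and the toy logits are made up here, and it takes an integer class label instead of a one-hot vector.

```python
import math

def softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_sum_exp - logits[label]

# Two examples over 10 classes: uniform scores, and a confident correct score.
batch_logits = [[0.0] * 10, [5.0] + [0.0] * 9]
batch_labels = [3, 0]
losses = [softmax_cross_entropy(s, t) for s, t in zip(batch_logits, batch_labels)]
mean_loss_value = sum(losses) / len(losses)
print(round(losses[0], 4))  # 2.3026
```

With all-equal scores over 10 classes the loss is ln 10, about 2.3, which is a handy reference point when reading the first few training iterations.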

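Train the model

Below we create a session and train the model for one epoch. You should see a loss of 1.4 to 2.0 and an accuracy of 0.4 to 0.5. The exact values will vary somewhat with initialization and random seed.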

sess = tf.Session()

sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step)

Out:

Training
Iteration 0: with minibatch training loss = 3.39 and accuracy of 0.078
Iteration 100: with minibatch training loss = 3.18 and accuracy of 0.14
Iteration 200: with minibatch training loss = 1.78 and accuracy of 0.41
Iteration 300: with minibatch training loss = 1.86 and accuracy of 0.39
Iteration 400: with minibatch training loss = 1.32 and accuracy of 0.48
Iteration 500: with minibatch training loss = 1.2 and accuracy of 0.66
Iteration 600: with minibatch training loss = 1.27 and accuracy of 0.59
Iteration 700: with minibatch training loss = 1.32 and accuracy of 0.48
Epoch 1, Overall loss = 1.67 and accuracy of 0.452

Out:

(1.6708081902873759, 0.45230612244897961)

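Check the accuracy of the model

Let's look at the train and test code. Feel free to use this code to evaluate the models you build below. You should see a loss of 1.3 to 2.0 and an accuracy of 0.45 to 0.55.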

print('Validation')
run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

Out:

Validation
Epoch 1, Overall loss = 1.44 and accuracy of 0.538

Out:

(1.4403997488021851, 0.53800000000000003)

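Now you can experiment with different architectures, hyperparameters, loss functions, and optimizers to train a model that reaches at least 70% accuracy on CIFAR-10. You can use the run_model function from above.

Things you can try

  • Filter size: Above we used 7 x 7; smaller filters may be more efficient.
  • Number of filters: Above we used 32 filters. Would fewer work better?
  • Pooling vs Strided Convolution: Do you use max pooling, or only strided convolutions?
  • Batch normalization: Try adding spatial batch normalization after the convolution layers and vanilla batch normalization after the affine layers. Does your network train faster?
  • Network architecture: The network above has two layers of trainable parameters. Can you do better with a deeper network? Some architectures worth trying:

    • [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
  • Use TensorFlow Scope: Use TensorFlow scopes and/or tf.layers to make it easier to write deeper networks. See the tf.layers tutorial for how to use tf.layers.

  • Use Learning Rate Decay: As the course notes point out, decaying the learning rate may help the model converge. Try decaying it every epoch, when the loss stops changing between epochs, or by any other heuristic you find appropriate. See the TensorFlow documentation on learning rate decay.

  • Global Average Pooling: Instead of flattening and then stacking several affine layers, keep applying convolutions until the image is small enough (around 7x7), then add an average pooling layer to get a 1x1 image of shape (1, 1, number of filters), which is then reshaped into a vector of length (number of filters). This is used in Google's Inception Network; see Table 1 there for their architecture.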
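
The learning rate decay suggestion can be made concrete with a small schedule function. The sketch below follows the usual exponential-decay formula, lr = initial_lr * decay_rate ** (step / decay_steps), with the exponent floored in staircase mode; the function is a plain-Python stand-in written for this note, and the numbers (1e-2, 750 steps, 0.96) are the ones used in the model code later in this notebook.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=True):
    """Learning rate after global_step updates under exponential decay."""
    if staircase:
        exponent = global_step // decay_steps   # decay in discrete jumps
    else:
        exponent = global_step / decay_steps    # decay smoothly every step
    return initial_lr * decay_rate ** exponent

for step in (0, 749, 750, 1500):
    print(step, exponential_decay(1e-2, step, 750, 0.96))
```

With a batch size of 64 and 49000 training images, 750 steps is close to one epoch, so in staircase mode the rate drops by 4% roughly once per epoch.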

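Tips for training

For each network architecture you try, you should tune the learning rate and the regularization strength. A few important things to keep in mind while doing so:
- If the parameters are set well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach to hyperparameter tuning: start with a wide range of hyperparameters and train for only a few iterations to find the combinations that work at all.
- Once you have found a few sets of parameters that seem to work, search more finely around them. At this stage you may need to train for more epochs.
- You should use the validation set for the hyperparameter search. We will evaluate your model on the test set using the best parameters you found on the validation set.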
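
The coarse-to-fine procedure in the tips above might look like the following sketch. Everything here is hypothetical: validation_score is a toy stand-in for "train briefly and return validation accuracy" (it simply peaks at a learning rate of 1e-3 and a regularization strength of 1e-4), and the search ranges are arbitrary. The point is the structure: sample on a log scale over a wide range, then sample again in a narrow band around the best coarse result.

```python
import math
import random

def validation_score(lr, reg):
    # Hypothetical stand-in for a short training run; peaks at lr=1e-3, reg=1e-4.
    return -((math.log10(lr) + 3) ** 2) - (math.log10(reg) + 4) ** 2

def random_search(rng, lr_exp, reg_exp, trials):
    """Sample (lr, reg) log-uniformly and return the best (score, lr, reg)."""
    best = None
    for _ in range(trials):
        lr = 10 ** rng.uniform(*lr_exp)
        reg = 10 ** rng.uniform(*reg_exp)
        candidate = (validation_score(lr, reg), lr, reg)
        if best is None or candidate > best:
            best = candidate
    return best

rng = random.Random(0)
# Coarse stage: several orders of magnitude.
coarse = random_search(rng, (-6, -1), (-6, 0), trials=20)
# Fine stage: half a decade around the best coarse point.
c_lr, c_reg = math.log10(coarse[1]), math.log10(coarse[2])
fine = random_search(rng, (c_lr - 0.5, c_lr + 0.5), (c_reg - 0.5, c_reg + 0.5), trials=20)
best = max(coarse, fine)
print("best lr = %.2e, best reg = %.2e" % (best[1], best[2]))
```

In a real search, each call to the scoring function would be a short run of run_model on the training set followed by an evaluation on the validation set.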

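Going above and beyond

If you are feeling adventurous, there are many other features you can try in order to improve your model's performance. You are not required to implement any of them, but trying them can earn extra credit.

  • Other update rules: in this assignment we used SGD+momentum, RMSprop, and Adam; you could try alternatives such as AdaGrad or AdaDelta.
  • Other activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
  • Model ensembles
  • Data augmentation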
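
For the activation functions in the list above, scalar versions are short enough to write out. These are illustrative definitions in plain Python with commonly used default slopes (alpha values are assumptions, not settings from the assignment); a parametric ReLU has the same form as the leaky ReLU but learns alpha during training.

```python
import math

def leaky_relu(x, alpha=0.01):
    """Identity for positive inputs, small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    """Identity for positive inputs, alpha * (exp(x) - 1) for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def maxout(pieces):
    """Maximum over the outputs of several linear pieces for one unit."""
    return max(pieces)

for x in (-2.0, 0.5):
    print(x, leaky_relu(x), elu(x))
print(maxout([0.3, -1.2, 0.9]))  # 0.9
```

Unlike the plain ReLU, both the leaky ReLU and the ELU pass a nonzero gradient for negative inputs.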

If you do decide to implement something extra, describe it in the "Extra Credit Description" section below.

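What we expect

At the very least, you should be able to train a ConvNet that reaches at least 70% accuracy on the validation set. This is only a lower bound.
If you are careful, it should be possible to get accuracies much higher than this! Extra credit will be awarded for particularly high-scoring models or unique approaches.

You should use the space below to experiment and train your model. The final cell in this notebook should contain your model's accuracy on the training and validation sets.

Have fun and happy training!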

# Feel free to play with this cell

def my_model(X,y,is_training):
    def conv_relu_pool(X, num_filter=32, conv_strides = 1, kernel_size=[3,3], pool_size=[2,2], pool_strides = 2):
        conv1 = tf.layers.conv2d(inputs=X, filters=num_filter, kernel_size=kernel_size, strides = conv_strides, padding="same", activation=tf.nn.relu)
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=pool_size, strides = pool_strides)
        return pool1

    def conv_relu_conv_relu_pool(X, num_filter1=32, num_filter2=32, conv_strides = 1, kernel_size=[5,5],pool_size=[2,2], pool_strides = 2):
        conv1 = tf.layers.conv2d(inputs=X,filters=num_filter1,kernel_size=kernel_size, strides=conv_strides, padding="same",activation=tf.nn.relu)
        conv2 = tf.layers.conv2d(inputs=conv1,filters=num_filter2,kernel_size=kernel_size, strides=conv_strides, padding="same",activation=tf.nn.relu)
        # Pooling Layer #1
        pool1 = tf.layers.max_pooling2d(inputs=conv2, pool_size=pool_size, strides=pool_strides)
        return pool1

    def affline(X, num_units, act):
        return tf.layers.dense(inputs=X, units=num_units, activation=act)

    def batchnorm_relu_conv(X, num_filters=32, conv_strides = 2, kernel_size=[5,5], is_training=True):
        bat1 = tf.layers.batch_normalization(X, training=is_training)
        act1 = tf.nn.relu(bat1)
        #conv1 = tf.layers.conv2d(inputs=act1, filters=num_filters, 
        #                         kernel_size = kernel_size, strides = 2, padding="same", activation=None,
        #                         kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.1),
        #                         bias_regularizer=tf.contrib.layers.l2_regularizer(scale=0.1))
        conv1 = tf.layers.conv2d(inputs=act1, filters=num_filters, 
                                kernel_size = kernel_size, strides = conv_strides, padding="same", activation=None) # without regularization

        return conv1

    N = 3 # num of conv blocks
    M = 1 # num of affine 
    conv = tf.layers.conv2d(inputs = X, filters=64, kernel_size=[5,5], strides=1, padding="same", activation=None)


    for i in range(N):
        print(conv.get_shape())
        conv = batchnorm_relu_conv(conv, is_training=is_training)
        #conv = conv_relu_conv_relu_pool(conv)

    print(conv.get_shape())
    global_average_shape = conv.get_shape()[1:3] # 4,4

    # just flatten the output
    #avg_layer = tf.reshape(conv,(-1,512))

    # global average pooling method 1
    #avg_layer = tf.layers.average_pooling2d(conv,(global_average_shape,global_average_shape),padding='valid')
    #avg_layer = tf.squeeze(avg_layer, axis=[1,2]) # remove all 1 axis

    # global average  pooling method 2
    avg_layer = tf.reduce_mean(conv, [1,2]) # mean over height and width, i.e. global average pooling

    print(avg_layer.get_shape())

    fc = avg_layer
    #keep_prob = tf.constant(0.5)
    for i in range(M):
        fc = affline(fc,100,tf.nn.relu)
        #fc = tf.nn.dropout(fc, keep_prob)

    fc = affline(fc, 10, None)

    return fc    


tf.reset_default_graph()

X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

y_out = my_model(X,y,is_training)
total_loss = tf.nn.softmax_cross_entropy_with_logits(logits=y_out, labels=tf.one_hot(y,10))
mean_loss = tf.reduce_mean(total_loss)

global_step = tf.Variable(0, trainable=False, name="Global_Step")
starter_learning_rate = 1e-2
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           750, 0.96, staircase=True)

#learning_rate = starter_learning_rate
# define our optimizer
optimizer = tf.train.AdamOptimizer(learning_rate) # select optimizer and set learning rate


# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss, global_step=global_step)

print([x.name for x in tf.global_variables()])

Out:

(?, 32, 32, 64)
(?, 16, 16, 32)
(?, 8, 8, 32)
(?, 4, 4, 32)
(?, 32)
['conv2d/kernel:0', 'conv2d/bias:0', 'batch_normalization/beta:0', 'batch_normalization/gamma:0', 'batch_normalization/moving_mean:0', 'batch_normalization/moving_variance:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'batch_normalization_1/beta:0', 'batch_normalization_1/gamma:0', 'batch_normalization_1/moving_mean:0', 'batch_normalization_1/moving_variance:0', 'conv2d_2/kernel:0', 'conv2d_2/bias:0', 'batch_normalization_2/beta:0', 'batch_normalization_2/gamma:0', 'batch_normalization_2/moving_mean:0', 'batch_normalization_2/moving_variance:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'Global_Step:0', 'beta1_power:0', 'beta2_power:0', 'conv2d/kernel/Adam:0', 'conv2d/kernel/Adam_1:0', 'conv2d/bias/Adam:0', 'conv2d/bias/Adam_1:0', 'batch_normalization/beta/Adam:0', 'batch_normalization/beta/Adam_1:0', 'batch_normalization/gamma/Adam:0', 'batch_normalization/gamma/Adam_1:0', 'conv2d_1/kernel/Adam:0', 'conv2d_1/kernel/Adam_1:0', 'conv2d_1/bias/Adam:0', 'conv2d_1/bias/Adam_1:0', 'batch_normalization_1/beta/Adam:0', 'batch_normalization_1/beta/Adam_1:0', 'batch_normalization_1/gamma/Adam:0', 'batch_normalization_1/gamma/Adam_1:0', 'conv2d_2/kernel/Adam:0', 'conv2d_2/kernel/Adam_1:0', 'conv2d_2/bias/Adam:0', 'conv2d_2/bias/Adam_1:0', 'batch_normalization_2/beta/Adam:0', 'batch_normalization_2/beta/Adam_1:0', 'batch_normalization_2/gamma/Adam:0', 'batch_normalization_2/gamma/Adam_1:0', 'conv2d_3/kernel/Adam:0', 'conv2d_3/kernel/Adam_1:0', 'conv2d_3/bias/Adam:0', 'conv2d_3/bias/Adam_1:0', 'dense/kernel/Adam:0', 'dense/kernel/Adam_1:0', 'dense/bias/Adam:0', 'dense/bias/Adam_1:0', 'dense_1/kernel/Adam:0', 'dense_1/kernel/Adam_1:0', 'dense_1/bias/Adam:0', 'dense_1/bias/Adam_1:0']
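
The global average pooling step used in my_model (the tf.reduce_mean over axes 1 and 2) can be checked on a small NumPy array. This sketch uses random data with the same 4x4x32 spatial shape as the final convolution output printed above; the batch size of 2 and the array names are arbitrary.

```python
import numpy as np

rng = np.random.RandomState(0)
feature_maps = rng.randn(2, 4, 4, 32)        # NHWC, like the last conv output

# Global average pooling: average each feature map over height and width.
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (2, 32)

# Equivalent view: flatten the spatial grid, then average over it.
alt = feature_maps.reshape(2, -1, 32).mean(axis=1)
print(np.allclose(pooled, alt))  # True
```

Each example ends up as one vector with a single value per filter, which is why the shape printed for avg_layer above is (?, 32).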
# Feel free to play with this cell
# This default code creates a session
# and trains your model for 10 epochs
# then prints the validation set accuracy
#sess = tf.Session()
#sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,2,64
          ,100,train_step,True)
print('Validation')
run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

# The losses below come from running two more epochs after 5 epochs had already been run

Out:

Training
Iteration 0: with minibatch training loss = 0.614 and accuracy of 0.81
Iteration 100: with minibatch training loss = 0.653 and accuracy of 0.77
Iteration 200: with minibatch training loss = 0.852 and accuracy of 0.75
Iteration 300: with minibatch training loss = 0.868 and accuracy of 0.75
Iteration 400: with minibatch training loss = 0.517 and accuracy of 0.81
Iteration 500: with minibatch training loss = 0.744 and accuracy of 0.69
Iteration 600: with minibatch training loss = 0.547 and accuracy of 0.78
Iteration 700: with minibatch training loss = 0.692 and accuracy of 0.75
Epoch 1, Overall loss = 0.714 and accuracy of 0.745

[Figure: Epoch 1 Loss, minibatch loss plotted against minibatch number]

Iteration 800: with minibatch training loss = 0.533 and accuracy of 0.8
Iteration 900: with minibatch training loss = 0.907 and accuracy of 0.69
Iteration 1000: with minibatch training loss = 0.595 and accuracy of 0.73
Iteration 1100: with minibatch training loss = 0.518 and accuracy of 0.83
Iteration 1200: with minibatch training loss = 0.837 and accuracy of 0.73
Iteration 1300: with minibatch training loss = 0.723 and accuracy of 0.73
Iteration 1400: with minibatch training loss = 0.923 and accuracy of 0.67
Iteration 1500: with minibatch training loss = 0.612 and accuracy of 0.8
Epoch 2, Overall loss = 0.656 and accuracy of 0.768

[Figure: Epoch 2 Loss, minibatch loss plotted against minibatch number]

Validation
Epoch 1, Overall loss = 0.901 and accuracy of 0.708

Out:

(0.9011877479553223, 0.70799999999999996)
# Test your model here, and make sure 
# the output of this cell is the accuracy
# of your best model on the training and val sets
# We're looking for >= 70% accuracy on Validation
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,1,64)
print('Validation')

run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

Out:

Training
Epoch 1, Overall loss = 0.607 and accuracy of 0.783
Validation
Epoch 1, Overall loss = 0.901 and accuracy of 0.708

Out:

(0.90118774318695072, 0.70799999999999996)

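Describe what you did here

In this cell, explain what you did, any additional features you implemented, and any visualizations or graphs you used to train and evaluate your network.

The author simply implemented several of the building blocks listed above, tried each of them out, and also used learning rate decay. Readers are encouraged to try more combinations and to consult the official documentation to deepen their understanding of TensorFlow. When building a model, it also helps to print the shape of each intermediate result to get a feel for what every step outputs. If you run into trouble during training, first try running the CIFAR model architecture from the official TensorFlow documentation to check that it works.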

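Test set: do this only once

Now that we have a result we are happy with, we evaluate the final model on the test set. This is the score we would achieve in a competition. Think about how it compares to your validation set accuracy.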

print('Test')
run_model(sess,y_out,mean_loss,X_test,y_test,1,64)

Out:

Test
Epoch 1, Overall loss = 0.899 and accuracy of 0.696

Out:

(0.89940066184997558, 0.69550000000000001)

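Going further with TensorFlow

The later assignments all rely on TensorFlow, and you may find it useful for your projects as well.

Extra Credit Description

If you implemented any additional features for extra credit, point out here where the code or other files can be found.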