
Deep Learning: Stacked Denoising Autoencoders (SdA)

The Stacked Denoising Autoencoder (SdA) is an extension of the stacked autoencoder. If you are not yet familiar with denoising autoencoders, it is recommended to read the earlier article on them first.

Stacked Autoencoders

Denoising autoencoders can be stacked to form a deep network: the hidden representation (output code) found by the denoising autoencoder of the layer below serves as input to the current layer. Unsupervised pre-training of such an architecture is done one layer at a time. Each layer is trained as a denoising autoencoder that minimizes the error in reconstructing its input (the output of the previous layer). Once the first k layers have been trained, we can train the (k+1)-th layer, because we can now compute the code, i.e. the hidden representation, produced by the layer below.

Once all layers have been pre-trained, the network goes through a second stage of training called fine-tuning. Here we consider supervised fine-tuning, where we minimize the prediction error on a supervised task. First we add a logistic regression layer on top of the network (on the output code of the output layer). We then train the entire network as we would train a multilayer perceptron. At this point we only consider the encoding parts of the autoencoders. This stage is supervised, since we use the target classes during training.

This is easy to implement in Theano, using the denoising autoencoder class defined earlier. We can see the stacked denoising autoencoder as having two facades: a list of autoencoders and an MLP. During pre-training we treat the model as a list of autoencoders and train them one by one; afterwards we use the MLP. The two facades are linked because:

the autoencoders and the sigmoid layers of the MLP share parameters, and

the hidden representations computed by intermediate layers of the MLP are fed as input to the autoencoders.

# Imports used by the code in this article; HiddenLayer, dA and LogisticRegression
# come from the code of the earlier tutorials (mlp.py, dA.py and logistic_sgd.py)
import os
import sys
import timeit

import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

from logistic_sgd import LogisticRegression
from mlp import HiddenLayer
from dA import dA


class SdA(object):
    """Stacked denoising auto-encoder class (SdA)

    A stacked denoising autoencoder model is obtained by stacking several
    dAs. The hidden layer of the dA at layer `i` becomes the input of
    the dA at layer `i+1`. The first layer dA gets as input the input of
    the SdA, and the hidden layer of the last dA represents the output.
    Note that after pretraining, the SdA is dealt with as a normal MLP,
    the dAs are only used to initialize the weights.
    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        n_ins=784,
        hidden_layers_sizes=[500, 500],
        n_outs=10,
        corruption_levels=[0.1, 0.1]
    ):
        """ This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial
                    weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type n_ins: int
        :param n_ins: dimension of the input to the sdA

        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers size, must contain
                               at least one value

        :type n_outs: int
        :param n_outs: dimension of the output of the network

        :type corruption_levels: list of float
        :param corruption_levels: amount of corruption to use for each
                                  layer
        """

        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')  # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as 1D vector of
                                 # [int] labels

self.sigmoid_layers will store the sigmoid layers of the MLP, while self.dA_layers will store the denoising autoencoders associated with those MLP layers.

Next we construct n_layers sigmoid layers and n_layers denoising autoencoders, where n_layers is the depth of our model. We use the HiddenLayer class introduced in the Multilayer Perceptron tutorial, with the tanh non-linearity replaced by the logistic (sigmoid) function.

We link the sigmoid layers to form an MLP, and construct each denoising autoencoder so that it shares its weight matrix and the bias of its encoding part with the corresponding sigmoid layer.

        for i in range(self.n_layers):
            # construct the sigmoidal layer

            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if you are on the first
            # layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)
            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)
            # it's arguably a philosophical question...
            # but we are going to only declare that the parameters of the
            # sigmoid_layers are parameters of the StackedDAA
            # the visible biases in the dA are parameters of those
            # dA, but not the SdA
            self.params.extend(sigmoid_layer.params)

            # Construct a denoising autoencoder that shares weights with this
            # layer
            dA_layer = dA(numpy_rng=numpy_rng,
                          theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W,
                          bhid=sigmoid_layer.b)
            self.dA_layers.append(dA_layer)

All we need now is to add a logistic regression layer on top of the sigmoid layers, so as to obtain an MLP. We use the LogisticRegression class introduced earlier.

        # We now need to add a logistic layer on top of the MLP
        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1],
            n_out=n_outs
        )

        self.params.extend(self.logLayer.params)
        # construct a function that implements one step of finetuning

        # compute the cost for second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
        # compute the gradients with respect to the model parameters
        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)

The SdA class also provides a method that generates training functions for the denoising autoencoders in its layers. They are returned as a list, where element i is a function that performs one step of training for the dA corresponding to layer i.

    def pretraining_functions(self, train_set_x, batch_size):
        ''' Generates a list of functions, each of them implementing one
        step in training the dA corresponding to the layer with the same index.
        The function will require as input the minibatch index, and to train
        a dA you just need to iterate, calling the corresponding function on
        all minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared variable that contains all datapoints used
                            for training the dA

        :type batch_size: int
        :param batch_size: size of a [mini]batch

        :type learning_rate: float
        :param learning_rate: learning rate used during training for any of
                              the dA layers
        '''

        # index to a [mini]batch
        index = T.lscalar('index')  # index to a minibatch

To be able to change the corruption level or the learning rate during training, we use Theano variables for them.

        corruption_level = T.scalar('corruption')  # % of corruption to use
        learning_rate = T.scalar('lr')  # learning rate to use
        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA in self.dA_layers:
            # get the cost and the updates list
            cost, updates = dA.get_cost_updates(corruption_level,
                                                learning_rate)
            # compile the theano function
            fn = theano.function(
                inputs=[
                    index,
                    theano.In(corruption_level, value=0.2),
                    theano.In(learning_rate, value=0.1)
                ],
                outputs=cost,
                updates=updates,
                givens={
                    self.x: train_set_x[batch_begin: batch_end]
                }
            )
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns

Now any function pretrain_fns[i] takes as argument index and, optionally, corruption (the corruption level) or lr (the learning rate). Note that these argument names are the names given to the Theano variables when they were constructed, not the names of the Python variables (learning_rate or corruption_level). Keep this in mind when working with Theano.
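
For example, a hypothetical call (assuming pretrain_fns has been built as above) looks like this:

# train the layer-0 dA on minibatch 0, passing the Theano variable names
cost = pretrain_fns[0](index=0, corruption=0.2, lr=0.1)
# corruption and lr are optional; omitting them uses the theano.In defaults
cost = pretrain_fns[0](index=0)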

In the same fashion we build the functions required during fine-tuning (train_fn, valid_score and test_score).

    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on
        a batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that contains all the datasets;
                         it has to contain three pairs, `train`,
                         `valid`, `test` in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels

        :type batch_size: int
        :param batch_size: size of a minibatch

        :type learning_rate: float
        :param learning_rate: learning rate used during finetune stage
        '''

        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_valid_batches //= batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_test_batches //= batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = [
            (param, param - gparam * learning_rate)
            for param, gparam in zip(self.params, gparams)
        ]

        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={
                self.x: train_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: train_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='train'
        )

        test_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: test_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: test_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='test'
        )

        valid_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: valid_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: valid_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='valid'
        )

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score

Note that valid_score and test_score are not Theano functions but Python functions that loop over the entire validation set and the entire test set, respectively, producing a list of the losses over those sets.

Summary

The following code constructs the stacked denoising autoencoder:

    numpy_rng = numpy.random.RandomState(89677)
    print('... building the model')
    # construct the stacked denoising autoencoder class
    sda = SdA(
        numpy_rng=numpy_rng,
        n_ins=28 * 28,
        hidden_layers_sizes=[1000, 1000, 1000],
        n_outs=10
    )

Training the network is done in two stages: layer-wise pre-training followed by fine-tuning.

For the pre-training stage we loop over all the layers of the network. For each layer we use the compiled Theano function that performs one SGD step to optimize the weights of that layer so as to reduce its reconstruction cost. This function is applied to the training set for the number of epochs given by pretraining_epochs.

    #########################
    # PRETRAINING THE MODEL #
    #########################
    print('... getting the pretraining functions')
    pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                                batch_size=batch_size)

    print('... pre-training the model')
    start_time = timeit.default_timer()
    ## Pre-train layer-wise
    corruption_levels = [.1, .2, .3]
    for i in range(sda.n_layers):
        # go through pretraining epochs
        for epoch in range(pretraining_epochs):
            # go through the training set
            c = []
            for batch_index in range(n_train_batches):
                c.append(pretraining_fns[i](index=batch_index,
                         corruption=corruption_levels[i],
                         lr=pretrain_lr))
            print('Pre-training layer %i, epoch %d, cost %f' % (i, epoch, numpy.mean(c, dtype='float64')))

    end_time = timeit.default_timer()

    print(('The pretraining code for file ' +
           os.path.split(__file__)[1] +
           ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)

Fine-tuning is similar to what was done in the Multilayer Perceptron tutorial; the only difference is that the functions are now given by build_finetune_functions.
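
For reference, here is a minimal sketch of such a fine-tuning loop; the variables datasets, batch_size, finetune_lr, training_epochs and n_train_batches are assumed to be defined as in the earlier tutorials, and the early-stopping logic is omitted:

train_fn, valid_score, test_score = sda.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

best_validation_loss = numpy.inf
for epoch in range(training_epochs):
    for minibatch_index in range(n_train_batches):
        train_fn(minibatch_index)  # one SGD step on one minibatch
    # valid_score is a plain Python function returning per-batch errors
    this_validation_loss = numpy.mean(valid_score(), dtype='float64')
    print('epoch %i, validation error %f %%' %
          (epoch, this_validation_loss * 100.))
    if this_validation_loss < best_validation_loss:
        best_validation_loss = this_validation_loss
        test_loss = numpy.mean(test_score(), dtype='float64')
        print('    test error of best model %f %%' % (test_loss * 100.))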

Running the Code

python code/SdA.py

By default the code runs 15 pre-training epochs for each layer, with a batch size of 1. The corruption level is 0.1 for the first layer, 0.2 for the second and 0.3 for the third. The pre-training learning rate is 0.001 and the fine-tuning learning rate is 0.1. Pre-training took 581.01 minutes, about 13 minutes per epoch on average. Fine-tuning completed after 36 epochs in 444.2 minutes, an average of 12.34 minutes per epoch. The final validation error was 1.39% and the test error 1.3%. These results were obtained on a machine with an Intel Xeon E5430 @ 2.66GHz CPU and single-threaded GotoBLAS.

Tips and Tricks

One way to improve the running speed of this code (assuming you have enough memory) is to precompute how the network, up to layer k-1, transforms the data. That is, you start by training the first-layer dA; once it is trained, you compute the hidden unit values for every datapoint in the dataset and store them, then use the stored values to train the dA of the second layer, and so on. As you can see, at this point the dAs are trained individually and simply provide (one to the next) a non-linear transformation of the input. Once all dAs have been trained, you can start fine-tuning the model.
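
A minimal sketch of the precomputation step follows; propagate_fn and transformed_data are hypothetical names, and wiring the cached array back into the next layer's pre-training function is not shown:

# once the dA of layer i is trained, push the whole training set through the
# (noise-free) encoders up to layer i and cache the result
propagate_fn = theano.function(
    inputs=[sda.x],
    outputs=sda.sigmoid_layers[i].output
)
transformed_data = propagate_fn(train_set_x.get_value(borrow=True))
# transformed_data (a numpy array) can then be placed in a shared variable and
# used as the training input for the dA at layer i + 1, so the lower layers are
# not re-evaluated on every minibatch of every epoch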