DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (2)

阿新 • Published: 2018-11-09

title: 'DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (2)'
id: dl-ai-5-1h2
tags:
- dl.ai
- homework
categories:
- AI
- Deep Learning
date: 2018-10-18 16:20:33

Assignment 2 builds a character-level language model that generates dinosaur names.

Part 2: Character level language model - Dinosaurus land

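Model structure

Initialize the parameters

Run the optimization loop
- Forward propagation to compute the loss
- Backward propagation to compute the gradients of the loss with respect to the parameters
- Clip the gradients to avoid exploding gradients
- Update the parameters with the gradient descent update rule
Return the learned parameters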
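The four steps of the optimization loop can be seen in isolation on a toy problem. The following is a minimal sketch, not the assignment's code: the one-parameter loss, the function names forward, backward and train, and all default values are assumptions chosen only for illustration.

```python
def forward(w):
    """Toy loss: squared distance from the target value 3.0 (hypothetical)."""
    return (w - 3.0) ** 2


def backward(w):
    """Gradient of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def train(w=100.0, learning_rate=0.1, num_iterations=300, max_value=5.0):
    losses = []
    for _ in range(num_iterations):
        loss = forward(w)                             # forward pass: compute the loss
        grad = backward(w)                            # backward pass: compute the gradient
        grad = max(-max_value, min(max_value, grad))  # clip to [-max_value, max_value]
        w = w - learning_rate * grad                  # gradient descent update
        losses.append(loss)
    return w, losses


w_final, losses = train()
print(w_final, losses[0], losses[-1])
```

With the far-away starting point, the raw gradient is large, so clipping caps the size of the early steps; once the gradient falls inside the clipping range the update behaves like ordinary gradient descent.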

Gradient clipping

Clipping keeps every gradient value inside a fixed range so that the gradients cannot explode.

### GRADED FUNCTION: clip

import numpy as np

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.
    
    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
    
    Returns: 
    gradients -- a dictionary with the clipped gradients.
    ''' 

    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        # out=gradient clips each array in place, so the names above hold the clipped values
        np.clip(gradient, -1 * maxValue, maxValue, out=gradient)
    ### END CODE HERE ###
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients
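To see what clipping does to actual numbers, here is a small usage sketch. It assumes a hypothetical helper clip_gradients that returns new arrays instead of clipping in place, and the toy gradient values are made up for the example.

```python
import numpy as np


def clip_gradients(gradients, max_value):
    """Return a new dict with every array clipped to [-max_value, max_value]."""
    return {name: np.clip(grad, -max_value, max_value)
            for name, grad in gradients.items()}


# Toy gradients with hypothetical values, only to show the effect of clipping.
toy_gradients = {
    "dWaa": np.array([[12.0, -0.5], [3.0, -40.0]]),
    "db": np.array([[7.5], [-2.0]]),
}

clipped = clip_gradients(toy_gradients, 5)
print(clipped["dWaa"])
print(clipped["db"])
```

Values already inside the range are left alone; only the entries beyond the bounds are moved to the nearest bound. The non-mutating variant is a design choice for the example: the graded function clips in place through the out argument, which overwrites the arrays it was given.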

Sampling

Now suppose your model is trained and you want to use it to generate new characters. The process is as follows:

# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]
    
    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))
    
    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []
    
    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1 
    
    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append 
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well 
    # trained model), which helps debugging and prevents entering an infinite loop. 
    counter = 0
    newline_character = char_to_ix['\n']
    
    while (idx != newline_character and counter != 50):
        
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)
        
        # for grading purposes
        np.random.seed(counter+seed) 
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(len(y)),p = y.ravel())

        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size, 1))
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a
        
        # for grading purposes
        seed += 1
        counter +=1
        
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])
    
    return indices
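The sample function relies on a softmax helper that the post does not show. The following is a sketch of one common, numerically stable definition, followed by a single sampling step on made-up scores. The exact helper shipped with the assignment may differ, and the score vector z and the fixed random seed are assumptions for the example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a column vector (assumed definition)."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow in exp
    return e / e.sum(axis=0)


rng = np.random.RandomState(0)        # fixed seed so the draw is repeatable
z = np.array([[2.0], [1.0], [0.1]])   # hypothetical output scores for 3 characters
y = softmax(z)                        # probability distribution, shape (3, 1)

# Draw one index according to y, then build the one-hot input for the next step.
idx = rng.choice(len(y), p=y.ravel())
x = np.zeros_like(y)
x[idx] = 1
print(y.ravel(), idx)
```

Drawing from the distribution, rather than always taking the most probable character, is what lets the model produce a different name on each run.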

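Building the model

The helper functions (rnn_forward, rnn_backward, update_parameters) are already provided by the assignment.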

# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.
    
    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.
    
    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """
    
    ### START CODE HERE ###
    
    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    
    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5)
    
    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)
    
    ### END CODE HERE ###
    
    return loss, gradients, a[len(X)-1]
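Note that optimize does not return parameters, so the training loop only sees the updated weights if update_parameters modifies the arrays in place. The sketch below is an assumed version of that helper written as plain gradient descent. It is not copied from the assignment's utilities, and the toy shapes and values are hypothetical.

```python
import numpy as np


def update_parameters(parameters, gradients, lr):
    """Plain gradient descent, applied in place to each parameter array."""
    for name in ("Wax", "Waa", "Wya", "b", "by"):
        parameters[name] -= lr * gradients["d" + name]
    return parameters


# Toy parameters and gradients (hypothetical shapes: n_a = 2, n_x = n_y = 3).
shapes = {"Wax": (2, 3), "Waa": (2, 2), "Wya": (3, 2), "b": (2, 1), "by": (3, 1)}
parameters = {name: np.ones(shape) for name, shape in shapes.items()}
gradients = {"d" + name: np.ones(shape) for name, shape in shapes.items()}

same_dict = update_parameters(parameters, gradients, lr=0.01)
print(parameters["Wax"])
```

Because the arrays are modified in place, the caller's dictionary reflects the update even when the return value is discarded.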

Training the model

# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names. 
    
    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration. 
    vocab_size -- number of unique characters found in the text, size of the vocabulary
    
    Returns:
    parameters -- learned parameters
    """
    
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)
    
    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    
    # Shuffle list of all dinosaur names
    np.random.seed(0)
    np.random.shuffle(examples)
    
    # Initialize the hidden state of your RNN
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        ### START CODE HERE ###
        
        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix['\n']]
        
        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate=0.01)
        
        ### END CODE HERE ###
        
        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 Iteration, generate "n" characters thanks to sample() to check if the model is learning properly
        if j % 2000 == 0:
            
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            
            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)
                
                seed += 1  # To get the same result for grading purposes, increment the seed by one. 
      
            print('\n')
        
    return parameters
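The way one training example (X, Y) is built inside the loop is easier to follow on a concrete name. This sketch assumes a vocabulary with the newline character at index 0 followed by the letters a to z, which gives the 27 symbols used as vocab_size. The helper name make_example and the sample name are hypothetical.

```python
import string

# Assumed vocabulary: newline at index 0, then 'a'..'z' (27 symbols in total).
chars = ["\n"] + list(string.ascii_lowercase)
char_to_ix = {ch: i for i, ch in enumerate(chars)}


def make_example(name, char_to_ix):
    """Build inputs X and targets Y for one name, as in the training loop."""
    X = [None] + [char_to_ix[ch] for ch in name]  # None marks the empty first input
    Y = X[1:] + [char_to_ix["\n"]]                # X shifted left, ending with newline
    return X, Y


X, Y = make_example("rex", char_to_ix)
print(X)  # [None, 18, 5, 24]
print(Y)  # [18, 5, 24, 0]
```

At every position the target is the next character of the name, and the final target is the newline, which is how the model learns when a name should end. The leading None is presumably treated by rnn_forward as an all-zero input vector.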
