Reposted from the programming assignment notebook of Andrew Ng's Deep Learning course

Residual Networks

Welcome to the second assignment of this week! You will learn how to build very deep convolutional networks, using Residual Networks (ResNets). In theory, very deep networks can represent very complex functions; but in practice, they are hard to train. Residual Networks, introduced by He et al., allow you to train much deeper networks than were previously practically feasible.

In this assignment, you will:

Implement the basic building blocks of ResNets.
Put together these building blocks to implement and train a state-of-the-art neural network for image classification.

This assignment will be done in Keras.

Before jumping into the problem, let’s run the cell below to load the required packages.

import numpy as np
import tensorflow as tf  # used by the test cells below (tf.Session, tf.placeholder)
from keras import layers
from keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D,
                          MaxPooling2D, GlobalMaxPooling2D)
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline

import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)

Using TensorFlow backend.

1 - The problem of very deep neural networks

Last week, you built your first convolutional neural network. In recent years, neural networks have become deeper, with state-of-the-art networks going from just a few layers (e.g., AlexNet) to over a hundred layers.

The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers). However, using a deeper network doesn’t always help. A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent unbearably slow. More specifically, during gradient descent, as you backprop from the final layer back to the first layer, you are multiplying by the weight matrix on each step, and thus the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and “explode” to take very large values).

During training, you might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds:

Figure 1 : Vanishing gradient
The speed of learning decreases very rapidly for the early layers as the network trains
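To make the exponential shrinkage concrete, here is a toy sketch in plain Python. It replaces the weight matrix of each layer with a single scalar factor, which is a simplifying assumption (real networks multiply by matrices and activation derivatives); the function name `gradient_magnitude` is made up for this illustration.

```python
def gradient_magnitude(per_layer_factor, num_layers, initial=1.0):
    """Magnitude of a gradient after being scaled once per layer during backprop."""
    grad = initial
    for _ in range(num_layers):
        grad *= per_layer_factor  # scalar stand-in for multiplying by a weight matrix
    return grad


for depth in (5, 20, 50):
    shrinking = gradient_magnitude(0.5, depth)  # factor < 1: gradient vanishes
    growing = gradient_magnitude(1.5, depth)    # factor > 1: gradient explodes
    print(f"depth={depth:3d}  factor 0.5 -> {shrinking:.3e}  factor 1.5 -> {growing:.3e}")
```

Even with a mild factor of 0.5 per layer, the signal reaching the first layer of a 50-layer network is far below anything gradient descent can use.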

You are now going to solve this problem by building a Residual Network!

2 - Building a Residual Network

In ResNets, a “shortcut” or a “skip connection” allows the gradient to be directly backpropagated to earlier layers:

Figure 2 : A ResNet block showing a skip-connection

The image on the left shows the “main path” through the network. The image on the right adds a shortcut to the main path. By stacking these ResNet blocks on top of each other, you can form a very deep network.

We also saw in lecture that having ResNet blocks with the shortcut also makes it very easy for one of the blocks to learn an identity function. This means that you can stack on additional ResNet blocks with little risk of harming training set performance. (There is also some evidence that the ease of learning an identity function, even more than skip connections helping with vanishing gradients, accounts for ResNets’ remarkable performance.)
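The identity-function argument can be sketched in a few lines of plain Python. This is only an illustration under assumptions: activations are flat lists rather than tensors, and `residual_block` and `zero_main_path` are hypothetical helper names, not part of the assignment.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, main_path):
    """General form of a ResNet block: relu(main_path(x) + x)."""
    fx = main_path(x)
    return relu([f + s for f, s in zip(fx, x)])


def zero_main_path(x):
    """A main path whose weights (and biases) are all zero outputs zeros."""
    return [0.0 for _ in x]


# Activations leaving an earlier ReLU are already non-negative.
a = relu([0.3, -1.2, 2.0])
out = residual_block(a, zero_main_path)
print(a, out)  # the block passes its input through unchanged
```

Because the block only has to push its main-path weights toward zero to behave like an identity, adding it should not make the network worse at fitting the training set.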

Two main types of blocks are used in a ResNet, depending mainly on whether the input/output dimensions are the same or different. You are going to implement both of them.

2.1 - The identity block

The identity block is the standard block used in ResNets, and corresponds to the case where the input activation (say $a^{[l]}$ ) has the same dimension as the output activation (say $a^{[l+2]}$ ). To flesh out the different steps of what happens in a ResNet’s identity block, here is an alternative diagram showing the individual steps:

Figure 3 : Identity block. Skip connection "skips over" 2 layers.

The upper path is the “shortcut path.” The lower path is the “main path.” In this diagram, we have also made explicit the CONV2D and ReLU steps in each layer. To speed up training we have also added a BatchNorm step. Don’t worry about this being complicated to implement: you’ll see that BatchNorm is just one line of code in Keras!
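For intuition about what normalizing over the channels axis means, here is a rough sketch in plain Python. It assumes training-mode behaviour (statistics computed from the current batch), fixes the learned scale and shift to 1 and 0, and flattens the batch, height and width dimensions into one list of pixels; `batchnorm_channels` is a name invented for this sketch, not a Keras function.

```python
import math


def batchnorm_channels(pixels, eps=1e-3):
    """Normalize each channel to roughly zero mean and unit variance.

    `pixels` is a list of per-pixel channel vectors, i.e. the (m, n_H, n_W)
    dimensions flattened into one list. `eps` is a small constant that keeps
    the division numerically stable.
    """
    n = len(pixels)
    n_channels = len(pixels[0])
    means = [sum(p[c] for p in pixels) / n for c in range(n_channels)]
    variances = [
        sum((p[c] - means[c]) ** 2 for p in pixels) / n for c in range(n_channels)
    ]
    return [
        [(p[c] - means[c]) / math.sqrt(variances[c] + eps) for c in range(n_channels)]
        for p in pixels
    ]


# Two channels on very different scales end up on a comparable scale.
pixels = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized = batchnorm_channels(pixels)
print(normalized)
```

Each channel gets its own mean and variance, which is why the layer is told which axis holds the channels.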

In this exercise, you’ll actually implement a slightly more powerful version of this identity block, in which the skip connection “skips over” 3 hidden layers rather than 2 layers. It looks like this:

Figure 4 : Identity block. Skip connection "skips over" 3 layers.

Here are the individual steps.

First component of main path:

The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2a'. Use 0 as the seed for the random initialization.
The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

The second CONV2D has $F_2$ filters of shape $(f,f)$ and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'. Use 0 as the seed for the random initialization.
The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'. Use 0 as the seed for the random initialization.
The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Final step:

The shortcut and the main path values are added together.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Exercise: Implement the ResNet identity block. We have implemented the first component of the main path. Please read over this carefully to make sure you understand what it is doing. You should implement the rest.

To implement the Conv2D step: See reference
To implement BatchNorm: See reference (axis: Integer, the axis that should be normalized (typically the channels axis))
For the activation, use: Activation('relu')(X)
To add the value passed forward by the shortcut: See reference
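The naming convention from the steps above can be checked with a small sketch. It assumes the name bases used in this post's code (`'res_' + str(stage) + block + '_branch'` and the matching `'bn_'` prefix); `layer_names` is a hypothetical helper written only for this illustration.

```python
def layer_names(stage, block):
    """Return (conv_name, bn_name) pairs for the three main-path components."""
    conv_name_base = 'res_' + str(stage) + block + '_branch'
    bn_name_base = 'bn_' + str(stage) + block + '_branch'
    return [(conv_name_base + suffix, bn_name_base + suffix)
            for suffix in ('2a', '2b', '2c')]


for conv_name, bn_name in layer_names(stage=2, block='b'):
    print(conv_name, bn_name)
```

Unique names matter because Keras refuses to build a model in which two layers share the same name, so every stage and block combination has to produce distinct strings.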

# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 4 (skip connection over 3 layers)
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    
    Returns:
    X -- output of the identity block, tensor of shape (m, n_H, n_W, n_C)
    """
    
    # defining name basis
    conv_name_base = 'res_' + str(stage) + block + '_branch'
    bn_name_base = 'bn_' + str(stage) + block + '_branch'
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value. You'll need this later to add back to the main path. 
    X_shortcut = X
    
    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    
    ### START CODE HERE ###
    
    # Second component of main path (≈3 lines)
    X = Conv2D(filters = F2,kernel_size = (f,f),strides = (1,1),padding = 'same',name = conv_name_base + '2b',kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters = F3,kernel_size = (1,1),strides = (1,1),padding = 'valid',name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,name = bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    
    ### END CODE HERE ###
    
    return X

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = identity_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))

out = [ 0.19716817 -0.          1.3561227   2.1713073  -0.          1.3324987 ]

Expected Output:

out

[ 0.94822985 0. 1.16101444 2.747859 0. 1.36677003]

2.2 - The convolutional block

You’ve implemented the ResNet identity block. The other type of block is the ResNet “convolutional block,” which you can use when the input and output dimensions don’t match up. The difference from the identity block is that there is a CONV2D layer in the shortcut path:

Figure 4 : Convolutional block

The CONV2D layer in the shortcut path is used to resize the input $x$ to a different dimension, so that the dimensions match up in the final addition needed to add the shortcut value back to the main path. (This plays a role similar to that of the matrix $W_s$ discussed in lecture.) For example, to reduce the height and width of the activation by a factor of 2, you can use a 1x1 convolution with a stride of 2. The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is to just apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.
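The effect of the stride on height and width follows from the usual output-size arithmetic for convolutions. Here is a small sketch of it in plain Python; `conv_output_size` is a helper invented for this illustration, and it covers only the two padding modes used in this assignment.

```python
def conv_output_size(n, f, s, padding='valid'):
    """Output height (or width) for an input of size n, window f, stride s."""
    if padding == 'valid':
        return (n - f) // s + 1   # floor((n - f) / s) + 1, no padding added
    if padding == 'same':
        return -(-n // s)         # ceil(n / s), input padded as needed
    raise ValueError("padding must be 'valid' or 'same'")


# A 1x1 convolution with stride 2 halves a 4x4 activation to 2x2,
# so the shortcut matches a main path that started with the same stride.
print(conv_output_size(4, f=1, s=2))
# The middle (f, f) convolution uses 'same' padding and stride 1: size is kept.
print(conv_output_size(15, f=3, s=1, padding='same'))
```

Since the first main-path convolution and the shortcut convolution both use a 1x1 window with stride (s,s), they shrink the input identically, and the final addition sees tensors of equal shape.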

The details of the convolutional block are as follows.

First component of main path:

The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '2a'.
The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

The second CONV2D has $F_2$ filters of shape (f,f) and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'.
The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'.
The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Shortcut path:

The CONV2D has $F_3$ filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '1'.
The BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '1'.

Final step:

The shortcut and the main path values are added together.
Then apply the ReLU activation function. This has no name and no hyperparameters.

Exercise: Implement the convolutional block. We have implemented the first component of the main path; you should implement the rest. As before, always use 0 as the seed for the random initialization, to ensure consistency with our grader.

Conv Hint
BatchNorm Hint (axis: Integer, the axis that should be normalized (typically the features axis))
For the activation, use: Activation('relu')(X)
Addition Hint

# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block as defined in Figure 4
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used
    
    Returns:
    X -- output of the convolutional block, tensor of shape (m, n_H, n_W, n_C)
    """
    
    # defining name basis
    conv_name_base = 'res_' + str(stage) + block + '_branch'
    bn_name_base = 'bn_' + str(stage) + block + '_branch'
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value
    X_shortcut = X


    ##### MAIN PATH #####
    # First component of main path 
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    
    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides = (1,1), padding = 'same',name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides = (1,1), padding = 'valid',name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides = (s,s), padding = 'valid',name = conv_name_base + '1', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    
    ### END CODE HERE ###
    
    return X

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = convolutional_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))

out = [ 0.09018461  1.2348979   0.46822017  0.03671762 -0.          0.65516603]

Expected Output:

out

[ 0.09018463 1.23489773 0.46822017 0.0367176 0. 0.65516603]

3 - Building your first ResNet model (50 layers)

You now have the necessary blocks to build a very deep ResNet. The following figure describes in detail the architecture of this neural network. “ID BLOCK” in the diagram stands for “Identity block,” and “ID BLOCK x3” means you should stack 3 identity blocks together.

Figure 5 : ResNet-50 model

The details of this ResNet-50 model are:

Zero-padding pads the input with a pad of (3,3)
Stage 1:
- The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.
- BatchNorm is applied to the channels axis of the input.
- MaxPooling uses a (3,3) window and a (2,2) stride.
Stage 2:
- The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, “s” is 1 and the block is “a”.
- The 2 identity blocks use three sets of filters of size [64,64,256], “f” is 3 and the blocks are “b” and “c”.
Stage 3:
- The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, “s” is 2 and the block is “a”.
- The 3 identity blocks use three sets of filters of size [128,128,512], “f” is 3 and the blocks are “b”, “c” and “d”.
Stage 4:
- The convolutional block uses three sets of filters of size [256, 256, 1024], “f” is 3, “s” is 2 and the block is “a”.
- The 5 identity blocks use three sets of filters of size [256, 256, 1024], “f” is 3 and the blocks are “b”, “c”, “d”, “e” and “f”.
Stage 5:
- The convolutional block uses three sets of filters of size [512, 512, 2048], “f” is 3, “s” is 2 and the block is “a”.
- The 2 identity blocks use three sets of filters of size [512, 512, 2048], “f” is 3 and the blocks are “b” and “c”.
The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.
The Flatten layer doesn’t have any hyperparameters or a name.
The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).
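As a sanity check on the architecture listed above, the following plain-Python sketch counts the weight layers and traces the height/width of a 64x64 input through the stages. Several things here are assumptions rather than statements from the assignment: the count follows the common convention of ignoring the shortcut convolutions, “conv1” and the max pooling are taken to use “valid” padding, and the average pooling stride is taken to equal its window. `STAGES`, `count_weight_layers` and `trace_spatial_size` are names invented for this sketch.

```python
def valid_out(n, f, s):
    """Output length of an unpadded ('valid') window of size f moved with stride s."""
    return (n - f) // s + 1


# (filters, stride of the convolutional block, number of identity blocks), stages 2 to 5
STAGES = [
    ([64, 64, 256], 1, 2),
    ([128, 128, 512], 2, 3),
    ([256, 256, 1024], 2, 5),
    ([512, 512, 2048], 2, 2),
]


def count_weight_layers():
    """conv1 + three main-path convolutions per block + the final Dense layer."""
    n_blocks = sum(1 + n_identity for _, _, n_identity in STAGES)
    return 1 + 3 * n_blocks + 1


def trace_spatial_size(n=64):
    """Height (= width) of the activation after each stage and after avg_pool."""
    trace = []
    n = n + 2 * 3                  # zero padding of 3 on each side
    n = valid_out(n, 7, 2)         # conv1: 7x7 window, stride 2
    n = valid_out(n, 3, 2)         # max pooling: 3x3 window, stride 2
    trace.append(("stage 1", n))
    for stage, (_, s, _) in enumerate(STAGES, start=2):
        n = valid_out(n, 1, s)     # only the conv block's 1x1 stride-s conv resizes
        trace.append((f"stage {stage}", n))
    n = valid_out(n, 2, 2)         # average pooling: 2x2 window, assumed stride 2
    trace.append(("avg_pool", n))
    return trace


print(count_weight_layers())
print(trace_spatial_size(64))
```

Under these assumptions the count comes out at 50, and the activation shrinks to a single spatial position with 2048 channels before the Flatten and Dense layers.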

Exercise: Implement the ResNet with 50 layers described in the figure above. We have implemented Stages 1 and 2. Please implement the rest. (The syntax for implementing Stages 3-5 should be quite similar to that of Stage 2.) Make sure you follow the naming convention in the text above.

You’ll need to use this function:

Average pooling: See reference

Here are some other functions we used in the code below:

Conv2D: See reference
BatchNorm: See reference (axis: Integer, the axis that should be normalized (typically the features axis))
Zero padding: See reference
Max pooling: See reference
Fully connected layer: See reference
Addition: See reference

# GRADED FUNCTION: ResNet50

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """
    
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    
    
              
           
              
              
            