Getting Started with PyTorch (5): Tensor, Autograd, and nn.Module by Example

Tip: I suspect most readers can jump straight to the last part of this article. What do you think?

Tensors

Although Python has frameworks such as NumPy, NumPy has no GPU support. PyTorch's two main features are N-dimensional Tensors and automatic differentiation.

A two-layer network, implemented with nothing but Tensors:

# -*- coding: utf-8 -*-

import torch


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss)

    # Gradients written by hand:
    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Autograd

How do we make differentiation automatic? That is the second stage. We need to wrap every variable in the network in a Variable object. A Variable represents a node in the computation graph: for a node x, x.data is the underlying Tensor and x.grad holds its gradient.
Note that PyTorch Variables have almost exactly the same API as PyTorch Tensors; the only difference is that a Variable takes part in a computation graph and therefore supports automatic differentiation. So the rule of thumb is simple: want automatic gradients? Wrap it in a Variable.
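
Before the full training loop, here is a minimal sketch (my own toy example, not from the original tutorial) of what wrapping buys you: build a Variable with requires_grad=True, run a computation, call backward(), and the gradient appears in .grad:

import torch
from torch.autograd import Variable

# Wrap a Tensor in a Variable so the computation graph is recorded.
x = Variable(torch.randn(3), requires_grad=True)
y = (x * x).sum()   # y is a Variable holding a single value

y.backward()        # autograd fills in x.grad

print(x.data)       # the underlying Tensor
print(x.grad.data)  # dy/dx, which here equals 2 * x

With that mechanism in mind, here is the same two-layer network as above, rewritten with Variables so the backward pass no longer has to be coded by hand: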

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
# Computing gradients for the inputs would be useless, hence requires_grad=False
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

Defining a custom autograd Function

We can do this by subclassing torch.autograd.Function and overriding its forward and backward methods.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    def forward(self, input):
        """
        In the forward pass we receive a Tensor containing the input and return a
        Tensor containing the output. You can cache arbitrary Tensors for use in the
        backward pass using the save_for_backward method.
        """
        self.save_for_backward(input)  # note: stash the input for use in backward
        return input.clamp(min=0)

    def backward(self, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = self.saved_tensors   # note: retrieve the Tensor saved in forward
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Construct an instance of our MyReLU class to use in our network
    relu = MyReLU()

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

TensorFlow: static graphs

The difference between static and dynamic graphs has been discussed at length. In short, a static graph is defined up front, so the framework can optimize the whole predefined network; after that, the same graph is fed data over and over again. PyTorch, on the other hand, uses dynamic graphs: every forward pass can define a brand-new computation graph.
Another difference shows up in control flow. In an RNN, for example, the computation has to be unrolled over each element of the input, and that unrolling is naturally expressed as a loop. With a static graph the loop structure has to become part of the graph itself, which is why TensorFlow provides constructs such as tf.scan to embed the loop in the graph. With a dynamic graph there is no need to make the loop part of the graph; ordinary control flow is enough.
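
To make that concrete, here is a small sketch (my own illustration, not from the original post) of data-dependent control flow: the number of layers applied below is decided at run time with an ordinary Python loop, and autograd simply records whatever was actually executed:

import torch
from torch.autograd import Variable

w = Variable(torch.randn(10, 10), requires_grad=True)
x = Variable(torch.randn(1, 10))

# The loop count depends on the input data, so every forward pass
# may build a different computation graph.
steps = int(x.data.abs().sum()) % 4 + 1
h = x
for _ in range(steps):
    h = h.mm(w).clamp(min=0)

loss = h.sum()
loss.backward()  # backpropagates through exactly the steps that ran

In a static-graph framework the same loop would have to be baked into the graph (e.g. via tf.scan); here it is just regular Python.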

nn module

Building the network directly out of Tensors, as above, is clearly tedious. We can package the computations we use over and over into layers, and that is exactly what Modules are.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Variable, so
    # we can access its data and gradients like we did before.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data

The final part

You can probably get by with just the material below.

PyTorch: optim

As you can see, all of the parameter updates above were done by hand, which is rather crude. PyTorch's optim package implements a range of more sophisticated optimizers: SGD, AdaGrad, RMSprop, Adam, and so on.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algoriths. The first argument to the Adam constructor tells the
# optimizer which Variables it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

Custom nn Modules

Sometimes we want finer control over how the network is defined. The version below is the "sequential network" built by calling nn.Sequential directly:

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

Another approach, which allows more complex networks to be defined, is:

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

PyTorch: Control Flow + Weight Sharing

Suppose we want a fully connected ReLU network that, on every forward pass, randomly uses between 1 and 4 hidden layers. Note that every hidden layer uses the same parameters: the weights are shared. So on each forward pass we insert 1 to 4 copies of that layer into the network, and over many training iterations the network structure is different every time.

# -*- coding: utf-8 -*-
import random
import torch
from torch.autograd import Variable


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        # On each forward pass, reuse the same hidden layer a random number of times (weight sharing).
        # As mentioned above, weight sharing is what lets this single layer be used several times in the graph.
        # This is a big improvement over Lua Torch!
        # Control structures such as for loops are perfectly fine inside the model definition.
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
