Learning PyTorch Tensors

Tensors are a data structure similar to arrays and matrices, much like NumPy's ndarray, except that tensors can run on GPUs. In fact, tensors and NumPy arrays often share the same underlying memory, which eliminates the need to copy data. Tensors are also optimized for automatic differentiation.

import torch
import numpy as np

Initializing a Tensor

  • Directly from data
data=[[1,2],[3,4]]
x_data=torch.tensor(data)
x_data
tensor([[1, 2],
        [3, 4]])
  • From a NumPy array
np_array=np.array(data)
x_np=torch.tensor(np_array)
x_np
tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
x_np=torch.from_numpy(np_array)
x_np
tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
  • From another tensor

Compared with the argument tensor, the new tensor retains its properties (shape, datatype, etc.) unless they are explicitly overridden:

x_ones=torch.ones_like(x_data);x_ones
tensor([[1, 1],
        [1, 1]])
x_rand=torch.rand_like(x_data,dtype=torch.float);x_rand
tensor([[0.1462, 0.1567],
        [0.6331, 0.8472]])
  • With random or constant values

shape is a tuple of the tensor's dimensions:

shape=(2,3)
rand_tensor=torch.rand(shape)
ones_tensor=torch.ones(shape)
zeros_tensor=torch.zeros(shape)
print(rand_tensor)
print(ones_tensor)
print(zeros_tensor)
tensor([[0.4811, 0.5744, 0.8909],
        [0.6602, 0.9882, 0.1145]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Tensor Attributes

A tensor's attributes are its shape, its datatype, and the device it is stored on:

tensor=torch.rand(3,4)
tensor.shape
torch.Size([3, 4])
tensor.dtype
torch.float32
tensor.device
device(type='cpu')

Tensor Operations

There are more than 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more. Each of them can run on the GPU (often much faster than on the CPU).

By default, tensors are created on the CPU; we have to move them to the GPU explicitly with the .to method. Copying large tensors between devices is expensive in both time and memory, as the sketch after the following snippet illustrates.

if torch.cuda.is_available():
    tensor=tensor.to('cuda')
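
As a quick illustration of that cost, here is a minimal sketch (not from the original post, using the standard PyTorch API) contrasting allocating a tensor directly on the GPU with creating it on the CPU and copying it over:

```python
# Hedged sketch: the device= argument allocates on the GPU directly,
# avoiding the extra host-to-device copy that .to('cuda') performs.
if torch.cuda.is_available():
    a = torch.ones(1000, 1000, device='cuda')   # allocated on the GPU from the start
    b = torch.ones(1000, 1000).to('cuda')       # created on the CPU, then copied over
```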

NumPy-like indexing and slicing:

tensor=torch.ones((4,4));tensor
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor[0]
tensor([1., 1., 1., 1.])
tensor[:,0]
tensor([1., 1., 1., 1.])
tensor[...,-1]=100;tensor
tensor([[  1.,   1.,   1., 100.],
        [  1.,   1.,   1., 100.],
        [  1.,   1.,   1., 100.],
        [  1.,   1.,   1., 100.]])
tensor[:,1]=10;tensor
tensor([[  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.]])

Besides the usual indexing for selecting data, PyTorch also provides some more advanced selection functions:

help(torch.index_select)
Help on built-in function index_select:

index_select(...)
    index_select(input, dim, index, *, out=None) -> Tensor
    
    Returns a new tensor which indexes the :attr:`input` tensor along dimension
    :attr:`dim` using the entries in :attr:`index` which is a `LongTensor`.
    
    The returned tensor has the same number of dimensions as the original tensor
    (:attr:`input`).  The :attr:`dim`\ th dimension has the same size as the length
    of :attr:`index`; other dimensions have the same size as in the original tensor.
    
    .. note:: The returned tensor does **not** use the same storage as the original
              tensor.  If :attr:`out` has a different shape than expected, we
              silently change it to the correct shape, reallocating the underlying
              storage if necessary.
    
    Args:
        input (Tensor): the input tensor.
        dim (int): the dimension in which we index
        index (IntTensor or LongTensor): the 1-D tensor containing the indices to index
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> x = torch.randn(3, 4)
        >>> x
        tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
                [-0.4664,  0.2647, -0.1228, -1.1068],
                [-1.1734, -0.6571,  0.7230, -0.6004]])
        >>> indices = torch.tensor([0, 2])
        >>> torch.index_select(x, 0, indices)
        tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
                [-1.1734, -0.6571,  0.7230, -0.6004]])
        >>> torch.index_select(x, 1, indices)
        tensor([[ 0.1427, -0.5414],
                [-0.4664, -0.1228],
                [-1.1734,  0.7230]])

help(torch.masked_select)
Help on built-in function masked_select:

masked_select(...)
    masked_select(input, mask, *, out=None) -> Tensor
    
    Returns a new 1-D tensor which indexes the :attr:`input` tensor according to
    the boolean mask :attr:`mask` which is a `BoolTensor`.
    
    The shapes of the :attr:`mask` tensor and the :attr:`input` tensor don't need
    to match, but they must be :ref:`broadcastable <broadcasting-semantics>`.
    
    .. note:: The returned tensor does **not** use the same storage
              as the original tensor
    
    Args:
        input (Tensor): the input tensor.
        mask  (BoolTensor): the tensor containing the binary mask to index with
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> x = torch.randn(3, 4)
        >>> x
        tensor([[ 0.3552, -2.3825, -0.8297,  0.3477],
                [-1.2035,  1.2252,  0.5002,  0.6248],
                [ 0.1307, -2.0608,  0.1244,  2.0139]])
        >>> mask = x.ge(0.5)
        >>> mask
        tensor([[False, False, False, False],
                [False, True, True, True],
                [False, False, False, True]])
        >>> torch.masked_select(x, mask)
        tensor([ 1.2252,  0.5002,  0.6248,  2.0139])

help(torch.gather)
Help on built-in function gather:

gather(...)
    gather(input, dim, index, *, sparse_grad=False, out=None) -> Tensor
    
    Gathers values along an axis specified by `dim`.
    
    For a 3-D tensor the output is specified by::
    
        out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
        out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
        out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
    
    :attr:`input` and :attr:`index` must have the same number of dimensions.
    It is also required that ``index.size(d) <= input.size(d)`` for all
    dimensions ``d != dim``.  :attr:`out` will have the same shape as :attr:`index`.
    Note that ``input`` and ``index`` do not broadcast against each other.
    
    Args:
        input (Tensor): the source tensor
        dim (int): the axis along which to index
        index (LongTensor): the indices of elements to gather
    
    Keyword arguments:
        sparse_grad (bool, optional): If ``True``, gradient w.r.t. :attr:`input` will be a sparse tensor.
        out (Tensor, optional): the destination tensor
    
    Example::
    
        >>> t = torch.tensor([[1, 2], [3, 4]])
        >>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
        tensor([[ 1,  1],
                [ 4,  3]])

Tensors can be concatenated along a given dimension with torch.cat; there is also torch.stack, which behaves slightly differently from torch.cat.

t1=torch.cat([tensor,tensor,tensor],dim=1);t1
tensor([[  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.]])
torch.cat([tensor,tensor,tensor],dim=0)
tensor([[  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.],
        [  1.,  10.,   1., 100.]])

The difference between cat and stack is that the former enlarges an existing dimension (think of it as appending), while the latter adds a new dimension (think of it as stacking).

a=torch.arange(0,12).reshape(3,4)
a
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
torch.cat([a,a]).shape
torch.Size([6, 4])
torch.stack([a,a]).shape
torch.Size([2, 3, 4])
torch.cat([a,a])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
torch.stack([a,a])
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]]])
  • Arithmetic operations
tensor=torch.arange(0,9).reshape(3,3);tensor
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])

The following computes the matrix product of the tensor with itself; y1 and y2 hold the same values:

y1=tensor@tensor.T
y1
tensor([[  5,  14,  23],
        [ 14,  50,  86],
        [ 23,  86, 149]])
y2=tensor.matmul(tensor.T)
y2
tensor([[  5,  14,  23],
        [ 14,  50,  86],
        [ 23,  86, 149]])
y3=torch.empty(3,3)
torch.add(tensor,tensor.T,out=y3)
print(y3)
tensor([[ 0.,  4.,  8.],
        [ 4.,  8., 12.],
        [ 8., 12., 16.]])
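
As a small aside (not in the original notebook), the element-wise product is worth contrasting with the matrix product above; this sketch assumes the standard PyTorch operators:

```python
# Element-wise product of the tensor with itself, for comparison with tensor @ tensor.T.
z1 = tensor * tensor       # operator form
z2 = tensor.mul(tensor)    # equivalent method form
```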

Single-element tensors: if you aggregate all the values of a tensor into one value, you can use item() to convert it to a Python number.

agg=tensor.sum();agg
tensor(36)
agg_item=agg.item();agg_item
36

In-place operations: operations that store the result in the operand are called in-place operations and are marked with a trailing _. For example, x.copy_(y) and x.t_() will change x.

tensor
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor.add_(5)
tensor([[ 5,  6,  7],
        [ 8,  9, 10],
        [11, 12, 13]])
tensor
tensor([[ 5,  6,  7],
        [ 8,  9, 10],
        [11, 12, 13]])

In-place operations can save memory, but they can fail when derivatives are computed, so their use is discouraged; the sketch below shows a typical failure.
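
The following sketch (my own illustration of typical autograd behavior, not from the original post) shows how an in-place operation can break the backward pass:

```python
# exp() saves its output for the backward pass; modifying that output in place
# invalidates the saved value, so backward() raises a RuntimeError
# ("... has been modified by an inplace operation").
x = torch.ones(3, requires_grad=True)
y = x.exp()
y.add_(1)            # in-place change to a tensor autograd still needs
y.sum().backward()   # expected to fail with a RuntimeError
```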

Converting to and from NumPy Arrays

Use numpy() and from_numpy() to convert between tensors and NumPy arrays. Note, however, that the tensor and NumPy array produced by these two functions share the same memory (which is why the conversion is fast), so changing one changes the other!

Tensor to Numpy array

t=torch.ones(5)
t
tensor([1., 1., 1., 1., 1.])
n=t.numpy();n
array([ 1.,  1.,  1.,  1.,  1.], dtype=float32)
t.add_(1)
tensor([2., 2., 2., 2., 2.])

Numpy array to Tensor

n=np.ones(5)
t=torch.from_numpy(n)
t
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
np.add(n,1,out=n)
array([ 2.,  2.,  2.,  2.,  2.])
t
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n
array([ 2.,  2.,  2.,  2.,  2.])

Besides the methods above, another common approach is to call torch.tensor() directly to convert a NumPy array into a tensor. Note that this method always copies the data, so the returned tensor no longer shares memory with the original array.

a=np.arange(9).reshape(3,3)
c=torch.tensor(a)
a+=1
print(c)
print(a)
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]], dtype=torch.int32)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

view()

view() changes the shape of a tensor. The new tensor it returns shares memory with the source tensor (it is effectively the same tensor), so changing one also changes the other. reshape offers the same functionality, but it cannot guarantee that it returns a copy either.

x=torch.randn(5,3);x
tensor([[-0.5722, -0.4844,  1.5515],
        [-0.2504,  0.2010,  0.0182],
        [ 0.0400,  0.0397,  2.0167],
        [ 1.8868, -0.4670,  0.5968],
        [ 0.9070,  0.5825, -1.0549]])
y=x.view(15);y
tensor([-0.5722, -0.4844,  1.5515, -0.2504,  0.2010,  0.0182,  0.0400,  0.0397,
         2.0167,  1.8868, -0.4670,  0.5968,  0.9070,  0.5825, -1.0549])
y[0]=100
x
tensor([[ 1.0000e+02, -4.8445e-01,  1.5515e+00],
        [-2.5042e-01,  2.0102e-01,  1.8231e-02],
        [ 3.9969e-02,  3.9711e-02,  2.0167e+00],
        [ 1.8868e+00, -4.6697e-01,  5.9683e-01],
        [ 9.0702e-01,  5.8254e-01, -1.0549e+00]])
z=x.view(-1,5);z
tensor([[ 1.0000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01],
        [ 1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00],
        [-4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00]])
q=x.reshape(15);q
tensor([ 1.0000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01,
         1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00,
        -4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00])
q[0]=250;x
tensor([[ 2.5000e+02, -4.8445e-01,  1.5515e+00],
        [-2.5042e-01,  2.0102e-01,  1.8231e-02],
        [ 3.9969e-02,  3.9711e-02,  2.0167e+00],
        [ 1.8868e+00, -4.6697e-01,  5.9683e-01],
        [ 9.0702e-01,  5.8254e-01, -1.0549e+00]])

If we want a genuinely new copy (one that does not share memory), we can first create a copy with clone and then call view:

x_cp=x.clone().view(15)
x-=1
print(x)
print(x_cp)
tensor([[ 2.4900e+02, -1.4844e+00,  5.5149e-01],
        [-1.2504e+00, -7.9898e-01, -9.8177e-01],
        [-9.6003e-01, -9.6029e-01,  1.0167e+00],
        [ 8.8677e-01, -1.4670e+00, -4.0317e-01],
        [-9.2979e-02, -4.1746e-01, -2.0549e+00]])
tensor([ 2.5000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01,
         1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00,
        -4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00])

Another advantage of clone is that it is recorded in the computation graph, so gradients flowing back into the copy also propagate to the source tensor (a short sketch follows the item() example below).
Another commonly used function is item(), which converts a scalar tensor into a Python number:

x=torch.randn(1);x
tensor([-0.9871])
x.item()
-0.9870905876159668
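
Here is the promised sketch (assumed standard autograd behavior, not from the original post) showing that gradients taken through a clone flow back to the source tensor:

```python
# clone is recorded in the computation graph, so backward through the copy
# populates the gradient of the source tensor.
src = torch.ones(3, requires_grad=True)
cp = src.clone()         # cp carries a grad_fn referring back to src
cp.sum().backward()
print(src.grad)          # tensor([1., 1., 1.])
```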

Linear Algebra

  • Trace: torch.trace
help(torch.trace)
Help on built-in function trace:

trace(...)
    trace(input) -> Tensor
    
    Returns the sum of the elements of the diagonal of the input 2-D matrix.
    
    Example::
    
        >>> x = torch.arange(1., 10.).view(3, 3)
        >>> x
        tensor([[ 1.,  2.,  3.],
                [ 4.,  5.,  6.],
                [ 7.,  8.,  9.]])
        >>> torch.trace(x)
        tensor(15.)

  • Diagonal elements: torch.diag
help(torch.diag)
Help on built-in function diag:

diag(...)
    diag(input, diagonal=0, *, out=None) -> Tensor
    
    - If :attr:`input` is a vector (1-D tensor), then returns a 2-D square tensor
      with the elements of :attr:`input` as the diagonal.
    - If :attr:`input` is a matrix (2-D tensor), then returns a 1-D tensor with
      the diagonal elements of :attr:`input`.
    
    The argument :attr:`diagonal` controls which diagonal to consider:
    
    - If :attr:`diagonal` = 0, it is the main diagonal.
    - If :attr:`diagonal` > 0, it is above the main diagonal.
    - If :attr:`diagonal` < 0, it is below the main diagonal.
    
    Args:
        input (Tensor): the input tensor.
        diagonal (int, optional): the diagonal to consider
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    .. seealso::
    
            :func:`torch.diagonal` always returns the diagonal of its input.
    
            :func:`torch.diagflat` always constructs a tensor with diagonal elements
            specified by the input.
    
    Examples:
    
    Get the square matrix where the input vector is the diagonal::
    
        >>> a = torch.randn(3)
        >>> a
        tensor([ 0.5950,-0.0872, 2.3298])
        >>> torch.diag(a)
        tensor([[ 0.5950, 0.0000, 0.0000],
                [ 0.0000,-0.0872, 0.0000],
                [ 0.0000, 0.0000, 2.3298]])
        >>> torch.diag(a, 1)
        tensor([[ 0.0000, 0.5950, 0.0000, 0.0000],
                [ 0.0000, 0.0000,-0.0872, 0.0000],
                [ 0.0000, 0.0000, 0.0000, 2.3298],
                [ 0.0000, 0.0000, 0.0000, 0.0000]])
    
    Get the k-th diagonal of a given matrix::
    
        >>> a = torch.randn(3, 3)
        >>> a
        tensor([[-0.4264, 0.0255,-0.1064],
                [ 0.8795,-0.2429, 0.1374],
                [ 0.1029,-0.6482,-1.6300]])
        >>> torch.diag(a, 0)
        tensor([-0.4264,-0.2429,-1.6300])
        >>> torch.diag(a, 1)
        tensor([ 0.0255, 0.1374])

  • triu: upper triangle
help(torch.triu)
Help on built-in function triu:

triu(...)
    triu(input, diagonal=0, *, out=None) -> Tensor
    
    Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices
    :attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
    
    The upper triangular part of the matrix is defined as the elements on and
    above the diagonal.
    
    The argument :attr:`diagonal` controls which diagonal to consider. If
    :attr:`diagonal` = 0, all elements on and above the main diagonal are
    retained. A positive value excludes just as many diagonals above the main
    diagonal, and similarly a negative value includes just as many diagonals below
    the main diagonal. The main diagonal are the set of indices
    :math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
    :math:`d_{1}, d_{2}` are the dimensions of the matrix.
    
    Args:
        input (Tensor): the input tensor.
        diagonal (int, optional): the diagonal to consider
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> a = torch.randn(3, 3)
        >>> a
        tensor([[ 0.2309,  0.5207,  2.0049],
                [ 0.2072, -1.0680,  0.6602],
                [ 0.3480, -0.5211, -0.4573]])
        >>> torch.triu(a)
        tensor([[ 0.2309,  0.5207,  2.0049],
                [ 0.0000, -1.0680,  0.6602],
                [ 0.0000,  0.0000, -0.4573]])
        >>> torch.triu(a, diagonal=1)
        tensor([[ 0.0000,  0.5207,  2.0049],
                [ 0.0000,  0.0000,  0.6602],
                [ 0.0000,  0.0000,  0.0000]])
        >>> torch.triu(a, diagonal=-1)
        tensor([[ 0.2309,  0.5207,  2.0049],
                [ 0.2072, -1.0680,  0.6602],
                [ 0.0000, -0.5211, -0.4573]])
    
        >>> b = torch.randn(4, 6)
        >>> b
        tensor([[ 0.5876, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                [-0.2447,  0.9556, -1.2919,  1.3378, -0.1768, -1.0857],
                [ 0.4333,  0.3146,  0.6576, -1.0432,  0.9348, -0.4410],
                [-0.9888,  1.0679, -1.3337, -1.6556,  0.4798,  0.2830]])
        >>> torch.triu(b, diagonal=1)
        tensor([[ 0.0000, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                [ 0.0000,  0.0000, -1.2919,  1.3378, -0.1768, -1.0857],
                [ 0.0000,  0.0000,  0.0000, -1.0432,  0.9348, -0.4410],
                [ 0.0000,  0.0000,  0.0000,  0.0000,  0.4798,  0.2830]])
        >>> torch.triu(b, diagonal=-1)
        tensor([[ 0.5876, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                [-0.2447,  0.9556, -1.2919,  1.3378, -0.1768, -1.0857],
                [ 0.0000,  0.3146,  0.6576, -1.0432,  0.9348, -0.4410],
                [ 0.0000,  0.0000, -1.3337, -1.6556,  0.4798,  0.2830]])

  • tril: lower triangle
help(torch.tril)
Help on built-in function tril:

tril(...)
    tril(input, diagonal=0, *, out=None) -> Tensor
    
    Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices
    :attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
    
    The lower triangular part of the matrix is defined as the elements on and
    below the diagonal.
    
    The argument :attr:`diagonal` controls which diagonal to consider. If
    :attr:`diagonal` = 0, all elements on and below the main diagonal are
    retained. A positive value includes just as many diagonals above the main
    diagonal, and similarly a negative value excludes just as many diagonals below
    the main diagonal. The main diagonal are the set of indices
    :math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
    :math:`d_{1}, d_{2}` are the dimensions of the matrix.
    
    Args:
        input (Tensor): the input tensor.
        diagonal (int, optional): the diagonal to consider
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> a = torch.randn(3, 3)
        >>> a
        tensor([[-1.0813, -0.8619,  0.7105],
                [ 0.0935,  0.1380,  2.2112],
                [-0.3409, -0.9828,  0.0289]])
        >>> torch.tril(a)
        tensor([[-1.0813,  0.0000,  0.0000],
                [ 0.0935,  0.1380,  0.0000],
                [-0.3409, -0.9828,  0.0289]])
    
        >>> b = torch.randn(4, 6)
        >>> b
        tensor([[ 1.2219,  0.5653, -0.2521, -0.2345,  1.2544,  0.3461],
                [ 0.4785, -0.4477,  0.6049,  0.6368,  0.8775,  0.7145],
                [ 1.1502,  3.2716, -1.1243, -0.5413,  0.3615,  0.6864],
                [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024,  0.0978]])
        >>> torch.tril(b, diagonal=1)
        tensor([[ 1.2219,  0.5653,  0.0000,  0.0000,  0.0000,  0.0000],
                [ 0.4785, -0.4477,  0.6049,  0.0000,  0.0000,  0.0000],
                [ 1.1502,  3.2716, -1.1243, -0.5413,  0.0000,  0.0000],
                [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024,  0.0000]])
        >>> torch.tril(b, diagonal=-1)
        tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
                [ 0.4785,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
                [ 1.1502,  3.2716,  0.0000,  0.0000,  0.0000,  0.0000],
                [-0.0614, -0.7344, -1.3164,  0.0000,  0.0000,  0.0000]])

Broadcasting

x=torch.arange(1,3).view(1,2);x
tensor([[1, 2]])
y=torch.arange(1,4).view(3,1);y
tensor([[1],
        [2],
        [3]])
x+y
tensor([[2, 3],
        [3, 4],
        [4, 5]])

Memory Overhead of Operations

Indexing and view do not allocate new memory, whereas an operation such as y = x + y allocates new memory and then makes y point to it.

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y=y+x
id(y)==id_before
False

If we want the result written into y's original memory, we can use index assignment:

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y[:]=y+x
id_before==id(y)
True

We can also use the out argument of the full-function form (torch.add), the augmented assignment operator +=, or the in-place method (i.e. add_):

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
torch.add(x,y,out=y)
id(y)==id_before
True
y+=x
id(y)==id_before
True
y.add_(x)
id(y)==id_before
True
y.requires_grad
False

Automatic Gradient Computation

The autograd package provided by PyTorch automatically builds a computation graph from the inputs and the forward pass, and then performs backpropagation.

If a Tensor's .requires_grad attribute is set to True, all operations on it are tracked (so gradients can be propagated with the chain rule). Once the computation is finished, .backward() can be called to compute all the gradients, which are accumulated into the .grad attribute.

Note that when calling y.backward(), no argument is needed if y is a scalar; otherwise a tensor w of the same shape as y must be passed in. In that case y.backward(w) means: first compute L = torch.sum(y * w), which is a scalar, and then differentiate L with respect to the independent variable x. A small sketch of this interpretation follows.
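
Here is that sketch (my own minimal example, assuming standard autograd semantics):

```python
# y.backward(w) computes the same gradients as torch.sum(y * w).backward().
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * x
w = torch.tensor([1.0, 0.5])
y.backward(w)            # equivalent to torch.sum(y * w).backward()
print(x.grad)            # tensor([2., 2.]), since d(sum(w * x**2))/dx = 2 * w * x
```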

If you do not want a tensor to be tracked any further, you can call .detach() to separate it from the tracking history; future computations on it will then not be tracked, so gradients cannot flow through it (see the sketch below). You can also wrap code you do not want tracked in a with torch.no_grad(): block. This is very common when evaluating a model, because during evaluation we do not need gradients of the trainable parameters (requires_grad=True).
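
A minimal sketch of .detach() (assumed behavior, not from the original post); torch.no_grad() is demonstrated further below:

```python
# The detached tensor shares data with the original result but is cut off
# from the graph, so no gradient can flow back through it.
x = torch.ones(2, requires_grad=True)
y = (x * 2).detach()
print(y.requires_grad)   # False
```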

Function is another very important class. Tensor and Function together build a directed acyclic graph (DAG) that records the entire computation. Every tensor has a .grad_fn attribute, which refers to the Function that created the Tensor; in other words, it records whether the tensor was produced by some operation. If it was, grad_fn is an object associated with that operation; otherwise it is None.

x=torch.ones(2,2,requires_grad=True)
print(x)
print(x.grad_fn)
print(x.grad) # None until a gradient has been computed
print(x.dtype)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None
None
torch.float32
y=x+2
print(y)
print(y.grad_fn)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001BA1B94F860>

Note that x was created directly, so it has no grad_fn, whereas y was created by an addition, so it has a grad_fn. A tensor created directly, like x, is called a leaf node, and a leaf node's grad_fn is None.

z=y*y*3
out=z.mean()
print(z,out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

requires_grad can be changed in place with .requires_grad_():

a=torch.randn(2,2)
a=((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b=(a*a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x000001BA1B92FBA8>

Gradients

Because out is a scalar, calling backward() does not require specifying a gradient argument:

out
tensor(27., grad_fn=<MeanBackward0>)
out.backward()
print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
x
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Denoting out as o, we have:

\[o=\frac{1}{4} \sum_{i=1}^{4}3(x_i+2)^2 \]

so:

\[\frac {\partial o } {\partial x_i }\bigg|_{x_i=1}=\frac{9}{2}=4.5 \]

More generally, the gradient of a vector-valued function with respect to a vector is a Jacobian matrix J, and the torch.autograd package is used to compute vector-Jacobian products. For example, if v is the gradient of a scalar function $$ l=g( y^{\rightarrow} ) $$ with respect to $$ y^{\rightarrow} $$:

\[v=\left( \frac {\partial l} {\partial y_1} \cdots \frac {\partial l} {\partial y_m} \right) \]

then by the chain rule, the gradient of l with respect to $$ x^{\rightarrow} $$ is the vector-Jacobian product:

\[vJ= \left( \frac {\partial l} {\partial x_1} \cdots \frac {\partial l} {\partial x_n} \right) \]

Note: grad is accumulated during backpropagation, which means each backward pass adds to the previously stored gradients, so the gradients usually need to be zeroed before backpropagation.

out2=x.sum();out2
tensor(4., grad_fn=<SumBackward0>)
out2.backward()
print(x.grad)
tensor([[5.5000, 5.5000],
        [5.5000, 5.5000]])
out3=x.sum()
x.grad.data.zero_()
out3.backward()
print(x.grad)
tensor([[1., 1.],
        [1., 1.]])

A small exercise:

a=torch.tensor([1,2,3],requires_grad=True,dtype=torch.float32)
print(a.grad)
None
b=a**2;b
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
b.requires_grad
True
w=torch.tensor([0.1,0.2,0.3])
b.backward(w)
print(a.grad)
tensor([0.2000, 0.8000, 1.8000])
d=b.sum();d
tensor(14., grad_fn=<SumBackward0>)
d.requires_grad
True

d.backward()


RuntimeError Traceback (most recent call last)
in
----> 1 d.backward()

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257     def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150
    151

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

d=2*x
for i in range(11):
    d.backward(retain_graph=True)
    print(x.grad)
tensor(4.)
tensor(6.)
tensor(8.)
tensor(10.)
tensor(12.)
tensor(14.)
tensor(16.)
tensor(18.)
tensor(20.)
tensor(22.)
tensor(24.)


```
d=2*x
for i in range(11):
    d.backward()
    print(x.grad)
```
tensor(26.)

RuntimeError Traceback (most recent call last)
in
1 d=2*x
2 for i in range(11):
----> 3 d.backward()
4 print(x.grad)

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257     def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150
    151

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

c=a.sum();c
tensor(6., grad_fn=<SumBackward0>)
c.backward()
a.grad
tensor([1.2000, 1.8000, 2.8000])
a.grad.data.zero_()
tensor([0., 0., 0.])
c=a.sum()
c.backward()
print(a.grad)
tensor([1., 1., 1.])
torch.arange(0,9).view(3,3)
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
torch.arange(0,9).view(3,3).sum()
tensor(36)
  • A more practical example
x=torch.tensor([1.0,2.0,3.0,4.0],requires_grad=True) # note the values are written as 1.0, 2.0, ..., not 1, 2, 3, otherwise dtype would not be torch.float
x.dtype
torch.float32
y=2*x
z=y.view(2,2)
print(z)
v=torch.tensor([[1.0,0.1],[0.01,0.001]],dtype=torch.float)
z.backward(v)
print(x.grad)
tensor([[2., 4.],
        [6., 8.]], grad_fn=<ViewBackward>)
tensor([2.0000, 0.2000, 0.0200, 0.0020])
  • An example of interrupting gradient tracking
x=torch.tensor(1.0,requires_grad=True)
y1=x**2
with torch.no_grad():
    y2=x**3
y3=y1+y2

print(x.requires_grad)
print(y1,y1.requires_grad)
print(y2,y2.requires_grad)
print(y3,y3.requires_grad)
True
tensor(1., grad_fn=<PowBackward0>) True
tensor(1.) False
tensor(2., grad_fn=<AddBackward0>) True
y3.backward()
print(x.grad)
tensor(2.)
Since $$ y_3=y_1+y_2=x^2+x^3 $$, shouldn't $$ \frac {d y_3} {dx} $$ be 5 at x=1? In fact, because the definition of y2 is wrapped in `torch.no_grad()`, gradients related to y2 are not propagated back; only the gradients related to y1 flow back.

As mentioned above, `y2.requires_grad=False`, so `y2.backward()` cannot be called and raises an error:

y2.backward()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-131-8061dc2a05a4> in <module>
----> 1 y2.backward()

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257     def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150
    151

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Furthermore, if we want to modify a tensor's values without `autograd` recording it (i.e. without affecting backpropagation), we can operate on `tensor.data`.

```python
x=torch.ones(1,requires_grad=True)
print(x.data) # still a tensor
print(x.data.requires_grad) # but already detached from the computation graph
y=2*x
x.data*=100 # only changes the value; not recorded in the graph, so gradient propagation is unaffected
y.backward()
print(x)
print(x.grad)
```
tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])

A Note on Using reshape

Consider

\[ y=\sum_{i=1}^{n} {x_i} \]

example 1:

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
y=x.sum()
print(y)
y.backward()
print(x.grad)
tensor(15., grad_fn=<SumBackward0>)
tensor([[1., 1., 1., 1., 1.]])

example 2: deliberately add an extra step so that the input changes shape

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True).reshape(-1,1);x
tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.]], grad_fn=<ViewBackward>)
y=x.sum()
y.backward()
print(x.grad)
None


E:\software\Anaconda\envs\pytorch_env\lib\site-packages\ipykernel\__main__.py:3: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
  app.launch_new_instance()

If reshape is applied right at creation, the variable actually being differentiated is the tensor before the reshape, not x; that tensor has no variable name, so its .grad cannot be accessed. The correct approach:

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
print(x)
z=x.reshape(-1,1)
print(z)
y=z.sum()
y.backward()
x.grad

tensor([[1., 2., 3., 4., 5.]], requires_grad=True)
tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.]], grad_fn=<ViewBackward>)





tensor([[1., 1., 1., 1., 1.]])
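
Alternatively, as the warning above suggests, a non-leaf tensor keeps its gradient if retain_grad() is called on it before backward(); a small sketch under that assumption:

```python
# retain_grad() asks autograd to populate .grad on a non-leaf tensor
# (here x is the result of reshape, hence not a leaf).
x = torch.tensor([[1, 2, 3, 4, 5]], dtype=torch.float, requires_grad=True).reshape(-1, 1)
x.retain_grad()
x.sum().backward()
print(x.grad)            # tensor([[1.], [1.], [1.], [1.], [1.]])
```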