PyTorch Tensor Learning Notes
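Tensors are data structures similar to arrays and matrices. They resemble NumPy's ndarray, except that tensors can also run on GPUs. In fact, a CPU tensor and a NumPy array can often share the same underlying memory, which removes the need to copy data. Tensors are also optimized for automatic differentiation.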
import torch
import numpy as np
Initializing a Tensor
- Directly from data
data=[[1,2],[3,4]]
x_data=torch.tensor(data)
x_data
tensor([[1, 2],
[3, 4]])
- From a NumPy array (the tensor keeps the array's dtype, hence `torch.int32` below)
np_array=np.array(data)
x_np=torch.tensor(np_array)
x_np
tensor([[1, 2],
[3, 4]], dtype=torch.int32)
x_np=torch.from_numpy(np_array)
x_np
tensor([[1, 2],
[3, 4]], dtype=torch.int32)
- From another tensor
The new tensor retains the properties (shape, datatype) of the argument tensor unless they are explicitly overridden:
x_ones=torch.ones_like(x_data);x_ones
tensor([[1, 1], [1, 1]])
x_rand=torch.rand_like(x_data,dtype=torch.float);x_rand
tensor([[0.1462, 0.1567],
[0.6331, 0.8472]])
- With random or constant values
`shape` is a tuple of tensor dimensions:
shape=(2,3)
rand_tensor=torch.rand(shape)
ones_tensor=torch.ones(shape)
zeros_tensor=torch.zeros(shape)
print(rand_tensor)
print(ones_tensor)
print(zeros_tensor)
tensor([[0.4811, 0.5744, 0.8909],
[0.6602, 0.9882, 0.1145]])
tensor([[1., 1., 1.],
[1., 1., 1.]])
tensor([[0., 0., 0.],
[0., 0., 0.]])
Tensor Attributes
Tensor attributes describe its shape, its datatype, and the device on which it is stored:
tensor=torch.rand(3,4)
tensor.shape
torch.Size([3, 4])
tensor.dtype
torch.float32
tensor.device
device(type='cpu')
Tensor Operations
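There are over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more. Each of them can be run on the GPU, often faster than on a CPU.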
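By default, tensors are created on the CPU. You have to move them to the GPU explicitly with the `.to` method. Copying large tensors across devices is expensive in both time and memory.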
if torch.cuda.is_available():
    tensor=tensor.to('cuda')
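A common variation picks the device once and reuses it afterwards. The sketch below shows one way to do this; the names `device`, `t`, `t_dev` and `t64` are illustrative choices, not part of the original notes. It also shows that `.to()` accepts a dtype as well as a device.

```python
import torch

# Choose the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones(2, 3)       # created on the CPU by default
t_dev = t.to(device)       # copied to the GPU, or returned as-is on the CPU
print(t.device, t_dev.device)

# .to() can also convert the dtype; the original tensor is left unchanged
t64 = t.to(torch.float64)
print(t.dtype, t64.dtype)  # torch.float32 torch.float64
```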
NumPy-like indexing and slicing:
tensor=torch.ones((4,4));tensor
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
tensor[0]
tensor([1., 1., 1., 1.])
tensor[:,0]
tensor([1., 1., 1., 1.])
tensor[...,-1]=100;tensor
tensor([[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.]])
tensor[:,1]=10;tensor
tensor([[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.]])
Besides regular indexing, PyTorch also provides some more advanced selection functions:
help(torch.index_select)
Help on built-in function index_select:
index_select(...)
index_select(input, dim, index, *, out=None) -> Tensor
Returns a new tensor which indexes the :attr:`input` tensor along dimension
:attr:`dim` using the entries in :attr:`index` which is a `LongTensor`.
The returned tensor has the same number of dimensions as the original tensor
(:attr:`input`). The :attr:`dim`\ th dimension has the same size as the length
of :attr:`index`; other dimensions have the same size as in the original tensor.
.. note:: The returned tensor does **not** use the same storage as the original
tensor. If :attr:`out` has a different shape than expected, we
silently change it to the correct shape, reallocating the underlying
storage if necessary.
Args:
input (Tensor): the input tensor.
dim (int): the dimension in which we index
index (IntTensor or LongTensor): the 1-D tensor containing the indices to index
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
[-0.4664, 0.2647, -0.1228, -1.1068],
[-1.1734, -0.6571, 0.7230, -0.6004]])
>>> indices = torch.tensor([0, 2])
>>> torch.index_select(x, 0, indices)
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
[-1.1734, -0.6571, 0.7230, -0.6004]])
>>> torch.index_select(x, 1, indices)
tensor([[ 0.1427, -0.5414],
[-0.4664, -0.1228],
[-1.1734, 0.7230]])
help(torch.masked_select)
Help on built-in function masked_select:
masked_select(...)
masked_select(input, mask, *, out=None) -> Tensor
Returns a new 1-D tensor which indexes the :attr:`input` tensor according to
the boolean mask :attr:`mask` which is a `BoolTensor`.
The shapes of the :attr:`mask` tensor and the :attr:`input` tensor don't need
to match, but they must be :ref:`broadcastable <broadcasting-semantics>`.
.. note:: The returned tensor does **not** use the same storage
as the original tensor
Args:
input (Tensor): the input tensor.
mask (BoolTensor): the tensor containing the binary mask to index with
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.3552, -2.3825, -0.8297, 0.3477],
[-1.2035, 1.2252, 0.5002, 0.6248],
[ 0.1307, -2.0608, 0.1244, 2.0139]])
>>> mask = x.ge(0.5)
>>> mask
tensor([[False, False, False, False],
[False, True, True, True],
[False, False, False, True]])
>>> torch.masked_select(x, mask)
tensor([ 1.2252, 0.5002, 0.6248, 2.0139])
help(torch.gather)
Help on built-in function gather:
gather(...)
gather(input, dim, index, *, sparse_grad=False, out=None) -> Tensor
Gathers values along an axis specified by `dim`.
For a 3-D tensor the output is specified by::
out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2
:attr:`input` and :attr:`index` must have the same number of dimensions.
It is also required that ``index.size(d) <= input.size(d)`` for all
dimensions ``d != dim``. :attr:`out` will have the same shape as :attr:`index`.
Note that ``input`` and ``index`` do not broadcast against each other.
Args:
input (Tensor): the source tensor
dim (int): the axis along which to index
index (LongTensor): the indices of elements to gather
Keyword arguments:
sparse_grad (bool, optional): If ``True``, gradient w.r.t. :attr:`input` will be a sparse tensor.
out (Tensor, optional): the destination tensor
Example::
>>> t = torch.tensor([[1, 2], [3, 4]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
tensor([[ 1, 1],
[ 4, 3]])
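The indexing rule for `gather` is the least intuitive of the three. One way to check it is to rebuild the docstring's `dim=1` example with explicit loops that follow `out[i][j] = input[i][index[i][j]]`. The loop version below is only for illustration and is far slower than `torch.gather`.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])
index = torch.tensor([[0, 0], [1, 0]])

# Manual gather along dim=1: out[i][j] = t[i][index[i][j]]
manual = torch.empty_like(index)
for i in range(index.size(0)):
    for j in range(index.size(1)):
        manual[i, j] = t[i, index[i, j]]

print(manual)                                          # tensor([[1, 1], [4, 3]])
print(torch.equal(manual, torch.gather(t, 1, index)))  # True
```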
`torch.cat` concatenates a sequence of tensors along a given dimension. There is also `torch.stack`, which behaves slightly differently from `torch.cat`.
t1=torch.cat([tensor,tensor,tensor],dim=1);t1
tensor([[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.]])
torch.cat([tensor,tensor,tensor],dim=0)
tensor([[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.]])
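The difference is that `cat` grows an existing dimension, which you can think of as appending. `stack` adds a new dimension, which you can think of as piling the tensors on top of each other.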
a=torch.arange(0,12).reshape(3,4)
a
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
torch.cat([a,a]).shape
torch.Size([6, 4])
torch.stack([a,a]).shape
torch.Size([2, 3, 4])
torch.cat([a,a])
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
torch.stack([a,a])
tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
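Another way to see the relationship: `stack` behaves like unsqueezing each input at the new dimension and then calling `cat`. The sketch reuses the `a` from above and also compares the two functions along `dim=1`.

```python
import torch

a = torch.arange(0, 12).reshape(3, 4)

# stack inserts a new dim: same as unsqueezing each input, then concatenating
s = torch.stack([a, a], dim=0)
c = torch.cat([a.unsqueeze(0), a.unsqueeze(0)], dim=0)
print(torch.equal(s, c))                 # True

# Along dim=1, cat grows an existing dimension while stack adds a new one
print(torch.cat([a, a], dim=1).shape)    # torch.Size([3, 8])
print(torch.stack([a, a], dim=1).shape)  # torch.Size([3, 2, 4])
```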
- Arithmetic operations
tensor=torch.arange(0,9).reshape(3,3);tensor
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
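The following computes the matrix product of `tensor` and its transpose, so `y1` and `y2` have the same value. The last example (`y3`) instead adds the tensor and its transpose element-wise and writes the result into the preallocated `y3` through the `out` argument.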
y1=tensor@tensor.T
y1
tensor([[ 5, 14, 23],
[ 14, 50, 86],
[ 23, 86, 149]])
y2=tensor.matmul(tensor.T)
y2
tensor([[ 5, 14, 23],
[ 14, 50, 86],
[ 23, 86, 149]])
y3=torch.empty(3,3)
torch.add(tensor,tensor.T,out=y3)
print(y3)
tensor([[ 0., 4., 8.],
[ 4., 8., 12.],
[ 8., 12., 16.]])
For a single-element tensor, such as one obtained by aggregating all values into a single value, `item()` converts it to a Python numerical value.
agg=tensor.sum();agg
tensor(36)
agg_item=agg.item();agg_item
36
In-place operations store their result in the operand itself. They are marked by a `_` suffix. For example, `x.copy_(y)` and `x.t_()` both change `x`.
tensor
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
tensor.add_(5)
tensor([[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 13]])
tensor
tensor([[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 13]])
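In-place operations can save some memory, but they can cause problems when computing derivatives because they immediately overwrite values that autograd may still need. Their use is therefore discouraged.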
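As a minimal sketch of the problem, assume an operation whose backward pass reuses its own output, as `exp` does. Modifying that output in place makes `backward()` raise a `RuntimeError`. The out-of-place version works. The variable names are illustrative.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()        # exp's backward pass reuses its own output b
b.add_(1)          # in-place update overwrites that saved output

try:
    b.sum().backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print("backward failed:", err)

# Out-of-place version: a new tensor is created and the saved output stays intact
a.grad = None      # start from a clean gradient
c = a.exp()
d = c + 1
d.sum().backward()
print(a.grad)      # d/da (exp(a) + 1) = exp(a), about 2.7183 for a = 1
```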
Conversion between tensors and NumPy arrays
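Use `numpy()` and `from_numpy()` to convert between (CPU) tensors and NumPy arrays. Note that the tensor and the NumPy array produced by these two functions share the same memory, which is why the conversion is fast. Changing one therefore also changes the other!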
Tensor to Numpy array
t=torch.ones(5)
t
tensor([1., 1., 1., 1., 1.])
n=t.numpy();n
array([ 1., 1., 1., 1., 1.], dtype=float32)
t.add_(1)
tensor([2., 2., 2., 2., 2.])
Numpy array to Tensor
n=np.ones(5)
t=torch.from_numpy(n)
t
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
np.add(n,1,out=n)
array([ 2., 2., 2., 2., 2.])
t
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n
array([ 2., 2., 2., 2., 2.])
Another common way to convert a NumPy array to a tensor is to call `torch.tensor()` directly. This method always copies the data, so the returned tensor no longer shares memory with the original array.
a=np.arange(9).reshape(3,3)
c=torch.tensor(a)
a+=1
print(c)
print(a)
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
[[1 2 3]
[4 5 6]
[7 8 9]]
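The difference between sharing and copying can also be checked directly rather than by printing values. The sketch below uses `np.shares_memory` on the NumPy views of the two tensors. The array `arr` is just a placeholder.

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)   # shares arr's buffer
copied = torch.tensor(arr)       # always copies the data
arr[0] = 7.0                     # modify the original array

print(shared)   # tensor([7., 0., 0.], dtype=torch.float64)
print(copied)   # tensor([0., 0., 0.], dtype=torch.float64)
print(np.shares_memory(arr, shared.numpy()))  # True
print(np.shares_memory(arr, copied.numpy()))  # False
```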
view()
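Use `view()` to change the shape of a tensor. The returned tensor shares memory with the source tensor (both are views of the same underlying data), so changing one also changes the other. `reshape`, which serves a similar purpose, does not guarantee a copy either. It returns a view whenever possible and copies only when it has to.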
x=torch.randn(5,3);x
tensor([[-0.5722, -0.4844, 1.5515],
[-0.2504, 0.2010, 0.0182],
[ 0.0400, 0.0397, 2.0167],
[ 1.8868, -0.4670, 0.5968],
[ 0.9070, 0.5825, -1.0549]])
y=x.view(15);y
tensor([-0.5722, -0.4844, 1.5515, -0.2504, 0.2010, 0.0182, 0.0400, 0.0397,
2.0167, 1.8868, -0.4670, 0.5968, 0.9070, 0.5825, -1.0549])
y[0]=100
x
tensor([[ 1.0000e+02, -4.8445e-01, 1.5515e+00],
[-2.5042e-01, 2.0102e-01, 1.8231e-02],
[ 3.9969e-02, 3.9711e-02, 2.0167e+00],
[ 1.8868e+00, -4.6697e-01, 5.9683e-01],
[ 9.0702e-01, 5.8254e-01, -1.0549e+00]])
z=x.view(-1,5);z
tensor([[ 1.0000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01],
[ 1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00],
[-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00]])
q=x.reshape(15);q
tensor([ 1.0000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01,
1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00,
-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00])
q[0]=250;x
tensor([[ 2.5000e+02, -4.8445e-01, 1.5515e+00],
[-2.5042e-01, 2.0102e-01, 1.8231e-02],
[ 3.9969e-02, 3.9711e-02, 2.0167e+00],
[ 1.8868e+00, -4.6697e-01, 5.9683e-01],
[ 9.0702e-01, 5.8254e-01, -1.0549e+00]])
To get a genuinely new copy that does not share memory, first create one with `clone` and then call `view`:
x-=1
print(x)
print(x_cp)
tensor([[ 2.4900e+02, -1.4844e+00, 5.5149e-01],
[-1.2504e+00, -7.9898e-01, -9.8177e-01],
[-9.6003e-01, -9.6029e-01, 1.0167e+00],
[ 8.8677e-01, -1.4670e+00, -4.0317e-01],
[-9.2979e-02, -4.1746e-01, -2.0549e+00]])
tensor([ 2.5000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01,
1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00,
-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00])
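Another benefit of `clone` is that it is recorded in the computation graph, so gradients flowing back to the copy also propagate to the source tensor.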
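The sketch below compares storage through `data_ptr()`, which returns the address of a tensor's first element. It shows four things:

- `view` shares storage.
- `reshape` falls back to a copy on a non-contiguous tensor.
- `clone` is independent.
- `clone` still passes gradients back to its source.

The tensors used are arbitrary examples.

```python
import torch

x = torch.arange(6.0).reshape(2, 3)

v = x.view(6)                         # a view: same storage as x
print(v.data_ptr() == x.data_ptr())   # True

t = x.t()                             # transpose gives a non-contiguous tensor
print(t.is_contiguous())              # False
# t.view(6) would raise a RuntimeError here
r = t.reshape(6)                      # reshape falls back to copying
print(r.data_ptr() == x.data_ptr())   # False

c = x.clone()                         # independent copy
c[0, 0] = 100.0
print(x[0, 0].item())                 # 0.0, x is unaffected

# clone is recorded by autograd, so gradients flow back to the source
src = torch.ones(3, requires_grad=True)
(src.clone() * 2).sum().backward()
print(src.grad)                       # tensor([2., 2., 2.])
```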
Another commonly used function is `item()`, which converts a scalar tensor into a Python number.
x=torch.randn(1);x
tensor([-0.9871])
x.item()
-0.9870905876159668
Linear Algebra
- Trace: torch.trace
help(torch.trace)
Help on built-in function trace:
trace(...)
trace(input) -> Tensor
Returns the sum of the elements of the diagonal of the input 2-D matrix.
Example::
>>> x = torch.arange(1., 10.).view(3, 3)
>>> x
tensor([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]])
>>> torch.trace(x)
tensor(15.)
- Diagonal elements: torch.diag
help(torch.diag)
Help on built-in function diag:
diag(...)
diag(input, diagonal=0, *, out=None) -> Tensor
- If :attr:`input` is a vector (1-D tensor), then returns a 2-D square tensor
with the elements of :attr:`input` as the diagonal.
- If :attr:`input` is a matrix (2-D tensor), then returns a 1-D tensor with
the diagonal elements of :attr:`input`.
The argument :attr:`diagonal` controls which diagonal to consider:
- If :attr:`diagonal` = 0, it is the main diagonal.
- If :attr:`diagonal` > 0, it is above the main diagonal.
- If :attr:`diagonal` < 0, it is below the main diagonal.
Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider
Keyword args:
out (Tensor, optional): the output tensor.
.. seealso::
:func:`torch.diagonal` always returns the diagonal of its input.
:func:`torch.diagflat` always constructs a tensor with diagonal elements
specified by the input.
Examples:
Get the square matrix where the input vector is the diagonal::
>>> a = torch.randn(3)
>>> a
tensor([ 0.5950,-0.0872, 2.3298])
>>> torch.diag(a)
tensor([[ 0.5950, 0.0000, 0.0000],
[ 0.0000,-0.0872, 0.0000],
[ 0.0000, 0.0000, 2.3298]])
>>> torch.diag(a, 1)
tensor([[ 0.0000, 0.5950, 0.0000, 0.0000],
[ 0.0000, 0.0000,-0.0872, 0.0000],
[ 0.0000, 0.0000, 0.0000, 2.3298],
[ 0.0000, 0.0000, 0.0000, 0.0000]])
Get the k-th diagonal of a given matrix::
>>> a = torch.randn(3, 3)
>>> a
tensor([[-0.4264, 0.0255,-0.1064],
[ 0.8795,-0.2429, 0.1374],
[ 0.1029,-0.6482,-1.6300]])
>>> torch.diag(a, 0)
tensor([-0.4264,-0.2429,-1.6300])
>>> torch.diag(a, 1)
tensor([ 0.0255, 0.1374])
- triu: upper triangular part
help(torch.triu)
Help on built-in function triu:
triu(...)
triu(input, diagonal=0, *, out=None) -> Tensor
Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices
:attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
The upper triangular part of the matrix is defined as the elements on and
above the diagonal.
The argument :attr:`diagonal` controls which diagonal to consider. If
:attr:`diagonal` = 0, all elements on and above the main diagonal are
retained. A positive value excludes just as many diagonals above the main
diagonal, and similarly a negative value includes just as many diagonals below
the main diagonal. The main diagonal are the set of indices
:math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
:math:`d_{1}, d_{2}` are the dimensions of the matrix.
Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> a = torch.randn(3, 3)
>>> a
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.2072, -1.0680, 0.6602],
[ 0.3480, -0.5211, -0.4573]])
>>> torch.triu(a)
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.0000, -1.0680, 0.6602],
[ 0.0000, 0.0000, -0.4573]])
>>> torch.triu(a, diagonal=1)
tensor([[ 0.0000, 0.5207, 2.0049],
[ 0.0000, 0.0000, 0.6602],
[ 0.0000, 0.0000, 0.0000]])
>>> torch.triu(a, diagonal=-1)
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.2072, -1.0680, 0.6602],
[ 0.0000, -0.5211, -0.4573]])
>>> b = torch.randn(4, 6)
>>> b
tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.4333, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410],
[-0.9888, 1.0679, -1.3337, -1.6556, 0.4798, 0.2830]])
>>> torch.triu(b, diagonal=1)
tensor([[ 0.0000, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[ 0.0000, 0.0000, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.0000, 0.0000, 0.0000, -1.0432, 0.9348, -0.4410],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.4798, 0.2830]])
>>> torch.triu(b, diagonal=-1)
tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.0000, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410],
[ 0.0000, 0.0000, -1.3337, -1.6556, 0.4798, 0.2830]])
- tril: lower triangular part
help(torch.tril)
Help on built-in function tril:
tril(...)
tril(input, diagonal=0, *, out=None) -> Tensor
Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices
:attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
The lower triangular part of the matrix is defined as the elements on and
below the diagonal.
The argument :attr:`diagonal` controls which diagonal to consider. If
:attr:`diagonal` = 0, all elements on and below the main diagonal are
retained. A positive value includes just as many diagonals above the main
diagonal, and similarly a negative value excludes just as many diagonals below
the main diagonal. The main diagonal are the set of indices
:math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
:math:`d_{1}, d_{2}` are the dimensions of the matrix.
Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> a = torch.randn(3, 3)
>>> a
tensor([[-1.0813, -0.8619, 0.7105],
[ 0.0935, 0.1380, 2.2112],
[-0.3409, -0.9828, 0.0289]])
>>> torch.tril(a)
tensor([[-1.0813, 0.0000, 0.0000],
[ 0.0935, 0.1380, 0.0000],
[-0.3409, -0.9828, 0.0289]])
>>> b = torch.randn(4, 6)
>>> b
tensor([[ 1.2219, 0.5653, -0.2521, -0.2345, 1.2544, 0.3461],
[ 0.4785, -0.4477, 0.6049, 0.6368, 0.8775, 0.7145],
[ 1.1502, 3.2716, -1.1243, -0.5413, 0.3615, 0.6864],
[-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0978]])
>>> torch.tril(b, diagonal=1)
tensor([[ 1.2219, 0.5653, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.4785, -0.4477, 0.6049, 0.0000, 0.0000, 0.0000],
[ 1.1502, 3.2716, -1.1243, -0.5413, 0.0000, 0.0000],
[-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0000]])
>>> torch.tril(b, diagonal=-1)
tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.4785, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 1.1502, 3.2716, 0.0000, 0.0000, 0.0000, 0.0000],
[-0.0614, -0.7344, -1.3164, 0.0000, 0.0000, 0.0000]])
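The four functions fit together neatly. The sketch checks three relationships on an arbitrary 3x3 matrix:

- The trace equals the sum of `diag`.
- `triu` (diagonal included) plus the strictly lower part `tril(..., diagonal=-1)` rebuilds the matrix.
- Applying `diag` twice to a vector returns the vector.

```python
import torch

m = torch.arange(1.0, 10.0).view(3, 3)

# The trace is the sum of the main diagonal
print(torch.trace(m), torch.diag(m).sum())   # tensor(15.) tensor(15.)

# Upper part (diagonal included) + strictly lower part rebuilds m
rebuilt = torch.triu(m) + torch.tril(m, diagonal=-1)
print(torch.equal(rebuilt, m))               # True

# diag(vector) builds a diagonal matrix; diag(matrix) reads the diagonal back
v = torch.tensor([1.0, 2.0, 3.0])
print(torch.equal(torch.diag(torch.diag(v)), v))  # True
```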
Broadcasting
x=torch.arange(1,3).view(1,2);x
tensor([[1, 2]])
y=torch.arange(1,4).view(3,1);y
tensor([[1],
[2],
[3]])
x+y
tensor([[2, 3],
[3, 4],
[4, 5]])
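In the example above, `x` has shape (1, 2) and `y` has shape (3, 1). Broadcasting aligns shapes from the trailing dimension and virtually expands every size-1 dimension, so both operands behave as if they had shape (3, 2). The sketch makes this explicit with `torch.broadcast_tensors` and `expand`.

```python
import torch

x = torch.arange(1, 3).view(1, 2)   # shape (1, 2)
y = torch.arange(1, 4).view(3, 1)   # shape (3, 1)

# Both operands are virtually expanded to the common shape (3, 2)
bx, by = torch.broadcast_tensors(x, y)
print(bx.shape, by.shape)           # torch.Size([3, 2]) torch.Size([3, 2])

# Adding the expanded operands gives the same result as x + y
print(torch.equal(bx + by, x + y))  # True

# expand() creates the same virtual copies without allocating new memory
print(torch.equal(x.expand(3, 2) + y.expand(3, 2), x + y))  # True
```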
Memory Overhead of Operations
Indexing (basic slicing) and `view` do not allocate new memory. An operation such as `y=x+y` does: it allocates new memory and then points `y` to it.
x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y=y+x
id(y)==id_before
False
To write the result into the original memory of `y`, use indexing to replace its contents:
x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y[:]=y+x
id_before==id(y)
True
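You can also use the `out` argument of the operator's full function form (e.g. `torch.add`), or the augmented assignment `+=` (i.e. `add_`):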
x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
torch.add(x,y,out=y)
id(y)==id_before
True
y+=x
id(y)==id_before
True
y.add_(x)
id(y)==id_before
True
y.requires_grad
False
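`id()` checks Python object identity, which is only an indirect indicator of whether memory was reused. A more direct check compares `Tensor.data_ptr()`, the address of the tensor's data, before and after each operation. A minimal sketch (the flag names are illustrative):

```python
import torch

x = torch.tensor([1, 2])
y = torch.tensor([3, 4])
ptr = y.data_ptr()                 # address of y's underlying data

y[:] = y + x                       # result copied into y's existing storage
same_after_slice = y.data_ptr() == ptr

y += x                             # in-place add, same storage
same_after_iadd = y.data_ptr() == ptr

y = y + x                          # new tensor allocated, name rebound
same_after_rebind = y.data_ptr() == ptr

print(same_after_slice, same_after_iadd, same_after_rebind)  # True True False
```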
Automatic Differentiation
PyTorch's `autograd` package automatically builds a computation graph from the inputs and the forward pass, and performs backpropagation.
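If the `.requires_grad` attribute of a `Tensor` is set to `True`, the tensor starts tracking all operations on it, so that gradients can be propagated with the chain rule. After the computation is finished, call `.backward()` to compute all gradients. The gradient of this tensor is accumulated into its `.grad` attribute.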
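When calling `y.backward()`, no argument is needed if `y` is a scalar. Otherwise you must pass a tensor `w` with the same shape as `y`. In that case `y.backward(w)` means:

1. Compute `L=torch.sum(y*w)`, which is a scalar.
2. Take the derivative of `L` with respect to the independent variable `x`.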
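This equivalence can be verified by computing the gradient both ways on two separate leaf tensors. The sketch uses `y = x ** 2` and the weight `w` from the exercise later in these notes. The gradient is `2 * x * w` in both cases.

```python
import torch

w = torch.tensor([0.1, 0.2, 0.3])

# Way 1: non-scalar y, pass the weight tensor to backward()
x1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y1 = x1 ** 2
y1.backward(w)

# Way 2: build the scalar L = sum(y * w) explicitly
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
L = torch.sum(x2 ** 2 * w)
L.backward()

print(x1.grad)                            # tensor([0.2000, 0.8000, 1.8000])
print(torch.allclose(x1.grad, x2.grad))   # True
```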
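To stop a tensor from being tracked, call `.detach()` to separate it from the computation history. Future computations on it are then not tracked, so gradients cannot flow through it. Alternatively, wrap code you do not want tracked in a `with torch.no_grad():` block. This is common when evaluating a model, because evaluation does not need gradients for the trainable parameters (those with `requires_grad=True`).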
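A small sketch of both mechanisms follows. A detached tensor acts as a constant in later computations, and results computed under `torch.no_grad()` do not require gradients. The values are arbitrary.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

d = (x * 3).detach()                 # same value (6.0), cut off from the graph
print(d.requires_grad, d.grad_fn)    # False None

with torch.no_grad():
    e = x * 3                        # not recorded either
print(e.requires_grad)               # False

y = x * d                            # d acts as the constant 6.0
y.backward()
print(x.grad)                        # tensor(6.)
```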
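`Function` is another important class. `Tensor` and `Function` together build a directed acyclic graph (DAG) that records the entire computation. Every tensor has a `.grad_fn` attribute that refers to the `Function` that created it. If the tensor was produced by some operation, `grad_fn` returns an object associated with that operation; otherwise it is `None`.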
x=torch.ones(2,2,requires_grad=True)
print(x)
print(x.grad_fn)
print(x.grad) # None until a gradient has been computed
print(x.dtype)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
None
None
torch.float32
y=x+2
print(y)
print(y.grad_fn)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001BA1B94F860>
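Note that `x` was created directly, so it has no `grad_fn`, whereas `y` was created by an addition, so it has one. Tensors created directly, like `x`, are called leaf nodes, and the `grad_fn` of a leaf node is `None`.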
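The leaf status can also be queried directly through the `is_leaf` attribute. By convention, a tensor that does not require gradients is also a leaf. A short sketch:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2

print(x.is_leaf, x.grad_fn)               # True None
print(y.is_leaf, y.grad_fn is not None)   # False True

# A tensor created without requires_grad is also a leaf, just an untracked one
z = torch.ones(2)
print(z.is_leaf, z.requires_grad)         # True False
```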
z=y*y*3
out=z.mean()
print(z,out)
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
Use `.requires_grad_()` to change the `requires_grad` attribute in place:
a=torch.randn(2,2)
a=((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b=(a*a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x000001BA1B92FBA8>
Gradients
Because `out` is a scalar, `backward()` can be called without any argument:
out
tensor(27., grad_fn=<MeanBackward0>)
out.backward()
print(x.grad)
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
x
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
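Let `out` be \( o \). Because
\[ o = \frac{1}{4}\sum_{i} z_i = \frac{1}{4}\sum_{i} 3(x_i+2)^2, \]
we have
\[ \frac{\partial o}{\partial x_i}\Big|_{x_i=1} = \frac{3}{2}(x_i+2)\Big|_{x_i=1} = \frac{9}{2} = 4.5, \]
which matches the output above.

Mathematically, if both the value and the argument of a function \( \vec{y} = f(\vec{x}) \) are vectors, then the gradient of \( \vec{y} \) with respect to \( \vec{x} \) is a Jacobian matrix \( J \):
\[ J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix} \]
The `torch.autograd` package computes products involving such Jacobian matrices. For example, suppose \( v \) is the gradient of a scalar function \( l = g(\vec{y}) \):
\[ v = \left(\frac{\partial l}{\partial y_1} \cdots \frac{\partial l}{\partial y_m}\right) \]
Then, by the chain rule, the gradient of \( l \) with respect to \( \vec{x} \) is the vector-Jacobian product
\[ vJ = \left(\frac{\partial l}{\partial y_1} \cdots \frac{\partial l}{\partial y_m}\right) J = \left(\frac{\partial l}{\partial x_1} \cdots \frac{\partial l}{\partial x_n}\right). \]

Note: `grad` is accumulated during backpropagation. Each backward pass adds to the gradients accumulated so far, so gradients are usually zeroed before backpropagation.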
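The vector-Jacobian product can be checked numerically with `torch.autograd.functional.jacobian`, which builds the full matrix \( J \). The function `f` below is a made-up example mapping \( \mathbb{R}^3 \to \mathbb{R}^2 \), and `v` plays the role of \( \partial l / \partial \vec{y} \). For `f`:

- Row 1 of \( J \) is \( (x_1, x_0, 0) = (2, 1, 0) \).
- Row 2 of \( J \) is \( (0, 1, 2x_2) = (0, 1, 6) \).

Therefore \( vJ = (1, -0.5, -6) \).

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical vector-valued function R^3 -> R^2 chosen for illustration
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])          # plays the role of dl/dy

J = jacobian(f, x.detach())            # full Jacobian, shape (2, 3)
f(x).backward(v)                       # autograd computes vJ without forming J

print(J)
print(x.grad)                          # tensor([ 1.0000, -0.5000, -6.0000])
print(torch.allclose(x.grad, v @ J))   # True
```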
out2=x.sum();out2
tensor(4., grad_fn=<SumBackward0>)
out2.backward()
print(x.grad)
tensor([[5.5000, 5.5000],
[5.5000, 5.5000]])
out3=x.sum()
x.grad.data.zero_()
out3.backward()
print(x.grad)
tensor([[1., 1.],
[1., 1.]])
A small exercise:
a=torch.tensor([1,2,3],requires_grad=True,dtype=torch.float32)
print(a.grad)
None
b=a**2;b
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
b.requires_grad
True
w=torch.tensor([0.1,0.2,0.3])
b.backward(w)
print(a.grad)
tensor([0.2000, 0.8000, 1.8000])
d=b.sum();d
tensor(14., grad_fn=<SumBackward0>)
d.requires_grad
True
d.backward()
RuntimeError Traceback (most recent call last)
----> 1 d.backward()
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
d=2*x
for i in range(11):
d.backward(retain_graph=True)
print(x.grad)
tensor(4.)
tensor(6.)
tensor(8.)
tensor(10.)
tensor(12.)
tensor(14.)
tensor(16.)
tensor(18.)
tensor(20.)
tensor(22.)
tensor(24.)
```
d=2*x
for i in range(11):
d.backward()
print(x.grad)
```
tensor(26.)
RuntimeError Traceback (most recent call last)
1 d=2*x
2 for i in range(11):
----> 3 d.backward()
4 print(x.grad)
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
c=a.sum();c
tensor(6., grad_fn=<SumBackward0>)
c.backward()
a.grad
tensor([1.2000, 1.8000, 2.8000])
a.grad.data.zero_()
tensor([0., 0., 0.])
c=a.sum()
c.backward()
print(a.grad)
tensor([1., 1., 1.])
torch.arange(0,9).view(3,3)
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
torch.arange(0,9).view(3,3).sum()
tensor(36)
- A more practical example
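x=torch.tensor([1.0,2.0,3.0,4.0],requires_grad=True) # use 1.0 rather than 1 (likewise 2, 3, ...); otherwise the dtype is not torch.float, and only floating-point (or complex) tensors can require gradients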
x.dtype
torch.float32
y=2*x
z=y.view(2,2)
print(z)
v=torch.tensor([[1.0,0.1],[0.01,0.001]],dtype=torch.float)
z.backward(v)
print(x.grad)
tensor([[2., 4.],
[6., 8.]], grad_fn=<ViewBackward>)
tensor([2.0000, 0.2000, 0.0200, 0.0020])
- An example of interrupting gradient tracking
x=torch.tensor(1.0,requires_grad=True)
y1=x**2
with torch.no_grad():
y2=x**3
y3=y1+y2
print(x.requires_grad)
print(y1,y1.requires_grad)
print(y2,y2.requires_grad)
print(y3,y3.requires_grad)
True
tensor(1., grad_fn=<PowBackward0>) True
tensor(1.) False
tensor(2., grad_fn=<AddBackward0>) True
y3.backward()
print(x.grad)
tensor(2.)
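Since \( y_3 = y_1 + y_2 = x^2 + x^3 \), shouldn't \( \frac{dy_3}{dx} \) be 5 at \( x = 1 \)? It is not, because the definition of `y2` is wrapped in `torch.no_grad()`. No gradient flows back through `y2`; only the gradient from `y1` is propagated, which gives \( 2x = 2 \).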
As noted above, `y2.requires_grad` is `False`, so calling `y2.backward()` raises an error:
y2.backward()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-131-8061dc2a05a4> in <module>
----> 1 y2.backward()
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
To modify a tensor's values without `autograd` recording the change (i.e., without affecting backpropagation), operate on `tensor.data`.
```python
x=torch.ones(1,requires_grad=True)
print(x.data) # still a tensor
print(x.data.requires_grad) # but already outside the computation graph
y=2*x
x.data*=100 # only changes the value; not recorded in the graph, so backpropagation is unaffected
y.backward()
print(x)
print(x.grad)
```
tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])
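Because autograd does not see changes made through `.data`, it can also silently produce wrong gradients. `.detach()` gives a similar standalone view, but it shares the version counter with the original, so an illegal in-place edit is detected instead. This sketch follows the well-known `sigmoid` case, whose backward pass reuses its output.

```python
import torch

# .data: the edit is invisible to autograd, so the gradient is silently wrong
a = torch.tensor([1.0, 2.0], requires_grad=True)
out = a.sigmoid()          # sigmoid's backward reuses its output
out.data.zero_()           # overwrite the saved output behind autograd's back
out.sum().backward()
print(a.grad)              # tensor([0., 0.]), not the true derivative of sigmoid

# .detach(): shares the version counter, so the same edit is caught
b = torch.tensor([1.0, 2.0], requires_grad=True)
out2 = b.sigmoid()
out2.detach().zero_()
try:
    out2.sum().backward()
    detected = False
except RuntimeError:
    detected = True
print(detected)            # True
```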
### Be careful when using reshape
Consider
\[ y=\sum_{i=1}^{n} x_i \]
example 1:
x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
y=x.sum()
print(y)
y.backward()
print(x.grad)
tensor(15., grad_fn=<SumBackward0>)
tensor([[1., 1., 1., 1., 1.]])
example 2: deliberately add an extra step that changes the shape of the input
x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True).reshape(-1,1);x
tensor([[1.],
[2.],
[3.],
[4.],
[5.]], grad_fn=<ViewBackward>)
y=x.sum()
y.backward()
print(x.grad)
None
E:\software\Anaconda\envs\pytorch_env\lib\site-packages\ipykernel\__main__.py:3: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
app.launch_new_instance()
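If `reshape` is applied right when the tensor is created, the leaf tensor being differentiated is the tensor before `reshape`, not `x`. That leaf has no variable name, so its `.grad` cannot be read. `x` itself is a non-leaf tensor, so its `.grad` is not populated. The correct approach: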
x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
print(x)
z=x.reshape(-1,1)
print(z)
y=z.sum()
y.backward()
x.grad
tensor([[1., 2., 3., 4., 5.]], requires_grad=True)
tensor([[1.],
[2.],
[3.],
[4.],
[5.]], grad_fn=<ViewBackward>)
tensor([[1., 1., 1., 1., 1.]])
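The warning above also points to a second option: keep a name for the leaf and call `.retain_grad()` on the reshaped non-leaf tensor, so that both gradients are available. The names `leaf` and `x` are illustrative.

```python
import torch

leaf = torch.tensor([[1, 2, 3, 4, 5]], dtype=torch.float, requires_grad=True)
x = leaf.reshape(-1, 1)   # non-leaf: produced by reshape
x.retain_grad()           # ask autograd to keep this non-leaf's gradient
y = x.sum()
y.backward()

print(x.is_leaf, leaf.is_leaf)   # False True
print(x.grad.shape)              # torch.Size([5, 1])
print(leaf.grad)                 # tensor([[1., 1., 1., 1., 1.]])
```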