Loss Functions in PyTorch

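Deep learning uses many loss functions; common ones include L1, L2, hinge loss, and cross entropy. They all measure the difference between the prediction f(x) and the ground truth y, and the optimizer's job is to minimize that difference. Once the loss settles at a stable value, the parameters W of f(x) are taken to be at their best. Different loss functions suit different scenarios, and the deep learning frameworks implement them in much the same way, so this article uses PyTorch to walk through the common ones. We start by constructing a prediction ŷ and a ground truth y.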

Cross Entropy

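Cross entropy comes from Shannon's information theory. Put simply, given the true distribution $p_k$, cross entropy measures how much effort it takes to remove the system's uncertainty when following the strategy f(x) specified by a non-true distribution $q_k$. The lower the cross entropy, the better the strategy, so we always minimize it: a smaller cross entropy means the strategy produced by the algorithm is closer to the optimal one, which indirectly shows that the non-true distribution computed by the algorithm is closer to the true distribution. From an information-theoretic point of view, the cross-entropy loss actually comes from the KL divergence; the expression obtained at the end of that derivation is equivalent to the cross-entropy formula: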

$$H(p, q) = -\sum_{k=1}^{N} p_k \log q_k$$
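As a quick numeric illustration of the formula, the sketch below evaluates H(p, q) in plain Python. The function name `cross_entropy` and both example distributions are made up for illustration; terms with $p_k = 0$ are skipped, following the usual convention that 0 · log q counts as 0.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k * log(q_k), natural log; terms with p_k == 0 are skipped."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

# Hypothetical 3-class example: the true class is class 1 (one-hot p).
p = [0.0, 1.0, 0.0]
q_good = [0.2, 0.7, 0.1]   # puts most of the mass on the true class
q_bad = [0.6, 0.2, 0.2]    # puts most of the mass on a wrong class

h_good = cross_entropy(p, q_good)  # equals -log(0.7)
h_bad = cross_entropy(p, q_bad)    # equals -log(0.2)
print(h_good, h_bad)
```

The distribution that assigns more probability to the true class gets the lower cross entropy, which is why minimizing it pulls q toward p.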

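Maximum likelihood estimation, negative log-likelihood (NLL), KL divergence, and cross entropy are in fact equivalent as training objectives and can be derived from one another. MSE can also be derived from cross entropy (see Deep Learning Book, p. 132).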
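The link between these quantities can be written out explicitly. The short derivation below is a standard identity; the symbols $H(p)$ for the entropy of p and $D_{KL}$ for the KL divergence are notation introduced here, not taken from the original text.

```latex
\begin{aligned}
D_{KL}(p \,\|\, q)
  &= \sum_{k=1}^{N} p_k \log \frac{p_k}{q_k} \\
  &= \sum_{k=1}^{N} p_k \log p_k - \sum_{k=1}^{N} p_k \log q_k \\
  &= -H(p) + H(p, q)
\end{aligned}
```

Since $H(p)$ does not depend on the model, minimizing the KL divergence and minimizing the cross entropy lead to the same q. When p is one-hot on class c, the sum collapses to $H(p, q) = -\log q_c$, which is the negative log-likelihood of the true class.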

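Cross entropy can be used for classification and also for semantic segmentation. In classification, the output layer is usually a sigmoid or a softmax, although the network may also output the weighted sums (the raw scores) directly. The cross-entropy-related loss functions in PyTorch include: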

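CrossEntropyLoss: combines LogSoftmax and NLLLoss in one single class, so the network needs no extra output layer at the end; the loss function packages it for us.
NLLLoss: the negative log-likelihood loss. It expects log-probabilities as input, so a LogSoftmax layer has to be added as the last layer of the network.
NLLLoss2d: the two-dimensional negative log-likelihood loss, mostly used for segmentation.
BCELoss: binary cross entropy, commonly used for binary classification, though it can also be used for multi-class problems. It is usually paired with a sigmoid as the last layer of the network, and its target (the y value) must be one-hot encoded. BCELoss can also be used for multi-label classification.
BCEWithLogitsLoss: combines a Sigmoid layer and BCELoss in one class.
KLDivLoss: TODO
PoissonNLLLoss: TODO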
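To make the relationship between BCELoss and BCEWithLogitsLoss concrete, here is a minimal sketch for a multi-label setting. The logits and targets are invented values, and the code assumes a PyTorch version where tensors can be passed to losses directly (0.4 or later).

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for 2 samples and 3 independent labels.
logits = torch.tensor([[1.5, -0.3, 0.0],
                       [-2.0, 0.7, 3.1]])
# Multi-label targets: floats with the same shape as the logits.
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0]])

# BCELoss expects probabilities, so a sigmoid is applied first.
loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)
# BCEWithLogitsLoss applies the sigmoid internally.
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_bce.item(), loss_bce_logits.item())
```

The two values should agree up to floating-point error; the fused version is generally the numerically safer choice because it never materializes probabilities very close to 0 or 1.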

The sections below illustrate the loss functions above with PyTorch.

CrossEntropyLoss

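In PyTorch, CrossEntropyLoss is computed in two steps: the first computes the log softmax, and the second computes the cross entropy (or, equivalently, the negative log-likelihood). With CrossEntropyLoss there is no need to add softmax and log layers at the end of the network; the output of the fully connected layer is passed in directly. NLLLoss, by contrast, requires softmax and log layers as the last layers when the network is defined.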
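The two ways of wiring the last layer can be compared side by side. This is a sketch under assumptions: the batch `x`, the labels `y`, and the layer sizes are hypothetical, and it targets PyTorch 0.4 or later.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)             # hypothetical batch: 4 samples, 8 features
y = torch.tensor([0, 2, 1, 2])    # class indices for a 3-class problem

fc = nn.Linear(8, 3)              # final fully connected layer

# Option 1: feed the raw fully connected outputs to CrossEntropyLoss.
loss_ce = nn.CrossEntropyLoss()(fc(x), y)

# Option 2: append LogSoftmax to the network and use NLLLoss.
net_with_logsoftmax = nn.Sequential(fc, nn.LogSoftmax(dim=1))
loss_nll = nn.NLLLoss()(net_with_logsoftmax(x), y)

print(loss_ce.item(), loss_nll.item())
```

Both options share the same `fc` weights, so the two losses should match up to floating-point error.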

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as autograd
import numpy as np

# Prediction f(x): sample values standing in for the network's output layer
inputs_tensor = torch.FloatTensor( [
 [10, 2, 1,-2,-3],
 [-1,-6,-0,-3,-5],
 [-5, 4, 8, 2, 1]
 ])

# Ground truth y
targets_tensor = torch.LongTensor([1,3,2])
# targets_tensor = torch.LongTensor([1])

inputs_variable = autograd.Variable(inputs_tensor, requires_grad=True) 
targets_variable = autograd.Variable(targets_tensor)
print('input tensor(nBatch x nClasses): {}'.format(inputs_tensor.shape))
print('target tensor shape: {}'.format(targets_tensor.shape))

input tensor(nBatch x nClasses): torch.Size([3, 5])
target tensor shape: torch.Size([3])

loss = nn.CrossEntropyLoss()
output = loss(inputs_variable, targets_variable)
# output.backward()
print('PyTorch built-in CrossEntropyLoss: {}'.format(output))

PyTorch built-in CrossEntropyLoss: Variable containing:
 3.7925
[torch.FloatTensor of size 1]
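On PyTorch 0.4 and later, `Variable` has been merged into `Tensor`, so the same computation can be written without `autograd.Variable`. The sketch below is an assumed modern equivalent rather than the original code; it also uses `reduction='none'` to expose the per-sample losses that the default `'mean'` reduction averages.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[10., 2., 1., -2., -3.],
                       [-1., -6., 0., -3., -5.],
                       [-5., 4., 8., 2., 1.]], requires_grad=True)
targets = torch.tensor([1, 3, 2])

# One loss value per sample, before any averaging.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, targets)
# Default reduction is 'mean': the average of the per-sample losses.
mean_loss = nn.CrossEntropyLoss()(logits, targets)

print(per_sample)
print(mean_loss.item())
```

The mean should come out close to the 3.7925 printed above.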

Manual computation

1. log softmax

# Compute log softmax by hand; the softmax values lie in [0, 1]
softmax_result = F.softmax(inputs_variable, dim=1)  # softmax over the class dimension
print('softmax_result (sum=1):{} \n'.format(softmax_result))
logsoftmax_result = np.log(softmax_result.data)  # natural log (base e); every value is below 0
print('manual calculate logsoftmax_result:{} \n'.format(logsoftmax_result))

# Call F.log_softmax directly and print its own result for comparison
builtin_logsoftmax_result = F.log_softmax(inputs_variable, dim=1)
print('F.log_softmax calculate logsoftmax_result:{} \n'.format(builtin_logsoftmax_result.data))

softmax_result (sum=1):Variable containing:
 9.9953e-01  3.3531e-04  1.2335e-04  6.1413e-06  2.2593e-06
 2.5782e-01  1.7372e-03  7.0083e-01  3.4892e-02  4.7221e-03
 2.2123e-06  1.7926e-02  9.7875e-01  2.4261e-03  8.9251e-04
[torch.FloatTensor of size 3x5]


manual calculate logsoftmax_result:
-4.6717e-04 -8.0005e+00 -9.0005e+00 -1.2000e+01 -1.3000e+01
-1.3555e+00 -6.3555e+00 -3.5549e-01 -3.3555e+00 -5.3555e+00
-1.3021e+01 -4.0215e+00 -2.1476e-02 -6.0215e+00 -7.0215e+00
[torch.FloatTensor of size 3x5]


F.log_softmax calculate logsoftmax_result:
-4.6717e-04 -8.0005e+00 -9.0005e+00 -1.2000e+01 -1.3000e+01
-1.3555e+00 -6.3555e+00 -3.5549e-01 -3.3555e+00 -5.3555e+00
-1.3021e+01 -4.0215e+00 -2.1476e-02 -6.0215e+00 -7.0215e+00
[torch.FloatTensor of size 3x5]
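Taking the log of a softmax can lose precision when some probabilities are tiny, which is one reason a fused log-softmax is usually preferred. The plain-Python sketch below shows the common log-sum-exp formulation with a max shift; the helper name `log_softmax` is illustrative and this is not PyTorch's actual implementation.

```python
import math

def log_softmax(row):
    """log softmax(x)_k = x_k - logsumexp(x), computed with the max-shift trick."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]

first_row = [10, 2, 1, -2, -3]   # first sample from inputs_tensor
result = log_softmax(first_row)
print(result)
```

The values should line up with the first row of the log softmax output printed above.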

2. Computing the loss manually

NLLLoss in PyTorch is defined as follows:

$$\mathrm{loss}(x, class) = -x[class]$$
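Read literally, the definition says: from each row of log-probabilities, pick the entry at the target class and negate it. A minimal sketch of that indexing, assuming PyTorch 0.4 or later and reusing the article's input values:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[10., 2., 1., -2., -3.],
                       [-1., -6., 0., -3., -5.],
                       [-5., 4., 8., 2., 1.]])
targets = torch.tensor([1, 3, 2])

log_probs = F.log_softmax(logits, dim=1)
# loss(x, class) = -x[class], applied row by row via integer indexing
picked = -log_probs[torch.arange(len(targets)), targets]
manual_nll = picked.mean()                     # average over the batch
builtin_nll = F.nll_loss(log_probs, targets)   # built-in NLL, mean reduction by default

print(picked, manual_nll.item(), builtin_nll.item())
```

The hand-rolled value and `F.nll_loss` should agree, and both should be close to the CrossEntropyLoss result computed earlier.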

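Why can it be written this way? Let us use the second sample (the second row of inputs_tensor, whose target class is 3) to explain.

After one-hot encoding, the true distribution $p_x$ (or $p_{model}$) over the 5 classes is: [0,0,0,1,0]

The probability the model predicts for each class, i.e. the non-true distribution $q_x$ (or $p_{pred}$), is: [2.5782e-01 1.7372e-03 7.0083e-01 3.4892e-02 4.7221e-03]. Note: probabilities must sum to 1, so these are the softmax results, not the log softmax results.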

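Then, according to the cross entropy: $-\sum_{k=1}^{N} p_k \log q_k$

or the negative log-likelihood (maximum likelihood): $-\sum_{i=1}^{m} \log p_{model}(y_i \mid x_i; \theta)$

multiplying the corresponding terms and summing them gives the final loss: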

$0 \times \log(2.5782 \cdot 10^{-1}) + 0 \times \log(1.7372 \cdot 10^{-3}) + 0 \times \log(\dots) + \dots$
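The term-by-term products can be checked numerically for all three samples. The sketch below is plain Python; the probabilities are copied from the printed softmax output, the targets from targets_tensor, and the helper `one_hot` is a hypothetical name introduced for illustration.

```python
import math

# Softmax probabilities q, copied from the printed softmax_result above.
q = [
    [9.9953e-01, 3.3531e-04, 1.2335e-04, 6.1413e-06, 2.2593e-06],
    [2.5782e-01, 1.7372e-03, 7.0083e-01, 3.4892e-02, 4.7221e-03],
    [2.2123e-06, 1.7926e-02, 9.7875e-01, 2.4261e-03, 8.9251e-04],
]
targets = [1, 3, 2]  # class indices from targets_tensor

def one_hot(index, n_classes):
    """Return a one-hot list with 1.0 at the given index."""
    return [1.0 if k == index else 0.0 for k in range(n_classes)]

per_sample = []
for q_row, t in zip(q, targets):
    p_row = one_hot(t, len(q_row))
    # -sum_k p_k * log(q_k): only the true-class term is non-zero
    per_sample.append(-sum(pk * math.log(qk) for pk, qk in zip(p_row, q_row)))

mean_loss = sum(per_sample) / len(per_sample)
print(per_sample, mean_loss)
```

Because p is one-hot, every product except the true-class one vanishes, which is why the loss reduces to $-x[class]$ on the log-probabilities. Averaging the three per-sample values should land close to the 3.7925 reported by CrossEntropyLoss.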
