A brief look at the dim argument of torch.max and F.softmax in PyTorch
When using torch.max and F.softmax, it is easy to get confused about which dim to pass, so here is a summary.
First, an example with a 2-D tensor:
import torch
import torch.nn.functional as F

input = torch.randn(3, 4)
print(input)
# tensor([[-0.5526, -0.0194,  2.1469, -0.2567],
#         [-0.3337, -0.9229,  0.0376, -0.0801],
#         [ 1.4721,  0.1181, -2.6214,  1.7721]])

b = F.softmax(input, dim=0)  # softmax over columns: each column sums to 1
print(b)
# tensor([[0.1018, 0.3918, 0.8851, 0.1021],
#         [0.1268, 0.1587, 0.1074, 0.1218],
#         [0.7714, 0.4495, 0.0075, 0.7762]])

c = F.softmax(input, dim=1)  # softmax over rows: each row sums to 1
print(c)
# tensor([[0.0529, 0.0901, 0.7860, 0.0710],
#         [0.2329, 0.1292, 0.3377, 0.3002],
#         [0.3810, 0.0984, 0.0064, 0.5143]])

d = torch.max(input, dim=0)  # max over columns
print(d)
# torch.return_types.max(
#     values=tensor([1.4721, 0.1181, 2.1469, 1.7721]),
#     indices=tensor([2, 2, 0, 2]))

e = torch.max(input, dim=1)  # max over rows
print(e)
# torch.return_types.max(
#     values=tensor([2.1469, 0.0376, 1.7721]),
#     indices=tensor([2, 2, 3]))
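A handy rule of thumb that the example above illustrates, stated here as a small sketch (not from the original post): the dim you pass is the dimension that gets "consumed" — softmax normalizes along it, max reduces it away.

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 4)

# softmax normalizes along the given dim: every row now sums to 1
row_sums = F.softmax(x, dim=1).sum(dim=1)

# max reduces the given dim away: one value per row, shape (3,)
max_vals = torch.max(x, dim=1).values

# dim=-1 is shorthand for the last dimension
same = torch.equal(F.softmax(x, dim=-1), F.softmax(x, dim=1))
```

The same rule carries over unchanged to tensors with more dimensions.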
Now an example with a 3-D tensor:
The softmax function turns the given tensor into a probability distribution along the chosen dimension. With a = torch.rand(3, 16, 20) and b = F.softmax(a, dim=0), b holds the distribution along dim 0, so for any fixed position, b[0][5][6] + b[1][5][6] + b[2][5][6] = 1.
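This sum-to-1 property can be verified directly (a quick sketch; the tensor entries are random):

```python
import torch
import torch.nn.functional as F

a = torch.rand(3, 16, 20)
b = F.softmax(a, dim=0)

# the three entries along dim 0 at any fixed position sum to 1
total = b[0][5][6] + b[1][5][6] + b[2][5][6]

# equivalently, summing out dim 0 leaves an all-ones (16, 20) tensor
ones = b.sum(dim=0)
```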
import torch as t
import torch.nn.functional as F

a = t.rand(3, 4, 5)
b = F.softmax(a, dim=0)   # normalizes across dim 0 (size 3)
c = F.softmax(a, dim=1)   # normalizes across dim 1 (size 4)
d = F.softmax(a, dim=2)   # normalizes across dim 2 (size 5)

b.sum()                               # tensor(20.) -- 4*5 = 20 groups, each summing to 1
b[0][0][0] + b[1][0][0] + b[2][0][0]  # tensor(1.)
c.sum()                               # tensor(15.) -- 3*5 = 15 groups
d.sum()                               # tensor(12.) -- 3*4 = 12 groups
d[0][0].sum()                         # tensor(1.)

n = t.rand(3, 4)
print(n)
# tensor([[0.2769, 0.3475, 0.8914, 0.6845],
#         [0.9251, 0.3976, 0.8690, 0.4510],
#         [0.8249, 0.1157, 0.3075, 0.3799]])

m = t.argmax(n, dim=0)  # index of the max in each column
print(m)  # tensor([1, 1, 0, 0])

p = t.argmax(n, dim=1)  # index of the max in each row
print(p)  # tensor([2, 0, 0])
A supplementary note: using torch.nn.Softmax for multi-class problems
Why bring this up? In my work I ran into a semantic-segmentation model whose prediction has 16 output feature maps, i.e. a 16-class problem.
The value of each pixel in a channel indicates how strongly that pixel belongs to the class of that channel. To render the classes in different colors on a single image, I had to learn how to use torch.nn.Softmax.
First, a simple example: suppose the output has shape (3, 4, 4), i.e. three 4x4 feature maps.
import torch

img = torch.rand((3, 4, 4))
print(img)
The output is:
tensor([[[0.0413, 0.8728, 0.8926, 0.0693],
         [0.4072, 0.0302, 0.9248, 0.6676],
         [0.4699, 0.9197, 0.3334, 0.4809],
         [0.3877, 0.7673, 0.6132, 0.5203]],

        [[0.4940, 0.7996, 0.5513, 0.8016],
         [0.1157, 0.8323, 0.9944, 0.2127],
         [0.3055, 0.4343, 0.8123, 0.3184],
         [0.8246, 0.6731, 0.3229, 0.1730]],

        [[0.0661, 0.1905, 0.4490, 0.7484],
         [0.4013, 0.1468, 0.2145, 0.8838],
         [0.0083, 0.5029, 0.0141, 0.8998],
         [0.8673, 0.2308, 0.8808, 0.0532]]])
There are three feature maps; at any given position, the larger the value in a map, the more likely that pixel belongs to the class of that map.
import torch.nn as nn

softmax = nn.Softmax(dim=0)
img = softmax(img)
print(img)
The output is:
tensor([[[0.2780, 0.4107, 0.4251, 0.1979],
         [0.3648, 0.2297, 0.3901, 0.3477],
         [0.4035, 0.4396, 0.2993, 0.2967],
         [0.2402, 0.4008, 0.3273, 0.4285]],

        [[0.4371, 0.3817, 0.3022, 0.4117],
         [0.2726, 0.5122, 0.4182, 0.2206],
         [0.3423, 0.2706, 0.4832, 0.2522],
         [0.3718, 0.3648, 0.2449, 0.3028]],

        [[0.2849, 0.2076, 0.2728, 0.3904],
         [0.3627, 0.2581, 0.1917, 0.4317],
         [0.2543, 0.2898, 0.2175, 0.4511],
         [0.3880, 0.2344, 0.4278, 0.2686]]])
The code above applies Softmax across the three feature maps at each pixel position: for any fixed position, the three values (one per map) sum to 1.
Softmax maps each pixel value into (0, 1) along the chosen dimension (here dim=0, the first dimension) while leaving the relative ordering of the values unchanged.
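Because softmax is strictly increasing, it preserves the ordering along the chosen dimension, so the winning channel at every pixel is the same before and after it is applied. A quick check (not from the original post):

```python
import torch
import torch.nn as nn

img = torch.rand((3, 4, 4))
soft = nn.Softmax(dim=0)(img)

# the winning channel at every pixel is unchanged by softmax
same_winner = torch.equal(img.argmax(dim=0), soft.argmax(dim=0))
```

In other words, if only the class map is needed, applying softmax first changes nothing about which class wins.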
print(torch.max(img,0))
The output is:
torch.return_types.max(
values=tensor([[0.4371, 0.4107, 0.4251, 0.4117],
        [0.3648, 0.5122, 0.4182, 0.4317],
        [0.4035, 0.4396, 0.4832, 0.4511],
        [0.3880, 0.4008, 0.4278, 0.4285]]),
indices=tensor([[1, 0, 0, 1],
        [0, 1, 1, 2],
        [0, 0, 1, 2],
        [2, 0, 2, 0]]))
Here the 3x4x4 tensor collapses to 4x4: for each pixel, values holds the maximum over the three channels, and indices holds the index of the channel it came from, i.e. the predicted class.
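When only the class map is needed, torch.argmax returns the indices directly; torch.max gives both fields. A small sketch:

```python
import torch

img = torch.rand((3, 4, 4))
values, indices = torch.max(img, dim=0)   # both shaped (4, 4)

# the indices field alone can also be obtained with argmax
same = torch.equal(indices, torch.argmax(img, dim=0))
```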
With the flow above understood, the real case is straightforward.
In the concrete case, the network output has shape 16x416x416.
import cv2
import numpy as np
import torch
import torch.nn as nn

output = torch.tensor(output)               # shape: 16 x 416 x 416
sm = nn.Softmax(dim=0)
output = sm(output)                         # per-pixel probabilities over the 16 classes
mask = torch.max(output, 0).indices.numpy() # 416 x 416 map of class indices

# one RGB color per class index (0-15)
colors = [
    (255, 255, 255), (255, 180, 0), (255, 180, 180), (255, 180, 255),
    (255, 255, 180), (255, 255, 0), (255, 0, 180), (255, 0, 255),
    (255, 0, 0), (180, 0, 0), (180, 255, 255), (180, 0, 180),
    (180, 0, 255), (180, 255, 180), (0, 180, 255), (0, 0, 0),
]

# add a third axis so each pixel holds an RGB triple
rgb_img = np.zeros((output.shape[1], output.shape[2], 3))
for i in range(len(mask)):
    for j in range(len(mask[0])):
        rgb_img[i][j] = colors[mask[i][j]]

cv2.imwrite('output.jpg', rgb_img)
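The nested per-pixel loop above can also be replaced with NumPy fancy indexing, which colors all pixels at once. A sketch using the same per-class color table, with a small random tensor standing in for the real 16x416x416 output:

```python
import numpy as np
import torch
import torch.nn as nn

output = torch.rand(16, 8, 8)   # stand-in for the real 16 x 416 x 416 output
mask = torch.max(nn.Softmax(dim=0)(output), 0).indices.numpy()

# the per-class color table as a (16, 3) array
colors = np.array([
    (255, 255, 255), (255, 180, 0), (255, 180, 180), (255, 180, 255),
    (255, 255, 180), (255, 255, 0), (255, 0, 180), (255, 0, 255),
    (255, 0, 0), (180, 0, 0), (180, 255, 255), (180, 0, 180),
    (180, 0, 255), (180, 255, 180), (0, 180, 255), (0, 0, 0),
], dtype=np.uint8)

# indexing a (16, 3) table with an (H, W) index map yields an (H, W, 3) image
rgb_img = colors[mask]
```

This produces the same image as the loop, and is much faster at 416x416.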
The colored mask is then saved as output.jpg.
That concludes this brief look at the dim argument of torch.max and F.softmax in PyTorch; I hope it serves as a useful reference.