Introductory notes on Lua, Torch, and the nn module

阿新 • Published: 2019-01-11

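Recently I noticed that many papers implement their neural networks in Lua on top of Torch, so I had little choice but to learn Lua, Torch and the nn module in order to understand their code.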

First, the tutorials and documentation:

The nn module

Simple operations

nn.Narrow()

This is the module in the nn standard library for extracting a sub-block of a tensor. The official definition is:
module = nn.Narrow(dimension, offset, length)
Usage:

> x = torch.rand(4, 5)
> x
 0.3695  0.2017  0.4485  0.4638  0.0513
 0.9222  0.1877  0.3388  0.6265  0.5659
 0.8785  0.7394  0.8265  0.9212  0.0129
 0.2290  0.7971  0.2113  0.1097  0.3166
[torch.DoubleTensor of size 4x5]

> nn.Narrow(1, 2, 3):forward(x)
 0.9222  0.1877  0.3388  0.6265  0.5659
 0.8785  0.7394  0.8265  0.9212  0.0129
 0.2290  0.7971  0.2113  0.1097  0.3166
[torch.DoubleTensor of size 3x5]

> nn.Narrow(1, 2, -1):forward(x)
 0.9222  0.1877  0.3388  0.6265  0.5659
 0.8785  0.7394  0.8265  0.9212  0.0129
 0.2290  0.7971  0.2113  0.1097  0.3166
[torch.DoubleTensor of size 3x5]

> nn.Narrow(1, 2, 2):forward(x)
 0.9222  0.1877  0.3388  0.6265  0.5659
 0.8785  0.7394  0.8265  0.9212  0.0129
[torch.DoubleTensor of size 2x5]

> nn.Narrow(1, 2, -2):forward(x)
 0.9222  0.1877  0.3388  0.6265  0.5659
 0.8785  0.7394  0.8265  0.9212  0.0129
[torch.DoubleTensor of size 2x5]

> nn.Narrow(2, 2, 3):forward(x)
 0.2017  0.4485  0.4638
 0.1877  0.3388  0.6265
 0.7394  0.8265  0.9212
 0.7971  0.2113  0.1097
[torch.DoubleTensor of size 4x3]

> nn.Narrow(2, 2, -2):forward(x)
 0.2017  0.4485  0.4638
 0.1877  0.3388  0.6265
 0.7394  0.8265  0.9212
 0.7971  0.2113  0.1097
[torch.DoubleTensor of size 4x3]

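As shown, dimension=1 selects rows (numbering starts at 1) and dimension=2 selects columns; offset is the index of the first row or column to take. A positive length is the number of rows or columns to take; a negative length means "up to the n-th row or column from the end", where -1 means the last one.

Note that with this kind of slicing in Lua/Torch, both the start and the end elements are included. As shown above, starting at row 2 includes row 2, and taking up to row -2 includes the second-to-last row. This differs from Python.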
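To make the offset/length rule concrete, here is a plain-Lua sketch that runs without Torch. The helpers narrow_range and narrow_rows are hypothetical names; they only reproduce the inclusive, 1-based index arithmetic inferred from the examples above (including the negative-length rule), not nn.Narrow's actual implementation.

```lua
-- Hypothetical helper: returns the inclusive, 1-based range [first, last]
-- selected along one dimension of size n by offset/length, following the
-- rule described above (a negative length counts back from the end).
local function narrow_range(n, offset, length)
  local first = offset
  local last
  if length > 0 then
    last = offset + length - 1
  else
    last = n + length + 1          -- length = -1 -> n, length = -2 -> n - 1
  end
  assert(first >= 1 and first <= last and last <= n, "range out of bounds")
  return first, last
end

-- Hypothetical helper: apply the range to the rows (dimension 1) of a
-- matrix stored as a Lua array of rows.
local function narrow_rows(rows, offset, length)
  local first, last = narrow_range(#rows, offset, length)
  local out = {}
  for i = first, last do
    out[#out + 1] = rows[i]
  end
  return out
end

-- Same arguments as the 4x5 example: rows (n = 4) and columns (n = 5)
print(narrow_range(4, 2, 3))   -- 2  4
print(narrow_range(4, 2, -2))  -- 2  3
print(narrow_range(5, 2, -2))  -- 2  4
```

Each result matches the sizes printed above: rows 2..4 give 3x5, rows 2..3 give 2x5, and columns 2..4 give 4x3.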

nn.CMulTable()

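The input to this module is a table of tensors (in Lua, lists and arrays are represented as tables), and the output is the component-wise product of all tensors in the table. Usage: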

ii = {torch.ones(5)*2, torch.ones(5)*3, torch.ones(5)*4}
m = nn.CMulTable()
=m:forward(ii)
 24
 24
 24
 24
 24
[torch.DoubleTensor of dimension 5]

nn.CAddTable() adds several tensors together, also component-wise.
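Both table modules reduce a table of equally sized tensors element by element. As a Torch-free illustration, the sketch below does the same on plain Lua arrays; reduce_table, cmul_table, cadd_table and fill are hypothetical names, and real tensors of any shape are replaced here by 1D arrays.

```lua
-- Hypothetical helper: combine equal-length arrays element by element with op.
local function reduce_table(vectors, op)
  local n = #vectors[1]
  local out = {}
  for i = 1, n do
    local acc = vectors[1][i]
    for k = 2, #vectors do
      assert(#vectors[k] == n, "all inputs must have the same size")
      acc = op(acc, vectors[k][i])
    end
    out[i] = acc
  end
  return out
end

-- Element-wise product, like nn.CMulTable
local function cmul_table(vectors)
  return reduce_table(vectors, function(a, b) return a * b end)
end

-- Element-wise sum, like nn.CAddTable
local function cadd_table(vectors)
  return reduce_table(vectors, function(a, b) return a + b end)
end

-- Array of length n filled with v (stand-in for torch.ones(n) * v)
local function fill(n, v)
  local t = {}
  for i = 1, n do t[i] = v end
  return t
end

local ii = {fill(5, 2), fill(5, 3), fill(5, 4)}
print(table.concat(cmul_table(ii), " "))  -- 24 24 24 24 24
print(table.concat(cadd_table(ii), " "))  -- 9 9 9 9 9
```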

nn.Identity()

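This module creates an input module that does nothing: it returns its input unchanged. It is commonly used as the input layer of a network. Usage: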

mlp = nn.Identity()
print(mlp:forward(torch.ones(5, 2)))

Output:

 1  1
 1  1
 1  1
 1  1
 1  1
[torch.Tensor of dimension 5x2]

This example from the official documentation may be puzzling. Let's look at an example from an LSTM instead:

local inputs = {}
table.insert(inputs, nn.Identity()())   -- network input
table.insert(inputs, nn.Identity()())   -- c at time t-1
table.insert(inputs, nn.Identity()())   -- h at time t-1
local input = inputs[1]
local prev_c = inputs[2]
local prev_h = inputs[3]

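Here each nn.Identity()() creates an input node (the first call builds the Identity module, the second call turns it into a graph node), and the nodes are stored in inputs; they hold no data yet. When the network is run, three tensors are passed in, one for each of the three Identity nodes, as in the forward call below: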

th> LSTM = require'LSTM.lua'
th> layer = LSTM.create(3, 2)
th> layer:forward({torch.randn(1,3), torch.randn(1,2), torch.randn(1,2)})
{
  1 : DoubleTensor - size: 1x2
  2 : DoubleTensor - size: 1x2
}

nn.View()

module = nn.View(sizes)

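This module creates a new view of the input tensor with the given sizes. sizes can be given either as numbers or as a LongStorage (an array-like object). The method setNumInputDims() specifies the expected number of dimensions of the inputs; combined with a size of -1 for one of the dimensions, this makes it possible to use minibatch inputs. The method resetSize(sizes) resets the view size after initialization.

Roughly speaking, the module reshapes its input into a tensor of the specified size. I came across it in a conv network: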

-- input is batch_size x length x input_size
nn.View(1, -1, input_size):setNumInputDims(2)(input)

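The code above reshapes each sample into 3 dimensions: the first of size 1, the second unspecified (-1, inferred from the input), and the last input_size. Because setNumInputDims(2) declares that one sample is 2-dimensional (length x input_size), the leading batch_size dimension of the 3D input is treated as the batch dimension: the -1 resolves to length and the output has size batch_size x 1 x length x input_size.
Still unclear? Here is the example from the official documentation: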

> x = torch.Tensor(4, 4)
> for i = 1, 4 do
>    for j = 1, 4 do
>       x[i][j] = (i-1)*4+j
>    end
> end
> print(x)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16
[torch.Tensor of dimension 4x4]

> print(nn.View(2, 8):forward(x))

  1   2   3   4   5   6   7   8
  9  10  11  12  13  14  15  16
[torch.DoubleTensor of dimension 2x8]

> print(nn.View(torch.LongStorage{8,2}):forward(x))

  1   2
  3   4
  5   6
  7   8
  9  10
 11  12
 13  14
 15  16
[torch.DoubleTensor of dimension 8x2]
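A view never changes the number of elements, which is how a single -1 can be inferred; the elements also keep their row-major order, which is why View(2, 8) puts 1 to 8 on the first row. The plain-Lua sketch below (resolve_view_sizes is a hypothetical name) shows only the size inference; it ignores the extra batch dimension that setNumInputDims introduces in the real module.

```lua
-- Hypothetical helper: resolve the requested view sizes for a tensor with
-- numel elements, inferring at most one dimension given as -1.
local function resolve_view_sizes(numel, sizes)
  local known, missing = 1, nil
  for i, s in ipairs(sizes) do
    if s == -1 then
      assert(missing == nil, "only one dimension may be -1")
      missing = i
    else
      known = known * s
    end
  end
  local out = {}
  for i, s in ipairs(sizes) do out[i] = s end
  if missing then
    assert(numel % known == 0, "sizes do not divide the number of elements")
    out[missing] = math.floor(numel / known)
  else
    assert(known == numel, "a view must keep the number of elements")
  end
  return out
end

-- The 4x4 example above has 16 elements
print(table.concat(resolve_view_sizes(16, {2, 8}), "x"))   -- 2x8
print(table.concat(resolve_view_sizes(16, {8, -1}), "x"))  -- 8x2
```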

nn.Squeeze()

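module = nn.Squeeze([dim, numInputDims])
or torch.squeeze(x [, dim])

This module does the same thing as torch.squeeze: it removes singleton dimensions (dimensions of size 1) from a tensor. If dim is given, only that dimension is squeezed. Here is the example from the official documentation: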

x=torch.rand(2,1,2,1,2)
> x
(1,1,1,.,.) =
  0.6020  0.8897

(2,1,1,.,.) =
  0.4713  0.2645

(1,1,2,.,.) =
  0.4441  0.9792

(2,1,2,.,.) =
  0.5467  0.8648
[torch.DoubleTensor of dimension 2x1x2x1x2]

At this point, x is shaped like this:

+-------------------------------+
| +---------------------------+ |
| | +-----------------------+ | |
| | |   0.6020  0.8897      | | |
| | +-----------------------+ | |
| | +-----------------------+ | |
| | |   0.4441  0.9792      | | |
| | +-----------------------+ | |
| +---------------------------+ |
|                               |
| +---------------------------+ |
| | +-----------------------+ | |
| | |   0.4713  0.2645      | | |
| | +-----------------------+ | |
| | +-----------------------+ | |
| | |   0.5467  0.8648      | | |
| | +-----------------------+ | |
| +---------------------------+ |
+-------------------------------+

Then squeeze it:

> torch.squeeze(x)
(1,.,.) =
  0.6020  0.8897
  0.4441  0.9792

(2,.,.) =
  0.4713  0.2645
  0.5467  0.8648
[torch.DoubleTensor of dimension 2x2x2]

Now the shape looks like this:

+-------------------------------+
|       0.6020  0.8897          |
|       0.4441  0.9792          |
+-------------------------------+
+-------------------------------+
|       0.4713  0.2645          |
|       0.5467  0.8648          |
+-------------------------------+
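In terms of shapes, squeeze only deletes entries equal to 1. The plain-Lua sketch below works on a shape given as a Lua array; squeeze_shape is a hypothetical name, and edge cases such as a tensor whose sizes are all 1 are not modeled.

```lua
-- Hypothetical helper: the shape produced by squeeze. Without dim, every
-- size-1 dimension is removed; with dim, only that dimension is removed,
-- and only if its size is 1.
local function squeeze_shape(shape, dim)
  local out = {}
  for i, s in ipairs(shape) do
    local drop = (s == 1) and (dim == nil or dim == i)
    if not drop then
      out[#out + 1] = s
    end
  end
  return out
end

print(table.concat(squeeze_shape({2, 1, 2, 1, 2}), "x"))     -- 2x2x2
print(table.concat(squeeze_shape({2, 1, 2, 1, 2}, 4), "x"))  -- 2x1x2x2
```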

nn.JoinTable()

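module = nn.JoinTable(dimension, nInputDims)

This creates a module that takes a table of tensors as input and outputs a single tensor obtained by concatenating them along dimension. The optional parameter nInputDims specifies the number of dimensions of the inputs the module will receive, so that both minibatch and non-minibatch tensors can be passed through the same module. The diagram below shows the case dimension=1 (inputs stacked along the first dimension):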

+----------+             +-----------+
| {input1, +-------------> output[1] |
|          |           +-----------+-+
|  input2, +-----------> output[2] |
|          |         +-----------+-+
|  input3} +---------> output[3] |
+----------+         +-----------+

Example:

x = torch.randn(5, 1)
y = torch.randn(5, 1)
z = torch.randn(2, 1)

print(nn.JoinTable(1):forward{x, y})
print(nn.JoinTable(2):forward{x, y})
print(nn.JoinTable(1):forward{x, z})

>
 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
 0.1575
 0.4491
 0.6580
 0.1784
-1.7362
[torch.DoubleTensor of dimension 10x1]

 1.3965  0.1575
 0.5146  0.4491
-1.5244  0.6580
-0.9540  0.1784
 0.4256 -1.7362
[torch.DoubleTensor of dimension 5x2]

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
-1.2660
 1.0869
[torch.Tensor of dimension 7x1]
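The shape rules behind these three outputs can be sketched in plain Lua for 2D matrices stored as arrays of rows. join2d is a hypothetical name and only handles dimension 1 or 2; nn.JoinTable itself works on tensors of any dimensionality.

```lua
-- Hypothetical helper: concatenate matrices (Lua arrays of rows) along
-- dimension 1 (stack rows) or dimension 2 (place side by side).
local function join2d(mats, dim)
  local out = {}
  if dim == 1 then
    -- all inputs need the same number of columns (not checked here)
    for _, m in ipairs(mats) do
      for _, row in ipairs(m) do
        local copy = {}
        for j, v in ipairs(row) do copy[j] = v end
        out[#out + 1] = copy
      end
    end
  elseif dim == 2 then
    -- all inputs need the same number of rows
    local nrows = #mats[1]
    for i = 1, nrows do
      local row = {}
      for _, m in ipairs(mats) do
        assert(#m == nrows, "row counts differ")
        for _, v in ipairs(m[i]) do row[#row + 1] = v end
      end
      out[i] = row
    end
  else
    error("this sketch only handles dim 1 or 2")
  end
  return out
end

-- Shapes mirror the example: 5x1 with 5x1 -> 10x1 or 5x2; 5x1 with 2x1 -> 7x1
local x = {{1}, {2}, {3}, {4}, {5}}
local y = {{6}, {7}, {8}, {9}, {10}}
local z = {{11}, {12}}
print(#join2d({x, y}, 1), #(join2d({x, y}, 2)[1]), #join2d({x, z}, 1))  -- 10  2  7
```

As in the Torch example, joining x and z along dimension 2 would fail, because their first dimensions differ.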

nn.BatchNormalization()

module = nn.BatchNormalization(N [, eps] [, momentum] [,affine])

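N is the dimensionality of the input. eps is a small value added to the standard deviation to avoid division by zero; it defaults to 1e-5. affine is a boolean; when it is false, the learnable affine transform (gamma and beta) is disabled. It defaults to true. momentum defaults to 0.1.

During training, the layer computes the mean and standard deviation of each batch and keeps a running estimate of them (updated with momentum); during evaluation, these running estimates are used for normalization.

The normalized value is: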

              x - mean(x)
y =  ----------------------------- * gamma + beta
      standard-deviation(x) + eps
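For a single feature, the formula can be sketched in plain Lua as below. This follows the formula exactly as written above; assumptions worth flagging are the biased (divide-by-n) standard deviation, eps added to the standard deviation rather than inside a square root, and the running-average update form, any of which may differ in the actual nn implementation. batchnorm_1d and update_running are hypothetical names.

```lua
-- Normalize one feature over a batch of scalars with the formula above.
-- Assumptions: biased (divide-by-n) standard deviation, and eps added to
-- the standard deviation exactly as written above.
local function batchnorm_1d(xs, gamma, beta, eps)
  eps = eps or 1e-5
  local n = #xs
  local mean = 0
  for _, x in ipairs(xs) do mean = mean + x end
  mean = mean / n
  local var = 0
  for _, x in ipairs(xs) do var = var + (x - mean) ^ 2 end
  local std = math.sqrt(var / n)
  local ys = {}
  for i, x in ipairs(xs) do
    ys[i] = (x - mean) / (std + eps) * gamma + beta
  end
  return ys, mean, std
end

-- Running estimate kept for test time, assuming the update
-- running = (1 - momentum) * running + momentum * batch_value.
local function update_running(running, batch_value, momentum)
  momentum = momentum or 0.1
  return (1 - momentum) * running + momentum * batch_value
end

local ys, mean, std = batchnorm_1d({1, 2, 3, 4}, 1, 0)
print(mean, std)                -- 2.5 and sqrt(1.25)
print(update_running(0, mean))  -- 0.1 * 2.5 after one step
```

With gamma = 1 and beta = 0 the outputs have (almost exactly) zero mean; gamma and beta then rescale and shift them, which is the part that affine = false disables.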

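nn.gModule()

nngraph is a package built on top of nn for building networks as directed acyclic graphs. After all nodes have been created, nn.gModule() assembles them into a graph.

module=nn.gModule(input,output)

Here input and output can each be a single node or a table of nodes. The call builds a graph from input to output. Each module created earlier becomes a node of this graph by being called a second time with its input node(s), as in nn.CAddTable()({x1, x2}).
Here is a simple example: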

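require 'nn'
require 'nngraph'
x1 = nn.Identity()()
x2 = nn.Identity()()
a = nn.CAddTable()({x1, x2})
m = nn.gModule({x1, x2}, {a})
print(m:forward({torch.ones(3), torch.ones(3)}))  -- element-wise sum of the two inputs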

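This example simply adds the two inputs element-wise and outputs the sum. There are 3 modules in total, corresponding to 3 graph nodes, two of which are input nodes. This produces the computation graph below; when m is called, it is evaluated along this graph and outputs the value of a.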

  x1        x2
   \        /
    \      /
   CAddTable
       |
       a
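To illustrate what "evaluated along this graph" means, here is a toy evaluator in plain Lua. It is not how nngraph is implemented, and node and evaluate are hypothetical names: each node stores a function and its parent nodes, and evaluation computes the parents first (memoizing results), which visits the DAG in dependency order.

```lua
-- Hypothetical graph node: a function plus the nodes that feed it.
local function node(fn, inputs)
  return {fn = fn, inputs = inputs or {}}
end

-- Evaluate node n; feeds maps input nodes to their values,
-- cache stores results so shared parents are computed once.
local function evaluate(n, feeds, cache)
  cache = cache or {}
  if cache[n] ~= nil then return cache[n] end
  local value
  if feeds[n] ~= nil then
    value = feeds[n]                        -- input node: value supplied by caller
  else
    assert(n.fn, "input node was not fed")
    local args = {}
    for i, parent in ipairs(n.inputs) do
      args[i] = evaluate(parent, feeds, cache)
    end
    value = n.fn(args)
  end
  cache[n] = value
  return value
end

-- x1 and x2 play the role of nn.Identity()() input nodes;
-- a adds them element-wise like nn.CAddTable()({x1, x2}).
local x1 = node(nil)
local x2 = node(nil)
local a = node(function(args)
  local out = {}
  for i = 1, #args[1] do out[i] = args[1][i] + args[2][i] end
  return out
end, {x1, x2})

local result = evaluate(a, {[x1] = {1, 2, 3}, [x2] = {10, 20, 30}})
print(table.concat(result, " "))  -- 11 22 33
```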

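Convolution

Let's start with the temporal convolution module.
Not counting the batch dimension, the input to a temporal convolution is a 2D tensor. The first dimension is the number of frames in the input sequence (e.g. nInputFrame), and the second is the number of features in each frame (e.g. inputFrameSize). This module is typically used for audio signals or word sequences, e.g. in Natural Language Processing.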

nn.TemporalConvolution()

module = nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW])

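Back to the module itself: it applies a 1D convolution over an input sequence made of nInputFrame frames, each a fixed-length vector. In forward(), the input is either a 2D tensor (nInputFrame x inputFrameSize) or a 3D tensor (nBatchFrame x nInputFrame x inputFrameSize).

The parameters are: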

inputFrameSize: The input frame size expected in sequences given into forward().
outputFrameSize: The output frame size the convolution layer will produce.
kW: The kernel width of the convolution
dW: The step of the convolution. Default is 1.

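Note: depending on the kernel size, the last few frames of the sequence may not fill a complete kernel window and are lost; it is up to you to pad the input sequences appropriately.

If the input is a 2D tensor (nInputFrame x inputFrameSize), the output of the convolution has size nOutputFrame x outputFrameSize, where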

nOutputFrame = (nInputFrame - kW) / dW + 1

If the input is a 3D tensor of size nBatchFrame x nInputFrame x inputFrameSize, the output has size nBatchFrame x nOutputFrame x outputFrameSize.
Here is a simple example:

inp=5;  -- dimensionality of one sequence element
outp=1; -- number of derived features for one sequence element
kw=1;   -- kernel only operates on one sequence element per step
dw=1;   -- we step once and go on to the next sequence element

mlp=nn.TemporalConvolution(inp,outp,kw,dw)

x=torch.rand(7,inp) -- a sequence of 7 elements
print(mlp:forward(x))
>
-0.9109
-0.9872
-0.6808
-0.9403
-0.9680
-0.6901
-0.6387
[torch.Tensor of dimension 7x1]
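The following plain-Lua sketch computes a valid (unpadded) 1D convolution and shows where nOutputFrame = (nInputFrame - kW) / dW + 1 comes from, with the division rounded down. temporal_conv and the weight layout weight[o][k][c] are assumptions for illustration, not nn's internal storage format.

```lua
-- Valid (unpadded) 1D convolution over a sequence of frames.
-- input:  array of nInputFrame frames, each an array of inputFrameSize numbers
-- weight: weight[o][k][c] for output feature o, kernel offset k, input feature c
-- bias:   bias[o], one entry per output feature (outputFrameSize = #bias)
local function temporal_conv(input, weight, bias, kW, dW)
  dW = dW or 1
  -- frames that do not fill a complete window at the end are dropped
  local nOutputFrame = math.floor((#input - kW) / dW) + 1
  local out = {}
  for t = 1, nOutputFrame do
    local start = (t - 1) * dW            -- 0-based offset of this window
    local frame = {}
    for o = 1, #bias do
      local s = bias[o]
      for k = 1, kW do
        local f = input[start + k]
        for c = 1, #f do
          s = s + weight[o][k][c] * f[c]
        end
      end
      frame[o] = s
    end
    out[t] = frame
  end
  return out
end

-- 7 single-feature frames; a width-2 kernel that sums neighbouring frames, stride 2
local frames = {}
for i = 1, 7 do frames[i] = {i} end
local result = temporal_conv(frames, {{{1}, {1}}}, {0}, 2, 2)
print(#result, result[1][1], result[3][1])  -- 3  3  11
```

Here the 7th frame is never used: (7 - 2) / 2 rounds down to 2, so only 3 windows fit. With kW = dW = 1, as in the example above, every frame maps to exactly one output frame.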

nn.TemporalMaxPooling()

module = nn.TemporalMaxPooling(kW, [dW])

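As the name suggests, this module applies max pooling along one dimension of the input sequence. Each pooling window has size kW and the stride is dW.
If the input is a 2D tensor (nInputFrame x inputFrameSize), the output after pooling has size nOutputFrame x inputFrameSize, where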

nOutputFrame = (nInputFrame - kW) / dW + 1
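Pooling works on each feature independently, which is why the output keeps inputFrameSize features per frame. A plain-Lua sketch (temporal_max_pool is a hypothetical name; dW must be passed explicitly here, since the sketch does not assume a default):

```lua
-- 1D max pooling over frames: window kW, stride dW, applied per feature.
local function temporal_max_pool(input, kW, dW)
  local nOutputFrame = math.floor((#input - kW) / dW) + 1
  local out = {}
  for t = 1, nOutputFrame do
    local start = (t - 1) * dW
    local frame = {}
    for c = 1, #input[1] do
      local m = -math.huge
      for k = 1, kW do
        m = math.max(m, input[start + k][c])
      end
      frame[c] = m
    end
    out[t] = frame
  end
  return out
end

-- 4 frames of 2 features, non-overlapping windows of 2
local pooled = temporal_max_pool({{1, 9}, {3, 2}, {5, 4}, {0, 8}}, 2, 2)
print(pooled[1][1], pooled[1][2], pooled[2][1], pooled[2][2])  -- 3  9  5  8
```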

nn.SpatialConvolution()

module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH])
or  cudnn.SpatialConvolution(nInputPlane, nOutputPlane, width, height, [dW = 1], [dH = 1], [padW = 0], [padH = 0],[groups=1])

Applies a 2D convolution over an input image composed of several input planes. The expected input size is (nInputPlane x height x width).
The parameters are:

nInputPlane: The number of expected input planes in the image given into forward().
nOutputPlane: The number of output planes the convolution layer will produce.
kW: The kernel width of the convolution
kH: The kernel height of the convolution
dW: The step of the convolution in the width dimension. Default is 1.
dH: The step of the convolution in the height dimension. Default is 1.
padW: The additional zeros added per width to the input planes. Default is 0, a good number is (kW-1)/2.
padH: The additional zeros added per height to the input planes. Default is padW, a good number is (kH-1)/2.

If the input is a 3D tensor (nInputPlane x height x width), the output has size nOutputPlane x oheight x owidth, where

owidth  = floor((width  + 2*padW - kW) / dW + 1)
oheight = floor((height + 2*padH - kH) / dH + 1)
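The two formulas are easy to check by hand; the small plain-Lua helper below (conv2d_output_size is a hypothetical name) applies them with the defaults from the parameter list above, including padH defaulting to padW.

```lua
-- Output width/height of a 2D convolution, using the formulas above.
-- Defaults follow the parameter list: dW = dH = 1, padW = 0, padH = padW.
local function conv2d_output_size(width, height, kW, kH, dW, dH, padW, padH)
  dW = dW or 1
  dH = dH or 1
  padW = padW or 0
  padH = padH or padW
  local owidth  = math.floor((width  + 2 * padW - kW) / dW + 1)
  local oheight = math.floor((height + 2 * padH - kH) / dH + 1)
  return owidth, oheight
end

print(conv2d_output_size(32, 32, 5, 5))            -- 28  28
print(conv2d_output_size(32, 32, 3, 3, 1, 1, 1))   -- 32  32 (pad (k-1)/2 keeps the size)
print(conv2d_output_size(7, 5, 3, 3, 2, 2, 1, 1))  -- 4  3
```

The second call illustrates why the parameter list suggests (kW-1)/2 as padding: with stride 1 and an odd kernel, the output keeps the input size.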

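So the difference from temporal convolution is that the kernel slides in two dimensions (height and width) over several input planes, instead of along a single sequence axis. In practice the cudnn version is often used for speed, since it runs on the GPU.

There is likewise cudnn.SpatialMaxPooling(); its kernel-size, stride and padding parameters follow the same pattern as the spatial convolution, and it does in 2D what temporal max pooling does in 1D, so I won't go into details.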

nn.LookupTable()

module = nn.LookupTable(nIndex, size, [paddingValue], [maxNorm], [normType])

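This layer maps each input index to its corresponding vector. When calling forward(input), it assumes the input is a 1D or 2D tensor filled with indices. If the input is a matrix, each row is treated as one sample of the batch. Indices start at 1 and go up to nIndex. For each index, it outputs the corresponding vector of length size.

If some inputs occur much more frequently than others, LookupTable can be very slow; this often happens with zero-padded inputs. During backpropagation a separate thread handles each input index, which becomes a bottleneck for frequent inputs. For a 1D input of n indices, the output vectors are concatenated into an n x size1 x size2 x … x sizeN tensor (here simply n x size).
Example: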

-- a lookup table containing 10 tensors of size 3
 module = nn.LookupTable(10, 3)

 input = torch.Tensor{1,2,1,10}
 print(module:forward(input))
>
-1.4415 -0.1001 -0.1708
-0.6945 -0.4350  0.7977
-1.4415 -0.1001 -0.1708
-0.0745  1.9275  1.0915
[torch.DoubleTensor of dimension 4x3]

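If the input is 2D, each row is treated as one sample of the batch:

-- a new lookup table: its randomly initialized vectors differ from the previous example
module = nn.LookupTable(10, 3)

-- a batch of 2 samples of 4 indices each
input = torch.Tensor({{1,2,4,5},{4,3,2,10}})
print(module:forward(input))
>
(1,.,.) =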
 -0.0570 -1.5354  1.8555
 -0.9067  1.3392  0.6275
  1.9662  0.4645 -0.8111
  0.1103  1.7811  1.5969

(2,.,.) =
  1.9662  0.4645 -0.8111
  0.0026 -1.4547 -0.5154
 -0.9067  1.3392  0.6275
 -0.0193 -0.8641  0.7396
[torch.DoubleTensor of dimension 2x4x3]
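Stripped of training, a lookup table is just indexing into a matrix of vectors. The plain-Lua sketch below (lookup is a hypothetical name, and the small fixed weight table stands in for nn.LookupTable's learned, randomly initialized one) handles both the 1D and the batched 2D case.

```lua
-- Row i of weight is the vector stored for index i (indices start at 1).
local function lookup(weight, input)
  local function row_copy(i)
    assert(weight[i], "index out of range: " .. tostring(i))
    local r = {}
    for j, v in ipairs(weight[i]) do r[j] = v end
    return r
  end
  local out = {}
  if type(input[1]) == "table" then
    -- 2D input: each row is one sample of the batch
    for b, indices in ipairs(input) do
      out[b] = {}
      for t, i in ipairs(indices) do out[b][t] = row_copy(i) end
    end
  else
    -- 1D input: one vector per index
    for t, i in ipairs(input) do out[t] = row_copy(i) end
  end
  return out
end

-- A tiny fixed table of 3 vectors of size 2
local weight = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}
local single = lookup(weight, {1, 3, 1})          -- 3 x 2
local batch = lookup(weight, {{1, 2}, {3, 3}})    -- 2 x 2 x 2
print(single[2][1], batch[2][1][2])               -- 0.5  0.6
```

As in the Torch output above, a repeated index (1 here) always yields the same vector.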

Criterion

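This package contains a variety of cost (loss) functions for training. The most commonly used ones include cross-entropy and negative log-likelihood.

nn.ClassNLLCriterion()

In classification problems, the negative log-likelihood criterion is very commonly used.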

criterion = nn.ClassNLLCriterion([weights])

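Typically this criterion follows a log-softmax layer in an n-class classification problem, i.e. its input is assumed to contain log-probabilities. If you don't want to add a log-softmax layer, use the CrossEntropyCriterion described below. If a weights vector is given, the loss for each class is multiplied by the corresponding weight. The loss can be written as: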

loss(x, class) = -x[class]

If weights are given, the loss is:

loss(x, class) = -weights[class] * x[class]
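Per sample, both formulas amount to picking one entry of the log-probability vector. A plain-Lua sketch (class_nll is a hypothetical name; it covers a single sample only, not how the criterion combines a minibatch):

```lua
-- Per-sample negative log-likelihood, as in the formulas above.
-- x:       array of log-probabilities (e.g. the output of a log-softmax)
-- class:   1-based target class
-- weights: optional array of per-class weights
local function class_nll(x, class, weights)
  local w = 1
  if weights then w = weights[class] end
  return -w * x[class]
end

-- Log-probabilities of a 3-class prediction
local logp = {math.log(0.2), math.log(0.5), math.log(0.3)}
print(class_nll(logp, 2))             -- -log(0.5), about 0.693
print(class_nll(logp, 2, {1, 2, 1}))  -- twice that
```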

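Training a simple model

Using this criterion as an example, the code below shows how to train an mlp network and update its parameters. The input is x, the output is pred, and the desired output is y: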

function gradUpdate(mlp, x, y, learningRate)
   local criterion = nn.ClassNLLCriterion()
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   mlp:zeroGradParameters()
   local t = criterion:backward(pred, y)
   mlp:backward(x, t)
   mlp:updateParameters(learningRate)
end

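In Torch, a network is called a module (Module). A module has four main methods: forward(), backward(), zeroGradParameters() and updateParameters(). They are used as follows:

forward(input): given the input, computes the module's output.
backward(input, gradOutput): given the input and the gradient of the loss with respect to the module's output, computes (and accumulates) the gradients with respect to the module's parameters, and returns the gradient with respect to its input.
zeroGradParameters(): resets the accumulated gradients of the module's parameters to 0.
updateParameters(learningRate): after backward, updates the parameters with the accumulated gradients (parameters = parameters - learningRate * gradParameters).

To summarize the training procedure above: first call mlp:forward() to compute pred; then criterion:forward(pred, y) computes the error (this step is optional, as it only yields the loss value); then criterion:backward(pred, y) computes the gradient t at the output layer; mlp:backward(x, t) computes the gradients of the whole network; finally mlp:updateParameters() updates all of the network's parameters.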

**Note: you must call forward() before calling backward(); otherwise the computed gradients will be wrong!**
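The call order can be exercised without Torch using a toy module that exposes the same four methods. Everything here is hypothetical: Linear is a one-weight scalar model (not nn.Linear), and the criterion is swapped to a squared error (MSE) so that the example stays in plain Lua; only the sequence of calls in gradUpdate mirrors the Torch code above.

```lua
-- Hypothetical scalar module y = w * x + b with the four-method interface.
local Linear = {}
Linear.__index = Linear

function Linear.new()
  return setmetatable({w = 0, b = 0, gradW = 0, gradB = 0}, Linear)
end

function Linear:forward(x)
  self.output = self.w * x + self.b
  return self.output
end

function Linear:backward(x, gradOutput)
  -- gradients accumulate until zeroGradParameters() is called
  self.gradW = self.gradW + gradOutput * x
  self.gradB = self.gradB + gradOutput
  return gradOutput * self.w          -- gradient with respect to the input
end

function Linear:zeroGradParameters()
  self.gradW, self.gradB = 0, 0
end

function Linear:updateParameters(learningRate)
  self.w = self.w - learningRate * self.gradW
  self.b = self.b - learningRate * self.gradB
end

-- Hypothetical squared-error criterion with forward/backward
local MSE = {}
function MSE.forward(pred, y) return (pred - y) ^ 2 end
function MSE.backward(pred, y) return 2 * (pred - y) end

-- Same call order as gradUpdate above
local function gradUpdate(mlp, x, y, learningRate)
  local pred = mlp:forward(x)
  local err = MSE.forward(pred, y)
  mlp:zeroGradParameters()
  local t = MSE.backward(pred, y)
  mlp:backward(x, t)
  mlp:updateParameters(learningRate)
  return err
end

-- Fit y = 2x + 1 from four samples
local model = Linear.new()
for _ = 1, 500 do
  for _, x in ipairs({-1, 0, 1, 2}) do
    gradUpdate(model, x, 2 * x + 1, 0.05)
  end
end
print(model.w, model.b)  -- close to 2 and 1
```

Because backward accumulates into gradW and gradB, skipping zeroGradParameters() would mix gradients from previous samples into each update.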

nn.CrossEntropyCriterion()

This cross-entropy criterion combines LogSoftMax and ClassNLLCriterion.

criterion = nn.CrossEntropyCriterion([weights])

weights plays the same role as in ClassNLLCriterion above; only the loss function differs:

loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
               = -x[class] + log(\sum_j exp(x[j]))
