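Implementing Pointer Networks in PyTorch

阿新 • Published: 2018-11-29

A simple Pointer Network implementation in PyTorch

Parts of the code follow this GitHub repository and this blog post. This is purely a personal exercise in imitation. Environment:
- python3
- pytorch 0.4.0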

Pointer Networks

Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.

[ Pointer Networks ]

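As I understand it, a Pointer Network is a variant of attention. Standard attention takes a weighted sum of all encoder outputs and maps it to a probability for every word in a fixed output dictionary, which cannot handle an output vocabulary whose size changes with the input length. What we want instead is to map the decoder output at a given time step, together with all encoder outputs, to a probability distribution over the positions of the input sequence. This still takes the context into account (all encoder outputs, as in attention), but more importantly it yields a probability for each input position, so the model makes full use of the positional information in the input. Unlike the words of a sentence, the input elements of some problems may be unrelated to one another, and a conventional seq2seq model may not handle such inputs well.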
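To make the idea concrete, here is a minimal sketch in plain Python (no PyTorch needed) of how one decoder step could be turned into a distribution over input positions. It assumes the additive scoring form score_j = v · relu(W1 e_j + W2 d) with ReLU as the nonlinearity; the function name `pointer_distribution`, the hidden size and the random weights are all made up for illustration.

```python
import math
import random


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Return one probability per input position.

    score_j = v . relu(W1 @ e_j + W2 @ d), followed by a softmax over j.
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    projected_decoder = matvec(W2, decoder_state)
    scores = []
    for state in encoder_states:
        projected_encoder = matvec(W1, state)
        hidden = [max(0.0, a + b) for a, b in zip(projected_encoder, projected_decoder)]
        scores.append(sum(w * h for w, h in zip(v, hidden)))

    # Numerically stable softmax over the input positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
H = 4  # hidden size, chosen arbitrarily for this demo


def random_vector(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical, untrained weights: only the shapes matter here.
W1 = [random_vector(H) for _ in range(H)]
W2 = [random_vector(H) for _ in range(H)]
v = random_vector(H)

results = {}
for length in (5, 8):
    encoder_states = [random_vector(H) for _ in range(length)]
    decoder_state = random_vector(H)
    results[length] = pointer_distribution(encoder_states, decoder_state, W1, W2, v)
    print(length, len(results[length]))
```

The point of the sketch is only that the output has as many entries as the input has positions, so the "dictionary" grows and shrinks with the input.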

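Data format

Following those references, I wrote a seq2seq model with a pointer network whose task is to find the boundaries where the values of a sequence rise and fall. There are two boundaries: the left segment has values from 1 to 5, the middle segment from 6 to 10, and the right segment from 1 to 5 again. Each segment is 5 to 10 elements long, so the two boundary positions are not fixed. The maximum sequence length is 30, and shorter sequences are padded with 0. The target is the pair of 0-based indices of the first and last elements of the middle segment, for example (padding omitted):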

input	target
[1,1,5,4,1,6,9,10,8,6,3,2,1]	[5, 9]
[2,3,4,1,4,3,7,8,6,7,9,10,6,2,5,4,2,4,1]	[6, 12]

import random

def generate_single_seq(length=30, min_len=5, max_len=10):
    # Three segments: low values (1-5), high values (6-10), low values (1-5).
    seq_before = [random.randint(1, 5) for _ in range(random.randint(min_len, max_len))]
    seq_during = [random.randint(6, 10) for _ in range(random.randint(min_len, max_len))]
    seq_after = [random.randint(1, 5) for _ in range(random.randint(min_len, max_len))]
    seq = seq_before + seq_during + seq_after
    # Pad with 0 up to the fixed length.
    seq = seq + [0] * (length - len(seq))
    # Targets: 0-based indices of the first and last element of the middle segment.
    return seq, len(seq_before), len(seq_before) + len(seq_during) - 1
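As a usage sketch, the generator can be wrapped to build a whole dataset and to check that each target really marks the edges of the middle segment. The helper `make_dataset` and its `seed` parameter are hypothetical additions; `generate_single_seq` is repeated so the snippet runs on its own.

```python
import random


def generate_single_seq(length=30, min_len=5, max_len=10):
    seq_before = [random.randint(1, 5) for _ in range(random.randint(min_len, max_len))]
    seq_during = [random.randint(6, 10) for _ in range(random.randint(min_len, max_len))]
    seq_after = [random.randint(1, 5) for _ in range(random.randint(min_len, max_len))]
    seq = seq_before + seq_during + seq_after
    seq = seq + [0] * (length - len(seq))
    return seq, len(seq_before), len(seq_before) + len(seq_during) - 1


def make_dataset(n, seed=None):
    """Hypothetical helper: return (inputs, targets) for n random sequences."""
    random.seed(seed)
    inputs, targets = [], []
    for _ in range(n):
        seq, start, end = generate_single_seq()
        inputs.append(seq)
        targets.append([start, end])
    return inputs, targets


inputs, targets = make_dataset(100, seed=0)
seq, (start, end) = inputs[0], targets[0]
print(seq)
print(start, end, seq[start:end + 1])  # the slice is the middle segment
```

Every sequence has length 30, and the elements just outside `start` and `end` belong to the low-valued segments.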

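The seq2seq model

Here I put the encoder and decoder in a single module. The decoder runs a GRUCell once per target position. During training each decoder step receives the ground-truth target as input (teacher forcing); at test time it receives the model's own previous prediction. Note that the output of every step is mapped to log-probabilities over the positions of the input sequence, with shape (B, L).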

import torch
import torch.nn as nn
import torch.nn.functional as F


class PtrNet(nn.Module):
    def __init__(self, input_dim, output_dim, embedding_dim, hidden_dim):
        super().__init__()
        self.input_dim = input_dim          # size of the input vocabulary
        self.output_dim = output_dim        # padded input length L (positions to point at)
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim

        self.encoder_embedding = nn.Embedding(input_dim, embedding_dim)
        self.decoder_embedding = nn.Embedding(output_dim, embedding_dim)
        self.encoder = nn.GRU(embedding_dim, hidden_dim)
        self.decoder = nn.GRUCell(embedding_dim, hidden_dim)

        # Additive pointer scoring: v(relu(W1 * encoder_output + W2 * decoder_hidden))
        self.W1 = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, inputs, targets):
        # inputs: (L, B) token ids; targets: (T, B) target positions
        batch_size = inputs.size(1)
        max_len = targets.size(0)
        embedded = self.encoder_embedding(inputs)            # (L, B, E)
        targets = self.decoder_embedding(targets)            # (T, B, E)
        encoder_outputs, hidden = self.encoder(embedded)     # (L, B, H), (1, B, H)
        # output_dim must equal L for the assignment inside the loop to work.
        decoder_outputs = torch.zeros(max_len, batch_size, self.output_dim, device=inputs.device)
        decoder_input = torch.zeros(batch_size, self.embedding_dim, device=inputs.device)
        hidden = hidden.squeeze(0)                           # (B, H)
        projection1 = self.W1(encoder_outputs)               # (L, B, H), identical at every step
        for i in range(max_len):
            hidden = self.decoder(decoder_input, hidden)     # (B, H)
            projection2 = self.W2(hidden)                    # (B, H), broadcast over L
            scores = self.v(F.relu(projection1 + projection2)).squeeze(-1)  # (L, B)
            output = F.log_softmax(scores.transpose(0, 1), -1)             # (B, L)
            decoder_outputs[i] = output
            decoder_input = targets[i]                       # teacher forcing

        return decoder_outputs


    def predict(self, inputs, max_trg_len):
        # inputs: (L, B) token ids; decodes greedily for max_trg_len steps
        batch_size = inputs.size(1)
        embedded = self.encoder_embedding(inputs)            # (L, B, E)
        encoder_outputs, hidden = self.encoder(embedded)     # (L, B, H), (1, B, H)
        decoder_outputs = torch.zeros(max_trg_len, batch_size, self.output_dim, device=inputs.device)
        decoder_input = torch.zeros(batch_size, self.embedding_dim, device=inputs.device)
        hidden = hidden.squeeze(0)                           # (B, H)
        projection1 = self.W1(encoder_outputs)               # (L, B, H), identical at every step
        for i in range(max_trg_len):
            hidden = self.decoder(decoder_input, hidden)     # (B, H)
            projection2 = self.W2(hidden)                    # (B, H), broadcast over L
            scores = self.v(F.relu(projection1 + projection2)).squeeze(-1)  # (L, B)
            output = F.log_softmax(scores.transpose(0, 1), -1)             # (B, L)
            decoder_outputs[i] = output
            # Feed the most likely position back in as the next decoder input.
            _, indices = torch.max(output, 1)
            decoder_input = self.decoder_embedding(indices)

        return decoder_outputs
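The post does not show the loss function. Because the model returns log-softmax outputs, a negative log-likelihood loss (what `torch.nn.NLLLoss` computes in PyTorch) would be a natural fit, but that pairing is an assumption rather than something stated here. The plain-Python sketch below shows the quantity in question; `sequence_nll` is a made-up name and the nested-list layout mirrors the (T, B, L) output shape.

```python
import math


def sequence_nll(log_probs, targets):
    """Sum of negative log-likelihoods over decoding steps and batch items.

    log_probs[t][b] is a list of L log-probabilities (one per input position);
    targets[t][b] is the index of the correct position at step t.
    """
    total = 0.0
    for step_log_probs, step_targets in zip(log_probs, targets):
        for distribution, index in zip(step_log_probs, step_targets):
            total -= distribution[index]
    return total


# Toy case: 2 decoding steps, batch of 1, input length 4, uniform predictions.
uniform = [math.log(0.25)] * 4
log_probs = [[uniform], [uniform]]
targets = [[1], [3]]
loss = sequence_nll(log_probs, targets)
print(loss)  # equals 2 * log(4) for uniform predictions
```

A model that puts all its probability on the correct position at every step would drive this sum toward 0.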

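Test results

I used 9,000 training samples and 1,000 test samples. I also compared against a basic seq2seq model with and without attention, and found that the basic seq2seq models are hard to converge, needing as many as 100 to 300 epochs to reach high accuracy. The pointer network converges quickly: the loss falls close to 0, and after only about 20 epochs the accuracy reaches 100%. The loss values are listed below. The loss drops sharply after the first epoch, which is the most obvious difference.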

epoch: 0 | total loss: 86.9745
epoch: 1 | total loss: 0.3416
epoch: 2 | total loss: 0.0915
epoch: 3 | total loss: 0.0412
epoch: 4 | total loss: 0.0231
epoch: 5 | total loss: 0.0147
epoch: 6 | total loss: 0.0101
epoch: 7 | total loss: 0.0073
epoch: 8 | total loss: 0.0055
epoch: 9 | total loss: 0.0043
epoch: 10 | total loss: 0.0034
epoch: 11 | total loss: 0.0028
epoch: 12 | total loss: 0.0023
epoch: 13 | total loss: 0.0019
epoch: 14 | total loss: 0.0016
epoch: 15 | total loss: 0.0014
epoch: 16 | total loss: 0.0012
epoch: 17 | total loss: 0.0010
epoch: 18 | total loss: 0.0009
epoch: 19 | total loss: 0.0008
epoch: 20 | total loss: 0.0007

Acc: 100.00% (1000/1000)
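The post does not say how accuracy was defined. One plausible reading, given the per-sample count in the output above, is that a sample counts as correct only when both predicted boundaries match the target. The sketch below implements that assumption; the function name and the toy data are illustrative only.

```python
def exact_match_accuracy(predictions, targets):
    """Count samples whose predicted boundary pair equals the target pair."""
    correct = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    total = len(targets)
    return correct, total, 100.0 * correct / total


# Toy data: the third prediction gets the right boundary wrong.
predictions = [[5, 9], [6, 12], [5, 10]]
targets = [[5, 9], [6, 12], [5, 9]]
correct, total, acc = exact_match_accuracy(predictions, targets)
print("Acc: {:.2f}% ({}/{})".format(acc, correct, total))  # Acc: 66.67% (2/3)
```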

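Finally, here is a comparison of the loss of the three models over 100 epochs:
[Figure: loss curves of the three models over 100 epochs]
The complete code is available here.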
