
A PyTorch BiLSTM+CRF implementation of sequence labeling

It took me some effort to understand the CRF, so I annotated the hard-to-follow spots and revisit them every so often to reinforce the memory. The code is the example from the PyTorch documentation.

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim

def to_scalar(var):  # var is a Variable holding a single element
    # returns a python scalar
    return var.view(-1).data.tolist()[0]

def argmax(vec):
    # return the argmax as a python int
    _, idx = torch.max(vec, 1)
    return to_scalar(idx)

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    tensor = torch.LongTensor(idxs)
    return autograd.Variable(tensor)

# Compute log sum exp in a numerically stable way for the forward algorithm
def log_sum_exp(vec):  # vec is 1*5, a Variable
    # max_score has dimension 1; max_score.view(1, -1) is 1*1;
    # max_score.view(1, -1).expand(1, vec.size()[1]) is 1*5
    max_score = vec[0, argmax(vec)]
    max_score_broadcast = max_score.view(1, -1).expand(1, vec.size()[1])  # vec has size 1*5
    # why subtract the max, exponentiate, sum, and only then take the log? see the note below
    return max_score + torch.log(torch.sum(torch.exp(vec - max_score_broadcast)))
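To answer the question in the comment: log_sum_exp relies on the identity log Σᵢ exp(xᵢ) = m + log Σᵢ exp(xᵢ − m), which holds for any constant m. Choosing m as the maximum keeps every exponent at or below 0, so exp never overflows. A minimal sketch (my addition, not part of the original example) showing why the naive version fails on large scores:

import torch

scores = torch.Tensor([[1000., 1000., 1000.]])  # CRF path scores can get this large
naive = torch.log(torch.exp(scores).sum())       # exp(1000) overflows to inf
m = scores.max()
stable = m + torch.log(torch.exp(scores - m).sum())
print(naive)   # inf
print(stable)  # 1001.0986 = 1000 + log(3)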
class BiLSTM_CRF(nn.Module):

    def __init__(self, vocab_size, tag_to_ix, embedding_dim, hidden_dim):
        super(BiLSTM_CRF, self).__init__()
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.tag_to_ix = tag_to_ix
        self.tagset_size = len(tag_to_ix)

        self.word_embeds = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            num_layers=1, bidirectional=True)

        # Maps the output of the LSTM into tag space.
        self.hidden2tag = nn.Linear(hidden_dim, self.tagset_size)

        # Matrix of transition parameters. Entry i,j is the score of
        # transitioning *to* i *from* j.
        self.transitions = nn.Parameter(torch.randn(self.tagset_size, self.tagset_size))

        # These two statements enforce the constraint that we never transfer
        # to the start tag and we never transfer from the stop tag
        self.transitions.data[tag_to_ix[START_TAG], :] = -10000
        self.transitions.data[:, tag_to_ix[STOP_TAG]] = -10000

        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (autograd.Variable(torch.randn(2, 1, self.hidden_dim // 2)),
                autograd.Variable(torch.randn(2, 1, self.hidden_dim // 2)))

    # total score over all possible tag sequences (the partition function)
    def _forward_alg(self, feats):
        # Do the forward algorithm to compute the partition function
        init_alphas = torch.Tensor(1, self.tagset_size).fill_(-10000.)
        # START_TAG has all of the score.
        init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

        # Wrap in a variable so that we will get automatic backprop
        forward_var = autograd.Variable(init_alphas)  # forward_var for the initial state; updated at every step t

        # Iterate through the sentence
        for feat in feats:  # feat has dimension 5
            alphas_t = []  # The forward variables at this timestep
            for next_tag in range(self.tagset_size):
                # broadcast the emission score: it is the same regardless of
                # the previous tag
                emit_score = feat[next_tag].view(1, -1).expand(1, self.tagset_size)  # dimension 1*5
                # the ith entry of trans_score is the score of transitioning to
                # next_tag from i
                trans_score = self.transitions[next_tag].view(1, -1)  # dimension 1*5
                # The ith entry of next_tag_var is the value for the
                # edge (i -> next_tag) before we do log-sum-exp.
                # On the first iteration: trans_score holds the score of every
                # other tag transitioning to tag B, and emit_score is tag B's
                # score from the LSTM output layer, a 1*5 vector whose 5 values
                # are identical.
                next_tag_var = forward_var + trans_score + emit_score
                # The forward variable for this tag is log-sum-exp of all the
                # scores. (.view(1) keeps it 1-dim so torch.cat below also
                # works on newer PyTorch versions.)
                alphas_t.append(log_sum_exp(next_tag_var).view(1))
            forward_var = torch.cat(alphas_t).view(1, -1)  # the accumulated score of each of the 5 tags after this step
        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        alpha = log_sum_exp(terminal_var)
        return alpha
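    # In equation form: writing alpha_t(j) for forward_var's entry j after
    # step t, emit_t(j) for feat[j], and T(j, i) for self.transitions[j][i],
    # the loop above computes
    #     alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + T(j, i) + emit_t(j))
    # and the returned alpha is log Z, the log of the summed exponentiated
    # scores of all possible tag paths.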
    # produce the emission features (feats)
    def _get_lstm_features(self, sentence):
        self.hidden = self.init_hidden()
        # embeds = self.word_embeds(sentence).view(len(sentence), 1, -1)
        embeds = self.word_embeds(sentence)
        embeds = embeds.unsqueeze(1)
        lstm_out, self.hidden = self.lstm(embeds, self.hidden)
        lstm_out = lstm_out.view(len(sentence), self.hidden_dim)
        lstm_feats = self.hidden2tag(lstm_out)
        return lstm_feats

    # score of the gold tag sequence
    def _score_sentence(self, feats, tags):
        # Gives the score of a provided tag sequence
        score = autograd.Variable(torch.Tensor([0]))
        # prepend START_TAG (index 3) to the tag sequence
        tags = torch.cat([torch.LongTensor([self.tag_to_ix[START_TAG]]), tags])
        for i, feat in enumerate(feats):
            # self.transitions[tags[i + 1], tags[i]] is the transition score from tags[i] to tags[i + 1];
            # feat is the output at step i, with 5 values for B, I, O, START_TAG, STOP_TAG,
            # and feat[tags[i + 1]] picks the value of the gold tag
            score = score + self.transitions[tags[i + 1], tags[i]] + feat[tags[i + 1]]
        score = score + self.transitions[self.tag_to_ix[STOP_TAG], tags[-1]]
        return score

    # decode: recover the best tag sequence and its score
    def _viterbi_decode(self, feats):
        backpointers = []

        # Initialize the viterbi variables in log space
        init_vvars = torch.Tensor(1, self.tagset_size).fill_(-10000.)
        init_vvars[0][self.tag_to_ix[START_TAG]] = 0

        # forward_var at step i holds the viterbi variables for step i-1
        forward_var = autograd.Variable(init_vvars)
        for feat in feats:
            bptrs_t = []  # holds the backpointers for this step
            viterbivars_t = []  # holds the viterbi variables for this step

            for next_tag in range(self.tagset_size):
                # next_tag_var[i] holds the viterbi variable for tag i at the
                # previous step, plus the score of transitioning
                # from tag i to next_tag.
                # We don't include the emission scores here because the max
                # does not depend on them (we add them in below)
                next_tag_var = forward_var + self.transitions[next_tag]  # score of every tag (B, I, O, START, STOP) transitioning to next_tag
                best_tag_id = argmax(next_tag_var)
                bptrs_t.append(best_tag_id)
                viterbivars_t.append(next_tag_var[0][best_tag_id].view(1))  # .view(1) for torch.cat compatibility
            # Now add in the emission scores, and assign forward_var to the set
            # of viterbi variables we just computed
            forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)  # for each of the 5 tags, the max score of any path ending in that tag at this step
            backpointers.append(bptrs_t)  # bptrs_t has 5 entries

        # Transition to STOP_TAG
        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]  # transition scores from every tag to STOP_TAG
        best_tag_id = argmax(terminal_var)
        path_score = terminal_var[0][best_tag_id]

        # Follow the back pointers to decode the best path.
        best_path = [best_tag_id]
        for bptrs_t in reversed(backpointers):  # walk backwards to recover the best path
            best_tag_id = bptrs_t[best_tag_id]
            best_path.append(best_tag_id)
        # Pop off the start tag (we don't want to return that to the caller)
        start = best_path.pop()
        assert start == self.tag_to_ix[START_TAG]  # Sanity check
        best_path.reverse()  # put the backwards path in forward order
        return path_score, best_path

    def neg_log_likelihood(self, sentence, tags):
        feats = self._get_lstm_features(sentence)
        forward_score = self._forward_alg(feats)
        gold_score = self._score_sentence(feats, tags)
        return forward_score - gold_score

    def forward(self, sentence):  # don't confuse this with _forward_alg above
        # Get the emission scores from the BiLSTM
        lstm_feats = self._get_lstm_features(sentence)

        # Find the best path, given the features.
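# Note: neg_log_likelihood is exactly -log p(tags | sentence) under the CRF:
#     -log p(y | x) = log Z(x) - score(x, y)
# where log Z(x) is the alpha returned by _forward_alg and score(x, y) is
# _score_sentence. Minimizing it pushes the gold path's share of the total
# probability mass toward 1.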
        score, tag_seq = self._viterbi_decode(lstm_feats)
        return score, tag_seq

START_TAG = "<START>"
STOP_TAG = "<STOP>"
EMBEDDING_DIM = 5
HIDDEN_DIM = 4

# Make up some training data
training_data = [(
    "the wall street journal reported today that apple corporation made money".split(),
    "B I I I O O O B I O O".split()
), (
    "georgia tech is a university in georgia".split(),
    "B I O O O O B".split()
)]

word_to_ix = {}
for sentence, tags in training_data:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

tag_to_ix = {"B": 0, "I": 1, "O": 2, START_TAG: 3, STOP_TAG: 4}

model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Check predictions before training
# precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
# precheck_tags = torch.LongTensor([tag_to_ix[t] for t in training_data[0][1]])
# print(model(precheck_sent))

for epoch in range(1):  # the original tutorial trains for 300 epochs; one pass is enough to smoke-test this toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is,
        # turn them into Variables of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = torch.LongTensor([tag_to_ix[t] for t in tags])

        # Step 3. Run our forward pass.
        neg_log_likelihood = model.neg_log_likelihood(sentence_in, targets)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        neg_log_likelihood.backward()
        optimizer.step()

# Check predictions after training
precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
print(model(precheck_sent)[0])  # score
print('^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^')
print(model(precheck_sent)[1])  # tag sequence
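For readability, the predicted indices can be mapped back to tag names. A small usage sketch (the ix_to_tag helper is my addition, not part of the original example):

ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
score, tag_seq = model(precheck_sent)
print([ix_to_tag[ix] for ix in tag_seq])  # e.g. ['B', 'I', 'I', ...] once trained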
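To convince myself that _forward_alg really computes log Z, I find it useful to compare it against brute-force enumeration of every tag path on a short sentence. This is a sketch of my own (brute_force_partition is a hypothetical helper, not in the original); it enumerates only the real tags B/I/O, since the -10000 transitions make any path through START/STOP contribute essentially nothing to the sum:

import itertools

def brute_force_partition(model, feats):
    real_tags = [ix for tag, ix in model.tag_to_ix.items()
                 if tag not in (START_TAG, STOP_TAG)]
    # score every possible tag path with the same scorer used for the gold path
    path_scores = [model._score_sentence(feats, torch.LongTensor(path))
                   for path in itertools.product(real_tags, repeat=len(feats))]
    return log_sum_exp(torch.cat(path_scores).view(1, -1))

sent = prepare_sequence(training_data[1][0], word_to_ix)  # 7 words -> 3**7 paths
feats = model._get_lstm_features(sent)  # compute once: init_hidden() is random
print(brute_force_partition(model, feats))
print(model._forward_alg(feats))  # should agree up to floating-point noise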