
Text Classification (Part 1): Text Classification with PyTorch Using BiLSTM + Attention

1. Architecture Diagram

2. Code

import torch
import torch.nn as nn
import torch.nn.functional as F


class TextBILSTM(nn.Module):

    def __init__(self,
                 config: TRNNConfig,
                 char_size=5000,
                 pinyin_size=5000):
        super(TextBILSTM, self).__init__()
        self.num_classes = config.num_classes
        self.learning_rate = config.learning_rate
        self.keep_dropout = config.keep_dropout
        self.char_embedding_size = config.char_embedding_size
        self.pinyin_embedding_size = config.pinyin_embedding_size
        self.l2_reg_lambda = config.l2_reg_lambda
        self.hidden_dims = config.hidden_dims
        self.char_size = char_size
        self.pinyin_size = pinyin_size
        self.rnn_layers = config.rnn_layers

        self.build_model()

    def build_model(self):
        # Initialize the character embedding table
        self.char_embeddings = nn.Embedding(self.char_size, self.char_embedding_size)
        # Character embeddings are updated during training
        self.char_embeddings.weight.requires_grad = True

        # Initialize the pinyin embedding table
        self.pinyin_embeddings = nn.Embedding(self.pinyin_size, self.pinyin_embedding_size)
        self.pinyin_embeddings.weight.requires_grad = True

        # Attention layer
        self.attention_layer = nn.Sequential(
            nn.Linear(self.hidden_dims, self.hidden_dims),
            nn.ReLU(inplace=True)
        )
        # self.attention_weights = self.attention_weights.view(self.hidden_dims, 1)

        # Stacked (two-layer) bidirectional LSTM
        self.lstm_net = nn.LSTM(self.char_embedding_size, self.hidden_dims,
                                num_layers=self.rnn_layers, dropout=self.keep_dropout,
                                bidirectional=True)

        # Fully connected output layers
        # self.fc_out = nn.Linear(self.hidden_dims, self.num_classes)
        self.fc_out = nn.Sequential(
            nn.Dropout(self.keep_dropout),
            nn.Linear(self.hidden_dims, self.hidden_dims),
            nn.ReLU(inplace=True),
            nn.Dropout(self.keep_dropout),
            nn.Linear(self.hidden_dims, self.num_classes)
        )

    def attention_net_with_w(self, lstm_out, lstm_hidden):
        '''
        :param lstm_out: [batch_size, len_seq, n_hidden * 2]
        :param lstm_hidden: [batch_size, num_layers * num_directions, n_hidden]
        :return: [batch_size, n_hidden]
        '''
        lstm_tmp_out = torch.chunk(lstm_out, 2, -1)
        # h: [batch_size, time_step, hidden_dims]
        h = lstm_tmp_out[0] + lstm_tmp_out[1]
        # lstm_hidden: [batch_size, num_layers * num_directions, n_hidden]
        lstm_hidden = torch.sum(lstm_hidden, dim=1)
        # lstm_hidden: [batch_size, 1, n_hidden]
        lstm_hidden = lstm_hidden.unsqueeze(1)
        # atten_w: [batch_size, 1, hidden_dims]
        atten_w = self.attention_layer(lstm_hidden)
        # m: [batch_size, time_step, hidden_dims]
        m = nn.Tanh()(h)
        # atten_context: [batch_size, 1, time_step]
        atten_context = torch.bmm(atten_w, m.transpose(1, 2))
        # softmax_w: [batch_size, 1, time_step]
        softmax_w = F.softmax(atten_context, dim=-1)
        # context: [batch_size, 1, hidden_dims]
        context = torch.bmm(softmax_w, h)
        result = context.squeeze(1)
        return result

    def forward(self, char_id, pinyin_id):
        # char_id = torch.from_numpy(np.array(input[0])).long()
        # pinyin_id = torch.from_numpy(np.array(input[1])).long()
        sen_char_input = self.char_embeddings(char_id)
        sen_pinyin_input = self.pinyin_embeddings(pinyin_id)
        sen_input = torch.cat((sen_char_input, sen_pinyin_input), dim=1)
        # LSTM expects input of shape [len_seq, batch_size, embedding_dim]
        sen_input = sen_input.permute(1, 0, 2)
        output, (final_hidden_state, final_cell_state) = self.lstm_net(sen_input)
        # output: [batch_size, len_seq, n_hidden * 2]
        output = output.permute(1, 0, 2)
        # final_hidden_state: [batch_size, num_layers * num_directions, n_hidden]
        final_hidden_state = final_hidden_state.permute(1, 0, 2)
        # final_hidden_state = torch.mean(final_hidden_state, dim=0, keepdim=True)
        # atten_out = self.attention_net(output, final_hidden_state)
        atten_out = self.attention_net_with_w(output, final_hidden_state)
        return self.fc_out(atten_out)
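The post does not show TRNNConfig or how the model is called, so the following is a minimal sketch of a forward pass, not the author's training code. A types.SimpleNamespace stands in for TRNNConfig; its field names are taken from __init__ above, but all values are illustrative. Note that pinyin_embedding_size is assumed equal to char_embedding_size, because both embeddings are concatenated along the sequence dimension and fed to the same LSTM.

from types import SimpleNamespace

config = SimpleNamespace(
    num_classes=10,
    learning_rate=1e-3,
    keep_dropout=0.1,
    char_embedding_size=128,
    pinyin_embedding_size=128,  # must match char_embedding_size: both feed the same LSTM
    l2_reg_lambda=0.0,
    hidden_dims=256,
    rnn_layers=2,
)

model = TextBILSTM(config)

batch_size, seq_len = 4, 50
char_id = torch.randint(0, 5000, (batch_size, seq_len))    # dummy character ids
pinyin_id = torch.randint(0, 5000, (batch_size, seq_len))  # dummy pinyin ids
logits = model(char_id, pinyin_id)
print(logits.shape)  # torch.Size([4, 10]) -> one score per class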

3. Explanation

1. Split the BiLSTM output (shape: [batch_size, time_step, hidden_dims * num_directions(=2)]) into two tensors, each of shape [batch_size, time_step, hidden_dims];
2. Add the two tensors from step 1 element-wise to obtain h (shape: [batch_size, time_step, hidden_dims]);
3. Sum the BiLSTM's final hidden state (shape: [batch_size, num_layers * num_directions, hidden_dims]) over its second dimension to obtain a new lstm_hidden (shape: [batch_size, hidden_dims]);
4. Expand lstm_hidden from [batch_size, hidden_dims] to [batch_size, 1, hidden_dims];
5. Pass lstm_hidden through self.attention_layer to obtain atten_w, the vector used to compute the attention weights (shape: [batch_size, 1, hidden_dims]);
6. Apply tanh to h to obtain m (shape: [batch_size, time_step, hidden_dims]);
7. Compute atten_context = torch.bmm(atten_w, m.transpose(1, 2)) (shape: [batch_size, 1, time_step]);
8. Normalize atten_context with F.softmax(atten_context, dim=-1) to obtain the context-based weights softmax_w (shape: [batch_size, 1, time_step]);
9. Compute the attention-weighted BiLSTM output context = torch.bmm(softmax_w, h) (shape: [batch_size, 1, hidden_dims]);
10. Squeeze out the second dimension of context to obtain result (shape: [batch_size, hidden_dims]);
11. Return result (a standalone shape-check of these steps is sketched right after this list).
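To make the eleven steps easy to verify, here is a self-contained shape-check that re-implements the attention computation on random tensors. The batch size, sequence length, and hidden size are arbitrary illustrative values, not numbers from the post.

import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, time_step, hidden_dims = 4, 100, 256
num_layers, num_directions = 2, 2

# Stand-ins for the BiLSTM outputs (random values, shapes as in the steps above)
lstm_out = torch.randn(batch_size, time_step, hidden_dims * num_directions)
lstm_hidden = torch.randn(batch_size, num_layers * num_directions, hidden_dims)
attention_layer = nn.Sequential(nn.Linear(hidden_dims, hidden_dims), nn.ReLU(inplace=True))

# Steps 1-2: split the BiLSTM output and add the two directions
h = sum(torch.chunk(lstm_out, 2, dim=-1))                             # [4, 100, 256]
# Steps 3-4: sum the final hidden state over layers/directions, add a length-1 dim
query = lstm_hidden.sum(dim=1).unsqueeze(1)                           # [4, 1, 256]
# Step 5: project the query through the attention layer
atten_w = attention_layer(query)                                      # [4, 1, 256]
# Step 6: tanh activation of h
m = torch.tanh(h)                                                     # [4, 100, 256]
# Steps 7-8: scores over time steps, normalized with softmax
softmax_w = F.softmax(torch.bmm(atten_w, m.transpose(1, 2)), dim=-1)  # [4, 1, 100]
# Steps 9-11: weighted sum of h, squeezed to [batch_size, hidden_dims]
result = torch.bmm(softmax_w, h).squeeze(1)                           # [4, 256]
print(result.shape)  # torch.Size([4, 256])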

4. Practical Notes

Model performance
1-layer BiLSTM: 99.8% accuracy on the training set, 96.5% on the test set;
2-layer BiLSTM: 99.9% accuracy on the training set, 97.3% on the test set;
Hyperparameter tuning
Keep the dropout value at 0.1 or below (a rule of thumb from practice: in my experiments, a dropout of 0.1 gave about 0.5% higher test-set accuracy than 0.3).
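As a small illustration of where that knob lives in the code above (reusing the hypothetical SimpleNamespace config from the earlier sketch):

config.keep_dropout = 0.1   # the value reported to work best in these experiments
model = TextBILSTM(config)  # rebuild: keep_dropout feeds nn.LSTM(dropout=...) and the nn.Dropout layers in fc_out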
https://blog.csdn.net/dendi_hust/article/details/94435919