Notes: Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction

阿新 • Published: 2022-03-08

Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction

Authors: Du, J. et al. EMNLP 2018.

1 Introduction

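Problem: Many papers have tackled the noisy-data problem in distant supervision (DS), and the attention-based methods among them work notably well. However, those attention models all use a 1-D vector. The authors argue that a 1-D attention vector is insufficient for representing either a single sentence or a multi-instance bag. To address this, the paper proposes a multi-level structured self-attention mechanism, in which each attention is a 2-D matrix (D stands for dimension) rather than a 1-D vector.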

2 Method

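First, consider the earlier methods. Lin et al. 2016\(^{[2]}\) use sentence-level attention over multiple instances, but each sentence is fed directly into a CNN, with no word-level attention used to build the sentence representation. Zhou et al. 2016\(^{[3]}\) do use word-level attention to build a sentence representation for relation extraction, but only for a single sentence. They do not use multiple instances, so they do not address the noise in the multi-instance bags of distant supervision.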

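The authors therefore identify two important representation learning problems in DNN-based DS RE: (1) learning an entity-pair-oriented context representation from a single sentence; (2) learning a representation that selects the valid sentences among multiple instances, that is, applying attention over all sentences in a bag. Selection here means assigning a different weight to each sentence in the bag, which yields the multi-instance representation, bag_rep.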

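Earlier papers such as Lin et al. 2016\(^{[2]}\) and Zhou et al. 2016\(^{[3]}\) already proposed solutions to these two problems. This paper builds on them and, to address the problem raised in the Introduction, proposes multi-level attention. In earlier work, both sentence-level and word-level attention are 1-D: taking the weighted sum over the sentences in a bag, or over the words in a sentence, yields a single 1-D vector (ignoring the batch dimension). That vector can be understood as representing the multi-instance bag, or the words of the sentence, from only one aspect.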
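To make the 1-D limitation concrete, here is a minimal sketch in plain Python. The item vectors and scores are made-up numbers, and `weighted_sum` is a hypothetical helper, not something taken from the paper. It only illustrates that one row of attention scores gives one summary vector, while r rows give r summary vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Sum of vectors[i] scaled by weights[i]; returns a single vector."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

# Three items (words in a sentence, or sentences in a bag), each a 2-dim vector.
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# 1-D attention: one score per item, hence one weighted sum (one "aspect").
one_d = weighted_sum(softmax([2.0, 0.5, 0.1]), items)

# 2-D attention: r = 2 rows of scores, hence r weighted sums (r "aspects").
score_rows = [[2.0, 0.5, 0.1], [0.1, 0.5, 2.0]]
two_d = [weighted_sum(softmax(row), items) for row in score_rows]

print(len(one_d))                 # length of the single summary vector
print(len(two_d), len(two_d[0]))  # r rows, each as long as one item
```

The second score row emphasizes the last item instead of the first, so the two rows of `two_d` summarize the same items differently.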

2.1 Architecture

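The overall structure is shown in Figure 1 of the paper and can be divided into three parts. First part: the Bi-LSTM layer together with the embedding layer, which extracts information from the input sequence and converts it into the per-time-step LSTM hidden state vectors \(H = (h_1,h_2,...,h_N)^T\), where each \(h\) has dimension \(2u\) and \(u\) is the number of LSTM units. Second part: the word-level attention, including the subsequent context representation layer and flatten layer. Third part: the sentence-level attention, including the subsequent averaged attention and selection layer. Finally, the whole bag_rep is passed to a softmax for relation classification.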

2.2 Structured Word-Level Self-Attention

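The model takes a bag as its basic input unit. A sentence \(S_j\) in the bag is embedded as \(S_j = (e_1,e_2,...,e_N)\), and the BiLSTM produces the corresponding hidden states \(H = (h_1,h_2,...,h_N)^T\).
The word-level attention takes \(H\) as input, with shape \(N\times2u\). The purpose of this layer is to weight all the words in a sentence and obtain the attended sentence representation. The first step is to compute the weights, using Eq. (3) of the paper.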

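\(A_{L1}\) is the word-level attention, an annotation matrix of size \(r^{L1}\times N\), where \(L1\) denotes the first-level attention mechanism. \(W_{s1}^{L1}\) is a weight matrix of size \(d_a^{L1}\times2u\), where the hyperparameter \(d_a^{L1}\) is the number of neurons in the attention network. \(W_{s2}^{L1}\) is also a weight matrix, of size \(r^{L1}\times{d_a^{L1}}\), where \(r^{L1}\) is another hyperparameter: the number of vectors in the 2-D attention matrix, i.e. its number of rows. The value of \(r^{L1}\) depends on how many different aspects of the sentence we want to focus on. Each row of the matrix corresponds to one representation of the sentence.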

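With the weights in hand, multiplying \(A_{L1}\) by \(H\) gives \(r^{L1}\) weighted sums, \(M_{L1} = A_{L1}H\) (Eq. (4) of the paper), where \(M_{L1}\) has shape \(r^{L1}\times2u\). The traditional 1-D sentence representation is thus extended to a 2-D one (\(r^{L1}>1\)).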
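The two steps above can be sketched in plain Python. This assumes the attention takes the form \(A_{L1} = \mathrm{softmax}(W_{s2}^{L1}\tanh(W_{s1}^{L1}H^T))\) with the softmax applied along each row. That form is consistent with the matrix sizes given in these notes, but the tanh and the row-wise softmax are assumptions, since the notes do not reproduce Eq. (3). The toy sizes, the random stand-in for the BiLSTM states, and the helper names are all hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def structured_attention(x, w_s1, w_s2):
    """x holds one input per column (d x n). Returns r x n; each row sums to 1."""
    hidden = [[math.tanh(v) for v in row] for row in matmul(w_s1, x)]  # d_a x n
    return [softmax(row) for row in matmul(w_s2, hidden)]             # r x n

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
N, u, d_a, r = 5, 3, 4, 2            # toy sizes, chosen only for illustration
H = rand_matrix(N, 2 * u, rng)       # N x 2u, random stand-in for BiLSTM states
W_s1 = rand_matrix(d_a, 2 * u, rng)  # d_a x 2u
W_s2 = rand_matrix(r, d_a, rng)      # r x d_a

A_L1 = structured_attention(transpose(H), W_s1, W_s2)  # r x N
M_L1 = matmul(A_L1, H)                                  # r x 2u
print(len(A_L1), len(A_L1[0]), len(M_L1), len(M_L1[0]))
```

Each row of `A_L1` is a separate distribution over the N words, so each row of `M_L1` is a different weighted summary of the same hidden states.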

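Finally, \(M_{L1}\) is passed through a flatten layer and then a linear transformation followed by a nonlinearity, giving the final output of the second part, \(O_j^{L1}\) (Eq. (5) of the paper). Here \(M_{L1}^{FT}\) is the flattened \(M_{L1}\) (a 2-D matrix of shape \(r^{L1}\times2u\) becomes a 1-D vector of length \(r^{L1}\ast2u\)), \(W_o^{L1}\) is a 2-D weight matrix of size \(v\times{(r^{L1}\ast2u)}\), and \(b\) is a 1-D bias vector of size \(v\). \(O_j^{L1}\) is the aggregated sentence representation of the \(j\)-th instance in the bag, with size \(v\).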
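A small sketch of the flatten-and-project step follows. The notes only say "linear then nonlinear", so the ReLU used here is an assumption, and `flatten` and `dense_relu` are hypothetical helper names. The numbers are made up to keep the arithmetic easy to check by hand.

```python
def flatten(matrix):
    """Row-major flatten: an (r x 2u) matrix becomes a vector of length r * 2u."""
    return [value for row in matrix for value in row]

def dense_relu(weights, bias, vec):
    """ReLU(weights @ vec + bias). ReLU is an assumed choice of nonlinearity."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

# Toy M_L1 with r = 2 rows and 2u = 3 columns.
M_L1 = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
M_FT = flatten(M_L1)  # length r * 2u = 6

# v = 2 output features: W_o is v x (r * 2u), b has length v.
W_o = [[0.5, 0.0, 0.0, 0.0, 0.0, 0.5],
       [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
b = [0.5, 0.0]

O_j = dense_relu(W_o, b, M_FT)
print(O_j)  # [4.0, 0.0]
```

The first output is 0.5*1 + 0.5*6 + 0.5 = 4.0, and the second is clipped to 0.0 because its pre-activation is -1.0.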

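The above is the attention pipeline for a single instance in the bag. Applying it to every instance in the bag gives \(O^{L1}\) (Eq. (6) of the paper), a matrix of size \(v\times{J}\), where \(J\) is the number of instances in the bag.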

2.3 Structured Sentence-Level Self-Attention and Averaged Selection Representation

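The attention structure of the third part resembles that of the second part. Taking \(O^{L1}\) as input, the sentence-level attention matrix is computed by Eq. (8) of the paper, where \(W_{s1}^{L2}\) is a weight matrix of size \(d_a^{L2}\times{v}\), \(d_a^{L2}\) is the number of neurons in the attention network, and \(W_{s2}^{L2}\) is a weight matrix of size \(r^{L2}\times{d_a^{L2}}\). Here \(r^{L2}\) is a hyperparameter giving the number of vectors, i.e. rows, in the 2-D sentence-level attention matrix. Each row represents one attention pattern, or one selection pattern over the valid sentences in the bag. We want these \(r^{L2}\) vectors to attend to instances carrying different information, that is, to select different combinations of instances through different weights. \(A_{L2}\) is a sentence-level annotation matrix of size \(r^{L2}\times{J}\). The traditional 1-D sentence-level attention, as in Lin et al. 2016\(^{[2]}\), is thus extended to a multi-vector attention \((r^{L2}>1)\).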

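We then average this 2-D \(A_{L2}\) across its \(r^{L2}\) rows to get the 1-D \(\overline{A}_{L2}\). After the corresponding reshape, \(\overline{A}_{L2}\) is multiplied with the aggregated sentence representations \(O^{L1}\) to compute the averaged weighted sum, which is the final instance selection representation \(M_{L2}\) (Eq. (9) of the paper), a 1-D vector of size \(v\).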
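The sentence-level step can be sketched the same way. As with the word level, the tanh and row-wise softmax are assumptions that fit the stated matrix sizes rather than a transcription of Eq. (8), and all numbers and helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# O_L1 is v x J: one column per sentence in the bag (v = 2, J = 3).
O_L1 = [[1.0, 0.0, 2.0],
        [0.0, 1.0, 2.0]]
W_s1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # d_a x v, with d_a = 3
W_s2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # r x d_a, with r = 2

hidden = [[math.tanh(x) for x in row] for row in matmul(W_s1, O_L1)]  # d_a x J
A_L2 = [softmax(row) for row in matmul(W_s2, hidden)]                 # r x J

# Average the r attention rows into a single weight per sentence.
A_bar = [sum(col) / len(A_L2) for col in zip(*A_L2)]                  # length J

# The weighted sum of the sentence columns is the bag representation.
M_L2 = [sum(w * x for w, x in zip(A_bar, row)) for row in O_L1]       # length v
print(len(A_bar), len(M_L2))
```

Because every row of `A_L2` sums to 1, their average `A_bar` also sums to 1, so `M_L2` is still a convex combination of the sentence representations.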

Eq. (10) of the paper gives the final probability distribution over relation types.
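As a rough illustration of this last step, the sketch below applies a linear layer and a softmax to a bag representation. The weight matrix `W`, bias `b`, the three relation classes, and the input values are all hypothetical. The notes state only that a softmax produces the distribution, so the single linear layer is an assumption.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relation_probs(weights, bias, bag_rep):
    """softmax(weights @ bag_rep + bias): one probability per relation type."""
    logits = [sum(w * x for w, x in zip(row, bag_rep)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

M_L2 = [0.5, -1.0]                 # bag representation with v = 2
W = [[1.0, 0.0],                   # 3 relation types x v
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

probs = relation_probs(W, b, M_L2)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 0
```

The logits here are 0.5, -1.0 and -0.5, so the first relation type gets the highest probability.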

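Remark: The so-called multi-level attention essentially amounts to going from 1-D to 2-D. The weight matrices in the formulas are introduced, mainly through the choice of their dimensions, to make this multi-level design and its computation work, that is, to transform dimensions so that the network computes without error. All of these weight matrices are randomly initialized. Apart from the LSTM itself, the structure comes down to the weight parameters and the data they operate on. The data is given and the weights are updated throughout training, so the weights clearly matter a great deal. Is random initialization then too casual? Would cross-task transfer, initializing from transferred weights, work better? Since the input to relation extraction involves two entities, would transfer from an NER task be feasible?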

3 Experiments

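The authors use two distantly supervised datasets, NYT and DBpedia, and run comparison experiments with several different evaluation metrics. See the original paper for the detailed experimental settings and procedure.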

4 Conclusion

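The overall architecture and method are not particularly novel: they combine existing methods around the two key points of multi-instance learning. The main novelty lies in the multi-level attention, which gives multi-aspect representations both across sentences and within a sentence.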

References

[1] Jinhua Du, Jingguang Han, Andy Way, Dadong Wan. Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction. EMNLP 2018.

[2] Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, Maosong Sun. Neural Relation Extraction with Selective Attention over Instances. ACL 2016.

[3] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, Bo Xu. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. ACL 2016.
