Word Vectors: ELMo

阿新 • Published: 2018-11-26

ELMo (Embeddings from Language Models) is a word-vector model proposed in March 2018 in the paper "Deep contextualized word representations". The sections below introduce the ELMo model from several angles.

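1. Motivation (why it was created)

Word-vector models such as word2vec and GloVe have the following shortcomings:

(1) They do not capture syntactic information such as part of speech, e.g. in GloVe

(2) Each word maps to exactly one vector, so polysemy (one word with several meanings) is not handled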
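Shortcoming (2) can be illustrated with a toy lookup table. This is only a minimal sketch: the table `static_embeddings`, its values, and the helper `embed_static` are made up for illustration and do not come from word2vec or GloVe.

```python
import numpy as np

# Hypothetical 4-dimensional static embedding table (values are made up).
static_embeddings = {
    "bank": np.array([0.2, -0.1, 0.7, 0.4]),
    "river": np.array([0.9, 0.3, -0.2, 0.1]),
    "money": np.array([-0.5, 0.8, 0.1, 0.6]),
}

def embed_static(tokens, table):
    """Look up one fixed vector per token, ignoring the surrounding words."""
    return [table[t] for t in tokens]

# "bank" appears in two different contexts.
bank_a = embed_static(["river", "bank"], static_embeddings)[-1]
bank_b = embed_static(["money", "bank"], static_embeddings)[-1]

# A static table cannot tell the two senses apart: the vectors are identical.
same_vector = np.array_equal(bank_a, bank_b)
print(same_vector)  # True
```

Because the lookup never sees the neighbouring words, both occurrences of "bank" necessarily receive the same vector.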

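2. Characteristics

ELMo is a feature-based approach built on a language model: a pretrained language model is used to generate better features.

The higher LSTM layers learn context-dependent word meaning (they perform well on the WSD task), while the lower layers capture syntactic information (useful for part-of-speech tagging).

Unlike the traditional setup, where each token is assigned a single word vector, in ELMo the representation of every word is a function of the entire input sentence.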
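To make "a function of the entire input sentence" concrete, here is a minimal sketch using a tiny untrained bidirectional tanh RNN in NumPy. It is not ELMo's architecture (ELMo uses a character-based input layer and stacked bidirectional LSTMs with separate parameters per direction); the names `token_vecs`, `run_rnn`, and `contextual_vectors` are hypothetical, and one set of random weights is reused for both directions purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
vocab = ["river", "money", "bank"]

# Hypothetical context-independent token vectors and recurrent weights
# (random and untrained; for illustration only).
token_vecs = {w: rng.normal(size=dim) for w in vocab}
W_in = rng.normal(size=(dim, dim))
W_rec = rng.normal(size=(dim, dim))

def run_rnn(inputs):
    """Simple tanh RNN; returns one hidden state per position."""
    h = np.zeros(dim)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return states

def contextual_vectors(tokens):
    """Concatenate forward and backward states for every position."""
    xs = [token_vecs[t] for t in tokens]
    fwd = run_rnn(xs)                # reads left to right
    bwd = run_rnn(xs[::-1])[::-1]    # reads right to left, then re-aligned
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

bank_a = contextual_vectors(["river", "bank"])[1]
bank_b = contextual_vectors(["money", "bank"])[1]

# The same token now gets different vectors in different sentences.
differs = not np.allclose(bank_a, bank_b)
print(differs)
```

In this toy, the forward half of the vector for "bank" changes with the preceding word, which is the property a static lookup table lacks.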

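3. Training

(1) Corpus: the 1B Word Benchmark, a corpus with approximately 30 million sentences (Chelba et al., 2014)

(2) Training method: a bidirectional LSTM is trained on a large text corpus with a coupled language model (LM) objective.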
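For reference, the coupled objective as given in the paper jointly maximizes the log likelihood of the forward and backward directions. For a sequence of $N$ tokens $(t_1, \ldots, t_N)$, with $\Theta_x$ the token-representation parameters and $\Theta_s$ the softmax parameters shared by both directions, and separate LSTM parameters per direction:

$$
\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)
$$

Once the biLM is trained, its per-layer outputs are combined, per downstream task, into a single ELMo vector for each token.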

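The layer representations of the biLM are combined into a task-specific ELMo vector for token $k$:

$$\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, \mathbf{h}_{k,j}^{LM}$$

where $\mathbf{h}_{k,j}^{LM}$ is the layer-$j$ biLM representation of token $k$, $s^{task}$ are softmax-normalized weights, and the scalar parameter $\gamma^{task}$ allows the task model to scale the entire ELMo vector.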
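The weighting described above (softmax-normalized layer weights plus a scalar $\gamma^{task}$) can be sketched in NumPy. This is a minimal sketch under stated assumptions: the function name `elmo_mix`, the array layout `(num_layers, seq_len, dim)`, and the constant-valued layer outputs are made up for illustration and are not taken from any ELMo implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def elmo_mix(layer_reps, raw_weights, gamma):
    """Collapse biLM layers into one vector per token.

    layer_reps:  array of shape (num_layers, seq_len, dim)
    raw_weights: unnormalized per-layer weights, shape (num_layers,)
    gamma:       scalar that rescales the whole mixed vector
    """
    s = softmax(np.asarray(raw_weights, dtype=float))
    # Weighted sum over the layer axis -> shape (seq_len, dim)
    return gamma * np.tensordot(s, layer_reps, axes=1)

# Hypothetical biLM outputs: 3 layers, 2 tokens, 4 dimensions.
layer_reps = np.stack([
    np.full((2, 4), 1.0),
    np.full((2, 4), 2.0),
    np.full((2, 4), 3.0),
])

# Equal raw weights give equal softmax weights, so the result is
# gamma times the mean of the layers (about 4.0 in every entry here).
mixed = elmo_mix(layer_reps, raw_weights=[0.0, 0.0, 0.0], gamma=2.0)
print(mixed.shape)
```

In a real task model the raw weights and `gamma` would be learned together with the rest of the network; here they are fixed by hand.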

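4. Evaluation

The pretrained ELMo representations are added to each task model as features. In the results table, "baseline" is the model without ELMo, "ELMo + baseline" is the same model with ELMo added as features, and the rightmost column gives the absolute and relative improvement.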
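The two figures in that column can be computed as below. This sketch assumes that "relative" means relative error reduction (the gain divided by the baseline's remaining error) and that scores are on a 0 to 100 scale; the scores 80.0 and 85.0 are hypothetical and are not results from the paper.

```python
def improvement(baseline_score, elmo_score, max_score=100.0):
    """Return (absolute gain, relative error reduction)."""
    absolute = elmo_score - baseline_score
    baseline_error = max_score - baseline_score
    relative = absolute / baseline_error
    return absolute, relative

# Hypothetical scores, for illustration only.
abs_gain, rel_gain = improvement(80.0, 85.0)
print(f"{abs_gain:.1f} / {rel_gain:.1%}")  # 5.0 / 25.0%
```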

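Section 5 of the paper also compares how different choices, such as where ELMo is added to the task model and the regularization parameter, affect the performance gain.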
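"Where ELMo is added" refers to concatenating the ELMo vector with the task model's input representation, its output representation, or both. The sketch below only shows the shapes involved. All dimensions, `toy_encoder`, and the random arrays are hypothetical stand-ins; in the paper the output-side ELMo vector uses its own set of mixing weights, whereas this sketch reuses one array for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, x_dim, h_dim, elmo_dim = 3, 5, 6, 4

x = rng.normal(size=(seq_len, x_dim))        # hypothetical input embeddings
elmo = rng.normal(size=(seq_len, elmo_dim))  # hypothetical ELMo vectors

def add_at_input(x, elmo):
    """Input side: each token becomes [x_k; ELMo_k]."""
    return np.concatenate([x, elmo], axis=-1)

def toy_encoder(inputs, out_dim, rng):
    """Stand-in for the task model's encoder: random linear map + tanh."""
    W = rng.normal(size=(inputs.shape[-1], out_dim))
    return np.tanh(inputs @ W)

def add_at_output(h, elmo):
    """Output side: each hidden state becomes [h_k; ELMo_k]."""
    return np.concatenate([h, elmo], axis=-1)

enhanced_input = add_at_input(x, elmo)        # shape (3, 9)
h = toy_encoder(enhanced_input, h_dim, rng)   # shape (3, 6)
enhanced_output = add_at_output(h, elmo)      # shape (3, 10)
print(enhanced_input.shape, h.shape, enhanced_output.shape)
```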

The evaluation mainly covers the following six types of NLP tasks:

(1) Question answering (QA)

Dataset: the Stanford Question Answering Dataset (SQuAD), which contains 100K+ crowd-sourced question-answer pairs where the answer is a span in a given Wikipedia paragraph.

(2) Textual entailment (determining whether a hypothesis is true given a premise)

Dataset: the Stanford Natural Language Inference (SNLI) corpus, which provides approximately 550K hypothesis/premise pairs.

(3) Semantic role labeling (models the predicate-argument structure of a sentence, often described as answering "who did what to whom")

Dataset: the OntoNotes benchmark (Pradhan et al., 2013)

(4) Coreference resolution (clustering mentions in text that refer to the same underlying real-world entities)

Dataset: the OntoNotes coreference annotations from the CoNLL 2012 shared task (Pradhan et al., 2012)

(5) Named entity extraction

Dataset: the CoNLL 2003 NER task (Sang and Meulder, 2003), which consists of newswire from the Reuters RCV1 corpus tagged with four different entity types (PER, LOC, ORG, MISC)

(6) Sentiment analysis

Dataset: the Stanford Sentiment Treebank (SST-5), which involves selecting one of five labels (from very negative to very positive) to describe a sentence from a movie review.

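In addition, to show that the model captures both part of speech and word sense well, the paper reports two further experiments, WSD and POS tagging, which use SemCor 3.0 and the Wall Street Journal portion of the Penn Treebank (PTB) respectively.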
