
XLM: Cross-lingual Language Model

Models like BERT (Devlin et al.) or GPT (Radford et al.) have achieved the state of the art in language understanding. However, these models are pre-trained on only one language. Recently, efforts have been made to move beyond monolingual representations and build universal cross-lingual models capable of encoding any sentence into a shared embedding space.


In this article, we will discuss the paper Cross-lingual Language Model Pretraining, proposed by Facebook AI. The authors propose two approaches to cross-lingual language modeling:


  1. Unsupervised, relies on monolingual data
  2. Supervised, relies on parallel data

Cross-lingual Language Model (XLM)

In this section, we will discuss the approaches proposed for training the XLM.


Shared Sub-Word Vocabulary

The model uses the same shared vocabulary for all languages. This helps establish a common embedding space for tokens from every language. Consequently, languages that share the same script (alphabet) or have many similar words map better into this common embedding space.


For tokenizing the corpora, Byte-Pair Encoding (BPE) is used.

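As a quick illustration, here is a minimal sketch of the shared vocabulary in action, assuming the Hugging Face transformers package and one of its XLM checkpoints (the checkpoint name below is an assumption, not something specified in the article): an English and a French sentence are split with the same BPE vocabulary, so tokens from both languages index into one shared embedding matrix.

```python
# A minimal sketch of the shared BPE vocabulary, using Hugging Face's
# XLM tokenizer (checkpoint name assumed; requires `transformers` and
# its sacremoses dependency).
from transformers import XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-xnli15-1024")

en = "The cat sits on the mat."
fr = "Le chat est assis sur le tapis."

# Both sentences are split into BPE sub-words from the SAME vocabulary,
# so their ids point into one shared embedding table.
print(tokenizer.tokenize(en))
print(tokenizer.tokenize(fr))
print(tokenizer.convert_tokens_to_ids(tokenizer.tokenize(fr)))
```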

Causal Language Modeling (CLM)

This is the regular language modeling objective, where we maximize the probability of a token x_t appearing at the t-th position of a sequence, given all the tokens x_<t that precede it in that sequence, i.e.


[Figure: the Causal Language Modeling objective, from the XLNet paper]
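
For reference, the objective in the figure can be written out as the standard causal factorization (a reconstruction from the description above, not the paper's exact notation):

```latex
% Causal Language Modeling: maximize the likelihood of each token given its prefix
\max_{\theta} \; \sum_{t=1}^{T} \log P_{\theta}\!\left(x_t \mid x_{<t}\right)
```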

OpenAI’s GPT and GPT-2 are trained on this objective. You can refer to my articles on GPT and GPT-2 if you’re interested in the details of this objective.


Masked Language Modeling (MLM)

[Figure: the Masked Language Modeling training scheme, from the XLM paper]

This is a denoising autoencoding objective, also known as the Cloze task. Here, we maximize the probability of a masked token x_t appearing at the t-th position, given the corrupted sequence x_hat (the sequence with some tokens masked out), i.e.


[Figure: the Masked Language Modeling objective, from the XLNet paper]
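
Written out in the same style (again a reconstruction rather than the paper's exact notation), with M the set of masked positions and x_hat the corrupted sequence:

```latex
% Masked Language Modeling: predict each masked token from the corrupted sequence
\max_{\theta} \; \sum_{t \in M} \log P_{\theta}\!\left(x_t \mid \hat{x}\right)
```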

BERT and RoBERTa are trained on this objective. You can refer to my articles on BERT and RoBERTa if you’re interested in the details of this objective.


Note that the only difference between BERT's and XLM's approach is that BERT uses pairs of sentences, whereas XLM uses streams of an arbitrary number of sentences and truncates each stream once its length reaches 256 tokens.

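Here is a minimal sketch of that input construction; the 15% masking rate and the 80/10/10 replacement split follow BERT's recipe and are assumptions for illustration, as are the placeholder ids:

```python
# A minimal sketch of MLM input construction on a token stream
# (the masking percentages follow BERT's 80/10/10 recipe; all ids and
# hyper-parameters here are illustrative assumptions).
import random

MAX_LEN = 256          # stream truncation length mentioned above
MASK_ID = 0            # placeholder id for the [MASK] token (assumed)
VOCAB_SIZE = 95_000    # illustrative shared-vocabulary size

def build_mlm_example(stream_token_ids):
    """Truncate a stream of sentences to MAX_LEN and mask roughly 15% of tokens."""
    tokens = stream_token_ids[:MAX_LEN]
    inputs, labels = list(tokens), [-100] * len(tokens)  # -100 marks positions not predicted
    for i in range(len(tokens)):
        if random.random() < 0.15:
            labels[i] = tokens[i]            # predict the original token at this position
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

# Example: a fake stream of token ids standing in for several sentences.
inputs, labels = build_mlm_example(list(range(300)))
print(len(inputs), sum(l != -100 for l in labels), "positions masked")
```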

Translation Language Modeling (TLM)

[Figure: Translation Language Modeling (TLM), from the XLM paper]

The CLM and MLM tasks work well on monolingual corpora; however, they do not take advantage of the available parallel translation data. Hence, the authors propose a Translation Language Modeling objective, wherein we take a pair of parallel sentences from the translation data, concatenate them into one sequence, and randomly mask tokens in both the source and the target sentence. For example, in the figure above, words are masked in both the English and the French sentence. All the words in the sequence contribute to the prediction of a given masked word, hence establishing a cross-lingual mapping among the tokens.

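A rough sketch of how a TLM training example could be assembled (the separator handling, language ids, and masking routine are simplified assumptions, not the paper's exact layout): the English and French sentences are concatenated into one sequence, each position is tagged with its language, and tokens on both sides are masked.

```python
# A rough sketch of TLM input construction (separator handling, language
# ids, and the masking routine are simplified assumptions): concatenate a
# translation pair, tag every position with its language, and mask tokens
# on BOTH sides so the model must attend across languages to fill the blanks.
import random

MASK_ID = 0              # placeholder id for the [MASK] token (assumed)
EN_LANG, FR_LANG = 0, 1  # illustrative language ids

def build_tlm_example(en_ids, fr_ids, mask_prob=0.15):
    tokens = list(en_ids) + list(fr_ids)                       # one concatenated sequence
    langs = [EN_LANG] * len(en_ids) + [FR_LANG] * len(fr_ids)  # language tag per position
    inputs, labels = list(tokens), [-100] * len(tokens)        # -100 = position not predicted
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            labels[i] = tokens[i]   # target: recover the original token
            inputs[i] = MASK_ID     # simple masking; no 80/10/10 split in this sketch
    return inputs, labels, langs

# Fake ids standing in for an English sentence and its French translation.
inputs, labels, langs = build_tlm_example([11, 12, 13, 14], [21, 22, 23, 24, 25])
print(inputs, langs)
```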

XLM

"In this work, we consider cross-lingual language model pretraining with either CLM, MLM, or MLM used in combination with TLM." (XLM paper)

XLM Pre-training

In this section, we’ll discuss how XLM Pre-training is leveraged for downstream tasks like:


  1. Zero-shot cross-lingual classification
  2. Supervised and unsupervised neural machine translation
  3. Language models for low-resource languages
  4. Unsupervised cross-lingual word embeddings

Zero-shot Cross-lingual Classification

Just like any other Transformer-based monolingual model, XLM is fine-tuned on the XNLI dataset to obtain cross-lingual classification.


A classification layer is added on top of XLM, and it is trained on the English NLI training set. The model is then evaluated on the 15 XNLI languages.


Since the model hasn’t been tuned to classify sentences from any of these languages, it is a zero-shot learning example.

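Here is a hedged sketch of that setup with Hugging Face transformers; the checkpoint name and the three-way label set are assumptions, and the English-NLI fine-tuning loop itself is only indicated by a comment:

```python
# A sketch of zero-shot cross-lingual classification with XLM, assuming
# the `transformers` package and the `xlm-mlm-tlm-xnli15-1024` checkpoint.
# The classification head starts randomly initialized and would be trained
# on English NLI data before the zero-shot evaluation step.
import torch
from transformers import XLMTokenizer, XLMForSequenceClassification

name = "xlm-mlm-tlm-xnli15-1024"
tokenizer = XLMTokenizer.from_pretrained(name)
model = XLMForSequenceClassification.from_pretrained(name, num_labels=3)

# ... fine-tune `model` here on English (premise, hypothesis, label) NLI examples ...

# Zero-shot evaluation: classify a French pair the head never saw during training.
premise = "Le chat dort sur le canapé."
hypothesis = "Un animal est en train de dormir."
enc = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
print(logits.softmax(dim=-1))  # e.g. entailment / neutral / contradiction probabilities
```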

Unsupervised NMT

For this task, the authors propose pre-training a complete encoder-decoder architecture with a cross-lingual language modeling objective. The model is evaluated on several translation benchmarks including WMT’14 English-French, WMT’16 English-German, and WMT’16 English-Romanian.


Supervised NMT

Here, the encoder and decoder are initialized with pre-trained XLM weights and then fine-tuned on the supervised translation dataset. This essentially achieves multilingual machine translation.

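A rough sketch of that warm start in plain PyTorch (the layer sizes are illustrative and the modules are stand-ins, not the authors' code): both the encoder and the decoder are seeded from the same pretrained stack, and whatever does not match, such as the decoder's cross-attention, stays randomly initialized.

```python
# A rough sketch (placeholders, not the authors' code) of warm-starting a
# translation model: encoder and decoder are both seeded with the same
# pretrained cross-lingual weights, then fine-tuned on parallel data.
import copy
import torch.nn as nn

d_model, n_heads, n_layers = 1024, 8, 6  # illustrative sizes

# Stand-in for the pretrained XLM Transformer stack.
pretrained_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads), n_layers
)

# NMT encoder: a direct copy of the pretrained stack.
nmt_encoder = copy.deepcopy(pretrained_encoder)

# NMT decoder: has extra cross-attention, so only the parameters whose
# names and shapes match are copied; the rest stay randomly initialized.
nmt_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, n_heads), n_layers
)
missing, unexpected = nmt_decoder.load_state_dict(
    pretrained_encoder.state_dict(), strict=False
)
print(len(missing), "decoder tensors left randomly initialized")
# The encoder-decoder pair is then fine-tuned end-to-end on parallel sentences.
```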

For more on multi-lingual NMT, refer to this blog.


Low-resource Language Modeling

Here’s where “languages with the same script or similar words provide better mapping” comes into the picture. For example, there are 100k sentences written in Nepali on Wikipedia and about 6 times more in Hindi. Moreover, these languages have 80% of tokens in common.


Hence, a cross-lingual language model is clearly beneficial for a Nepali language model, since it is effectively trained on considerably more data from a closely related language.


Unsupervised Cross-lingual Word Embeddings

Finally, since we have a shared vocabulary, the lookup table (or embedding matrix) of the XLM model gives us the cross-lingual word embeddings.

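A small sketch of that lookup, again assuming the transformers package and an XLM checkpoint name (the word pair is purely illustrative): pull out the input embedding matrix and compare two translation-equivalent words by cosine similarity.

```python
# A small sketch of using XLM's lookup table as cross-lingual word
# embeddings (checkpoint name assumed; the word pair is illustrative).
import torch
from transformers import XLMTokenizer, XLMModel

name = "xlm-mlm-xnli15-1024"
tokenizer = XLMTokenizer.from_pretrained(name)
model = XLMModel.from_pretrained(name)

embeddings = model.get_input_embeddings().weight  # shape: (vocab_size, hidden_size)

def word_vector(word):
    # Average the embeddings of the word's BPE pieces.
    ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))
    return embeddings[ids].mean(dim=0)

sim = torch.cosine_similarity(word_vector("cat"), word_vector("chat"), dim=0)
print(f"cosine(cat, chat) = {sim.item():.3f}")
```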

Conclusion

In this article, we discussed how a cross-lingual language model is beneficial not only for obtaining better results on generic downstream tasks, but also for improving model quality for low-resource languages: by training alongside similar high-resource languages, the model gets exposure to more relevant data.


Here is a link to the original XLM GitHub repository.


Here is a link to huggingface’s XLM architecture implementation and pre-trained weights.


Translated from: https://towardsdatascience.com/xlm-cross-lingual-language-model-33c1fd1adf82