
A Comprehensive Guide to Neural Machine Translation Using Seq2Seq Modelling with PyTorch

Table of Contents:

  1. Introduction

  2. Data Preparation and Pre-processing

  3. Long Short Term Memory (LSTM) — Under the Hood

  4. Encoder Model Architecture (Seq2Seq)

  5. Encoder Code Implementation (Seq2Seq)

  6. Decoder Model Architecture (Seq2Seq)

  7. Decoder Code Implementation (Seq2Seq)

  8. Seq2Seq (Encoder + Decoder) Interface

  9. Seq2Seq (Encoder + Decoder) Code Implementation

  10. Seq2Seq Model Training

  11. Seq2Seq Model Inference

  12. Resources & References

1. Introduction

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

Translating from one language to another with a simple rule-based system was one of the hardest problems for computers, because such systems were not able to capture the nuances involved in the process. Shortly afterwards statistical models were used, but since the arrival of deep learning the field has collectively been called Neural Machine Translation, and it has now achieved state-of-the-art results.

I want this post to be beginner-friendly, so a specific kind of architecture (Seq2Seq), which has shown good signs of success, is what we are going to implement here.

So the Sequence to Sequence (seq2seq) model in this post uses an encoder-decoder architecture, which uses a type of RNN called LSTM (Long Short Term Memory), where the encoder neural network encodes the input language sequence into a single vector, also called a Context Vector.

This Context Vector is said to contain the abstract representation of the input language sequence.

This vector is then passed into the decoder neural network, which is used to output the corresponding output language translation sentence, one word at a time.

Here I am doing a German to English neural machine translation. But the same concept can be extended to other problems such as Named Entity Recognition (NER), Text Summarization, even other language models, etc.

2. Data Preparation and Pre-processing

To get the data into the shape we want, I am using the SpaCy (vocabulary building) and TorchText (text pre-processing) libraries, and the Multi30k dataset, which contains the translation sequences for the English, German, and French languages.

TorchText is a powerful library for getting text data ready for a variety of NLP tasks. It has all the tools to perform preprocessing on textual data.

Let’s look at some of the operations it can perform:

1. Train/ Valid/ Test Split: partition your data into a specified train/ valid/ test set.

2. File Loading: load the text corpus of various formats (.txt, .json, .csv).

3. Tokenization: break sentences into a list of words.

4. Vocab: generate a list of vocabulary from the text corpus.

5. Words to Integer Mapper: map words into integer numbers for the entire corpus and vice versa.

6. Word Vector: convert a word from a higher dimension to a lower dimension (Word Embedding).

7. Batching: generate batches of samples.

So now that we understand what can be done with TorchText, let’s talk about how it can be implemented using the TorchText module. Here we are going to make use of 3 classes under TorchText.

1. Fields :

  • This is a class under TorchText, where we specify how the preprocessing should be done on our data corpus.

2. TabularDataset :

  • Using this class, we can define a Dataset of columns stored in CSV, TSV, or JSON format and also map them into integers.

3. BucketIterator :

  • Using this class, we can pad our data to roughly equal lengths and make batches of our data for model training.

Here our source language (SRC — Input) is German and the target language (TRG — Output) is English. We also add two extra tokens, “start of sequence” <sos> and “end of sequence” <eos>, for effective model training.
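Below is a minimal sketch of how these fields could be defined, assuming the legacy TorchText API (torchtext <= 0.8, or torchtext.legacy in later releases) and that the spaCy models de_core_news_sm / en_core_web_sm are installed; the names and settings are illustrative, not necessarily the author's exact code.

```python
import spacy
from torchtext.data import Field
from torchtext.datasets import Multi30k

spacy_de = spacy.load("de_core_news_sm")  # German tokenizer model (assumed installed)
spacy_en = spacy.load("en_core_web_sm")   # English tokenizer model (assumed installed)

def tokenize_de(text):
    return [tok.text for tok in spacy_de.tokenizer(text)]

def tokenize_en(text):
    return [tok.text for tok in spacy_en.tokenizer(text)]

# Source (German) and target (English) fields with <sos>/<eos> tokens and lowercasing
SRC = Field(tokenize=tokenize_de, init_token="<sos>", eos_token="<eos>", lower=True)
TRG = Field(tokenize=tokenize_en, init_token="<sos>", eos_token="<eos>", lower=True)

# Multi30k provides the (German, English) translation pairs
train_data, valid_data, test_data = Multi30k.splits(exts=(".de", ".en"), fields=(SRC, TRG))

# Build vocabularies, keeping only tokens that appear at least twice
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)
```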

After setting the language pre-processing criteria, the next step is to create batches of training, testing, and validation data using iterators.

Creating batches is an exhaustive process, luckily we can make use of TorchText’s iterator library.

Here we are using BucketIterator for effective padding of source and target sentences. We can access the source (German) batch of data using the .src attribute and its corresponding (English) batch of data using the .trg attribute. Also, we can see the data before tokenizing it.
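A sketch of how the iterators could be created and a batch inspected, continuing from the Field/Multi30k sketch above; names such as train_data and the batch size are assumptions for illustration.

```python
import torch
from torchtext.data import BucketIterator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 32

# BucketIterator groups sentences of similar length together to minimise padding
train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device,
)

# Inspect one batch: each column is a sentence, each row a time step, <pad> index is 1
batch = next(iter(train_iterator))
print(batch.src.shape)  # [src_seq_len, batch_size]
print(batch.trg.shape)  # [trg_seq_len, batch_size]
```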

I experimented with a batch size of 32, and a sample batch is shown below. The sentences are tokenized into a list of words and indexed according to the vocabulary. The “pad” token gets an index of 1.

Each column corresponds to a sentence, indexed into numbers, and we have 32 such sentences in a single target batch; the number of rows corresponds to the maximum length of the sentences in that batch. Shorter sentences are padded with 1s to compensate for the length.

The table below (Idx.csv) contains the numerical indices of the batch, which are later fed into the word embedding and converted into a dense representation for Seq2Seq processing.

Sample Target Batch with indices

The table below (Words.csv) contains the corresponding words mapped with the numerical indices of the batch.

Sample Target Batch with Words

3. Long Short Term Memory (LSTM) — Under the Hood

Figure: LSTM — under the hood. Source: Author

The above picture shows the units present inside a single LSTM cell. I will add some references at the end to learn more about LSTM and why it works well for long sequences.

But to put it simply, the vanilla RNN and the Gated Recurrent Unit (GRU) are not able to capture long-term dependencies due to the nature of their design, and they suffer heavily from the vanishing gradient problem, which makes the rate of change in weights and bias values negligible, resulting in poor generalization.

Inside the LSTM cell, we have a bunch of mini neural networks with sigmoid and tanh activations at the final layer, and a few vector addition, concatenation, and multiplication operations.

Sigmoid NN → Squishes the values between 0 and 1. Say a value closer to 0 means to forget and a value closer to 1 means to remember.

Embedding NN → Converts the input word indices into word embeddings.

TanH NN → Squishes the values between -1 and 1. Helps to keep the vector values from either exploding to the maximum or shrinking to the minimum.

Gates:

Figure: LSTM gates. Source: Author

But LSTM has some special units called gates (the Remember (Add) gate, Forget gate, and Update gate), which help to overcome the problems stated before.

  1. Forget Gate → Has a sigmoid activation, with a range of values between 0 and 1, and it is multiplied over the cell state to forget some elements. (“Vector” * 0 = 0)

  2. Add Gate → Has a tanh activation, with a range of values between -1 and +1, and it is added over the cell state to remember some elements. (“Vector” * 1 = “Vector”)

  3. Update Hidden → Updates the hidden state based on the cell state.

The hidden state and the cell state are referred to here as the context vector; they are the outputs of the LSTM cell. The input is the sentence’s numerical indices fed into the embedding NN.
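To make the shapes concrete, here is a tiny PyTorch sketch (with made-up toy dimensions, purely for illustration) showing how word indices pass through an embedding layer and an LSTM, and what the hidden state (hs) and cell state (cs) look like.

```python
import torch
import torch.nn as nn

# Toy numbers for illustration only
vocab_size, emb_dim, hid_dim, num_layers = 100, 8, 16, 2

embedding = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hid_dim, num_layers)

# A "sentence" of 5 word indices with batch size 1: shape [seq_len, batch]
word_indices = torch.tensor([[2], [10], [24], [7], [3]])

embedded = embedding(word_indices)       # [5, 1, emb_dim]
outputs, (hs, cs) = lstm(embedded)       # hs, cs: [num_layers, 1, hid_dim]
print(outputs.shape, hs.shape, cs.shape)
```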

4. Encoder Model Architecture (Seq2Seq)

Before moving on to build the seq2seq model, we need to create an Encoder and a Decoder, and create an interface between them in the seq2seq model.

Let’s pass the German input sequence “Ich Liebe Tief Lernen”, which translates to “I love deep learning” in English.

Figure: LSTM Encoder architecture. The X-axis corresponds to time steps and the Y-axis corresponds to batch size. Source: Author

On a lighter note, let’s explain the process happening in the above image. The Encoder of the Seq2Seq model takes one input at a time. Our input German word sequence is “ich Liebe Tief Lernen”.

Also, we append the start-of-sequence “SOS” token and the end-of-sentence “EOS” token at the beginning and the end of the input sentence.

Therefore:

  1. At time step-0, the “SOS” token is sent,
  2. At time step-1, the token “ich” is sent,
  3. At time step-2, the token “Liebe” is sent,
  4. At time step-3, the token “Tief” is sent,
  5. At time step-4, the token “Lernen” is sent,
  6. At time step-5, the token “EOS” is sent.

The first block in the Encoder architecture is the word embedding layer [shown in the green block], which converts each input indexed word into a dense vector representation called a word embedding (sizes of 100/200/300).

Then our word embedding vector is sent to the LSTM cell, where it is combined with the hidden state (hs) and the cell state (cs) of the previous time step, and the encoder block outputs a new hs and cs, which are passed to the next LSTM cell. It is understood that the hs and cs capture some vector representation of the sentence so far.

At time step-0, the hidden state and cell state are initialized either with all zeros or with random numbers.

Then, after we pass the whole input German word sequence, a context vector [shown in the yellow block] (hs, cs) is finally obtained, which is a dense representation of the word sequence and can be sent to the decoder’s first LSTM (hs, cs) for the corresponding English translation.

In the above figure, we use a 2-layer LSTM architecture, where we connect the first LSTM to the second LSTM and then obtain 2 context vectors stacked on top of each other as the final output. This is purely experimental; you can change it.

It is a must that we design identical encoder and decoder blocks in the seq2seq model.

The above visualization is applicable for a single sentence from a batch.

Say we have a batch size of 5 (experimental); then we pass 5 sentences, one word at a time, to the Encoder, which looks like the figure below.

Figure: LSTM Encoder for a batch size of 5. The X-axis corresponds to time steps and the Y-axis corresponds to batch size. Source: Author

5. Encoder Code Implementation (Seq2Seq)
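A self-contained sketch of an encoder along the lines described above; the class and parameter names are illustrative, not necessarily the author's exact code.

```python
import torch.nn as nn

class EncoderLSTM(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, num_layers, p):
        super().__init__()
        self.embedding = nn.Embedding(input_size, embedding_size)  # word index -> dense vector
        self.dropout = nn.Dropout(p)
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers, dropout=p)

    def forward(self, x):
        # x: [src_seq_len, batch_size]
        embedded = self.dropout(self.embedding(x))     # [src_seq_len, batch, emb]
        outputs, (hidden, cell) = self.lstm(embedded)  # hidden, cell: [num_layers, batch, hid]
        # The final hidden and cell states form the context vector
        return hidden, cell
```

For example, an encoder could be instantiated as EncoderLSTM(len(SRC.vocab), 300, 1024, 2, 0.5), matching the 2-layer LSTM shown in the figures (the embedding and hidden sizes here are placeholders).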

6. Decoder Model Architecture (Seq2Seq)

Figure: LSTM Decoder architecture. The X-axis corresponds to time steps and the Y-axis corresponds to batch size. Source: Author

The decoder also does a single step at a time.

The Context Vector from the Encoder block is provided as the hidden state (hs) and cell state (cs) for the decoder’s first LSTM block.

The start-of-sentence “SOS” token is passed to the embedding NN, then passed to the first LSTM cell of the decoder, and finally, it is passed through a linear layer [shown in pink], which provides the output English token prediction probabilities (4556 probabilities) [4556 being the total vocabulary size of the English corpus], a hidden state (hs), and a cell state (cs).

The output word with the highest probability out of the 4556 values is chosen, the hidden state (hs) and cell state (cs) are passed as inputs to the next LSTM cell, and this process is executed until we reach the end-of-sentence token “EOS”.

The subsequent layers will use the hidden and cell state from the previous time steps.

Teacher Forcing Ratio:

In addition to other blocks, you will also see the block shown below in the Decoder of the Seq2Seq architecture.

During model training, we send the inputs (the German sequence) and targets (the English sequence). After the context vector is obtained from the Encoder, we send it and the target to the Decoder for translation.

But during model inference, the target is generated by the decoder based on its generalization of the training data. So the predicted output word is sent as the next input word to the decoder until an <EOS> token is obtained.

So during model training itself, we can use the teacher forcing ratio (tfr), with which we can control the flow of input words to the decoder.

Figure: Teacher forcing ratio method
  1. We can send the actual target words to the decoder part while training (shown in green).

  2. We can also send the predicted target word as the input to the decoder (shown in red).

Sending either of the words (the actual target word or the predicted target word) can be regulated with a probability of 50%, so at any time step, one of them is passed during training.
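As a tiny illustration of that 50% regulation (with placeholder token strings standing in for real model outputs):

```python
import random

teacher_force_ratio = 0.5
target_word, predicted_word = "love", "like"  # placeholder tokens, for illustration only

# With probability tfr, feed the ground-truth target word; otherwise feed the model's own prediction
next_input = target_word if random.random() < teacher_force_ratio else predicted_word
print(next_input)
```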

This method acts like regularization, so that the model trains efficiently and quickly during the process.

The above visualization is applicable for a single sentence from a batch. Say we have a batch size of 4 (experimental); then we pass 4 sentences at a time to the Encoder, which provides 4 sets of context vectors, and they are all passed into the Decoder, which looks like the figure below.

Figure: LSTM Decoder for a batch size of 4. The X-axis corresponds to time steps and the Y-axis corresponds to batch size. Source: Author

7. Decoder Code Implementation (Seq2Seq)
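A sketch of a decoder along the lines described above: it decodes one time step at a time, and its linear layer maps the LSTM output to prediction scores over the target vocabulary. Class and parameter names are illustrative, not necessarily the author's exact code.

```python
import torch.nn as nn

class DecoderLSTM(nn.Module):
    # output_size would be the English vocabulary size (e.g. 4556 in the text above)
    def __init__(self, input_size, embedding_size, hidden_size, num_layers, p, output_size):
        super().__init__()
        self.embedding = nn.Embedding(input_size, embedding_size)
        self.dropout = nn.Dropout(p)
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers, dropout=p)
        self.fc = nn.Linear(hidden_size, output_size)  # the pink linear layer -> token scores

    def forward(self, x, hidden, cell):
        # x: [batch_size] -> add a time dimension of 1, since we decode one step at a time
        x = x.unsqueeze(0)                              # [1, batch_size]
        embedded = self.dropout(self.embedding(x))      # [1, batch, emb]
        outputs, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        predictions = self.fc(outputs).squeeze(0)       # [batch, output_size]
        return predictions, hidden, cell
```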

8. Seq2Seq (Encoder + Decoder) Interface

The final seq2seq implementation for a single input sentence looks like the figure below.

  1. Provide both input (German) and output (English) sentences.

  2. Pass the input sequence to the Encoder and extract the context vectors.

  3. Pass the output sequence and the context vectors from the Encoder to the Decoder to produce the predicted output sequence.

Figure: Seq2Seq data flow diagram for a single sentence. Source: Author

The above visualization is applicable for a single sentence from a batch. Say we have a batch size of 4 (experimental); then we pass 4 sentences at a time to the Encoder, which provides 4 sets of context vectors, and they are all passed into the Decoder, which looks like the figure below.

Figure: Seq2Seq for a batch size of 4. Source: Author

9. Seq2Seq (Encoder + Decoder) Code Implementation
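A sketch of the Seq2Seq interface that wires the encoder and decoder together, assuming the EncoderLSTM and DecoderLSTM sketches above; it also applies the teacher forcing ratio at each decoding step.

```python
import random
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, trg_vocab_size, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.trg_vocab_size = trg_vocab_size
        self.device = device

    def forward(self, source, target, teacher_force_ratio=0.5):
        # source: [src_len, batch], target: [trg_len, batch]
        trg_len, batch_size = target.shape
        outputs = torch.zeros(trg_len, batch_size, self.trg_vocab_size).to(self.device)

        hidden, cell = self.encoder(source)  # context vector
        x = target[0]                        # first decoder input is the <sos> token

        for t in range(1, trg_len):
            prediction, hidden, cell = self.decoder(x, hidden, cell)
            outputs[t] = prediction
            best_guess = prediction.argmax(1)
            # Teacher forcing: sometimes feed the ground truth, sometimes the model's own guess
            x = target[t] if random.random() < teacher_force_ratio else best_guess
        return outputs
```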

10. Seq2Seq Model Training
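A minimal training-loop sketch, assuming the model, train_iterator, and TRG field from the earlier sketches; the loss ignores <pad> positions, and gradients are clipped to avoid exploding gradients. The epoch count and learning rate are placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim

pad_idx = TRG.vocab.stoi["<pad>"]
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)  # ignore padding positions in the loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)

NUM_EPOCHS = 20
for epoch in range(NUM_EPOCHS):
    model.train()
    epoch_loss = 0.0
    for batch in train_iterator:
        src, trg = batch.src, batch.trg
        output = model(src, trg)  # [trg_len, batch, trg_vocab]

        # Skip the <sos> position and flatten for CrossEntropyLoss
        output = output[1:].reshape(-1, output.shape[2])
        trg = trg[1:].reshape(-1)

        optimizer.zero_grad()
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch + 1}: loss = {epoch_loss / len(train_iterator):.3f}")
```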

Training progress for a sample sentence:

Training loss:

Figure: The training loss shows a good trend.

11. Seq2Seq Model Inference

Now let us compare our trained model with that of SOTA Google Translate.
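A greedy-decoding sketch of how inference could be run on a single German sentence, assuming the model, fields, and spaCy tokenizer from the earlier sketches; the model's own predictions are fed back in until <eos> is produced.

```python
import torch

def translate_sentence(model, sentence, SRC, TRG, device, max_length=50):
    """Greedy decoding: feed the model's own predictions back in until <eos>."""
    model.eval()
    tokens = [tok.text.lower() for tok in spacy_de.tokenizer(sentence)]
    tokens = [SRC.init_token] + tokens + [SRC.eos_token]
    indices = [SRC.vocab.stoi[tok] for tok in tokens]
    src_tensor = torch.LongTensor(indices).unsqueeze(1).to(device)  # [src_len, 1]

    with torch.no_grad():
        hidden, cell = model.encoder(src_tensor)

    outputs = [TRG.vocab.stoi[TRG.init_token]]
    for _ in range(max_length):
        prev = torch.LongTensor([outputs[-1]]).to(device)
        with torch.no_grad():
            prediction, hidden, cell = model.decoder(prev, hidden, cell)
        best_guess = prediction.argmax(1).item()
        outputs.append(best_guess)
        if best_guess == TRG.vocab.stoi[TRG.eos_token]:
            break
    return [TRG.vocab.itos[idx] for idx in outputs]

# Hypothetical usage: translate_sentence(model, "ich liebe tief lernen", SRC, TRG, device)
```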

Figure: Model inference samples

Not bad, but clearly the model is not able to comprehend complex sentences. So in the upcoming series of posts, I will be enhancing the above model’s performance by altering the model’s architecture, for example by using a bi-directional LSTM, adding an attention mechanism, or replacing the LSTM with a Transformer, to overcome these apparent shortcomings.

12. Resources & References

I hope I was able to provide some visual understanding of how the Seq2Seq model processes the data; let me know your thoughts in the comment section.

Check out the notebooks that contain the entire code implementation, and feel free to break it.

The complete code implementation is available at:

@ GitHub

@ Colab

@ Kaggle

For those who are curious, visualizations in this article were made possible by Figma & Google Drawing.

The complete visualization files created in Figma (.fig) [LSTM, ENCODER+DECODER, SEQ2SEQ] are available @ GitHub.

References: LSTM, WORD_EMBEDDING, DEEP_LEARNING_MODEL_DEPLOYMENT_ON_AWS

Until then, see you next time.

Article By:

BALAKRISHNAKUMAR V

Co-Founder — DeepScopy (An AI-Based Medical Imaging Startup)

Connect with me: LinkedIn, GitHub, Twitter, Medium

https://deepscopy.com/


Visit us: DeepScopy

Connect with us: Twitter, LinkedIn, Medium

Translated from: https://towardsdatascience.com/a-comprehensive-guide-to-neural-machine-translation-using-seq2sequence-modelling-using-pytorch-41c9b84ba350