
Attention Is All You Need

The attention mechanism takes the whole sentence as input and extracts useful information from it.

Each output attends to the entire sentence; the output value is a weighted sum of the word vectors of the input sentence.

“This is what attention does: it extracts information from the whole sequence, a weighted sum of all the past encoder states.”
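A minimal NumPy sketch of this idea, assuming a single decoder state attending over a set of encoder states; the names (encoder_states, decoder_state) and the dot-product scoring are illustrative assumptions, not the exact formulation of the posts linked below:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical example: 5 encoder states, each a 4-dim vector,
# and one decoder state that attends over them.
encoder_states = np.random.randn(5, 4)
decoder_state = np.random.randn(4)

# Score each encoder state against the decoder state (dot product, an assumed choice).
scores = encoder_states @ decoder_state   # shape (5,)

# Normalize the scores into attention weights that sum to 1.
weights = softmax(scores)                 # shape (5,)

# The attention output is a weighted sum of all the encoder states.
context = weights @ encoder_states        # shape (4,)
```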

https://towardsdatascience.com/attention-is-all-you-need-discovering-the-transformer-paper-73e5ff5e0634

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

self-attention:

Self-attention is a sequence-to-sequence operation: a sequence of vectors goes in, and a sequence of vectors comes out. Let's call the input vectors x_1, x_2, …, x_t and the corresponding output vectors y_1, y_2, …, y_t. The vectors all have dimension k. To produce the output vector y_i, the self-attention operation simply takes a weighted average over all the input vectors:

y_i = Σ_j w_ij · x_j

The weight w_ij is derived from a function over x_i and x_j; the simplest option for this function is the dot product.
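A minimal sketch of this basic form of self-attention in NumPy, assuming dot-product scores and a row-wise softmax so the weights over j sum to one (the softmax normalization is an assumption here; the text above only requires a weighted average):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# t input vectors of dimension k, stacked as rows of X.
t, k = 6, 4
X = np.random.randn(t, k)

# Raw weights from the dot product of every pair of input vectors: x_i · x_j.
raw_weights = X @ X.T              # shape (t, t)

# Normalize each row so the weights over j sum to one.
weights = softmax(raw_weights)     # shape (t, t)

# y_i = Σ_j w_ij x_j : each output is a weighted average of all inputs.
Y = weights @ X                    # shape (t, k)
```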

Q, K, V:

Every input vector is used in three different ways in the self-attention mechanism: as the Query, the Key and the Value. It is compared to every other vector to establish the weights for its own output y_i (Query), compared to every other vector to establish the weights for the j-th output y_j (Key), and used in the weighted sum to compute each output vector once the weights have been established (Value).
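A sketch of how the Query/Key/Value roles are commonly realized with three linear projections and scaled dot-product attention, as in the Transformer paper; the projection matrices Wq, Wk, Wv (random here) and the scaling by √k are assumptions taken from that paper rather than from the text above:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

t, k = 6, 4
X = np.random.randn(t, k)

# Projections (learned in practice, random here for illustration) that turn
# each input vector into its Query, Key and Value representations.
Wq, Wk, Wv = (np.random.randn(k, k) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# The Query of position i is compared with the Key of every position j,
# scaled by sqrt(k) to keep the dot products in a reasonable range.
scores = Q @ K.T / np.sqrt(k)      # shape (t, t)
weights = softmax(scores)          # attention weights, each row sums to 1

# Each output is a weighted sum of the Value vectors.
Y = weights @ V                    # shape (t, k)
```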