Implementing Attention Mechanisms in Keras

阿新 • Published: 2018-11-29

These are notes on a few ways to implement an attention mechanism in Keras, kept mainly for my own reference.

Python 3
Keras 2.1.0 (TensorFlow backend)

Approach 1

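This version is fairly simple and is adapted from here. The similarity function is a single fully connected layer, and its output goes through a softmax activation to produce the weights. The softmax is applied across the time steps, separately for each dimension of the hidden vector. The function returns a 3-D tensor: the inputs are multiplied by the weights but are not summed over time.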

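from keras import backend as K
from keras.layers import Dense, Lambda, Multiply, Permute, RepeatVector


def attention_3d_block(inputs, single_attention_vector=False):
    # If the previous layer is an LSTM, it needs return_sequences=True.
    # inputs.shape = (batch_size, time_steps, input_dim)
    time_steps = K.int_shape(inputs)[1]
    input_dim = K.int_shape(inputs)[2]
    a = Permute((2, 1))(inputs)
    a = Dense(time_steps, activation='softmax')(a)
    if single_attention_vector:
        # Average over input dimensions and share one attention vector.
        a = Lambda(lambda x: K.mean(x, axis=1))(a)
        a = RepeatVector(input_dim)(a)

    a_probs = Permute((2, 1))(a)
    # The inputs are scaled by the attention weights but not summed over
    # time; this did not seem to matter much.
    # For a classification task, a Flatten layer afterwards is enough.
    # Element-wise product.
    output_attention_mul = Multiply()([inputs, a_probs])
    return output_attention_mul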
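
To make the shapes concrete, the forward pass of attention_3d_block (with single_attention_vector=False) can be mimicked in plain NumPy. This is only a sketch: W and b are random stand-ins for the trained Dense kernel and bias, and softmax and attention_3d_numpy are helper names introduced here for illustration, not part of the original code.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_3d_numpy(inputs, W, b):
    # inputs: (batch, time_steps, input_dim)
    # W: (time_steps, time_steps), b: (time_steps,)  (stand-ins for the Dense layer)
    a = np.transpose(inputs, (0, 2, 1))       # (batch, input_dim, time_steps)
    a = softmax(a @ W + b, axis=-1)           # softmax over time, per input dimension
    a_probs = np.transpose(a, (0, 2, 1))      # (batch, time_steps, input_dim)
    return inputs * a_probs, a_probs          # weighted inputs, not summed


rng = np.random.RandomState(0)
batch, time_steps, input_dim = 2, 5, 3
x = rng.randn(batch, time_steps, input_dim)
W = rng.randn(time_steps, time_steps)
b = np.zeros(time_steps)

weighted, a_probs = attention_3d_numpy(x, W, b)
print(weighted.shape)                         # (2, 5, 3)

# For every input dimension the weights over the time axis sum to 1.
print(np.allclose(a_probs.sum(axis=1), 1.0))  # True

# Summing over time would turn the 3-D result into one vector per sample.
pooled = weighted.sum(axis=1)
print(pooled.shape)                           # (2, 3)
```

Summing the weighted tensor over the time axis, as in the last lines, is the step this function leaves out.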

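Note: here the softmax is applied directly to the linearly transformed vectors, rather than after taking a product with a context vector; in the classification example this did not seem to matter much. Also, time_steps is sometimes hard to obtain and has to be fixed by hand in advance, so models that need a variable number of time steps may run into problems.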
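
For comparison, here is a minimal NumPy sketch of the context-vector variant mentioned in the note, roughly the formulation used in the Hierarchical Attention Networks paper: each hidden vector is projected with tanh, scored against a learned context vector u, and the scores are normalised with a softmax over time. The names context_vector_attention, W, b and u, and all sizes, are assumptions made for illustration. Because the parameters depend only on the feature size, the same weights work for any number of time steps.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def context_vector_attention(h, W, b, u):
    # h: (batch, time_steps, dim)
    # W: (dim, att_dim), b: (att_dim,), u: (att_dim,) is the context vector
    v = np.tanh(h @ W + b)                      # (batch, time_steps, att_dim)
    scores = v @ u                              # (batch, time_steps)
    alpha = softmax(scores, axis=1)             # one weight per time step
    out = (alpha[..., None] * h).sum(axis=1)    # weighted sum: (batch, dim)
    return out, alpha


rng = np.random.RandomState(0)
dim, att_dim = 5, 3
W = rng.randn(dim, att_dim)
b = np.zeros(att_dim)
u = rng.randn(att_dim)

# The same parameters handle sequences of different lengths.
h_short = rng.randn(2, 4, dim)
h_long = rng.randn(2, 9, dim)
out_short, alpha_short = context_vector_attention(h_short, W, b, u)
out_long, alpha_long = context_vector_attention(h_long, W, b, u)
print(out_short.shape, alpha_short.shape)       # (2, 5) (2, 4)
print(out_long.shape, alpha_long.shape)         # (2, 5) (2, 9)
```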

Approach 2

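Following the version above, I wrote a custom Keras layer. The differences are that a tanh activation is applied before the softmax and that the output is summed over time, so the output is 2-D. The figure borrowed from this article should roughly convey the idea.
[Figure: attention mechanism]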

from keras import backend as K
from keras.layers import Layer


class AttentionLayer(Layer):
    def __init__(self, **kwargs):
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        # W.shape = (time_steps, time_steps)
        self.W = self.add_weight(name='att_weight',
                                 shape=(input_shape[1], input_shape[1]),
                                 initializer='uniform',
                                 trainable=True)
        self.b = self.add_weight(name='att_bias',
                                 shape=(input_shape[1],),
                                 initializer='uniform',
                                 trainable=True)
        super(AttentionLayer, self).build(input_shape)

    def call(self, inputs):
        # inputs.shape = (batch_size, time_steps, seq_len),
        # where seq_len is the feature size at each time step
        x = K.permute_dimensions(inputs, (0, 2, 1))
        # x.shape = (batch_size, seq_len, time_steps)
        # Softmax over the last axis, i.e. over the time steps.
        a = K.softmax(K.tanh(K.dot(x, self.W) + self.b))
        outputs = K.permute_dimensions(a * x, (0, 2, 1))
        # Sum over the time axis: (batch_size, seq_len)
        outputs = K.sum(outputs, axis=1)
        return outputs

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[2]
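
The computation in call can be checked against a small NumPy sketch. The weights here are random stand-ins for the layer's trained W and b, and attention_layer_numpy is a helper name introduced only for this illustration. The sketch also shows one consequence of placing tanh before the softmax: the scores are confined to [-1, 1], so within one softmax the largest weight can be at most e^2 (about 7.39) times the smallest.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_layer_numpy(inputs, W, b):
    # inputs: (batch, time_steps, features)
    # W: (time_steps, time_steps), b: (time_steps,)  (stand-ins for the layer weights)
    x = np.transpose(inputs, (0, 2, 1))         # (batch, features, time_steps)
    a = softmax(np.tanh(x @ W + b), axis=-1)    # weights over time, per feature
    outputs = (a * x).sum(axis=-1)              # sum over time: (batch, features)
    return outputs, a


rng = np.random.RandomState(1)
batch, time_steps, features = 2, 6, 3
inputs = rng.randn(batch, time_steps, features)
W = rng.randn(time_steps, time_steps)
b = rng.randn(time_steps)

outputs, a = attention_layer_numpy(inputs, W, b)
print(outputs.shape)                            # (2, 3)

# Ratio of largest to smallest weight within each softmax is bounded by e^2.
ratio = a.max(axis=-1) / a.min(axis=-1)
print(bool(np.all(ratio <= np.exp(2) + 1e-9)))  # True
```

Since W has shape (time_steps, time_steps), this layer, like the function version before it, needs the number of time steps to be known when the layer is built.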

Hierarchical Attention Networks

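This paper proposes a hierarchical attention mechanism that applies attention at both the word level and the sentence level to classify documents, and it reports good results.
This blog post appears to implement it, using the dataset from here. I ran experiments based on that code, using the custom attention layer defined above. The code is here; in a quick test the accuracy was 90.52%.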

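from keras.layers import Bidirectional, Dense, GRU, Input, TimeDistributed
from keras.models import Model

# MAX_SENT_LENGTH, MAX_SENTS and embedding_layer are assumed to be defined earlier.
# Sentence encoder: words -> sentence vector
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
l_dense = TimeDistributed(Dense(200))(l_lstm)  # applied to every word in the sentence
l_att = AttentionLayer()(l_dense)
sentEncoder = Model(sentence_input, l_att)

# Document encoder: sentence vectors -> document vector -> class probabilities
review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)  # applied to every sentence in the document
l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent)
l_att_sent = AttentionLayer()(l_dense_sent)
preds = Dense(2, activation='softmax')(l_att_sent)
model = Model(review_input, preds)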
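
The two-level structure relies on TimeDistributed applying the sentence encoder to every sentence of a document. Conceptually this amounts to folding the sentence axis into the batch axis, encoding, and unfolding again. The NumPy sketch below illustrates only that idea: sentence_encoder is a deliberately simple stand-in (an embedding lookup followed by mean pooling, not the bidirectional GRU with attention used above), and all sizes and names are hypothetical.

```python
import numpy as np

# Hypothetical small sizes, chosen only for illustration.
MAX_SENTS, MAX_SENT_LENGTH = 4, 6
VOCAB_SIZE, EMB_DIM = 50, 8

rng = np.random.RandomState(0)
embedding_matrix = rng.randn(VOCAB_SIZE, EMB_DIM)


def sentence_encoder(word_ids):
    # Stand-in encoder: (n, MAX_SENT_LENGTH) word ids -> (n, EMB_DIM) sentence vectors.
    # Mean pooling replaces the BiGRU + attention of the real model.
    return embedding_matrix[word_ids].mean(axis=1)


def time_distributed(encoder, reviews):
    # reviews: (batch, n_sents, sent_len) integer word ids
    batch, n_sents, sent_len = reviews.shape
    flat = reviews.reshape(batch * n_sents, sent_len)   # fold sentences into the batch axis
    encoded = encoder(flat)                             # (batch * n_sents, EMB_DIM)
    return encoded.reshape(batch, n_sents, -1)          # (batch, n_sents, EMB_DIM)


reviews = rng.randint(0, VOCAB_SIZE, size=(3, MAX_SENTS, MAX_SENT_LENGTH))
sent_vectors = time_distributed(sentence_encoder, reviews)
print(sent_vectors.shape)                               # (3, 4, 8)

# Each sentence is encoded independently of the others in its document.
single = sentence_encoder(reviews[1, 2][None, :])[0]
print(np.allclose(sent_vectors[1, 2], single))          # True
```

In the model above, the resulting sequence of sentence vectors is what the second bidirectional GRU and the second attention layer operate on.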
