TensorFlow Study Notes (11): An Introduction to the seq2seq Model Interfaces
An overview of the external functions the model calls
tf.sampled_softmax_loss()
tf.sampled_softmax_loss() internally calls _compute_sampled_logits(). About _compute_sampled_logits():
# This function is similar to nce_loss: it estimates the softmax loss by sampling
def sampled_softmax_loss(weights,                 # [num_classes, dim]
                         biases,                  # [num_classes]
                         inputs,                  # [batch_size, dim]
                         labels,                  # [batch_size, num_true]
                         num_sampled,
                         num_classes,
                         num_true=1,
                         sampled_values=None,
                         remove_accidental_hits=True,
                         partition_strategy="mod",
                         name="sampled_softmax_loss"):
    # return: [batch_size]
About the labels argument: normally num_true is 1, so labels has shape [batch_size, 1]. Suppose we have 1000 classes; with one-hot labels the shape would have to be [batch_size, num_classes]. Clearly, when num_classes is very large this hurts performance, so a sparse encoding is used instead: the id 3 stands for [0, 0, 0, 1, 0, ...].
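A minimal usage sketch of this interface with the sparse labels described above (the shapes and num_sampled value are illustrative, and the argument order follows the signature listed here; newer TensorFlow releases swapped the positions of labels and inputs):
import tensorflow as tf

batch_size, dim, num_classes = 32, 128, 10000
weights = tf.get_variable("out_w", [num_classes, dim])    # [num_classes, dim]
biases = tf.get_variable("out_b", [num_classes])          # [num_classes]
inputs = tf.placeholder(tf.float32, [batch_size, dim])    # e.g. the RNN outputs
labels = tf.placeholder(tf.int64, [batch_size, 1])        # sparse ids, not one-hot

loss = tf.nn.sampled_softmax_loss(weights, biases, inputs, labels,
                                  num_sampled=512, num_classes=num_classes)
# loss: [batch_size], one sampled-softmax loss per example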
tf.nn.seq2seq.embedding_attention_seq2seq()
Creates both the input embedding matrix and the output embedding matrix.
def embedding_attention_seq2seq(encoder_inputs,   # [T, batch_size]
                                decoder_inputs,   # [out_T, batch_size]
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,       # use only a single read head
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
    # output_projection: (W, B)  W: [output_size, num_decoder_symbols]
    #                            B: [num_decoder_symbols]
(1) This function creates an embedding matrix for the inputs.
(2) It computes the encoder outputs and keeps them around for the attention computation:
encoder_cell = rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=num_encoder_symbols,
    embedding_size=embedding_size)  # creates the embedding matrix for the inputs
encoder_outputs, encoder_state = rnn.rnn(
    encoder_cell, encoder_inputs, dtype=dtype)  # encoder_outputs: a length-T list of [batch_size, size]
(3) It builds the attention states:
top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
              for e in encoder_outputs]              # T tensors of [batch_size, 1, size]
attention_states = array_ops.concat(1, top_states)   # [batch_size, T, size]
(4) The remaining work is handed to embedding_attention_decoder(), which creates the decoder's embedding matrix:
# Decoder.
output_size = None
if output_projection is None:
    cell = rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
    output_size = num_decoder_symbols
if isinstance(feed_previous, bool):
    return embedding_attention_decoder(
        decoder_inputs,
        encoder_state,
        attention_states,
        cell,
        num_decoder_symbols,
        embedding_size,
        num_heads=num_heads,
        output_size=output_size,
        output_projection=output_projection,
        feed_previous=feed_previous,
        initial_state_attention=initial_state_attention)
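Putting the pieces together, here is a hedged usage sketch of embedding_attention_seq2seq() (vocabulary sizes, cell size and sequence lengths are illustrative; this is the TF 0.x-era tf.nn.seq2seq API discussed in these notes):
import tensorflow as tf

T_in, T_out, batch_size = 10, 12, 32
cell = tf.nn.rnn_cell.GRUCell(256)
encoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(T_in)]
decoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(T_out)]

outputs, state = tf.nn.seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=40000, num_decoder_symbols=40000,
    embedding_size=256,
    output_projection=None,   # so each output is [batch_size, num_decoder_symbols]
    feed_previous=False)      # set True at decoding time to feed back the previous argmax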
tf.nn.rnn_cell.EmbeddingWrapper()
embedding_attention_seq2seq() uses this class.
With this wrapper, the rnn inputs at each step can simply be [batch_size] tensors that hold word ids.
The class just puts an embedding layer in front of the cell.
class EmbeddingWrapper(RNNCell):
    def __init__(self, cell, embedding_classes, embedding_size, initializer=None):
    def __call__(self, inputs, state, scope=None):  # builds the embedding matrix [embedding_classes, embedding_size]
        # inputs: [batch_size, 1]
        # return: (output, state)
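A small sketch of the wrapping step (sizes are illustrative, same TF 0.x-era API):
cell = tf.nn.rnn_cell.GRUCell(128)
embed_cell = tf.nn.rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=40000, embedding_size=128)
# embed_cell(inputs, state): inputs carries word ids; the wrapper looks up
# their embeddings and feeds the embedded vectors to the inner GRUCell.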
tf.nn.rnn_cell.OutputProjectionWrapper()
Projects the rnn_cell output to the desired dimensionality.
class OutputProjectionWrapper(RNNCell):
    def __init__(self, cell, output_size):  # output_size: the size after projection
    def __call__(self, inputs, state, scope=None):
    # __init__ returns an rnn_cell whose output goes through an output projection
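For example (sizes illustrative), this is how a cell's per-step output gets projected from the cell size to the vocabulary size, just as embedding_attention_seq2seq does when output_projection is None:
cell = tf.nn.rnn_cell.GRUCell(256)
proj_cell = tf.nn.rnn_cell.OutputProjectionWrapper(cell, 40000)
# each step's output becomes [batch_size, 40000] instead of [batch_size, 256]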
tf.nn.seq2seq.embedding_attention_decoder()
# creates the embedding matrix: [num_symbols, embedding_size]
def embedding_attention_decoder(decoder_inputs,   # T * batch_size
                                initial_state,
                                attention_states,
                                cell,
                                num_symbols,
                                embedding_size,
                                num_heads=1,
                                output_size=None,
                                output_projection=None,
                                feed_previous=False,
                                update_embedding_for_previous=True,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
# core code
embedding = variable_scope.get_variable("embedding",
                                        [num_symbols, embedding_size])  # output embedding
loop_function = _extract_argmax_and_embed(
    embedding, output_projection,
    update_embedding_for_previous) if feed_previous else None
emb_inp = [
    embedding_ops.embedding_lookup(embedding, i) for i in decoder_inputs]
return attention_decoder(
    emb_inp,
    initial_state,
    attention_states,
    cell,
    output_size=output_size,
    num_heads=num_heads,
    loop_function=loop_function,
    initial_state_attention=initial_state_attention)
As you can see, this function first creates an embedding matrix for the decoder symbols and then defines loop_function.
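Conceptually, when feed_previous=True the loop_function produced by _extract_argmax_and_embed does roughly the following (a simplified sketch, not the library source; embedding and output_projection are the objects from the code above):
def loop_function(prev, i):
    # prev: [batch_size, output_size], the decoder output of step i-1
    if output_projection is not None:
        prev = tf.matmul(prev, output_projection[0]) + output_projection[1]
    prev_symbol = tf.argmax(prev, 1)                       # greedy choice of the previous symbol
    return tf.nn.embedding_lookup(embedding, prev_symbol)  # fed back as the next decoder input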
emb_inp is the embedded input: [T, batch_size, embedding_size].
The main work is then handed off to attention_decoder().
tf.nn.seq2seq.attention_decoder()
def attention_decoder(decoder_inputs,    # [T, batch_size, input_size]
                      initial_state,     # [batch_size, cell.state_size]
                      attention_states,  # [batch_size, attn_length, attn_size]
                      cell,
                      output_size=None,
                      num_heads=1,
                      loop_function=None,
                      dtype=None,
                      scope=None,
                      initial_state_attention=False):
When computing the attention distribution, the paper gives three formulas:

u_i^t = v^T * tanh(W_1 * h_i + W_2 * d_t)
a_i^t = softmax(u_i^t)
d_t' = sum_i a_i^t * h_i

where h_i are the encoder top states and d_t is the decoder state (the query); W_1 has shape [attn_vec_size, size] and v has shape [size, 1]. We usually write input data as column vectors, but TensorFlow tends to use row vectors. In this function, the W_1 * h_i term is computed with a convolution:
hidden = array_ops.reshape(
    attention_states, [-1, attn_length, 1, attn_size])  # [batch_size, attn_length, 1, attn_size]
hidden_features = []
v = []
attention_vec_size = attn_size  # Size of query vectors for attention.
for a in xrange(num_heads):
    k = variable_scope.get_variable("AttnW_%d" % a,
                                    [1, 1, attn_size, attention_vec_size])
    hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
    v.append(
        variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))  # attention_vec_size = attn_size
After the conv2d, the returned tensor has shape [batch_size, attn_length, 1, attention_vec_size].
The attention weights a and the attention-weighted context vector d are then computed as follows:
y = linear(query, attention_vec_size, True)
y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
# Attention mask is a softmax of v^T * tanh(...).
s = math_ops.reduce_sum(
    v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])  # s: [batch_size, attn_length]
a = nn_ops.softmax(s)                                       # a: [batch_size, attn_length]
# Now calculate the attention-weighted vector d.
d = math_ops.reduce_sum(
    array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden,
    [1, 2])
ds.append(array_ops.reshape(d, [-1, attn_size]))            # d: [batch_size, attn_size]
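To see why the 1x1 conv2d above is just W_1 * h_i applied to every encoder position at once, here is a small numpy illustration (shapes are illustrative, not TensorFlow code):
import numpy as np

batch_size, attn_length, attn_size, attn_vec_size = 2, 5, 4, 4
hidden = np.random.randn(batch_size, attn_length, 1, attn_size)
k = np.random.randn(attn_size, attn_vec_size)   # the 1x1 filter, squeezed to a matrix

# what the 1x1 convolution computes: the same linear map at every position
conv_like = np.einsum('btci,ij->btcj', hidden, k)
# explicit per-position matmul W_1 * h_i
manual = np.stack([hidden[:, t, 0, :].dot(k) for t in range(attn_length)], axis=1)
assert np.allclose(conv_like[:, :, 0, :], manual)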
def rnn()
from tensorflow.python.ops import rnn
rnn.rnn()
def rnn(cell, inputs, initial_state=None, dtype=None,
        sequence_length=None, scope=None):
    # inputs: a length-T list of inputs, each a `Tensor` of shape [batch_size, input_size]
    # sequence_length: [batch_size], the actual length of each sample's sequence
    # return: (outputs, state); outputs: a length-T list of [batch_size, output_size],
    #         state: the final state, [batch_size, state_size]
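A hedged usage sketch of this list-based interface (TF 0.x-era rnn.rnn; later releases renamed it tf.nn.static_rnn; shapes are illustrative):
from tensorflow.python.ops import rnn
import tensorflow as tf

T, batch_size, input_size = 10, 32, 128
cell = tf.nn.rnn_cell.GRUCell(256)
inputs = [tf.placeholder(tf.float32, [batch_size, input_size]) for _ in range(T)]
seq_len = tf.placeholder(tf.int32, [batch_size])  # true length of each sample

outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32, sequence_length=seq_len)
# outputs: a length-T list of [batch_size, 256] tensors; state: the final state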
seq2seqModel
- Create the projection parameters proj_w and proj_b
- Define sampled_loss (easy to follow if you have read the word2vec notes)
- Define seq2seq_f(), which builds the embeddings for the inputs and outputs and does the core computation
- Call model_with_buckets(), which internally invokes seq2seq_f and sampled_loss (a condensed sketch of these steps follows below)
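A condensed sketch of those four steps, modeled on the translate tutorial's seq2seq_model.py (size, source_vocab_size, target_vocab_size and cell are assumed to be defined elsewhere; num_sampled=512 is illustrative):
w = tf.get_variable("proj_w", [size, target_vocab_size])
b = tf.get_variable("proj_b", [target_vocab_size])
output_projection = (w, b)

def sampled_loss(inputs, labels):
    labels = tf.reshape(labels, [-1, 1])
    # sampled_softmax_loss wants weights of shape [num_classes, dim], hence the transpose
    return tf.nn.sampled_softmax_loss(tf.transpose(w), b, inputs, labels,
                                      512, target_vocab_size)

def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
    return tf.nn.seq2seq.embedding_attention_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=source_vocab_size,
        num_decoder_symbols=target_vocab_size,
        embedding_size=size,
        output_projection=output_projection,
        feed_previous=do_decode)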
model_with_buckets()
It is called as tf.nn.seq2seq.model_with_buckets().
def model_with_buckets(encoder_inputs, decoder_inputs, targets, weights,
                       buckets, seq2seq, softmax_loss_function=None,
                       per_example_loss=False, name=None):
"""Create a sequence-to-sequence model with support for bucketing.
The seq2seq argument is a function that defines a sequence-to-sequence model,
e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(x, y, rnn_cell.GRUCell(24))
Args:
encoder_inputs: A list of Tensors to feed the encoder; first seq2seq input.
decoder_inputs: A list of Tensors to feed the decoder; second seq2seq input.
targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
weights: List of 1D batch-sized float-Tensors to weight the targets.
buckets: A list of pairs of (input size, output size) for each bucket.
seq2seq: A sequence-to-sequence model function; it takes two inputs that agree with encoder_inputs and decoder_inputs, and returns a pair consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
softmax_loss_function: Function (inputs-batch, labels-batch) -> loss-batch
to be used instead of the standard softmax (the default if this is None).
per_example_loss: Boolean. If set, the returned loss will be a batch-sized
tensor of losses for each sequence in the batch. If unset, it will be
a scalar with the averaged loss from all examples.
name: Optional name for this operation, defaults to "model_with_buckets".
Returns:
A tuple of the form (outputs, losses), where:
outputs: The outputs for each bucket. Its j'th element consists of a list
of 2D Tensors. The shape of output tensors can be either
[batch_size x output_size] or [batch_size x num_decoder_symbols]
depending on the seq2seq model used.
losses: List of scalar Tensors, representing losses for each bucket, or,
if per_example_loss is set, a list of 1D batch-sized float Tensors.
Raises:
ValueError: If length of encoder_inputs, targets, or weights is smaller
than the largest (last) bucket.
"""
Remember how TensorFlow code works: first build the graph, then train. What actually gets trained is determined by what you feed.
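For instance, with the bucketed graph above already built, running one bucket comes down to feeding only that bucket's placeholders (a hedged sketch; enc_batch, dec_batch, target_batch and weight_batch are assumed, already bucket-padded numpy batches):
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    bucket_id = 0
    encoder_size, decoder_size = buckets[bucket_id]
    feed = {}
    for l in range(encoder_size):
        feed[encoder_inputs[l]] = enc_batch[l]       # one [batch_size] id array per step
    for l in range(decoder_size):
        feed[decoder_inputs[l]] = dec_batch[l]
        feed[targets[l]] = target_batch[l]
        feed[weights[l]] = weight_batch[l]
    loss_value = sess.run(losses[bucket_id], feed_dict=feed)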