A Detailed Walkthrough of the RNN Sample Code in TensorFlow

阿新 • Published: 2018-11-05

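The previous article covered the theory of RNNs; this one explains how an RNN is implemented in TensorFlow. Unlike theano, TensorFlow implements the RNN cell at a higher level of abstraction, so building an RNN through the TensorFlow API is fairly easy. We start with a few commonly used RNN-related functions in TensorFlow.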

(1) cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias, input_size, state_is_tuple, activation)
    num_units: int, the number of units in the LSTM cell (that is, the number of hidden-layer neurons in the cell);
    forget_bias: float, the bias added to the forget gates (the forget gate is one of the gating components of an LSTM network);
    input_size: deprecated and unused, so it can be ignored;
    state_is_tuple: if true, the state is a tuple (c_state, m_state). For example, when each time step has K stacked layers, the state has the structure ((c0, m0), (c1, m1), ..., (c(K-1), m(K-1)));
    activation: the activation function used inside the cell;
  Note: this class creates the most basic building block of an RNN network. It also has an important method, __call__(self, inputs, state, scope=None), which defines the inputs and outputs used when a BasicLSTMCell object is called during forward propagation.
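
To make the roles of forget_bias and the (c, h) state concrete, here is a minimal pure-Python sketch of one LSTM step for a single example. It follows the usual basic LSTM update (gates i, j, f, o computed from the concatenated input and previous output). The function name lstm_step and the weight layout are assumptions made for illustration; this is not TensorFlow's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, state, weights, biases, forget_bias=1.0):
    """One forward step of a basic LSTM cell for a single example.

    x: list of input features.
    state: (c, h), each a list of num_units floats.
    weights: four matrices (gates i, j, f, o), each with num_units rows
             of length len(x) + num_units (hypothetical layout).
    biases: four bias vectors of length num_units.
    """
    c, h = state
    concat = x + h  # the cell sees the input and the previous output together
    # Pre-activations of the four gates, one list of num_units values each.
    pre = [
        [sum(w * v for w, v in zip(row, concat)) + b for row, b in zip(W, bias)]
        for W, bias in zip(weights, biases)
    ]
    i, j, f, o = pre
    # forget_bias is added to the forget gate before the sigmoid.
    new_c = [ck * sigmoid(fk + forget_bias) + sigmoid(ik) * math.tanh(jk)
             for ck, ik, jk, fk in zip(c, i, j, f)]
    new_h = [math.tanh(nc) * sigmoid(ok) for nc, ok in zip(new_c, o)]
    # Like __call__, return the output and the new state.
    return new_h, (new_c, new_h)


num_units, input_dim = 2, 3
zero_w = [[[0.0] * (input_dim + num_units) for _ in range(num_units)] for _ in range(4)]
zero_b = [[0.0] * num_units for _ in range(4)]
output, (c, h) = lstm_step([1.0, 2.0, 3.0], ([1.0, 1.0], [0.0, 0.0]), zero_w, zero_b)
```

With all-zero weights the forget gate equals sigmoid(forget_bias), so a positive forget_bias keeps most of the old cell state c at the start of training.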

(2) cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
    cells: list of RNNCells that will be composed in this order (the LSTMCells in the cells list become the layers of the MultiRNNCell; that is, the output at each time step is produced by cascading several LSTMCell layers. Each LSTMCell in the list can have its own weight parameters);
    state_is_tuple: same as above;

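(3) state = tf.nn.rnn_cell.MultiRNNCell.zero_state(batch_size, dtype)
    batch_size: the size of a training batch;
    dtype: the data type of the returned state variable;
  Note: this method is called on a cell object, as in cell.zero_state(batch_size, dtype), and returns an all-zero state tensor. The size of the state tensor depends on the number of layers, the number of hidden units, and the batch size. The first two are fixed when the cell object is defined, so only batch_size has to be given here.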
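
The layered state structure and the layer-by-layer cascade can be sketched in plain Python as follows. The namedtuple stands in for TensorFlow's LSTMStateTuple, and zero_state, toy_cell, and multi_cell_step are hypothetical helpers written only for this illustration; toy_cell is deliberately not a real LSTM.

```python
from collections import namedtuple

# Stand-in for TensorFlow's LSTMStateTuple so the sketch runs without TensorFlow.
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def zero_state(num_layers, num_units, batch_size):
    """Return one all-zero (c, h) pair per layer, each of shape [batch_size, num_units]."""
    def zeros():
        return [[0.0] * num_units for _ in range(batch_size)]
    return tuple(LSTMStateTuple(c=zeros(), h=zeros()) for _ in range(num_layers))


def toy_cell(inputs, state):
    """Hypothetical cell: adds the inputs to c and outputs the new c (not a real LSTM)."""
    new_c = [[ci + xi for ci, xi in zip(c_row, x_row)]
             for c_row, x_row in zip(state.c, inputs)]
    return new_c, LSTMStateTuple(c=new_c, h=new_c)


def multi_cell_step(cells, inputs, states):
    """Run the layers in order: each layer's output is the next layer's input."""
    new_states = []
    current = inputs
    for cell, layer_state in zip(cells, states):
        current, new_layer_state = cell(current, layer_state)
        new_states.append(new_layer_state)
    return current, tuple(new_states)


state = zero_state(num_layers=2, num_units=3, batch_size=4)
batch = [[1.0, 2.0, 3.0] for _ in range(4)]
output, state = multi_cell_step([toy_cell, toy_cell], batch, state)
output2, state2 = multi_cell_step([toy_cell, toy_cell], batch, state)
```

The number of layers and the number of units fix the nesting and the inner width, which is why batch_size is the only shape argument left to pass.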

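The official TensorFlow RNN source code is available on GitHub and consists mainly of two files, reader.py and ptb_word_lm.py. This article studies that code. Because the code is fairly long, the analysis focuses on the parts that may be hard to understand.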

Functions in reader.py

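Language modeling is a classic NLP application. Before an RNN model is trained, the input data file has to be preprocessed: first fix the vocabulary size vocabulary_size, then use the word frequencies in the training corpus to find the vocabulary_size most frequent words and map them to 0, ..., vocabulary_size-1. All remaining, less frequent words are set to "unknown" with index vocabulary_size. Training data usually contains many sentences of differing lengths (when the data is stored in lists or array objects, the elements may have different lengths, so storing the corpus is not a problem). When training finishes, the learned parameters are those under which all sentences in the training corpus have high probability. Note that the TensorFlow implementation discussed here works with fixed-length RNN inputs (theano's scan function supports variable-length inputs, but in practice the inputs are usually padded in advance to a fixed length, because this makes training faster).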
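
The preprocessing described above can be sketched in a few lines of plain Python. The names build_vocab and words_to_ids are hypothetical, and the toy corpus is made up for illustration; mapping rare words to the index vocabulary_size follows the description in the paragraph, not the exact behaviour of reader.py.

```python
import collections


def build_vocab(words, vocabulary_size):
    """Map the vocabulary_size most frequent words to ids 0..vocabulary_size-1."""
    counter = collections.Counter(words)
    # Sort by descending count; ties are broken alphabetically.
    count_pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [word for word, _ in count_pairs[:vocabulary_size]]
    return {word: idx for idx, word in enumerate(kept)}


def words_to_ids(words, word_to_id, unknown_id):
    """Convert words to ids; words outside the vocabulary get unknown_id."""
    return [word_to_id.get(word, unknown_id) for word in words]


corpus = "the cat sat on the mat the cat ran".split()
word_to_id = build_vocab(corpus, vocabulary_size=3)
ids = words_to_ids(corpus, word_to_id, unknown_id=3)
```

Here "the" and "cat" are the two most frequent words, and "mat" wins the tie among the words that occur once because ties are resolved alphabetically.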

import os

def ptb_raw_data(data_path=None):
  # Paths of the training, validation, and test files
  train_path = os.path.join(data_path, "ptb.train.txt")
  valid_path = os.path.join(data_path, "ptb.valid.txt")
  test_path = os.path.join(data_path, "ptb.test.txt")
  # _build_vocab sorts the word counts by frequency (descending) and, for equal
  # frequencies, by word (ascending). It returns a dict whose keys are words and
  # whose values are the corresponding unique ids.
  word_to_id = _build_vocab(train_path)
  # _file_to_word_ids converts the contents of a file into a list of ids. Words that are
  # not in the word_to_id lookup dict are skipped. It returns a list of ints, one word id each.
  train_data = _file_to_word_ids(train_path, word_to_id)
  valid_data = _file_to_word_ids(valid_path, word_to_id)
  test_data = _file_to_word_ids(test_path, word_to_id)
  vocabulary = len(word_to_id)  # vocabulary size; 10k for the PTB dataset
  return train_data, valid_data, test_data, vocabulary

import tensorflow as tf

def ptb_producer(raw_data, batch_size, num_steps, name=None):
  # raw_data: one of the raw data outputs from ptb_raw_data.
  with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):  # open a name scope (context manager)
    raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)
    data_len = tf.size(raw_data)
    # batch_size is the number of words fed in at each time step, i.e. the number of
    # sequences processed in parallel. It exists so that the GPU's parallel computing
    # power can be used to improve efficiency.
    batch_len = data_len // batch_size
    # data holds all training examples, one row per parallel sequence
    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])
    # The inputs are fixed-length; the RNN sequence length is set to num_steps.
    # epoch_size is the number of mini-batches in one epoch, i.e. the iterations per epoch.
    epoch_size = (batch_len - 1) // num_steps
    assertion = tf.assert_positive(
        epoch_size,
        message="epoch_size == 0, decrease batch_size or num_steps")
    # tf.control_dependencies makes the assertion run before the ops created in this context
    with tf.control_dependencies([assertion]):
      epoch_size = tf.identity(epoch_size, name="epoch_size")
    # tf.train.range_input_producer returns a queue that holds the ints 0, ..., epoch_size-1.
    # The benefit is that data input is "hidden": when training, only the model needs
    # attention, not how the training data is read. TensorFlow's queue runner input
    # mechanism was introduced in an earlier post.
    i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
    # Note that the target sequence of the language model is the input sequence shifted by one time step
    x = tf.slice(data, [0, i * num_steps], [batch_size, num_steps])
    y = tf.slice(data, [0, i * num_steps + 1], [batch_size, num_steps])
    return x, y
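
The reshaping and slicing in ptb_producer are easier to follow on a tiny example. The sketch below reproduces the same arithmetic with plain Python lists instead of tensors and a queue; ptb_batches is a hypothetical name, and the generator replaces the range_input_producer purely for illustration.

```python
def ptb_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs the way ptb_producer slices them, using plain lists."""
    data_len = len(raw_data)
    batch_len = data_len // batch_size
    # Row r holds the r-th contiguous chunk of the corpus (the reshape step).
    data = [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size <= 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        start = i * num_steps
        x = [row[start:start + num_steps] for row in data]
        # The targets are the inputs shifted by one time step.
        y = [row[start + 1:start + num_steps + 1] for row in data]
        yield x, y


batches = list(ptb_batches(list(range(20)), batch_size=2, num_steps=3))
first_x, first_y = batches[0]
last_x, last_y = batches[-1]
```

The "- 1" in epoch_size leaves room for the shifted targets: with batch_len = 10 and num_steps = 3 there are three batches, and the last target row ends at the final element of each data row.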

Functions in ptb_word_lm.py

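The functions in ptb_word_lm.py are fairly easy to follow; two places deserve attention.
(1) The __init__() function of the PTBModel class contains the two short code blocks below.
  In code block 1, the embedding variable inherits the initializer of the enclosing variable_scope when it is defined, so embedding is initialized with uniformly distributed random numbers. tf.nn.embedding_lookup turns the N-dimensional input_data into an (N+1)-dimensional tensor inputs; the extra dimension comes from mapping each word index to its vector in embedding.
  Code block 2 implements forward propagation. Each time step updates the state and saves the output of that step, so the loop ends with the outputs of all time steps and the state at the final step.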

# code block 1 begin....
with tf.device("/cpu:0"):
  embedding = tf.get_variable("embedding", [vocab_size, size], dtype=data_type())
  inputs = tf.nn.embedding_lookup(embedding, input_.input_data)
# code block 1 end....

# code block 2 begin....
outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
  for time_step in range(num_steps):
    # Reuse the same cell weights at every time step after the first
    if time_step > 0: tf.get_variable_scope().reuse_variables()
    (cell_output, state) = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)
# code block 2 end....
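
The two code blocks can be mimicked with nested lists to show the shapes involved. In this sketch embedding_lookup is a hand-written stand-in for tf.nn.embedding_lookup, and toy_cell is a hypothetical recurrent cell (new state = 0.5 * old state + input), not an LSTM; the numbers are made up for illustration.

```python
def embedding_lookup(embedding, ids):
    """Map a [batch_size, num_steps] grid of word ids to [batch_size, num_steps, size] vectors."""
    return [[embedding[word_id] for word_id in row] for row in ids]


def toy_cell(step_inputs, state):
    """Hypothetical recurrent cell: new state = 0.5 * old state + input (not an LSTM)."""
    new_state = [[0.5 * s + x for s, x in zip(s_row, x_row)]
                 for s_row, x_row in zip(state, step_inputs)]
    return new_state, new_state  # (output, new state)


embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # vocab_size = 3, size = 2
input_data = [[1, 2, 1], [2, 2, 0]]               # batch_size = 2, num_steps = 3
inputs = embedding_lookup(embedding, input_data)  # 2-D ids become a 3-D structure

state = [[0.0, 0.0] for _ in input_data]          # zero initial state
outputs = []
for time_step in range(3):
    # Equivalent of inputs[:, time_step, :]: one vector per example in the batch.
    step_inputs = [row[time_step] for row in inputs]
    cell_output, state = toy_cell(step_inputs, state)
    outputs.append(cell_output)
```

After the loop, outputs holds one [batch_size, size] entry per time step and state is the state at the last step, which is the pair of results that code block 2 produces.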

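(2) The run_epoch function contains the code below. model.initial_state is a tuple whose elements are each LSTMStateTuple(c=(tf.Tensor 'zeros_14:0' shape=() dtype=float32), h=(tf.Tensor 'zeros_15:0' shape=() dtype=float32)). state has the same structure, except that c and h in each LSTMStateTuple are concrete arrays. The feed_dict built from them feeds the final state of the previous step into the RNN's initial state at every training step. You might ask: isn't the state updated automatically while the model is trained? Think of it this way: TensorFlow builds the graph when the model is defined and then, during training, executes it along the data flow recorded in the graph. If model.initial_state were not fed, every run would trace back to the statement self._initial_state = cell.zero_state(batch_size, data_type()) and treat the initial state as an all-zero matrix, so the values of state would not carry over from one training iteration to the next.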

import time

import numpy as np

def run_epoch(session, model, eval_op=None, verbose=False):
  """Runs the model on the given data."""
  start_time = time.time()
  costs = 0.0
  iters = 0
  state = session.run(model.initial_state)

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h

    vals = session.run(fetches, feed_dict)
    cost = vals["cost"]
    state = vals["final_state"]

    costs += cost
    iters += model.input.num_steps

    if verbose and step % (model.input.epoch_size // 10) == 10:
      print("%.3f perplexity: %.3f speed: %.0f wps" %
            (step * 1.0 / model.input.epoch_size, np.exp(costs / iters),
             iters * model.input.batch_size / (time.time() - start_time)))

  return np.exp(costs / iters)
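
The bookkeeping in run_epoch (carrying the state from one step to the next and turning the accumulated cost into perplexity) can be isolated in a small sketch. toy_step is a hypothetical stand-in for session.run: it assumes a model that predicts every word with probability 1 / vocab_size and a state that simply counts consumed time steps.

```python
import math


def toy_step(state, num_steps, vocab_size):
    """Hypothetical stand-in for session.run: returns (cost, final_state).

    cost is the cross-entropy summed over num_steps for a model that assigns
    probability 1 / vocab_size to every word.
    """
    cost = num_steps * math.log(vocab_size)
    final_state = state + num_steps  # pretend the state counts consumed time steps
    return cost, final_state


def run_toy_epoch(epoch_size, num_steps, vocab_size):
    costs, iters = 0.0, 0
    state = 0  # plays the role of the all-zero initial state
    for _ in range(epoch_size):
        # Feed the previous final state back in, as feed_dict does in run_epoch.
        cost, state = toy_step(state, num_steps, vocab_size)
        costs += cost
        iters += num_steps
    # Perplexity is exp of the average per-word cost.
    return math.exp(costs / iters), state


perplexity, last_state = run_toy_epoch(epoch_size=5, num_steps=20, vocab_size=10000)
```

A model that guesses uniformly over 10000 words has a perplexity of about 10000, which gives a useful reference point when reading the values printed by run_epoch.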

References: https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb
            https://www.tensorflow.org/versions/r0.11/tutorials/recurrent/index.html
