Everyday pitfalls: TF model loading fails with "Key Variable_xxx not found in checkpoint"
阿新 · Published 2018-11-08
Saving the model works fine, but loading it throws a "Key Variable_xxx not found in checkpoint" error. Start by analysing the cause: the model.ckpt files are normally all there, so the problem arises at load time. The first step, then, is to print the variables stored in the ckpt file and inspect them. One precondition: specify the name argument when defining your variables, otherwise the listing is full of entries like "Variable_xxx:0" and tells you nothing.
import os
from tensorflow.python import pywrap_tensorflow

current_path = os.getcwd()
model_dir = os.path.join(current_path, 'model')
# Name of the saved ckpt file; yours will probably differ
checkpoint_path = os.path.join(model_dir, 'embedding.ckpt-0')

# Read data from the checkpoint file
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()

# Print the tensor names
for key in var_to_shape_map:
    print("tensor_name:", key)
    # print(reader.get_tensor(key))  # prints the variable's values; not needed for finding the problem, it only clutters the output
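As an aside, if your TensorFlow 1.x build is recent enough, tf.train.list_variables produces the same (name, shape) listing without going through pywrap_tensorflow. A minimal sketch, reusing the checkpoint_path defined above:

import tensorflow as tf

# list_variables returns (name, shape) pairs for every variable in the checkpoint
for name, shape in tf.train.list_variables(checkpoint_path):
    print("tensor_name:", name, shape)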
My output from the pywrap_tensorflow loop:
tensor_name: w_1_1/Adam_1
tensor_name: w_2/Adam_1
tensor_name: b_2
tensor_name: w_1_1
tensor_name: w_out/Adam_1
tensor_name: b_1_1/Adam_1
tensor_name: w_out
tensor_name: w_1
tensor_name: b_out
tensor_name: b_2/Adam
tensor_name: b_1
tensor_name: b_out/Adam_1
tensor_name: b_1_1/Adam
tensor_name: w_1_1/Adam
tensor_name: b_1_1
tensor_name: w_2/Adam
tensor_name: w_2
tensor_name: w_out/Adam
tensor_name: beta1_power
tensor_name: b_out/Adam
tensor_name: b_2/Adam_1
tensor_name: beta2_power
That makes the cause clear. My network only defines variables like "b_1, b_2, w_1, w_2", but because tf.train.AdamOptimizer() is used to apply the gradients, a checkpoint saved without specifying a variable list saves everything globally, including the optimizer's own variables with names like "w_out/Adam". That is why some variable cannot be found at restore time. The fix: pass an argument when declaring saver = tf.train.Saver(), namely the variables that actually need saving. (You can see where the extra names come from by listing tf.global_variables() after the optimizer has been built, as sketched below.)
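A minimal sketch of that check, assuming a scalar loss tensor already exists in the graph:

import tensorflow as tf

# 'loss' stands for whatever scalar you minimize; assumed to be defined elsewhere.
# Building the optimizer adds its slot variables to the graph.
train_op = tf.train.AdamOptimizer().minimize(loss)

for v in tf.global_variables():
    print(v.name)   # e.g. w_1:0, w_1/Adam:0, w_1/Adam_1:0, ..., beta1_power:0, beta2_power:0

With that in mind, collect only the network's own variables into a dict while building the graph: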
def ann_net(w_alpha=0.01, b_alpha=0.1):
    # X, keep_prob, input_size, hidden1_size, hidden2_size and embeding_size are defined elsewhere
    # Hidden layer 1
    w_1 = tf.Variable(w_alpha * tf.random_normal(shape=(input_size, hidden1_size)), name='w_1')
    b_1 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden1_size]), name='b_1')
    hidden1_output = tf.nn.tanh(tf.add(tf.matmul(X, w_1), b_1))
    hidden1_output = tf.nn.dropout(hidden1_output, keep_prob)
    # Hidden layer 2
    shp1 = hidden1_output.get_shape()
    w_2 = tf.Variable(w_alpha * tf.random_normal(shape=(shp1[1].value, hidden2_size)), name='w_2')
    b_2 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden2_size]), name='b_2')
    hidden2_output = tf.nn.tanh(tf.add(tf.matmul(hidden1_output, w_2), b_2))
    hidden2_output = tf.nn.dropout(hidden2_output, keep_prob)
    # Output layer
    shp2 = hidden2_output.get_shape()
    w_output = tf.Variable(w_alpha * tf.random_normal(shape=(shp2[1].value, embeding_size)), name='w_out')
    b_output = tf.Variable(b_alpha * tf.random_normal(shape=[embeding_size]), name='b_out')
    output = tf.add(tf.matmul(hidden2_output, w_output), b_output)
    # Only the network's own variables go into this dict -- no optimizer state
    variables_dict = {'w_1': w_1, 'b_1': b_1, 'w_2': w_2, 'b_2': b_2, 'w_out': w_output, 'b_out': b_output}
    return output, variables_dict
In the train() function, initialize the saver with variables_dict:
with tf.device('/cpu:0'):
    saver = tf.train.Saver(variables_dict)   # save/restore only the listed variables
    with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
        sess.run(tf.global_variables_initializer())
        step = 0
        ckpt = tf.train.get_checkpoint_state('model/')
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            # Recover the global step from the checkpoint filename suffix
            step = int(ckpt.model_checkpoint_path.rsplit('-', 1)[1])
            print("Model restored.")
        # training code
        # ... ...
        saver.save(sess, 'model/embedding.model', global_step=step)
If the model was downloaded from the internet, e.g. VGG-16, and you only want to load the first few layers into variables you defined yourself, the method is the same: build a list or dict of variables and pass it to tf.train.Saver(); a rough sketch follows.
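A minimal sketch of that, with hypothetical VGG-16-style checkpoint names and shapes; the dict keys are the names recorded in the downloaded checkpoint, the values are the variables in your own graph that should receive those weights:

import tensorflow as tf

# Your own variables for the first conv layer (shapes assumed to match VGG-16 conv1_1)
my_conv1_w = tf.get_variable('my_conv1_w', shape=[3, 3, 3, 64])
my_conv1_b = tf.get_variable('my_conv1_b', shape=[64])

# checkpoint name (assumed) -> your variable
restore_map = {
    'vgg_16/conv1/conv1_1/weights': my_conv1_w,
    'vgg_16/conv1/conv1_1/biases': my_conv1_b,
}
saver_vgg = tf.train.Saver(var_list=restore_map)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver_vgg.restore(sess, 'vgg_16.ckpt')   # path to the downloaded checkpoint (assumed)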
For an LSTM the principle is the same, but TF applies its own naming rules when saving: the bidirectional RNN wrapper uses a default variable_scope called "bidirectional_rnn", and unless you do something about it that name is prepended to the variables automatically, so the names in the saved model look something like this (a sketch of the graph construction that produces such names follows the listing):
tensor_name: train/train_1/fc_b/Adam
tensor_name: train_1/fc_b
tensor_name: train/fc_b
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam
tensor_name: train/bidirectional_rnn/fw/basic_lstm_cell/kernel
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam
tensor_name: train/beta2_power
tensor_name: train/train/fc_w/Adam
tensor_name: train_1/beta1_power
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1
tensor_name: train/beta1_power
tensor_name: train/train_1/fc_w/Adam_1
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1
tensor_name: train_1/beta2_power
tensor_name: train/train/fc_w/Adam_1
tensor_name: train/bidirectional_rnn/bw/basic_lstm_cell/kernel
tensor_name: train/train/fc_b/Adam
tensor_name: train/bidirectional_rnn/bw/basic_lstm_cell/bias
tensor_name: train/fc_w
tensor_name: train_1/fc_w
tensor_name: train/bidirectional_rnn/fw/basic_lstm_cell/bias
tensor_name: train/train/fc_b/Adam_1
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam
tensor_name: train/train_1/fc_b/Adam_1
tensor_name: train/train_1/fc_w/Adam
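For context, names like these come out of a graph built roughly like the following sketch, with the unit count, input placeholder and class count assumed (the actual graph in my case also has some extra scope nesting, hence the train/train and train_1 prefixes):

import tensorflow as tf

num_units, num_classes = 128, 10                          # assumed sizes
inputs = tf.placeholder(tf.float32, [None, None, 100])    # [batch, time, features], assumed

with tf.variable_scope('train'):
    # The LSTM cells create 'basic_lstm_cell/kernel' and 'basic_lstm_cell/bias';
    # bidirectional_dynamic_rnn nests them under 'bidirectional_rnn/fw' and '.../bw'
    fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(fw_cell, bw_cell, inputs, dtype=tf.float32)
    # Fully connected output layer -> 'train/fc_w' and 'train/fc_b'
    fc_w = tf.Variable(tf.random_normal([2 * num_units, num_classes]), name='fc_w')
    fc_b = tf.Variable(tf.random_normal([num_classes]), name='fc_b')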
The leading "train" is a variable_scope that I added myself, so at restore time you can do this:
include = ['train/fc_b', 'train/fc_w',
           'train/bidirectional_rnn/bw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/bw/basic_lstm_cell/kernel',
           'train/bidirectional_rnn/fw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/fw/basic_lstm_cell/kernel']
variables_to_restore = tf.contrib.slim.get_variables_to_restore(include=include)
saver = tf.train.Saver(variables_to_restore)
with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
    sess.run(tf.global_variables_initializer())
    # ... ...
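If you prefer not to depend on tf.contrib.slim, the same restore list can be built by filtering the graph's variables by name prefix, reusing the include list above; a minimal sketch:

# Collect every graph variable whose name starts with one of the included prefixes
variables_to_restore = [v for v in tf.global_variables()
                        if any(v.name.startswith(prefix) for prefix in include)]
saver = tf.train.Saver(variables_to_restore)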