Everyday pitfalls: TF model loading fails with "Key Variable_xxx not found in checkpoint"
阿新 · Published 2018-11-08
Saving the model works fine, but loading it throws a "Key Variable_xxx not found in checkpoint" error. Start by analysing the cause: the model.ckpt files are normally all there, so the problem arises at load time. The first step, then, is to print the variables stored in the ckpt file and inspect them. One precondition: specify the name argument when defining your variables, otherwise the listing is full of entries like "Variable_xxx:0" and tells you nothing.
import os
from tensorflow.python import pywrap_tensorflow

current_path = os.getcwd()
model_dir = os.path.join(current_path, 'model')
# Name of the saved ckpt file; yours will probably differ
checkpoint_path = os.path.join(model_dir, 'embedding.ckpt-0')

# Read data from the checkpoint file
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()

# Print the tensor names
for key in var_to_shape_map:
    print("tensor_name:", key)
    # print(reader.get_tensor(key))  # prints the variable's values; not needed for finding the problem, it only clutters the output
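As an aside, if your TensorFlow 1.x build is recent enough, tf.train.list_variables produces the same (name, shape) listing without going through pywrap_tensorflow. A minimal sketch, reusing the checkpoint_path defined above:

import tensorflow as tf

# list_variables returns (name, shape) pairs for every variable in the checkpoint
for name, shape in tf.train.list_variables(checkpoint_path):
    print("tensor_name:", name, shape)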
My output from the pywrap_tensorflow loop:
tensor_name: w_1_1/Adam_1
tensor_name: w_2/Adam_1
tensor_name: b_2
tensor_name: w_1_1
tensor_name: w_out/Adam_1
tensor_name: b_1_1/Adam_1
tensor_name: w_out
tensor_name: w_1
tensor_name: b_out
tensor_name: b_2/Adam
tensor_name: b_1
tensor_name: b_out/Adam_1
tensor_name: b_1_1/Adam
tensor_name: w_1_1/Adam
tensor_name: b_1_1
tensor_name: w_2/Adam
tensor_name: w_2
tensor_name: w_out/Adam
tensor_name: beta1_power
tensor_name: b_out/Adam
tensor_name: b_2/Adam_1
tensor_name: beta2_power
That makes the cause clear. My network only defines variables like "b_1, b_2, w_1, w_2", but because tf.train.AdamOptimizer() is used to apply the gradients, a checkpoint saved without specifying a variable list saves everything globally, including the optimizer's own variables with names like "w_out/Adam". That is why some variable cannot be found at restore time. The fix: pass an argument when declaring saver = tf.train.Saver(), namely the variables that actually need saving. (You can see where the extra names come from by listing tf.global_variables() after the optimizer has been built, as sketched below.)
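A minimal sketch of that check, assuming a scalar loss tensor already exists in the graph:

import tensorflow as tf

# 'loss' stands for whatever scalar you minimize; assumed to be defined elsewhere.
# Building the optimizer adds its slot variables to the graph.
train_op = tf.train.AdamOptimizer().minimize(loss)

for v in tf.global_variables():
    print(v.name)   # e.g. w_1:0, w_1/Adam:0, w_1/Adam_1:0, ..., beta1_power:0, beta2_power:0

With that in mind, collect only the network's own variables into a dict while building the graph: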
def ann_net(w_alpha=0.01, b_alpha=0.1):
    # X, keep_prob, input_size, hidden1_size, hidden2_size and embeding_size are defined elsewhere
    # Hidden layer 1
    w_1 = tf.Variable(w_alpha * tf.random_normal(shape=(input_size, hidden1_size)), name='w_1')
    b_1 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden1_size]), name='b_1')
    hidden1_output = tf.nn.tanh(tf.add(tf.matmul(X, w_1), b_1))
    hidden1_output = tf.nn.dropout(hidden1_output, keep_prob)
    # Hidden layer 2
    shp1 = hidden1_output.get_shape()
    w_2 = tf.Variable(w_alpha * tf.random_normal(shape=(shp1[1].value, hidden2_size)), name='w_2')
    b_2 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden2_size]), name='b_2')
    hidden2_output = tf.nn.tanh(tf.add(tf.matmul(hidden1_output, w_2), b_2))
    hidden2_output = tf.nn.dropout(hidden2_output, keep_prob)
    # Output layer
    shp2 = hidden2_output.get_shape()
    w_output = tf.Variable(w_alpha * tf.random_normal(shape=(shp2[1].value, embeding_size)), name='w_out')
    b_output = tf.Variable(b_alpha * tf.random_normal(shape=[embeding_size]), name='b_out')
    output = tf.add(tf.matmul(hidden2_output, w_output), b_output)
    # Only the network's own variables go into this dict -- no optimizer state
    variables_dict = {'w_1': w_1, 'b_1': b_1, 'w_2': w_2, 'b_2': b_2, 'w_out': w_output, 'b_out': b_output}
    return output, variables_dict
In the train() function, initialize the saver with variables_dict:
with tf.device('/cpu:0'):
    saver = tf.train.Saver(variables_dict)   # save/restore only the listed variables
    with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
        sess.run(tf.global_variables_initializer())
        step = 0
        ckpt = tf.train.get_checkpoint_state('model/')
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            # Recover the global step from the checkpoint filename suffix
            step = int(ckpt.model_checkpoint_path.rsplit('-', 1)[1])
            print("Model restored.")
        # training code
        # ... ...
        saver.save(sess, 'model/embedding.model', global_step=step)
If the model was downloaded from the internet, e.g. VGG-16, and you only want to load the first few layers into variables you defined yourself, the method is the same: build a list or dict of variables and pass it to tf.train.Saver(); a rough sketch follows.
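A minimal sketch of that, with hypothetical VGG-16-style checkpoint names and shapes; the dict keys are the names recorded in the downloaded checkpoint, the values are the variables in your own graph that should receive those weights:

import tensorflow as tf

# Your own variables for the first conv layer (shapes assumed to match VGG-16 conv1_1)
my_conv1_w = tf.get_variable('my_conv1_w', shape=[3, 3, 3, 64])
my_conv1_b = tf.get_variable('my_conv1_b', shape=[64])

# checkpoint name (assumed) -> your variable
restore_map = {
    'vgg_16/conv1/conv1_1/weights': my_conv1_w,
    'vgg_16/conv1/conv1_1/biases': my_conv1_b,
}
saver_vgg = tf.train.Saver(var_list=restore_map)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver_vgg.restore(sess, 'vgg_16.ckpt')   # path to the downloaded checkpoint (assumed)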
For an LSTM the principle is the same, but TF applies its own naming rules when saving: the bidirectional RNN wrapper uses a default variable_scope called "bidirectional_rnn", and unless you do something about it that name is prepended to the variables automatically, so the names in the saved model look something like this (a sketch of the graph construction that produces such names follows the listing):
tensor_name: train/train_1/fc_b/Adam
tensor_name: train_1/fc_b
tensor_name: train/fc_b
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam
tensor_name: train/bidirectional_rnn/fw/basic_lstm_cell/kernel
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam
tensor_name: train/beta2_power
tensor_name: train/train/fc_w/Adam
tensor_name: train_1/beta1_power
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1
tensor_name: train/beta1_power
tensor_name: train/train_1/fc_w/Adam_1
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1
tensor_name: train_1/beta2_power
tensor_name: train/train/fc_w/Adam_1
tensor_name: train/bidirectional_rnn/bw/basic_lstm_cell/kernel
tensor_name: train/train/fc_b/Adam
tensor_name: train/bidirectional_rnn/bw/basic_lstm_cell/bias
tensor_name: train/fc_w
tensor_name: train_1/fc_w
tensor_name: train/bidirectional_rnn/fw/basic_lstm_cell/bias
tensor_name: train/train/fc_b/Adam_1
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1
tensor_name: train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam
tensor_name: train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam
tensor_name: train/train_1/fc_b/Adam_1
tensor_name: train/train_1/fc_w/Adam
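For context, names like these come out of a graph built roughly like the following sketch, with the unit count, input placeholder and class count assumed (the actual graph in my case also has some extra scope nesting, hence the train/train and train_1 prefixes):

import tensorflow as tf

num_units, num_classes = 128, 10                          # assumed sizes
inputs = tf.placeholder(tf.float32, [None, None, 100])    # [batch, time, features], assumed

with tf.variable_scope('train'):
    # The LSTM cells create 'basic_lstm_cell/kernel' and 'basic_lstm_cell/bias';
    # bidirectional_dynamic_rnn nests them under 'bidirectional_rnn/fw' and '.../bw'
    fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(fw_cell, bw_cell, inputs, dtype=tf.float32)
    # Fully connected output layer -> 'train/fc_w' and 'train/fc_b'
    fc_w = tf.Variable(tf.random_normal([2 * num_units, num_classes]), name='fc_w')
    fc_b = tf.Variable(tf.random_normal([num_classes]), name='fc_b')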
The leading "train" is a variable_scope that I added myself, so at restore time you can do this:
include = ['train/fc_b', 'train/fc_w',
           'train/bidirectional_rnn/bw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/bw/basic_lstm_cell/kernel',
           'train/bidirectional_rnn/fw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/fw/basic_lstm_cell/kernel']
variables_to_restore = tf.contrib.slim.get_variables_to_restore(include=include)
saver = tf.train.Saver(variables_to_restore)
with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
    sess.run(tf.global_variables_initializer())
    # ... ...
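If you prefer not to depend on tf.contrib.slim, the same restore list can be built by filtering the graph's variables by name prefix, reusing the include list above; a minimal sketch:

# Collect every graph variable whose name starts with one of the included prefixes
variables_to_restore = [v for v in tf.global_variables()
                        if any(v.name.startswith(prefix) for prefix in include)]
saver = tf.train.Saver(variables_to_restore)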