Notes on a Text Classification Model Based on LSTM and Transfer Learning (TensorFlow)

阿新 • Published: 2019-01-19

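In real applications, more data may arrive later and the categories may be reorganized. In a banking scenario, for example, [withdraw under 20,000] and [withdraw over 20,000] might later be merged into a single [withdraw] class. Retraining the model from scratch would waste a lot of time, so we use transfer learning to shorten training: keep the weight variables of the LSTM layer and rebuild only the fully connected layer, which is the Softmax layer in the figure.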

Figure: structure of the classifier model

The transfer procedure is as follows (the code is based on Python 3.5 / TensorFlow 1.2; GitHub code link):
Step1 Build the network model

            with tf.name_scope("Train"):
                with tf.variable_scope("Model", reuse=None, initializer=initializer):
                    model = RNN_Model(config=config, num_classes=num_classes, is_training=True)


            with tf.name_scope("Valid"):
                with tf.variable_scope("Model", reuse=True, initializer=initializer):
                    valid_model = RNN_Model(config=valid_config, num_classes=num_classes, is_training=False)

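Step2 Initialize the variables (this must be done first, so that it does not overwrite the Variables loaded afterwards)

Step3 Restore the previously saved network weights. Three cases are distinguished here:

If there is no model file, train from scratch.

If a model file exists and the output classes have not changed, continue training.

If a model file exists and the output classes have changed, apply transfer learning.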

            if os.path.exists(checkpoint_dir):
                classes_file = codecs.open(os.path.join(config.out_dir, "classes"), "r", "utf-8")
                classes = list(line.strip() for line in classes_file.readlines())
                classes_file.close()

                # Check whether the set of classes has changed
                if sorted(classify_names) == sorted(classes):
                    print('-----continue training-----')

                    new_classify_files = []
                    for c in classes:
                        idx = classify_names.index(c)
                        new_classify_files.append(classify_files[idx])

                    # classify_files = new_classify_files

                    restored_saver = tf.train.Saver(tf.global_variables())
                    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
                    if ckpt and ckpt.model_checkpoint_path:
                        # The original format string had no placeholder, so the path was never printed
                        print('restore model: {}'.format(ckpt.model_checkpoint_path))
                        restored_saver.restore(session, ckpt.model_checkpoint_path)
                    else:
                        print('-----train from beginning-----')
                else:
                    print('-----change network-----')
                    not_restore = ['softmax_w:0', 'softmax_b:0']
                    restore_var = [v for v in tf.global_variables() if v.name.split('/')[-1] not in not_restore]
                    restored_saver = tf.train.Saver(restore_var)
                    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
                    if ckpt and ckpt.model_checkpoint_path:
                        # The original format string had no placeholder, so the path was never printed
                        print('restore model: {}'.format(ckpt.model_checkpoint_path))
                        restored_saver.restore(session, ckpt.model_checkpoint_path)
                    else:
                        pass

                    classes_file = codecs.open(os.path.join(config.out_dir, "classes"), "w", "utf-8")
                    for classify_name in classify_names:
                        classes_file.write(classify_name)
                        classes_file.write('\n')
                    classes_file.close()
            else:
                print('-----train from begin-----')
                os.makedirs(checkpoint_dir)
                classes_file = codecs.open(os.path.join(config.out_dir, "classes"), "w", "utf-8")
                for classify_name in classify_names:
                    classes_file.write(classify_name)
                    classes_file.write('\n')
                classes_file.close()
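
The three-way branch above can be condensed into a small helper that is easy to check in isolation. The following is a minimal sketch, not the author's code: `read_classes`, `write_classes`, `choose_training_mode` and the returned mode strings are hypothetical names introduced here, and the classes file is assumed to hold one class name per line, as in the snippet above. Treating a missing classes file as "train from scratch" is also an assumption.

```python
import os


def read_classes(path):
    """Read one class name per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def write_classes(path, class_names):
    """Write one class name per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name in class_names:
            f.write(name + "\n")


def choose_training_mode(checkpoint_dir, classes_path, current_classes):
    """Return 'scratch', 'continue', or 'transfer' (hypothetical labels)."""
    # No checkpoint directory (or no saved class list): nothing to reuse.
    if not os.path.exists(checkpoint_dir) or not os.path.exists(classes_path):
        return "scratch"
    saved = read_classes(classes_path)
    # Compare sorted lists so that the order of the class names is ignored.
    if sorted(saved) == sorted(current_classes):
        return "continue"
    # The class list changed: reuse the LSTM weights, rebuild the softmax layer.
    return "transfer"
```

Using `with` closes the classes file even when an exception is raised, which the explicit `open`/`close` pairs in the original snippet do not guarantee.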

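Step4 Start training

In testing, the loss converged quickly. Because the data had not changed much, a single epoch was enough to reach convergence, and retraining that used to take several hours was cut to a few minutes.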

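While writing the code, I also found that restored_saver.restore() loads the Variables of a previously saved model, whereas the Graph has to be rebuilt in code. The advantage is that you can load only the Variables you want and drop the rest. In this article, for example, the Softmax layer's w and b need to be discarded, which can be written as: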

                    not_restore = ['softmax_w:0', 'softmax_b:0']
                    restore_var = [v for v in tf.global_variables() if v.name.split('/')[-1] not in not_restore]
                    restored_saver = tf.train.Saver(restore_var)
                    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
                    if ckpt and ckpt.model_checkpoint_path:
                        # The original format string had no placeholder, so the path was never printed
                        print('restore model: {}'.format(ckpt.model_checkpoint_path))
                        restored_saver.restore(session, ckpt.model_checkpoint_path)
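
To see what the name filter selects, and why the variables have to be initialized before restoring, the logic can be mimicked without TensorFlow. This is an illustrative sketch only: plain dicts stand in for the graph variables and the checkpoint, and the variable names (`Model/lstm/kernel:0` and so on) and both helper functions are hypothetical.

```python
def select_restorable(var_names, skip=("softmax_w:0", "softmax_b:0")):
    """Keep variables whose last name component is not in `skip`."""
    return [n for n in var_names if n.split("/")[-1] not in skip]


def partial_restore(initialized, checkpoint, names_to_restore):
    """Overwrite freshly initialized values with checkpoint values,
    but only for the selected names; the input dict is left unchanged."""
    restored = dict(initialized)
    for name in names_to_restore:
        restored[name] = checkpoint[name]
    return restored


# Hypothetical variable names; real names depend on how the graph was built.
fresh = {
    "Model/embedding:0": "random",
    "Model/lstm/kernel:0": "random",
    "Model/softmax_w:0": "random",
    "Model/softmax_b:0": "random",
}
saved = {name: "trained" for name in fresh}

to_restore = select_restorable(fresh.keys())
weights = partial_restore(fresh, saved, to_restore)
```

In this sketch the embedding and LSTM entries end up as "trained" while the two softmax entries keep their fresh values, which is the state the transfer branch aims for. Running the initializer after the restore would reset every entry to a fresh value, which is why Step2 comes before Step3.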

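If you do not want to redefine the operations in the graph, you can also use tf.train.import_meta_graph() to load the persisted graph directly. The earlier blog post did exactly this when calling the trained model for classification: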

                saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
                saver.restore(self.session, checkpoint_file)

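This function loads the whole Graph together with everything in it, which makes it impossible to fine-tune the model; even the batch size cannot be changed. Because of this, I had to set the batch size of the model used for the validation set to 1 during training.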

By comparison, restored_saver.restore() feels more convenient and flexible, and is also less error-prone.
