Building and Training the Network Model from the Paper "Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks"
阿新 • Published: 2018-11-21
Environment:
Python 3.6
TensorFlow-GPU 1.8.0
The network model implemented here is built on top of https://blog.csdn.net/liuchonge/article/details/64440110. The differences: to handle the loss becoming NaN, a batch-normalization (BN) layer is added after every convolutional layer; comU1 computes only the cosine distance and L1 distance; and comU2 computes only the cosine distance.
Accordingly, this post lists only the code of the training file; a sketch of the model-side changes is given below.
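Since the model file itself is not listed, here is a minimal, hypothetical sketch of the two changes described above: the convolution-plus-BN pattern and the trimmed-down comU1/comU2 comparison units. The function names and signatures are illustrative only, not the author's actual model code:

import tensorflow as tf

def conv_bn(x, num_filters, filter_size, embedding_size, is_training, scope):
    """Convolution followed by batch normalization, the pattern used here to avoid NaN loss."""
    with tf.variable_scope(scope):
        conv = tf.layers.conv2d(x, filters=num_filters,
                                kernel_size=[filter_size, embedding_size],
                                activation=None)
        bn = tf.layers.batch_normalization(conv, training=is_training)
        return tf.nn.tanh(bn)

def comU1(x, y):
    """Only cosine distance and L1 distance (x, y have shape [batch, dim])."""
    cos = tf.reduce_sum(x * y, axis=1) / (tf.norm(x, axis=1) * tf.norm(y, axis=1) + 1e-8)
    l1 = tf.reduce_sum(tf.abs(x - y), axis=1)
    return tf.stack([cos, l1], axis=1)

def comU2(x, y):
    """Only cosine distance."""
    cos = tf.reduce_sum(x * y, axis=1) / (tf.norm(x, axis=1) * tf.norm(y, axis=1) + 1e-8)
    return tf.expand_dims(cos, 1)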
First, we configure the GPU and define the model's hyperparameters, as shown below:
# Imports needed by the training file (data_helper and emb are the author's own
# modules; the MPCNN class comes from the companion model file, not listed here)
import os
import time
import datetime

import tensorflow as tf
from sklearn.utils import shuffle  # assumed source of shuffle(); the call signature below matches it

import data_helper
import emb

# GPU settings
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

"""Parameter settings"""
# Network parameters
tf.flags.DEFINE_integer('sentence_length', 100, 'The length of sentence')
tf.flags.DEFINE_integer('embedding_size', 50, 'The dimension of the word embedding')
tf.flags.DEFINE_integer('num_filters_A', 20, 'The number of filters in block A')
tf.flags.DEFINE_integer('num_filters_B', 20, 'The number of filters in block B')
tf.flags.DEFINE_string('filter_sizes', '1,2,100', 'The size of filter')
tf.flags.DEFINE_integer('num_classes', 6, 'The number of labels')
tf.flags.DEFINE_integer('n_hidden', 150, 'The number of hidden units in the fully connected layer')
tf.flags.DEFINE_float('dropout_keep_prob', 0.5, 'The probability of dropout')
# Training parameters
tf.flags.DEFINE_integer('num_epochs', 10, 'The number of epochs to be trained')
tf.flags.DEFINE_integer('batch_size', 32, 'The size of mini batch')
tf.flags.DEFINE_integer('evaluate_every', 100, 'Evaluate model on dev set after this many steps (default: 100)')
tf.flags.DEFINE_integer('checkpoint_every', 100, 'Save model after this many steps (default: 100)')
tf.flags.DEFINE_integer('num_checkpoints', 5, 'The number of checkpoints to store (default: 5)')
# Learning rate and L2 regularization
tf.flags.DEFINE_float('lr', 1e-3, 'The learning rate of this model')
tf.flags.DEFINE_float('l2_reg_lambda', 1e-4, 'The regularization parameter')
# Device parameters
tf.flags.DEFINE_boolean('allow_soft_placement', True, 'Allow soft device placement')
tf.flags.DEFINE_boolean('log_device_placement', False, 'Log placement of ops on devices')

FLAGS = tf.flags.FLAGS
print('\nParameters:')
for attr, value in sorted(FLAGS.flag_values_dict().items()):
    print('{}={}'.format(attr.upper(), value))
print('')
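Because the parameters are defined through tf.flags, any of them can be overridden on the command line instead of editing the source. Assuming the training file is saved as train.py (a hypothetical filename), for example:

python train.py --batch_size=64 --lr=5e-4 --num_epochs=20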
After setting the parameters, we load the data:
# glove is the loaded word-embedding model: glove.d is the word-to-index dictionary
# <word, index>, and glove.g is the embedding matrix <vocabulary size, N=50>
print('loading glove...')
glove = emb.GloVe(N=50)
print('============== GloVe model loaded! ===================')

print("Loading data...")
Xtrain, ytrain = data_helper.load_set(glove, path='./sts/semeval-sts/all')
Xtrain[0], Xtrain[1], ytrain = shuffle(Xtrain[0], Xtrain[1], ytrain)  # [22592, sentence length]
Xtest, ytest = data_helper.load_set(glove, path='./sts/semeval-sts/2016')
Xtest[0], Xtest[1], ytest = shuffle(Xtest[0], Xtest[1], ytest)  # [1186, sentence length]
print('============== Data loaded! ===================')
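Note that the labels here are float vectors over num_classes = 6 bins rather than scalar scores. A common scheme for STS-style data (e.g. Tai et al., 2015) converts a real-valued similarity score y in [0, 5] into a sparse target distribution; whether data_helper.load_set uses exactly this mapping is an assumption, but a sketch looks like this:

import numpy as np

def score_to_dist(score, num_classes=6):
    """Hypothetical mapping of a similarity score in [0, 5] to a sparse
    probability distribution over num_classes bins."""
    dist = np.zeros(num_classes, dtype=np.float32)
    floor = int(np.floor(score))
    if floor == score:
        dist[floor] = 1.0  # integer scores map to a one-hot vector
    else:
        dist[floor] = floor + 1 - score   # weight on the lower bin
        dist[floor + 1] = score - floor   # weight on the upper bin
    return dist

# e.g. score_to_dist(2.4) -> [0, 0, 0.6, 0.4, 0, 0]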
It is recommended to wrap the GloVe-loading step in its own function for better code robustness.
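A minimal sketch of such a wrapper, reusing the emb.GloVe and data_helper.load_set interfaces shown above (the function name load_data is illustrative):

def load_data(glove_dim=50,
              train_path='./sts/semeval-sts/all',
              test_path='./sts/semeval-sts/2016'):
    """Load GloVe vectors and the train/test sets in one place, so failures
    surface with a clear message instead of an error much later on."""
    try:
        glove = emb.GloVe(N=glove_dim)
    except (IOError, OSError) as e:
        raise RuntimeError('Failed to load GloVe vectors: {}'.format(e))
    Xtrain, ytrain = data_helper.load_set(glove, path=train_path)
    Xtrain[0], Xtrain[1], ytrain = shuffle(Xtrain[0], Xtrain[1], ytrain)
    Xtest, ytest = data_helper.load_set(glove, path=test_path)
    Xtest[0], Xtest[1], ytest = shuffle(Xtest[0], Xtest[1], ytest)
    return glove, (Xtrain, ytrain), (Xtest, ytest)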
Next, we write the code that trains the model:
"""Start the MPCNN model""" with tf.Graph().as_default(): session_config = tf.ConfigProto(allow_soft_placement = FLAGS.allow_soft_placement, log_device_placement = FLAGS.log_device_placement) session_config.gpu_options.allow_growth = True session = tf.Session(config=session_config) with session.as_default(): """定義輸入輸出等placeholder""" input_1 = tf.placeholder(tf.int32, shape=[None, FLAGS.sentence_length], name='input_x1') input_2 = tf.placeholder(tf.int32, shape=[None, FLAGS.sentence_length], name='input_x2') input_3 = tf.placeholder(tf.float32, shape=[None, FLAGS.num_classes], name='input_y') dropout_keep_prob = tf.placeholder(tf.float32, name='dropout_keep_prob') print('佔位符構建完畢!') with tf.device('/cpu:0'), tf.name_scope('embedding'): s0 = tf.nn.embedding_lookup(glove.g, input_1) #此時輸入變數的shape為3維 s1 = tf.nn.embedding_lookup(glove.g, input_2) print('embedding轉換完畢!') with tf.name_scope('reshape'): # input_x1 = tf.expand_dims(s0, -1) #將輸入變數轉換為符合的Tensor4維變數 # input_x2 = tf.expand_dims(s1, -1) input_x1 = tf.reshape(s0, [-1, FLAGS.sentence_length, FLAGS.embedding_size, 1]) input_x2 = tf.reshape(s1, [-1, FLAGS.sentence_length, FLAGS.embedding_size, 1]) input_y = tf.reshape(input_3, [-1, FLAGS.num_classes]) print('reshape完畢!') #構建MPCNN模型 model = MPCNN(num_classes=FLAGS.num_classes, embedding_size=FLAGS.embedding_size, filter_sizes=[int(size) for size in FLAGS.filter_sizes.split(',')], num_filters=[FLAGS.num_filters_A, FLAGS.num_filters_B], n_hidden = FLAGS.n_hidden, input_x1=input_x1, input_x2=input_x2, input_y=input_y, dropout_keep_prob=FLAGS.dropout_keep_prob, l2_reg_lambda = FLAGS.l2_reg_lambda) print('MPCNN模型構建完畢!') global_step = tf.Variable(0, name='global_step', trainable=False) # 獲得模型輸出 print('================模型計算相似性得分====================') model.similarity_measure_layer() print('===============模型計算完畢========================') optimizer = tf.train.AdamOptimizer(FLAGS.lr) grads_and_vars = optimizer.compute_gradients(model.loss) train_step = optimizer.apply_gradients(grads_and_vars, global_step=global_step) timestamp = str(int(time.time())) out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp)) # print("Writing to {}\n".format(out_dir)) # loss_summary = tf.summary.scalar("loss", model.loss) acc_summary = tf.summary.scalar("accuracy", model.accuracy) # train_summary_op = tf.summary.merge([loss_summary, acc_summary]) train_summary_dir = os.path.join(out_dir, "summaries", "train") train_summary_writer = tf.summary.FileWriter(train_summary_dir, session.graph) # dev_summary_op = tf.summary.merge([loss_summary, acc_summary]) dev_summary_dir = os.path.join(out_dir, "summaries", "dev") dev_summary_writer = tf.summary.FileWriter(dev_summary_dir, session.graph) # # checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints")) # checkpoint_prefix = os.path.join(checkpoint_dir, "model") # if not os.path.exists(checkpoint_dir): # os.makedirs(checkpoint_dir) # saver = tf.train.Saver(tf.global_variables(), max_to_keep=conf.num_checkpoints) def train(x1_batch, x2_batch, y_batch): """ A single training step """ feed_dict = { input_1: x1_batch, input_2: x2_batch, input_3: y_batch, dropout_keep_prob: 0.5 } _, step, summaries, batch_loss, accuracy = session.run( [train_step, global_step, train_summary_op, model.loss, model.accuracy], feed_dict) time_str = datetime.datetime.now().isoformat() print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, batch_loss, accuracy)) train_summary_writer.add_summary(summaries, step) def dev_step(x1_batch, x2_batch, y_batch, writer=None): """ 
Evaluates model on a dev set """ feed_dict = { input_1: x1_batch, input_2: x2_batch, input_3: y_batch, dropout_keep_prob: 1 } _, step, summaries, batch_loss, accuracy = session.run( [train_step, global_step, dev_summary_op, model.loss, model.accuracy], feed_dict) time_str = datetime.datetime.now().isoformat() dev_summary_writer.add_summary(summaries, step) # if writer: # writer.add_summary(summaries, step) return batch_loss, accuracy session.run(tf.global_variables_initializer()) print('模型引數初始化完畢!') print('生成batch') batches = data_helper.batch_iter(list(zip(Xtrain[0], Xtrain[1], ytrain)), FLAGS.batch_size, FLAGS.num_epochs) print('batch生成完畢!') print('Start Training......') for batch in batches: x1_batch, x2_batch, y_batch = zip(*batch) train(x1_batch, x2_batch, y_batch) current_step = tf.train.global_step(session, global_step) if current_step % FLAGS.evaluate_every == 0: total_dev_loss = 0.0 total_dev_accuracy = 0.0 print("\nEvaluation:") dev_batches = data_helper.batch_iter(list(zip(Xtest[0], Xtest[1], ytest)), FLAGS.batch_size, 1) for dev_batch in dev_batches: x1_dev_batch, x2_dev_batch, y_dev_batch = zip(*dev_batch) dev_loss, dev_accuracy = dev_step(x1_dev_batch, x2_dev_batch, y_dev_batch) total_dev_loss += dev_loss total_dev_accuracy += dev_accuracy total_dev_accuracy = total_dev_accuracy / (len(ytest) / FLAGS.batch_size) print("dev_loss {:g}, dev_acc {:g}, num_dev_batches {:g}".format(total_dev_loss, total_dev_accuracy, len(ytest) / FLAGS.batch_size)) print("Optimization Finished!")
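One caveat about the BN layers added earlier: if the model file implements them with tf.layers.batch_normalization, the moving-average update ops live in tf.GraphKeys.UPDATE_OPS and are not run automatically. In that case the training op above should be wrapped in a control dependency, roughly like this (a sketch, assuming model.loss as defined above):

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(FLAGS.lr)
    grads_and_vars = optimizer.compute_gradients(model.loss)
    train_step = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

BN also needs to know whether it is running in training or inference mode, which is usually fed through a boolean placeholder, analogous to dropout_keep_prob above.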
With this, the entire training file is complete!
After running it, the results are as follows: