Sentence Classification with CNNs: CNN Sentence Classification (with Theano code)

阿新 • Published: 2022-05-03

01 Intro

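This article takes a close look at one major application of CNNs in NLP: sentence classification. Using Yoon Kim's paper, I introduce the application, walk through its code, and then refactor that code.

The refactored code is on GitHub (https://github.com/applenob/CNN_sentence), and the post is also on my github.io blog (https://applenob.github.io/cnn_sc.html).

Traditional sentence classifiers usually use SVMs or Naive Bayes, and most of them represent text as a "bag of words": they consider only how often each word occurs, not the order of the words. N-gram features can be forced into these methods, but they make the representation very sparse, so the gain is small.

CNNs (convolutional neural networks) come from image processing, but their ideas are also a useful reference for NLP. The term "convolution" itself comes from signal processing. For its physical meaning, see the Zhihu answer that explains it through compound interest (https://www.zhihu.com/question/22298352?rf=21686447), or colah's blog post (http://colah.github.io/posts/2014-07-Understanding-Convolutions/).

Put simply, a sequence of input signals enters a system and the system produces a sequence of outputs. The output at a given moment does not correspond only to the input at that moment: depending on the system's own characteristics, each output also depends on earlier inputs. If we treat text as such a sequence of inputs, we naturally want to capture the order of the words. In "Tom's phone", for example, convolution lets the system learn that the phone belongs to Tom, instead of seeing only the word "phone".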
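To make the signal-processing picture concrete, here is a minimal pure-Python sketch of a causal 1D discrete convolution. It is an illustration only, not code from the paper or either repository, and the name `convolve_1d` is hypothetical. The kernel weights play the role of the system's own characteristics: each output mixes the current input with the ones before it.

```python
def convolve_1d(signal, kernel):
    """Causal discrete convolution: y[t] = sum over k of kernel[k] * signal[t - k].

    Each output depends on the current input and the len(kernel) - 1 inputs
    before it; inputs before t = 0 are treated as zero.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * signal[t - k]
        output.append(acc)
    return output


# A single impulse at t = 0 keeps affecting later outputs, weighted by the kernel.
print(convolve_1d([1, 0, 0, 0], [0.5, 0.3, 0.2]))  # [0.5, 0.3, 0.2, 0.0]
```

With a length-2 kernel over a sequence of word signals, each output would combine a word with the word before it, which is the intuition behind the "Tom's phone" example.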

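Or, more intuitively: in a CNN, convolution means moving a kernel all over the image and extracting a feature at each position; the extracted features together form a feature map. That feature-extraction process is the convolution.

Next, let's look at Yoon Kim's paper: Convolutional Neural Networks for Sentence Classification (EMNLP 2014).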

The model in the paper

Yoon Kim's own diagram of the architecture:

[Figure: model architecture]

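Layer by layer:

1. Input layer

Think of the input layer as turning a sentence into a two-dimensional image: each row is the word2vec vector of one word, and the words of the sentence are stacked vertically in order. The input size (that is, the image size) is n×k, where k is the embedding dimension, here 300. The longest sentence in the training data has 56 words and shorter sentences are zero-padded; the code also adds filter_h − 1 = 4 rows of zero padding at each end (using the largest filter height, 5), so n = 56 + 2×4 = 64. The so-called static and non-static channels are explained as follows: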

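CNN-rand: all word vectors are randomly initialized and optimized as parameters during training;

CNN-static: all word vectors come directly from Google's word2vec tool (unsupervised, CBOW model) and are kept fixed (words missing from word2vec are randomly initialized);

CNN-non-static: the word vectors are initialized the same way from word2vec, but fine-tuned during training;

CNN-multichannel: a mix of CNN-static and CNN-non-static with two kinds of input: two sets of word vectors, both initialized from word2vec, used as two channels, of which only one is fine-tuned.

The input layer also shows the kernel size. The kernel height h clearly takes different values: in the figure some kernels have height 2 and some height 3. This is easy to understand, since different kernels capture relations between words within windows of different sizes. Unlike in image processing, the kernel width w of a CNN for NLP is usually the full width of the "image", i.e. the word2vec dimension. This also makes sense: the information we want lies along the vertical axis, namely what it means for certain words to occur together within windows of various sizes.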
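A minimal pure-Python sketch of the input layer, under stated assumptions: `embeddings` is a hypothetical toy dict from word to vector standing in for word2vec, and `n` is the padded sentence length. Each word becomes one row and the rows after the last word are zero padding. Unlike the paper, which gives out-of-vocabulary words random vectors, this sketch simply skips them.

```python
def sentence_to_matrix(sentence, embeddings, n):
    """Turn a sentence into an n x k matrix: one word vector per row, zero rows as padding."""
    k = len(next(iter(embeddings.values())))  # embedding dimension
    rows = [list(embeddings[w]) for w in sentence.split() if w in embeddings][:n]
    rows += [[0.0] * k for _ in range(n - len(rows))]
    return rows


toy_embeddings = {"tom": [0.1, 0.2, 0.3], "likes": [0.4, 0.5, 0.6], "phones": [0.7, 0.8, 0.9]}
image = sentence_to_matrix("tom likes phones", toy_embeddings, n=5)
print(len(image), len(image[0]))  # 5 3
print(image[3])                   # [0.0, 0.0, 0.0]
```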

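2. Convolutional layer

Because each kernel is as wide as the input (the full word-vector dimension), it can only slide vertically, so each feature map is a strip of width 1 with n − h + 1 entries.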
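The shape argument can be checked with a small pure-Python sketch (a hypothetical helper, not the repository's LeNetConvPoolLayer). It computes the paper's c_i = f(w · x_{i:i+h-1} + b) with f = ReLU: an h × k kernel that spans the full embedding width can only slide downward, so each filter yields a column of n − h + 1 values.

```python
def conv_sentence(matrix, kernel, bias=0.0):
    """Slide an h x k kernel down an n x k sentence matrix (stride 1, no padding).

    Returns the feature map c, where c[i] = ReLU(sum(kernel * window_i) + bias).
    Because the kernel spans the whole row, the map has width 1 and n - h + 1 entries.
    """
    n, k = len(matrix), len(matrix[0])
    h = len(kernel)
    if len(kernel[0]) != k:
        raise ValueError("kernel width must equal the embedding dimension")
    feature_map = []
    for i in range(n - h + 1):
        window_sum = sum(kernel[a][b] * matrix[i + a][b] for a in range(h) for b in range(k))
        feature_map.append(max(0.0, window_sum + bias))
    return feature_map


sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]  # toy 4-word sentence, k = 2
print(conv_sentence(sentence, [[1, 1], [1, 1]]))  # [2.0, 3.0, 2.0]
```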

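3. Pooling layer

The pooling is max pooling, and each feature map keeps only its single largest value (max-over-time pooling). That value is taken to be the most important feature extracted by that kernel.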
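A minimal pure-Python sketch of max-over-time pooling (a hypothetical helper with toy numbers). Filters of different heights produce feature maps of different lengths (n − h + 1), but keeping one maximum per map gives a fixed-size vector. With 100 filters for each of h = 3, 4, 5, that vector would have 300 entries, the input size of the fully connected layer.

```python
def max_over_time(feature_maps):
    """Keep only the largest value of each feature map: one scalar per filter."""
    return [max(fmap) for fmap in feature_maps]


# Toy feature maps for one filter each of height 3, 4 and 5 on a 7-word sentence.
# Their lengths differ (7 - h + 1), but pooling always yields one value per filter.
maps_h3 = [[0.1, 0.9, 0.3, 0.2, 0.0]]
maps_h4 = [[0.4, 0.2, 0.7, 0.1]]
maps_h5 = [[0.5, 0.6, 0.3]]
features = max_over_time(maps_h3) + max_over_time(maps_h4) + max_over_time(maps_h5)
print(features)  # [0.9, 0.7, 0.6]
```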

4. Fully connected layer

This is a fully connected layer with dropout, followed by a softmax output.
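A minimal pure-Python sketch of this output stage, for illustration only: the helper names and toy weights are hypothetical, and the repository implements this stage in Theano (MLPDropout). During training each input unit is dropped with probability p_drop. At test time nothing is dropped and the weights are scaled by the keep probability (1 − p_drop) instead, which is how the paper describes test-time dropout; with a rate of 0.5 both factors are 0.5.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(x, p_drop, rng):
    """Training time: zero each unit independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else v for v in x]


def dense_softmax(x, W, b):
    """Fully connected layer (W is len(x) x n_classes) followed by softmax."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(logits)


def scale_for_test(W, p_drop):
    """Test time: no dropout, but the weights are scaled by the keep probability."""
    return [[w * (1 - p_drop) for w in row] for row in W]


rng = random.Random(0)
features = [0.9, 0.7, 0.6]                   # e.g. pooled CNN features
W = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]   # toy 3 x 2 weights (2 classes)
b = [0.0, 0.0]
train_probs = dense_softmax(dropout(features, 0.5, rng), W, b)
test_probs = dense_softmax(features, scale_for_test(W, 0.5), b)
print(test_probs)
```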

Experiments in the paper

Data

1. word2vec: Google's pre-trained GoogleNews-vectors-negative300.bin

2. Datasets

[Figure: dataset statistics]

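Training and hyperparameter tuning

Filter window (kernel) heights h: 3, 4, 5, with 100 feature maps per height, 300 feature maps in total;
Dropout rate 0.5;
L2 constraint (max-norm on the weight vectors) s = 3;
Mini-batch size 50;
These values were chosen by grid search (on the SST-2 dev set);
Optimizer: Adadelta.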
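The L2 constraint can be illustrated with a minimal pure-Python sketch (a hypothetical helper, not the repository's code): after a gradient step, a weight vector whose L2 norm exceeds s is rescaled back to norm s. The sqr_norm_lim = 9 that appears in the training logs below is simply s² = 3².

```python
import math


def max_norm(weights, s=3.0):
    """Rescale a weight vector so that its L2 norm is at most s."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= s:
        return list(weights)  # already within the limit: unchanged
    return [w * (s / norm) for w in weights]


clipped = max_norm([3.0, 4.0])  # norm 5 > 3, so it is scaled by 3/5
print(math.sqrt(sum(w * w for w in clipped)))  # approximately 3.0
```

Rescaling keeps the direction of the vector and only shrinks its length, so the constraint acts as regularization without changing which features a filter responds to.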

Results

[Figure: results table]

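Let's run it

Yoon Kim shared his code and the MR dataset (Movie Review, with only two classes, neg and pos) on GitHub.

Let's get our hands dirty and run the program!

1. Load and preprocess the dataset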

python process_data.py /home/cer/Data/GoogleNews-vectors-negative300.bin
output:
loading data... data loaded! 
number of sentences: 10662 
vocab size: 18765 
max sentence length: 56 
loading word2vec vectors... word2vec loaded! 
num words already in word2vec: 16448 
dataset created!

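2. Run the model (with pre-trained word2vec vectors that are fine-tuned during training, i.e. the -nonstatic flag). Note: to keep the output short, the number of CV folds was reduced from 10 to 2.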

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_net_sentence.py -nonstatic -word2vec

output:

Using gpu device 0: GeForce GTX 960M (CNMeM is disabled, cuDNN not available) 
loading data... data loaded! 
model architecture: CNN-non-static 
using: word2vec vectors 
[('image shape', 64, 300), ('filter shape', [(100, 1, 3, 300), (100, 1, 4, 300), (100, 1, 5, 300)]), ('hidden_units', [100, 2]), ('dropout', [0.5]), ('batch_size', 50), ('non_static', True), ('learn_decay', 0.95), ('conv_non_linear', 'relu'), ('non_static', True), ('sqr_norm_lim', 9), ('shuffle_batch', True)] 
... training 
epoch: 1, training time: 10.58 secs, train perf: 79.86 %, val perf: 75.16 % 
epoch: 2, training time: 10.48 secs, train perf: 86.93 %, val perf: 77.89 % 
epoch: 3, training time: 11.05 secs, train perf: 88.25 %, val perf: 77.68 % 
epoch: 4, training time: 10.73 secs, train perf: 95.44 %, val perf: 79.89 % 
epoch: 5, training time: 10.69 secs, train perf: 97.91 %, val perf: 79.58 % 
epoch: 6, training time: 11.38 secs, train perf: 99.11 %, val perf: 80.74 % 
epoch: 7, training time: 10.80 secs, train perf: 99.13 %, val perf: 79.16 % 
epoch: 8, training time: 11.11 secs, train perf: 99.84 %, val perf: 80.53 % 
epoch: 9, training time: 11.05 secs, train perf: 99.94 %, val perf: 80.95 %
epoch: 10, training time: 11.03 secs, train perf: 99.91 %, val perf: 79.68 %
epoch: 11, training time: 10.85 secs, train perf: 99.97 %, val perf: 80.74 % 
epoch: 12, training time: 11.01 secs, train perf: 99.98 %, val perf: 80.42 % 
epoch: 13, training time: 10.64 secs, train perf: 99.98 %, val perf: 80.53 % 
epoch: 14, training time: 11.32 secs, train perf: 99.99 %, val perf: 80.32 % 
epoch: 15, training time: 11.04 secs, train perf: 99.99 %, val perf: 79.68 % 
epoch: 16, training time: 10.98 secs, train perf: 99.99 %, val perf: 80.21 % 
epoch: 17, training time: 11.14 secs, train perf: 99.99 %, val perf: 80.53 % 
epoch: 18, training time: 11.06 secs, train perf: 99.99 %, val perf: 80.53 % 
epoch: 19, training time: 12.21 secs, train perf: 99.99 %, val perf: 80.63 % 
epoch: 20, training time: 10.68 secs, train perf: 100.00 %, val perf: 80.95 % 
epoch: 21, training time: 10.64 secs, train perf: 100.00 %, val perf: 80.42 % 
epoch: 22, training time: 11.16 secs, train perf: 100.00 %, val perf: 80.32 % 
epoch: 23, training time: 10.88 secs, train perf: 100.00 %, val perf: 80.53 % 
epoch: 24, training time: 10.65 secs, train perf: 100.00 %, val perf: 80.32 % 
epoch: 25, training time: 10.84 secs, train perf: 100.00 %, val perf: 80.32 % 
cv: 0, perf: 0.793002915452 
[('image shape', 64, 300), ('filter shape', [(100, 1, 3, 300), (100, 1, 4, 300), (100, 1, 5, 300)]), ('hidden_units', [100, 2]), ('dropout', [0.5]), ('batch_size', 50), ('non_static', True), ('learn_decay', 0.95), ('conv_non_linear', 'relu'), ('non_static', True), ('sqr_norm_lim', 9), ('shuffle_batch', True)] 
... training
epoch: 1, training time: 10.92 secs, train perf: 80.01 %, val perf: 77.16 %
epoch: 2, training time: 10.68 secs, train perf: 87.68 %, val perf: 79.89 % 
epoch: 3, training time: 10.78 secs, train perf: 91.45 %, val perf: 80.53 % 
epoch: 4, training time: 10.76 secs, train perf: 95.78 %, val perf: 80.63 % 
epoch: 5, training time: 10.62 secs, train perf: 97.99 %, val perf: 80.42 % 
epoch: 6, training time: 10.69 secs, train perf: 99.10 %, val perf: 79.89 % 
epoch: 7, training time: 10.95 secs, train perf: 99.31 %, val perf: 79.68 % 
epoch: 8, training time: 10.86 secs, train perf: 99.68 %, val perf: 79.68 % 
epoch: 9, training time: 10.64 secs, train perf: 99.82 %, val perf: 79.89 % 
epoch: 10, training time: 10.75 secs, train perf: 99.93 %, val perf: 80.32 % 
epoch: 11, training time: 10.94 secs, train perf: 99.97 %, val perf: 80.21 % 
epoch: 12, training time: 10.71 secs, train perf: 99.99 %, val perf: 80.53 % 
epoch: 13, training time: 10.74 secs, train perf: 99.97 %, val perf: 80.00 % 
epoch: 14, training time: 10.86 secs, train perf: 99.99 %, val perf: 80.00 % 
epoch: 15, training time: 11.00 secs, train perf: 99.99 %, val perf: 79.37 % 
epoch: 16, training time: 10.87 secs, train perf: 99.99 %, val perf: 80.11 % 
epoch: 17, training time: 10.94 secs, train perf: 99.99 %, val perf: 79.79 % 
epoch: 18, training time: 10.73 secs, train perf: 99.99 %, val perf: 79.79 % 
epoch: 19, training time: 11.05 secs, train perf: 100.00 %, val perf: 79.89 % 
epoch: 20, training time: 11.83 secs, train perf: 100.00 %, val perf: 79.79 % 
epoch: 21, training time: 10.85 secs, train perf: 100.00 %, val perf: 80.42 % 
epoch: 22, training time: 10.70 secs, train perf: 100.00 %, val perf: 79.79 % 
epoch: 23, training time: 10.89 secs, train perf: 100.00 %, val perf: 80.32 % 
epoch: 24, training time: 10.78 secs, train perf: 100.00 %, val perf: 80.00 % 
epoch: 25, training time: 11.19 secs, train perf: 100.00 %, val perf: 80.32 % 
cv: 1, perf: 0.814338235294 
0.803670575373

Code walkthrough

Next, let's study Yoon Kim's code to see how a deep NLP application like this is implemented.

5.1

Overall structure

process_data.py:

Data preprocessing. The data is saved as [revs, W, W2, word_idx_map, vocab] in the pickle file "mr.p".

Each entry in revs has the following format:

datum = {"y": 1, "text": orig_rev, "num_words": len(orig_rev.split()), "split": np.random.randint(0, cv)}

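Here y is the class label, text is the (cleaned) sentence, num_words is the sentence length in words, and split is the index of the CV fold the sentence is assigned to.

W is the word matrix: W[i] is the word vector of the word with index i.

W2 is like W, but randomly initialized.

word_idx_map is a dict whose keys are the words in the dataset and whose values are their indices.

vocab is a dict whose keys are the words in the dataset and whose values are the number of sentences each word occurs in.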
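A simplified pure-Python sketch of how revs and vocab could be built. This is a sketch, not the repository's build_data_cv: the helper name build_revs is hypothetical, it takes in-memory lists instead of the rt-polarity files, and it only lowercases instead of calling the original clean_str(). As in the original code, a word is counted at most once per sentence, and each sentence gets a random CV fold index in [0, cv).

```python
import random
from collections import defaultdict


def build_revs(pos_sentences, neg_sentences, cv=10, seed=0):
    """Return (revs, vocab) in the format described above (hypothetical helper)."""
    rng = random.Random(seed)
    revs = []
    vocab = defaultdict(float)
    for sentences, label in ((pos_sentences, 1), (neg_sentences, 0)):
        for line in sentences:
            orig_rev = line.strip().lower()      # stand-in for the original clean_str()
            for word in set(orig_rev.split()):   # count each word once per sentence
                vocab[word] += 1
            revs.append({"y": label,
                         "text": orig_rev,
                         "num_words": len(orig_rev.split()),
                         "split": rng.randint(0, cv - 1)})  # randint is inclusive
    return revs, vocab


revs, vocab = build_revs(["A good movie"], ["a bad bad movie"])
print(revs[1])
print(vocab["bad"], vocab["a"])  # 1.0 2.0
```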

conv_net_classes.py:

Defines the concrete model structure; each kind of layer is defined by its own class.

For example:

class HiddenLayer(object)
class MLPDropout(object)
class LogisticRegression(object)

conv_net_sentence.py:

Loads the data, builds and connects the model, and then trains it.

5.2

Data flow

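The input data comes from rt-polarity.neg and rt-polarity.pos, which hold many English sentences; the class label is taken from the file name. The other input is Google's word2vec.

In process_data.py:

1. build_data_cv(): takes the dataset files, reads both of them, and produces the basic data revs (its contents were described above).

2. load_bin_vec(): loads the vectors from GoogleNews-vectors-negative300.bin (only for words that occur in vocab) into w2v, a dict whose keys are words and whose values are vectors.

3. get_W(): takes w2v, converts it from a dict into the matrix W, and builds word_idx_map. Before, getting a word's vector took a single lookup in the w2v dict; now you first look up the word's index in word_idx_map and then use that index to fetch the vector from W.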
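A pure-Python sketch of the dict-to-matrix step, modeled on the original get_W and add_unknown_words; treat details such as the U(−0.25, 0.25) range as assumptions to check against the repository. Row 0 of W is an all-zero vector reserved for padding, so real words start at index 1. Words that occur in vocab but not in word2vec receive random vectors. Running the same helpers on an empty dict yields a fully random matrix, which mirrors how W2 appears to be built.

```python
import random


def add_unknown_words(word_vecs, vocab, k, min_df=1, seed=0):
    """Give every vocab word missing from word_vecs a random k-dim vector."""
    rng = random.Random(seed)
    for word in vocab:
        if word not in word_vecs and vocab[word] >= min_df:
            word_vecs[word] = [rng.uniform(-0.25, 0.25) for _ in range(k)]


def get_W(word_vecs, k):
    """Convert {word: vector} into a matrix W and a {word: row index} map."""
    W = [[0.0] * k]  # row 0: zero vector used for padding
    word_idx_map = {}
    for i, word in enumerate(word_vecs, start=1):
        W.append(list(word_vecs[word]))
        word_idx_map[word] = i
    return W, word_idx_map


w2v = {"good": [0.1, 0.2], "bad": [0.3, 0.4]}  # toy "word2vec" with k = 2
add_unknown_words(w2v, {"good": 1, "bad": 1, "meh": 1}, k=2)
W, word_idx_map = get_W(w2v, k=2)
# Two lookups instead of one: word -> index -> vector.
print(word_idx_map["bad"], W[word_idx_map["bad"]])  # 2 [0.3, 0.4]
```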

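In conv_net_sentence.py:

4. make_idx_data_cv(): reads the text field of each rev and passes it to get_idx_from_sent(), which turns the sentence into a list of the indices of its words. The list is laid out as (filter padding) - (word indices) - (padding up to max_l) - (filter padding) and has length max_l + 2×(filter_h − 1), so sentences of different lengths all become lists of the same length. The data is then split into a training set and a test set according to the CV fold index.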
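A pure-Python sketch of the padding scheme in get_idx_from_sent(), following the description above: index 0 is the zero padding row of W, and words missing from word_idx_map are skipped. With max_l = 56 and filter_h = 5, every sentence becomes a list of 56 + 2 × 4 = 64 indices, which matches the image shape of 64 × 300 in the training logs.

```python
def get_idx_from_sent(sent, word_idx_map, max_l, filter_h):
    """Map a sentence to a fixed-length list of word indices (0 = padding)."""
    pad = filter_h - 1
    x = [0] * pad                                            # leading filter padding
    x += [word_idx_map[w] for w in sent.split() if w in word_idx_map]
    x += [0] * (max_l + 2 * pad - len(x))                    # pad to max_l, then trailing filter padding
    return x


word_idx_map = {"tom": 1, "likes": 2, "phones": 3}
print(get_idx_from_sent("tom likes phones", word_idx_map, max_l=4, filter_h=3))
# [0, 0, 1, 2, 3, 0, 0, 0]
```

Padding with filter_h − 1 zeros on both sides lets even the tallest filter be centred on the first and last words, so edge words contribute to as many windows as words in the middle.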

5.3

Model architecture

In conv_net_classes.py:

All of the network layers and their implementations are defined here:

HiddenLayer
DropoutHiddenLayer
MLPDropout
MLP
LogisticRegression
LeNetConvPoolLayer

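Most of these classes do all their work in the __init__ method:

1. First, take the layer's input and output sizes and its input data.

2. Then initialize the layer's parameters, all of which are theano.shared variables.

3. Finally, build the layer's output from the given input and parameters.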

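In conv_net_sentence.py:

Once the training and test data are loaded, almost all of the work is done by train_conv_net():

1. Its arguments fall into two groups: (1) the training data plus the W matrix; (2) the model's structural parameters.

2. Build the network. Every layer is already implemented in conv_net_classes.py, so building the network starts with a parameter list into which every layer's parameters are added, so that they can be managed together. Then, for each layer, instantiate its class, feed it data, and get its output; that output is fed into the next layer, so the layers are chained together through their inputs and outputs.

3. Hold out 10% of the training data as validation data.

4. Build the functions (theano.function): (1) train_model, built from the cost function; (2) val_model, which evaluates the validation set; (3) test_model, which evaluates the test set.

5. Train.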
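Step 3 can be sketched in pure Python as follows. This is a simplified sketch: the helper name and seed are hypothetical, and the original additionally pads the training data with randomly chosen extra samples so that it divides evenly into mini-batches. The data is shuffled, about 90% of the mini-batches are kept for training, and the rest is used for validation.

```python
import random


def split_train_val(data, batch_size=50, val_fraction=0.1, seed=0):
    """Shuffle, keep about 90% of the full mini-batches for training, the rest for validation."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_batches = len(data) // batch_size
    n_train_batches = int(round(n_batches * (1 - val_fraction)))
    cut = n_train_batches * batch_size
    # Everything after the training batches (including any partial batch) is validation data.
    return data[:cut], data[cut:]


train, val = split_train_val(range(1000), batch_size=50)
print(len(train), len(val))  # 900 100
```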

Refactoring the code

6.1

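Why refactor

First, let me be clear about the purpose of the refactoring. I don't really think Yoon Kim's code is badly written, and I don't think my refactored architecture is especially good either. My goal is to learn by doing: refactoring code deepens your understanding of it, and it is one of the best ways to study code.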

6.2

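What can be refactored

This code is the experiment code of an academic paper, so it is not very extensible. I wanted to rework it the way it would be done in industry. Here is what can be refactored:

1. How a single layer of the network is defined.

The original code defines each layer as a class, which is fine in itself, but every detail is implemented in __init__, which bloats that method. We can split it by responsibility into two methods, init_param() and build(), the two most important steps in building a layer: initializing the parameters, and computing the output from the input.

2. train_conv_net() is bloated: it builds the network, splits train/val, builds the functions, and trains. These are four major steps, and each should be split into its own method.

3. Why is there no model class? A model behaves much like a single layer. If a layer can be a class, why do many layers, once assembled, end up inside a single function? We can write a model class too.

4. Why are the model's structural parameters passed in as function arguments? We can put them in a config file instead, so that in later experiments tuning the model only requires editing that file.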
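Point 1 can be illustrated with a framework-free sketch: a hypothetical pure-Python fully connected layer whose constructor only delegates, while init_param() owns parameter creation and build() owns the input-to-output mapping. Unlike the Theano version, build() here computes values eagerly instead of constructing a symbolic graph.

```python
import random


def relu(v):
    return max(0.0, v)


class DenseLayer(object):
    """Toy fully connected layer split into init_param() and build()."""

    def __init__(self, n_in, n_out, activation=None, seed=0):
        self.activation = activation
        self.init_param(n_in, n_out, random.Random(seed))

    def init_param(self, n_in, n_out, rng):
        # Responsibility 1: create the parameters (small random weights, zero bias).
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        self.b = [0.0] * n_out
        self.params = [self.W, self.b]

    def build(self, x):
        # Responsibility 2: map an input vector to this layer's output.
        lin = [sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b[j]
               for j in range(len(self.b))]
        self.output = lin if self.activation is None else [self.activation(v) for v in lin]
        return self.output


layer = DenseLayer(n_in=3, n_out=2, activation=relu)
print(layer.build([1.0, -2.0, 0.5]))
```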

6.3

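Refactoring details

Following the points above, here are the details of the refactoring:

cer_main.py: loads the data and starts training.

cer_module.py: implementation details of each layer.

cer_model.py: implementation of the full model.

1. Refactoring a single-layer class:

Before: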

```python
import numpy
import theano
import theano.tensor as T


class HiddenLayer(object):
    """
    Class for HiddenLayer
    """
    def __init__(self, rng, input, n_in, n_out, activation, W=None, b=None,
                 use_bias=False):
        self.input = input
        self.activation = activation

        if W is None:
            # Python 2 / Theano-era code: functions expose their name as func_name
            if activation.func_name == "ReLU":
                W_values = numpy.asarray(0.01 * rng.standard_normal(size=(n_in, n_out)),
                                         dtype=theano.config.floatX)
            else:
                W_values = numpy.asarray(rng.uniform(low=-numpy.sqrt(6. / (n_in + n_out)),
                                                     high=numpy.sqrt(6. / (n_in + n_out)),
                                                     size=(n_in, n_out)),
                                         dtype=theano.config.floatX)
            W = theano.shared(value=W_values, name='W')
        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b')

        self.W = W
        self.b = b

        if use_bias:
            lin_output = T.dot(input, self.W) + self.b
        else:
            lin_output = T.dot(input, self.W)

        self.output = (lin_output if activation is None else activation(lin_output))

        # parameters of the model
        if use_bias:
            self.params = [self.W, self.b]
        else:
            self.params = [self.W]
```

After:

```python
import numpy
import theano
import theano.tensor as T


class HiddenLayer(object):
    """
    Class for HiddenLayer
    """
    def __init__(self, rng, n_in, n_out, activation, W=None, b=None):
        self.rng = rng
        self.activation = activation
        self.init_param(W, b, n_in, n_out)

    def init_param(self, W, b, n_in, n_out):
        """Create the shared parameters W and b unless they were passed in."""
        if W is None:
            if self.activation.func_name == "ReLU":
                W_values = numpy.asarray(0.01 * self.rng.standard_normal(size=(n_in, n_out)),
                                         dtype=theano.config.floatX)
            else:
                W_values = numpy.asarray(
                    self.rng.uniform(low=-numpy.sqrt(6. / (n_in + n_out)),
                                     high=numpy.sqrt(6. / (n_in + n_out)),
                                     size=(n_in, n_out)),
                    dtype=theano.config.floatX)
            W = theano.shared(value=W_values, name='W')
        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b')
        self.W = W
        self.b = b

    def build(self, input, use_bias=False):
        """Build the symbolic output for `input` and record the trainable parameters."""
        if use_bias:
            lin_output = T.dot(input, self.W) + self.b
        else:
            lin_output = T.dot(input, self.W)
        self.output = (lin_output if self.activation is None else self.activation(lin_output))
        # parameters of the model
        if use_bias:
            self.params = [self.W, self.b]
        else:
            self.params = [self.W]
        return self.output
```

2. Refactoring how the full model is built:

```python
# Excerpt from the model class (self.conf holds the model.json settings).
# rng, U, filter_shapes, pool_sizes and feature_maps are defined elsewhere
# in the method and are not shown here.

################ Network architecture: 1. initialize the layers ################
# 1. embedding layer
self.emb_layer = EmbeddingLayer(U)
# 2. convolutional layers, one per filter height
self.conv_layers = []
for i in xrange(len(self.conf['filter_hs'])):
    filter_shape = filter_shapes[i]
    # print "filter_shape:", filter_shape
    pool_size = pool_sizes[i]
    conv_layer = LeNetConvPoolLayer(rng, image_shape=(self.conf['batch_size'], 1, self.img_h, self.conf['img_w']),
                                    filter_shape=filter_shape, poolsize=pool_size,
                                    non_linear=self.conf['conv_non_linear'])
    self.conv_layers.append(conv_layer)
# 3. MLP (multilayer perceptron with dropout)
self.conf['hidden_units'][0] = feature_maps * len(self.conf['filter_hs'])
self.classifier = MLPDropout(rng, layer_sizes=self.conf['hidden_units'],
                             activations=[eval(f_s) for f_s in self.conf['activations']],
                             dropout_rates=self.conf['dropout_rate'])

################ Network architecture: 2. connect the layers ################
# 1. embedding layer
emb_output = self.emb_layer.build(self.x)
# 2. convolutional layers: each one reads the same embedded input
layer0_input = emb_output
layer1_inputs = []
for i in xrange(len(self.conf['filter_hs'])):
    conv_layer = self.conv_layers[i]
    layer1_input = conv_layer.build(layer0_input).flatten(2)
    layer1_inputs.append(layer1_input)
# concatenate the pooled features of all filter heights
layer1_input = T.concatenate(layer1_inputs, 1)
self.classifier.build(layer1_input)

################ Collect the model parameters ################
# define parameters of the model and update functions using adadelta
params = list(self.classifier.params)  # copy, so the classifier's own list is not extended in place
for conv_layer in self.conv_layers:
    params += conv_layer.params
if self.conf["non_static"]:
    # if word vectors are allowed to change, add them as model parameters.
    # Assumption: EmbeddingLayer stores its shared embedding matrix as .Words
    # (emb_output is the layer's symbolic output and has no such attribute).
    params += [self.emb_layer.Words]
self.cost = self.classifier.negative_log_likelihood(self.y)
self.dropout_cost = self.classifier.dropout_negative_log_likelihood(self.y)
self.grad_updates = sgd_updates_adadelta(params, self.dropout_cost, self.conf['lr_decay'],
                                         1e-6, self.conf['sqr_norm_lim'])
```
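The update rule behind sgd_updates_adadelta can be sketched in pure Python. This is a simplified sketch of Adadelta as described by Zeiler (2012), applied to a flat list of scalars; the repository's function is not shown in this article, and this sketch omits the max-norm step that the call above configures through sqr_norm_lim. In that call, rho is conf['lr_decay'] = 0.95 and epsilon is 1e-6.

```python
import math


def adadelta_step(params, grads, state, rho=0.95, eps=1e-6):
    """Apply one Adadelta update to params in place.

    state["eg2"][i]  running average of squared gradients, E[g^2]
    state["edx2"][i] running average of squared updates,   E[dx^2]
    """
    for i, g in enumerate(grads):
        state["eg2"][i] = rho * state["eg2"][i] + (1 - rho) * g * g
        # Per-parameter step size: RMS of past updates divided by RMS of gradients.
        dx = -math.sqrt(state["edx2"][i] + eps) / math.sqrt(state["eg2"][i] + eps) * g
        state["edx2"][i] = rho * state["edx2"][i] + (1 - rho) * dx * dx
        params[i] += dx
    return params


# Minimize f(x) = (x - 3)^2 starting from x = 0; the gradient is 2 * (x - 3).
x = [0.0]
state = {"eg2": [0.0], "edx2": [0.0]}
for _ in range(10):
    adadelta_step(x, [2 * (x[0] - 3)], state)
print(x[0])  # has moved a little toward 3; Adadelta starts with tiny steps
```

Because the step size is built from running averages rather than a hand-set learning rate, there is no learning rate to tune, which is one reason it fits a paper that grid-searches only a few hyperparameters.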

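3. Add a class for the full model: CNN_Sen_Model()

Methods:

build_model()
train()
build_function()

What the full-model class shares with a single-layer class is build, i.e. computing the output from a given input. The difference is that it has no init_param() method, because the full model does not initialize any trainable parameters itself; it simply collects them from the layer classes. It also has an extra train method for training the model.

See my code for details.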
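A framework-free sketch of the idea in the paragraph above, using hypothetical class names rather than the author's CNN_Sen_Model. The model's build_model() chains the layers' build() calls and collects their params instead of creating any of its own, and train() loops over epochs and mini-batches. Here the update function is supplied by the caller, standing in for the compiled theano.function that build_function() presumably produces.

```python
class ScaleLayer(object):
    """Toy layer with one learnable scalar; stands in for a real layer class."""

    def __init__(self, scale):
        self.init_param(scale)

    def init_param(self, scale):
        self.params = [scale]

    def build(self, inputs):
        return [v * self.params[0] for v in inputs]


class SketchModel(object):
    """Whole-model class: no init_param(); its parameters come from its layers."""

    def __init__(self, layers):
        self.layers = layers
        self.params = []

    def build_model(self, inputs):
        out = inputs
        for layer in self.layers:  # each layer's output feeds the next layer
            out = layer.build(out)
        self.output = out
        self.params = [p for layer in self.layers for p in layer.params]
        return out

    def train(self, batches, update_fn, n_epochs=1):
        """Call update_fn(model, batch) for every mini-batch in every epoch."""
        for _ in range(n_epochs):
            for batch in batches:
                update_fn(self, batch)


model = SketchModel([ScaleLayer(2.0), ScaleLayer(3.0)])
print(model.build_model([1.0, 2.0]))  # [6.0, 12.0]
print(model.params)                   # [2.0, 3.0]
```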

4. Store the model parameters in model.json:

```json
{
  "img_w": 300,
  "max_l": 56,
  "filter_hs": [3, 4, 5],
  "hidden_units": [100, 2],
  "dropout_rate": [0.5],
  "shuffle_batch": true,
  "n_epochs": 25,
  "batch_size": 50,
  "lr_decay": 0.95,
  "conv_non_linear": "relu",
  "activations": ["Iden"],
  "sqr_norm_lim": 9,
  "non_static": false,
  "word_vectors": "word2vec"
}
```
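A minimal sketch of how such a config could be consumed. The helper name load_conf and the derived-shape computation are assumptions modeled on the original script, not the author's cer_main.py. Besides parsing the JSON, it derives the padded image height and the per-filter shapes; with the values above these match the image shape (64, 300) and filter shape entries printed in the earlier training log.

```python
import json

CONFIG_TEXT = """
{"img_w": 300, "max_l": 56, "filter_hs": [3, 4, 5], "hidden_units": [100, 2],
 "dropout_rate": [0.5], "shuffle_batch": true, "n_epochs": 25, "batch_size": 50,
 "lr_decay": 0.95, "conv_non_linear": "relu", "activations": ["Iden"],
 "sqr_norm_lim": 9, "non_static": false, "word_vectors": "word2vec"}
"""


def load_conf(text):
    """Parse the JSON config and derive the shapes that depend on it."""
    conf = json.loads(text)
    img_h = conf["max_l"] + 2 * (max(conf["filter_hs"]) - 1)       # padded sentence length
    feature_maps = conf["hidden_units"][0]                          # feature maps per filter height
    filter_shapes = [(feature_maps, 1, h, conf["img_w"]) for h in conf["filter_hs"]]
    pool_sizes = [(img_h - h + 1, 1) for h in conf["filter_hs"]]    # pool over the whole feature map
    return conf, img_h, filter_shapes, pool_sizes


conf, img_h, filter_shapes, pool_sizes = load_conf(CONFIG_TEXT)
print(img_h, filter_shapes)
# 64 [(100, 1, 3, 300), (100, 1, 4, 300), (100, 1, 5, 300)]
```

In practice the text would come from the file itself, for example via json.load on the opened model.json, so that changing an experiment only means editing that file.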

Let's run it:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python cer_main.py

output:

Using gpu device 0: GeForce GTX 960M (CNMeM is disabled, cuDNN not available)
/home/cer/anaconda2/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
loading data... model architecture: CNN-static 
using: word2vec vectors 
model configs:  {u'dropout_rate': [0.5], u'hidden_units': [100, 2], u'word_vectors': u'word2vec', u'filter_hs': [3, 4, 5], u'conv_non_linear': u'relu', u'max_l': 56, u'img_w': 300, u'batch_size': 50, u'n_epochs': 25, u'sqr_norm_lim': 9, u'non_static': False, u'shuffle_batch': True, u'activations': [u'Iden'], u'lr_decay': 0.95} 
emb_output shape : [1029    1   64  300] 
conv_layer shape : [1029  100    1    1] 
conv_layer shape : [1029  100    1    1] 
conv_layer shape : [1029  100    1    1] 
... training 
epoch: 1, training time: 6.09 secs, train perf: 77.54 %, val perf: 73.79 % 
epoch: 2, training time: 6.05 secs, train perf: 84.10 %, val perf: 76.53 % 
epoch: 3, training time: 5.84 secs, train perf: 83.85 %, val perf: 76.32 % 
epoch: 4, training time: 6.36 secs, train perf: 89.45 %, val perf: 78.32 % 
epoch: 5, training time: 6.01 secs, train perf: 94.51 %, val perf: 79.26 % 
epoch: 6, training time: 6.72 secs, train perf: 95.07 %, val perf: 78.63 % 
epoch: 7, training time: 6.96 secs, train perf: 98.09 %, val perf: 79.89 % 
epoch: 8, training time: 6.41 secs, train perf: 98.91 %, val perf: 80.00 % 
epoch: 9, training time: 6.19 secs, train perf: 99.39 %, val perf: 78.63 % 
epoch: 10, training time: 6.57 secs, train perf: 98.83 %, val perf: 78.84 % 
epoch: 11, training time: 6.84 secs, train perf: 99.68 %, val perf: 80.00 % 
epoch: 12, training time: 5.84 secs, train perf: 99.84 %, val perf: 78.74 % 
epoch: 13, training time: 5.93 secs, train perf: 99.82 %, val perf: 79.16 % 
epoch: 14, training time: 5.94 secs, train perf: 99.95 %, val perf: 78.63 % 
epoch: 15, training time: 6.39 secs, train perf: 99.94 %, val perf: 78.42 % 
epoch: 16, training time: 6.92 secs, train perf: 99.95 %, val perf: 79.16 % 
epoch: 17, training time: 6.83 secs, train perf: 99.98 %, val perf: 78.53 % 
epoch: 18, training time: 6.72 secs, train perf: 99.98 %, val perf: 79.26 % 
epoch: 19, training time: 5.97 secs, train perf: 99.98 %, val perf: 78.63 % 
epoch: 20, training time: 5.92 secs, train perf: 99.98 %, val perf: 78.63 % 
epoch: 21, training time: 6.56 secs, train perf: 99.98 %, val perf: 79.37 % 
epoch: 22, training time: 6.05 secs, train perf: 99.98 %, val perf: 78.95 % 
epoch: 23, training time: 6.69 secs, train perf: 99.98 %, val perf: 78.63 % 
epoch: 24, training time: 7.03 secs, train perf: 99.98 %, val perf: 78.84 % 
epoch: 25, training time: 6.06 secs, train perf: 99.98 %, val perf: 79.16 %

cv: 0, perf: 0.781341107872

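Conclusion

This article covered the foundational paper on CNN sentence classification and its code, without going into hyperparameter tuning. Yoon Kim's GitHub README points to a paper that analyzes how to tune this kind of model (Ye Zhang and Byron Wallace, "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification"); have a look if you are interested.

There is also a TensorFlow implementation of this model (http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow), which is worth a look as well.

Finally, here is my code again (https://github.com/applenob/CNN_sentence). It contains plenty of comments in Chinese; give it a star if you like it!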
