A Detailed Walk-through of a Simple LSTM Example
This is an example I wrote after my first two days of learning Keras, modeled on addition_rnn.py. The data handling differs slightly, and the accuracy comes out a little worse than addition_rnn.py's. The full code is below; the explanations and comments are inline.
#coding:utf-8
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import RepeatVector, TimeDistributedDense, Activation
from numpy import random
import numpy as np
from utils import log   # the author's own small logging helper

'''
A Keras LSTM that learns to do addition, adapted from addition_rnn.py.
(test_acu is this script; base_acu is the original addition_rnn.py.)

size: 500
  10 iterations: test_acu = 0.3050  base_acu = 0.3600
  30 iterations: test_acu = 0.3300  base_acu = 0.4250
size: 50000
  10 iterations: test_acu: loss: 0.4749 - acc: 0.8502 - val_loss: 0.4601 - val_acc: 0.8539
                 base_acu: loss: 0.3707 - acc: 0.9008 - val_loss: 0.3327 - val_acc: 0.9135
  20 iterations: test_acu: loss: 0.1536 - acc: 0.9505 - val_loss: 0.1314 - val_acc: 0.9584
                 base_acu: loss: 0.0538 - acc: 0.9891 - val_loss: 0.0454 - val_acc: 0.9919
  30 iterations: test_acu: loss: 0.0671 - acc: 0.9809 - val_loss: 0.0728 - val_acc: 0.9766
                 base_acu: loss: 0.0139 - acc: 0.9980 - val_loss: 0.0502 - val_acc: 0.9839
'''

log = log()

# define the global variables
training_size = 50000
hidden_size = 128
batch_size = 128
layers = 1
maxlen = 7          # longest question string, e.g. '999+999'
single_digit = 3    # each addend has at most 3 digits

def generate_data():
    log.info("generate the questions and answers")
    questions = []
    expected = []
    seen = set()
    while len(seen) < training_size:
        num1 = random.randint(1, 999)  # draw a number from [1, 999]
        num2 = random.randint(1, 999)
        # store the sorted pair in a set so every question/answer pair is unique
        key = tuple(sorted((num1, num2)))
        if key in seen:
            continue
        seen.add(key)
        q = '{}+{}'.format(num1, num2)
        query = q + ' ' * (maxlen - len(q))              # right-pad to length 7
        ans = str(num1 + num2)
        ans = ans + ' ' * (single_digit + 1 - len(ans))  # right-pad to length 4
        questions.append(query)
        expected.append(ans)
    return questions, expected

class CharacterTable():
    '''
    encode: turn a str into a 2-D one-hot array
    decode: turn such an array back into a str
    The character table is [' ', '+', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'].
    A question such as '123+23 ' (right-padded to 7 characters) encodes to a (7, 12) array;
    an answer is at most a four-character string, e.g. '146 ', so it encodes to a (4, 12) array.
    '''
    def __init__(self, chars, maxlen):
        self.chars = sorted(set(chars))
        '''
        >>> b = [(c, i) for i, c in enumerate(a)]
        >>> dict(b)
        {' ': 0, '+': 1, '1': 3, '0': 2, '3': 5, '2': 4, '5': 7, '4': 6, '7': 9, '6': 8, '9': 11, '8': 10}
        The dict prints in arbitrary order, but the mapping built below is stable,
        because the indices come from enumerating the sorted character list.
        '''
        self.char_index = dict((c, i) for i, c in enumerate(self.chars))
        self.index_char = dict((i, c) for i, c in enumerate(self.chars))
        self.maxlen = maxlen

    def encode(self, C, maxlen):
        X = np.zeros((maxlen, len(self.chars)))
        for i, c in enumerate(C):
            X[i, self.char_index[c]] = 1
        return X

    def decode(self, X, calc_argmax=True):
        if calc_argmax:
            X = X.argmax(axis=-1)
        return ''.join(self.index_char[x] for x in X)

chars = '0123456789 +'
character_table = CharacterTable(chars, len(chars))

questions, expected = generate_data()

log.info('Vectorization...')
inputs = np.zeros((len(questions), maxlen, len(chars)))           # (50000, 7, 12)
labels = np.zeros((len(expected), single_digit + 1, len(chars)))  # (50000, 4, 12)

log.info("encoding the questions and get inputs")
for i, sentence in enumerate(questions):
    inputs[i] = character_table.encode(sentence, maxlen=len(sentence))
#print("questions is ", questions[0])
#print("X is ", inputs[0])

log.info("encoding the expected and get labels")
for i, sentence in enumerate(expected):
    labels[i] = character_table.encode(sentence, maxlen=len(sentence))
#print("expected is ", expected[0])
#print("y is ", labels[0])

log.info("total inputs is %s" % str(inputs.shape))
log.info("total labels is %s" % str(labels.shape))

log.info("build model")
model = Sequential()
'''
LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal',
     forget_bias_init='one', activation='tanh', inner_activation='hard_sigmoid',
     W_regularizer=None, U_regularizer=None, b_regularizer=None,
     dropout_W=0., dropout_U=0., **kwargs)

output_dim: dimensionality of the output (output_shape can be used instead).
init:
  uniform(scale=0.05): uniform distribution, the most common choice; every weight is
      drawn from [-scale, scale], i.e. [-0.05, 0.05] here; scale defaults to 0.05.
  lecun_uniform: a uniform variant from LeCun's 1998 paper, with scale = sqrt(3 / f_in),
      where f_in is the number of rows (fan-in) of the weight matrix being initialized.
  normal: normal (Gaussian) distribution.
  identity: for 2-D square matrices; returns an identity matrix.
  orthogonal: for 2-D square matrices; returns an orthogonal matrix (the LSTM default
      for inner_init).
  zero: an all-zeros matrix.
  glorot_normal: based on normal, but instead of the default sigma^2 = scale = 0.05 it
      uses sigma^2 = scale = sqrt(2 / (f_in + f_out)), where f_in and f_out are the rows
      and columns of the matrix being initialized.
  glorot_uniform: based on uniform, but with scale = sqrt(6 / (f_in + f_out)).

W_regularizer, b_regularizer and activity_regularizer:
  official docs: http://keras.io/regularizers/
  from keras.regularizers import l2, activity_l2
  model.add(Dense(64, input_dim=64, W_regularizer=l2(0.01), activity_regularizer=activity_l2(0.01)))
  Regularization mainly guards against overfitting on small datasets. The two usual
  remedies for overfitting during training are adding a regularization term (weight
  decay) and adding more data; since more data is usually hard to get, adding a
  regularizer is the easier fix and the one generally used.
'''
model.add(LSTM(hidden_size, input_shape=(maxlen, len(chars))))  # input layer, shape (7, 12)
'''
keras.layers.core.RepeatVector(n)
Repeats the input n times: an input of shape (nb_samples, dim) becomes (nb_samples, n, dim).
input shape: arbitrary; the argument (a tuple, excluding the samples axis) is only
needed when this layer is the first one in a model.
output shape: (nb_samples, n, input_units)
'''
model.add(RepeatVector(single_digit + 1))
# `layers` is the number of stacked hidden LSTM layers
for _ in range(layers):
    model.add(LSTM(hidden_size, return_sequences=True))
'''
TimeDistributedDense:
official docs: http://keras.io/layers/core/#timedistributeddense
keras.layers.core.TimeDistributedDense(output_dim, init='glorot_uniform',
    activation='linear', weights=None, W_regularizer=None, b_regularizer=None,
    activity_regularizer=None, W_constraint=None, b_constraint=None,
    input_dim=None, input_length=None)
A fully connected layer applied at every timestep, used mainly when building RNNs;
the preceding recurrent layer must set return_sequences=True.
for example:
  # input shape: (nb_samples, timesteps, 10)
  model.add(LSTM(5, return_sequences=True, input_dim=10))  # output shape: (nb_samples, timesteps, 5)
  model.add(TimeDistributedDense(15))                      # output shape: (nb_samples, timesteps, 15)
W_constraint:
  from keras.constraints import maxnorm
  model.add(Dense(64, W_constraint=maxnorm(2)))  # cap the norm of the weights at 2
'''
model.add(TimeDistributedDense(len(chars)))
model.add(Activation('softmax'))
'''
For the choice of loss function and optimizer, see this other post:
http://blog.csdn.net/zjm750617105/article/details/51321915
'''
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model each generation and show predictions against the validation dataset
for iteration in range(1, 3):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(inputs, labels, batch_size=batch_size, nb_epoch=2, validation_split=0.1)
    # Select 10 samples from the validation set at random so we can visualize errors
    # (the listing stops short here; see the sketch after the listing)

model.get_config()
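To make the CharacterTable docstring concrete, here is a quick round-trip check. This is a minimal sketch assuming only the class defined above; demo_table and q are names introduced here for illustration:

demo_table = CharacterTable('0123456789 +', 7)
q = '123+23 '                       # a question right-padded to maxlen = 7
X = demo_table.encode(q, maxlen=7)  # one-hot array of shape (7, 12)
print(X.shape)                      # (7, 12)
print(demo_table.decode(X))         # prints '123+23 ' again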
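The RepeatVector step is where the encoder hands off to the decoder: the single hidden-state vector is copied once per output character. A tiny numpy-only sketch of the equivalent tensor operation for one sample (illustrative, not Keras internals; the sizes here are made up):

encoding = np.arange(3, dtype=float)                # stand-in for one sample's hidden vector, shape (3,)
repeated = np.repeat(encoding[None, :], 4, axis=0)  # shape (4, 3), like RepeatVector(4) per sample
print(repeated.shape)                               # (4, 3)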
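The final comment in the listing promises to sample 10 examples and visualize errors, but the code stops at model.get_config(). Here is a hedged sketch of that step, following the pattern used in the original addition_rnn.py. It assumes the model and character_table from the listing are in scope, and it samples from inputs/labels directly, since the listing never splits out an explicit validation array (numpy's random.randint excludes the upper bound):

for _ in range(10):
    ind = random.randint(0, len(inputs))  # random sample index
    row_x, row_y = inputs[ind:ind + 1], labels[ind:ind + 1]
    preds = model.predict_classes(row_x, verbose=0)           # argmax over the class axis, shape (1, 4)
    q = character_table.decode(row_x[0])                      # question string
    correct = character_table.decode(row_y[0])                # true answer
    guess = character_table.decode(preds[0], calc_argmax=False)  # predicted answer
    print('Q: %s T: %s P: %s %s' % (q, correct, guess, 'ok' if guess == correct else 'err'))

predict_classes is the old Sequential helper that applies argmax over the class axis; on newer Keras versions one would call model.predict(row_x).argmax(axis=-1) instead.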