TensorFlow Implementation of a Multilayer Perceptron and a Simple CNN
阿新 • Published: 2020-12-10
This post implements a multilayer perceptron and a simple convolutional neural network in TensorFlow and applies them to the MNIST dataset. All of the code and the dataset file can be downloaded from the author's GitHub; the Jupyter Notebook provided there contains the code along with detailed comments (the purpose and parameters of every function used).
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)  # 2.0.0
The TensorFlow version used is 2.0.0.
First, load the dataset:
from tensorflow.keras.datasets import mnist

(train_data, train_label), (test_data, test_label) = mnist.load_data('./mnist.npz')
Note that downloading the dataset may fail with an HTTP connection timeout, in which case a VPN may be needed. Alternatively, download mnist.npz yourself and place it in the C:\Users\Administrator\.keras\datasets folder, or load it directly as in the sketch below.
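If you have downloaded mnist.npz manually, the arrays can also be read straight from the file with NumPy. A minimal sketch, assuming the file sits in the working directory and uses the standard Keras key names (x_train, y_train, x_test, y_test):

import numpy as np

# Load a local copy of mnist.npz instead of calling mnist.load_data();
# the key names below are the ones used by the official Keras file.
with np.load('./mnist.npz') as f:
    train_data, train_label = f['x_train'], f['y_train']
    test_data, test_label = f['x_test'], f['y_test']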
Implementing the multilayer perceptron:
# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Model structure:
print(model.summary())
"""
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 256)               200960
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
"""
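The parameter counts in the summary can be checked by hand: a Dense layer has inputs × units weights plus units biases, and Flatten adds no parameters. A quick sanity check:

# Dense parameters = inputs * units (weights) + units (biases)
print(784 * 256 + 256)  # 200960, the first Dense layer
print(256 * 10 + 10)    # 2570, the output layer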
Normalize the data, set the model's hyperparameters, and train:
# Normalize the input data
train_data = train_data / 255.0
test_data = test_data / 255.0

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, train_label, epochs=5, batch_size=256,
          validation_data=(test_data, test_label), validation_freq=1)
Training results:
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 16s 259us/sample - loss: 0.3641 - accuracy: 0.8926 - val_loss: 0.2121 - val_accuracy: 0.9351
Epoch 2/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1652 - accuracy: 0.9523 - val_loss: 0.1375 - val_accuracy: 0.9580
Epoch 3/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1199 - accuracy: 0.9658 - val_loss: 0.1091 - val_accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 5s 85us/sample - loss: 0.0952 - accuracy: 0.9726 - val_loss: 0.1082 - val_accuracy: 0.9658
Epoch 5/5
60000/60000 [==============================] - 4s 70us/sample - loss: 0.0788 - accuracy: 0.9775 - val_loss: 0.0947 - val_accuracy: 0.9702
<tensorflow.python.keras.callbacks.History at 0x23036b99320>
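The trained model can then be used for prediction. A minimal sketch (the variable names here are only illustrative):

import numpy as np

# Predict class probabilities for the first ten test images
probs = model.predict(test_data[:10])
pred_label = np.argmax(probs, axis=1)  # most likely digit for each image
print(pred_label)
print(test_label[:10])                 # compare against the ground truth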
Beyond this, the author also experimented with adding fully connected layers to the multilayer perceptron above and varying their sizes, observing the effect of these changes on the training results. Since the purpose of this post is to provide a working example of a multilayer perceptron, those experiments are not covered in detail here; the full code and results are on the author's GitHub. One possible variant is sketched below.
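This is an assumed example for illustration, not the exact code from the repo:

# A hypothetical variant with one extra hidden layer; the actual
# configurations tested are in the author's notebook on GitHub.
model2 = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),  # added layer
    tf.keras.layers.Dense(10, activation='softmax')
])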
Implementing the simple CNN:
model5 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=6, kernel_size=5, activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Model structure:
Model: "sequential_7" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d_4 (Conv2D) (None, 24, 24, 6) 156 _________________________________________________________________ max_pooling2d_4 (MaxPooling2 (None, 12, 12, 6) 0 _________________________________________________________________ flatten_4 (Flatten) (None, 864) 0 _________________________________________________________________ dense_16 (Dense) (None, 256) 221440 _________________________________________________________________ dense_17 (Dense) (None, 10) 2570 ================================================================= Total params: 224,166 Trainable params: 224,166 Non-trainable params: 0 _________________________________________________________________ None
Before training the model, the shape of the data needs to be changed:
# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_data = tf.reshape(train_data, (-1, 28, 28, 1))
test_data = tf.reshape(test_data, (-1, 28, 28, 1))
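Note that tf.reshape returns TensorFlow tensors rather than NumPy arrays. An equivalent NumPy-only way to add the channel axis (a sketch; either form is accepted by model.fit):

import numpy as np

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
train_data = train_data[..., np.newaxis]
test_data = test_data[..., np.newaxis]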
Set the hyperparameters and train the model:
model5.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model5.fit(train_data, train_label, epochs=5, validation_split=0.1)
Training results:
Train on 54000 samples, validate on 6000 samples
Epoch 1/5
54000/54000 [==============================] - 34s 622us/sample - loss: 0.2047 - accuracy: 0.9400 - val_loss: 0.0763 - val_accuracy: 0.9797
Epoch 2/5
54000/54000 [==============================] - 32s 594us/sample - loss: 0.0688 - accuracy: 0.9792 - val_loss: 0.0605 - val_accuracy: 0.9833
Epoch 3/5
54000/54000 [==============================] - 32s 600us/sample - loss: 0.0479 - accuracy: 0.9846 - val_loss: 0.0476 - val_accuracy: 0.9870
Epoch 4/5
54000/54000 [==============================] - 32s 593us/sample - loss: 0.0338 - accuracy: 0.9892 - val_loss: 0.0566 - val_accuracy: 0.9855
Epoch 5/5
54000/54000 [==============================] - 35s 649us/sample - loss: 0.0258 - accuracy: 0.9916 - val_loss: 0.0522 - val_accuracy: 0.9858
<tensorflow.python.keras.callbacks.History at 0x230380d9518>
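Since training used validation_split rather than the test set, the untouched test data can still provide a final held-out score. A minimal sketch:

# Evaluate the trained CNN on the held-out test set
test_loss, test_acc = model5.evaluate(test_data, test_label)
print(test_acc)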
Before training model5, the author trained model4 with the same architecture but with the SGD optimizer and a learning rate of 0.9. After training, its accuracy was stuck at about 0.1, no better than a randomly initialized, untrained model. The author therefore switched the optimizer to Adam with a learning rate of 0.001, giving model5, which reached an accuracy of 0.98 after training. The author then went back to SGD and changed only the learning rate; the resulting accuracy, while not as good as the Adam version, still reached 0.96. This shows how important the choice of hyperparameters is.
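The comparison described above can be reconstructed roughly as follows. This is a sketch under assumed settings: the post does not state the exact learning rate used in the final SGD run, so 0.001 is assumed here to match the Adam version.

# Hypothetical reconstruction of the experiment: same architecture as
# model5, but trained with SGD. With learning_rate=0.9 training diverged
# to ~0.1 accuracy (chance level); a smaller rate trains properly.
def build_cnn():
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(filters=6, kernel_size=5, activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

model4 = build_cnn()
model4.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),  # assumed value
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model4.fit(train_data, train_label, epochs=5, validation_split=0.1)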