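Next, each captcha is denoised and cut into four single-character 16*16 images. Based on the image's name, each character image is placed in the folder for its class. This gives 30 folders in total, and each folder holds the 16 * 16 single-character images of that class, as shown in the figure: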
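The preprocessing code for this step is not shown, and denoising is not covered here. The sketch below covers only the splitting and sorting part. It relies on assumptions that the original does not state:

- each captcha is already denoised, grayscale, 16 pixels high and 64 wide;
- its four characters occupy equal 16-pixel-wide slots;
- its filename is its 4-character text (e.g. `AB3K.png`).

The helper name `split_captcha` is hypothetical.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def split_captcha(captcha_path, out_dir, char_w=16):
    # Hypothetical helper: assumes a denoised grayscale captcha whose filename
    # stem is its text (e.g. "AB3K.png") and whose characters sit in
    # equal-width slots of char_w pixels.
    text = os.path.splitext(os.path.basename(captcha_path))[0]
    pixels = np.array(Image.open(captcha_path).convert('L'))
    saved = []
    for k, ch in enumerate(text):
        # k-th character slot, copied so PIL gets a contiguous (16, 16) array
        tile = np.ascontiguousarray(pixels[:, k * char_w:(k + 1) * char_w])
        class_dir = os.path.join(out_dir, ch)  # one folder per character class
        os.makedirs(class_dir, exist_ok=True)
        out_path = os.path.join(class_dir, '%s_%d.png' % (text, k))
        Image.fromarray(tile).save(out_path)
        saved.append(out_path)
    return saved


# Usage: a synthetic blank 16x64 captcha named after its text.
work = tempfile.mkdtemp()
src = os.path.join(work, 'AB3K.png')
Image.fromarray(np.zeros((16, 64), dtype=np.uint8)).save(src)
paths = split_captcha(src, os.path.join(work, 'chars'))
print(sorted(os.listdir(os.path.join(work, 'chars'))))  # ['3', 'A', 'B', 'K']
```

Real captchas rarely have perfectly evenly spaced characters, so a real pipeline would likely locate character boundaries after denoising rather than cut fixed slots.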





  1. First, import the required libraries

    # TensorFlow and tf.keras
    import tensorflow as tf
    from tensorflow import keras

    # Helper libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import os
    import matplotlib.image as mpimg
    from PIL import Image
    from sklearn import metrics

  2. Read the image data



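# 30 classes: digits 2-9 and 22 letters (0, 1, I, L, O and Q do not occur)
class_names = ['2', '3', '4', '5', '6', '7', '8', '9',
               'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K',
               'M', 'N', 'P', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']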

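The loader loops over the class_names array and visits each class folder under the image directory. In this way it reads every captcha character image together with its label.

Each image is a 16 * 16 pixel matrix and goes through three steps:

- it is loaded with im = Image.open(path);
- it is turned into a (1 * 16 * 16) array with im = np.expand_dims(im, 0);
- it is stacked vertically onto x with numpy's vstack.

In the end, x has shape (number of images * 16 * 16). The label of each image is its class in the loop, stored as the index n of that class in class_names. Labels are expanded and stacked the same way as the pixels.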

def load_data(source_dir):
    x = None  # stacked pixel matrices, shape (number of images, 16, 16)
    y = None  # stacked class indices, shape (number of images, 1)
    for n in range(len(class_names)):
        d = os.path.join(source_dir, class_names[n])  # one folder per class
        ldata = os.listdir(d)
        for i in range(0, len(ldata)):
            path = os.path.join(d, ldata[i])
            im = np.array(Image.open(path), dtype=np.uint8)  # (16, 16) for a single-channel image
            im = np.expand_dims(im, 0)  # (1, 16, 16)
            if x is None:
                # first image: start the arrays
                x = im
                y = np.array([[n]])
            else:
                # later images: append below the rows read so far
                x = np.vstack((x, im))
                y = np.vstack((y, [[n]]))
    return x, y


print('load data start...')
train_dir = r'C:\Users\chenyang\Desktop\all\training'
test_dir = r'C:\Users\chenyang\Desktop\all\testing'
train_images, train_labels = load_data(train_dir)
test_images, test_labels = load_data(test_dir)
print('load data end...')
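The following numpy-only sketch checks the array shapes without the real dataset. It feeds three fake 16*16 "images" through the same expand_dims / vstack steps as the loader. The random pixels and labels are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-ins for three 16*16 character images and their class indices
fake_images = [rng.integers(0, 256, size=(16, 16), dtype=np.uint8) for _ in range(3)]
fake_labels = [0, 0, 5]

x = None
y = None
for im, n in zip(fake_images, fake_labels):
    im = np.expand_dims(im, 0)  # (16, 16) -> (1, 16, 16)
    if x is None:
        x, y = im, np.array([[n]])
    else:
        x = np.vstack((x, im))     # grows along axis 0
        y = np.vstack((y, [[n]]))  # column vector of labels

print(x.shape, y.shape)  # (3, 16, 16) (3, 1)
print(y.T.tolist()[0])   # [0, 0, 5], the list form later passed to sklearn
```

Each vstack call copies the whole array read so far. For tens of thousands of images, it is faster to collect the arrays in a Python list and call np.stack once at the end. The resulting array is the same.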
  3. Define the model structure
  • Defining a model structure in Keras is straightforward. The code is as follows:

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(16, 16)),       # 16*16 pixels -> 256-vector
        keras.layers.Dense(512, activation=tf.nn.relu),
        keras.layers.Dropout(0.4, noise_shape=None, seed=None),
        keras.layers.Dense(30, activation=tf.nn.softmax)  # one probability per class
    ])
    model.summary()





Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 256)               0         
dense (Dense)                (None, 512)               131584    
dropout (Dropout)            (None, 512)               0         
dense_1 (Dense)              (None, 30)                15390     
Total params: 146,974
Trainable params: 146,974
Non-trainable params: 0
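The Param # column follows from the Dense layer formula. A Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while Flatten and Dropout have no weights. A quick check (the helper name `dense_params` is ours, not Keras's):

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per unit
    return n_in * n_out + n_out

flat = 16 * 16                    # Flatten: a 16*16 image becomes 256 inputs
hidden = dense_params(flat, 512)  # first Dense layer
output = dense_params(512, 30)    # softmax layer over the 30 classes
print(hidden, output, hidden + output)  # 131584 15390 146974
```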
  • Define the optimizer and loss function
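The code for this step is missing from the post. The labels are integer class indices in an (N, 1) array, and the last layer is a 30-way softmax. A compile call along the lines of `model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])` would therefore fit. The choice of Adam is an assumption, not the author's confirmed setting.

The numpy sketch below shows what sparse categorical cross-entropy computes: the mean negative log of the probability the model assigns to each sample's true class.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # labels: integer class indices, shape (N,) or (N, 1)
    # probs:  softmax outputs, shape (N, num_classes)
    labels = np.asarray(labels).reshape(-1)
    p_true = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([[0], [1]])  # same (N, 1) layout the loader returns
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))  # 0.2899
```

The "sparse" variant takes integer labels directly. Plain categorical cross-entropy would first require one-hot encoding the labels into 30 columns.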



  4. Start training

     epochs = 100
     print('training start...')
     hist = model.fit(train_images, train_labels, epochs=epochs,shuffle=True)
     test_loss, test_acc = model.evaluate(test_images, test_labels)
     print('Test accuracy:', test_acc)
     predict = model.predict(test_images)
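  • model.fit is Keras's training method. The arguments passed here are:
    - train_images: the pixel matrix we read;
    - train_labels: the labels of the training images;
    - epochs: the number of training epochs;
    - shuffle: whether to shuffle the data order during training (True by default).
    It returns a History object, whose use is explained later.
  • model.evaluate is Keras's evaluation method. It takes the test data and its labels and returns the loss and the accuracy.
  • model.save is Keras's model-saving method, and its argument is the path to save the model to. The matching loading method is keras.models.load_model, whose argument is the path to load from.
  • model.predict is Keras's prediction method. It takes image pixels and returns the predicted probability of each class.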
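model.predict returns one row of class probabilities per image. The sketch below turns those rows into characters and computes the accuracy that model.evaluate reports. A made-up 3-class probability array stands in for real predictions, and the shortened `class_names` is a stand-in for the 30-class list.

```python
import numpy as np

class_names = ['2', '3', 'A']  # shortened stand-in for the 30 classes
predict = np.array([[0.90, 0.05, 0.05],  # made-up model.predict output
                    [0.10, 0.20, 0.70],
                    [0.30, 0.60, 0.10]])
true_labels = np.array([[0], [2], [0]])  # (N, 1) label layout used in this post

pred_idx = np.argmax(predict, axis=1)            # most probable class per row
pred_chars = [class_names[i] for i in pred_idx]  # indices -> characters
accuracy = np.mean(pred_idx == true_labels.reshape(-1))

print(pred_chars)          # ['2', 'A', '3']
print(round(accuracy, 4))  # 0.6667
```

For a full captcha, joining the predicted characters of its four tiles in order gives the recognized text.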


training start...
Epoch 1/100
27311/27311 [==============================] - 4s 152us/step - loss: 0.6259 - acc: 0.8501
Epoch 2/100
27311/27311 [==============================] - 3s 109us/step - loss: 0.2536 - acc: 0.9357
Epoch 3/100
27311/27311 [==============================] - 3s 127us/step - loss: 0.1968 - acc: 0.9518
Epoch 4/100
27311/27311 [==============================] - 3s 104us/step - loss: 0.1604 - acc: 0.9607
Epoch 5/100
27311/27311 [==============================] - 3s 114us/step - loss: 0.1392 - acc: 0.9664


Epoch 96/100
27311/27311 [==============================] - 3s 103us/step - loss: 0.0330 - acc: 0.9929
Epoch 97/100
27311/27311 [==============================] - 3s 97us/step - loss: 0.0348 - acc: 0.9929
Epoch 98/100
27311/27311 [==============================] - 3s 103us/step - loss: 0.0346 - acc: 0.9925
Epoch 99/100
27311/27311 [==============================] - 3s 96us/step - loss: 0.0336 - acc: 0.9930
Epoch 100/100
27311/27311 [==============================] - 3s 105us/step - loss: 0.0306 - acc: 0.9936
11671/11671 [==============================] - 0s 39us/step
Test accuracy: 0.9906606117931984
  5. F1 score and confusion matrix


    pres = []
    for i in predict:
        pres.append(int(np.argmax(i)))  # index of the most probable class
    test_labels = test_labels.T   # (N, 1) -> (1, N)
    li = test_labels.tolist()     # li[0] is the flat list of true labels
    # Evaluation
    print("Precision, Recall and F1-Score...")
    print(metrics.classification_report(li[0], pres, target_names=class_names))
    # Confusion matrix
    print("Confusion Matrix...")
    cm = metrics.confusion_matrix(li[0], pres)
    print(cm)


Precision, Recall and F1-Score...
             precision    recall  f1-score   support

          2       0.99      0.98      0.98       359
          3       0.99      0.99      0.99       400
          4       0.99      1.00      1.00       405
          5       0.99      0.98      0.99       384
          6       0.99      0.99      0.99       409
          7       1.00      0.99      1.00       393
          8       1.00      0.99      0.99       417
          9       0.99      0.99      0.99       400
          A       1.00      0.99      1.00       376
          B       0.99      1.00      0.99       368
          C       0.99      0.97      0.98       409
          D       0.99      1.00      0.99       405
          E       1.00      0.99      0.99       411
          F       0.99      0.99      0.99       373
          G       0.97      0.99      0.98       380
          H       0.99      0.99      0.99       353
          J       1.00      0.99      0.99       402
          K       0.99      1.00      1.00       351
          M       0.99      0.99      0.99       362
          N       1.00      0.99      1.00       381
          P       0.99      0.99      0.99       377
          R       0.99      0.99      0.99       407
          S       0.99      0.98      0.99       363
          T       0.99      0.99      0.99       431
          U       1.00      0.99      0.99       384
          V       0.98      0.99      0.98       423
          W       0.99      0.99      0.99       351
          X       1.00      0.99      0.99       413
          Y       0.99      0.98      0.99       385
          Z       0.98      0.99      0.99       399

avg / total       0.99      0.99      0.99     11671

Confusion Matrix...
[[353   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   6]
 [  0 398   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0 405   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   1   0 378   3   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   2   0   0   0   0   0   0   0]
 [  0   0   0   2 406   0   0   0   0   0   0   0   0   0   0   1   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0 390   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   1   0   0   0   0   0   0   0   0   2]
 [  0   0   0   0   0   0 414   0   0   3   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   1   0   0   0   0   0 395   0   1   0   0   0   0   0   0   0   0
    0   0   1   0   1   1   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0 374   0   0   0   0   0   0   0   0   0
    2   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   1   0   0 367   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  2   0   0   0   0   0   0   0   0   0 397   1   1   0   8   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   1   0   0   0   0   0   0   0   0 403   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   1]
 [  1   0   0   1   1   0   0   0   0   1   0   0 407   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   2   0   0   0   0   0   0   0   0   0   0 371   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   1   0   0   0   0   0   2   1   0   0 376   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   1   0   0   0   0   0   0   0 350   1   0
    0   0   0   0   0   0   0   1   0   0   0   0]
 [  1   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0 398   0
    0   0   0   0   0   1   0   0   0   0   0   1]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 350
    0   0   0   1   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   2   0   0   0   0
  360   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
    0 379   0   0   0   0   0   0   1   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0 373   4   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   2 404   0   0   0   0   0   0   1   0]
 [  1   1   0   2   0   0   0   0   0   0   0   0   0   0   1   1   0   0
    0   0   0   0 357   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
    0   0   1   0   0 427   1   0   0   0   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   1
    0   0   0   0   0   0 380   1   0   0   0   0]
 [  0   0   0   0   1   0   0   2   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0 418   1   0   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   1   0   0   0   0   0   1 349   0   0   0]
 [  0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0   0   1
    0   0   0   0   0   0   0   2   0 407   1   0]
 [  0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   0   0
    0   0   0   0   0   1   0   3   0   0 379   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   1   0   0   0   1   0 397]]
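  • metrics.classification_report(li[0], pres, target_names=class_names) produces scikit-learn's classification report. Its arguments are the true labels, the predicted labels, and the display names of the classes.
  • metrics.confusion_matrix(li[0], pres) computes scikit-learn's confusion matrix, and its arguments are the true labels and the predicted labels. The entry in row i, column j counts the samples of true class i that were predicted as class j.
  6. Plot the accuracy and loss during training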
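The figures in the report can be recomputed from the confusion matrix. For class k:

- precision is the diagonal entry divided by its column sum (everything predicted as k);
- recall is the diagonal entry divided by its row sum (everything that truly is k);
- F1 is the harmonic mean of the two.

A numpy sketch on a small made-up matrix (the helper name is ours):

```python
import numpy as np

def per_class_scores(cm):
    # Rows are true classes, columns are predicted classes.
    # Assumes every class is predicted at least once (no zero column sums).
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums: predicted as class k
    recall = tp / cm.sum(axis=1)     # row sums: truly class k (the support)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
p, r, f = per_class_scores(cm)
print(np.round(p, 3), np.round(r, 3), np.round(f, 3))
```

As a check on the real output, take row 0 of the confusion matrix above, class '2'. It has 353 correct predictions, a row sum of 359 and a column sum of 358. That gives recall 353/359 ≈ 0.983 and precision 353/358 ≈ 0.986, consistent with the 0.98 and 0.99 in the report.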


    history_dict = hist.history
    acc = history_dict['acc']    # may be named 'accuracy' in TF 2.x
    loss = history_dict['loss']
    epochs = range(1, len(acc) + 1)
    # 'r' is a solid red line, 'b' a solid blue line
    plt.plot(epochs, loss, 'r', label='Training loss')
    plt.plot(epochs, acc, 'b', label='Training acc')
    plt.title('Training loss and accuracy')
    plt.xlabel('Epochs')
    plt.legend()
    plt.show()
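The accuracy entry in hist.history is named 'acc' in some TensorFlow/Keras versions (as in this post's log) and 'accuracy' in others. The small hypothetical helper below accepts either name. It is tested on plain dicts with made-up values standing in for hist.history.

```python
def get_accuracy(history_dict):
    # Return the training-accuracy list, whichever key this Keras version used.
    for key in ('acc', 'accuracy'):
        if key in history_dict:
            return history_dict[key]
    raise KeyError("no 'acc' or 'accuracy' entry in history")

old_style = {'loss': [0.9, 0.5], 'acc': [0.5, 0.6]}       # made-up values
new_style = {'loss': [0.9, 0.5], 'accuracy': [0.5, 0.6]}  # made-up values
print(get_accuracy(old_style) == get_accuracy(new_style))  # True
```

With this helper, `acc = get_accuracy(hist.history)` works unchanged across versions.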




