
Face Detection and Recognition with Keras and OpenCV

I. Dataset Selection and Implementation Approach

1. Dataset description: the dataset comes from the public datasets on Baidu's AI Studio platform. It is an experimental dataset, and its small size will limit the final training accuracy of the deep network. Dataset link: [https://aistudio.baidu.com/aistudio/datasetdetail/8325]

2. Usage: after unpacking, the dataset contains four classes of labeled images. This tutorial uses only two of them for a simple binary classification; if you need more classes, modify the training code accordingly. I use the "jiangwen" and "zhangziyi" classes here.

As shown below:

(Note: my face dataset folder sits inside the project folder, which is named cascadeFace.)

3. Implementation approach: use the Haar cascade classifier provided by OpenCV for face detection, crop the face regions the Haar classifier finds, and feed them into a trained AlexNet convolutional model to obtain the recognition result (we will build and train the AlexNet ourselves). Using the Haar classifier is covered in the test-code section below; for the theory behind Haar features, see [https://www.cnblogs.com/zyly/p/9410563.html].
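The overall data flow can be sketched before diving into the real code. In this minimal sketch, `detect_faces` and `recognize` are hypothetical stand-ins for `cascade.detectMultiScale` and the trained AlexNet; the point is only the detect-then-crop-then-classify structure:

```python
# A minimal sketch of the detect-then-recognize pipeline.
# detect_faces and recognize are hypothetical stand-ins for
# the Haar cascade and the trained classifier.

def detect_faces(image):
    """Pretend Haar detector: return (x, y, w, h) boxes."""
    return [(10, 10, 50, 50)]

def recognize(face_crop):
    """Pretend classifier: return a class id (0 or 1)."""
    return 0

def pipeline(image):
    results = []
    for (x, y, w, h) in detect_faces(image):
        # Crop the detected face region and classify only that crop
        face_crop = [row[x:x + w] for row in image[y:y + h]]
        results.append(((x, y, w, h), recognize(face_crop)))
    return results

# A dummy 100x100 "image" as a nested list
dummy = [[0] * 100 for _ in range(100)]
print(pipeline(dummy))
```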

II. Data Preprocessing

The code is as follows:

import os
import sys
import cv2
import numpy as np


"""Preprocessing
"""
IMAGE_SIZE = 64


def resize_image(image, height=IMAGE_SIZE, width=IMAGE_SIZE):
    """Resize an image to the given dimensions, padding it to a square first."""
    top, bottom, left, right = (0, 0, 0, 0)
    h, w, _ = image.shape
    # Find the longest edge (for non-square images)
    longest_edge = max(h, w)
    # Compute how many pixels the short edge needs to match the long edge
    if h < longest_edge:
        dh = longest_edge - h
        top = dh // 2
        bottom = dh - top
    elif w < longest_edge:
        dw = longest_edge - w
        left = dw // 2
        right = dw - left

    # Border color (black)
    BLACK = [0, 0, 0]
    # Pad the image so its sides are equal; cv2.BORDER_CONSTANT fills the border with `value`
    constant = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=BLACK)
    # Note: cv2.resize expects (width, height); both equal IMAGE_SIZE here
    return cv2.resize(constant, (width, height))


# Read the training data
images = []
labels = []

def read_path(path_name):
    for dir_item in os.listdir(path_name):
        # Build an absolute, usable path from the initial path
        full_path = os.path.abspath(os.path.join(path_name, dir_item))
        if os.path.isdir(full_path):
            read_path(full_path)
        else:
            if dir_item.endswith('.jpg') or dir_item.endswith('.png'):
                image = cv2.imread(full_path)
                image = resize_image(image, IMAGE_SIZE, IMAGE_SIZE)
                images.append(image)
                # The containing folder's path serves as the label
                labels.append(path_name)
    return images, labels


# Load the training data from the given path
def load_dataset(path_name):
    images, labels = read_path(path_name)
    # Convert the images to a four-dimensional array of shape
    # (number of images) * IMAGE_SIZE * IMAGE_SIZE * 3:
    # each image is 64*64 pixels with 3 color values per pixel
    images = np.array(images)
    print(images.shape)

    # Encode the labels: 0 for "jiangwen", 1 otherwise ("zhangziyi")
    labels = np.array([0 if label.endswith('jiangwen') else 1 for label in labels])
    return images, labels


if __name__ == '__main__':
    # Use the path given on the command line, falling back to './face'
    if len(sys.argv) == 2:
        images, labels = load_dataset(sys.argv[1])
    else:
        print('Usage: %s path_name (defaulting to ./face)\r\n' % (sys.argv[0]))
        images, labels = load_dataset('./face')


Note: resize_image() checks whether the image's height and width are equal; if not, it pads them to equal length before calling cv2.resize() to scale proportionally, which keeps the image from being distorted.
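The padding step can be checked in isolation. This hypothetical `pad_to_square` replicates the border arithmetic of resize_image() with plain NumPy, no OpenCV required:

```python
import numpy as np

def pad_to_square(image):
    """Replicates resize_image's border arithmetic: pad the
    short edge with black pixels until height == width."""
    h, w = image.shape[:2]
    longest_edge = max(h, w)
    top = bottom = left = right = 0
    if h < longest_edge:
        dh = longest_edge - h
        top, bottom = dh // 2, dh - dh // 2
    elif w < longest_edge:
        dw = longest_edge - w
        left, right = dw // 2, dw - dw // 2
    # Equivalent to cv2.copyMakeBorder(..., cv2.BORDER_CONSTANT, value=[0, 0, 0])
    return np.pad(image, ((top, bottom), (left, right), (0, 0)), constant_values=0)

img = np.ones((30, 50, 3), dtype=np.uint8)   # a 30x50 "image"
square = pad_to_square(img)
print(square.shape)  # (50, 50, 3)
```

Because only the short edge is padded, the subsequent cv2.resize() to 64x64 scales both axes by the same factor, preserving the aspect ratio of the content.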

III. Model Construction and Training

1. Model architecture

Note: the reference diagram illustrates the AlexNet architecture well. Although AlexNet is by now a relatively simple, basic convolutional model, its parameter count is still large; when used for classification, the millions of parameters in its fully connected layers become a training burden, so we use Dropout to drop half of the nodes. Also note that the input size in this experiment differs from the one shown in the architecture diagram above; see the model-building and training code below for details.

2. Model parameter table

(alexnet)

Layer (type)                   Output Shape            Param #   
conv2d_31 (Conv2D)           (None, 55, 55, 96)        28896     
_________________________________________________________________
activation_47 (Activation)   (None, 55, 55, 96)        0         
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_32 (Conv2D)           (None, 27, 27, 256)       614656    
_________________________________________________________________
activation_48 (Activation)   (None, 27, 27, 256)       0         
_________________________________________________________________
max_pooling2d_20(MaxPooling) (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 13, 13, 384)       885120    
_________________________________________________________________
activation_49 (Activation)   (None, 13, 13, 384)       0         
_________________________________________________________________
conv2d_34 (Conv2D)           (None, 13, 13, 384)       1327488   
_________________________________________________________________
activation_50 (Activation)   (None, 13, 13, 384)       0         
_________________________________________________________________
conv2d_35 (Conv2D)           (None, 13, 13, 256)       884992    
_________________________________________________________________
activation_51 (Activation)   (None, 13, 13, 256)       0         
_________________________________________________________________
max_pooling2d_21(MaxPooling) (None, 6, 6, 256)         0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_17 (Dense)             (None, 4096)              37752832  
_________________________________________________________________
activation_52 (Activation)   (None, 4096)              0         
_________________________________________________________________
dropout_15 (Dropout)         (None, 4096)              0         
_________________________________________________________________
dense_18 (Dense)             (None, 4096)              16781312  
_________________________________________________________________
activation_53 (Activation)   (None, 4096)              0         
_________________________________________________________________
dropout_16 (Dropout)         (None, 4096)              0         
_________________________________________________________________
dense_19 (Dense)             (None, 2)                 8194
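The parameter counts in the table can be verified by hand: a Conv2D layer has filters × (kernel_h × kernel_w × input_channels + 1) parameters, and a Dense layer has inputs × units + units. A quick check against three rows of the table:

```python
def conv_params(filters, kh, kw, in_channels):
    # each filter has kh*kw*in_channels weights plus one bias
    return filters * (kh * kw * in_channels + 1)

def dense_params(inputs, units):
    # weight matrix plus one bias per unit
    return inputs * units + units

print(conv_params(96, 10, 10, 3))   # first conv layer: 28896
print(dense_params(9216, 4096))     # first fully connected layer: 37752832
print(dense_params(4096, 2))        # output layer: 8194
```

As the table shows, the first fully connected layer alone holds roughly 37.7 million of the network's parameters, which is why Dropout is applied there.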

3. Model construction and training code

import random

from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Dropout
from keras.layers import Activation
from keras.layers import Flatten
from keras.layers import Dense
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import load_model
from keras import backend

from load_data import load_dataset, resize_image, IMAGE_SIZE	# load_data is the preprocessing module above


class Dataset:
    def __init__(self, path_name):
        # Training set
        self.train_images = None
        self.train_labels = None
        # Validation set
        self.valid_images = None
        self.valid_labels = None
        # Test set
        self.test_images = None
        self.test_labels = None
        # Path to load the dataset from
        self.path_name = path_name
        # Dimension ordering used by the current backend
        self.input_shape = None

    # Load the dataset, split it for cross-validation, then run the preprocessing steps
    def load(self, img_rows=IMAGE_SIZE, img_cols=IMAGE_SIZE, img_channels=3, nb_classes=2):
        images, labels = load_dataset(self.path_name)
        train_images, valid_images, train_labels, valid_labels = train_test_split(images,
                                                                                  labels,
                                                                                  test_size=0.2,
                                                                                  random_state=random.randint(0, 100))
        # Note: this second split re-samples from the full set, so test images may overlap the training set
        _, test_images, _, test_labels = train_test_split(images,
                                                          labels,
                                                          test_size=0.3,
                                                          random_state=random.randint(0, 100))
        # If the current dimension ordering is 'th', images are fed as channels, rows, cols;
        # otherwise as rows, cols, channels.
        # Reshape the training data into the ordering the Keras backend expects
        if backend.image_dim_ordering() == 'th':
            train_images = train_images.reshape(train_images.shape[0], img_channels, img_rows, img_cols)
            valid_images = valid_images.reshape(valid_images.shape[0], img_channels, img_rows, img_cols)
            test_images = test_images.reshape(test_images.shape[0], img_channels, img_rows, img_cols)
            self.input_shape = (img_channels, img_rows, img_cols)
        else:
            train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, img_channels)
            valid_images = valid_images.reshape(valid_images.shape[0], img_rows, img_cols, img_channels)
            test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, img_channels)
            self.input_shape = (img_rows, img_cols, img_channels)

        # Print the sizes of the training, validation, and test sets
        print(train_images.shape[0], 'train samples')
        print(valid_images.shape[0], 'valid samples')
        print(test_images.shape[0], 'test samples')

        # The loss is categorical_crossentropy, so the labels must be one-hot encoded
        # according to nb_classes; with 2 classes, the encoded labels have 2 dimensions
        train_labels = np_utils.to_categorical(train_labels, nb_classes)
        valid_labels = np_utils.to_categorical(valid_labels, nb_classes)
        test_labels = np_utils.to_categorical(test_labels, nb_classes)

        # Convert pixel data to float so it can be normalized
        train_images = train_images.astype('float32')
        valid_images = valid_images.astype('float32')
        test_images = test_images.astype('float32')

        # Normalize to [0, 1]
        train_images /= 255
        valid_images /= 255
        test_images /= 255

        self.train_images = train_images
        self.valid_images = valid_images
        self.test_images = test_images
        self.train_labels = train_labels
        self.valid_labels = valid_labels
        self.test_labels = test_labels


"""CNN construction
"""


class CNNModel:
    def __init__(self):
        self.model = None

    # Build the model
    def build_model(self, dataset, nb_classes=2):
        # Build an empty Sequential model (a linear stack of layers)
        self.model = Sequential()
        self.model.add(Convolution2D(96, 10, 10, input_shape=dataset.input_shape))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))

        self.model.add(Convolution2D(256, 5, 5, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))

        self.model.add(Convolution2D(384, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))

        self.model.add(Convolution2D(384, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))

        self.model.add(Convolution2D(256, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))

        self.model.add(Flatten())
        self.model.add(Dense(4096))
        self.model.add(Activation('relu'))
        self.model.add(Dropout(0.5))

        self.model.add(Dense(4096))
        self.model.add(Activation('relu'))
        self.model.add(Dropout(0.5))

        self.model.add(Dense(nb_classes))
        self.model.add(Activation('softmax'))
        # Print a model summary
        self.model.summary()

    def train(self, dataset, batch_size=10, nb_epoch=5, data_augmentation=True):
        sgd = SGD(lr=0.01, decay=1e-6, momentum=0.7, nesterov=True)  # SGD + momentum optimizer
        self.model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])  # configure the model
        # Skip data augmentation
        if not data_augmentation:
            self.model.fit(dataset.train_images,
                           dataset.train_labels,
                           batch_size=batch_size,
                           nb_epoch=nb_epoch,
                           validation_data=(dataset.valid_images, dataset.valid_labels),
                           shuffle=True)
        # Use real-time data augmentation
        else:
            # Define a data generator for augmentation. It returns a generator object, datagen;
            # each call to datagen yields one batch of data, which saves memory. It is
            # effectively an ordinary Python generator.
            datagen = ImageDataGenerator(featurewise_center=False,              # whether to zero-center the input data (dataset mean 0)
                                         samplewise_center=False,               # whether to zero-center each sample
                                         featurewise_std_normalization=False,   # whether to divide inputs by the dataset's standard deviation
                                         samplewise_std_normalization=False,    # whether to divide each sample by its own standard deviation
                                         zca_whitening=False,                   # whether to apply ZCA whitening
                                         rotation_range=20,                     # random rotation angle during augmentation (range 0-180)
                                         width_shift_range=0.2,                 # random horizontal shift (fraction of image width, a float in 0-1)
                                         height_shift_range=0.2,                # random vertical shift
                                         horizontal_flip=True,                  # whether to apply random horizontal flips
                                         vertical_flip=False                    # whether to apply random vertical flips
                                         )
            datagen.fit(dataset.train_images)
            self.model.fit_generator(datagen.flow(dataset.train_images,
                                                  dataset.train_labels,
                                                  batch_size=batch_size),
                                     samples_per_epoch=dataset.train_images.shape[0],
                                     nb_epoch=nb_epoch,
                                     validation_data=(dataset.valid_images, dataset.valid_labels)
                                     )

    MODEL_PATH = './cascadeface.model.h5'

    def save_model(self, file_path=MODEL_PATH):
        self.model.save(file_path)

    def load_model(self, file_path=MODEL_PATH):
        self.model = load_model(file_path)

    def evaluate(self, dataset):
        score = self.model.evaluate(dataset.test_images, dataset.test_labels, verbose=1)
        print('%s: %.2f%%' % (self.model.metrics_names[1], score[1] * 100))

    # Recognize a face
    def face_predict(self, image):
        # Determine the dimension ordering from the backend
        if backend.image_dim_ordering() == 'th' and image.shape != (1, 3, IMAGE_SIZE, IMAGE_SIZE):
            image = resize_image(image)  # the size must match the training set: IMAGE_SIZE * IMAGE_SIZE
            image = image.reshape((1, 3, IMAGE_SIZE, IMAGE_SIZE))  # unlike training, we predict on a single image here
        elif backend.image_dim_ordering() == 'tf' and image.shape != (1, IMAGE_SIZE, IMAGE_SIZE, 3):
            image = resize_image(image)
            image = image.reshape((1, IMAGE_SIZE, IMAGE_SIZE, 3))

        # Normalize
        image = image.astype('float32')
        image /= 255
        # Probability of the input belonging to each class
        result = self.model.predict_proba(image)
        print('result:', result)
        result = self.model.predict_classes(image)
        # Return the prediction
        return result[0]


if __name__ == '__main__':
    dataset = Dataset('./face/')
    dataset.load()
    # Test the build_model() function
    model = CNNModel()
    model.build_model(dataset)
    # Test the training routine
    model.train(dataset)

if __name__ == '__main__':
    dataset = Dataset('./face/')
    dataset.load()
    model = CNNModel()
    model.build_model(dataset)
    model.train(dataset)
    model.save_model(file_path='./model/cascadeface.model.h5')

if __name__ == '__main__':
    dataset = Dataset('./face/')
    dataset.load()
    # Evaluate the model
    model = CNNModel()
    model.load_model(file_path='./model/cascadeface.model.h5')
    model.evaluate(dataset)

Note: see the code comments for the relevant hyperparameters, and pay attention to the data augmentation methods used in the code.
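To illustrate what two of those augmentations actually do (a NumPy sketch, not Keras internals), a horizontal flip and a width shift are just array operations:

```python
import numpy as np

img = np.arange(9).reshape(3, 3)   # a toy 3x3 single-channel "image"

# horizontal_flip=True: mirror the columns
flipped = img[:, ::-1]

# width_shift_range: shift right by one column, filling the gap with zeros
shifted = np.zeros_like(img)
shifted[:, 1:] = img[:, :-1]

print(flipped[0].tolist())   # first row [0, 1, 2] becomes [2, 1, 0]
print(shifted[0].tolist())   # first row [0, 1, 2] becomes [0, 0, 1]
```

ImageDataGenerator applies such transforms with random parameters each epoch, so the small dataset is seen in many slightly different variants.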

4. Training results

Note: because the dataset is small, the model's unsatisfying final accuracy is expected; given a better dataset, it would be worth retraining.

The saved model file:

IV. Model Testing

1. Haar face detection: OK, here we cover how to use the Haar classifier. OpenCV's open-source distribution ships .xml files for face detection; they encapsulate face features already extracted by the Haar classifier and live under opencv/sources/data/haarcascades. The location of my files is shown in the figure.

We are detecting faces in static images here, so copy haarcascade_frontalface_default.xml from that folder into the project folder; below we use this file to perform face detection. See the test code for detailed usage.

2. Face detection and recognition test code

import cv2
from 人臉檢測與識別 import CNNModel		# import the model class from the training code above


if __name__ == '__main__':

    # Load the model
    model = CNNModel()
    model.load_model(file_path='./cascadeface.model.h5')
    # Color of the face bounding rectangle
    color = (0, 255, 0)
    # Path to the face detection classifier
    cascade_path = './haarcascade_frontalface_default.xml'

    image = cv2.imread('jiangwen.jpg')
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Load the classifier
    cascade = cv2.CascadeClassifier(cascade_path)
    # Use the classifier to detect face regions
    faceRects = cascade.detectMultiScale(image_gray, scaleFactor=1.2, minNeighbors=5, minSize=(32, 32))

    if len(faceRects) > 0:
        for faceRect in faceRects:
            x, y, w, h = faceRect
            # Crop the face region (with a small margin, clamped to the image)
            # and feed the crop, not the full image, to the recognition model
            img = image[max(0, y - 10): y + h + 10, max(0, x - 10): x + w + 10]
            faceID = model.face_predict(img)
            if faceID == 0:
                cv2.rectangle(image, (x - 10, y - 10), (x + w + 10, y + h + 10), color, thickness=2)
                cv2.putText(image,
                            'jiangwen',
                            (x + 30, y + 30),           # position
                            cv2.FONT_HERSHEY_SIMPLEX,   # font
                            1,                          # font scale
                            (0, 0, 255),                # color
                            1)                          # line thickness of the text
            elif faceID == 1:
                cv2.rectangle(image, (x - 10, y - 10), (x + w + 10, y + h + 10), color, thickness=2)
                cv2.putText(image,
                            'zhangziyi',
                            (x + 30, y + 30),
                            cv2.FONT_HERSHEY_SIMPLEX,
                            1,
                            (0, 0, 255),
                            1)
            else:
                pass

    cv2.imshow('image', image)
    cv2.waitKey(0)

The output looks like this: