Keras + CNN Image Classification
Our deep learning dataset consists of 1,191 Pokemon images (Pokemon are the animal-like creatures from the popular TV show, video games, and trading card series).
Our goal is to train a convolutional neural network with Keras and deep learning to recognize and classify these Pokemon.
The Pokemon we will recognize include:
Bulbasaur (234 images)
Charmander (238 images)
Squirtle (223 images)
Pikachu (234 images)
Mewtwo (239 images)
Project structure
├── dataset
│   ├── bulbasaur [234 entries]
│   ├── charmander [238 entries]
│   ├── mewtwo [239 entries]
│   ├── pikachu [234 entries]
│   └── squirtle [223 entries]
├── examples [6 entries]
├── pyimagesearch
│   ├── __init__.py
│   └── smallervggnet.py
├── plot.png
├── lb.pickle
├── pokedex.model
├── classify.py
└── train.py
There are 3 directories:
dataset: contains the five classes; each class gets its own subdirectory, which makes parsing the class labels easy.
examples: contains the images we will use to test our CNN.
pyimagesearch module: contains our SmallerVGGNet model class.
There are 5 files in the root directory:
plot.png: the training/testing accuracy and loss plot generated after the training script runs.
lb.pickle: our serialized LabelBinarizer object file; it contains the class index to class name lookup mechanism.
pokedex.model: our serialized Keras convolutional neural network model file (i.e., the "weights file").
train.py: we will use this script to train our Keras CNN, plot the accuracy/loss, and then serialize the CNN and label binarizer to disk.
classify.py: our testing script.
The CNN architecture we use is a smaller, more compact variant of the VGGNet network, introduced by Simonyan and Zisserman in their 2014 paper, "Very Deep Convolutional Networks for Large-Scale Image Recognition".
VGGNet-like architectures are characterized by the following (see the sketch after this list):
Using only 3×3 convolutional layers stacked on top of one another to increase depth
Reducing volume size via max pooling
Fully-connected layers at the end of the network prior to a softmax classifier
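To make the pattern concrete, here is a minimal schematic sketch (illustrative only, not the tutorial's model; the full SmallerVGGNet implementation follows below):

# schematic sketch: stacked 3x3 convolutions for depth, max pooling to
# reduce volume size, and fully-connected layers before the softmax
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

vgg_like = Sequential([
    Conv2D(32, (3, 3), padding="same", activation="relu",
        input_shape=(96, 96, 3)),
    Conv2D(32, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(5, activation="softmax"),
])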
Examples of configuring a development environment for deep learning:
- Configuring Ubuntu for deep learning with Python
- Setting up Ubuntu 16.04 + CUDA + GPU for deep learning with Python
- Configuring macOS for deep learning with Python
Pre-configured instances:
- Amazon AMI for deep learning with Python
- Microsoft’s data science virtual machine (DSVM) for deep learning
smallervggnet.py
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K
class SmallerVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1

        # CONV => RELU => POOL
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(1024))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))

        # return the constructed network architecture
        return model
width: the width dimension of the input images.
height: the height dimension of the input images.
depth: the depth of the image, also known as the number of channels.
classes: the number of classes in our dataset (this affects the final layer of our model). We use 5 Pokemon classes here, but you could work with all 807 Pokemon species if you downloaded enough example images for each of them.
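As a quick sanity check, the build method can be called directly; a minimal usage sketch (the 96×96×3 input size and 5 classes mirror the settings train.py uses below):

from pyimagesearch.smallervggnet import SmallerVGGNet

# instantiate the architecture with the tutorial's dimensions
model = SmallerVGGNet.build(width=96, height=96, depth=3, classes=5)
model.summary()  # prints each layer's output shape and parameter count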
We employ CONV => RELU => POOL blocks.
The first convolutional layer has 32 filters with a 3×3 kernel. We use the RELU activation function, followed by batch normalization.
Our POOL layer uses a 3×3 pool size to quickly reduce the spatial dimensions from 96×96 down to 32×32 (96/3 = 32, since the stride defaults to the pool size; we will train the network on 96×96×3 input images, as we will see in the next section).
As you can see in the code block, we also use dropout in our network architecture. Dropout works by randomly disconnecting nodes from the current layer to the next layer. This process of random disconnection during training batches helps naturally introduce redundancy into the model: no single node in a layer is responsible for predicting a certain class, object, edge, or corner.
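To build intuition, here is a toy NumPy sketch of (inverted) dropout, separate from the tutorial code; surviving activations are rescaled by 1/keep_prob so the expected output stays the same:

import numpy as np

rng = np.random.RandomState(42)
activations = np.ones((1, 8))             # toy layer activations
keep_prob = 0.75                          # Dropout(0.25) keeps 75% of nodes
mask = rng.binomial(1, keep_prob, size=activations.shape)
dropped = activations * mask / keep_prob  # zero random nodes, rescale the rest
print(dropped)                            # entries are either 0 or 1/keep_prob ≈ 1.33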
train.py
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from pyimagesearch.smallervggnet import SmallerVGGNet
import matplotlib.pyplot as plt
from imutils import paths
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", required=True,
help="path to output model")
ap.add_argument("-l", "--labelbin", required=True,
help="path to output label binarizer")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
# initialize the number of epochs to train for, initial learning rate,
# batch size, and image dimensions
EPOCHS = 100
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)
# initialize the data and labels
data = []
labels = []
# grab the image paths and randomly shuffle them
print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)

    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-2]
    labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {:.2f}MB".format(
    data.nbytes / (1024 * 1000.0)))
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
    labels, test_size=0.2, random_state=42)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")
# initialize the model
print("[INFO] compiling model...")
model = SmallerVGGNet.build(width=IMAGE_DIMS[1], height=IMAGE_DIMS[0],
    depth=IMAGE_DIMS[2], classes=len(lb.classes_))
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit_generator(
    aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS, verbose=1)
# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"])
# save the label binarizer to disk
print("[INFO] serializing label binarizer...")
f = open(args["labelbin"], "wb")
f.write(pickle.dumps(lb))
f.close()
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
N = EPOCHS
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="upper left")
plt.savefig(args["plot"])
python train.py --dataset dataset --model pokedex.model --labelbin lb.pickle
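For reference, here is a toy sketch of what the LabelBinarizer step produces for class names like ours (illustrative labels, run outside the training script):

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
onehot = lb.fit_transform(["pikachu", "bulbasaur", "charmander"])
print(lb.classes_)  # ['bulbasaur' 'charmander' 'pikachu'] (sorted alphabetically)
print(onehot)       # one one-hot encoded row per input label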
classify.py
# import the necessary packages
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="path to trained model model")
ap.add_argument("-l", "--labelbin", required=True,
help="path to label binarizer")
ap.add_argument("-i", "--image", required=True,
help="path to input image")
args = vars(ap.parse_args())
# load the image
image = cv2.imread(args["image"])
output = image.copy()
# pre-process the image for classification
image = cv2.resize(image, (96, 96))
image = image.astype("float") / 255.0
image = img_to_array(image)
image = np.expand_dims(image, axis=0)
# load the trained convolutional neural network and the label
# binarizer
print("[INFO] loading network...")
model = load_model(args["model"])
lb = pickle.loads(open(args["labelbin"], "rb").read())
# classify the input image
print("[INFO] classifying image...")
proba = model.predict(image)[0]
idx = np.argmax(proba)
label = lb.classes_[idx]
# we'll mark our prediction as "correct" if the input image filename
# contains the predicted label text (obviously this makes the
# assumption that you have named your testing image files this way)
filename = args["image"][args["image"].rfind(os.path.sep) + 1:]
correct = "correct" if filename.rfind(label) != -1 else "incorrect"
# build the label and draw the label on the image
label = "{}: {:.2f}% ({})".format(label, proba[idx] * 100, correct)
output = imutils.resize(output, width=400)
cv2.putText(output, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX,
    0.7, (0, 255, 0), 2)
# show the output image
print("[INFO] {}".format(label))
cv2.imshow("Output", output)
cv2.waitKey(0)
python classify.py --model pokedex.model --labelbin lb.pickle \
--image examples/charmander_counter.png
One of the main limitations of this model is the small amount of training data. I tested the model on various images and it sometimes classified them incorrectly. When that happened, I examined the input image and the network more closely and found that the dominant colors in the image significantly influence the classification.
For example, lots of red and orange in an image will likely return "Charmander" as the label. Similarly, lots of yellow in an image will usually produce a "Pikachu" label.
This is partly due to our input data. Most of the photos are fan illustrations or stills from the movies/TV show. Moreover, each class has only a limited amount of data (~225-250 images).
Ideally, we should have at least 500-1,000 images per class when training a convolutional neural network.
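Data augmentation (which train.py already applies) partially mitigates this. To eyeball what the generator produces, you can write augmented variants of a single image to disk; a minimal sketch, where the input path and the preview directory are hypothetical and the preview directory must already exist:

from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import numpy as np

aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")

# hypothetical example image; any file under dataset/ works
image = load_img("dataset/pikachu/00000001.jpg", target_size=(96, 96))
x = np.expand_dims(img_to_array(image), axis=0)

# write 10 augmented variants into the existing "preview" directory
gen = aug.flow(x, batch_size=1, save_to_dir="preview",
    save_prefix="aug", save_format="jpg")
for i in range(10):
    next(gen)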
Deploying deep learning models: