
Trained VGG Model to Classify Objects in Photographs

Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as classifying images.

That is, given a photograph of an object, answer the question as to which of 1,000 specific objects the photograph shows.

A competition-winning model for this task is the VGG model by researchers at Oxford. What is important about this model, besides its capability of classifying objects in photographs, is that the model weights are freely available and can be loaded and used in your own models and applications.

In this tutorial, you will discover the VGG convolutional neural network models for image classification.

After completing this tutorial, you will know:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. ImageNet
  2. The Oxford VGG Models
  3. Load the VGG Model in Keras
  4. Develop a Simple Photo Classifier

ImageNet

ImageNet is a research project to develop a large database of images with annotations, e.g. images and their descriptions.

The images and their annotations have been the basis for an image classification challenge called the ImageNet Large Scale Visual Recognition Challenge or ILSVRC since 2010. The result is that research organizations battle it out on pre-defined datasets to see who has the best model for classifying the objects in images.

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.

For the classification task, images must be classified into one of 1,000 different categories.

For the last few years, very deep convolutional neural network models have been used to win these challenges, and results on the tasks have exceeded human performance.

Sample of Images from the ImageNet Dataset used in the ILSVRC Challenge.
Taken from “ImageNet Large Scale Visual Recognition Challenge”, 2015.

The Oxford VGG Models

Researchers from the Oxford Visual Geometry Group, or VGG for short, participate in the ILSVRC challenge.

In 2014, convolutional neural network (CNN) models developed by the VGG group won the image classification task.

ILSVRC Results in 2014 for the Classification task

After the competition, the participants wrote up their findings in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition” (Simonyan and Zisserman, 2014).

They also made their models and learned weights available online.

This allowed other researchers and developers to use a state-of-the-art image classification model in their own work and programs.

This helped to fuel a rash of transfer learning work where pre-trained models are used with minor modification on wholly new predictive modeling tasks, harnessing the state-of-the-art feature extraction capabilities of proven models.

… we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines (e.g. deep features classified by a linear SVM without fine-tuning). We have released our two best-performing models to facilitate further research.

VGG released two different CNN models, specifically a 16-layer model and a 19-layer model.

Refer to the paper for the full details of these models.

The VGG models are no longer state-of-the-art, trailing the best models by only a few percentage points. Nevertheless, they are very powerful models and useful both as image classifiers and as the basis for new models that use image inputs.

In the next section, we will see how we can use the VGG model directly in Keras.

Load the VGG Model in Keras

The VGG model can be loaded and used in the Keras deep learning library.

Keras provides an Applications interface for loading and using pre-trained models.

Using this interface, you can create a VGG model using the pre-trained weights provided by the Oxford group and use it as a starting point in your own model, or use it as a model directly for classifying images.

In this tutorial, we will focus on the use case of classifying new images using the VGG model.

Keras provides both the 16-layer and 19-layer versions via the VGG16 and VGG19 classes. Let’s focus on the VGG16 model.

The model can be created as follows:

from keras.applications.vgg16 import VGG16
model = VGG16()

That’s it.
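If you would rather use the deeper 19-layer variant mentioned above, it loads the same way via the VGG19 class:

from keras.applications.vgg19 import VGG19
model = VGG19()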

The first time you run this example, Keras will download the weight files from the Internet and store them in the ~/.keras/models directory.

Note that the weights are about 528 megabytes, so the download may take a few minutes depending on the speed of your Internet connection.

The weights are only downloaded once. The next time you run the example, the weights are loaded locally and the model should be ready to use in seconds.
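As a quick sanity check, you can list the cached files; note that the exact file names vary by Keras version, so the example name in the comment below is only indicative:

import os

# default Keras cache location for downloaded weights
models_dir = os.path.expanduser('~/.keras/models')
# file names vary by version, e.g. vgg16_weights_tf_dim_ordering_tf_kernels.h5
print(os.listdir(models_dir))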

We can use the standard Keras tools for inspecting the model structure.

For example, you can print a summary of the network layers as follows:

from keras.applications.vgg16 import VGG16
model = VGG16()
print(model.summary())

You can see that the model is huge.

You can also see that, by default, the model expects input images of size 224 x 224 pixels with 3 channels (i.e. color).

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

We can also create a plot of the layers in the VGG model, as follows:

from keras.applications.vgg16 import VGG16
from keras.utils.vis_utils import plot_model
model = VGG16()
plot_model(model, to_file='vgg.png')

Again, because the model is large, the plot is a little too large and perhaps unreadable. Nevertheless, it is provided below.

Plot of Layers in the VGG Model

The VGG16() class takes a few arguments that may only interest you if you are looking to use the model in your own project, e.g. for transfer learning; a short sketch of that use case follows the list below.

For example:

  • include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
  • weights (‘imagenet‘): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
  • input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
  • input_shape (None): The size of images that the model is expected to take if you change the input layer.
  • pooling (None): The type of pooling to use when you are training a new set of output layers.
  • classes (1000): The number of classes (e.g. size of output vector) for the model.
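For instance, a common transfer-learning setup loads the convolutional base without the output layers and adds a new classifier on top. Here is a minimal sketch; the 64 x 64 input size, the layer sizes, and the 10-class output are illustrative assumptions, not part of this tutorial:

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# load only the convolutional base, for 64 x 64 color inputs (illustrative size)
base = VGG16(include_top=False, input_shape=(64, 64, 3))
# freeze the pre-trained feature extractor
for layer in base.layers:
    layer.trainable = False
# add a new classifier head (10 classes is an assumption for illustration)
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
output = Dense(10, activation='softmax')(x)
model = Model(inputs=base.input, outputs=output)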

Next, let’s look at using the loaded VGG model to classify ad hoc photographs.

Develop a Simple Photo Classifier

Let’s develop a simple image classification script.

1. Get a Sample Image

First, we need an image we can classify.

You can download a random photograph of a coffee mug from Flickr here.

Coffee Mug
Photo by jfanaian, some rights reserved.

Download the image and save it to your current working directory with the filename ‘mug.jpg‘.

2. Load the VGG Model

Load the weights for the VGG-16 model, as we did in the previous section.

from keras.applications.vgg16 import VGG16
# load the model
model = VGG16()

3. Load and Prepare Image

Next, we can load the image as pixel data and prepare it to be presented to the network.

Keras provides some tools to help with this step.

First, we can use the load_img() function to load the image and resize it to the required size of 224×224 pixels.

from keras.preprocessing.image import load_img
# load an image from file
image = load_img('mug.jpg', target_size=(224, 224))

Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We can use the img_to_array() function for this.

from keras.preprocessing.image import img_to_array
# convert the image pixels to a numpy array
image = img_to_array(image)

The network expects one or more images as input; that means the input array will need to be 4-dimensional: samples, rows, columns, and channels.

We only have one sample (one image). We can reshape the array by calling reshape() and adding the extra dimension.

# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
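An equivalent, arguably more readable way to add the leading sample dimension is NumPy’s expand_dims (use one approach or the other, not both):

from numpy import expand_dims
# add a batch dimension at the front: (224, 224, 3) -> (1, 224, 224, 3)
image = expand_dims(image, axis=0)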

Next, the image pixels need to be prepared in the same way as the ImageNet training data was prepared. Specifically, from the paper:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

Keras provides a function called preprocess_input() to prepare new input for the network.

from keras.applications.vgg16 import preprocess_input
# prepare the image for the VGG model
image = preprocess_input(image)
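Under the hood, the VGG16 version of preprocess_input reorders the channels from RGB to BGR and subtracts the per-channel ImageNet mean. The rough manual equivalent below is for illustration only; do not run it in addition to preprocess_input, and note that the mean values are the commonly cited ones rather than values from this tutorial:

# rough manual equivalent of vgg16.preprocess_input (illustration only)
# flip RGB -> BGR
image = image[..., ::-1]
# subtract the per-channel ImageNet means (commonly cited BGR values)
image[..., 0] -= 103.939
image[..., 1] -= 116.779
image[..., 2] -= 123.68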

We are now ready to make a prediction for our loaded and prepared image.

4. Make a Prediction

We can call the predict() function on the model in order to get a prediction of the probability of the image belonging to each of the 1000 known object types.

# predict the probability across all output classes
yhat = model.predict(image)
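The result is an array of shape (1, 1000), one probability per ImageNet class. You can verify this and peek at the raw winning class index before decoding:

from numpy import argmax
# one row of 1,000 class probabilities
print(yhat.shape)
# index of the highest-probability class (raw, undecoded)
print(argmax(yhat[0]))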

Nearly there; now we need to interpret the probabilities.

5. Interpret Prediction

Keras provides a function to interpret the probabilities called decode_predictions().

It can return a list of classes and their probabilities in case you would like to present the top 3 objects that may be in the photo.
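For example, decode_predictions() accepts a top argument; the snippet below prints the three most likely classes:

from keras.applications.vgg16 import decode_predictions
# report the three most likely classes with their probabilities
for name, description, probability in decode_predictions(yhat, top=3)[0]:
    print('%s (%.2f%%)' % (description, probability * 100))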

We will just report the first most likely object.

from keras.applications.vgg16 import decode_predictions
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2] * 100))

And that’s it.

Complete Example

Tying all of this together, the complete example is listed below:

from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load the model
model = VGG16()
# load an image from file
image = load_img('mug.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2] * 100))
