Tutorial: Implementing the Lightweight Network ShuffleNet in Keras

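ShuffleNet is an extremely computation-efficient CNN architecture published by Megvii, designed specifically for mobile devices with very limited computing power (for example, 10-150 MFLOPs). The architecture relies on two operations, group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection show that ShuffleNet outperforms other architectures; for example, under a computation budget of 40 MFLOPs, its top-1 error on ImageNet classification is an absolute 7.8% lower than that of the recent MobileNet. On an ARM-based mobile device, ShuffleNet achieves about a 13x actual speedup over AlexNet while maintaining comparable accuracy.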

Paper: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

GitHub: https://github.com/zjn-ai/ShuffleNet-keras

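Network architecture

Group convolution

Group convolution was already used in AlexNet, where limited GPU memory forced the model to be split across two GPUs for training. Put simply, group convolution splits the input feature map evenly along the channel axis into several feature maps of equal size (in the figure below, the input feature map is on the left and the split feature maps are on the right), applies an ordinary convolution to each of them, and finally concatenates the outputs along the channel axis.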

[Figure: group convolution, with the input feature map on the left and the evenly split feature maps on the right]

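Many frameworks support group convolution natively, but TensorFlow, for whatever reason, still does not in the version used here, so it has to be written by hand, and a hand-written version is unlikely to match the efficiency of native support in other frameworks. The group convolution layer below follows exactly the idea described above.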

from keras import backend as K
from keras.layers import Conv2D, Concatenate, Lambda

def _group_conv(x, filters, kernel, stride, groups):
    """
    Group convolution

    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        stride: An integer or tuple/list of 2 integers, specifying the strides
            of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.
        groups: Integer, number of groups the channels are split into

    # Returns
        Output tensor
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(x)[channel_axis]

    # Both channel counts must be divisible by the number of groups
    assert in_channels % groups == 0
    assert filters % groups == 0

    # number of input channels per group
    nb_ig = in_channels // groups
    # number of output channels per group
    nb_og = filters // groups

    gc_list = []
    for i in range(groups):
        # Slice out the i-th group of channels; i=i binds the loop variable
        # when the lambda is defined rather than when it is called
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
        # Ordinary convolution on this group only
        gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
                              padding='same', use_bias=False)(x_group))

    # Concatenate the per-group outputs along the channel axis
    return Concatenate(axis=channel_axis)(gc_list)
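To make the saving concrete, the following minimal sketch counts the weights of a standard convolution and of a group convolution with the same input and output channels. The helper name `conv_params` and the example sizes are illustrative assumptions, not part of the original code; biases are ignored, matching `use_bias=False` above.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Weight count of a 2D convolution without bias (illustrative helper)."""
    assert in_channels % groups == 0 and out_channels % groups == 0
    kh, kw = kernel_size
    # Each group maps in_channels/groups inputs to out_channels/groups outputs.
    per_group = kh * kw * (in_channels // groups) * (out_channels // groups)
    return groups * per_group


standard = conv_params(384, 96, (1, 1))            # ordinary 1x1 convolution
grouped = conv_params(384, 96, (1, 1), groups=8)   # same layer split into 8 groups
print(standard, grouped, standard // grouped)      # 36864 4608 8
```

With the same channel counts, the grouped layer needs 1/groups of the weights, and the number of multiply-accumulate operations shrinks by the same factor.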

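Channel shuffle

Channel shuffle is the key idea of this paper. Group convolution greatly reduces computation and parameter count, but it also restricts the exchange of information between channels, which is bound to hurt model accuracy. The authors therefore propose channel shuffle, which strengthens the information flow across channel groups without adding parameters or computation, as shown in the figure below.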

[Figure: channel shuffle]

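The channel shuffle layer is implemented with a neat trick borrowed from other people's implementations. The snippet below illustrates it: d holds the channel indices of the feature map, and x is the channel order after the shuffle.

>>> import numpy as np
>>> d = np.array([0,1,2,3,4,5,6,7,8])
>>> x = np.reshape(d,(3,3))
>>> x = np.transpose(x,[1,0]) # transpose
>>> x = np.reshape(x,(9,)) # flatten
'[0 1 2 3 4 5 6 7 8] --> [0 3 6 1 4 7 2 5 8]'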
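The reshape, transpose, flatten sequence is equivalent to a closed-form index mapping, which can help when checking a result by hand. The sketch below is pure Python; `shuffled_order` is a hypothetical helper name, and it assumes the channel count is divisible by the number of groups.

```python
def shuffled_order(num_channels, groups):
    """Return the channel order after a channel shuffle (illustrative helper)."""
    assert num_channels % groups == 0
    channels_per_group = num_channels // groups
    # Output position k takes channel (k % groups) * channels_per_group + k // groups,
    # i.e. a (groups, channels_per_group) index grid transposed and read row by row.
    return [(k % groups) * channels_per_group + k // groups
            for k in range(num_channels)]


print(shuffled_order(9, 3))  # [0, 3, 6, 1, 4, 7, 2, 5, 8], same as the NumPy demo
print(shuffled_order(6, 2))  # [0, 3, 1, 4, 2, 5]
```

The second call shows the case where the number of groups and the channels per group differ, which the 3x3 demo does not cover.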

Implementation using the Keras backend:

from keras import backend as K
from keras.layers import Lambda

def _channel_shuffle(x, groups):
    """
    Channel shuffle layer

    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups the channels are split into

    # Returns
        Shuffled tensor
    """
    if K.image_data_format() == 'channels_last':
        height, width, in_channels = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        # (N, H, W, C) -> (N, H, W, groups, C // groups)
        pre_shape = [-1, height, width, groups, channels_per_group]
        # swap the two channel axes
        dim = (0, 1, 2, 4, 3)
        later_shape = [-1, height, width, in_channels]
    else:
        in_channels, height, width = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        # (N, C, H, W) -> (N, groups, C // groups, H, W)
        pre_shape = [-1, groups, channels_per_group, height, width]
        # swap the two channel axes
        dim = (0, 2, 1, 3, 4)
        later_shape = [-1, in_channels, height, width]

    x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
    x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
    x = Lambda(lambda z: K.reshape(z, later_shape))(x)

    return x
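As a sanity check on the shapes and axis order, here is a NumPy reference for the `channels_last` case. It is only a sketch for checking the idea outside of Keras; `channel_shuffle_np` is a hypothetical name and the tiny 1x2x2x9 input is an assumption chosen so the result can be compared with the demo above.

```python
import numpy as np


def channel_shuffle_np(x, groups):
    """NumPy reference for channel shuffle on a channels_last batch (N, H, W, C)."""
    n, h, w, c = x.shape
    assert c % groups == 0
    x = x.reshape(n, h, w, groups, c // groups)   # split channels into groups
    x = x.transpose(0, 1, 2, 4, 3)                # swap the two channel axes
    return x.reshape(n, h, w, c)                  # flatten back to C channels


# Tag every channel with its own index so the new order is easy to read off.
x = np.broadcast_to(np.arange(9), (1, 2, 2, 9)).copy()
y = channel_shuffle_np(x, groups=3)
print(y[0, 0, 0])  # [0 3 6 1 4 7 2 5 8]
```

Every spatial position ends up with the same channel order, and the batch, height, and width axes are left untouched.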

ShuffleNet Unit

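This is the main building block of ShuffleNet. In the figure below, (a) shows the basic unit built on depthwise separable convolution, (b) is the unit used when the stride is 1, and (c) is the unit used when the stride is 2.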

[Figure: (a) depthwise separable convolution unit; (b) ShuffleNet unit with stride 1; (c) ShuffleNet unit with stride 2]

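ShuffleNet architecture

Note that for Stage 2 the authors do not apply group convolution on the first 1×1 (pointwise) convolution, because the number of input channels is relatively small.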

[Figure: ShuffleNet architecture]
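The channel bookkeeping of the stride-2 unit is easy to get wrong, so here is a small sketch of it. The stem width (24), the stage widths (384, 768, 1536) and the bottleneck ratio (0.25) are taken from the implementation in this article; `stage_channels` itself is a hypothetical helper used only for illustration.

```python
def stage_channels(in_channels, filters, bottleneck_ratio=0.25):
    """Channel counts inside the first (stride-2) unit of a stage (illustrative helper)."""
    bottleneck = int(filters * bottleneck_ratio)
    # The stride-2 unit concatenates its branch with the average-pooled input,
    # so the branch only has to produce the channels the input does not supply.
    branch_out = filters - in_channels
    return bottleneck, branch_out


in_channels = 24  # output channels of the stem convolution in this implementation
for stage, filters in [(2, 384), (3, 768), (4, 1536)]:
    bottleneck, branch_out = stage_channels(in_channels, filters)
    print(stage, in_channels, bottleneck, branch_out, filters)
    in_channels = filters  # the next stage consumes this stage's output
```

For Stage 2 this gives a 96-channel bottleneck and a 360-channel branch that is concatenated with the 24 pooled input channels to reach 384. All of these counts are divisible by 8, which is what the group convolutions with `groups=8` require.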

Environment

Python 3.6

TensorFlow 1.13.1

Keras 2.2.4

Implementation

Supports both channels_first and channels_last data formats.

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 25 18:26:41 2019
@author: zjn
"""
import numpy as np
from keras.callbacks import LearningRateScheduler
from keras.models import Model
from keras.layers import Input,Conv2D,Dropout,Dense,GlobalAveragePooling2D,Concatenate,AveragePooling2D
from keras.layers import Activation,BatchNormalization,add,Reshape,ReLU,DepthwiseConv2D,MaxPooling2D,Lambda
from keras.utils.vis_utils import plot_model
from keras import backend as K
from keras.optimizers import SGD
 
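def _group_conv(x, filters, kernel, stride, groups):
    """
    Group convolution

    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        stride: An integer or tuple/list of 2 integers, specifying the strides
            of the convolution along the width and height.
        groups: Integer, number of groups the channels are split into

    # Returns
        Output tensor
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(x)[channel_axis]

    # Both channel counts must be divisible by the number of groups
    assert in_channels % groups == 0
    assert filters % groups == 0

    # number of input channels per group
    nb_ig = in_channels // groups
    # number of output channels per group
    nb_og = filters // groups

    gc_list = []
    for i in range(groups):
        # Slice out the i-th group of channels, then convolve it on its own
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
        gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
                              padding='same', use_bias=False)(x_group))

    return Concatenate(axis=channel_axis)(gc_list)

def _channel_shuffle(x, groups):
    """
    Channel shuffle layer

    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups the channels are split into

    # Returns
        Shuffled tensor
    """
    if K.image_data_format() == 'channels_last':
        height, width, in_channels = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, height, width, groups, channels_per_group]
        dim = (0, 1, 2, 4, 3)
        later_shape = [-1, height, width, in_channels]
    else:
        in_channels, height, width = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, groups, channels_per_group, height, width]
        dim = (0, 2, 1, 3, 4)
        later_shape = [-1, in_channels, height, width]

    # reshape -> swap the two channel axes -> reshape back
    x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
    x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
    x = Lambda(lambda z: K.reshape(z, later_shape))(x)

    return x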
 
def _shufflenet_unit(inputs, filters, kernel, stride, groups, stage, bottleneck_ratio=0.25):
    """
    ShuffleNet unit

    # Arguments
        inputs: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the depthwise convolution window.
        stride: Integer, stride of the depthwise convolution (1 or 2)
        groups: Integer, number of groups the channels are split into
        stage: Integer, stage number of ShuffleNet
        bottleneck_ratio: Float, ratio of bottleneck channels to output channels

    # Returns
        Output tensor

    # Note
        For Stage 2, we (authors of shufflenet) do not apply group convolution on the first pointwise layer
        because the number of input channels is relatively small.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(inputs)[channel_axis]
    bottleneck_channels = int(filters * bottleneck_ratio)

    # First pointwise layer: plain 1x1 convolution in Stage 2, group convolution otherwise
    if stage == 2:
        x = Conv2D(filters=bottleneck_channels, kernel_size=(1, 1), strides=1,
                   padding='same', use_bias=False)(inputs)
    else:
        x = _group_conv(inputs, bottleneck_channels, (1, 1), 1, groups)
    x = BatchNormalization(axis=channel_axis)(x)
    x = ReLU()(x)

    x = _channel_shuffle(x, groups)

    x = DepthwiseConv2D(kernel_size=kernel, strides=stride, depth_multiplier=1,
                        padding='same', use_bias=False)(x)
    x = BatchNormalization(axis=channel_axis)(x)

    if stride == 2:
        # Branch outputs filters - in_channels so that the concatenation has `filters` channels
        x = _group_conv(x, filters - in_channels, (1, 1), 1, groups)
        x = BatchNormalization(axis=channel_axis)(x)
        avg = AveragePooling2D(pool_size=(3, 3), strides=2, padding='same')(inputs)
        x = Concatenate(axis=channel_axis)([x, avg])
    else:
        x = _group_conv(x, filters, (1, 1), 1, groups)
        x = BatchNormalization(axis=channel_axis)(x)
        x = add([x, inputs])
    return x
 
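def _stage(x, filters, kernel, groups, repeat, stage):
    """
    Stage of ShuffleNet

    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the depthwise convolution window.
        groups: Integer, number of groups the channels are split into
        repeat: Integer, total number of repetitions for a shuffle unit in every stage
        stage: Integer, stage number of ShuffleNet

    # Returns
        Output tensor
    """
    # The first unit of a stage downsamples with stride 2
    x = _shufflenet_unit(x, filters, kernel, 2, groups, stage)

    # The remaining units keep the resolution (stride 1)
    for i in range(1, repeat):
        x = _shufflenet_unit(x, filters, kernel, 1, groups, stage)
    return x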
 
def ShuffleNet(input_shape, classes):
    """
    ShuffleNet architectures

    # Arguments
        input_shape: A tuple/list of 3 integers, shape
            of input tensor
        classes: Integer, number of classes to predict

    # Returns
        A keras model
    """
    inputs = Input(shape=input_shape)

    # Stem: 3x3 convolution with stride 2, then 3x3 max pooling with stride 2
    x = Conv2D(24, (3, 3), strides=2, padding='same', use_bias=True, activation='relu')(inputs)
    x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)

    x = _stage(x, filters=384, kernel=(3, 3), groups=8, repeat=4, stage=2)
    x = _stage(x, filters=768, kernel=(3, 3), groups=8, repeat=8, stage=3)
    x = _stage(x, filters=1536, kernel=(3, 3), groups=8, repeat=4, stage=4)

    x = GlobalAveragePooling2D()(x)

    x = Dense(classes)(x)
    predicts = Activation('softmax')(x)
    model = Model(inputs, predicts)
    return model
 
if __name__ == '__main__':
    model = ShuffleNet((224, 224, 3), 1000)
    #plot_model(model, to_file='ShuffleNet.png', show_shapes=True)
