Implementing a CNN with high-level APIs such as tf.estimator.Estimator and tf.layers
tf.contrib.layers.flatten
Assumes the first dimension of the input (inputs) is batch_size. Keeps batch_size while flattening everything else, so the output shape becomes [batch_size, k]. A minimal sketch follows the docstring below.
tf.contrib.layers.flatten(
    inputs,
    outputs_collections=None,
    scope=None
)
'''
Args:
inputs: A tensor of size [batch_size, ...].
outputs_collections: Collection to add the outputs.
scope: Optional scope for name_scope.
Returns:
A flattened tensor with shape [batch_size, k].
'''
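A minimal sketch of the behavior, assuming TF 1.x (the shapes are illustrative):
import tensorflow as tf

# A batch of 4 feature maps: 7x7 spatial size, 64 channels
x = tf.placeholder(tf.float32, [4, 7, 7, 64])
# batch_size is preserved; all remaining dimensions collapse into one
flat = tf.contrib.layers.flatten(x)
print(flat.shape)  # (4, 3136), since 7 * 7 * 64 = 3136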
tf.layers.conv2d
Most of this section is adapted from another blog post.
The function interface of the 2D convolution layer
This layer creates a convolution kernel that is convolved with the input to produce an output tensor. If use_bias is True (and a bias_initializer is provided), a bias vector is created and added to the output. Finally, if activation is not None, the activation function is applied to the output as well.
tf.layers.conv2d(
    inputs,
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    data_format='channels_last',
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
'''
Arguments:
inputs: Tensor input.
filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
padding: One of "valid" or "same" (case-insensitive).
data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).
dilation_rate: An integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
activation: Activation function. Set it to None to maintain a linear activation.
use_bias: Boolean, whether the layer uses a bias.
kernel_initializer: An initializer for the convolution kernel.
bias_initializer: An initializer for the bias vector. If None, the default initializer will be used.
kernel_regularizer: Optional regularizer for the convolution kernel.
bias_regularizer: Optional regularizer for the bias vector.
activity_regularizer: Optional regularizer function for the output.
kernel_constraint: Optional projection function to be applied to the kernel after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected variable and must return the projected variable (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
bias_constraint: Optional projection function to be applied to the bias after being updated by an Optimizer.
trainable: Boolean, if True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
name: A string, the name of the layer.
reuse: Boolean, whether to reuse the weights of a previous layer by the same name.
Returns:
Output tensor.
'''
Notes on the main arguments
inputs: Tensor input.
filters: an integer, the number of convolution kernels (i.e. the number of output channels).
kernel_size: the size of the convolution window. An integer or a tuple/list of 2 integers specifying the height and width of the 2D convolution window; a single integer means the same value in both directions. (Because this is a 2D convolution, the window only slides along the height and width.)
strides: same format as kernel_size, but gives the number of steps the convolution window moves along the height and width. Note that strides != 1 and dilation_rate != 1 cannot both be specified at the same time.
For the remaining arguments, see the English docstring above. A minimal usage sketch follows.
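A minimal usage sketch, assuming TF 1.x (the placeholder shape is illustrative):
import tensorflow as tf

# Assumed input: a batch of 28x28 grayscale images
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
# 32 filters, a 5x5 window, default strides=(1, 1) and padding='valid'
conv = tf.layers.conv2d(x, filters=32, kernel_size=5, activation=tf.nn.relu)
print(conv.shape)  # (?, 24, 24, 32), since (28 - 5) / 1 + 1 = 24
With padding='same' the spatial size would stay 28x28 instead.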
tf.layers.max_pooling2d
tf.layers.max_pooling2d(
    inputs,
    pool_size,
    strides,
    padding='valid',
    data_format='channels_last',
    name=None
)
'''
Arguments:
inputs: The tensor over which to pool. Must have rank 4.
pool_size: An integer or tuple/list of 2 integers: (pool_height, pool_width) specifying the size of the pooling window. Can be a single integer to specify the same value for all spatial dimensions.
strides: An integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to specify the same value for all spatial dimensions.
padding: A string. The padding method, either 'valid' or 'same'. Case-insensitive.
data_format: A string. The ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).
name: A string, the name of the layer.
Returns:
Output tensor.
'''
Notes
The arguments mean almost the same as for tf.layers.conv2d above (pool_size is the size of the pooling window, and again the window only moves along the height and width). A minimal sketch follows.
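A minimal sketch, assuming TF 1.x and reusing the conv2d output shape from the sketch above:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 24, 24, 32])
# A 2x2 window moving 2 steps at a time halves the height and width
pool = tf.layers.max_pooling2d(x, pool_size=2, strides=2)
print(pool.shape)  # (?, 12, 12, 32); the channel count is unchanged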
tf.layers.dropout (pay close attention to the meaning of its rate argument)
tf.layers.dropout(
    inputs,
    rate=0.5,
    noise_shape=None,
    seed=None,
    training=False,
    name=None
)
It implements exactly the same functionality as tf.nn.dropout; for details, see my other blog post.
There are only two things to note (both illustrated in the sketch below):
- The rate argument is drop_prob, not keep_prob.
- The training argument: when it is True, dropout is applied to the input; when it is False, dropout has no effect at all.
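A minimal sketch of both points, assuming TF 1.x:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1024])
is_training = tf.placeholder(tf.bool)  # True during training, False at inference

# rate=0.25 drops 25% of the units, i.e. it corresponds to keep_prob=0.75 in tf.nn.dropout
y = tf.layers.dropout(x, rate=0.25, training=is_training)
# When training evaluates to False, the layer is an identity and returns the input unchanged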
Using the pieces above to implement a CNN
#!/usr/bin/env python
# coding: utf-8
from __future__ import division, print_function, absolute_import
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
# Training parameters
learning_rate = 0.001
num_steps = 2000
batch_size = 128
# Network parameters
num_input = 784     # MNIST images are 28*28 pixels, flattened to 784
num_classes = 10    # digits 0-9
dropout = 0.25      # drop probability (rate), not keep probability
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    with tf.variable_scope('ConvNet', reuse=reuse):
        # The Estimator passes features as a dict; the key must match the input_fn below
        x = x_dict['images']
        x = tf.reshape(x, [-1, 28, 28, 1])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
        # Flatten the channels produced by the last convolution
        fc1 = tf.contrib.layers.flatten(conv2)
        # A fully connected layer; arguments: the input and the layer's output size
        fc1 = tf.layers.dense(fc1, 1024)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
        out = tf.layers.dense(fc1, n_classes)
    return out
def model_fn(features, labels, mode):
    # Two graphs sharing the same weights: dropout enabled for training, disabled for inference
    logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True)
    logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False)
    pred_classes = tf.argmax(logits_test, -1)
    pred_probas = tf.nn.softmax(logits_test)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    # Pass global_step so the Estimator can count training steps
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
    accuracy_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
    return tf.estimator.EstimatorSpec(
        mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={"accuracy": accuracy_op}
    )
model = tf.estimator.Estimator(model_fn)
# Define the input function for training; the 'images' key matches conv_net above
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images},
    y=mnist.train.labels,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)
model.train(input_fn, steps=num_steps)
# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
x={'images': mnist.test.images}, y=mnist.test.labels,
batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
model.evaluate(input_fn)
# Predict single images
n_images = 4
# Get images from test set
test_images = mnist.test.images[:n_images]
# Prepare the input data
input_fn = tf.estimator.inputs.numpy_input_fn(
x={'images': test_images}, shuffle=False)
# Use the model to predict the images class
preds = list(model.predict(input_fn))
# Display
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("Model prediction:", preds[i])