SELU | Using the SELU activation function in Keras and TensorFlow

阿新 • Published: 2019-01-01

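A NIPS submission published on arXiv, "Self-Normalizing Neural Networks", has drawn a great deal of attention in the community. It proposes the scaled exponential linear unit (SELU), which introduces a self-normalizing property: the unit relies on a function g that maps the mean and variance of the activations from one layer to the next, which produces a normalizing effect. Shao-Hua Sun published a comparison of SELU with ReLU and Leaky ReLU on GitHub, and 機器之心 (Synced) translated and introduced the results; the implementation details can be found in that GitHub project.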
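To make the self-normalizing idea concrete, here is a minimal pure-Python sketch (not taken from the paper's code). It assumes the pre-activations of a layer are standard normal, which is the situation the fixed point (mean 0, variance 1) describes, and checks that SELU's output keeps roughly that mean and variance. The names `selu`, `moments`, `ALPHA` and `SCALE` are illustrative; the two constants are the ones used in the TensorFlow snippet later in this post.

```python
import math
import random

# Constants from the SELU definition (same values as in the TensorFlow code in this post)
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """scale * x for x >= 0, scale * alpha * (exp(x) - 1) otherwise."""
    return SCALE * x if x >= 0.0 else SCALE * ALPHA * (math.exp(x) - 1.0)


def moments(values):
    """Return the (mean, variance) of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


# Assumption: pre-activations are drawn from N(0, 1)
rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(100000)]

mean, var = moments([selu(z) for z in pre_activations])
print("mean = %.3f, variance = %.3f" % (mean, var))
```

Both printed values should land close to 0 and 1, up to sampling noise. With ReLU in place of `selu`, the mean would drift away from 0, which is the behaviour SELU is designed to avoid.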

Using the SELU activation function in Keras

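The selu activation is only available from Keras 2.0.6 onward. It is not available in 2.0.5, so you need to upgrade to at least 2.0.6.
Adding selu after the fully connected layers makes the network converge somewhat faster in the end.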
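A quick way to guard against an older install is to compare version numbers before building the model. The sketch below is only an illustration: `version_tuple` and `supports_selu` are hypothetical helper names, and in real code you would pass in `keras.__version__`.

```python
import re


def version_tuple(version):
    """Turn a version string such as '2.0.6' into a comparable tuple (2, 0, 6)."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def supports_selu(keras_version):
    """True if this Keras version is 2.0.6 or newer (hypothetical helper)."""
    return version_tuple(keras_version) >= (2, 0, 6)


# In practice: supports_selu(keras.__version__)
for candidate in ("2.0.5", "2.0.6", "2.1.0"):
    print(candidate, supports_selu(candidate))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "2.0.10" would sort before "2.0.6".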

Here is the concrete comparison:

from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.noise import AlphaDropout
from keras.utils import np_utils
from keras.optimizers import RMSprop, Adam

batch_size = 128
num_classes = 10
epochs = 20
learning_rate = 0.001

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

modelSELU = Sequential()
modelSELU.add(Dense(512, activation='selu', input_shape=(784,)))
modelSELU.add(AlphaDropout(0.1))
modelSELU.add(Dense(512, activation='selu'))
modelSELU.add(AlphaDropout(0.1))
modelSELU.add(Dense(10, activation='softmax'))

modelSELU.summary()

modelSELU.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

historySELU = modelSELU.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
scoreSELU = modelSELU.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scoreSELU[0])
print('Test accuracy:', scoreSELU[1])

Using dropout_selu + SELU in TensorFlow

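# Imports needed by selu() and dropout_selu() below (TensorFlow 1.x internal modules)
import numbers

import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import tensor_util
from tensorflow.python.layers import utils
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import random_ops


def selu(x):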
    with ops.name_scope('elu') as scope:
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        return scale*tf.where(x>=0.0, x, alpha*tf.nn.elu(x))

def dropout_selu(x, keep_prob, alpha= -1.7580993408473766, fixedPointMean=0.0, fixedPointVar=1.0,
                 noise_shape=None, seed=None, name=None, training=False):
    """Dropout to a value with rescaling."""

    def dropout_selu_impl(x, keep_prob, alpha, noise_shape, seed, name):
        # keep_prob is the probability of keeping a unit, exactly as passed to dropout_selu
        x = ops.convert_to_tensor(x, name="x")
        if isinstance(keep_prob, numbers.Real) and not 0 < keep_prob <= 1:
            raise ValueError("keep_prob must be a scalar tensor or a float in the "
                                             "range (0, 1], got %g" % keep_prob)
        keep_prob = ops.convert_to_tensor(keep_prob, dtype=x.dtype, name="keep_prob")
        keep_prob.get_shape().assert_is_compatible_with(tensor_shape.scalar())

        alpha = ops.convert_to_tensor(alpha, dtype=x.dtype, name="alpha")
        alpha.get_shape().assert_is_compatible_with(tensor_shape.scalar())

        if tensor_util.constant_value(keep_prob) == 1:
            return x

        noise_shape = noise_shape if noise_shape is not None else array_ops.shape(x)
        random_tensor = keep_prob
        random_tensor += random_ops.random_uniform(noise_shape, seed=seed, dtype=x.dtype)
        binary_tensor = math_ops.floor(random_tensor)
        ret = x * binary_tensor + alpha * (1-binary_tensor)

        a = tf.sqrt(fixedPointVar / (keep_prob *((1-keep_prob) * tf.pow(alpha-fixedPointMean,2) + fixedPointVar)))

        b = fixedPointMean - a * (keep_prob * fixedPointMean + (1 - keep_prob) * alpha)
        ret = a * ret + b
        ret.set_shape(x.get_shape())
        return ret

    with ops.name_scope(name, "dropout", [x]) as name:
        return utils.smart_cond(training,
                                lambda: dropout_selu_impl(x, keep_prob, alpha, noise_shape, seed, name),
                                lambda: array_ops.identity(x))
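The affine correction `a * ret + b` is the part of `dropout_selu` that keeps the mean and variance fixed after units are dropped. The following pure-Python sketch mirrors the same formulas for `a` and `b` on plain lists, assuming standard normal inputs and a target mean and variance of 0 and 1. The names `alpha_dropout`, `moments` and `ALPHA_PRIME` are illustrative, not part of TensorFlow.

```python
import math
import random

# Default `alpha` of dropout_selu above: the negative saturation value of SELU (-scale * alpha)
ALPHA_PRIME = -1.7580993408473766


def moments(values):
    """Return the (mean, variance) of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def alpha_dropout(values, keep_prob, rng, mean=0.0, var=1.0):
    """Set dropped units to ALPHA_PRIME, then rescale so mean and variance are preserved."""
    a = math.sqrt(var / (keep_prob * ((1.0 - keep_prob) * (ALPHA_PRIME - mean) ** 2 + var)))
    b = mean - a * (keep_prob * mean + (1.0 - keep_prob) * ALPHA_PRIME)
    out = []
    for v in values:
        kept = v if rng.random() < keep_prob else ALPHA_PRIME
        out.append(a * kept + b)
    return out


# Assumption: activations entering the dropout step are N(0, 1)
rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(100000)]

dropped_mean, dropped_var = moments(alpha_dropout(activations, keep_prob=0.9, rng=rng))
print("mean = %.3f, variance = %.3f" % (dropped_mean, dropped_var))
```

Ordinary dropout sets dropped units to 0 and only rescales by `1 / keep_prob`, which preserves the mean but not the variance. Dropping to the saturation value and applying both `a` and `b` is what lets this variant keep both moments, so it does not disturb the self-normalizing property.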

The author uses it in the following example:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset

# parameters
learning_rate = 0.001
training_epochs = 50
batch_size = 100

# input place holders
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# dropout keep probability: 0.7 during training, 1.0 for testing
# (note: dropout_selu only applies dropout when it is called with training=True)
keep_prob = tf.placeholder(tf.float32)

# weights & bias for nn layers
# http://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow
W1 = tf.get_variable("W1", shape=[784, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = selu(tf.matmul(X, W1) + b1)
L1 = dropout_selu(L1, keep_prob=keep_prob)

W2 = tf.get_variable("W2", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = selu(tf.matmul(L1, W2) + b2)
L2 = dropout_selu(L2, keep_prob=keep_prob)

W3 = tf.get_variable("W3", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = selu(tf.matmul(L2, W3) + b3)
L3 = dropout_selu(L3, keep_prob=keep_prob)

W4 = tf.get_variable("W4", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = selu(tf.matmul(L3, W4) + b4)
L4 = dropout_selu(L4, keep_prob=keep_prob)

W5 = tf.get_variable("W5", shape=[512, 10],
                     initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L4, W5) + b5

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
