TensorFlow2.0 (7): Activation Functions

阿新 • Published: 2019-10-21

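Note: All posts in this series will be continuously updated and published on GitHub, where you can download the note files for every article in the series.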

1 What Is an Activation Function

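Activation functions are a very important component of deep learning, or more broadly of artificial neural networks. An activation function applies a nonlinear transformation to the information a neuron receives and passes the transformed result on to the neurons in the next layer. It is applied as shown in the following formula: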

$$y = \mathrm{Activation}\left(\sum_{i=1}^{N} w_i x_i + b\right)$$

Here, $Activation(\cdot)$ is the activation function.
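To make the formula concrete, the following is a minimal plain-Python sketch of a single neuron. It does not use TensorFlow; the helper `neuron` and all the numbers are hypothetical, chosen only for illustration, and sigmoid stands in for the activation function.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(w, x, b, activation):
    """Hypothetical helper: y = activation(sum_i w_i * x_i + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return activation(z)

w = [0.5, -1.0, 2.0]   # made-up weights
x = [1.0, 2.0, 0.5]    # made-up inputs
b = 0.5                # made-up bias

# z = 0.5 - 2.0 + 1.0 + 0.5 = 0.0, so y = sigmoid(0) = 0.5
y = neuron(w, x, b, sigmoid)
print(y)  # → 0.5

# With the identity "activation" the neuron is purely linear and returns z itself
print(neuron(w, x, b, lambda z: z))  # → 0.0
```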

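Why use an activation function at all? Without one, each layer only applies a linear transformation determined by its weights $w$ and bias $b$. Even with many layers, the network is just a composition of linear functions, which is itself linear, so the whole network is equivalent to a single linear regression model and its ability to solve complex problems is limited. We want neural networks to handle complex tasks such as language translation and image classification, which linear transformations alone can never do. Adding an activation function applies a nonlinear transformation to the inputs, which lets the network learn and perform more complex tasks.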
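The claim that stacked linear layers are still linear can be checked directly. The sketch below, in plain Python with made-up weights (the helper names `matvec`, `matmul`, `vadd` and `linear` are hypothetical), composes two layers that have no activation and shows that a single layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$ produces the same output.

```python
import math

# Pure-Python sketch (no TensorFlow): matrices are lists of rows, vectors are lists.

def matvec(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    """Element-wise vector sum u + v."""
    return [a + b for a, b in zip(u, v)]

def linear(W, b, x):
    """A layer with no activation: W @ x + b."""
    return vadd(matvec(W, x), b)

# Hypothetical weights for two stacked layers
W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
W2, b2 = [[3.0, 0.0], [-1.0, 1.0]], [1.0, 0.5]

# The two layers collapse into a single linear layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

x = [0.7, -1.3]
two_layers = linear(W2, b2, linear(W1, b1, x))
one_layer = linear(W, b, x)
same = all(math.isclose(p, q) for p, q in zip(two_layers, one_layer))
print(two_layers, one_layer, same)
```

Inserting any nonlinear function between the two `linear` calls breaks this collapse, which is exactly the role of the activation function.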

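In addition, activation functions are chosen to be differentiable so that backpropagation works: the error gradient flows back through the derivative of each activation function and is used to adjust the weights and biases. Without a differentiable nonlinear function, a nonlinear network could not be trained this way.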
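As a small illustration of a gradient flowing through an activation, here is a sketch of one backpropagation step for a single sigmoid neuron with a squared-error loss. The loss, the data and the learning rate are assumptions chosen for the example; the chain-rule gradient is compared against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, t):
    """Squared error of a single sigmoid neuron: 0.5 * (sigmoid(w*x + b) - t)^2."""
    y = sigmoid(w * x + b)
    return 0.5 * (y - t) ** 2

def grad_w(w, b, x, t):
    """Chain rule: dL/dw = (y - t) * sigmoid'(z) * x, where sigmoid'(z) = y * (1 - y)."""
    y = sigmoid(w * x + b)
    return (y - t) * y * (1.0 - y) * x

w, b, x, t = 0.8, -0.3, 1.5, 1.0   # made-up weight, bias, input and target
analytic = grad_w(w, b, x, t)

# Central finite difference as an independent check of the chain-rule gradient
eps = 1e-6
numeric = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)

# One gradient-descent step on w with a hypothetical learning rate
lr = 0.5
w_new = w - lr * analytic
print(analytic, numeric)
print(loss(w, b, x, t), loss(w_new, b, x, t))  # the loss should decrease
```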

In short, the role of an activation function is to introduce nonlinearity into a neural network so that it can better solve complex problems.

2 Common Activation Functions

2.1 The sigmoid Function

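The sigmoid function maps any value on the whole real line into the range (0, 1). When the input is large, sigmoid returns a value close to 1; when the input is small (large and negative), it returns a value close to 0. Its formula is: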

$$f(x) = \frac{1}{{1 + {e^{ - x}}}}$$

Let's get a feel for the sigmoid function in TensorFlow:

In [1]:

import tensorflow as tf
x = tf.linspace(-5., 5.,6)
x

Out[1]:

<tf.Tensor: id=3, shape=(6,), dtype=float32, numpy=array([-5., -3., -1.,  1.,  3.,  5.], dtype=float32)>

There are two ways to call the sigmoid function:

In [2]:

tf.keras.activations.sigmoid(x)

Out[2]:

<tf.Tensor: id=4, shape=(6,), dtype=float32, numpy=
array([0.00669285, 0.04742587, 0.26894143, 0.7310586 , 0.95257413,
       0.9933072 ], dtype=float32)>

In [3]:

tf.sigmoid(x)

Out[3]:

<tf.Tensor: id=5, shape=(6,), dtype=float32, numpy=
array([0.00669285, 0.04742587, 0.26894143, 0.7310586 , 0.95257413,
       0.9933072 ], dtype=float32)>

As you can see, every value in $x$ has been mapped into the range (0, 1).
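The same numbers can be reproduced without TensorFlow directly from the formula. This is a plain-Python reference sketch, not TensorFlow's implementation; the two-branch form is just one common way to avoid an `OverflowError` from `math.exp` for very negative inputs.

```python
import math

def sigmoid(x):
    """Reference sigmoid from f(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x); avoids computing exp(-x) for very negative x
    e = math.exp(x)
    return e / (1.0 + e)

xs = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]   # same values as tf.linspace(-5., 5., 6)
ys = [sigmoid(v) for v in xs]
print([round(y, 6) for y in ys])          # compare with Out[2] and Out[3]

# Symmetry: sigmoid(-x) = 1 - sigmoid(x)
print(sigmoid(-2.0), 1.0 - sigmoid(2.0))
```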

Pros and cons of sigmoid:

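Pros: its output lies in (0, 1) and it is monotonic and continuous, which makes it well suited to the output layer (for example, to produce a probability in binary classification); its derivative is also easy to compute: $f'(x) = f(x)(1 - f(x))$.
Cons: it saturates softly: as x tends to positive or negative infinity, its derivative tends to 0, which easily leads to vanishing gradients.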
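The vanishing-gradient problem follows from the derivative $f'(x) = f(x)(1 - f(x))$. The sketch below (plain Python, illustrative values) shows that the derivative peaks at 0.25 at $x = 0$ and shrinks quickly as $|x|$ grows, so a product of many such factors, as in backpropagation through many sigmoid layers, becomes tiny.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

for v in [0.0, 2.0, 5.0, 10.0]:
    print(v, sigmoid_grad(v))
# The maximum is 0.25 at x = 0; the derivative shrinks quickly as |x| grows.

# Ignoring the weights, the product of ten sigmoid derivatives is at most 0.25 ** 10
print(0.25 ** 10)  # → 9.5367431640625e-07
```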

2.2 The relu Function

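ReLU (Rectified Linear Unit) is currently the most frequently used activation function. For x < 0, relu always outputs 0. For x > 0, its derivative is 1 and the output is simply x, so the gradient does not decay as it passes through relu in that region. This alleviates the vanishing gradient problem and speeds up convergence. relu also gives the network sparse representations, since many neurons output exactly 0; together these properties are why relu can be used in deep neural networks. However, because the derivative of relu is 0 for x < 0, the corresponding weights cannot be updated, and a neuron stuck in this state is called a "dead neuron".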

The formula of relu is:

$$f(x) = \max (0,x)$$
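A minimal plain-Python sketch of relu and its derivative illustrates both points made above: the gradient is exactly 1 for positive inputs and 0 for negative inputs. The "dead neuron" part uses a hypothetical weight and bias chosen so that the pre-activation is negative for every input, so the gradient reaching the weight is always 0.

```python
def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of relu: 1 for x > 0, 0 for x < 0.
    It is undefined at exactly x = 0; this sketch uses 0 there (a common convention)."""
    return 1.0 if x > 0 else 0.0

xs = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
print([relu(v) for v in xs])       # [0.0, 0.0, 0.0, 1.0, 3.0, 5.0]
print([relu_grad(v) for v in xs])  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# "Dead neuron" illustration: w*x + b < 0 for every input, so
# d relu(w*x + b) / dw = relu'(w*x + b) * x is always 0 and w never changes.
w, b = 0.5, -10.0                  # hypothetical weight and bias
inputs = [1.0, 2.0, 3.0]
grads = [relu_grad(w * x + b) * x for x in inputs]
print(grads)  # [0.0, 0.0, 0.0]
```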

In TensorFlow, relu takes more parameters than sigmoid. Let's look at them first:

tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0)

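x: the input tensor
alpha: the slope of the left part of the function, i.e. where x is negative (more precisely, at or below threshold); defaults to 0
max_value: the maximum output; when x is greater than max_value, the output is max_value
threshold: the x-coordinate of the kink, where the function switches from the alpha-sloped part to the identity part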

In [4]:

x = tf.linspace(-5., 5.,6)
x

Out[4]:

<tf.Tensor: id=9, shape=(6,), dtype=float32, numpy=array([-5., -3., -1.,  1.,  3.,  5.], dtype=float32)>

In [5]:

tf.keras.activations.relu(x)

Out[5]:

<tf.Tensor: id=10, shape=(6,), dtype=float32, numpy=array([0., 0., 0., 1., 3., 5.], dtype=float32)>

In [6]:

tf.keras.activations.relu(x,alpha=2.)

Out[6]:

<tf.Tensor: id=11, shape=(6,), dtype=float32, numpy=array([-10.,  -6.,  -2.,   1.,   3.,   5.], dtype=float32)>

In [7]:

tf.keras.activations.relu(x,max_value=2.)  # values greater than 2 are all output as 2.

Out[7]:

<tf.Tensor: id=16, shape=(6,), dtype=float32, numpy=array([0., 0., 0., 1., 2., 2.], dtype=float32)>

In [8]:

tf.keras.activations.relu(x,alpha=2., threshold=3.5)  # values not greater than 3.5 are computed as alpha * (x - threshold)

Out[8]:

<tf.Tensor: id=27, shape=(6,), dtype=float32, numpy=array([-17., -13.,  -9.,  -5.,  -1.,   5.], dtype=float32)>
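The four outputs above are consistent with a simple element-wise rule: inputs greater than `threshold` pass through (capped at `max_value` if it is given), and inputs at or below `threshold` become `alpha * (x - threshold)`. The plain-Python function `relu_ref` below is a hypothetical sketch of that rule, inferred from Out[5] to Out[8] rather than taken from the Keras source, so edge cases (for example a negative `threshold` combined with `max_value`) may behave differently in TensorFlow.

```python
def relu_ref(xs, alpha=0.0, max_value=None, threshold=0.0):
    """Hypothetical element-wise sketch of the behavior seen in Out[5] to Out[8]."""
    out = []
    for v in xs:
        if v > threshold:
            # Identity part, optionally capped at max_value
            y = v if max_value is None else min(v, max_value)
        else:
            # Sloped part; use a plain 0.0 when alpha is 0 to avoid printing -0.0
            y = alpha * (v - threshold) if alpha != 0.0 else 0.0
        out.append(y)
    return out

xs = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
print(relu_ref(xs))                             # compare with Out[5]
print(relu_ref(xs, alpha=2.0))                  # compare with Out[6]
print(relu_ref(xs, max_value=2.0))              # compare with Out[7]
print(relu_ref(xs, alpha=2.0, threshold=3.5))   # compare with Out[8]
```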

2.3 The softmax Function

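The softmax function can be seen as an extension of sigmoid to more than two classes and is very convenient for classification problems. It turns all outputs into a probability distribution: each value lies in (0, 1) and all values sum to 1. For example, if the outputs are [1.5, 4.4, 2.0], applying softmax gives [0.04802413, 0.87279755, 0.0791784], the probabilities of belonging to classes 1, 2 and 3 respectively. The formula of softmax is: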

$$f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

In [9]:

tf.nn.softmax(tf.constant([[1.5,4.4,2.0]]))

Out[9]:

<tf.Tensor: id=29, shape=(1, 3), dtype=float32, numpy=array([[0.04802413, 0.87279755, 0.0791784 ]], dtype=float32)>

In [10]:

tf.keras.activations.softmax(tf.constant([[1.5,4.4,2.0]]))

Out[10]:

<tf.Tensor: id=31, shape=(1, 3), dtype=float32, numpy=array([[0.04802413, 0.87279755, 0.0791784 ]], dtype=float32)>

In [11]:

x = tf.random.uniform([1,5],minval=-2,maxval=2)
x

Out[11]:

<tf.Tensor: id=38, shape=(1, 5), dtype=float32, numpy=
array([[ 1.9715171 ,  0.49954653, -0.37836075,  1.6178164 ,  0.80509186]],
      dtype=float32)>

In [12]:

tf.keras.activations.softmax(x)

Out[12]:

<tf.Tensor: id=39, shape=(1, 5), dtype=float32, numpy=
array([[0.42763966, 0.09813169, 0.04078862, 0.30023944, 0.13320053]],
      dtype=float32)>
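For reference, a plain-Python sketch of the softmax formula (not TensorFlow's implementation) reproduces Out[9]. Subtracting the maximum before exponentiating is a standard trick that leaves the result unchanged but avoids overflow. The last lines check the two-class case, where softmax reduces to sigmoid, which is one precise sense in which softmax extends sigmoid.

```python
import math

def softmax(logits):
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j).
    Subtracting max(x) first does not change the result but avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

p = softmax([1.5, 4.4, 2.0])
print([round(v, 6) for v in p])   # compare with Out[9]
print(sum(p))                     # 1.0 up to rounding

# Two-class case: softmax([z, 0])[0] equals sigmoid(z)
z = 1.3
print(softmax([z, 0.0])[0], sigmoid(z))
```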

2.4 The tanh Function

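The tanh function is very similar to sigmoid both in behavior and in the shape of its curve, so the two share similar strengths and weaknesses (in particular, tanh also saturates and can cause vanishing gradients). The difference is that tanh maps values into the range (-1, 1) and is centered at 0. Its formula is: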

$$f(x) = \frac{{\sinh x}}{{\cosh x}} = \frac{{1 - {e^{ - 2x}}}}{{1 + {e^{ - 2x}}}}$$

In [13]:

x = tf.linspace(-5., 5.,6)
x

Out[13]:

<tf.Tensor: id=43, shape=(6,), dtype=float32, numpy=array([-5., -3., -1.,  1.,  3.,  5.], dtype=float32)>

In [14]:

tf.keras.activations.tanh(x)

Out[14]:

<tf.Tensor: id=44, shape=(6,), dtype=float32, numpy=
array([-0.99990916, -0.9950547 , -0.7615942 ,  0.7615942 ,  0.9950547 ,
        0.99990916], dtype=float32)>
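The tanh formula above can be checked in plain Python against the standard library's `math.tanh` and against the identity $\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$, which makes the close relationship with sigmoid explicit. This is only an illustrative sketch; the printed values should agree with Out[14] up to float32 rounding.

```python
import math

def tanh_from_formula(x):
    """tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)), the formula given above."""
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
for v in xs:
    # Three equivalent computations: the formula, the standard library,
    # and the identity tanh(x) = 2 * sigmoid(2x) - 1
    print(v, tanh_from_formula(v), math.tanh(v), 2.0 * sigmoid(2.0 * v) - 1.0)
```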

3 Summary

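In a neural network, the outputs of hidden layers usually need to be mapped through an activation function (this is not strictly required; the raw outputs of a layer without an activation, typically the last layer before softmax, are commonly called logits). When building a model, choose the activation function according to the actual data and task. TensorFlow offers far more than these four activation functions; this article only introduces the four most commonly used ones, and most of the others are variants of them.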
