Lecture 2 - Neural Network Optimization - Loss Functions

阿新 • Published: 2020-12-28

Tags: TensorFlow, Python

5. Loss functions

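A loss function measures the gap between the result y computed by forward propagation and the known ground-truth label y_. The optimization goal of a neural network is to find the parameters that minimize the loss.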

This section covers three loss functions: mean squared error (MSE), a custom loss, and cross-entropy (CE).

Mean squared error (y_ is the ground-truth label, y is the predicted value)

TensorFlow: loss_mse = tf.reduce_mean(tf.square(y_ - y))
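As a quick check of the definition, MSE is the average of the squared differences between y_ and y. The sketch below reproduces the arithmetic in NumPy rather than TensorFlow; the sample values are made up purely for illustration.

```python
import numpy as np

# Hypothetical labels and predictions, chosen only to keep the arithmetic easy to follow.
y_true = np.array([1.0, 2.0, 3.0])   # plays the role of y_
y_pred = np.array([1.5, 2.0, 2.0])   # plays the role of y

# The squared errors are 0.25, 0.0 and 1.0, so the mean is 1.25 / 3.
loss_mse = np.mean(np.square(y_true - y_pred))
print(loss_mse)
```

Because the errors are squared, an over-prediction and an under-prediction of the same size contribute the same amount to the loss.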

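Example: predict the daily yogurt sales y, where x1 and x2 are the influencing factors. Before modeling, data should be collected: the daily x1, x2 and the sales y_. Here we fabricate a dataset X, Y_ with y_ = x1 + x2 plus noise in the range -0.05 to +0.05.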

Code:

import tensorflow as tf
import numpy as np
SEED = 23455

rdm = np.random.RandomState(seed=SEED)      # seeded generator; rdm.rand() draws from [0,1)
x = rdm.rand(32,2)     # 32 random training samples with two features each
y_ = [[x1 + x2 + (rdm.rand()/10.0-0.05)] for (x1,x2) in x]    # labels: x1 + x2 plus noise; [0,1)/10=[0,0.1); [0,0.1)-0.05=[-0.05,0.05)
x = tf.cast(x,dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2,1],stddev=1,seed=1))

epochs = 15000
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x,w1)         # forward pass
        loss_mse = tf.reduce_mean(tf.square(y_ - y))

    grads = tape.gradient(loss_mse,w1)
    w1.assign_sub(lr * grads)       # gradient descent update

    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(),'\n')
print("Final w1 is: ", w1.numpy())



Output:
After 0 training steps,w1 is 
[[-0.8096241]
 [ 1.4855157]] 

After 500 training steps,w1 is 
[[-0.21934733]
 [ 1.6984866 ]] 

After 1000 training steps,w1 is 
[[0.0893971]
 [1.673225 ]] 
...
After 14000 training steps,w1 is 
[[0.9993659]
 [0.999166 ]] 

After 14500 training steps,w1 is 
[[1.0002553 ]
 [0.99838644]] 

Final w1 is:  [[1.0009792]
 [0.9977485]]

Both components of the final w1 are close to 1, which matches the generating rule y_ = x1 + x2.
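To make explicit what tape.gradient computes for this loss, here is a minimal NumPy sketch of the same fit using the hand-derived MSE gradient, -2/n * X^T (y_ - Xw). The zero starting point, the learning rate of 0.1 and the 5000 steps are assumptions chosen so the loop converges quickly; they are not the article's settings. The result is compared against NumPy's closed-form least-squares solution.

```python
import numpy as np

SEED = 23455
rdm = np.random.RandomState(seed=SEED)
x = rdm.rand(32, 2)
y_ = np.array([[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x])

def mse_grad(w):
    """Gradient of mean((y_ - x @ w) ** 2) with respect to w."""
    residual = y_ - x @ w                      # shape (32, 1)
    return -2.0 / len(x) * (x.T @ residual)    # shape (2, 1)

w = np.zeros((2, 1))   # assumed starting point, not the article's random init
lr = 0.1               # assumed; larger than the article's 0.002 to converge faster
for _ in range(5000):
    w -= lr * mse_grad(w)

# Closed-form least-squares solution for comparison.
w_lstsq, *_ = np.linalg.lstsq(x, y_, rcond=None)
print("gradient descent:", w.ravel())
print("least squares:   ", w_lstsq.ravel())
```

The two printed vectors should agree closely, since gradient descent on MSE converges to the least-squares solution for this linear model.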

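For example, when forecasting product sales, predicting too much wastes cost, while predicting too little forfeits profit. If profit and cost differ, MSE still penalizes errors in both directions equally, so minimizing MSE does not maximize earnings. In such cases the loss function can be customized to fit the actual situation.

The choice of loss function can strongly affect prediction quality. A well-designed loss function steers model training in the right direction.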

Custom loss function

Using the same yogurt sales example, the custom loss function is:

loss_zdy=tf.reduce_sum(tf.where(tf.greater(y,y_),(y-y_)*COST,(y_-y)*PROFIT))

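Suppose the yogurt cost (COST) is 1 yuan and the profit (PROFIT) is 99 yuan. Predicting too little loses 99 yuan of profit per unit, far more than the 1 yuan of cost lost per unit when predicting too much. Since under-prediction is the more expensive mistake, we want the fitted model to lean toward over-prediction.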
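The asymmetry can be seen on a single data point. The sketch below mirrors the loss_zdy expression in NumPy instead of TensorFlow; the function name custom_loss and the sales figures are hypothetical and used only for illustration.

```python
import numpy as np

COST = 1
PROFIT = 99

def custom_loss(y, y_):
    """Over-prediction costs COST per unit; under-prediction costs PROFIT per unit."""
    return np.sum(np.where(y > y_, (y - y_) * COST, (y_ - y) * PROFIT))

y_ = np.array([10.0])                       # hypothetical true sales
over = custom_loss(np.array([11.0]), y_)    # predicted one unit too many
under = custom_loss(np.array([9.0]), y_)    # predicted one unit too few
print(over, under)
```

The same one-unit error costs 1 when the prediction is too high and 99 when it is too low, which is what pushes the trained weights upward.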

Full code (compared with the MSE code, only the lines marked # ---- are changed):

import tensorflow as tf
import numpy as np

SEED = 23455
COST = 1    #  ----
PROFIT = 99     # ----

rdm = np.random.RandomState(seed=SEED)      # seeded generator; rdm.rand() draws from [0,1)
x = rdm.rand(32,2)     # 32 random training samples with two features each
y_ = [[x1 + x2 + (rdm.rand()/10.0-0.05)] for (x1,x2) in x]    # labels: x1 + x2 plus noise; [0,1)/10=[0,0.1); [0,0.1)-0.05=[-0.05,0.05)
x = tf.cast(x,dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2,1],stddev=1,seed=1))

epochs = 15000
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x,w1)         # forward pass
        loss_zdy = tf.reduce_sum(tf.where(tf.greater(y,y_),(y-y_)*COST,(y_-y)*PROFIT))     # ----

    grads = tape.gradient(loss_zdy,w1)
    w1.assign_sub(lr * grads)       # gradient descent update

    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(),'\n')
print("Final w1 is: ", w1.numpy())





Output:
...
Final w1 is:  [[1.1420636]
 [1.1016785]]

In the final w1, both parameters are greater than 1, so the predictions lean high.

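If we instead set COST = 99 and PROFIT = 1 and run the code again, the final w1 shows the opposite effect: when cost is higher than profit, both weights come out below 1 and the predictions lean low.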

Cross-entropy loss function (Cross Entropy)

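Cross-entropy measures how far apart two probability distributions are.

The larger the cross-entropy, the farther apart the two distributions;

the smaller the cross-entropy, the closer they are.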

TensorFlow: tf.losses.categorical_crossentropy(y_, y)
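For reference, the usual definition of cross-entropy for discrete distributions is written out below. The symbols t (the label distribution, y_ in the text) and p (the predicted distribution, y in the text) are chosen here for readability and do not come from the original; the natural logarithm is assumed.

```latex
H(t, p) = -\sum_{i} t_i \ln p_i
```

When t is one-hot, only the term for the true class k remains, so the loss reduces to -ln p_k.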

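Example: in a two-class problem the known label is y_ = (1, 0) and there are two predictions, y1 = (0.6, 0.4) and y2 = (0.8, 0.2). Which one is closer to the label?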

Computing it in code:

import tensorflow as tf
loss_ce1 = tf.losses.categorical_crossentropy([1,0],[0.6,0.4])
loss_ce2 = tf.losses.categorical_crossentropy([1,0],[0.8,0.2])
print("loss_ce1:",loss_ce1)
print("loss_ce2:",loss_ce2)

Output:
loss_ce1: tf.Tensor(0.5108256, shape=(), dtype=float32)
loss_ce2: tf.Tensor(0.22314353, shape=(), dtype=float32)

Since loss_ce1 > loss_ce2, y2 is the more accurate prediction.
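These two numbers can be checked by hand. With a one-hot label, cross-entropy is minus the natural log of the probability assigned to the true class, so the losses should be -ln(0.6) and -ln(0.8). A minimal pure-Python sketch, where the helper name cross_entropy is hypothetical:

```python
import math

def cross_entropy(y_true, y_pred):
    """-sum(t * ln(p)); terms with t == 0 contribute nothing and are skipped."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

loss_ce1 = cross_entropy([1, 0], [0.6, 0.4])   # equals -ln(0.6)
loss_ce2 = cross_entropy([1, 0], [0.8, 0.2])   # equals -ln(0.8)
print(loss_ce1, loss_ce2)
```

Up to float32 rounding, the values agree with the TensorFlow output above.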

Combining softmax with cross-entropy

The network output is first passed through the softmax function, and then the cross-entropy loss between y and y_ is computed.

tf.nn.softmax_cross_entropy_with_logits(y_, y)  -- computes softmax and cross-entropy in a single call

Example:

import tensorflow as tf
import numpy as np

y_ = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])   # one-hot labels
y = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])   # raw outputs (logits)
y_pro = tf.nn.softmax(y)   # step 1: turn logits into probabilities
loss_ce1 = tf.losses.categorical_crossentropy(y_,y_pro)   # step 2: cross-entropy on the probabilities
loss_ce2 = tf.nn.softmax_cross_entropy_with_logits(y_, y)   # both steps in one call

print('Step-by-step result:\n', loss_ce1)
print('Combined result:\n', loss_ce2)

Output:
Step-by-step result:
 tf.Tensor(
[1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
 5.49852354e-02], shape=(5,), dtype=float64)
Combined result:
 tf.Tensor(
[1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
 5.49852354e-02], shape=(5,), dtype=float64)

As shown, the single line that computes loss_ce2 gives the same result as the two lines that compute y_pro and loss_ce1.
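To show what the combined call works out, the sketch below recomputes the same per-row losses in plain NumPy. It is an illustration under assumptions, not TensorFlow's actual implementation: the function name softmax_cross_entropy is hypothetical, and subtracting the row maximum before exponentiating is a common numerical-stability trick added here.

```python
import numpy as np

y_ = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
y = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

def softmax_cross_entropy(labels, logits):
    """Row-wise -sum(labels * log_softmax(logits)), computed in NumPy."""
    # Subtracting the row maximum leaves softmax unchanged but avoids overflow in exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

loss = softmax_cross_entropy(y_, y)
print(loss)
```

The five values should match the TensorFlow results above to within floating-point rounding.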