Neural Network Backpropagation: An Intuitive Explanation





dL/dz is abbreviated as dz_ (and likewise for the other derivatives); L(a, y) is the cross-entropy loss.
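For reference, the formulas in this post are consistent with the usual binary cross-entropy for a single example. The definition below is an assumption in that sense: the post names the loss but does not write it out. Differentiating it with respect to a gives the same expression as dA[L] at the end of the post.

```latex
L(a, y) = -\bigl( y \log a + (1 - y) \log(1 - a) \bigr),
\qquad
\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}
```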




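da_ = dL/da = -(y/a) + (1-y)/(1-a)

dz_ = dL/da * da/dz = da_ * g'(z)    (a = g(z) is the activation; for a sigmoid, g'(z) = a(1-a), so dz_ = a - y)

dw_ = dL/dz * dz/dw = dz_ * x

db_ = dL/dz * dz/db = dz_

dx = dL/dz * dz/dx = dz_ * w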
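The single-neuron chain rule can be checked numerically. The sketch below assumes a sigmoid activation and scalar w, b, x (the post does not fix the activation); the function names and the test values are made up for illustration. It compares the analytic dw_ with a central finite difference of the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    # Cross-entropy of a single sigmoid neuron: z = w*x + b, a = sigmoid(z)
    a = sigmoid(w * x + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def single_neuron_grads(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    da = -(y / a) + (1 - y) / (1 - a)   # dL/da
    dz = da * a * (1 - a)               # dL/dz, using sigmoid'(z) = a(1-a)
    dw = dz * x                         # dL/dw
    db = dz                             # dL/db
    dx = dz * w                         # dL/dx
    return {"a": a, "da": da, "dz": dz, "dw": dw, "db": db, "dx": dx}

# Illustrative values (assumptions, not from the post)
w, b, x, y = 0.5, -0.2, 1.5, 1.0
g = single_neuron_grads(w, b, x, y)

# Central finite difference for dL/dw
eps = 1e-6
dw_num = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
print("analytic dw:", g["dw"], "numeric dw:", dw_num)
```

With a sigmoid, dz_ collapses to a - y, which is why many implementations compute it directly and skip da_.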



da_2 = dL/da2 = -(y/a2) + (1-y)/(1-a2)

dz_2 = dL/da2 * da2/dz2 = da_2 * g2'(z2)

dw_2 = dL/dz2 * dz2/dw2 = dz_2 * a1

db_2 = dz_2


da_1 = dL/dz2 * dz2/da1 = dz_2 * w2

dz_1 = dL/da1 * da1/dz1 = da_1 * g1'(z1)

dw_1 = dL/dz1 * dz1/dw1 = dz_1 * a0    (a0 = x)

db_1 = dz_1
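The two-layer equations follow the same pattern, with layer 2's dz_2 feeding layer 1 through da_1 = dz_2 * w2. A minimal sketch, assuming one scalar unit per layer and sigmoid activations in both layers (both are assumptions for illustration, as are the function names and parameter values), with a finite-difference check of all four gradients:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(a, y):
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def forward(params, x):
    w1, b1, w2, b2 = params
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def backward(params, x, y):
    w1, b1, w2, b2 = params
    z1, a1, z2, a2 = forward(params, x)
    # Layer 2
    da2 = -(y / a2) + (1 - y) / (1 - a2)
    dz2 = da2 * a2 * (1 - a2)
    dw2 = dz2 * a1
    db2 = dz2
    # Layer 1: the gradient flows back through w2
    da1 = dz2 * w2
    dz1 = da1 * a1 * (1 - a1)
    dw1 = dz1 * x          # a0 = x
    db1 = dz1
    return [dw1, db1, dw2, db2]

def numeric_grads(params, x, y, eps=1e-6):
    # Central finite difference, one parameter at a time
    grads = []
    for i in range(len(params)):
        plus = list(params)
        plus[i] += eps
        minus = list(params)
        minus[i] -= eps
        lp = cross_entropy(forward(plus, x)[3], y)
        lm = cross_entropy(forward(minus, x)[3], y)
        grads.append((lp - lm) / (2 * eps))
    return grads

# Illustrative values (assumptions, not from the post)
params = [0.3, -0.1, 0.8, 0.2]   # w1, b1, w2, b2
x, y = 2.0, 0.0
analytic = backward(params, x, y)
numeric = numeric_grads(params, x, y)
for name, a, n in zip(["dw1", "db1", "dw2", "db2"], analytic, numeric):
    print(name, a, n)
```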





  • Pseudo code for forward propagation for layer l:

    Input  A[l-1]
    Z[l] = W[l]A[l-1] + b[l]
    A[l] = g[l](Z[l])
    Output A[l], cache(Z[l])
  • Pseudo code for back propagation for layer l:

    Input dA[l], caches
    dZ[l] = dA[l] * g'[l](Z[l])         # Element-wise product.
    dW[l] = (dZ[l]A[l-1].T) / m
    db[l] = sum(dZ[l])/m                # Don't forget axis=1, keepdims=True
    dA[l-1] = W[l].T * dZ[l]            # The multiplication here is a dot product.
    Output dA[l-1], dW[l], db[l]
  • If the loss is the cross-entropy, the backward pass starts at the output layer L with:

    dA[L] = -(y/a) + (1-y)/(1-a)        # a = A[L]
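The pseudo code above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a reference implementation: g is taken to be a sigmoid, the cache stores A[l-1] and W[l] alongside Z[l] (the pseudo code only mentions Z[l]), and the names `layer_forward` and `layer_backward` and the random data are hypothetical. Columns are examples, so m is the number of columns.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def layer_forward(A_prev, W, b):
    # Z[l] = W[l] A[l-1] + b[l];  A[l] = g(Z[l])
    Z = W @ A_prev + b
    A = sigmoid(Z)
    cache = (A_prev, W, Z)   # assumption: cache more than just Z for convenience
    return A, cache

def layer_backward(dA, cache):
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    A = sigmoid(Z)
    dZ = dA * A * (1 - A)                        # element-wise: dA * g'(Z)
    dW = (dZ @ A_prev.T) / m                     # matrix product
    db = np.sum(dZ, axis=1, keepdims=True) / m   # sum over examples
    dA_prev = W.T @ dZ                           # matrix product
    return dA_prev, dW, db

# Illustrative data: 3 input features, 1 output unit, 5 examples
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
W = rng.standard_normal((1, 3)) * 0.1
b = np.zeros((1, 1))

A, cache = layer_forward(X, W, b)
dA = -(Y / A) + (1 - Y) / (1 - A)   # cross-entropy derivative at the output layer
dA_prev, dW, db = layer_backward(dA, cache)
print(dW.shape, db.shape, dA_prev.shape)
```

Note that `*` in NumPy is element-wise, so the dot products in the pseudo code are written with `@` here.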

