Backpropagation

  • Some notes (to be continued)

The English text, formulas, and figures in this article are all taken from the programming assignments of Andrew Ng's deep learning course.

$$\frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} = \frac{1}{m}\left(a^{[2](i)} - y^{(i)}\right)$$

$$\frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} \, a^{[1](i)T}$$

$$\frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}}$$

$$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * \left(1 - a^{[1](i)2}\right)$$

$$\frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} X^T$$

$$\frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}}$$
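Putting the six formulas above together, here is a minimal NumPy sketch of the vectorized backward pass (the `dW1`/`db1` naming convention is explained in the notes below). The `parameters`/`cache` dictionaries and their keys are illustrative assumptions about how a forward pass might store its results, not the assignment's exact code; the `1/m` factor is folded into `dZ2` to match the first formula above.

```python
import numpy as np

def backward_propagation(parameters, cache, X, Y):
    """Vectorized gradients for a 2-layer net: tanh hidden layer, sigmoid output.

    Shapes follow the course convention: X is (n_x, m), Y is (1, m),
    W1 is (n_h, n_x), W2 is (1, n_h).
    """
    m = X.shape[1]
    W2 = parameters["W2"]          # assumed key names, for illustration
    A1 = cache["A1"]               # tanh activations of the hidden layer
    A2 = cache["A2"]               # sigmoid activations of the output layer

    dZ2 = (1 / m) * (A2 - Y)                          # dJ/dZ2, with 1/m folded in
    dW2 = np.dot(dZ2, A1.T)                           # dJ/dW2
    db2 = np.sum(dZ2, axis=1, keepdims=True)          # dJ/db2, summed over examples
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))   # * is elementwise; tanh'(z) = 1 - a^2
    dW1 = np.dot(dZ1, X.T)                            # dJ/dW1
    db1 = np.sum(dZ1, axis=1, keepdims=True)          # dJ/db1, summed over examples

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```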

The figure below shows the gradient computations during backpropagation, where the output layer's activation function is sigmoid and the hidden layer's is tanh(); the right side shows the corresponding vectorized implementation.
[Figure: gradient computations during backpropagation]

  • Note that * denotes elementwise multiplication.

  • The notation you will use is common in deep learning coding:

    • dW1 = $\frac{\partial \mathcal{J}}{\partial W_1}$
    • db1 = $\frac{\partial \mathcal{J}}{\partial b_1}$
    • dW2 = $\frac{\partial \mathcal{J}}{\partial W_2}$
    • db2 = $\frac{\partial \mathcal{J}}{\partial b_2}$
  • Tips:

    • To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(\cdot)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1 - a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
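As a quick sanity check of that identity, the snippet below compares `1 - np.power(a, 2)` against a numerical (central-difference) derivative of tanh. The variable names here are illustrative, not from the assignment.

```python
import numpy as np

# Numerically verify tanh'(z) = 1 - tanh(z)^2, the identity used for dZ1.
z = np.linspace(-3, 3, 7)
a = np.tanh(z)

analytic = 1 - np.power(a, 2)                        # closed-form derivative
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)  # central difference

print(np.allclose(analytic, numeric, atol=1e-8))     # True
```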