Chain rule for neural networks #
Derivatives with respect to input variables #
Consider the neural network in Figure 1 with weights and biases as indicated and assume that all of its neurons use the ReLU activation function. The neural network itself is a function \(\mathcal{N}(x)\) of the single variable \(x\):
By carefully writing out the formulas, using the multivariate chain rule, and realizing that the derivative of ReLU at any point other than 0 is very simple (it is 0 for negative inputs and 1 for positive inputs), it is possible to compute the rate of change of \(\mathcal{N}\) with respect to the input variable \(x\). Carry this out below:
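To give a concrete sense of how such a computation goes, here is a minimal sketch in Python. Since the weights and biases of Figure 1 are not reproduced in the text, the values below (and the two-hidden-neuron architecture) are hypothetical stand-ins; only the chain-rule bookkeeping is the point.

```python
# Sketch: derivative of a small ReLU network with respect to its input x.
# The architecture (two hidden ReLU neurons feeding one output ReLU neuron)
# and all numerical values are assumptions, not the network of Figure 1.

def relu(z):
    return max(z, 0.0)

def relu_prime(z):
    # Derivative of ReLU away from 0: 1 for positive inputs, 0 for negative.
    return 1.0 if z > 0 else 0.0

# Hypothetical weights and biases.
w1, b1 = 2.0, -1.0           # first hidden neuron
w2, b2 = -3.0, 0.5           # second hidden neuron
v1, v2, c = 1.5, 0.25, 0.0   # output neuron

def N(x):
    h1 = relu(w1 * x + b1)
    h2 = relu(w2 * x + b2)
    return relu(v1 * h1 + v2 * h2 + c)

def dN_dx(x):
    # Multivariate chain rule, applied layer by layer.
    z1, z2 = w1 * x + b1, w2 * x + b2
    h1, h2 = relu(z1), relu(z2)
    z_out = v1 * h1 + v2 * h2 + c
    dh1_dx = relu_prime(z1) * w1
    dh2_dx = relu_prime(z2) * w2
    return relu_prime(z_out) * (v1 * dh1_dx + v2 * dh2_dx)

# Check the analytic derivative against a central finite difference.
x0, eps = 0.7, 1e-6
numerical = (N(x0 + eps) - N(x0 - eps)) / (2 * eps)
print(dN_dx(x0), numerical)
```

The finite-difference check at the end is a convenient sanity test: away from the kinks of ReLU, the two numbers should agree to several decimal places.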
Derivatives with respect to weights and biases #
The output of a neural network varies with both the values of the input vector \(\vec{x}\) and the weights and biases of its neurons: changing the value of a weight or of a bias changes the output of the neural network. Let us first study this in the case of a single neuron:
Unraveling what this diagram encodes, we find that \(\mathcal{N}(x,y) = \sigma(w_1 x + w_2 y +b ).\) It is a function of \(x\) and \(y\) as well as of \(w_1\), \(w_2\), and \(b\). So by holding all other variables constant, we can compute partial derivatives. For instance:
\[\frac{\partial\mathcal{N}}{\partial w_1} = \frac{\partial\sigma}{\partial c} \cdot \frac{\partial c }{\partial w_1}, \] where \(c = w_1 x + w_2 y + b\). Since \(\frac{\partial c}{\partial w_1} = x\), this equals \(\sigma'(c)\, x\). We can similarly compute the partial derivatives of \(\mathcal{N}\) with respect to \(w_2\) and \(b\).
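A short sketch of these three partial derivatives for a single neuron, assuming for illustration that \(\sigma\) is the logistic sigmoid (the text leaves \(\sigma\) generic):

```python
import math

# Sketch: partial derivatives of N(x, y) = sigma(w1*x + w2*y + b)
# with respect to w1, w2, and b. Taking sigma to be the logistic
# sigmoid is an assumption made here for the sake of a concrete example.

def sigma(c):
    return 1.0 / (1.0 + math.exp(-c))

def sigma_prime(c):
    s = sigma(c)
    return s * (1.0 - s)

def neuron_grads(x, y, w1, w2, b):
    c = w1 * x + w2 * y + b          # pre-activation
    dN_dw1 = sigma_prime(c) * x      # since dc/dw1 = x
    dN_dw2 = sigma_prime(c) * y      # since dc/dw2 = y
    dN_db  = sigma_prime(c)          # since dc/db  = 1
    return dN_dw1, dN_dw2, dN_db

print(neuron_grads(x=1.0, y=-2.0, w1=0.3, w2=0.8, b=0.1))
```

The same pattern holds for any differentiable \(\sigma\): each partial derivative is \(\sigma'(c)\) multiplied by the partial derivative of the pre-activation \(c\) with respect to the parameter in question.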