Machine learning: Chain rule

Multivariate chain rule #

Background #

You may have seen the multivariate chain rule in a calculus course. It is important to our current work and will be absolutely crucial in our work on neural networks. Please reacquaint yourself with the details, either by reading through Section 14.6 of our multivariate text or by watching the now-classic Math 1800 lectures below:

A useful example #

You will find the result of the following application of the chain rule useful in a subsequent exercise on a multivariate mean value theorem.

Homework exercise: Suppose that \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) and \(\vec{x}: \mathbb{R} \rightarrow \mathbb{R}^n\) where we can write the vector-valued function \(\vec{x}(t)\) using coordinate functions \(x_i(t)\) as \[\vec{x}(t) = (x_1(t), x_2(t), \ldots, x_n(t) ) .\] If we let \(g(t)\) be the composition \(g(t) = f(\vec{x}(t))\), then the corresponding dependency diagram is

```mermaid
graph TD;
  f --> x_1;
  f --> x_2;
  f --> x_3;
  f --> x_n;
  x_1 --> t;
  x_2 --> t;
  x_3 --> t;
  x_n --> t;
```

Write \(\tfrac{\partial \vec{x}}{\partial t} \) for the column vector formed from the derivatives of the coordinate functions, that is, \( ( \tfrac{\partial x_1}{\partial t}, \tfrac{\partial x_2}{\partial t}, \ldots, \tfrac{\partial x_n}{\partial t})^T \), and show that \[g'(c) = \nabla f (\vec{x}(c))^T \tfrac{\partial \vec{x}}{\partial t}(c) \] where \(c \in \mathbb{R}.\)
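As a sanity check on the identity above, here is a minimal numerical sketch for a hypothetical choice of \(f\) and \(\vec{x}\) with \(n = 2\) (the functions \(f(x_1, x_2) = x_1^2 x_2\) and \(\vec{x}(t) = (\sin t, t^2)\) are chosen only for illustration, not taken from the exercise): the chain-rule value \(\nabla f(\vec{x}(c))^T \tfrac{\partial \vec{x}}{\partial t}(c)\) should match a finite-difference approximation of \(g'(c)\).

```python
import numpy as np

# Illustrative example (assumed, not from the exercise):
# f(x1, x2) = x1^2 * x2   and   x(t) = (sin t, t^2).

def f(v):
    x1, x2 = v
    return x1**2 * x2

def grad_f(v):
    # Gradient of f, computed by hand: (df/dx1, df/dx2)
    x1, x2 = v
    return np.array([2 * x1 * x2, x1**2])

def x(t):
    return np.array([np.sin(t), t**2])

def dx_dt(t):
    # Component-wise derivatives of the coordinate functions
    return np.array([np.cos(t), 2 * t])

def g(t):
    # The composition g(t) = f(x(t))
    return f(x(t))

c = 0.7
h = 1e-6

# Central finite-difference approximation of g'(c)
g_prime_fd = (g(c + h) - g(c - h)) / (2 * h)

# Chain-rule value: gradient at x(c) dotted with the velocity vector
g_prime_cr = grad_f(x(c)) @ dx_dt(c)

print(g_prime_fd, g_prime_cr)
```

The two printed values should agree to several decimal places, which is exactly what the identity \(g'(c) = \nabla f(\vec{x}(c))^T \tfrac{\partial \vec{x}}{\partial t}(c)\) predicts.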