Machine learning: Derivatives

The Jacobian and a generalized chain rule #

In class we discussed the Jacobian of a function \(f: \mathbb{R}^n \rightarrow \mathbb{R}^m\). It is the \(m \times n\) matrix collecting the partial derivatives of every output variable with respect to every input variable. Using the notation adopted in class and letting \(f(\vec{x}) = \vec{y}\), we write:

\[ \frac{\partial f}{\partial \vec{x}} = \begin{pmatrix} \tfrac{\partial y_1}{\partial x_1} & \tfrac{\partial y_1}{\partial x_2} & \dots & \tfrac{\partial y_1}{\partial x_n}\\ \vdots & \vdots & & \vdots \\ \tfrac{\partial y_m}{\partial x_1} & \tfrac{\partial y_m}{\partial x_2} & \dots & \tfrac{\partial y_m}{\partial x_n} \end{pmatrix}\]
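To make the layout concrete, here is a small sketch (not part of the course materials) that computes such a Jacobian numerically with JAX. The function \(f: \mathbb{R}^3 \rightarrow \mathbb{R}^2\) below is an arbitrary example chosen only for illustration.

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary example f: R^3 -> R^2, so df/dx is a 2 x 3 matrix.
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacobian(f)(x)   # entry (i, j) holds dy_i/dx_j, matching the matrix above
print(J.shape)           # (2, 3)
print(J)
```

Here `jax.jacobian` builds the full matrix by automatic differentiation; its rows index the outputs \(y_1, \dots, y_m\) and its columns index the inputs \(x_1, \dots, x_n\), exactly as in the matrix above.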

Homework exercise: Suppose that \(f: \mathbb{R}^n \rightarrow \mathbb{R}^m\) and \(g: \mathbb{R}^m \rightarrow \mathbb{R}^k\) are differentiable functions. Use the chain rule from multivariable calculus to show that: \[ \frac{\partial (g \circ f)}{\partial \vec{x}} = \frac{\partial g}{\partial f} \cdot \frac{\partial f }{\partial \vec{x}}.\] This is the chain rule for Jacobians.
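The exercise asks for a proof, but the identity is easy to sanity-check numerically at a single point. The sketch below uses two arbitrary example functions \(f: \mathbb{R}^3 \rightarrow \mathbb{R}^2\) and \(g: \mathbb{R}^2 \rightarrow \mathbb{R}^4\) (my choices, not from the exercise) and compares the Jacobian of \(g \circ f\) with the product of the two Jacobians.

```python
import jax
import jax.numpy as jnp

def f(x):                 # example f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def g(y):                 # example g: R^2 -> R^4
    return jnp.array([y[0], y[1] ** 2, y[0] * y[1], jnp.exp(y[1])])

x = jnp.array([1.0, 2.0, 3.0])
J_gf = jax.jacobian(lambda x: g(f(x)))(x)   # 4 x 3 Jacobian of the composition
J_g = jax.jacobian(g)(f(x))                 # 4 x 2, evaluated at f(x)
J_f = jax.jacobian(f)(x)                    # 2 x 3, evaluated at x
print(jnp.allclose(J_gf, J_g @ J_f))        # True (up to floating-point error)
```

Note that \(\tfrac{\partial g}{\partial f}\) must be evaluated at \(f(\vec{x})\), not at \(\vec{x}\); the dimensions \((k \times m)(m \times n) = k \times n\) also confirm that the product is the right shape.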
Homework exercise: Suppose that \(f: \mathbb{R}^n \rightarrow \mathbb{R}^m\) is an affine function; that is,
\[f(\vec{x}) = A \cdot \vec{x} + \vec{b},\] with \(A \in \mathbb{R}^{m \times n}\) and \(\vec{b} \in \mathbb{R}^m.\) Show that \(\tfrac{\partial f}{\partial \vec{x}} = A.\)
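Again, a quick numerical check (a sketch with an arbitrary \(A\) and \(\vec{b}\), assuming JAX is available) recovers \(A\) regardless of the point \(\vec{x}\) at which the Jacobian is evaluated:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (4, 3))   # A in R^{4 x 3}
b = jnp.ones(4)                      # b in R^4

def f(x):
    return A @ x + b                 # affine map f: R^3 -> R^4

x = jnp.array([1.0, -2.0, 0.5])
print(jnp.allclose(jax.jacobian(f)(x), A))   # True: the Jacobian is just A
```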