Machine learning: Generalized MVT

Generalized mean value theorem #

Background #

In a first course in analysis, the following theorem is a focal point of a unit on calculus:

Mean Value Theorem: Suppose that \(f \in \mathcal{C}[a,b]\) is differentiable at all points of \((a,b)\). Then there is a point \(c \in (a,b)\) where \[ \frac{f(b) - f(a)}{b-a} = f'(c).\]

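For instance, with the concrete (and purely illustrative) choice \(f(x) = x^2\), the theorem asks for a point \(c\) where

\[ \frac{b^2 - a^2}{b-a} = 2c. \]

The left-hand side simplifies to \(a + b\), so the midpoint \(c = \frac{a+b}{2}\) does the job.
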
There is an interesting way to reinterpret its meaning. Recall that the Taylor series expansion of a function \(f\) near \(x=a\) takes the form

\[ f(x) \approx f(a) + f'(a) (x-a). \]

Letting \(x=b\),

\[ f(b) \approx f(a) + f'(a) (b-a). \]

This is just an approximation, but the Mean Value Theorem tells us that we can replace \(f'(a)\) with \(f'(c)\) for some \(c\) between \(a\) and \(b\) and get equality instead! To see this, rearrange the terms of the MVT, obtaining: \[ f(b) = f(a) + f'(c) (b-a). \]
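
To check this numerically, here is a minimal Python sketch; the function \(f(x) = e^x\) and the interval \([0, 1]\) are my own illustrative choices, and the method assumes \(f'(t)\) minus the secant slope changes sign on \((a, b)\) so that bisection applies:

```python
import math

def f(x):
    return math.exp(x)

def fprime(x):
    return math.exp(x)

a, b = 0.0, 1.0
slope = (f(b) - f(a)) / (b - a)  # secant slope from the MVT

# Bisection on h(t) = f'(t) - slope; a root of h is the point c.
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if (fprime(lo) - slope) * (fprime(mid) - slope) <= 0:
        hi = mid
    else:
        lo = mid

c = (lo + hi) / 2
print(c, fprime(c), slope)  # f'(c) matches the secant slope e - 1
```

Here \(c = \ln(e - 1) \approx 0.5413\), and the printed values confirm \(f'(c) = \frac{f(b) - f(a)}{b - a}\).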

Multivariate mean value theorem #

In class, we saw that the first-order Taylor approximation of a function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) takes the form

\[ f(\vec{x}) \approx f(\vec{a}) + \nabla f(\vec{a})^T (\vec{x} - \vec{a}).\]

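As a quick numerical sanity check on this approximation, here is a minimal Python sketch; the function \(f(x, y) = x^2 + 3y\) and the points below are my own illustrative choices:

```python
import numpy as np

# Illustrative choice: f(x, y) = x^2 + 3y, expanded near a = (1, 2).
def f(v):
    return v[0] ** 2 + 3 * v[1]

def grad_f(v):
    return np.array([2 * v[0], 3.0])

a = np.array([1.0, 2.0])
x = np.array([1.1, 2.2])

approx = f(a) + grad_f(a) @ (x - a)  # first-order Taylor approximation
print(f(x), approx)  # 7.81 vs. 7.8: off by the quadratic remainder (0.1)^2
```
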
Prove the following version of the multivariate mean value theorem:

Homework exercise: Suppose that all partial derivatives of \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) exist and are continuous. Show that there is a point \(\vec{c}\) between \(\vec{b}\) and \(\vec{a}\) where: \[ f(\vec{b}) = f(\vec{a}) + \nabla f(\vec{c})^T (\vec{b} - \vec{a}).\]

Hint: First of all, “between \(\vec{b}\) and \(\vec{a}\)” means \(\vec{c}\) lies on the line segment between \(\vec{b}\) and \(\vec{a}\), or more precisely, that there is a \(t \in (0,1)\) so that \(\vec{c} = t \vec{b} + (1-t) \vec{a} \). Follow this outline as you work:

  • Let \(\vec{x}(t) = t \vec{b} + (1-t) \vec{a}\). It is a function \(\vec{x} : \mathbb{R} \rightarrow \mathbb{R}^n\). What are its values at \(t=0\) and \(t=1\)?
  • Define \(g(t) = f( \vec{x}(t))\) and show that \( \exists \; c \in (0,1) \) so that \[ g(1) - g(0) = g'(c). \]
  • Compute \(g'(c)\) and deduce the desired result (see the numerical sketch below).
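
This is not a proof, but here is a minimal Python sketch of the construction in the outline; the quadratic \(f(\vec{v}) = \vec{v}^T \vec{v}\) and the endpoints are my own illustrative choices, and the bisection assumes \(g'(t) - (g(1) - g(0))\) changes sign on \((0, 1)\):

```python
import numpy as np

# Illustrative choice: f(v) = v . v, with an explicit gradient.
def f(v):
    return v @ v

def grad_f(v):
    return 2 * v

a_vec = np.array([0.0, 0.0])
b_vec = np.array([1.0, 2.0])

def x(t):        # the segment x(t) = t*b + (1 - t)*a
    return t * b_vec + (1 - t) * a_vec

def gprime(t):   # chain rule: g'(t) = grad f(x(t)) . (b - a)
    return grad_f(x(t)) @ (b_vec - a_vec)

target = f(b_vec) - f(a_vec)  # g(1) - g(0)

# Bisection on h(t) = g'(t) - target; a root is the point c in (0, 1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if (gprime(lo) - target) * (gprime(mid) - target) <= 0:
        hi = mid
    else:
        lo = mid

c = (lo + hi) / 2
c_vec = x(c)
print(c, f(b_vec), f(a_vec) + grad_f(c_vec) @ (b_vec - a_vec))
```

For this quadratic, \(c = 1/2\), so \(\vec{c}\) is the midpoint of the segment, and the last two printed values agree.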

In class we will use a second-order version of this theorem where the second derivatives are encoded in the Hessian of \(f\). Its proof is similar, so I will not make you suffer through it!
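
For reference, one common way to write that second-order version (my paraphrase, assuming \(f\) is twice continuously differentiable) is

\[ f(\vec{b}) = f(\vec{a}) + \nabla f(\vec{a})^T (\vec{b} - \vec{a}) + \frac{1}{2} (\vec{b} - \vec{a})^T \nabla^2 f(\vec{c}) (\vec{b} - \vec{a}), \]

where \(\nabla^2 f(\vec{c})\) is the Hessian of \(f\) evaluated at some \(\vec{c}\) between \(\vec{a}\) and \(\vec{b}\).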