Machine learning: Exercise: higher dimensions

Approximating \(f: \mathbb{R}^d \rightarrow \mathbb{R}^k\) #

Background #

The original version of the Weierstrass Theorem shows that any continuous function \(f:[a,b]\rightarrow\mathbb{R}\) on a closed interval can be uniformly approximated by a sequence of polynomials. But single-variable functions are boring from the machine learning perspective; it is rare that we want to predict a single value from a single feature in our data! In class we derived a multivariate version of the Weierstrass Theorem:

Theorem: Let \([\vec{a},\vec{b}]\) be a multi-interval in \(\mathbb{R}^d\) and consider a continuous function: \(f: [\vec{a},\vec{b}] \rightarrow \mathbb{R}.\) Then there exists a sequence of multivariate polynomials \(\{p_n\}_{n \in \mathbb{N}}\) that converges uniformly to \(f\).
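To see the theorem in action, here is a minimal numerical sketch (assuming NumPy is available; the test function and the degrees below are my own illustrative choices, not from the lecture). The bivariate Bernstein polynomials of \(f\) on \([0,1]^2\) give one concrete approximating sequence, and the maximum error over a grid shrinks as the degree grows:

```python
import numpy as np
from math import comb

def bernstein_2d(f, n):
    """Return the degree-(n, n) Bernstein polynomial of f on [0, 1]^2."""
    def p(x, y):
        total = 0.0
        for i in range(n + 1):
            for j in range(n + 1):
                total += (f(i / n, j / n)
                          * comb(n, i) * x**i * (1 - x)**(n - i)
                          * comb(n, j) * y**j * (1 - y)**(n - j))
        return total
    return p

# An arbitrary continuous function on [0, 1]^2.
f = lambda x, y: np.sin(np.pi * x) * np.cos(np.pi * y)

# Sup-distance on a grid, for increasing degree: uniform convergence in action.
xs = np.linspace(0.0, 1.0, 50)
X, Y = np.meshgrid(xs, xs)
errors = [np.max(np.abs(f(X, Y) - bernstein_2d(f, n)(X, Y))) for n in (2, 8, 32)]
print(errors)  # a decreasing sequence of maximum errors
```

Bernstein polynomials are only one way to realize the approximating sequence; the theorem itself does not care which construction you use.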

We can push this a little further. We really would like to be able to approximate continuous functions \(f: \mathbb{R}^d \rightarrow \mathbb{R}^k\). The key will be to break such \(f\) into simpler pieces called coordinate functions.

Coordinate functions #

I will ask you to do the heavy lifting in this part. Consider the following questions in turn:

Exercise:

Suppose that \([\vec{a},\vec{b}] \) is a multi-interval in \(\mathbb{R}^d\) and that our task is to study a continuous function \(f: [\vec{a},\vec{b}] \rightarrow \mathbb{R}^k\).

  • Describe \(f\) in terms of \(k\) individual real-valued functions.
  • Is each of these individual real-valued functions continuous as well? An argument via frantic hand-waving will be fine.
  • How could you measure distance between two continuous functions \(f, g: [\vec{a},\vec{b}] \rightarrow \mathbb{R}^k\)? There are multiple choices; pick one that will make our work simpler.
  • As formally as you can, state a version of the Weierstrass Theorem for a continuous function \(f: [\vec{a},\vec{b}] \rightarrow \mathbb{R}^k\).
Solution
  1. Each such \(f\) is really \(k\) individual functions \(f_i : [\vec{a},\vec{b}] \rightarrow \mathbb{R}\), where \(f_i(\vec{x})\) is the \(i\)th coordinate of \(f(\vec{x})\). The \(f_i\) are the sought-after coordinate functions of \(f\).
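As a quick illustration (the map below is a made-up example, not one from the text), here is a function \(f:\mathbb{R}^2\rightarrow\mathbb{R}^3\) and its three coordinate functions:

```python
import numpy as np

def f(v):
    """A continuous map f: R^2 -> R^3 (an arbitrary example)."""
    x, y = v
    return np.array([x + y, x * y, np.sin(x)])

def coordinate(f, i):
    """The i-th coordinate function: read off the i-th entry of f(v)."""
    return lambda v: f(v)[i]

f0, f1, f2 = (coordinate(f, i) for i in range(3))
v = (2.0, 3.0)
print(f0(v), f1(v), f2(v))  # 5.0, 6.0, and sin(2.0)
```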

  2. Coordinate functions are continuous. For an almost formal proof, consider the function \(p_i : \mathbb{R}^k \rightarrow \mathbb{R}\) that takes a vector and returns its \(i\)th coordinate. This function is called a projection. There are a couple of ways to prove that projections are continuous; for instance, what \(\delta\) could you use in a \(\delta\)-\(\epsilon\) proof? Finally, note that \(f_i = p_i \circ f\), and a composition of continuous functions is continuous.
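If you want to see the \(\delta\)-\(\epsilon\) hint resolved: taking \(\delta = \epsilon\) works, because for any \(\vec{x}, \vec{y} \in \mathbb{R}^k\)

\[ |p_i(\vec{x}) - p_i(\vec{y})| = |x_i - y_i| \le \sqrt{\textstyle\sum_{j=1}^{k} (x_j - y_j)^2} = \|\vec{x} - \vec{y}\|_2, \]

so \(\|\vec{x} - \vec{y}\|_2 < \delta\) immediately forces \(|p_i(\vec{x}) - p_i(\vec{y})| < \epsilon\).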

  3. Here is one way: compute the distance between \(f\) and \(g\) in each coordinate (using your favorite distance between real-valued functions). This gives a vector of \(k\) non-negative numbers; now compute the distance of this vector from the origin. Again there is a choice to make; two simple options are \(\rho_1\) and \(\rho_\infty\).
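A rough sketch of this recipe in code (grid-based, so only approximate; the choice of the sup-distance per coordinate, of \(\rho_\infty\) to combine them, and the sample functions are all mine):

```python
import numpy as np

def dist_sup(f, g, a, b, samples=200):
    """Approximate distance between f, g: [a, b] -> R^k (here d = 1 for simplicity).

    Per coordinate, take the max of |f_i - g_i| over a grid of sample points,
    then combine the k coordinate distances with rho_infinity (another max).
    """
    grid = np.linspace(a, b, samples)
    diffs = np.abs(np.array([f(t) - g(t) for t in grid]))  # shape (samples, k)
    per_coordinate = diffs.max(axis=0)   # vector of k coordinate distances
    return per_coordinate.max()          # rho_infinity of that vector

f = lambda t: np.array([np.sin(t), t])
g = lambda t: np.array([np.sin(t), t + 0.5])
print(dist_sup(f, g, 0.0, 1.0))  # 0.5: the functions differ only in the second coordinate
```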

  4. This I will do in class! There is a bit of typesetting…