$N : K \to \mathbb{R}^m$
In class, we showed that neural networks in $\Sigma_n(\sigma)$ are universal approximators for continuous functions $f : \mathbb{R}^n \to \mathbb{R}$. In his original paper, Kurt Hornik extended the result to continuous functions $f : \mathbb{R}^n \to \mathbb{R}^m$ with a multivariable output by using the time-tested proof technique of frantic handwaving. He offered:
Analogous results are valid for multi-output networks approximating functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.
The goal of this homework exercise is to provide a formal proof of this result. We begin with a formal definition of the analogous class of neural networks:
Definition: Let $\sigma : \mathbb{R} \to \mathbb{R}$ be a non-constant continuous function and write $\Sigma_{n,m}(\sigma)$ for the family of functions $s : \mathbb{R}^n \to \mathbb{R}^m$ that can be written in the following form:
$$s(\vec{x}) = \sum_i \vec{\beta}_i \, \sigma(\vec{w}_i^t \vec{x} + b_i),$$
where $\vec{\beta}_i \in \mathbb{R}^m$, $\vec{w}_i \in \mathbb{R}^n$, and $b_i \in \mathbb{R}$.
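To make the definition concrete, here is a minimal numerical sketch of evaluating a member of $\Sigma_{n,m}(\sigma)$. It assumes NumPy, takes $\sigma$ to be the logistic sigmoid (one example of a continuous squashing function), and uses randomly chosen weights; the function names are ours, not part of the assignment.

```python
import numpy as np

def sigma(t):
    # Logistic sigmoid: one example of a continuous squashing function.
    return 1.0 / (1.0 + np.exp(-t))

def sigma_nm_network(x, betas, ws, bs):
    """Evaluate s(x) = sum_i beta_i * sigma(w_i^t x + b_i).

    x:     input vector in R^n
    betas: list of output coefficient vectors beta_i in R^m
    ws:    list of weight vectors w_i in R^n
    bs:    list of biases b_i in R
    """
    m = len(betas[0])
    s = np.zeros(m)
    for beta_i, w_i, b_i in zip(betas, ws, bs):
        s += np.asarray(beta_i) * sigma(np.dot(w_i, x) + b_i)
    return s

# Example: a small network in Sigma_{2,3}(sigma) with two hidden units.
rng = np.random.default_rng(0)
betas = [rng.standard_normal(3) for _ in range(2)]    # beta_i in R^3
ws    = [rng.standard_normal(2) for _ in range(2)]    # w_i in R^2
bs    = [float(rng.standard_normal()) for _ in range(2)]
print(sigma_nm_network(np.array([0.5, -1.0]), betas, ws, bs))
```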
Exercise: Reconcile the definition above with our usual notion of a neural network with one hidden layer that has $n$ input and $m$ output neurons.
There is one more thing we have to agree upon. We would like to find a neural network that approximates a given continuous function, or in other words, a neural network whose outputs are close to the outputs of this continuous function. Since the outputs are both vectors in $\mathbb{R}^m$, we need to fix a metric. For the purposes of this problem, let us use $\rho_1$, that is:
$$\rho_1(\vec{x}, \vec{y}) = \sum_i |x_i - y_i|.$$
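In other words, $\rho_1$ is simply the $\ell^1$ distance on $\mathbb{R}^m$; a one-line sketch in NumPy (our naming):

```python
import numpy as np

def rho_1(x, y):
    # rho_1(x, y) = sum_i |x_i - y_i|, the l^1 distance on R^m.
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y))))

print(rho_1([1.0, 2.0], [0.5, 4.0]))  # |1.0 - 0.5| + |2.0 - 4.0| = 2.5
```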
Equipped with the notation above, we can state a formal version of the result:
Theorem: Suppose that $K$ is a compact subset of $\mathbb{R}^n$, $\sigma : \mathbb{R} \to \mathbb{R}$ a continuous squashing function, and $f : K \to \mathbb{R}^m$ an arbitrary continuous function. Then for every $\epsilon > 0$, there is a neural network $s \in \Sigma_{n,m}(\sigma)$ so that $\rho_1(f(\vec{x}), s(\vec{x})) < \epsilon$ for all $\vec{x} \in K$.
Homework exercise: Verify the above theorem. Use the version of the Universal Approximation Theorem we proved in class to approximate the output of each of the $m$ output neurons individually with a neural network in $\Sigma_n(\sigma)$. From these, construct a neural network in $\Sigma_{n,m}(\sigma)$.
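The following sketch (NumPy, with hypothetical names of our own choosing) illustrates the construction step only and is not a substitute for the proof: it merges $m$ single-output networks in $\Sigma_n(\sigma)$, one per output coordinate, into a single network in $\Sigma_{n,m}(\sigma)$ by placing each scalar coefficient into the corresponding coordinate of a vector $\vec{\beta}_i$.

```python
import numpy as np

def merge_scalar_networks(scalar_nets, m):
    """Combine m scalar networks (one per output coordinate) into a single
    network in Sigma_{n,m}(sigma).

    scalar_nets[j] is a list of triples (beta, w, b) with beta a real number,
    describing a network in Sigma_n(sigma) that approximates the j-th
    coordinate function f_j.  Returns lists (betas, ws, bs) describing the
    merged multi-output network, compatible with sigma_nm_network above.
    """
    betas, ws, bs = [], [], []
    for j, net in enumerate(scalar_nets):
        for beta, w, b in net:
            beta_vec = np.zeros(m)
            beta_vec[j] = beta      # scalar coefficient goes into coordinate j
            betas.append(beta_vec)
            ws.append(np.asarray(w))
            bs.append(float(b))
    return betas, ws, bs
```

It remains for you to quantify how accurately each coordinate function must be approximated so that the $\rho_1$ error of the merged network stays below $\epsilon$ on all of $K$.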