Machine learning: Multivariate output

$N\colon K \to \mathbb{R}^m$

In class, we showed that neural networks in $\Sigma_n(\sigma)$ are universal approximators for continuous functions $f\colon \mathbb{R}^n \to \mathbb{R}$. In his original paper, Kurt Hornik extended the result to continuous functions $f\colon \mathbb{R}^n \to \mathbb{R}^m$ with a multivariable output by using the time-tested proof technique of frantic handwaving. He offered:

Analogous results are valid for multi-output networks approximating functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

The goal of this homework exercise is to provide a formal proof of this result. We begin with a formal definition of the analogous class of neural networks:

Definition: Let $\sigma\colon \mathbb{R} \to \mathbb{R}$ be a non-constant continuous function and write $\Sigma_{n,m}(\sigma)$ for the family of functions $s\colon \mathbb{R}^n \to \mathbb{R}^m$ that can be written in the following form: $$s(x) = \sum_i \beta_i\, \sigma(w_i^t x + b_i)$$ where $\beta_i \in \mathbb{R}^m$, $w_i \in \mathbb{R}^n$, and $b_i \in \mathbb{R}$.
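For concreteness, here is a minimal NumPy sketch of how a function in $\Sigma_{n,m}(\sigma)$ is evaluated. The parameter names (`betas`, `ws`, `bs`) and the choice of the logistic function for $\sigma$ are illustrative assumptions, not part of the definition.

```python
import numpy as np

def sigma(t):
    # Logistic squashing function; any non-constant continuous sigma would do.
    return 1.0 / (1.0 + np.exp(-t))

def evaluate(x, betas, ws, bs):
    """Evaluate s(x) = sum_i beta_i * sigma(w_i^t x + b_i).

    x     : array of shape (n,)   -- the input
    betas : array of shape (k, m) -- output-side coefficients beta_i
    ws    : array of shape (k, n) -- input-side weights w_i
    bs    : array of shape (k,)   -- biases b_i
    Returns s(x) as an array of shape (m,).
    """
    hidden = sigma(ws @ x + bs)   # shape (k,): one value per hidden unit
    return betas.T @ hidden       # shape (m,): weighted sum of the beta_i
```

Note that the same computation can be read as one hidden layer of $k$ units followed by a linear output layer, which may be helpful for the exercise below.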

Exercise: Reconcile the definition above with our usual notion of a neural network with one hidden layer that has $n$ input and $m$ output neurons.

There is one more thing we have to agree upon. We would like to find a neural network that approximates a given continuous function, or in other words, a neural network whose outputs are close to the outputs of this continuous function. Since the outputs are both vectors in $\mathbb{R}^m$, we need to fix a metric. For the purposes of this problem, let us use $\rho_1$, that is:

$$\rho_1(x,y) = \sum_i |x_i - y_i|.$$
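This is just the $\ell^1$ distance; for instance, $\rho_1((1,2),(3,5)) = |1-3| + |2-5| = 5$. A one-line NumPy check (an illustrative choice, not required for the exercise):

```python
import numpy as np

def rho1(x, y):
    # l^1 distance: sum of coordinatewise absolute differences.
    return np.abs(np.asarray(x) - np.asarray(y)).sum()

assert rho1([1, 2], [3, 5]) == 5
```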

Equipped with the notation above, we can state a formal version of the result:

Theorem: Suppose that $K$ is a compact subset of $\mathbb{R}^n$, $\sigma\colon \mathbb{R} \to \mathbb{R}$ a continuous squashing function, and $f\colon K \to \mathbb{R}^m$ an arbitrary continuous function. Then for every $\epsilon > 0$, there is a neural network $s \in \Sigma_{n,m}(\sigma)$ so that $\rho_1(f(x), s(x)) < \epsilon$ for all $x \in K$.

Homework exercise: Verify the above theorem. Use the version of the Universal Approximation Theorem we proved in class to approximate the output of each of the $m$ output neurons individually with a neural network in $\Sigma_n(\sigma)$. From these, construct a neural network in $\Sigma_{n,m}(\sigma)$.
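The sketch below illustrates the combination step only, not the proof itself: given $m$ single-output networks (one per coordinate of $f$), their parameters can be concatenated into a single network in $\Sigma_{n,m}(\sigma)$ whose $j$-th output coordinate reproduces the $j$-th single-output network. The helper name and the array layout are assumptions made for illustration; the accuracy bookkeeping in the metric $\rho_1$ is left to your proof.

```python
import numpy as np

def stack_networks(single_output_nets, m):
    """Combine m single-output networks into one network in Sigma_{n,m}(sigma).

    single_output_nets : list of m tuples (betas_j, ws_j, bs_j), where
        betas_j has shape (k_j,), ws_j has shape (k_j, n), bs_j has shape (k_j,).
    Returns (betas, ws, bs) for the stacked multi-output network:
        betas has shape (K, m), ws has shape (K, n), bs has shape (K,),
        where K = k_1 + ... + k_m.
    """
    betas_blocks, ws_blocks, bs_blocks = [], [], []
    for j, (betas_j, ws_j, bs_j) in enumerate(single_output_nets):
        # Each hidden unit of the j-th network contributes only to output
        # coordinate j: its output-side coefficient is betas_j[i] * e_j.
        block = np.zeros((len(betas_j), m))
        block[:, j] = betas_j
        betas_blocks.append(block)
        ws_blocks.append(ws_j)
        bs_blocks.append(bs_j)
    return (np.vstack(betas_blocks),
            np.vstack(ws_blocks),
            np.concatenate(bs_blocks))
```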