$N : K \to \mathbb{R}^m$
In class, we showed that neural networks in $\Sigma_n(\sigma)$ are universal approximators for continuous functions $f : \mathbb{R}^n \to \mathbb{R}$. In his original paper, Kurt Hornik extended the result to continuous functions $f : \mathbb{R}^n \to \mathbb{R}^m$ with a multivariable output by using the time-tested proof technique of frantic handwaving. He offered:
Analogous results are valid for multi-output networks approximating functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.
The goal of this homework exercise is to provide a formal proof of this result. We begin with a formal definition of the analogous class of neural networks:
Definition: Let $\sigma : \mathbb{R} \to \mathbb{R}$ be a non-constant continuous function and write $\Sigma_{n,m}(\sigma)$ for the family of functions $s : \mathbb{R}^n \to \mathbb{R}^m$ that can be written in the following form:
$$s(\vec{x}) = \sum_i \vec{\beta}_i \, \sigma(\vec{w}_i^t \vec{x} + b_i),$$
where $\vec{\beta}_i \in \mathbb{R}^m$, $\vec{w}_i \in \mathbb{R}^n$, and $b_i \in \mathbb{R}$.
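To make the definition concrete, here is a minimal numerical sketch of evaluating a member of $\Sigma_{n,m}(\sigma)$. It assumes NumPy, takes $\sigma$ to be the logistic sigmoid (one example of a continuous squashing function), and uses randomly chosen weights; the function names are ours, not part of the assignment.

```python
import numpy as np

def sigma(t):
    # Logistic sigmoid: one example of a continuous squashing function.
    return 1.0 / (1.0 + np.exp(-t))

def sigma_nm_network(x, betas, ws, bs):
    """Evaluate s(x) = sum_i beta_i * sigma(w_i^t x + b_i).

    x:     input vector in R^n
    betas: list of output coefficient vectors beta_i in R^m
    ws:    list of weight vectors w_i in R^n
    bs:    list of biases b_i in R
    """
    m = len(betas[0])
    s = np.zeros(m)
    for beta_i, w_i, b_i in zip(betas, ws, bs):
        s += np.asarray(beta_i) * sigma(np.dot(w_i, x) + b_i)
    return s

# Example: a small network in Sigma_{2,3}(sigma) with two hidden units.
rng = np.random.default_rng(0)
betas = [rng.standard_normal(3) for _ in range(2)]    # beta_i in R^3
ws    = [rng.standard_normal(2) for _ in range(2)]    # w_i in R^2
bs    = [float(rng.standard_normal()) for _ in range(2)]
print(sigma_nm_network(np.array([0.5, -1.0]), betas, ws, bs))
```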
Exercise: Reconcile the definition above with our usual notion of a neural network with one hidden layer that has $n$ input and $m$ output neurons.
There is one more thing we have to agree upon. We would like to find a neural network that approximates a given continuous function, or in other words, a neural network whose outputs are close to the outputs of this continuous function. Since the outputs are both vectors in $\mathbb{R}^m$, we need to fix a metric. For the purposes of this problem, let us use $\rho_1$, that is:
$$\rho_1(\vec{x}, \vec{y}) = \sum_i |x_i - y_i|.$$
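In other words, $\rho_1$ is simply the $\ell^1$ distance on $\mathbb{R}^m$; a one-line sketch in NumPy (our naming):

```python
import numpy as np

def rho_1(x, y):
    # rho_1(x, y) = sum_i |x_i - y_i|, the l^1 distance on R^m.
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y))))

print(rho_1([1.0, 2.0], [0.5, 4.0]))  # |1.0 - 0.5| + |2.0 - 4.0| = 2.5
```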
Equipped with the notation above, we can state a formal version of the result:
Theorem: Suppose that $K$ is a compact subset of $\mathbb{R}^n$, $\sigma : \mathbb{R} \to \mathbb{R}$ a continuous squashing function, and $f : K \to \mathbb{R}^m$ an arbitrary continuous function. Then for every $\epsilon > 0$, there is a neural network $s \in \Sigma_{n,m}(\sigma)$ so that $\rho_1(f(\vec{x}), s(\vec{x})) < \epsilon$ for all $\vec{x} \in K$.
Homework exercise: Verify the above theorem. Use the version of the Universal Approximation Theorem we proved in class to approximate the output of each of the $m$ output neurons individually with a neural network in $\Sigma_n(\sigma)$. From these, construct a neural network in $\Sigma_{n,m}(\sigma)$.
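The following sketch (NumPy, with hypothetical names of our own choosing) illustrates the construction step only and is not a substitute for the proof: it merges $m$ single-output networks in $\Sigma_n(\sigma)$, one per output coordinate, into a single network in $\Sigma_{n,m}(\sigma)$ by placing each scalar coefficient into the corresponding coordinate of a vector $\vec{\beta}_i$.

```python
import numpy as np

def merge_scalar_networks(scalar_nets, m):
    """Combine m scalar networks (one per output coordinate) into a single
    network in Sigma_{n,m}(sigma).

    scalar_nets[j] is a list of triples (beta, w, b) with beta a real number,
    describing a network in Sigma_n(sigma) that approximates the j-th
    coordinate function f_j.  Returns lists (betas, ws, bs) describing the
    merged multi-output network, compatible with sigma_nm_network above.
    """
    betas, ws, bs = [], [], []
    for j, net in enumerate(scalar_nets):
        for beta, w, b in net:
            beta_vec = np.zeros(m)
            beta_vec[j] = beta      # scalar coefficient goes into coordinate j
            betas.append(beta_vec)
            ws.append(np.asarray(w))
            bs.append(float(b))
    return betas, ws, bs
```

It remains for you to quantify how accurately each coordinate function must be approximated so that the $\rho_1$ error of the merged network stays below $\epsilon$ on all of $K$.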