
Multiplicative neural networks

To use the Stone-Weierstrass Theorem in his proof of the Universal Approximation Theorem, Kurt Hornik needed a family of functions that forms a unital algebra and separates points. Following his notation, let us define two families of functions:

Definition: Let \(\sigma: \mathbb{R} \rightarrow \mathbb{R}\) be a non-constant continuous function and write \(\Sigma^n(\sigma) \) for the family of functions \(s : \mathbb{R}^n \rightarrow \mathbb{R}\) that can be written in the following form: \[ s(\vec{x}) = \sum_i \beta_i \sigma(\vec{w}_i^t \vec{x} + b_i)\] where \(\beta_i \in \mathbb{R}, \) \(\vec{w}_i \in \mathbb{R}^n, \) and \( b_i \in \mathbb{R}\).
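To make the definition concrete, here is a minimal numpy sketch of evaluating an element of \(\Sigma^n(\sigma)\). The choice \(\sigma = \tanh\) and the names `sigma_net`, `W`, `b`, `beta` are illustrative assumptions, not part of Hornik's setup.

```python
import numpy as np

def sigma(z):
    return np.tanh(z)  # assumed activation; any non-constant continuous function works

def sigma_net(x, W, b, beta):
    """Evaluate s(x) = sum_i beta_i * sigma(w_i . x + b_i).

    W    : the vectors w_i stacked as rows, shape (k, n)
    b    : the biases b_i, shape (k,)
    beta : the output weights beta_i, shape (k,)
    """
    return beta @ sigma(W @ x + b)

# A random element of Sigma^3(tanh) with k = 4 summands:
rng = np.random.default_rng(0)
k, n = 4, 3
W, b, beta = rng.normal(size=(k, n)), rng.normal(size=k), rng.normal(size=k)
print(sigma_net(np.ones(n), W, b, beta))
```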

After a bit of work and diagramming, one can see that \(\Sigma^n(\sigma)\) is just the family of dense neural networks with one hidden layer, activation function \(\sigma\), and no activation function or bias term on the output neuron; this is exactly what the sketch above computes. Unfortunately, it is not easy to see whether \(\Sigma^n(\sigma)\) is a unital algebra. Instead, consider:

Definition: Let \(\sigma: \mathbb{R} \rightarrow \mathbb{R}\) be a non-constant continuous function and write \(\Sigma \Pi^n (\sigma) \) for the family of functions \(s : \mathbb{R}^n \rightarrow \mathbb{R}\) that can be written in the following form: \[ s(\vec{x}) = \sum_i \beta_i \Big(\prod_j \sigma(\vec{w}_{ij}^t \vec{x} + b_{ij}) \Big)\] where \(\beta_{i} \in \mathbb{R}, \) \(\vec{w}_{ij} \in \mathbb{R}^n, \) and \( b_{ij} \in \mathbb{R}\).

Note that \(\Sigma^n(\sigma) \subset \Sigma \Pi^n(\sigma)\), but the latter family is quite a bit richer, as it allows outputs of the hidden layer neurons to be multiplied together.
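Continuing the same illustrative sketch, an element of \(\Sigma \Pi^n(\sigma)\) might be evaluated as follows; the extra axis indexed by \(j\) holds the factors that get multiplied together.

```python
import numpy as np

def sigma(z):
    return np.tanh(z)  # same illustrative activation as before

def sigma_pi_net(x, W, b, beta):
    """Evaluate s(x) = sum_i beta_i * prod_j sigma(w_ij . x + b_ij).

    W    : the vectors w_ij, shape (k, m, n) -- k summands of m factors each
    b    : the biases b_ij, shape (k, m)
    beta : the output weights beta_i, shape (k,)
    """
    hidden = sigma(W @ x + b)          # shape (k, m): all the sigma factors
    return beta @ hidden.prod(axis=1)  # multiply the factors, then sum
```

Taking \(m = 1\) factor per summand recovers `sigma_net` from the earlier sketch, which is exactly the inclusion \(\Sigma^n(\sigma) \subset \Sigma \Pi^n(\sigma)\).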

Homework exercise:

Show that \(\Sigma \Pi^n (\sigma) \) forms an algebra, that it is unital, and that it separates points.

Hint: To show that this family separates points, it is enough to show that for any \(\vec{x} \neq \vec{y} \in \mathbb{R}^n\) there are \(\vec{w} \in \mathbb{R}^n\) and \(b \in \mathbb{R}\) so that \(\sigma(\vec{w}^t \vec{x} + b) \neq \sigma(\vec{w}^t \vec{y} + b)\). Recall that \(\vec{x} \neq \vec{y}\) iff they disagree in at least one coordinate.
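As a numerical sanity check of the hint, here is a small sketch, again with the assumed choice \(\sigma = \tanh\). Since \(\tanh\) is injective, any \(b\) works once \(\vec{w}^t \vec{x} \neq \vec{w}^t \vec{y}\), which taking \(\vec{w} = \vec{x} - \vec{y}\) guarantees.

```python
import numpy as np

def sigma(z):
    return np.tanh(z)  # illustrative: tanh is non-constant, continuous, and injective

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0])  # x != y: they disagree in the last coordinate

w, b = x - y, 0.0  # then w.x - w.y = ||x - y||^2 > 0, so w.x != w.y
assert sigma(w @ x + b) != sigma(w @ y + b)  # the points are separated
```

For a general non-constant continuous \(\sigma\) one has to work a little harder: pick \(u, v\) with \(\sigma(u) \neq \sigma(v)\), then rescale \(\vec{w}\) and shift \(b\) so that \(\vec{w}^t \vec{x} + b = u\) and \(\vec{w}^t \vec{y} + b = v\).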