Machine learning: Convolutional layers

Convolutional layers #

Recall the notion of a convolutional neural layer \(c: \mathbb{R}^{n_1\times k_1} \rightarrow \mathbb{R}^{n_2 \times k_2}\) from class. It establishes a set of connections between a layer with \(k_1\) channels, each of dimension \(n_1\), and a layer with \(k_2\) channels, each of dimension \(n_2\). There are a few excellent references which supplement your class notes:

  • notes from Stanford’s CS231, and
  • Chapter 9 of Deep Learning by Goodfellow, Bengio, and Courville. The latter is available in hardcover from the math department’s library in Searles 214.
  • a blog about convolutions.
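
To make the shape bookkeeping in this definition concrete, here is a minimal sketch of a single convolutional layer, written in PyTorch. The sizes \(k_1 = 3\), \(k_2 = 8\), \(n_1 = 28\), the filter width 5, and the ReLU activation are all hypothetical choices, not anything forced by the definition; with no padding the output dimension works out to \(n_2 = 24\).

```python
import torch
import torch.nn as nn

# A single convolutional layer c: R^{n1 x k1} -> R^{n2 x k2}.
# Hypothetical sizes: k1 = 3 input channels of dimension n1 = 28,
# k2 = 8 output channels, a width-5 filter, no padding, so n2 = 24.
k1, k2, n1, filter_width = 3, 8, 28, 5

c = nn.Sequential(
    nn.Conv1d(in_channels=k1, out_channels=k2, kernel_size=filter_width),
    nn.ReLU(),  # the activation sigma
)

x = torch.randn(1, k1, n1)  # a batch containing one input: k1 channels of dimension n1
y = c(x)
print(y.shape)              # torch.Size([1, 8, 24]): k2 channels of dimension n2 = 24
```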

An explicit formula for a convolutional layer #

If you ignore all the architectural details of a convolutional layer, it is simply a function \[ c: \mathbb{R}^{n_1 k_1} \rightarrow \mathbb{R}^{n_2 k_2}\] that can be expressed in the form \(c(\vec{x}) = \sigma(A \vec{x} + \vec{b}). \) In particular, all our prior work on dense neural networks, such as Jacobian and gradient computations, still applies.
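
As a quick sanity check on this claim, here is a small sketch in NumPy for the simplest case of one input channel and one output channel. The width-3 filter, the shared bias, and the ReLU activation are hypothetical choices; the point is only that the banded matrix \(A\) built from the filter reproduces the convolution.

```python
import numpy as np

# Hypothetical sizes: one input channel of dimension n1 = 6, a width-3 filter,
# "valid" convolution, so one output channel of dimension n2 = 4.
n1, m = 6, 3
n2 = n1 - m + 1
w = np.array([1.0, -2.0, 0.5])        # the filter
b = 0.1 * np.ones(n2)                 # a shared bias
sigma = lambda z: np.maximum(z, 0.0)  # ReLU activation

# The matrix A of the affine map: each row is the filter, shifted over by one slot.
A = np.zeros((n2, n1))
for i in range(n2):
    A[i, i:i + m] = w

x = np.random.randn(n1)

# The layer computed directly as a (cross-)correlation of x with the filter ...
direct = sigma(np.correlate(x, w, mode="valid") + b)
# ... agrees with the dense-layer formula sigma(A x + b).
matrix_form = sigma(A @ x + b)
print(np.allclose(direct, matrix_form))  # True
```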

Homework exercise: Write down an expression for the matrix \(A\). For simplicity, assume that \[c: \mathbb{R}^{4 \times 2} \rightarrow \mathbb{R}^{2 \times 2}\] and each neuron in the range is connected with non-zero weights to exactly three neurons in the domain.

A universal approximation theorem for CNNs #

A convolutional neural network, or simply a CNN, \[\mathcal{N}: \mathbb{R}^n \rightarrow \mathbb{R}^m\] is a composition of convolutional neural layers as above. In practice, the term CNN is often used to refer to neural networks built from a mix of convolutional and dense layers, but let us ignore this generalization for the time being. Even though convolutional neural networks are very specific types of neural networks, they are in fact universal approximators! More precisely:
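For example, a CNN in this narrow sense is just a chain of such layers. Here is a minimal sketch in PyTorch; the channel counts and filter widths are hypothetical choices.

```python
import torch
import torch.nn as nn

# A CNN in the narrow sense used here: a composition of convolutional layers only.
N = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=4, kernel_size=5),  # first convolutional layer
    nn.ReLU(),
    nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3),  # second convolutional layer
    nn.ReLU(),
    nn.Conv1d(in_channels=8, out_channels=1, kernel_size=3),  # final convolutional layer
)

x = torch.randn(1, 1, 32)  # one input channel of dimension 32
print(N(x).shape)          # torch.Size([1, 1, 24]) after three "valid" convolutions
```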

Theorem: If \(K \) is a compact subset of \(\mathbb{R}^n\) and \(f: K \rightarrow \mathbb{R}^m\) is a continuous function, then for every \(\epsilon >0\) there is a convolutional neural network \(\mathcal{N}\) so that \[ \sup_{x \in K} ||f(x) - \mathcal{N}(x)|| < \epsilon .\]

There are a variety of results in the literature about the universality of convolutional and invariant neural networks; these are substantial mathematical achievements. The next problem is a little more reasonable.

Homework exercise:

Prove this universal approximation theorem for convolutional neural networks by first proving the following lemma about a single dense layer.

Lemma: Every dense neural layer \(d: \mathbb{R}^n \rightarrow \mathbb{R}^m\) can be written as a composition of two convolutional neural layers \[c_1: \mathbb{R}^n \rightarrow \mathbb{R}^{n \times m} \; \text{ and } \; c_2: \mathbb{R}^{n \times m} \rightarrow \mathbb{R}^{m}.\]
Hint: You will have to find an appropriate choice of filters for both convolutional layers and show that they combine to form the desired dense layer. Start by working through small examples to get an intuition for how to proceed. You can use the matrix formulation from the problem above, or a graph-based formulation of the problem such as the one from class. And don’t forget to finish by explaining how this lemma implies the theorem.
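
While experimenting, a numerical check can be reassuring. Here is a small sketch in NumPy of a hypothetical helper, agree_on_random_inputs, that compares a dense layer \(d\) against a candidate composition \(c_2 \circ c_1\) on random inputs. As written it only sanity-checks \(d\) against itself; the intent is that you substitute your own construction of the two convolutional layers. Agreement on random inputs is evidence, not a proof.

```python
import numpy as np

def agree_on_random_inputs(d, composed, n, trials=100, tol=1e-8):
    """Numerically check whether d(x) and composed(x) agree on random x in R^n."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x = rng.standard_normal(n)
        if not np.allclose(d(x), composed(x), atol=tol):
            return False
    return True

# A hypothetical dense layer d: R^3 -> R^2 with ReLU activation.
W = np.array([[1.0, -1.0, 2.0],
              [0.5,  0.0, -3.0]])
b = np.array([0.1, -0.2])
d = lambda x: np.maximum(W @ x + b, 0.0)

# Replace the right-hand side with your own composition, e.g. lambda x: c2(c1(x)),
# once you have chosen the filters for the two convolutional layers.
composed = d
print(agree_on_random_inputs(d, composed, n=3))  # True
```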