Machine learning: JL Lemma

Dimension reduction via the Johnson-Lindenstrauss Lemma #

We will discuss a number of dimension reduction techniques. One of my favorite theorems in this area is due to William Johnson and Joram Lindenstrauss; it states that if you are willing to change distances a little bit, any data set can have its dimension substantially reduced. Here is their lemma, in its full glory:

Lemma: Suppose \(\epsilon \in (0,1)\) and that \(\mathcal{D} \subset \mathbb{R}^k\) consists of \(n\) points. If \(d > 8 \cdot \tfrac{\ln n}{\epsilon^2}\), then there is a linear function \(f: \mathbb{R}^k \rightarrow \mathbb{R}^d\) such that \[(1-\epsilon)\, \rho(u,v)^2 \leq \rho(f(u),f(v))^2 \leq (1+\epsilon)\, \rho(u,v)^2 \] for all \(u, v \in \mathcal{D}\), where \(\rho\) is the usual Euclidean metric.

Note: Functions with bounds like those above are known as bi-Lipschitz.
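
To get a feel for the bound, take an illustrative (made-up) example: a data set of \(n = 10^6\) points with distortion tolerance \(\epsilon = 0.1\) needs a target dimension of only \[ d > \frac{8 \ln 10^6}{0.1^2} \approx 11{,}052. \]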

In plain English, the lemma states that an intrepid data scientist can always embed a data set in a low-dimensional space, and as such this lemma is often considered a partial cure for the curse of dimensionality in machine learning. The price is that squared distances between points may be distorted by a factor of up to \(1 \pm \epsilon\). The target dimension is logarithmic in \(n\), the number of points, and does not depend on the original dimension \(k\) at all; moreover, the transformation between the two spaces is linear. In fact, the original proof constructed \(f\) as an orthogonal projection.
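
Since the construction is so simple, here is a minimal sketch of one standard way to realize such an \(f\): a random Gaussian projection. This is not the notebook's code; the toy data set, the value of \(\epsilon\), and the use of Gaussian entries instead of an exact orthogonal projection are assumptions made purely for illustration.

```python
# A minimal sketch of a Johnson-Lindenstrauss style random projection.
# The toy data, eps, and the Gaussian construction are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

n, k = 1000, 5000                           # number of points, original dimension
eps = 0.25                                  # allowed distortion
d = int(np.ceil(8 * np.log(n) / eps**2))    # target dimension from the lemma

X = rng.normal(size=(n, k))                 # a toy data set D sitting in R^k

# The "one line": a k x d matrix with i.i.d. N(0, 1/d) entries defines f(u) = uP.
P = rng.normal(scale=1 / np.sqrt(d), size=(k, d))

orig = pdist(X) ** 2                        # squared pairwise distances in R^k
proj = pdist(X @ P) ** 2                    # squared pairwise distances in R^d

ratio = proj / orig
# With high probability every ratio lands in [1 - eps, 1 + eps].
print(d, ratio.min(), ratio.max())
```

The guarantee behind this sketch is probabilistic: a single random draw of \(P\) satisfies the bounds with high probability, and since that probability is positive, a suitable \(f\) must exist, which is exactly what the lemma asserts.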

Exercise: Try your hand at the following notebook. It constructs the projection \(f\) and verifies that the distances vary only by the predicted amounts. If you read the code, you will see how amazingly simple this projection is to construct (one line!). The notebook also suggests there are times when you would be better off doing something else.