Operator norms #
Background #
Mathematics is well-known for misappropriating concepts intended for something else. For instance, when Weierstraß used the convolution product in a proof of an eponymous theorem, he probably did not anticipate that one day convolutions would be used as feature detectors in data. The goal of this sequence of exercises is to misappropriate the operator norm of a matrix, a notion first used in functional analysis.
We often encode data as feature vectors which become rows or columns of some matrix \(A\). Sometimes we will want to to replace \(A\) by another matrix \(A’\) that is simpler: perhaps the data it encodes should be less noisy, or be of a lower dimension. In either case, \(A’\) should be close to \(A\), or in the language of analysis,
\[ \rho(A,A’) \approx 0 \]
for some choice of metric \(\rho\) defined on matrices. But how should we define \(\rho\)?
The action of a matrix #
One way of thinking about an \(m \times n\) matrix \(A\) is to unfold it into a long vector \(\tilde{A}\) in \(\mathbb{R}^{nm}\). Then there is a simple way of defining a norm for \(A\)! Pick your favorite norm \(||\cdot||\) on \(\mathbb{R}^{nm}\) and define \(||A||\) to equal \(||\tilde{A}||\). We have seen in a previous exercise that a norm lets us define a metric, which is what we wanted. This is a totally reasonable thing to do, and such uninspired thinking leads to:
But there is a different way to proceed. Let us think of an \(m \times n\) matrix as a function \(A: \mathbb{R}^n \rightarrow \mathbb{R}^m\). As such both inputs and outputs are vectors and you can think of \(A\) as somehow transforming one vector space into the other. The main idea before another way of defining a norm of \(A\) is to focus on what \(A\) does to these vectors, and in particular, their lengths. Some vectors may get mapped to longer vectors and some to shorter ones. But:
The answer is yes and you will be asked to prove this in an exercise below. It enables us to make the following definition:
Let \(A\) be an \(m\times n\) matrix and endow both \(\mathbb{R}^n \) and \(\mathbb{R}^m \) with the usual 2-norm \(||\cdot ||_2\). The operator norm \(||\cdot||_2\) of a matrix \(A\) is defined by
\[ ||A||_2 = \max_{x \in \mathbb{R}^n} \frac{||Ax||_2}{||x||_2}\]
It measures the maximum amount of stretching done by \(A\) as it transforms the vector space \(\mathbb{R}^n\) into \(\mathbb{R}^m \). This definition can be generalized. The choice of \(p=2\) is arbitrary, and we can define more general operator norms \(||\cdot||_p\) and \(||\cdot||_{p,q}\) in a similar manner. The latter uses the \(p\)-norm on \(\mathbb{R}^n \) and the \(q\)-norm on \(\mathbb{R}^m \).
Show that our definition of an operator norm makes sense by showing that the maximum \[\max_{x \in \mathbb{R}^n} \frac{||Ax||_2}{||x||_2}\]
exists.
Hint
Use the following steps in your solution:
-
Use the properties of a matrix as a linear operator to show that for every \(x \in \mathbb{R}^n\) there exists a \(y \in \mathbb{R}^n\) of unit length so that: \[\frac{||Ax||}{||x||} = \frac{||Ay||}{||y||} \] Explain why this means that we need only to consider \(x\) with \(||x|| = 1\).
-
Show that \(f(x) = Ax\) and \(g(x) = ||Ax||\) are continuous functions with domain \(\mathbb{R}^n\).
-
Show that the unit sphere defined using the norm \(|| \cdot ||\) in \(\mathbb{R}^n\) is compact.
-
Explain why \(\frac{||Ax||}{||x||}\) attains a maximum value on \(\mathbb{R}^n\).