\(k\)-means cluster centers #
Background #
In the Faces and dimension reduction lab, you used the singular value decomposition to compute a low-rank approximation for a data set of images, replacing vectors \(\{\vec{x}_i\} \subset \mathbb{R}^{2914}\) with vectors \(\{\vec{c}_i\} \subset \mathbb{R}^m \) for some value of \(m\). Following convention, we will call \(\mathbb{R}^{2914}\) image space and \(\mathbb{R}^m \) feature space. The \(\vec{x}_i\) were encoded as the columns of a matrix \(X\) and the \(\vec{c}_i \) as the columns of a matrix \(C\). The following question is central:
We followed one approach to answer this question in the lab. This problem asks you to evaluate another.
Cluster centers #
Start by choosing a set of images, encode them as columns of a matrix \(X\), and compute the low-rank approximation \(C\) as detailed above.
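The feature matrix \(C\) can be computed as in the lab. The following is a minimal sketch, assuming (as in the lab) that the images are the columns of \(X\) and that the features are the coordinates of each column with respect to the first \(m\) left singular vectors; the random stand-in data and the choice \(m = 50\) are placeholders, not part of the assignment.

```python
import numpy as np

# Stand-in data: in the lab, X is a 2914 x n array whose columns are
# the image vectors x_i. Here random data substitutes for the images.
rng = np.random.default_rng(0)
X = rng.standard_normal((2914, 200))
m = 50  # number of retained singular vectors (choose as in the lab)

# Truncated SVD: X is approximately U_m @ diag(s[:m]) @ Vt[:m]
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_m = U[:, :m]

# Feature-space coordinates: each column c_i = U_m^T x_i, so C is m x n.
C = U_m.T @ X
print(C.shape)  # (50, 200)
```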
- Use the \(k\)-means algorithm to construct a reasonable number of clusters in feature space:
The scikit-learn implementation follows the usual Python convention: the vectors being clustered are the rows of the array. Take the transpose if appropriate.
```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=17).fit(C)
centers = kmeans.cluster_centers_
```
- You will now have an array of cluster centers, each a vector in feature space. Because they live in feature space, they are difficult to interpret directly.
- Use the singular value decomposition you computed above to recast each center as a vector \(\vec{y}\) in image space. In order to know what to do, you will have to carefully understand the SVD of \(X\).
- Finally, plot \(\vec{y}\). Mimic the plotting commands from the lab.
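Putting the steps above together, here is one possible sketch of the whole pipeline. It assumes the feature coordinates are \(C = U_m^{\mathsf T} X\), so that a center \(\vec{c}\) maps back to image space via \(\vec{y} = U_m \vec{c}\); the stand-in data, the \(62 \times 47\) image shape, and the parameter choices are illustrative assumptions, and working out the correct mapping from the SVD is the point of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: in the lab, X holds the face images as columns.
rng = np.random.default_rng(1)
X = rng.standard_normal((2914, 200))
m = 50
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_m = U[:, :m]
C = U_m.T @ X  # feature-space coordinates, one column per image

# KMeans expects samples as rows, so cluster the columns of C via C.T.
kmeans = KMeans(n_clusters=17, n_init=10, random_state=0).fit(C.T)

# Each center is a vector in feature space; U_m maps it back to image space.
for center in kmeans.cluster_centers_:
    y = U_m @ center          # vector in R^2914
    img = y.reshape(62, 47)   # assuming the lab's 62 x 47 images (62*47 = 2914)
    # plt.imshow(img, cmap="gray"); plt.show()  # plot as in the lab
```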
Briefly summarize your work. In particular, detail how you computed the vector \(\vec{y}\) for each cluster center, submit the corresponding images, and suggest an answer to the question above based on your observations.
Finally, note that I asked you to use linear dimension reduction. What difficulty would you face in repeating this experiment using non-linear neural-network dimension reduction?