Machine learning: k-means clusters

\(k\)-means cluster centers #

Background #

In the Faces and dimension reduction lab, you used the singular value decomposition to compute a low-rank approximation for a data set of images, replacing vectors \(\{\vec{x}_i\} \subset \mathbb{R}^{2914}\) with vectors \(\{\vec{c}_i\} \subset \mathbb{R}^m\) for some value of \(m\). Following convention, we will call \(\mathbb{R}^{2914}\) image space and \(\mathbb{R}^m\) feature space. The \(\vec{x}_i\) were encoded as the columns of a matrix \(X\) and the \(\vec{c}_i\) as the columns of a matrix \(C\). The following question is central:

Question: Does clustering work better in feature space than in image space? That is, does this dimension reduction technique yield better clusters?

We followed one approach to answer this question in the lab. This problem asks you to evaluate another.
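For concreteness, the sketches below assume one standard convention for this reduction, which you should check against your own lab code: writing the truncated SVD of \(X\) as

\[
X = U \Sigma V^T \approx U_m \Sigma_m V_m^T,
\]

where \(U_m\) consists of the first \(m\) columns of \(U\), the feature-space coordinates are \(\vec{c}_i = U_m^T \vec{x}_i\), or equivalently \(C = U_m^T X\).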

Cluster centers #

Start by choosing a set of images, encode them as columns of a matrix \(X\), and compute the low-rank approximation \(C\) as detailed above.
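The following is a minimal sketch of this setup, assuming the lab used scikit-learn's LFW faces (whose flattened images have \(2914\) pixels at the default resolution) and the convention \(C = U_m^T X\) from above; adapt the names and the value of \(m\) to your own lab code:

    import numpy as np
    from sklearn.datasets import fetch_lfw_people

    # One image per COLUMN of X, matching the lab's convention.
    faces = fetch_lfw_people(min_faces_per_person=50)  # images are 62 x 47 = 2914 pixels
    X = faces.data.T                                   # shape (2914, n_images)

    # Thin SVD; U[:, :m] holds the top m left singular vectors.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    m = 30                                             # feature-space dimension; tune as in the lab
    C = U[:, :m].T @ X                                 # shape (m, n_images): one feature vector per column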

  • Use the \(k\)-means algorithm to construct a reasonable number of clusters in feature space:
    from sklearn.cluster import KMeans

    # scikit-learn clusters the ROWS of its input; since our feature
    # vectors are the columns of C, we fit on the transpose.
    kmeans = KMeans(n_clusters=17).fit(C.T)
    centers = kmeans.cluster_centers_   # one row per cluster center, in feature space

    This follows the usual scikit-learn convention: the vectors being clustered are the rows of the array, which is why we pass the transpose of \(C\).
  • You will now have an array of cluster centers, each a vector in feature space. Because these centers live in feature space, they are difficult to interpret directly.
  • Use the singular value decomposition you computed above to recast each center as a vector \(\vec{y}\) in image space; a sketch follows this list. To know what to do, you will have to understand the SVD of \(X\) carefully.
  • Finally, plot \(\vec{y}\), mimicking the plotting commands from the lab.
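Here is a sketch of these last two steps, again under the assumed convention \(C = U_m^T X\): since \(U_m\) has orthonormal columns, \(\vec{y} = U_m \vec{c}\) lifts a feature-space center \(\vec{c}\) back to image space. The \(62 \times 47\) image shape is the LFW assumption inherited from the setup sketch above:

    import matplotlib.pyplot as plt

    # Each row of `centers` is a cluster center in feature space.
    for j, c in enumerate(centers):
        y = U[:, :m] @ c                         # recast the center in image space
        plt.subplot(3, 6, j + 1)                 # a grid large enough for 17 centers
        plt.imshow(y.reshape(62, 47), cmap="gray")   # assumed LFW image shape
        plt.axis("off")
    plt.show()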
Homework exercise #

Briefly summarize your work. In particular, detail how you computed the vector \(\vec{y}\) for each cluster center, submit the corresponding images, and suggest an answer to the question above based on your observations.

Finally, note that I asked you to use linear dimension reduction. What is the difficulty in repeating this experiment using non-linear, neural-network dimension reduction?