The data manifold

Unfolding the data manifold

Background

In the final part of the Faces and dimension reduction lab, you trained a neural network to distinguish among the faces of a set of individuals. You used the output of one of the intermediate dense layers as a form of non-linear dimension reduction and, perhaps not unexpectedly, found evidence that the training data embedded in this way was nicely clustered. In other words, you found evidence that the network had learned how to unfold the data manifold. But a question remains:

Question: Does this result extend to images of individuals who were not present in the data used to train the neural network?

More precisely, has the network really learned to unfold the data manifold, or has it merely learned how to distinguish among the faces in its training data?
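To make concrete what "the output of an intermediate dense layer" means, here is a minimal NumPy sketch of a toy classifier. The layer sizes, the random weights, and the names `embed` and `classify` are all assumptions chosen for illustration; they are not the architecture from the lab, and no training takes place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64-pixel flattened images, a 16-unit hidden layer,
# an 8-unit embedding layer, and 5 individuals in the training set.
n_pixels, n_hidden, n_embed, n_classes = 64, 16, 8, 5

# Random weights stand in for trained ones; this only illustrates the data flow.
W1, b1 = rng.normal(size=(n_pixels, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_embed)), np.zeros(n_embed)
W3, b3 = rng.normal(size=(n_embed, n_classes)), np.zeros(n_classes)


def relu(x):
    return np.maximum(x, 0.0)


def embed(images):
    """Return the activations of the intermediate dense layer (the embedding)."""
    hidden = relu(images @ W1 + b1)
    return relu(hidden @ W2 + b2)


def classify(images):
    """Return class probabilities, computed from the embedding via softmax."""
    logits = embed(images) @ W3 + b3
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


images = rng.normal(size=(10, n_pixels))  # 10 fake flattened images
embedding = embed(images)
probs = classify(images)
print(embedding.shape, probs.shape)  # (10, 8) (10, 5)
```

Note that `embed` never refers to the number of classes. That is why the same embedding can be applied to images of individuals the classifier was never trained to recognize, which is what the question above relies on.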

Procedure

The idea for this experiment is simple. After training a neural network as above, you will look at two individuals not included in the training data and examine whether their embedded images also cluster well.
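One way to find suitable held-out individuals is to count the images per person and exclude anyone used for training. The sketch below assumes you have a list with one name per image; the helper `held_out_candidates`, the placeholder names, and the default threshold of 12 (standing in for "a dozen or so") are illustrative assumptions.

```python
from collections import Counter


def held_out_candidates(all_names, training_names, min_images=12):
    """List individuals absent from training who have at least min_images images.

    all_names: one entry per image in the full data set (the person's name).
    training_names: names of the individuals used to train the network.
    """
    counts = Counter(all_names)
    trained = set(training_names)
    return sorted(
        name
        for name, n in counts.items()
        if n >= min_images and name not in trained
    )


# Made-up labels, purely for illustration.
all_names = (
    ["person_a"] * 15 + ["person_b"] * 20 + ["person_c"] * 3 + ["person_d"] * 12
)
candidates = held_out_candidates(all_names, training_names=["person_a"])
print(candidates)  # ['person_b', 'person_d']
```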

1. Choose a set of individuals and use images of their faces to train a classifier neural network.
2. Compare the inter- and intra-cluster distances for pairs of individuals in a dense embedding layer and assess whether you see evidence of good clustering in this low-dimensional embedding.
3. Now choose two individuals from the Labeled Faces in the Wild data set who were not in the data used to train the network and compare the inter- and intra-cluster distances for them on the same embedding layer. Make sure you choose individuals who have at least a dozen or so images available.
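The distance comparison in these steps could be sketched as follows. This is only one reasonable reading: it assumes Euclidean distance and summarizes each cluster by a mean, and the lab may intend a different metric or summary. The two "individuals" are synthetic Gaussian point clouds standing in for embedded images, and all function names are hypothetical.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_intra_distance(x):
    """Mean distance between distinct points of a single cluster."""
    d = pairwise_distances(x, x)
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once, no self-pairs
    return d[upper].mean()


def mean_inter_distance(x, y):
    """Mean distance between points of two different clusters."""
    return pairwise_distances(x, y).mean()


# Synthetic stand-ins for the embedded images of two individuals
# (15 images each, 8-dimensional embedding); replace with real embeddings.
rng = np.random.default_rng(0)
person_a = rng.normal(loc=0.0, scale=1.0, size=(15, 8))
person_b = rng.normal(loc=5.0, scale=1.0, size=(15, 8))

intra_a = mean_intra_distance(person_a)
intra_b = mean_intra_distance(person_b)
inter = mean_inter_distance(person_a, person_b)

# A ratio well above 1 suggests the two clusters are well separated.
separation = inter / max(intra_a, intra_b)
print(f"intra A: {intra_a:.2f}, intra B: {intra_b:.2f}, inter: {inter:.2f}")
print(f"separation ratio: {separation:.2f}")
```

Running the same functions on a pair of training individuals and on a pair of held-out individuals gives directly comparable numbers for the two cases.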

Homework exercise: Briefly detail your findings. Include images you produced and assess whether you have found evidence that your neural network embedding has really unfolded the data manifold in a way that extends to individuals not present in the training data.