# MNIST digits and the perceptron
By now you are familiar with the MNIST data set of handwritten digits. We used logistic regression and gradient descent to learn a model that distinguishes pairs of digits. Can we do the same, this time using the perceptron?
Homework exercise:
Follow the outline of our original MNIST lab to train a digit-recognition perceptron. The Colab notebook contains some useful code, although you will have to fill in some of the details; a minimal sketch of the basic training loop also appears after the questions below. In your write-up, address the following questions:
- How does the size of the training set affect validation accuracy? Report your results in a table and repeat your experiment for at least two pairs of digits.
- What happens when the perceptron finds a separating hyperplane for the training data early in training, while your validation accuracy is not yet good enough? Describe a possible modification to the perceptron algorithm that would allow you to keep learning longer.
- Explain why classification accuracy, rather than MSE or logistic loss, is the appropriate measure of the quality of the perceptron.
- Does the data set \(\mathcal{D}\) actually need to be linearly separable for the perceptron algorithm to be useful?
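To fix ideas, here is a minimal sketch of the perceptron training loop on a pair of MNIST digits. It is not the notebook's scaffolding: it assumes the data is fetched with scikit-learn's `fetch_openml`, and the digit pair (3 vs. 5) and epoch count are illustrative choices, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load MNIST and keep two digit classes (3 vs. 5 here, chosen arbitrarily).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
mask = (y == "3") | (y == "5")
X = X[mask] / 255.0                      # scale pixels to [0, 1]
y = np.where(y[mask] == "3", 1, -1)      # perceptron labels in {+1, -1}

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Append a constant feature so the bias is absorbed into the weight vector.
X_train = np.hstack([X_train, np.ones((len(X_train), 1))])
X_val = np.hstack([X_val, np.ones((len(X_val), 1))])

w = np.zeros(X_train.shape[1])
for epoch in range(50):                  # epoch budget is an arbitrary choice
    mistakes = 0
    for x_i, y_i in zip(X_train, y_train):
        if y_i * (w @ x_i) <= 0:         # misclassified (or on the boundary)
            w += y_i * x_i               # the perceptron update
            mistakes += 1
    if mistakes == 0:                    # training data separated: no more updates
        break

val_acc = np.mean(np.sign(X_val @ w) == y_val)
print(f"stopped after epoch {epoch}, validation accuracy {val_acc:.3f}")
```

Note how the loop stops as soon as an epoch produces no mistakes, which is exactly the situation the second question asks about; for the first question, one approach is to rerun this sketch on subsamples of `X_train` of increasing size.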