Multivariable calculus: Handwritten digits

Computer vision and hand-written digit recognition #

written and developed by Thomas Pietraho

Background #

One of the oldest and most-studied problems in computer vision is the automated reading of human handwriting, and in particular, automated recognition of hand-written digits. The first implementations of machine ZIP code reading systems occurred in the 1980s, but the results were generally poor and suffered from high error rates. The late 1990s saw significant improvements and the error rates for the recognition of individual digits were reduced to around 1%. The short table below gives a few of the most recent results:

| Year | Author            | Error |
|------|-------------------|-------|
| 2013 | Goodfellow et al. | 0.50% |
| 2014 | Graham            | 0.30% |
| 2018 | Kowsari et al.    | 0.18% |
| 2019 | me (!)            | 0.20% |

Modern results are sufficiently good that the problem of handwritten digit recognition is considered solved, and current efforts concentrate on letter recognition (see, for instance, the EMNIST data set) and other more complex vision tasks. Our lab will examine how well simple feed-forward dense neural networks can perform when asked to identify a single digit.

Data set #

The MNIST database, short for Modified National Institute of Standards and Technology, was introduced by LeCun et al. in 1998. Since its introduction, this data set has been perhaps the most common testing ground for a variety of machine learning and pattern recognition algorithms. The data consists of 70,000 grey-scale images of handwritten digits, each represented as a 28x28 array of real numbers. Of these, 60,000 are in a training set and the remaining 10,000 are in a validation set. More than 250 writers, drawn from high-school students and census bureau employees, contributed to the data set. The training and validation sets were selected so that an individual writer appears in only one of the two sets.

Examples from the MNIST data set.
Images by Josef Steppan and Baldominos, Saez, and Isasi.

Central question #

In this lab, we will single out one digit, say a 7, and ask the following question:

Question: Does an image represent the digit 7?

We will use a Mathematica notebook to train neural networks and a shared spreadsheet to report and compare results. But before we start, we have to figure out how to think about this question more mathematically. We cover our approach in the next section. Read on.

Technical framework #

We would like to cast the above problem into the machine learning framework we have established. For us, a data set has the form of a collection of input-output pairs:

\[\mathcal{D} = \{(\vec{x}_i, y_i)\}_{i=1}^n \]

Given a data set, the challenge is then to find a function \(f\) so that \(f(\vec{x}_i) \approx y_i\). At the moment, however, what we actually have is a collection of 28x28 grey-scale images, each one representing some digit. But there is a nice way to make all of this more numerical. We can think of each image as a list of 784 pixels, where shade is represented by a number between 0 and 1. So mathematically, an image in our data set is just a vector in 784 dimensions: let’s call the \(i\)th one \(\vec{x}_i\). The corresponding output is a “yes” or “no,” answering the question of whether the digit is indeed a 7. To make things numerical, we will let \(y_i\) equal 1 if the answer is “yes,” and 0 otherwise. Now our data is ready for mathematical analysis.
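As a concrete illustration, here is a small Python sketch (using NumPy, with a randomly generated stand-in for a real MNIST image) of how one image and its label become an input-output pair \((\vec{x}_i, y_i)\). The particular digit values below are hypothetical, chosen only for the example:

```python
import numpy as np

# Stand-in for one MNIST image: a 28x28 array of shades in [0, 1].
# (A real image would come from the data set; this one is random.)
rng = np.random.default_rng(0)
image = rng.random((28, 28))

# Flatten the 28x28 grid into a single vector in 784 dimensions.
x_i = image.reshape(-1)
print(x_i.shape)  # (784,)

# The label: 1 if this image depicts our chosen digit (say, a 7), 0 otherwise.
actual_digit = 3  # hypothetical: whatever digit the image really shows
y_i = 1 if actual_digit == 7 else 0
print(y_i)  # 0
```

A neural network trained on pairs like \((\vec{x}_i, y_i)\) then maps 784 inputs to a single output approximating the 0-or-1 label.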

The goal of the lab is to find a neural network that fits this data. The Mathematica notebook will serve as a guide. The particulars of building and training a neural network are in the notebook itself, so follow its lead. While there are many possible directions to explore, we will focus on the following questions:

Question 1: Finding the best classifier. First, let's figure out how to build the best digit classifier, at least with the mathematical tools currently at our disposal. Choose your favorite digit and use our Mathematica notebook to train a neural network to recognize it from among the MNIST images. Examine the effect of the number of hidden layers, the number of neurons in each, the activation functions, and the optimization method on the accuracy of a trained neural network on our validation set. What is the highest level of accuracy you can achieve? Report it on the class spreadsheet.
Question 2: What is the impact of training set size? MNIST is a fairly large data set, and for many practical problems much less data is available. Determine the impact of training set size on classifier accuracy by reducing the training set size several times; each time, train the best classifier you can. You can use Mathematica to plot validation accuracy versus training set size: see the examples in the documentation for ListPlot.
Question 3: Are some digits easier to identify than others? Use the collective class data to support your answer, or design your own experiment and carry it out.
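The structure of the training-set-size experiment can be sketched in code. The Python example below is a toy illustration only: it uses a synthetic two-class data set and a simple nearest-centroid classifier in place of MNIST and a neural network, so only the shape of the experiment carries over, not the numbers. All names and parameters here are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Synthetic two-class data: Gaussian clouds around (0,0) and (2,2)."""
    half = n // 2
    x0 = rng.normal(loc=0.0, scale=1.0, size=(half, 2))
    x1 = rng.normal(loc=2.0, scale=1.0, size=(half, 2))
    x = np.vstack([x0, x1])
    y = np.array([0] * half + [1] * half)
    return x, y

def nearest_centroid_accuracy(x_train, y_train, x_val, y_val):
    """Fit a nearest-centroid classifier and return validation accuracy."""
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(x_val - c0, axis=1)
    d1 = np.linalg.norm(x_val - c1, axis=1)
    preds = (d1 < d0).astype(int)
    return (preds == y_val).mean()

x_val, y_val = make_data(1000)       # fixed validation set
sizes = [10, 50, 100, 500, 2000]     # training set sizes to compare
accuracies = []
for n in sizes:
    x_train, y_train = make_data(n)  # a fresh training set of size n
    accuracies.append(nearest_centroid_accuracy(x_train, y_train, x_val, y_val))

for n, acc in zip(sizes, accuracies):
    print(f"training size {n:5d}: validation accuracy {acc:.3f}")
```

In the lab, the analogous loop would retrain your best network on each reduced training set and record the resulting validation accuracies, which you can then plot against training set size.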

Summarize your results #

For your notes, summarize your answers to the questions above. Briefly describe the choices you made, the neural network architectures that gave you the best results, and give supporting evidence for any conclusions.