The problem of handwritten digit recognition has been used as a baseline for many AI models. It was among the first real-life problems solved by a neural network, and with great accuracy (~99%).
For the purposes of this post we will use the famous MNIST dataset, containing around 70,000 28×28 images of handwritten digits, created by more than 500 different people.
You need a basic understanding of artificial neural networks before we start. Check our previous post for a basic introduction to neural networks.

### Before we start

You will need to install the following items on your machine before we start:

• Python 3+: install it from the official web site
• TensorFlow: the easiest way is to install it with pip, but you can use any other method. It doesn’t really matter whether you install GPU support, but if you do, make sure your graphics card supports it and install the appropriate CUDA driver before installing TensorFlow. In this example I will be using the CPU version of TensorFlow, which should run everywhere.
• We will also be using the following Python packages: numpy, keras, PIL, matplotlib, mnist

Next, download the MNIST dataset. Download everything: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz. We will need both the train and the test data with the corresponding labels.
The MNIST dataset consists of images of size 28×28 pixels converted to arrays, ready to be passed as input to neural networks.

### Some clarifications

TensorFlow is a framework for efficient calculations with matrices. It is organized as a computational graph: every parameter inside the graph is a matrix, and on each connection between the nodes an operation is performed. In a way, the matrices flow through the graph. A tensor is a generalized matrix, which can also be a vector, a single number, or even empty, and this inspired the name tensor-flow. The officially supported language for TensorFlow is Python, but it can be used with other languages as well.

Keras is a Python library which greatly simplifies the way we interact with TensorFlow. Because TensorFlow is a general-purpose computational framework, it provides a wide set of features not necessarily needed for building neural networks. Keras gives us a nice API that calls only those portions of TensorFlow that are applicable for AI.

### Building the neural network

First we need to determine how we are going to pass the images as input to the neural network. That is already solved in the MNIST dataset. Every image is scaled to 28×28 pixels, colors are converted to grayscale, and every picture is converted to an array of pixels, where the value of a pixel is 0 for white and 255 for black. Everything in between is a shade of gray (in this case 254 shades of gray, not 50 as most people would suggest). At the end we have an array of 784 elements (28 × 28 = 784), and each element represents a pixel in the image.

The output again needs to be an array of numbers. We can represent any digit with the so-called one hot encoding. We have 10 digits, so we create an array of length 10 for each digit. The array is all zeroes except at the index of the digit, where we put 1.
This will give us the following encoding:
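A minimal sketch of this encoding in numpy (the helper name `one_hot` is our own):

```python
import numpy as np

# One hot encode a digit: a length-10 array of zeroes with a 1
# at the index of the digit.
def one_hot(digit):
    encoding = np.zeros(10)
    encoding[digit] = 1
    return encoding

# For example, the digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
```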

Creating an artificial neural network architecture is not an easy task. This is why we chose one of the popular architectures found online. We have our own attempts at architectures, but let's not bother with them for now :).

The input layer will have 784 nodes, because the image pixel array is that long. The output layer has 10 nodes, because that is how we encoded the digits.
Now the hidden layers: we will use 3 fully connected (dense) layers, of sizes 512, 256 and 128 respectively.

A rough representation of how the network would look. The actual node count is different, but it would be hard to visualize all the actual nodes.
Now to implement this using Keras:
We use the Sequential model, which is a linear stack of layers.

We add the input layer, with input size 784 and 512 nodes. The explicit definition of the input layer is actually omitted in Keras: when we define the first hidden layer and provide the ‘input_dim‘ parameter, the input layer is automatically created behind the scenes.
For the hidden layers we will use the ‘ReLU‘ activation function.

Next, the actual hidden layers.

Finally, the output layer.
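Putting the steps above together might look like this (a sketch assuming the standalone `keras` package; `tensorflow.keras` works the same way):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

# First hidden layer: 512 nodes, ReLU activation. The 784-node input
# layer is created implicitly from `input_dim`.
model.add(Dense(512, activation='relu', input_dim=784))

# The remaining hidden layers.
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))

# Output layer: 10 nodes, one per digit.
model.add(Dense(10, activation='sigmoid'))
```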

We used ‘sigmoid‘ activation, because this way the neural net performed slightly better.

We need to ‘compile’ the model by providing a loss function and an optimizer.
Since we are dealing with a classification problem, our loss function will be ‘binary cross entropy‘.
As optimizer we will use Adam, which is an implementation of the gradient descent algorithm.
‘metrics’ is just for logging.
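The compile step could look like this (the model build is repeated so the sketch runs on its own):

```python
from keras.models import Sequential
from keras.layers import Dense

# The architecture from the previous section.
model = Sequential([
    Dense(512, activation='relu', input_dim=784),
    Dense(256, activation='relu'),
    Dense(128, activation='relu'),
    Dense(10, activation='sigmoid'),
])

# Binary cross entropy pairs with the sigmoid output layer; Adam is an
# adaptive variant of gradient descent. 'accuracy' is only for logging.
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```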

### Training of Handwritten digit recognition ANN

Here ‘images’ will be an array of 784-element arrays, one per image, and ‘labels’ will be the digits corresponding to each of the pixel arrays.
We need to convert each label to the encoding agreed above.

The whole function will look like this:
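A sketch of such a loading function, parsing the downloaded IDX files directly with numpy (the name `load_data` is our own, and the files are assumed to sit in the working directory):

```python
import gzip
import numpy as np

def load_data(images_path, labels_path):
    # IDX format: the images file has a 16-byte header, the labels
    # file an 8-byte header; the rest is raw unsigned bytes.
    with gzip.open(images_path, 'rb') as f:
        images = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 784)
    with gzip.open(labels_path, 'rb') as f:
        digits = np.frombuffer(f.read(), np.uint8, offset=8)
    # Convert each digit label to the one hot encoding agreed above.
    labels = np.zeros((len(digits), 10))
    labels[np.arange(len(digits)), digits] = 1
    return images, labels

# images, labels = load_data('train-images-idx3-ubyte.gz',
#                            'train-labels-idx1-ubyte.gz')
```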

Now we can convert the train data into numpy arrays and feed them to the neural network.

We do 50 training iterations, or epochs. One epoch is when all the training data passes once through the network.
The ‘verbose‘ level is just for logging.
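The conversion and the training call might look like this (a tiny stand-in model and random data keep the sketch self-contained; in the real script they come from the sections above):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Stand-ins with the same shapes as the real model and MNIST arrays.
model = Sequential([Dense(10, activation='sigmoid', input_dim=784)])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
images = np.random.randint(0, 256, size=(32, 784))
labels = np.eye(10)[np.random.randint(0, 10, 32)]

# 50 epochs; verbose=2 prints one log line per epoch.
model.fit(np.array(images), np.array(labels), epochs=50, verbose=2)
```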

Training could take some time, around 15 minutes on my machine. After the training completes, we save the model to a file.
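Saving is a single call (the file name `digit_recognition.h5` is our choice; a stand-in model keeps the sketch runnable):

```python
from keras.models import Sequential
from keras.layers import Dense

# In the real script this is the trained model from the sections above.
model = Sequential([Dense(10, activation='sigmoid', input_dim=784)])
model.compile(loss='binary_crossentropy', optimizer='adam')

# Persist the architecture and weights to a single file.
model.save('digit_recognition.h5')
```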

Here is the whole code we have so far:
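A consolidated sketch of the training script, under the same assumptions as the snippets above (call `train()` after downloading the data files):

```python
import gzip
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def load_data(images_path, labels_path):
    # Parse the gzipped IDX files into pixel arrays and one hot labels.
    with gzip.open(images_path, 'rb') as f:
        images = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 784)
    with gzip.open(labels_path, 'rb') as f:
        digits = np.frombuffer(f.read(), np.uint8, offset=8)
    labels = np.zeros((len(digits), 10))
    labels[np.arange(len(digits)), digits] = 1
    return images, labels

def build_model():
    # 784 -> 512 -> 256 -> 128 -> 10, as described above.
    model = Sequential([
        Dense(512, activation='relu', input_dim=784),
        Dense(256, activation='relu'),
        Dense(128, activation='relu'),
        Dense(10, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

def train():
    images, labels = load_data('train-images-idx3-ubyte.gz',
                               'train-labels-idx1-ubyte.gz')
    model = build_model()
    model.fit(np.array(images), np.array(labels), epochs=50, verbose=2)
    model.save('digit_recognition.h5')

# train()  # requires the downloaded MNIST files in the working directory
```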

The training log shows the loss and accuracy after every epoch.

We get accuracy 0.9805 on the first epoch, which is very promising. The loss function measures how far off we are with the predictions. It gets minimized during training, and after the 35th epoch it stops changing, or changes very little. The same goes for the accuracy, which means we could probably get away with fewer iterations.

### Testing of the Handwritten digit recognition neural network

We can load the test data using the same logic as with the training data.

Next we load the saved model and the test data.
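Loading the saved model might look like this (a stand-in file is saved first so the sketch runs on its own; in the real script the file comes from `model.save` after training):

```python
from keras.models import Sequential, load_model
from keras.layers import Dense

# Stand-in: save a small model so there is a file to load.
Sequential([Dense(10, activation='sigmoid', input_dim=784)]) \
    .save('digit_recognition.h5')

# Restore the architecture and weights from the file.
model = load_model('digit_recognition.h5')
```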

The input for the artificial neural network is provided as a numpy array.

The whole dataset is fed to the model, and we get back an array with a prediction for each input. The format of the predictions is defined by our output layer (the encoding of digits above).
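A sketch of the prediction step (stand-ins for `model` and `test_images` keep it runnable on its own):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# In the real script `model` is the loaded model and `test_images`
# holds the 784-element pixel arrays of the test set.
model = Sequential([Dense(10, activation='sigmoid', input_dim=784)])
test_images = np.random.randint(0, 256, size=(5, 784))

# One 10-element prediction vector per input image, in the format
# defined by the output layer.
predictions = model.predict(np.array(test_images))
```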

Now we just need to print the result. We create a simple method for converting one hot encoded vectors back to digits.
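A minimal decoder (the name `decode` is our own): the predicted digit is simply the index of the largest element.

```python
import numpy as np

# Works both for exact one hot vectors and for the network's
# real-valued predictions.
def decode(vector):
    return int(np.argmax(vector))
```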

Let's see how accurate the network is with the test data.
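One way to count the misses, assuming `predictions` and `test_labels` are the arrays from the previous steps (the helper name is our own):

```python
import numpy as np

# Collect the indices where the decoded prediction differs from
# the decoded label.
def wrong_predictions(predictions, labels):
    return [i for i in range(len(predictions))
            if np.argmax(predictions[i]) != np.argmax(labels[i])]
```

With 283 misses out of 10,000 examples, the accuracy is (10000 − 283) / 10000 = 97.17%.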

We have 283 wrong predictions out of 10,000 test examples, which the network has never seen before. That is 97.17% accuracy on unknown data, so we can conclude that the network is working pretty well and the handwritten digit recognition problem seems solved 🙂
Here are some of the wrong predictions.

Well, it's hard to tell why they are wrong, so let's visualize some of the predictions. (‘Actual’ is what the network predicted and ‘expected’ is what is actually drawn.)

As you can see, some of the numbers are pretty hard to guess, and others have wrong labels.

We used the following function to draw the MNIST images:
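A sketch of such a drawing helper using matplotlib (the name `draw` is our own):

```python
import numpy as np
import matplotlib.pyplot as plt

def draw(pixels, title=''):
    # Reshape the flat 784-element array back to a 28x28 image and
    # render it in grayscale (0 = white, 255 = black).
    plt.imshow(np.array(pixels).reshape(28, 28), cmap='gray_r')
    plt.title(title)
    plt.show()
```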

If you are interested in AI and NLP models, we at digitalowl host pre-trained models in the cloud.
Feel free to check out our free Python NLP client and contact us with any questions.

Categories: AI