Simple image recognition using PredicSis


Image recognition is a complex field, generally restricted to experienced data scientists with in-depth knowledge of Deep Learning. There exist many tutorials on digit recognition (MNIST) or on classifying whether an image is a dog or a cat. Such tutorials rely exclusively on Tensorflow, Theano, Caffe, Keras and other similar Deep Learning frameworks. Some explain how to train the machine from scratch and some explain how to re-train it. In this article we will show that the same methodology can be applied without Deep Learning, using more classic Machine Learning tools like PredicSis.

For most Deep Learning projects you will use a pre-trained image recogniser, like Google's Inception or similar, and (re-)train it to recognise your images without having to configure a Convolutional Neural Network (CNN). This works well if you can find a pre-trained model that closely matches your problem; your engineering skills will be needed more than any complex data science skills.

In this article we will explain how to deal with images using PredicSis, starting from zero. The goal is not to obtain a model with state-of-the-art accuracy, but rather to give a simple example of how to train an image classifier with a (classic) classification algorithm. This article is a prologue to a few other articles which will explain how to improve deep neural networks by using PredicSis to pre-process your data, whatever its form.

In the first part we will briefly present the data set used. We will then explain the two simple steps needed to obtain an image classifier. After that comes the most interesting part: we will describe how PredicSis pre-processes the images; and finally we will show some results.


Dataset notMNIST

So let’s begin with a data set of jpg images: the notMNIST data set provided by Google in the Udacity course (ud730):

  • a large data set of jpg images to train (train)
  • a smaller one to test (test)

This is a data set of 28×28-pixel character images. Here is an example:

image data

The characters are A, B, C, D, E, F, G, H, I and J. Each image is stored in a folder labelled with its letter: “A” if it is an A, “B” if it is a B, and so on.


Step by step

The code is amazingly simple… There are only two steps:

  • a loop to flatten each image into one row of a .csv file with (28 × 28 + 1) = 785 fields: one per pixel, plus one for the character label;
  • training a classifier on this .csv file.

The code and results are available here.
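The flattening step can be sketched in a few lines of pure Python. This is a minimal illustration, not the original code; the synthetic all-black image stands in for a real notMNIST jpg:

```python
import csv
import io

def image_to_row(pixels, label):
    """Flatten a 28x28 greyscale image (a list of 28 rows of 28 ints)
    into one CSV record: 784 pixel values plus the character label."""
    flat = [value for row in pixels for value in row]
    assert len(flat) == 28 * 28
    return flat + [label]

# Demonstrate on a synthetic all-black image labelled "A".
blank = [[0] * 28 for _ in range(28)]
buffer = io.StringIO()
csv.writer(buffer).writerow(image_to_row(blank, "A"))
record = buffer.getvalue().strip()
print(len(record.split(",")))  # 785 fields: 28 * 28 pixels + 1 label
```

In the real loop you would read each jpg from its letter-named folder and append one such record per image.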


Go deeper: pre-processing by PredicSis

PredicSis uses a really smart Automatic Machine Learning pipeline to fit a model. It applies:

  • smart aggregate generators for relational data (not used in this case!)
  • an optimal statistical reduction (scientific paper: MODL) which selects relevant features and optimises them
  • and a bootstrap aggregating algorithm over a Naive Bayes classifier (scientific paper: SNB)
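The last step, bagging over a Naive Bayes classifier, can be approximated with open-source tooling. The following sketch uses scikit-learn on synthetic data as a rough analogue; it is not PredicSis's MODL/SNB implementation, and the sample sizes are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the flattened-pixel features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bootstrap aggregating (bagging) over a Naive Bayes base classifier:
# each of the 10 estimators is trained on a bootstrap resample.
model = BaggingClassifier(GaussianNB(), n_estimators=10, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Bagging reduces the variance of the individual Naive Bayes models by averaging their votes.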

The statistical reduction step transforms raw images into optimised (in other words, statistically interesting) images:

From a raw image like this one:

raw image

PredicSis optimises the image to the following:

optimised image

This pre-processing removes noise from the data and lets the machine learning focus on the relevant information. (See our other papers that discuss this and Tensorflow ;-) ) The classifier then computes, for each image, the probability that it is an A, a B, and so on:

–> E (97.5%)
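That "E (97.5%)" output is simply the highest-probability class. A minimal sketch, with made-up per-class probabilities:

```python
# Hypothetical class probabilities for one image (illustrative values).
probs = {"A": 0.002, "B": 0.001, "C": 0.005, "D": 0.010,
         "E": 0.975, "F": 0.002, "G": 0.001, "H": 0.002,
         "I": 0.001, "J": 0.001}

# The predicted letter is the arg-max of the probabilities.
best = max(probs, key=probs.get)
print(f"{best} ({probs[best]:.1%})")  # -> E (97.5%)
```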


This baseline model has the following performance:

  • AUC: 0.96
  • Accuracy: 77%
  • Cumulative gain chart:
gain chart

and the confusion matrix with per-class recall:

confusion matrix
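For readers who want to reproduce such metrics on their own predictions, standard tooling suffices. A minimal sketch with made-up labels (scikit-learn assumed; the label lists are not the actual results above):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted letters for ten test images.
y_true = list("AABBCCDDEE")
y_pred = list("AABBCADDEE")  # one C was misread as an A

print(accuracy_score(y_true, y_pred))  # fraction of correct labels: 0.9
# Rows are true letters, columns are predicted letters.
print(confusion_matrix(y_true, y_pred, labels=list("ABCDE")))
```

The off-diagonal cell in the C row shows the single C-to-A confusion; per-class recall is each diagonal value divided by its row sum.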

Could it be any simpler? In the next post we will show you how to use smart aggregates over images to improve recognition ;-). In subsequent blog posts, we will show you how to use optimal statistical reduction (MODL) to reduce noise and accelerate the training of a deep neural network (one example using a convolutional network and another using a recurrent network), with Tensorflow.

Please leave your comments below, share your experiments of image recognition and follow our blog :-)

Intrigued? To get a sense of what we do at PredicSis, please visit our demo page.