This post originally appeared on r/MachineLearning; I'm reposting it here now that I have a working blog.

I was recently working on a data analysis project, part of which required reading the value shown on a seven-segment display. I started with an MNIST-trained model, but the error rate was very high. I went looking for a more suitable dataset and could not find an easily accessible one. Eventually, I found this image generator, which is designed to create datasets of computer-generated seven-segment display images. I used it to create a dataset of ~17k samples covering the digits 0-9. It worked well for my project, where I used it to train a random forest model. I don’t plan to write a paper, so I thought I’d leave the dataset here to save someone else a few hours if they face a similar task.

Link to Dataset: sevensegdataset.npy

Original generation and application, for reference: datasetgenerator.ipynb

Images are 28x28 and stored flattened: each row holds the 784 pixel values followed by the label in the last column.

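import numpy as np
import matplotlib.pyplot as plt

features = np.load("sevensegdataset.npy")
X = features[:, :-1]  # pixel values, one flattened 28x28 image per row
Y = features[:, -1]   # digit label, stored in the last column
images = X.reshape((-1, 28, 28))
fig, axes = plt.subplots(1, 2)  # side by side, so neither image overwrites the other
axes[0].imshow(images[0], cmap='gray')
axes[1].imshow(images[1], cmap='gray')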

[Figures: digit 0 and digit 9]

Labels are integers 0-9:

Y[1], Y[2]

gives us

(9, 0)
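Before training on the file, it can help to confirm that it has the layout described above. The following is a minimal sketch, assuming the array has 785 columns (784 pixels plus one label); the helper name `summarize_dataset` is made up for illustration and is not part of the original notebook.

```python
import numpy as np

def summarize_dataset(features):
    """Split a (n_samples, 785) array into images and labels, and count labels."""
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != 28 * 28 + 1:
        raise ValueError(f"expected shape (n, 785), got {features.shape}")
    images = features[:, :-1].reshape(-1, 28, 28)
    labels = features[:, -1].astype(int)
    counts = np.bincount(labels, minlength=10)  # number of samples per digit 0-9
    return images, labels, counts

# Usage, assuming the file sits in the working directory:
# images, labels, counts = summarize_dataset(np.load("sevensegdataset.npy"))
# print(images.shape, counts)
```

A strongly uneven `counts` array would be worth knowing about before judging classifier accuracy.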

Training a random forest classifier on this dataset then allowed me to accurately classify individual inputs such as

[image]

which was first preprocessed into 5 separate images
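For readers who want a starting point, here is a rough sketch of fitting a random forest on the array using scikit-learn. The hyperparameters, the 80/20 split, and the function name `train_digit_forest` are assumptions chosen for illustration; the settings actually used are in datasetgenerator.ipynb and may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_digit_forest(features, test_size=0.2, seed=0):
    """Fit a random forest on flattened 28x28 images.

    Returns the fitted model and its accuracy on a held-out split.
    """
    X = features[:, :-1]             # 784 pixel columns
    y = features[:, -1].astype(int)  # label column
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # n_estimators=100 is an illustrative choice, not the author's setting.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)

# Usage, assuming the file sits in the working directory:
# model, accuracy = train_digit_forest(np.load("sevensegdataset.npy"))
```

Held-out accuracy on generated images only says how well the model fits the generator's style; real photos of a display still need preprocessing that brings them close to that style.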


More details on the preprocessing I used in this example can be found in readscale.ipynb.
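As a rough illustration of the general idea only, and not the method in readscale.ipynb, the sketch below cuts a grayscale display image into equal-width tiles and resamples each one to 28x28 so it matches the dataset's input shape. It assumes the digits are evenly spaced and the image is already cropped to the display; `split_into_digits` is a hypothetical helper.

```python
import numpy as np

def split_into_digits(display, n_digits=5, size=28):
    """Cut a 2-D grayscale image into n_digits equal-width tiles and
    resample each tile to size x size with nearest-neighbour indexing.

    Returns an array of shape (n_digits, size * size).
    """
    display = np.asarray(display)
    height, width = display.shape
    edges = np.linspace(0, width, n_digits + 1).astype(int)  # tile column boundaries
    tiles = []
    for left, right in zip(edges[:-1], edges[1:]):
        tile = display[:, left:right]
        rows = np.arange(size) * tile.shape[0] // size  # source row for each output row
        cols = np.arange(size) * tile.shape[1] // size  # source column for each output column
        tiles.append(tile[np.ix_(rows, cols)])
    return np.stack(tiles).reshape(n_digits, size * size)
```

The result can be passed straight to any fitted classifier, for example `model.predict(split_into_digits(display))`, provided the pixel range and polarity (light segments on dark, or the reverse) match the training images.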