Like MNIST, but for 7-segment displays
This post originally appeared on r/MachineLearning; I'm reposting it here now that I have a working blog.
I was recently working on a data analysis project, part of which required reading the value shown on a seven-segment display. I started by applying an MNIST-trained model, but the error rate was very high. I went looking for a more suitable dataset and could not find any that were easily accessible. Eventually, I found an image generator designed to create datasets of computer-generated seven-segment display images. I used it to create a dataset of ~17k samples covering the digits 0-9. It worked well for my project, where I used it to train a random forest model. I don't plan to write a paper, so I thought I'd leave the dataset here to save someone else a few hours if they face a similar task.
Link to Dataset: sevensegdataset.npy
Original generation and application, for reference: datasetgenerator.ipynb
Images are 28x28 and stored flattened: each row holds the 784 pixel values, followed by the label in the last column.
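import numpy as np
import matplotlib.pyplot as plt
# Each row: 784 flattened pixel values, then the digit label
features = np.load("sevensegdataset.npy")
X = features[:, :-1]
Y = features[:, -1]
images = X.reshape((-1, 28, 28))
# Show the first two samples side by side
fig, axes = plt.subplots(1, 2)
for ax, img in zip(axes, images[:2]):
    ax.imshow(img, cmap='gray')
    ax.axis('off')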
Labels are integers 0-9:
Y[1], Y[2]
gives us
(9, 0)
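Before training anything, it can help to confirm the layout and check how the samples are spread across the ten digits. The snippet below is a minimal sketch of that check. To stay runnable without the file, it builds a random stand-in array with the same layout (784 pixel columns plus one label column). The stand-in's size and pixel range are assumptions chosen for illustration, not properties of the real dataset.

```python
import numpy as np

# Stand-in with the same layout as the real file (an assumption for illustration).
# With the actual data, use: features = np.load("sevensegdataset.npy")
rng = np.random.default_rng(0)
n_samples = 200
pixels = rng.integers(0, 256, size=(n_samples, 28 * 28))
labels = rng.integers(0, 10, size=(n_samples, 1))
features = np.hstack([pixels, labels])

X, Y = features[:, :-1], features[:, -1]

# Each row should split into 784 pixel values and a single label.
assert X.shape[1] == 28 * 28

# Count how many samples each digit has, to spot any class imbalance.
digits, counts = np.unique(Y, return_counts=True)
class_counts = dict(zip(digits.tolist(), counts.tolist()))
print(class_counts)
```

If one digit turned out to be heavily under-represented, that would be worth knowing before reading too much into an overall accuracy number.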
Training a random forest classifier on this dataset then allowed me to accurately classify individual inputs, such as a display reading that was first preprocessed into 5 separate images, one per digit.
More details on the preprocessing I used in this example can be found in readscale.ipynb.
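For anyone who wants a starting point, here is a rough sketch of the random forest step using scikit-learn. It is not the code from my notebooks. To keep it self-contained, it draws toy seven-segment digits with NumPy instead of loading the dataset. The segment coordinates, noise level, sample count, and forest settings are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Segments (a-g) lit for each digit in the usual seven-segment encoding.
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}

# Hypothetical pixel boxes (row slice, column slice) for each segment on a 28x28 grid.
BOXES = {
    "a": (slice(3, 5), slice(8, 20)),     # top bar
    "b": (slice(5, 13), slice(20, 22)),   # upper right
    "c": (slice(15, 23), slice(20, 22)),  # lower right
    "d": (slice(23, 25), slice(8, 20)),   # bottom bar
    "e": (slice(15, 23), slice(6, 8)),    # lower left
    "f": (slice(5, 13), slice(6, 8)),     # upper left
    "g": (slice(13, 15), slice(8, 20)),   # middle bar
}


def render_digit(digit, rng):
    """Draw one toy digit with a small random shift and noise, flattened to 784 values."""
    img = np.zeros((28, 28))
    for seg in SEGMENTS[digit]:
        img[BOXES[seg]] = 1.0
    dy, dx = rng.integers(-1, 2, size=2)  # shift by -1, 0, or 1 pixel per axis
    img = np.roll(img, (int(dy), int(dx)), axis=(0, 1))
    img += rng.normal(0.0, 0.1, size=img.shape)
    return img.ravel()


rng = np.random.default_rng(0)
Y = rng.integers(0, 10, size=2000)
X = np.stack([render_digit(int(d), rng) for d in Y])

# Hold out 20% of the samples to estimate accuracy on unseen images.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, Y_train)
accuracy = clf.score(X_test, Y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With the real data, X and Y would come from the columns of sevensegdataset.npy instead of the toy renderer. A reading that has been cropped into 5 digit images of 28x28 could then be classified in one call, for example clf.predict(crops.reshape(5, -1)), where crops is a hypothetical array of shape (5, 28, 28).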