Object Detection with DIGITS and Caffe

This is a simplified version of the notes I took while creating and training a neural network to detect balls in the FRC 2017 game. This will be further updated with additional detail as time allows.

Requirements: Vatic, DIGITS, NVCaffe

An input data set of images showing the target object in its intended environment needs to be captured. About one minute of HD video at 30 FPS (roughly 1,800 frames) should be sufficient.

In order for DetectNet to learn bounding-box locations, it needs the ground-truth position and size of each box in the training data set. Vatic works well for this process: it allows the annotation effort to be divided among multiple individuals, and it requires very little operator training.

The output.txt file produced by vatic lists the bounding-box locations for each frame of the input video. A Python script is then used to extract the frame numbers and box dimensions from this file and build the folder structure DetectNet demands, which matches the KITTI data set.
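The original notes omit the directory listing; based on the layout the DIGITS object detection workflow expects for KITTI-style data, it is approximately:

```
train/
    images/    000001.png, 000002.png, ...
    labels/    000001.txt, 000002.txt, ...
val/
    images/
    labels/
```

Each label file shares its base name with the corresponding image and contains one line per bounding box.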


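A minimal sketch of such a conversion script, assuming vatic's plain-text dump format (track id, the four box corners, frame number, lost/occluded/generated flags, then the quoted label) and emitting one KITTI-format label file per frame; the function and file names here are illustrative, not from the original notes:

```python
import os

def vatic_to_kitti(vatic_lines, class_name="car"):
    """Convert vatic text-dump lines to per-frame KITTI label strings.

    Assumes each line reads: track_id xmin ymin xmax ymax frame
    lost occluded generated "label". Boxes flagged as lost (outside
    the frame) are skipped. Returns {frame_number: [label lines]}.
    """
    labels = {}
    for line in vatic_lines:
        parts = line.split()
        if len(parts) < 10:
            continue  # skip malformed lines
        xmin, ymin, xmax, ymax = parts[1:5]
        frame = int(parts[5])
        lost = int(parts[6])
        if lost:
            continue
        # KITTI fields: type truncated occluded alpha bbox(4)
        # dimensions(3) location(3) rotation_y; only type and bbox
        # matter for DetectNet, the rest are zeroed.
        kitti = "%s 0.0 0 0.0 %s %s %s %s 0.0 0.0 0.0 0.0 0.0 0.0 0.0" % (
            class_name, xmin, ymin, xmax, ymax)
        labels.setdefault(frame, []).append(kitti)
    return labels

def write_labels(labels, out_dir):
    """Write one KITTI .txt label file per frame, named like 000042.txt."""
    os.makedirs(out_dir, exist_ok=True)
    for frame, lines in labels.items():
        with open(os.path.join(out_dir, "%06d.txt" % frame), "w") as f:
            f.write("\n".join(lines) + "\n")
```

Using car as the class name here ties in with the KITTI class-name shortcut mentioned below.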
In DIGITS, create a new object detection data set from the folders described above. To use the original DetectNet network without modification, adjust the padding and resize options so the final output size is 1248×384. For images taller than 384 pixels, this may mean first padding the image to the 3.25 (1248/384) aspect ratio and then resizing. The remaining settings can be left at their defaults, and defining custom classes can be avoided by reusing class names from the KITTI data set, such as car or pedestrian, for the new input.
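The pad-then-resize arithmetic can be sketched as follows. This is a hypothetical helper, not part of the original workflow (DIGITS performs the transform itself), but the same math applies when mapping annotation coordinates onto the resized images:

```python
def pad_then_resize(width, height, target_w=1248, target_h=384):
    """Compute the padded canvas size that reaches DetectNet's
    3.25 aspect ratio, plus the scale factor of the final resize.

    Pads whichever dimension is too small (letterbox-style) so that
    w/h = 1248/384 = 3.25, then the canvas is resized to 1248x384.
    """
    target_aspect = target_w / target_h  # 3.25
    if width / height < target_aspect:
        # Image is too narrow: pad the width out to 3.25 * height.
        pad_w, pad_h = int(round(height * target_aspect)), height
    else:
        # Image is too wide: pad the height instead.
        pad_w, pad_h = width, int(round(width / target_aspect))
    scale = target_w / pad_w  # multiply box coordinates by this
    return (pad_w, pad_h), scale
```

For example, a 1280×720 frame is padded to 2340×720 before the resize; bounding-box coordinates must be multiplied by the returned scale to stay aligned.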

Then, create a new object detection model from this data set. Set the number of training epochs to 100, the batch size to the graphics card's limit (a batch size of 10 requires 12 GB of dedicated video memory), the solver type to Adam, the learning rate to 0.0001, and the policy to exponential decay. Disable mean image subtraction. Choose Custom Network and paste in the contents of the reference DetectNet network. To speed up learning, download the pretrained GoogLeNet weights to the computer running DIGITS and set the pretrained-model path accordingly. Click Create to begin training.
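For reference, the corresponding Caffe solver settings generated from the choices above would look roughly like this fragment; the field names follow Caffe's solver.prototxt conventions, and the gamma value is an assumed example (DIGITS derives the actual value from the decay settings):

```
type: "Adam"        # solver type selected above
base_lr: 0.0001     # learning rate
lr_policy: "exp"    # exponential decay policy
gamma: 0.99         # assumed example decay factor
```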

The model will then begin training, which may take hours even on top-of-the-line graphics cards. The metric named mAP (mean average precision) is close to percentage accuracy for this task; it will climb from zero if the model is learning.

Final output:

Predicting the Presence of Cancer in Medical Images using Convolutional Neural Networks

Paper submitted to Missouri Junior Science, Engineering, and Humanities Symposium for 2017


Convolutional neural networks, loosely modeled on how the brain processes sensory inputs, have been applied to activities such as driving cars, recognizing speech, and playing complex games. These machine learning algorithms have gained popularity in recent years as an effective method for pattern recognition. This paper proposes a novel convolutional neural network architecture to detect lung cancer in radiographic images. Given that patients routinely have x-rays taken for preemptive health screenings, and given the growing amount of medical data produced by these imaging procedures, the need for automated detection of abnormalities is increasing. The solution used a data set of labeled x-ray images from patients with confirmed lung cancer, combined with an equal number from cancer-free patients. The network relied on rectified linear unit activations and dropout to reduce overfitting. The model was then trained with a standard backpropagation algorithm and an Adam optimizer using softmax regression. It demonstrated a 91% accuracy rate on lung cancer chest x-rays.


@misc{monahan2017cancer,
  title = {Predicting the Presence of Cancer in Medical Images using Convolutional Neural Networks},
  author = {Monahan, Connor},
  note = {preprint on webpage at \url{connormonahan.net}},
  year = {2017}
}

Article (PDF)