Person Detection using weakly supervised localization

Hamd ul Moqeet Riaz

This project deals with the detection of people in thermal (infrared, IR) images using deep convolutional neural networks (CNNs). Infrared images make it easier to separate objects from the background. However, since they contain no colour information, it is harder to differentiate objects of similar shape (for example, a signpost from a person). Convolutional neural networks can extract detailed object features and classify objects at varying scales and orientations. We employ an object recognition CNN model (a modified VGG-16) to detect humans in IR images. The class activation mapping (CAM) technique is used to localize people and predict bounding boxes around them. Visualizing a CAM highlights the regions of an image that were most important to the CNN model's prediction of a particular class. The network is trained using only image-level labels, yet at test time it can also predict the location of humans (weakly supervised localization).

Fig. 1. This image shows how bounding boxes are generated using class activation mapping from an object recognition CNN model.
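The step from class activation map to bounding box can be illustrated with a minimal NumPy sketch. It is not the project's implementation. It assumes the common CAM setup in which the last convolutional feature maps are globally average pooled and fed to a single linear classification layer. The function names, the 0.5 threshold, and the choice of a single box around all above-threshold cells are illustrative assumptions; separating several people would additionally require a connected-component step.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM normalized to [0, 1] (hypothetical helper).

    features:   array of shape (K, H, W), the last conv feature maps
    fc_weights: array of shape (num_classes, K), the weights of the linear
                layer that follows global average pooling (assumed setup)
    """
    # Weighted sum of the K feature maps using the chosen class's weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cam_to_box(cam, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around all cells >= threshold.

    Coordinates are in CAM (feature-map) cells; returns None if no cell
    passes the threshold. The threshold value is an assumption.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy usage: one feature map with a bright 2x2 patch.
features = np.zeros((2, 4, 4))
features[0, 1:3, 2:4] = 1.0
fc_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
box = cam_to_box(class_activation_map(features, fc_weights, class_idx=0))
```

The box is expressed in feature-map cells, so in practice it would be scaled by the ratio of input image size to CAM size (or the CAM would be upsampled first) before being drawn on the image.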

The test results show reasonable accuracy for both recognition and localization. The model was also tested on various platforms, including ones with limited computational power (NVIDIA Jetson TX2), and still showed reasonable performance.

Fig. 2. The normalized confusion matrix on our dataset.
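For readers unfamiliar with the term, a normalized confusion matrix can be computed as in the sketch below. The source does not state which normalization was used; this example assumes row normalization (each row, corresponding to a true class, sums to 1), and the function name and toy labels are hypothetical.

```python
import numpy as np


def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix (rows: true class, columns: predicted)."""
    counts = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows for classes with no samples at zero instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


# Toy usage with made-up labels (0 = background, 1 = person).
cm = normalized_confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

With row normalization, the diagonal entries are the per-class recall values.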