We provide five different datasets captured with several sensor types, including a stereo camera, an RGB-D camera and a time-of-flight camera. Each dataset consists of two separate sequences recorded at different locations around and within our research institute. All sequences were captured with all of the named depth sensors simultaneously, so every sensor observed the same situations. As a result, the behaviour and capability of person detection algorithms can be benchmarked on the same 3D scenes, at the differing data quality that each sensor delivers.
To capture the datasets, we utilized a Summit XL from Robotnik and equipped it with a Nerian SP1 stereo camera system, an ASUS Xtion Pro Live RGB-D camera and a Fotonic E70P time-of-flight camera.
Whereas the stereo camera and the RGB-D camera provided images at VGA resolution (640 × 480 pixels), the Fotonic E70P is limited to QQVGA (160 × 120 pixels).
Annotation files are saved as .yaml files and contain an associative map. The format is as follows:
x_d: [depth roi pos x]
y_d: [depth roi pos y]
w_d: [depth roi width]
h_d: [depth roi height]
x_rgb: [visual roi pos x]
y_rgb: [visual roi pos y]
w_rgb: [visual roi width]
h_rgb: [visual roi height]
The annotations always contain the time stamp / id of the data quadruple. Regions of interest are provided for both the depth and the intensity / color images. We define two visibility classes: '1' for fully visible persons and '2' for only partially visible persons. Implicitly, '0' is the code for 'non-human' or 'non-visible'. These are also the classification labels used by CS::APEX.
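As an illustration of how the region-of-interest fields could be consumed, here is a minimal Python sketch. It assumes the annotation map has already been loaded from the .yaml file into a plain dictionary. The names `Visibility`, `Roi` and `rois_from_annotation` are hypothetical and not part of the dataset tooling. Since the listing shows only the ROI keys, the sketch does not guess the key names for the time stamp or the class label.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Mapping, Tuple


class Visibility(IntEnum):
    """Visibility classes as described in the text (names are hypothetical)."""
    NON_HUMAN = 0          # implicit code: non-human or non-visible
    FULLY_VISIBLE = 1      # fully visible person
    PARTIALLY_VISIBLE = 2  # only partially visible person


@dataclass(frozen=True)
class Roi:
    """A rectangular region of interest: position plus width and height."""
    x: int
    y: int
    w: int
    h: int


def rois_from_annotation(entry: Mapping[str, int]) -> Tuple[Roi, Roi]:
    """Return (depth ROI, visual ROI) from one loaded annotation map."""
    depth = Roi(entry["x_d"], entry["y_d"], entry["w_d"], entry["h_d"])
    visual = Roi(entry["x_rgb"], entry["y_rgb"], entry["w_rgb"], entry["h_rgb"])
    return depth, visual
```

A missing key raises a `KeyError`, which seems preferable to silently substituting a default rectangle.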
File System Structure
A dataset consists of two separate sequences, packed into separate folders within the archive. Each of these folders contains four directories:
There are always four files, one from each directory, that belong together; they share the same name but have different extensions. To order all files into a continuous sequence, the file names are based on Unix time stamps relative to the recording sessions.
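The naming scheme makes it straightforward to reassemble the quadruples. The following Python sketch groups paths by their shared base name and orders the groups chronologically. It assumes that the base name (everything before the last extension) parses as a number. The function name `group_by_timestamp` and the directory and extension names in the usage example are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_timestamp(paths: Iterable[str], expected: int = 4) -> List[List[str]]:
    """Group files sharing a base name and return the groups in time order.

    Groups with fewer or more than `expected` members are dropped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # Strip the directory and the last extension to get the time stamp.
        stem, _ext = os.path.splitext(os.path.basename(path))
        groups[stem].append(path)

    complete = {s: sorted(p) for s, p in groups.items() if len(p) == expected}
    # Assumption: the stem is a numeric Unix time stamp, so sort numerically.
    return [complete[s] for s in sorted(complete, key=float)]
```

For example, `group_by_timestamp(["a/1500000001.0.png", "b/1500000001.0.jpg", "c/1500000001.0.pcd", "d/1500000001.0.yaml"])` returns a single group containing all four paths.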
Excerpts from the different datasets
To make person detection approaches generally comparable, we recommend using the Intersection over Union (IoU) metric for evaluation. This is a commonly used metric for benchmarking problems that involve 2D bounding rectangles.
The IoU measures the quality of ground truth / detection association, not only in terms of position but also in terms of shape.
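For two axis-aligned rectangles, the IoU is the area of their intersection divided by the area of their union. A minimal Python sketch follows. It assumes that each rectangle is given as (x, y, width, height) and that (x, y) denotes the top-left corner, which the annotation format does not state explicitly. The function name `iou` is chosen for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b

    # Overlap along each axis; clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0
```

Identical rectangles yield 1.0 and disjoint rectangles yield 0.0. A detection is typically counted as correct when its IoU with a ground truth rectangle exceeds a chosen threshold. The threshold is a design decision of the evaluation and is not prescribed here.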
We provide the datasets, information about the datasets, and the associated material (altogether subsequently denoted by the "Software") because we hope that they are useful to you. We have collected all data thoroughly and described them to the best of our knowledge.

Copyright (c) 2017 Chair of Cognitive Systems

Permission is hereby granted, free of charge, to any person downloading the Software, to use the datasets for research and academic purposes, including the right to publish experimental results obtained by using the data. The Software, however, must not be sold or redistributed.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.