Our dataset consists of manually contour-labeled pedestrian images captured from a vehicle-mounted, calibrated stereo camera rig in an urban environment. For each pedestrian cutout we provide a 24-bit PNG image, a float disparity map and a ground-truth shape.
Dense stereo is computed using the semi-global matching algorithm (H. Hirschmueller, Stereo processing by semi-global matching and mutual information, IEEE Trans. on PAMI, 30(2):328-341, 2008).
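Because the rig is calibrated, a disparity value can be converted to metric depth with the standard rectified-stereo relation Z = f * B / d. The following is a minimal sketch of that conversion only; the focal length and baseline below are hypothetical placeholders, not the calibration of this dataset, and treating non-positive or non-finite disparities as invalid is an assumption.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d.

    disparity_px: disparity in pixels
    focal_px:     focal length in pixels
    baseline_m:   distance between the two cameras in metres
    """
    # Assumption: non-positive or non-finite disparities mark pixels
    # without a valid stereo match; they are mapped to infinite depth.
    if not math.isfinite(disparity_px) or disparity_px <= 0.0:
        return math.inf
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 1000.0
BASELINE_M = 0.25

depth_m = disparity_to_depth(10.0, FOCAL_PX, BASELINE_M)
```

The real focal length and baseline should be taken from the calibration data of the camera rig.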
The 785 image cutouts have a height between 34 and 468 pixels and a width between 11 and 267 pixels. In our BMVC’13 publication only samples with a height greater than 120 pixels are used. Each sample is provided with an additional 10 % border on each side.
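The border and the height threshold described above can be handled as in the following sketch. It assumes that the border on each side is 10 % of the unpadded box dimension (so a padded cutout is 1.2 times the original width and height) and that the 120-pixel threshold applies to the unpadded height; neither detail is stated explicitly here, and the function names are illustrative.

```python
def inner_box(padded_w, padded_h, border=0.10):
    """Return (x0, y0, x1, y1) of the unpadded box inside a padded cutout.

    Assumption: each side carries a border of `border` times the
    unpadded dimension, so padded = (1 + 2 * border) * unpadded.
    """
    scale = 1.0 + 2.0 * border
    w = padded_w / scale          # unpadded width
    h = padded_h / scale          # unpadded height
    x0 = round(border * w)        # left border in pixels
    y0 = round(border * h)        # top border in pixels
    return x0, y0, x0 + round(w), y0 + round(h)


def tall_enough(height_px, min_height=120):
    """Keep only samples strictly taller than `min_height` pixels."""
    return height_px > min_height


# Example: a 120 x 240 padded cutout.
x0, y0, x1, y1 = inner_box(120, 240)
keep = tall_enough(y1 - y0)
```

For a 120 x 240 padded cutout this yields an inner box of 100 x 200 pixels, which passes the height filter.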
The dataset that we used for training our Boosted Decision Tree Ensemble is derived from the following publication:
T. Scharwächter, M. Enzweiler, U. Franke, and S. Roth. “Efficient Multi-Cue Scene Segmentation”. In Lecture Notes in Computer Science (Proc. of the German Conference on Pattern Recognition (GCPR)), volume 8142. Springer, 2013. It can be downloaded here.