Pedestrian detection is a critical problem in computer vision with significant impact on safety in urban autonomous driving. In this work, we explore how semantic segmentation can be used to boost pedestrian detection accuracy with little to no impact on network efficiency. We propose a segmentation infusion network that enables joint supervision of semantic segmentation and pedestrian detection. When placed properly, the additional supervision guides features in the shared layers to become more sophisticated and helpful for the downstream pedestrian detector. Using this approach, we find weakly annotated boxes sufficient for considerable performance gains. We provide an in-depth analysis of how the shared layers are shaped by the segmentation supervision, showing that the resulting feature maps become more semantically meaningful and more robust to variations in shape and occlusion. Overall, our simultaneous detection and segmentation framework achieves a considerable gain over the state of the art on the Caltech pedestrian dataset, competitive performance on KITTI, and executes 2x faster than competitive methods.
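The joint supervision described above can be illustrated with a minimal NumPy sketch. This is not the paper's Caffe implementation; the function names, the binary cross-entropy segmentation loss, and the weighting factor `lam` are assumptions for illustration. It shows the two ingredients the abstract names: rasterizing weakly annotated boxes into a segmentation target, and adding the resulting segmentation loss to the detection loss.

```python
import numpy as np

def boxes_to_weak_mask(boxes, height, width):
    """Rasterize weakly annotated pedestrian boxes into a binary
    segmentation target: box interiors become foreground (1.0)."""
    mask = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1.0
    return mask

def joint_loss(det_loss, seg_prob, weak_mask, lam=1.0, eps=1e-7):
    """Combine the detection loss with a pixel-wise binary cross-entropy
    against the weak box mask (hypothetical loss form and weighting)."""
    p = np.clip(seg_prob, eps, 1.0 - eps)
    bce = -(weak_mask * np.log(p) + (1.0 - weak_mask) * np.log(1.0 - p))
    return det_loss + lam * bce.mean()

# Example: one weak box on a 10x10 feature map, uniform 0.5 predictions.
mask = boxes_to_weak_mask([(2, 2, 6, 8)], 10, 10)
loss = joint_loss(0.5, np.full((10, 10), 0.5), mask)
```

Because the segmentation branch only adds a loss term during training, inference cost is unchanged, which is consistent with the "little to no impact on network efficiency" claim above.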

Sample Results

Sample video of SDS-RCNN on the Caltech test sequences

SDS-RCNN Source Code

The SDS-RCNN implementation in Matlab and Caffe may be downloaded from here.

If you use the SDS-RCNN code, please cite the ICCV 2017 paper:

Publications

  • Illuminating Pedestrians via Simultaneous Detection & Segmentation
    Garrick Brazil, Xi Yin, Xiaoming Liu
    In Proceedings of the International Conference on Computer Vision (ICCV 2017), Venice, Italy, Oct. 2017
    Bibtex | PDF | arXiv
  • @inproceedings{ illuminating-pedestrians-via-simultaneous-detection-segmentation,
      author = { Garrick Brazil and Xi Yin and Xiaoming Liu },
      title = { Illuminating Pedestrians via Simultaneous Detection \& Segmentation },
      booktitle = { Proceedings of the International Conference on Computer Vision },
      address = { Venice, Italy },
      month = { October },
      year = { 2017 },
    }