Pedestrian detection is a critical problem in computer vision with a significant impact on safety in urban autonomous driving. In this work, we introduce a cascade of networks consisting of an exhaustive region proposal network followed by a deep region classification network. We extend the region proposal network with a 30-class semantic segmentation side task to reduce confusion with non-pedestrian classes and improve overall proposal quality. The segmentation classes are drawn from those commonly seen in urban street scenes, including roads, buildings, vehicles, pedestrians, traffic signs, and trees, among many others. In addition to multi-task learning, we treat the predicted segmentation masks as strong features for our main task of pedestrian detection. By providing our system with detailed semantic information, we reduce false alarms from classes commonly mistaken for pedestrians.
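The abstract does not say how the segmentation masks are fused with the detection features or how the two training objectives are balanced. The following is a minimal NumPy sketch of one plausible reading. It assumes the masks are fused by appending per-pixel class probabilities to the backbone feature map as extra channels, and that training minimizes a weighted sum of a detection loss and a per-pixel cross-entropy segmentation loss. The function names and the `seg_weight` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

NUM_SEG_CLASSES = 30  # from the text: a 30-class semantic segmentation side task


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_segmentation_features(features: np.ndarray, seg_logits: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: append per-pixel class probabilities as extra channels.

    features:   (C, H, W) backbone feature map
    seg_logits: (NUM_SEG_CLASSES, H, W) output of the segmentation side task
    returns:    (C + NUM_SEG_CLASSES, H, W) fused map for the pedestrian head
    """
    if features.shape[1:] != seg_logits.shape[1:]:
        raise ValueError("feature map and segmentation map must share spatial size")
    seg_probs = softmax(seg_logits, axis=0)  # each pixel's class scores sum to 1
    return np.concatenate([features, seg_probs], axis=0)


def pixel_cross_entropy(seg_logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean per-pixel cross-entropy; `labels` is (H, W) with ints in [0, NUM_SEG_CLASSES)."""
    probs = softmax(seg_logits, axis=0)
    h_idx, w_idx = np.indices(labels.shape)
    picked = probs[labels, h_idx, w_idx]  # probability of the true class at each pixel
    return float(-np.log(picked + 1e-12).mean())


def multitask_loss(det_loss: float, seg_logits: np.ndarray,
                   seg_labels: np.ndarray, seg_weight: float = 1.0) -> float:
    """Total loss = detection loss + seg_weight * segmentation loss (weighting is assumed)."""
    return det_loss + seg_weight * pixel_cross_entropy(seg_logits, seg_labels)


# Usage: 64 backbone channels plus 30 class-probability channels gives 94 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8, 16))
seg = rng.standard_normal((NUM_SEG_CLASSES, 8, 16))
fused = fuse_segmentation_features(feats, seg)
print(fused.shape)  # (94, 8, 16)
```

Appending softmax probabilities rather than raw logits keeps the semantic channels on a fixed [0, 1] scale, so a downstream classifier sees directly, for instance, how "tree-like" or "pole-like" a region is. This is one design choice among several; the abstract does not specify the actual fusion scheme.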

Sample Results