Understanding the world around us and making decisions about the future is a critical component of human intelligence. As autonomous systems continue to develop, their ability to reason about the future will be key to their success. Semantic anticipation is a relatively under-explored area that autonomous vehicles could take advantage of (e.g., forecasting pedestrian trajectories). Motivated by the need for real-time prediction in autonomous systems, we propose to decompose the challenging semantic forecasting task into two subtasks: current-frame segmentation and future optical flow prediction. Through this decomposition, we build an efficient, effective, low-overhead model with three main components: a flow prediction network, a feature-flow aggregation LSTM, and an end-to-end learnable warp layer. Our proposed method achieves state-of-the-art accuracy on short-term and moving-object semantic forecasting while simultaneously reducing model parameters by up to 95% and increasing efficiency by more than 40x.

Overview

Figure 1. Our proposed approach aggregates past optical flow features using a convolutional LSTM to predict future optical flow, which is then used by a learnable warp layer to produce the future segmentation mask.
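To make the warp layer concrete, the sketch below shows backward bilinear warping of a (single-channel) segmentation map by a predicted flow field. This is an illustrative re-implementation in plain Python, not the authors' Caffe layer; the function name and the single-channel simplification are assumptions. In the full model the same sampling is applied per class channel and is differentiable with respect to the flow.

```python
import math

def bilinear_warp(seg, flow):
    """Warp a 2D segmentation map with a per-pixel flow field.

    seg:  H x W grid of class scores (single channel for simplicity).
    flow: H x W grid of (dx, dy) displacements from frame t to t+1.
    Each output pixel samples the source location the flow points
    back to, with bilinear interpolation and border clamping.
    """
    h, w = len(seg), len(seg[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x - dx, y - dy          # backward-sampled source
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0        # fractional offsets
            val = 0.0
            for xi, wx in ((x0, 1 - ax), (x0 + 1, ax)):
                for yi, wy in ((y0, 1 - ay), (y0 + 1, ay)):
                    xc = min(max(xi, 0), w - 1)   # clamp to image border
                    yc = min(max(yi, 0), h - 1)
                    val += wx * wy * seg[yc][xc]
            out[y][x] = val
    return out
```

For example, a uniform flow of (1, 0) shifts the map one pixel to the right, duplicating the border column; a fractional flow blends neighboring pixels.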

Efficiency

Table 1. Computational complexity analysis with respect to previous work. Models are measured without the SegCNN included (only SegPred). Runtime estimates were calculated by averaging 100 forward passes of each model. Single vs. sliding testing compares a single forward pass at 512 × 1,024 resolution with the costly sliding-window approach of eight overlapping 713 × 713 full-resolution crops.
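The runtime protocol above can be mirrored with a small timing harness; this is a hypothetical sketch (the helper name and warm-up pass are our additions), not the benchmarking code used for the table.

```python
import time

def average_runtime(forward, n_passes=100):
    """Estimate per-pass runtime by averaging n_passes forward passes.

    forward: a zero-argument callable running one model forward pass.
    A single warm-up call is made first so that one-time initialization
    cost is excluded from the average.
    """
    forward()  # warm-up
    start = time.perf_counter()
    for _ in range(n_passes):
        forward()
    return (time.perf_counter() - start) / n_passes
```

With a GPU model one would additionally synchronize the device before reading the clock, since kernel launches are asynchronous.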

Results

Table 2. Comparison of available baselines for short-term (t = 3) and mid-term (t = 9) semantic forecasting on all nineteen Cityscapes classes. We further emphasize our model's capability on the eight foreground moving object (MO) classes.

Results

Table 3. Comparison of available baselines for short-term (t = 1) and mid-term (t = 10) forecasting. † indicates a model without recurrent fine-tuning. We compare our model with FlowNet2-c and FlowNet2-C backbones, where the C variant contains approximately 8/3× more feature channels.

Sample Results

AR-Ped Results

Segmentation Forecasting Source Code

Our Recurrent Flow-Guided Semantic Forecasting implementation in Python and Caffe may be downloaded from here.

If you use the segmentation forecasting code, please cite the WACV 2019 paper:

Publications

  • Recurrent Flow-Guided Semantic Forecasting
    Adam M. Terwilliger, Garrick Brazil, Xiaoming Liu
Proc. IEEE Winter Conference on Applications of Computer Vision (WACV 2019), Hawaii, Jan. 2019
    Bibtex | PDF | arXiv
  • @inproceedings{ recurrent-flow-guided-semantic-forecasting,
      author = { Adam M. Terwilliger and Garrick Brazil and Xiaoming Liu },
      title = { Recurrent Flow-Guided Semantic Forecasting },
booktitle = { Proc. IEEE Winter Conference on Applications of Computer Vision },
      address = { Hawaii },
      month = { January },
      year = { 2019 },
    }