Kinematic 3D Object Detection in Monocular Video
Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, Bernt Schiele
Keywords: 3D Object Detection

Perceiving the physical world in 3D is fundamental for self-driving applications. Although temporal motion is an invaluable resource for human vision in detection, tracking, and depth perception, such cues have not been thoroughly utilized in modern 3D object detectors. In this work, we propose a novel method for monocular video-based 3D object detection that leverages kinematic motion to extract scene dynamics and improve localization accuracy. We first propose a novel decomposition of object orientation and a self-balancing 3D confidence, and show that both components are critical for our kinematic model to work effectively. Collectively, using only a single model, we efficiently leverage 3D kinematics from monocular video to improve overall localization precision in 3D object detection while also producing useful by-products of scene dynamics (ego-motion and per-object velocity). We achieve state-of-the-art performance on the monocular 3D object detection and Bird's Eye View tasks of the KITTI self-driving dataset.
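To give a flavor of the kind of kinematic model the work builds on, below is a minimal, hypothetical sketch (not the paper's exact formulation) of a constant-velocity Kalman filter over an object's bird's-eye-view position, which fuses per-frame 3D localizations over time to estimate per-object velocity and smooth position. The class name, noise values, and time step are illustrative assumptions.

```python
import numpy as np

class ConstantVelocityKF:
    """Illustrative constant-velocity Kalman filter in the bird's-eye view.

    State is [x, z, vx, vz]; only the position (x, z) is observed, e.g.
    from a per-frame monocular 3D detection. Noise levels are assumed.
    """

    def __init__(self, x, z, dt=0.1):
        self.s = np.array([x, z, 0.0, 0.0])   # start with zero velocity
        self.P = np.eye(4) * 10.0             # high initial uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt      # x' = x + vx*dt, z' = z + vz*dt
        self.H = np.eye(2, 4)                 # observe (x, z) only
        self.Q = np.eye(4) * 0.01             # process noise (assumed)
        self.R = np.eye(2) * 0.5              # measurement noise (assumed)

    def predict(self):
        # Propagate the state one time step under constant velocity.
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, meas):
        # Correct the prediction with a new (x, z) detection.
        y = np.asarray(meas, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s
```

Feeding the filter a sequence of noisy positions from an object moving at roughly constant speed causes the velocity components of the state to converge toward the true motion, yielding the per-object velocity by-product mentioned in the abstract.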
Kinematic 3D Source Code
The Kinematic 3D implementation in Python and PyTorch can be downloaded from here.
If you use the Kinematic 3D code, please cite the ECCV 2020 paper.
Publications
-
Kinematic 3D Object Detection in Monocular Video
Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, Bernt Schiele
In Proceedings of the European Conference on Computer Vision (ECCV), Virtual, Aug. 2020
Bibtex | PDF | arXiv | Supplemental | Code | Video