Learning to Fuse Information with Missing Modalities

One of the key GEOINT capabilities is to be able to automatically recognize a large array of objects from visual data. Depending on the resolution of imagery, objects may range from specific locations or scenes, road, building, forest to vehicle, human, etc. This is clearly a technically challenging problem for both computer vision and machine learning due to the large variations in the appearance of these objects exhibited in the imagery. To address this problem, researchers have developed various fusion methods that combine information collected from multiple sensing modalities, such as RGB imagery, LiDAR point cloud, multispectral imaging, hyperspectral imaging, and GPS, to improve the reliability and accuracy of object recognition. This research direction is motivated by the ever-decreasing sensoring cost, and more importantly, by the complementary characteristics among multiple sensing modalities. Therefore, with the well-funded promise of escalating object recognition performance, a great deal of data analysis research is in urgent need in order to fully take advantage of this massive amount of multi-modality data.

All the prior research oninformation fusion requires that the sensor data of all modalities are available for every training data instance. This requirement significantly limits the application of information fusion methods as missing modalities abound in practical applications.

In recognizing missing modalities as a roadblock toward fulfilling the key GEOINT capability, we propose to develop powerful and computationally efficient approaches that can learn to fuse information from different sensors when a significant portion of training data has missing modalities. The ultimate goal of our project is to develop a suite of computer vision and machine learning tools for geographical imagery analysis that can serve as an aid for geo-spatial analysts to facilitate the analysis and classification of geographical images.

Figure 1: Learning with missing modalities. Given a large collection of sensor data from multiple modalities, data imputation may take advantage of all available data to learn multi-object classifiers despite the missing modalities (white area). In contrast, the existing approaches may have to remove some modalities and/or training samples so that all remaining training samples are observed in all remaining modalities (dashed box).

CVPR 2017 source code may be downloaded from here.

If you use the code, please cite to the papers:

Learning to Fuse Information with Missing Modalities

Missing Modalities Imputation Source Code

Publications