Deep CNNs have been pushing the frontier of visual recognition over the past years. Besides recognition accuracy, strong demand in the research community for understanding deep CNNs has motivated the development of tools that dissect pre-trained models and visualize how they make predictions. Recent work pushes interpretability further, into the network learning stage, in order to learn more meaningful representations. In this work, focusing on a specific area of visual recognition, we report our efforts towards interpretable face recognition. We propose a spatial activation diversity loss to learn more structured face representations. By leveraging this structure, we further design a feature activation diversity loss that pushes the interpretable representations to be discriminative and robust to occlusions. We demonstrate on three face recognition benchmarks that our proposed method achieves state-of-the-art face recognition accuracy with easily interpretable face representations.


Figure 1: The overall network architecture of the proposed method.

Qualitative Evaluation

Visualization of filter response "heat maps" of 10 different filters on faces from different subjects (Top 4 rows) and the same subject (Bottom 4 rows). The positive and negative responses are shown as two colors within each image. Note the high consistency of response locations across subjects and across poses.

Figure 2: Visualization of filter response.

The average locations of positive (left 3 faces) and negative (right 3 faces) peak responses of 320 filters for three models: (a) the base CNN model (d=6.9), (b) our model with SAD only (d=17.1), and (c) our full model (d=18.7), where d quantifies how widely the average locations are spread over the face. The color at each location denotes the standard deviation of the peak locations. The face size is 96 × 96.

Figure 3: The spreadness of feature activation locations.
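The statistics shown in Figure 3 can be illustrated with a minimal NumPy sketch. It assumes the filter responses are available as an array of shape (N, K, H, W) for N faces and K filters. The function name `peak_location_stats` is hypothetical, and the spread value computed here (mean distance of the per-filter average locations from their centroid) is only one plausible stand-in; the exact definition of d is not given on this page.

```python
import numpy as np

def peak_location_stats(responses):
    """Summarize where each filter peaks across a set of faces.

    responses: array of shape (N, K, H, W), N faces, K filters.
    Returns (mean_loc, std_loc, spread):
      mean_loc: (K, 2) average (row, col) of the positive peak per filter
      std_loc:  (K, 2) standard deviation of the peak location per filter
      spread:   scalar, ASSUMED measure (not necessarily the paper's d)
    """
    n, k, h, w = responses.shape
    # Index of the maximum response in each flattened H*W map.
    flat_idx = responses.reshape(n, k, h * w).argmax(axis=2)   # (N, K)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    locs = np.stack([rows, cols], axis=-1).astype(float)       # (N, K, 2)

    mean_loc = locs.mean(axis=0)
    std_loc = locs.std(axis=0)

    # Assumption: spread = mean Euclidean distance of the per-filter
    # average locations from their common centroid.
    centroid = mean_loc.mean(axis=0)
    spread = np.linalg.norm(mean_loc - centroid, axis=1).mean()
    return mean_loc, std_loc, spread

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.standard_normal((8, 320, 24, 24))  # toy data, not real features
    mean_loc, std_loc, spread = peak_location_stats(demo)
    print(mean_loc.shape, std_loc.shape, float(spread))
```

Passing `-responses` gives the same statistics for the negative peaks. The 24 × 24 map size in the demo is arbitrary; locations would need rescaling to be drawn on a 96 × 96 face.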

Figure 4: The correspondence between feature difference magnitude and occlusion locations.

Towards Interpretable Face Recognition Source Code

The training and testing code and the pre-trained model are available here


  • Towards Interpretable Face Recognition
    Bangjie Yin*, Luan Tran*, Haoxiang Li, Xiaohui Shen, Xiaoming Liu
    In Proceedings of the International Conference on Computer Vision (ICCV 2019), Seoul, South Korea, Oct. 2019 (Oral presentation)
    Bibtex | PDF | arXiv | Code | Video
  • @inproceedings{ towards-interpretable-face-recognition,
      author = { Bangjie Yin* and Luan Tran* and Haoxiang Li and Xiaohui Shen and Xiaoming Liu },
      title = { Towards Interpretable Face Recognition },
      booktitle = { In Proceedings of the International Conference on Computer Vision },
      address = { Seoul, South Korea },
      month = { October },
      year = { 2019 },