Real-world face recognition datasets exhibit long-tail characteristics, which result in biased classifiers when deep neural networks are trained conventionally, or in insufficient data when long-tail classes are ignored. In this paper, we propose to handle long-tail classes in the training of a face recognition engine by augmenting their feature space under a center-based feature transfer framework. A Gaussian prior is assumed across all the head (regular) classes, and the variance from regular classes is transferred to the long-tail class representation. This encourages the long-tail distribution to be closer to the regular distribution, while enriching and balancing the limited training data. Further, an alternating training regimen is proposed to simultaneously achieve less biased decision boundaries and a more discriminative feature representation. We conduct empirical studies that mimic long-tail datasets by limiting the number of samples and the proportion of long-tail classes on the MS-Celeb-1M dataset. We compare our method with baselines not designed to handle long-tail classes and also with state-of-the-art methods on face recognition benchmarks. State-of-the-art results on the LFW, IJB-A and MS-Celeb-1M datasets demonstrate the effectiveness of our feature transfer approach and training strategy. Finally, our feature transfer allows smooth visual interpolation, which demonstrates disentanglement that preserves the identity of a class while augmenting its feature space with non-identity variations.
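As an illustration of the center-based idea, the following is a minimal sketch and not the transfer module used in this work. It assumes the simplest possible form of transfer: the intra-class offsets of a head class (each feature minus its class center) are re-centered on a long-tail class center. The function names `class_center` and `transfer` and the toy two-dimensional features are hypothetical.

```python
def class_center(features):
    """Mean feature vector of one class (list of equal-length vectors)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def transfer(head_features, tail_center):
    """Re-center a head class's intra-class offsets on a tail-class center.

    Assumption: intra-class variation is shared across classes, so the
    offsets observed in a head class are plausible for a tail class too.
    """
    head_center = class_center(head_features)
    return [
        [t + (x - h) for x, h, t in zip(feature, head_center, tail_center)]
        for feature in head_features
    ]


# Toy data: a head class with three samples, a tail class with one sample.
head = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
tail = [[10.0, 10.0]]

augmented = transfer(head, class_center(tail))
print(augmented)
```

In this toy case the augmented features keep the tail-class center while inheriting the spread of the head class, which is the balancing effect the abstract describes.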
Figure 1: The proposed framework includes a feature extractor Enc, a decoder Dec, a feature filtering module R, and a fully connected layer FC that serves as the classifier. The proposed feature transfer module G generates new features from the original features. The network is trained with an alternating two-stage strategy. In stage 1, we fix Enc and apply the feature transfer G to generate new features (green triangles) that are more diverse and more likely to violate the decision boundary. In stage 2, we fix the rectified classifier FC and update all the other modules. As a result, the samples that are originally on or across the boundary are pushed towards their center (blue arrows in bottom right). Best viewed in color.
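The alternation in the caption can be written down as a small schedule. This is a sketch under stated assumptions: the caption names only Enc as fixed in stage 1 and only FC as fixed in stage 2, so every other listed module is assumed trainable in that stage, which may not match the actual training recipe. G is omitted because the caption does not say whether it has trainable parameters. The names `FROZEN`, `trainable` and `alternating_schedule` are hypothetical.

```python
MODULES = ("Enc", "Dec", "R", "FC")

# Modules held fixed in each stage, as named in the caption.
FROZEN = {1: {"Enc"}, 2: {"FC"}}


def trainable(stage):
    """Modules updated in a stage (assumed: everything not named as fixed)."""
    return [m for m in MODULES if m not in FROZEN[stage]]


def alternating_schedule(num_rounds):
    """Alternate stage 1 and stage 2 for the given number of rounds."""
    plan = []
    for _ in range(num_rounds):
        for stage in (1, 2):
            plan.append((stage, trainable(stage)))
    return plan


for stage, modules in alternating_schedule(2):
    print(f"stage {stage}: update {modules}")
```

In a deep learning framework, the frozen set would typically be enforced by disabling gradient updates for the parameters of those modules during the corresponding stage.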