Face alignment is the process of estimating the locations of a set of facial landmarks, such as eye corners and mouth corners, in a face image, typically with a supervised learned model. Face alignment is a key module in the pipeline of most facial analysis algorithms, normally applied after face detection and before subsequent feature extraction and classification. It is therefore an enabling capability with a multitude of applications, such as face recognition, expression recognition, and face de-identification. Despite continuous improvements in alignment accuracy, face alignment remains a very challenging problem, due to non-frontal face poses, low image quality, occlusion, etc. Among these challenges, we identify pose-invariant face alignment as the one deserving substantial research effort, for a number of reasons.
Motivated by the need to address pose variation, and by the lack of prior work in handling large poses, as shown in Fig. 1, we propose a novel regression-based approach for pose-invariant face alignment, which aims to estimate the 2D and 3D locations of facial landmarks, as well as their visibilities in the 2D image, for a face with arbitrary pose (e.g., −90° ≤ yaw ≤ +90°).
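To make the relationship between the 3D landmarks and their 2D image locations concrete, the following sketch projects a set of 3D landmarks to 2D with a 2×4 camera projection matrix. A weak-perspective camera is an illustrative assumption here, not necessarily the exact projection form used in our method; the function name is likewise hypothetical.

```python
import numpy as np

def project_3d_landmarks(S, M):
    """Project N 3D landmarks S (3 x N) to 2D with a 2 x 4 projection matrix M.

    Assumes a weak-perspective camera, U = M @ [S; 1], a common choice for
    this kind of model (an illustrative assumption).
    """
    N = S.shape[1]
    S_h = np.vstack([S, np.ones((1, N))])  # homogeneous coordinates, 4 x N
    return M @ S_h                          # 2 x N matrix of 2D landmarks
```

Under this model, estimating the alignment amounts to estimating both the 3D landmark shape `S` and the projection matrix `M` for a given image.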
The overall architecture of our proposed PIFA method is shown in Fig. 2. We first learn a 3D Point Distribution Model (3DPDM) from a set of labeled 3D scans, so that a set of 2D landmarks on an image can be considered a projection of a 3DPDM instance (i.e., 3D landmarks). For each 2D training face image, we assume that manually labeled 2D landmarks and their visibilities are available, as well as the corresponding ground-truth 3D landmarks and camera projection matrix. Given the training images and the 2D/3D ground truth, we train a cascaded coupled-regressor composed of two regressors at each cascade layer, which estimate the updates of the 3DPDM coefficients and the projection matrix, respectively. Finally, the visibilities of the projected 3D landmarks are automatically computed from domain knowledge of the 3D surface normals and incorporated into the regressor learning procedure.
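The surface-normal visibility test above can be sketched as follows: a landmark is deemed self-occluded when its surface normal, rotated into camera coordinates, points away from the camera. This is a minimal sketch; the exact sign convention and the function name are assumptions for illustration.

```python
import numpy as np

def landmark_visibilities(normals, R):
    """Estimate landmark visibilities from 3D surface normals.

    normals: 3 x N unit surface normals at the landmarks (model space).
    R:       3 x 3 rotation of the face with respect to the camera.

    A landmark is marked visible when its rotated normal has a positive
    z component, i.e., it points toward the camera (sign convention is
    an assumption here).
    """
    rotated = R @ normals                    # normals in camera coordinates
    return (rotated[2, :] > 0).astype(int)   # 1 = visible, 0 = self-occluded
```

Because the test only needs the surface normals and the current pose estimate, visibilities can be recomputed cheaply at every cascade layer.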
As shown in Fig. 3, despite the large pose range of −90° ≤ yaw ≤ +90°, our algorithm does a good job of aligning the landmarks and correctly predicts their visibilities. These results are especially impressive considering that the same mean shape (2D landmarks) is used as the initialization for all testing images, even though the initialization can deviate substantially from the final landmark estimates.
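The cascade inference initialized from the mean shape can be sketched as a loop that alternately refines the projection-matrix and shape-coefficient estimates. The linear-regressor form and the `features` interface (a stand-in for shape-indexed image features) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cascaded_alignment(features, regressors, p0, m0):
    """Run the cascaded coupled-regressor at test time.

    features:   function (p, m) -> feature vector f (stand-in for
                shape-indexed image features; hypothetical interface).
    regressors: list of (Rm, Rp) weight-matrix pairs, one pair per layer.
    p0, m0:     initial 3DPDM coefficients and flattened projection-matrix
                parameters, e.g., derived from the mean shape.
    """
    p, m = p0.copy(), m0.copy()
    for Rm, Rp in regressors:
        f = features(p, m)
        m = m + Rm @ f  # update camera projection parameters
        p = p + Rp @ f  # update 3DPDM shape coefficients
    return p, m
```

Note that every test image starts from the same `p0`/`m0`; all pose-specific adaptation comes from the learned layer-wise updates.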