Face alignment is the process of applying a supervised learned model to a face image to estimate the locations of a set of facial landmarks, such as eye corners, mouth corners, etc. Face alignment is a key module in the pipeline of most facial analysis algorithms, normally after face detection and before subsequent feature extraction and classification. It is therefore an enabling capability with a multitude of applications, such as face recognition, expression recognition, face deidentification, etc. Despite continuous improvement in alignment accuracy, face alignment remains a very challenging problem, due to non-frontal face poses, low image quality, occlusion, etc. Among all these challenges, we identify pose-invariant face alignment as the one deserving substantial research effort, for a number of reasons.
Motivated by the need to address pose variation, and by the lack of prior work in handling large poses, as shown in Fig. 1, we propose a novel regression-based approach for pose-invariant face alignment, which aims to estimate the 2D and 3D locations of face landmarks, as well as their visibilities in the 2D image, for a face with arbitrary pose (e.g., -90° < yaw < +90°).

Figure 1: Given a face image with an arbitrary pose, our proposed algorithm automatically estimates the 2D locations and visibilities of facial landmarks, as well as 3D landmarks. The displayed 3D landmarks are estimated for the image in the center. Green/red points indicate visible/invisible landmarks.
Proposed Method
The overall architecture of our proposed PIFA method is shown in Fig. 2. We first learn a 3D Point Distribution Model (3DPDM) from a set of labeled 3D scans, where a set of 2D landmarks on an image can be considered as a projection of a 3DPDM instance (i.e., 3D landmarks). For each 2D training face image, we assume that there exist manually labeled 2D landmarks and their visibilities, as well as the corresponding ground truth 3D landmarks and the camera projection matrix. Given the training images and the 2D/3D ground truth, we train a cascaded coupled-regressor composed of two regressors at each cascade layer, one estimating the update of the 3DPDM coefficients and the other the update of the projection matrix. Finally, the visibilities of the projected 3D landmarks are automatically computed via the domain knowledge of the 3D surface normals, and incorporated into the regressor learning procedure.
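To make the 3DPDM-plus-projection formulation concrete, the following is a minimal sketch of how a 3D shape instance and its 2D projection could be computed. The function names, array layouts, and the weak-perspective form of the projection matrix are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def pdm_instance(mean_shape, bases, p):
    # 3DPDM instance: mean 3D shape plus a linear combination of basis shapes.
    # mean_shape: (3, N) landmarks, bases: (K, 3, N), p: (K,) shape coefficients.
    return mean_shape + np.tensordot(p, bases, axes=1)

def project(shape_3d, M):
    # 2D landmarks as a projection of the 3D landmarks: U = M [S; 1].
    # M: (2, 4) camera projection matrix (assumed weak-perspective here).
    n = shape_3d.shape[1]
    homo = np.vstack([shape_3d, np.ones((1, n))])
    return M @ homo  # (2, N) 2D landmark locations
```

Under this sketch, the cascaded coupled-regressor's job is exactly to refine `p` and `M` so that `project(pdm_instance(...), M)` matches the observed landmarks.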

Figure 2: Overall architecture of our proposed PIFA method, with three main modules (3D modeling, cascaded coupled-regressor learning, and 3D surface-enabled visibility estimation). Green/red arrows indicate surface normals pointing toward/away from the camera.
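The surface-normal visibility test in the caption above can be sketched as follows: a landmark is marked visible when its surface normal points toward the camera, i.e., has a positive component along the viewing axis recovered from the projection matrix. The function name and the (2, 4) weak-perspective layout of `M` are assumptions for illustration.

```python
import numpy as np

def landmark_visibility(normals, M):
    # normals: (3, N) unit surface normals at the 3D landmarks (model space).
    # M: (2, 4) projection matrix; its first two rows contain scaled rotation rows.
    r1 = M[0, :3] / np.linalg.norm(M[0, :3])
    r2 = M[1, :3] / np.linalg.norm(M[1, :3])
    camera_axis = np.cross(r1, r2)        # viewing direction toward the camera
    return (camera_axis @ normals) > 0    # boolean visibility per landmark
```

For a frontal pose the viewing direction is the z-axis, so normals facing the camera (positive z) are visible and those facing away are not.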
Qualitative results
As shown in Fig. 3, despite the large pose range of -90° < yaw < +90°, our algorithm aligns the landmarks well and correctly predicts the landmark visibilities. These results are especially impressive considering that the same mean shape (2D landmarks) is used as the initialization for all testing images, which implies very large deformations between the initialization and the final landmark estimates.

Figure 3: Testing results on the AFLW database. As shown in the top row, we initialize face alignment by placing a 2D mean shape in the given bounding box of each image. Note the disparity between the initial landmarks and the final estimated ones, as well as the diversity in pose, illumination and resolution among the images. Green/red points indicate visible/invisible estimated landmarks.
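The mean-shape initialization and stage-wise refinement described above can be sketched as a generic cascaded regression loop. The per-stage regressors, the feature extractor, and the additive update form are assumptions standing in for the paper's learned components.

```python
import numpy as np

def cascaded_alignment(extract_features, M0, p0, regressors):
    # Hypothetical cascade: every stage holds two regressors (a "coupled"
    # pair), one updating the projection matrix M and one updating the
    # 3DPDM coefficients p, starting from a mean-shape initialization.
    M, p = M0.copy(), p0.copy()
    for reg_M, reg_p in regressors:
        feats = extract_features(M, p)  # features at current landmark estimates
        M = M + reg_M(feats)            # additive update of the projection matrix
        p = p + reg_p(feats)            # additive update of the shape coefficients
    return M, p
```

Because every test image starts from the same mean shape, all of the pose-specific deformation in Fig. 3 is produced by these accumulated per-stage updates.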