We present an algorithm for "unconstrained" 3D face reconstruction from a 2D photo collection of face images of a subject captured under a diverse variation of poses, expressions, and illuminations, without metadata about the cameras, timing, or lighting conditions. The output of our algorithm is a true 3D face surface model represented as a watertight triangulated surface with albedo or texture information. This is a very challenging problem, as we have access to neither stereo imaging nor video. Motivated by the success of the state-of-the-art method [2], we developed a novel photometric stereo-based method with two distinct novelties. First, working with a true 3D model allows us to use images from all possible poses, including profile views, without warping them to a frontal view. Second, by leveraging emerging face alignment techniques and our novel field-based Laplacian editing, a combination of landmark constraints and photometric stereo-based normals drives the surface reconstruction.

Figure 1: Given a photo collection of images with unknown pose, illumination, and expression, our goal is to reconstruct a detailed 3D model of the face.
Algorithm Summary
Motivated by the state-of-the-art results and the amendable limitations of [2], this project proposes a novel approach to 3D face reconstruction. Our algorithm consists of three major steps, which we briefly describe here; for full details, please refer to the bibliography on this topic. The first step is enabled by the recent explosion of face alignment techniques, which have substantially improved 2D landmark estimation. Specifically, given a collection of unconstrained face images, we perform 2D landmark estimation and deform a generic 3D face template such that the projections of its 3D landmarks are consistent with the estimated 2D landmarks while the original surface normals are maintained. The second step estimates the person-specific face normals via photometric stereo: we project the 2D face images at all poses onto the enhanced 3D face template to establish a dense correspondence across the images, and then jointly estimate the lighting and surface normals via SVD. The third step deforms the 3D shape so that its updated surface normals match the estimated ones, subject to the landmark constraint and an additional boundary constraint. The process iterates until convergence.
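The joint lighting/normal estimation in the second step can be illustrated with the classical rank-3 factorization behind uncalibrated Lambertian photometric stereo. The sketch below is a minimal, hypothetical example (the array sizes and random data are assumptions, not the paper's setup): stacking aligned pixel intensities into a matrix M, an SVD truncated to rank 3 factors M into a lighting matrix and an albedo-scaled normal matrix, up to an invertible 3x3 ambiguity.

```python
import numpy as np

# Minimal sketch of uncalibrated photometric stereo via SVD.
# Rows of M are images, columns are corresponded pixels; under the
# Lambertian model M ~ L @ N, where L (f x 3) holds per-image lighting
# directions and N (3 x p) holds albedo-scaled surface normals.
rng = np.random.default_rng(0)
f, p = 8, 500                       # assumed: 8 images, 500 pixels
L_true = rng.normal(size=(f, 3))    # unknown per-image lighting
N_true = rng.normal(size=(3, p))    # unknown per-pixel scaled normals
M = L_true @ N_true                 # observed intensities (noise-free toy)

# Rank-3 factorization: keep the top three singular components.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
L_est = U[:, :3] * np.sqrt(S[:3])            # lighting estimate
N_est = np.sqrt(S[:3])[:, None] * Vt[:3]     # normal estimate

# The product reproduces M, but L and N individually are only
# determined up to an invertible 3x3 matrix A: M = (L A)(A^{-1} N).
print(np.allclose(L_est @ N_est, M))
```

In practice the 3x3 ambiguity is resolved with additional constraints (e.g. integrability or known landmark normals); with noisy images the rank-3 truncation acts as a least-squares fit.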
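The normal-driven deformation of the third step follows the spirit of Laplacian editing: preserve differential coordinates in a least-squares sense while pinning landmark positions. The toy below is a hypothetical 1D analogue (a polyline instead of a triangle mesh, with made-up anchors and weights), not the paper's field-based formulation, but it shows the stacked linear system that such constrained editing solves.

```python
import numpy as np

# Toy Laplacian editing on a polyline: keep the differential
# coordinates delta = L @ x close to their rest values while softly
# constraining a few "landmark" vertices to target positions.
n = 10
x0 = np.linspace(0.0, 1.0, n)        # rest positions of the polyline

# Discrete Laplacian (interior second differences).
L = np.zeros((n - 2, n))
for i in range(1, n - 1):
    L[i - 1, i - 1] = -1.0
    L[i - 1, i] = 2.0
    L[i - 1, i + 1] = -1.0
delta = L @ x0                       # differential coords to preserve

# Soft positional constraints: pin the endpoints to new targets
# (assumed anchors/targets/weight, for illustration only).
anchors, targets, w = [0, n - 1], np.array([0.0, 2.0]), 10.0
C = np.zeros((len(anchors), n))
for r, a in enumerate(anchors):
    C[r, a] = 1.0

# Stack Laplacian rows and weighted constraint rows; solve least squares:
# min_x ||L x - delta||^2 + w^2 ||C x - targets||^2
A = np.vstack([L, w * C])
b = np.concatenate([delta, w * targets])
x = np.linalg.lstsq(A, b, rcond=None)[0]
print(x[0], x[-1])                   # endpoints land near 0.0 and 2.0
```

On a mesh the same structure appears with the cotangent Laplacian and per-vertex normal targets; the landmark and boundary constraints of the algorithm enter as extra weighted rows, exactly as the anchors do here.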