DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
CVPR 2023

Abstract

Sensitivity to severe occlusion and large view angles limits the usage scenarios of existing monocular 3D dense face alignment methods. The state-of-the-art 3DMM-based method directly regresses the model's coefficients, underutilizing the low-level 2D spatial and semantic information, which can actually offer cues for face shape and orientation. In this work, we demonstrate how jointly modeling 3D facial geometry in image and model space can solve the occlusion and view angle problems. Instead of predicting the whole face directly, we first regress image space features in the visible facial region via dense prediction. Subsequently, we predict our model's coefficients from the regressed features of the visible region, leveraging the prior knowledge of whole-face geometry from the morphable models to complete the invisible regions. We further propose a fusion network that combines the advantages of both the image and model space predictions to achieve high robustness and accuracy in unconstrained scenarios. Thanks to the proposed fusion module, our method is robust not only to occlusion and large pitch and roll view angles, which is the benefit of our image space approach, but also to noise and large yaw angles, which is the benefit of our model space approach. Comprehensive evaluations demonstrate the superior performance of our method compared with state-of-the-art methods. On the 3D dense face alignment task, we achieve 3.80% NME on the AFLW2000-3D dataset, outperforming the state-of-the-art method by 5.5%.
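
For context, NME (normalized mean error) averages the per-landmark Euclidean error and divides by a normalization factor. Below is a minimal sketch of the evaluation commonly used on AFLW2000-3D, assuming normalization by the geometric mean of the ground-truth bounding box width and height; the exact protocol may differ from the paper's.

import numpy as np

def nme_percent(pred, gt):
    # pred, gt: (N, L, 2) predicted / ground-truth 2D landmark sets
    lo, hi = gt.min(axis=1), gt.max(axis=1)                # bbox corners, (N, 2)
    bbox_size = np.sqrt(np.prod(hi - lo, axis=1))          # sqrt(w * h), (N,)
    err = np.linalg.norm(pred - gt, axis=2).mean(axis=1)   # mean point error, (N,)
    return 100.0 * (err / bbox_size).mean()                # NME in percent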

Video


Framework

We process the input image in two spaces, the model space and the image space, and subsequently integrate them in UV space via a fusion module. In the image space branch, we first predict the visible facial region in the form of four 2D maps as an intermediate representation. Subsequently, we use a PointNet-based post-process algorithm to complete the whole face's geometry from the visible part.
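
To make the dual-branch layout concrete, here is a conceptual PyTorch sketch. It is not the released DSFNet code: the encoder, the four-map head, the coefficient size, and the UV lifting layers are all hypothetical placeholders that only mirror the data flow described above.

import torch
import torch.nn as nn

class DualSpaceFusionSketch(nn.Module):
    def __init__(self, uv=64, n_coeff=62):
        super().__init__()
        # shared encoder (placeholder for the real backbone)
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                     nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        # image space branch: four dense 2D maps over the visible region
        self.image_head = nn.Conv2d(64, 4, 1)
        # model space branch: global 3DMM coefficients
        self.model_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(64, n_coeff))
        # placeholders that lift each branch into a UV-space position map
        self.uv = uv
        self.img_to_uv = nn.Conv2d(4, 3, 1)
        self.coeff_to_uv = nn.Linear(n_coeff, 3 * uv * uv)
        # fusion module over the two UV-space geometry estimates
        self.fuse = nn.Conv2d(6, 3, 1)

    def forward(self, x):
        f = self.encoder(x)
        dense = self.image_head(f)                        # image space prediction
        coeff = self.model_head(f)                        # model space prediction
        uv_img = nn.functional.interpolate(self.img_to_uv(dense),
                                           (self.uv, self.uv))
        uv_mdl = self.coeff_to_uv(coeff).view(-1, 3, self.uv, self.uv)
        return self.fuse(torch.cat([uv_img, uv_mdl], 1))  # fused UV position map

out = DualSpaceFusionSketch()(torch.randn(1, 3, 256, 256))  # (1, 3, 64, 64)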

Post-process Algorithm

We first convert the depth map of the visible face region into a point cloud and align it from the image view to the canonical view using the correspondence map. A lightweight PointNet module then processes the point cloud and completes the shape of the whole face by predicting the 3DMM shape coefficients. Finally, we align the predicted shape from the canonical view back to the image view as the image space prediction.
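
The PointNet step can be illustrated with a small sketch: lift visible depth pixels into a point cloud, pool an order-invariant global feature, and regress 3DMM shape coefficients. All sizes and the orthographic lifting are illustrative assumptions, and the canonical-view warp via the correspondence map is omitted here.

import torch
import torch.nn as nn

def depth_to_pointcloud(depth, mask):
    # lift visible pixels (u, v, depth) into a point cloud;
    # orthographic camera assumed for this sketch
    v, u = torch.nonzero(mask, as_tuple=True)
    return torch.stack([u.float(), v.float(), depth[v, u]], dim=1)  # (N, 3)

class PointNetShapeSketch(nn.Module):
    def __init__(self, n_shape=40):  # 40 shape coefficients: illustrative
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, n_shape)

    def forward(self, pts):                    # pts: (N, 3), canonical view
        feat = self.point_mlp(pts)             # per-point features, (N, 128)
        global_feat = feat.max(dim=0).values   # order-invariant max pooling
        return self.head(global_feat)          # 3DMM shape coefficients

depth, mask = torch.rand(128, 128), torch.rand(128, 128) > 0.5
coeffs = PointNetShapeSketch()(depth_to_pointcloud(depth, mask))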

Qualitative Results

Citation


The website template was borrowed from Michaël Gharbi and Ref-NeRF.