
3D Face Reconstruction from a Front Image by Pose Extension in Latent Space

dc.contributor.author: Zhang, Zhao
dc.contributor.supervisor: Lee, Wonsook
dc.date.accessioned: 2023-09-27T17:53:48Z
dc.date.available: 2023-09-27T17:53:48Z
dc.date.issued: 2023-09-27
dc.description.abstract: Numerous techniques exist for 3D face reconstruction from a single image, making use of large facial databases. However, they commonly suffer quality issues due to the absence of information from other viewpoints; for example, reconstruction from a single front-view image has limited realism, particularly in profile views. We have observed that multi-view 3D face reconstruction yields higher-quality models than single-view reconstruction. Based on this observation, we propose a novel pipeline that combines several deep-learning methods to enhance reconstruction quality from a single frontal view. Our method requires only one front-view image as input, yet it generates multiple realistic facial viewpoints using several deep-learning networks; these viewpoints are then used to build a 3D facial model of significantly higher quality. Traditional image-space editing is limited in its ability to manipulate content and style while preserving quality, whereas editing in the latent space (the space after encoding or before decoding in a neural network) offers far greater control over a given photo. Our pipeline therefore manipulates the latent space: we first find the latent vector corresponding to the input image using a Generative Adversarial Network (GAN) inversion method, then search for nearby latent vectors to synthesize multiple pose images of the subject. The generated images are fed into a Diffusion model, another image-synthesis network, to produce their respective profile views; Diffusion models are known to produce more realistic large-angle variations of a given object than GAN models do.
Subsequently, all of these multi-view images are fed into an Autoencoder, a neural network designed for 3D face model prediction, to derive the 3D structure of the face. Finally, the texture of the 3D face model is applied to enhance its realism, and certain regions of the 3D shape are refined to correct any unrealistic aspects. Our experimental results validate the effectiveness and efficiency of our method in reconstructing highly accurate 3D models of human faces from a single front-view input image. The reconstructed models retain high visual fidelity to the original image, even without the need for a 3D database.
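The "search for nearby latent vectors" step described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: `extend_pose_latents` is a hypothetical helper, the 512-dimensional vectors are random stand-ins for a StyleGAN-style inverted latent, and `pose_direction` is assumed to be a learned direction encoding head yaw.

```python
import numpy as np

def extend_pose_latents(w, pose_direction, yaw_steps=(-0.5, -0.25, 0.25, 0.5)):
    """Return nearby latent vectors obtained by moving the inverted
    latent `w` along a pose direction in latent space.

    w              : (d,) latent vector from GAN inversion of the front image
    pose_direction : (d,) vector assumed to encode head yaw
    yaw_steps      : step sizes along the (normalized) pose direction
    """
    d = pose_direction / np.linalg.norm(pose_direction)
    return [w + alpha * d for alpha in yaw_steps]

# Toy usage with random stand-ins for the real inverted latent and direction.
rng = np.random.default_rng(0)
w = rng.standard_normal(512)          # stand-in for an inverted latent
direction = rng.standard_normal(512)  # stand-in for a learned yaw direction
poses = extend_pose_latents(w, direction)
```

Each resulting vector would then be decoded by the GAN into a posed image; symmetric step sizes keep the synthesized poses balanced around the original front view.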
dc.identifier.uri: http://hdl.handle.net/10393/45481
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-29687
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: GAN
dc.subject: Diffusion model
dc.subject: 3D Face
dc.subject: Single input
dc.subject: Multi view
dc.title: 3D Face Reconstruction from a Front Image by Pose Extension in Latent Space
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Masters
thesis.degree.name: MASc
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Zhang_Zhao_2023_thesis.pdf
Size: 38.47 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission