
Cross-Domain Latent Conditioning for Coherent Generative Synthesis

Publisher

Université d'Ottawa / University of Ottawa

Abstract

Generative models have revolutionized synthesis research, enabling machines to produce images and text that rival human-created content. However, challenges in controllability, diversity, and coherence limit their application. This thesis investigates generative models with applications to multiple domains: texture synthesis, texture concept blending using vision-language model latent spaces, and language generation for procedural dialogue in computer role-playing games. In texture synthesis, the thesis introduces a novel method for generating diverse, high-quality non-homogeneous textures from a single exemplar. The approach combines a Generative Adversarial Network (GAN) and a Variational Autoencoder (VAE) within a unified architecture that uses custom layers to share information during training. The method also introduces a similarity loss term that promotes output diversity while enhancing quality. The architecture trains rapidly, producing superior results in less time than state-of-the-art techniques. The challenges of texture synthesis extend into the text domain, where complex textures are often difficult to describe. This motivated the hypothesis that vision-language models, such as Contrastive Language-Image Pretraining (CLIP), might encode spatial information that could enable latent exploration, akin to GAN latent space traversal. To test this, the thesis introduces a novel pipeline that uses the CLIP model to construct latent manifolds from a texture dataset. The pipeline enables spatially optimized interpolation between user-selected inputs, generating coherent and novel blends of visual concepts that are difficult to describe with text prompts.
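The thesis's manifold construction and optimization are more involved than the abstract can convey; as a minimal illustration of latent interpolation between embeddings, a common approach is spherical linear interpolation (slerp), which traverses the hypersphere on which normalized CLIP embeddings lie. The `slerp` helper below is an assumption for illustration, not the thesis's actual pipeline.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between two embeddings at fraction t in [0, 1].

    Embeddings are unit-normalized first, so the result stays on the
    unit hypersphere rather than cutting through its interior.
    """
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Angle between the two normalized embeddings
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a  # vectors coincide; nothing to interpolate
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

An intermediate point produced this way can then be decoded or matched back to the image domain to blend two visual concepts without a textual description.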
Finally, turning to text synthesis for procedural dialogue generation, the thesis presents a structured framework that overcomes the limitations of traditional parser-based systems by integrating Large Language Models (LLMs) with a graph-based state-tracking approach for use in computer role-playing games. The framework frames interactions as puzzles with goal-based milestones that enforce narrative progression. It empowers designers to craft abstract rules that enable players to invent solutions rather than follow predefined paths. A proof-of-concept game demonstrates how this system enhances player freedom in practice, validated through both qualitative and quantitative analysis. This work represents a step towards the long-standing goal of improving player agency in games.
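The core of a milestone-gated dialogue graph can be sketched compactly. The class below is a hypothetical simplification: the thesis's framework pairs such a graph with an LLM that interprets free-form player input, which is stubbed out here; only the prerequisite-gating that enforces narrative progression is shown.

```python
class MilestoneGraph:
    """Goal-based milestones with prerequisite edges.

    A milestone can only be completed once all of its prerequisites
    are completed, regardless of what the player (or an LLM judging
    the player's input) attempts.
    """

    def __init__(self):
        self.milestones = {}   # milestone name -> set of prerequisite names
        self.completed = set()

    def add(self, name, prerequisites=()):
        self.milestones[name] = set(prerequisites)

    def available(self):
        # Milestones not yet completed whose prerequisites are all met
        return {m for m, pre in self.milestones.items()
                if m not in self.completed and pre <= self.completed}

    def complete(self, name):
        # Returns False when prerequisites are unmet: progression is enforced
        if name in self.available():
            self.completed.add(name)
            return True
        return False
```

In a full system, an LLM would decide *whether* a player's invented solution satisfies a milestone's abstract rule, while the graph guarantees the story cannot skip ahead.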

Keywords

Computer Vision, Machine Learning, Generative Models, Texture Synthesis, Large Language Models, Agentic Systems, State Tracking
