Towards Generalizable Few-Shot Object Detection via Enhanced Representation Learning

Zhang, Yan

Files

Zhang_Yan_2026_thesis.pdf (42.67 MB)

Date

2026-01-06

Authors

Zhang, Yan

Publisher

Université d'Ottawa | University of Ottawa

Creative Commons License

Attribution-ShareAlike 4.0 International

Abstract

Few-shot object detection (FSOD) aims to detect novel categories from only a few training examples. Severe data scarcity makes it difficult to learn robust feature representations, and FSOD models often struggle to distinguish objects from visually ambiguous backgrounds, which restricts their generalization capability. We propose an FSOD framework that addresses these challenges through two key innovations. First, we introduce Wavelet-Semantic Fusion Attention (WSFA), which enhances semantic ViT features with frequency-domain information from the discrete wavelet transform, supplying complementary edge and texture cues through cross-modal attention. Second, we propose a Learnable Background Prototype (LBP) that explicitly models background patterns, significantly improving foreground-background discrimination. Both contributions are integrated into a unified single-stage, transformer-based detection framework with inter-class contrastive learning. Comprehensive experiments on the standard FSOD benchmarks PASCAL VOC and MS COCO demonstrate that our method achieves stable improvements over strong baselines and outperforms existing state-of-the-art approaches. This work provides a practical solution for scenarios with limited annotated data, broadening the real-world applicability of object detection.
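
To illustrate the kind of frequency-domain cue the abstract refers to, the following is a minimal sketch of a single-level 2D discrete wavelet transform. It is not the thesis's WSFA implementation, whose details are not given in this record. The Haar wavelet, the function name `haar_dwt2`, and the sub-band names are all assumptions chosen for illustration.

```python
def haar_dwt2(image):
    """Single-level 2D Haar DWT of a list-of-lists image with even height and width.

    Illustrative only: the Haar basis and the sub-band names are assumptions,
    not taken from the thesis. Returns four half-resolution sub-bands:
    (approx, row_detail, col_detail, diag_detail).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("expected even height and width")

    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for i in range(0, rows, 2):
        ll, rd, cd, dd = [], [], [], []
        for j in range(0, cols, 2):
            # The four pixels of one 2x2 block.
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ll.append((a + b + c + d) / 2.0)  # low-pass approximation
            rd.append((a + b - c - d) / 2.0)  # change between the two rows
            cd.append((a - b + c - d) / 2.0)  # change between the two columns
            dd.append((a - b - c + d) / 2.0)  # diagonal change
        approx.append(ll)
        row_detail.append(rd)
        col_detail.append(cd)
        diag_detail.append(dd)
    return approx, row_detail, col_detail, diag_detail


# A flat region has no detail; a vertical edge shows up only in col_detail.
flat = [[3.0, 3.0], [3.0, 3.0]]
edge = [[0.0, 1.0], [0.0, 1.0]]
flat_bands = haar_dwt2(flat)
edge_bands = haar_dwt2(edge)
```

The three detail sub-bands respond to local intensity changes, which is why wavelet coefficients can act as edge and texture cues alongside semantic features. How such sub-bands are fused with ViT features in the thesis is not described here.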

Keywords

Computer Vision, Object Detection, Few-Shot Learning

URI

http://hdl.handle.net/10393/51223
https://doi.org/10.20381/ruor-31646

Collections

- Thèses, 2011 - // Theses, 2011 -