Towards Generalizable Few-Shot Object Detection via Enhanced Representation Learning

Publisher

Université d'Ottawa | University of Ottawa

Creative Commons License

Attribution-ShareAlike 4.0 International

Abstract

Few-shot object detection (FSOD), which aims to detect novel categories from only a few training examples, struggles to learn robust feature representations because of severe data scarcity. FSOD models also often fail to distinguish objects from visually ambiguous backgrounds, which restricts their generalization capability. We propose a novel FSOD framework that addresses these challenges through two key innovations. First, we introduce Wavelet‑Semantic Fusion Attention (WSFA), which enhances semantic ViT features with frequency-domain information obtained via the discrete wavelet transform, supplying complementary edge and texture cues through cross-modal attention. Second, we propose a Learnable Background Prototype (LBP) that explicitly models background patterns, significantly improving foreground-background discrimination. These contributions are integrated into a unified single-stage transformer-based detection framework with inter-class contrastive learning. Comprehensive experiments on standard FSOD benchmarks (PASCAL VOC and MS COCO) demonstrate that our method achieves stable improvements over strong baseline methods and outperforms existing state-of-the-art approaches. This work provides a practical solution for scenarios with limited annotated data, broadening the applicability of object detection in real-world settings.
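
To illustrate the kind of frequency-domain cue that a discrete wavelet transform exposes, the following is a minimal sketch of a single-level 2D Haar transform in NumPy. It is not the thesis implementation: the abstract does not state which wavelet family WSFA uses, so the Haar wavelet, the function name `haar_dwt2`, and the sub-band variable names are all assumptions made for illustration. The attention-based fusion with ViT features is not shown.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D orthonormal Haar DWT (illustrative assumption, not the thesis code).

    Takes a 2D array with even height and width and returns four
    half-resolution sub-bands: (approx, d_cols, d_rows, d_diag).
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # Split the image into the four corners of every 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    approx = (a + b + c + d) / 2.0  # low-pass: coarse content
    d_cols = (a - b + c - d) / 2.0  # change between adjacent columns
    d_rows = (a + b - c - d) / 2.0  # change between adjacent rows
    d_diag = (a - b - c + d) / 2.0  # diagonal change
    return approx, d_cols, d_rows, d_diag


if __name__ == "__main__":
    # Columns alternate between 0 and 1, so intensity changes only across columns.
    image = np.tile(np.array([0.0, 1.0]), (4, 2))
    approx, d_cols, d_rows, d_diag = haar_dwt2(image)
    print(approx.shape, np.abs(d_cols).sum(), np.abs(d_rows).sum())
```

In this sketch only the `d_cols` sub-band is non-zero for an image whose intensity changes across columns, which is the sense in which detail sub-bands carry edge and texture information that is separate from the coarse approximation.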

Keywords

Computer Vision, Object Detection, Few-Shot Learning
