From No Labels to Informed Decisions: Improving Active and Semi-Supervised Learning Strategies for Financial Crime Detection
Publisher
Université d'Ottawa / University of Ottawa
Abstract
Detecting fraud and financial crimes is a critical yet challenging task because of extreme class imbalance, the high cost of labeling, and often a complete lack of labeled data. Traditional supervised learning approaches often fail in such settings, especially when labeled fraud cases are scarce or unavailable. This thesis addresses these limitations by proposing an adaptive pipeline that combines unsupervised learning, Semi-Supervised Learning (SSL), and Active Learning (AL) to support effective decision-making in scenarios ranging from no labels at all to limited labeled data acquired progressively under a budget constraint.
We begin with a comparative analysis of AL and SSL under different imbalance ratios and resampling strategies. Although AL outperforms SSL in certain cases, it is often more costly and less effective in highly imbalanced settings. To overcome this, we propose X-ITERADE (EXplainable ITERative Anomaly Detection Ensemble), an explainable, fully unsupervised anomaly detection ensemble that uses dynamic clustering to identify and rank suspicious cases. X-ITERADE produces a subset of transactions with a significantly higher fraud rate and includes an explainability module that uses Large Language Models (LLMs) to help domain experts understand anomalous behavior.
Building on X-ITERADE's output, we introduce ALISA (Adaptive Learning through Iterative Semi-supervised Active learning), a novel framework that adaptively transitions between AL and SSL based on model signals and dataset characteristics. Starting from the suspicious cases identified by X-ITERADE, ALISA iteratively combines weighted AL with enhanced self-training and data augmentation. Because ALISA can start from minimal or even one-class labeled sets, it addresses the cold-start problem in early-stage fraud detection.
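The abstract does not specify how ALISA decides when to switch between AL and SSL. As a rough illustration of the general idea only, the Python sketch below alternates between an uncertainty-sampling query step (AL) and a self-training pseudo-labeling step (SSL). It is not ALISA. The switching rule (self-train only when both classes have oracle labels and enough unlabeled rows reach a confidence threshold), the plain logistic-regression model, the toy data, and every name (`hybrid_al_ssl`, `fit_logreg`, `predict_proba`, `batch`, `conf`) are hypothetical choices for this example. It also omits ALISA's weighting and data augmentation. Only NumPy is assumed.

```python
import numpy as np


def fit_logreg(X, y, epochs=500, lr=0.5):
    """Logistic regression fitted by batch gradient descent; a bias column is appended."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Predicted probability of the positive (fraud) class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))


def hybrid_al_ssl(X, oracle, seed_labels, budget, batch=5, conf=0.95, rounds=20):
    """Toy loop switching between self-training (SSL) and uncertainty sampling (AL).

    seed_labels: {row: 0 or 1}; may contain a single class (cold start).
    oracle(row) returns the true label and costs one unit of budget.
    Hypothetical switching rule: self-train only when both classes have oracle
    labels and at least `batch` unlabeled rows reach confidence >= conf;
    otherwise query the least confident rows while budget remains.
    """
    labels, pseudo, spent = dict(seed_labels), {}, 0
    for _ in range(rounds):
        known = {**pseudo, **labels}  # oracle labels and pseudo-labels never overlap
        idx = sorted(known)
        unl = np.array([i for i in range(len(X)) if i not in known])
        if len(unl) == 0:
            break
        w = fit_logreg(X[idx], np.array([known[i] for i in idx], dtype=float))
        p = predict_proba(w, X[unl])
        confidence = np.maximum(p, 1 - p)
        mask = confidence >= conf
        if len(set(labels.values())) == 2 and mask.sum() >= batch:
            # SSL step: adopt the most confident predictions as pseudo-labels.
            top = np.argsort(-confidence[mask])[:batch]
            for i, pi in zip(unl[mask][top], p[mask][top]):
                pseudo[int(i)] = int(pi >= 0.5)
        elif spent < budget:
            # AL step: ask the oracle about the least confident unlabeled rows.
            for i in unl[np.argsort(confidence)[: min(batch, budget - spent)]]:
                labels[int(i)] = oracle(int(i))
                spent += 1
        else:
            break  # no budget left and no safe self-training step
    return labels, pseudo, spent


# Toy data: 90 "legitimate" rows around (0, 0) and 10 "fraud" rows around (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 1, (10, 2))])
true_y = np.array([0] * 90 + [1] * 10)
# Cold start: three seed labels, all from the majority class.
labels, pseudo, spent = hybrid_al_ssl(
    X, lambda i: int(true_y[i]), {0: 0, 1: 0, 2: 0}, budget=15
)
print(f"queried {spent}, pseudo-labeled {len(pseudo)}")
```

Requiring both classes to be present before any self-training step is one simple way to handle the one-class starting point: with only majority-class labels, a self-trained model would confidently pseudo-label everything as legitimate, so the sketch spends budget on queries first.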
Together, X-ITERADE and ALISA form a robust, adaptive pipeline that bridges the gap from unlabeled data to strategic querying and enhanced fraud detection, enabling informed, cost-efficient decisions throughout the learning process.
Keywords
Fraud Detection, Anomaly Detection, Active Learning, Semi-supervised Learning, Explainable AI, Imbalanced Data
