From No Labels to Informed Decisions: Improving Active and Semi-Supervised Learning Strategies for Financial Crime Detection

Publisher

Université d'Ottawa / University of Ottawa

Abstract

Detecting fraud and financial crimes is a critical yet challenging task because of extreme class imbalance, the high cost of labeling, and, often, a complete lack of labeled data. Traditional supervised learning approaches tend to fail in such settings, especially when labeled fraud cases are scarce or unavailable. This thesis addresses these limitations by proposing an adaptive pipeline that combines unsupervised learning, Semi-Supervised Learning (SSL), and Active Learning (AL) to support effective decision-making in scenarios ranging from no labels to limited labeled data acquired progressively under a budget constraint. We begin with a comparative analysis of AL and SSL under different imbalance ratios and resampling strategies. While AL outperforms SSL in certain cases, it is often more costly and less effective in highly imbalanced settings. To overcome this, we propose X-ITERADE (EXplainable ITERative Anomaly Detection Ensemble), an explainable, fully unsupervised anomaly detection ensemble that uses dynamic clustering to identify and rank suspicious cases. X-ITERADE produces a subset of transactions with a significantly higher fraud rate and includes an explainability module that uses Large Language Models (LLMs) to help domain experts understand anomalous behavior. Building on X-ITERADE's output, we introduce ALISA (Adaptive Learning through Iterative Semi-supervised Active learning), a novel framework that adaptively transitions between AL and SSL based on model signals and dataset characteristics. Starting from the suspicious cases identified by X-ITERADE, ALISA combines weighted AL and enhanced self-training with data augmentation in an iterative process. ALISA can start from minimal or even one-class labeled sets, addressing the cold-start problem in early-stage fraud detection.
Together, X-ITERADE and ALISA form a robust, adaptive pipeline that bridges the gap from unlabeled data to strategic querying and enhanced fraud detection, enabling informed, cost-efficient decisions throughout the learning process.
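The abstract names two standard building blocks that ALISA moves between: querying an expert for labels (active learning) and pseudo-labeling confident predictions (self-training). The sketch below shows generic textbook versions of both for a binary fraud classifier, assuming the model already outputs a fraud probability for each unlabeled transaction. The function names `uncertainty_query` and `self_training_pseudo_labels`, the least-confidence criterion, and the 0.95 threshold are illustrative assumptions; they are not ALISA's weighted query strategy or its rule for switching between AL and SSL.

```python
from typing import Iterable, List, Sequence, Tuple


def uncertainty_query(probs: Sequence[float], budget: int) -> List[int]:
    """Hypothetical helper: pick up to `budget` unlabeled samples for expert labeling.

    Uses least-confidence sampling for a binary model: samples whose predicted
    fraud probability is closest to 0.5 are considered the most informative.
    """
    # Distance from the 0.5 decision boundary; smaller means more uncertain.
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]


def self_training_pseudo_labels(
    probs: Sequence[float],
    threshold: float = 0.95,
    exclude: Iterable[int] = (),
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (index, pseudo_label) pairs for confident predictions.

    A sample gets pseudo-label 1 (fraud) if p >= threshold and 0 (legitimate)
    if p <= 1 - threshold. Indices in `exclude` (e.g. ones just sent to the
    expert) are skipped so they are not labeled twice.
    """
    skip = set(exclude)
    pseudo: List[Tuple[int, int]] = []
    for i, p in enumerate(probs):
        if i in skip:
            continue
        if p >= threshold:
            pseudo.append((i, 1))
        elif p <= 1 - threshold:
            pseudo.append((i, 0))
    return pseudo


if __name__ == "__main__":
    # Made-up model outputs for six unlabeled transactions (illustration only).
    probs = [0.97, 0.52, 0.10, 0.45, 0.01, 0.60]
    queried = uncertainty_query(probs, budget=2)
    print("send to expert:", queried)
    print("pseudo-labels:", self_training_pseudo_labels(probs, 0.95, exclude=queried))
```

With these made-up probabilities and a budget of 2, the query step selects indices 1 and 3 for expert labeling, and self-training then pseudo-labels index 0 as fraud and index 4 as legitimate. Under extreme imbalance, a fixed threshold like this may yield very few confident fraud pseudo-labels, which illustrates why choosing adaptively between the two steps can matter.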

Keywords

Fraud Detection, Anomaly Detection, Active Learning, Semi-supervised Learning, Explainable AI, Imbalanced Data
