Detecting Novel Concepts in Data Streams: To Infinity and Beyond
| dc.contributor.author | Gaudreault, Jean-Gabriel | |
| dc.contributor.supervisor | Branco, Paula | |
| dc.date.accessioned | 2026-02-13T23:31:18Z | |
| dc.date.available | 2026-02-13T23:31:18Z | |
| dc.date.issued | 2026-02-13 | |
| dc.description.abstract | As new network attacks develop, new illnesses spread, or new topics surface on social media, many machine learning models become obsolete as they fail to recognize these novel concepts. This requires the development of artificial intelligence models capable of detecting, learning, and adapting to new concepts autonomously. In this context, novelty detection emerges as a critical method for identifying new concepts in data streams while ensuring the accurate classification of known ones. Despite growing research interest, progress in the field is hindered by challenges such as the lack of a comprehensive evaluation framework and the absence of robust algorithms that can adapt to changing data distributions with minimal human intervention. This thesis addresses these fundamental issues through four primary contributions. First, we present a systematic literature review that establishes an updated taxonomy of the field, analyzing existing works to identify key challenges and research directions. Second, building on this review, we address inconsistencies in current evaluation methods by introducing a comprehensive framework for assessing novelty detection algorithms in multi-class data streams. We empirically demonstrate that key data stream characteristics, which are often overlooked, substantially impact algorithm performance and hinder fair comparisons across studies. Our framework formalizes these characteristics and introduces novel temporal metrics, enabling robust model comparison through a single, evolving performance score. Third, to address the limitations of existing algorithms, we introduce CASCADE (Clustering-based Adaptive Stream Classification And Detection of Emerging classes). 
CASCADE integrates binary classifiers and threshold-based filters to mitigate false positives, while leveraging hierarchical clustering on custom meta-features to function effectively without dependence on ground-truth labels or extensive hyperparameter tuning. It achieves state-of-the-art performance, frequently matching or surpassing supervised approaches on diverse real-world and synthetic datasets. Finally, we demonstrate the effectiveness and advantages of novelty detection techniques in the domain of cybersecurity, using numerous benchmark datasets. Altogether, this thesis provides a comprehensive overview of the current state of the field, establishes a rigorous methodology for performance evaluation, and introduces a high-performing algorithm that addresses critical limitations of existing techniques and matches or even surpasses supervised methods while operating without ground-truth labels or extensive hyperparameter tuning. | |
| dc.identifier.uri | http://hdl.handle.net/10393/51377 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31748 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.subject | Novelty Detection | |
| dc.subject | Machine Learning | |
| dc.subject | Cybersecurity | |
| dc.subject | Data Mining | |
| dc.subject | Data Streams | |
| dc.title | Detecting Novel Concepts in Data Streams: To Infinity and Beyond | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Génie / Engineering | |
| thesis.degree.level | Doctoral | |
| thesis.degree.name | PhD | |
| uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science | |
