
Detecting Novel Concepts in Data Streams: To Infinity and Beyond

dc.contributor.author: Gaudreault, Jean-Gabriel
dc.contributor.supervisor: Branco, Paula
dc.date.accessioned: 2026-02-13T23:31:18Z
dc.date.available: 2026-02-13T23:31:18Z
dc.date.issued: 2026-02-13
dc.description.abstract: As new network attacks develop, new illnesses spread, or new topics surface on social media, many machine learning models become obsolete as they fail to recognize these novel concepts. This requires the development of artificial intelligence models capable of detecting, learning, and adapting to new concepts autonomously. In this context, novelty detection emerges as a critical method for identifying new concepts in data streams while ensuring the accurate classification of known ones. Despite growing research interest, progress in the field is hindered by challenges such as the lack of a comprehensive evaluation framework and the absence of robust algorithms that can adapt to changing data distributions with minimal human intervention. This thesis addresses these fundamental issues through four primary contributions. First, we present a systematic literature review that establishes an updated taxonomy of the field, analyzing existing works to identify key challenges and research directions. Second, building on this review, we address inconsistencies in current evaluation methods by introducing a comprehensive framework for assessing novelty detection algorithms in multi-class data streams. We empirically demonstrate that key data stream characteristics, which are often overlooked, substantially impact algorithm performance and hinder fair comparisons across studies. Our framework formalizes these characteristics and introduces novel temporal metrics, enabling robust model comparison through a single, evolving performance score. Third, to address the limitations of existing algorithms, we introduce CASCADE (Clustering-based Adaptive Stream Classification And Detection of Emerging classes). CASCADE integrates binary classifiers and threshold-based filters to mitigate false positives, while leveraging hierarchical clustering on custom meta-features to function effectively without dependence on ground-truth labels or extensive hyperparameter tuning. It achieves state-of-the-art performance, frequently matching or surpassing supervised approaches on diverse real-world and synthetic datasets. Finally, we demonstrate the effectiveness and advantages of novelty detection techniques in the domain of cybersecurity, using numerous benchmark datasets. Altogether, this thesis provides a comprehensive overview of the current state of the field, establishes a rigorous methodology for performance evaluation, and introduces a high-performing algorithm that addresses critical limitations of existing techniques and matches or even surpasses supervised methods while operating without ground-truth labels or extensive hyperparameter tuning.
dc.identifier.uri: http://hdl.handle.net/10393/51377
dc.identifier.uri: https://doi.org/10.20381/ruor-31748
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Novelty Detection
dc.subject: Machine Learning
dc.subject: Cybersecurity
dc.subject: Data Mining
dc.subject: Data Streams
dc.title: Detecting Novel Concepts in Data Streams: To Infinity and Beyond
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Gaudreault_Jean-Gabriel_2026_thesis.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission