Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams

Floyd, Sean Louis Alan

Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams

dc.contributor.author	Floyd, Sean Louis Alan
dc.contributor.supervisor	Viktor, Herna
dc.date.accessioned	2019-06-03T19:22:48Z
dc.date.available	2019-06-03T19:22:48Z
dc.date.issued	2019-06-03	en_US
dc.description.abstract	In this thesis, learning refers to the intelligent computational extraction of knowledge from data. Supervised learning tasks require data to be annotated with labels, whereas for unsupervised learning, data is not labelled. Semi-supervised learning deals with data sets that are partially labelled. A major issue with supervised and semi-supervised learning of data streams is late-arriving or missing class labels. Assuming that correctly labelled data will always be available and timely is often unfeasible, and, as such, supervised methods are not directly applicable in the real world. Therefore, real-world problems usually require the use of semi-supervised or unsupervised learning techniques. For instance, when considering a spam detection task, it is not reasonable to assume that all spam will be identified (correctly labelled) prior to learning. Additionally, in semi-supervised learning, "the instances having the highest [predictive] confidence are not necessarily the most useful ones" [41]. We investigate how self-training performs without its selective heuristic in a streaming setting. This leads us to our contributions. We extend an existing concept drift detector to operate without any labelled data, by using a sliding window of our ensemble's prediction confidence, instead of a boolean indicating whether the ensemble's predictions are correct. We also extend selective self-training, a semi-supervised learning method, by using all predictions, and not only those with high predictive confidence. Finally, we introduce a novel windowing type for ensembles, as sliding windows are very time consuming and regular tumbling windows are not a suitable replacement. Our windowing technique can be considered a hybrid of the two: we train each sub-classifier in the ensemble with tumbling windows, but delay training in such a way that only one sub-classifier can update its model per iteration. We found, through statistical significance tests, that our framework is (roughly 160 times) faster than current state of the art techniques, and achieves comparable predictive accuracy. That being said, more research is needed to further reduce the quantity of labelled data used for training, while also increasing its predictive accuracy.	en_US
dc.identifier.uri	http://hdl.handle.net/10393/39273
dc.identifier.uri	http://dx.doi.org/10.20381/ruor-23520
dc.language.iso	en	en_US
dc.publisher	Université d'Ottawa / University of Ottawa	en_US
dc.subject	Ensemble learning	en_US
dc.subject	Data streams	en_US
dc.subject	Concept drift	en_US
dc.subject	Online learning	en_US
dc.subject	Non-stationary environments	en_US
dc.title	Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Génie / Engineering	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	MSc	en_US
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en_US

Fichiers

Trousse originale

Voici les éléments 1 - 1 sur 1

Nom:: Floyd_Sean_Louis_Alan_2019_thesis.pdf
Taille:: 7.91 MB
Format:: Adobe Portable Document Format
Description:

Télécharger

Trousse de licence

Voici les éléments 1 - 1 sur 1

Nom:: license.txt
Taille:: 6.65 KB
Format:: Item-specific license agreed upon to submission
Description:

Télécharger

Collections

- Thèses, 2011 - // Theses, 2011 -