Repository logo

Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams

dc.contributor.authorFloyd, Sean Louis Alan
dc.contributor.supervisorViktor, Herna
dc.date.accessioned2019-06-03T19:22:48Z
dc.date.available2019-06-03T19:22:48Z
dc.date.issued2019-06-03en_US
dc.description.abstractIn this thesis, learning refers to the intelligent computational extraction of knowledge from data. Supervised learning tasks require data to be annotated with labels, whereas for unsupervised learning, data is not labelled. Semi-supervised learning deals with data sets that are partially labelled. A major issue with supervised and semi-supervised learning of data streams is late-arriving or missing class labels. Assuming that correctly labelled data will always be available and timely is often unfeasible, and, as such, supervised methods are not directly applicable in the real world. Therefore, real-world problems usually require the use of semi-supervised or unsupervised learning techniques. For instance, when considering a spam detection task, it is not reasonable to assume that all spam will be identified (correctly labelled) prior to learning. Additionally, in semi-supervised learning, "the instances having the highest [predictive] confidence are not necessarily the most useful ones" [41]. We investigate how self-training performs without its selective heuristic in a streaming setting. This leads us to our contributions. We extend an existing concept drift detector to operate without any labelled data, by using a sliding window of our ensemble's prediction confidence, instead of a boolean indicating whether the ensemble's predictions are correct. We also extend selective self-training, a semi-supervised learning method, by using all predictions, and not only those with high predictive confidence. Finally, we introduce a novel windowing type for ensembles, as sliding windows are very time consuming and regular tumbling windows are not a suitable replacement. Our windowing technique can be considered a hybrid of the two: we train each sub-classifier in the ensemble with tumbling windows, but delay training in such a way that only one sub-classifier can update its model per iteration. We found, through statistical significance tests, that our framework is (roughly 160 times) faster than current state of the art techniques, and achieves comparable predictive accuracy. That being said, more research is needed to further reduce the quantity of labelled data used for training, while also increasing its predictive accuracy.en_US
dc.identifier.urihttp://hdl.handle.net/10393/39273
dc.identifier.urihttp://dx.doi.org/10.20381/ruor-23520
dc.language.isoenen_US
dc.publisherUniversité d'Ottawa / University of Ottawaen_US
dc.subjectEnsemble learningen_US
dc.subjectData streamsen_US
dc.subjectConcept driften_US
dc.subjectOnline learningen_US
dc.subjectNon-stationary environmentsen_US
dc.titleSemi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streamsen_US
dc.typeThesisen_US
thesis.degree.disciplineGénie / Engineeringen_US
thesis.degree.levelMastersen_US
thesis.degree.nameMScen_US
uottawa.departmentScience informatique et génie électrique / Electrical Engineering and Computer Scienceen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail ImageThumbnail Image
Name:
Floyd_Sean_Louis_Alan_2019_thesis.pdf
Size:
7.91 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail ImageThumbnail Image
Name:
license.txt
Size:
6.65 KB
Format:
Item-specific license agreed upon to submission
Description: