
Clustering to Improve One-Class Classifier Performance in Data Streams

dc.contributor.author: Moulton, Richard Hugh
dc.contributor.supervisor: Japkowicz, Nathalie
dc.contributor.supervisor: Viktor, Herna
dc.date.accessioned: 2018-08-27T13:33:53Z
dc.date.available: 2018-08-27T13:33:53Z
dc.date.issued: 2018-08-27 (en_US)
dc.description.abstract: The classification task requires learning a decision boundary between classes by making use of training examples from each. A potential challenge for this task is the class imbalance problem, which occurs when there are many training instances available for a single class, the majority class, and few training instances for the other, the minority class [58]. In this case, it is no longer clear how to separate the majority class from something about which we have little to no knowledge. More worryingly, the minority class is often the class of interest, e.g. for detecting abnormal conditions from streaming sensor data. The one-class classification (OCC) paradigm addresses this scenario by casting the task as learning a decision boundary around the majority class with no need for minority class instances [110]. OCC has been thoroughly investigated, e.g. [20, 60, 90, 110], and many one-class classifiers have been proposed. One approach for improving one-class classifier performance on static data sets is learning in the context of concepts: the majority class is broken down into its constituent sub-concepts and a classifier is induced over each [100]. Modern machine learning research, however, is concerned with data streams, where potentially infinite amounts of data arrive quickly and need to be processed as they arrive. In these cases it is not possible to store all of the instances in memory, nor is it practical to wait until “the end of the data stream” before learning. An example is network intrusion detection: detecting an attack on the computer network should occur as soon as practicable. Many one-class classifiers for data streams have been described in the literature, e.g. [33, 108], and it is worth investigating whether the approach of learning in the context of concepts can be successfully applied to the OCC task for data streams as well.
This thesis identifies that the idea of breaking the majority class into sub-concepts to simplify the OCC problem has been demonstrated for static data sets [100] but has not been applied in data streams. The primary contribution to the literature made by this thesis is the identification of how the majority class’s sub-concept structure can be used to improve the classification performance of streaming one-class classifiers while mitigating the challenges posed by the data stream environment. Three frameworks are developed, each using this knowledge to a different degree. These are applied with a selection of streaming one-class classifiers to both synthetic and benchmark data streams, and their performance is compared to that of the one-class classifier learning independently. These results are analyzed and it is shown that scenarios exist where knowledge of sub-concepts can be used to improve one-class classifier performance. (en_US)
dc.identifier.uri: http://hdl.handle.net/10393/38030
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-22285
dc.language.iso: en (en_US)
dc.publisher: Université d'Ottawa / University of Ottawa (en_US)
dc.subject: machine learning (en_US)
dc.subject: one-class classification (en_US)
dc.subject: data streams (en_US)
dc.subject: sub-concepts (en_US)
dc.title: Clustering to Improve One-Class Classifier Performance in Data Streams (en_US)
dc.type: Thesis (en_US)
thesis.degree.discipline: Génie / Engineering (en_US)
thesis.degree.level: Masters (en_US)
thesis.degree.name: MCS (en_US)
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science (en_US)
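
The sub-concept idea described in the abstract can be illustrated with a minimal batch sketch in Python. This is not the thesis's method and not one of its three streaming frameworks: it assumes a static data set, uses a simple k-means clustering with a deterministic farthest-point initialisation, and stands in for a one-class classifier with a ball (centroid plus scaled maximum distance). All function names, the `slack` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres

def kmeans(points, k, iters=10):
    """Plain k-means; returns the groups from the last assignment step."""
    centres = farthest_point_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # Keep the old centre if a group ends up empty.
        centres = [centroid(g) if g else centres[i] for i, g in enumerate(groups)]
    return groups

def fit_ball(points, slack=1.5):
    """Toy one-class model (an assumption, not a classifier from the thesis):
    accept anything within slack * max training distance of the centroid."""
    c = centroid(points)
    return c, slack * max(math.dist(p, c) for p in points)

def fit_subconcept_model(points, k):
    """Cluster the majority class, then fit one ball per sub-concept."""
    return [fit_ball(g) for g in kmeans(points, k) if g]

def is_majority(model, x):
    """An instance is accepted if any of the model's balls contains it."""
    return any(math.dist(x, c) <= r for c, r in model)

# Toy majority class made of two well-separated sub-concepts.
majority = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]

single = [fit_ball(majority)]              # one boundary around everything
split = fit_subconcept_model(majority, 2)  # one boundary per sub-concept

gap_point = (5.5, 5.5)  # lies between the two sub-concepts
print(is_majority(single, gap_point))  # True: the single ball covers the gap
print(is_majority(split, gap_point))   # False: neither sub-concept ball does
```

In this toy setting the single boundary accepts a point that lies in the empty region between the two sub-concepts, while the per-sub-concept boundaries reject it. A streaming version would additionally need incremental clustering and bounded memory, which this sketch does not attempt.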

Files

Original bundle

Name: Moulton_Richard_Hugh_2018_thesis.pdf
Size: 5.38 MB
Format: Adobe Portable Document Format
Description: A thesis submitted to the University of Ottawa in partial fulfillment of the requirements for the Master of Computer Science.

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission
Description: