Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging

dc.contributor.author: Olorunnimbe, Muhammed
dc.contributor.supervisor: Viktor, Herna
dc.date.accessioned: 2015-05-13T13:19:02Z
dc.date.available: 2015-05-13T13:19:02Z
dc.date.created: 2015
dc.date.issued: 2015
dc.degree.discipline: Génie / Engineering
dc.degree.level: masters
dc.degree.name: MSc
dc.description.abstract: In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection, and financial forecasting, among others. In this setting, it is crucial to create data mining algorithms that can seamlessly adapt to the temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The models produced by such algorithms should not only be highly accurate and able to adapt swiftly to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research work focuses on mining a data stream with concept drift, using an online bagging method, with consideration given to memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, in which the models of multiple classifiers are combined into an ensemble, and which has been very successful for building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency, and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility. We evaluated our method against a number of benchmark datasets and compared our results against the state of the art.
Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, relative to the time and memory invested. We aimed to achieve high ROI without compromising on the accuracy of the result. Our experimental results indicate that we achieved this goal.
dc.identifier.uri: http://hdl.handle.net/10393/32340
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-4304
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Data stream
dc.subject: Concept drift
dc.subject: Metalearning
dc.subject: Cost sensitive adaptation
dc.subject: ROI
dc.subject: Utility
dc.subject: Adaptive ensemble size
dc.subject: Online bagging
dc.title: Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Masters
thesis.degree.name: MSc

Files

Original bundle

Name: Olorunnimbe_Muhammed_2015_thesis.pdf
Size: 2.04 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.07 KB
Format: Item-specific license agreed to upon submission