An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce

dc.contributor.author: Liu, Xuan
dc.contributor.supervisor: Japkowicz, Nathalie
dc.contributor.supervisor: Matwin, Stan
dc.date.accessioned: 2014-03-25T14:05:07Z
dc.date.available: 2014-03-25T14:05:07Z
dc.date.created: 2014
dc.date.issued: 2014
dc.degree.discipline: Génie / Engineering
dc.degree.level: masters
dc.degree.name: MASc
dc.description.abstract: We propose a new ensemble algorithm, the meta-boosting algorithm. This algorithm uses a meta-learning approach to let the original AdaBoost algorithm improve the decisions made by different weak learners. It achieves better accuracy because it reduces both bias and variance. However, higher accuracy comes with higher computational complexity, especially on big data. We therefore propose a parallelized version of the meta-boosting algorithm, Parallelized-Meta-Learning (PML), built on the MapReduce programming paradigm on Hadoop. Experimental results on the Amazon EC2 cloud computing infrastructure show that PML greatly reduces the computational complexity while retaining lower error rates than the results on a single computer. MapReduce has an inherent weakness: it cannot directly support iterations in an algorithm. Our approach overcomes this weakness while also securing good accuracy. We also compare this approach with a contemporary algorithm, AdaBoost.PL.
dc.embargo.terms: immediate
dc.faculty.department: Science informatique et génie électrique / Electrical Engineering and Computer Science
dc.identifier.uri: http://hdl.handle.net/10393/30702
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-3596
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Adaboost
dc.subject: Meta-learning
dc.subject: Big Data
dc.subject: Hadoop
dc.subject: MapReduce
dc.subject: Ensemble Learning
dc.subject: Scalable Machine Learning Algorithm
dc.title: An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Masters
thesis.degree.name: MASc
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Liu_Xuan_2014_thesis.pdf
Size: 1.32 MB
Format: Adobe Portable Document Format
Description: Main article

License bundle

Name: license.txt
Size: 4.21 KB
Format: Item-specific license agreed upon to submission