Extending AdaBoost: Varying the Base Learners and Modifying the Weight Calculation

dc.contributor.author: Neves de Souza, Erico
dc.contributor.supervisor: Matwin, Stan
dc.date.accessioned: 2014-05-27T13:35:31Z
dc.date.available: 2014-05-27T13:35:31Z
dc.date.created: 2014
dc.date.issued: 2014
dc.degree.discipline: Génie / Engineering
dc.degree.level: doctorate
dc.degree.name: PhD
dc.description.abstract: AdaBoost has been considered one of the best classifiers ever developed, but two important problems have not yet been addressed. The first is the dependency on the "weak" learner, and the second is the failure to maintain the performance of learners with small error rates (i.e. "strong" learners). To address the first problem, this work proposes using a different learner in each iteration — a variant called AdaBoost Dynamic (AD) — thereby ensuring that the performance of the algorithm is almost equal to that of the best "weak" learner executed with AdaBoost.M1. The work then further modifies the procedure to vary the learner in each iteration in order to locate the learner with the smallest error rate on its training data, while keeping the same weight calculation as in the original AdaBoost; this version is called AdaBoost Dynamic with Exponential Loss (AB-EL). Its results were poor, because AdaBoost does not perform well with strong learners; in this sense, the work confirmed results from previous studies. To improve performance, the weight calculation is modified to use the sigmoid function, with the algorithm's output being the derivative of that sigmoid function, rather than the logistic regression weight calculation originally used by AdaBoost; this version is called AdaBoost Dynamic with Logistic Loss (AB-DL). This work presents a convergence proof that the binomial weight calculation works, and shows both theoretically and empirically that this approach improves the results for strong learners. AB-DL also has some disadvantages, such as the cost of searching for the "best" classifier and the fact that this search reduces diversity among the classifiers. To address these issues, another algorithm is proposed that combines AD's "weak" learner execution policy with a small modification of AB-DL's weight calculation; it is called AdaBoost Dynamic with Added Cost (AD-AC).
AD-AC also has a theoretical upper bound on its error, and the algorithm offers a small accuracy improvement over AB-DL and traditional AdaBoost approaches. Lastly, this work adapts AD-AC's weight calculation to the data stream setting, where classifiers must deal with very large data sets (on the order of millions of instances) and limited memory.
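To illustrate the core idea described in the abstract — running AdaBoost.M1 but drawing a different base learner on each boosting iteration instead of reusing one fixed weak learner — here is a minimal sketch. This is not the thesis's implementation; the function names, the choice of base learners, and the round count are illustrative assumptions, and the weight update shown is the standard AdaBoost.M1 exponential update for labels in {-1, +1}, not the modified AB-DL/AD-AC calculation.

```python
# Illustrative sketch (not the thesis code) of the "AdaBoost Dynamic" idea:
# AdaBoost.M1 that cycles through a pool of base learners, one per iteration.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification


def adaboost_dynamic(X, y, learners, n_rounds=9):
    """AdaBoost.M1 for labels in {-1, +1}, varying the base learner each round."""
    n = len(X)
    w = np.full(n, 1.0 / n)              # instance weights D_t, initially uniform
    ensemble = []                        # list of (alpha_t, fitted model) pairs
    for t in range(n_rounds):
        base = learners[t % len(learners)]          # rotate through the pool
        model = clone(base).fit(X, y, sample_weight=w)
        pred = model.predict(X)
        eps = w[pred != y].sum()                    # weighted training error
        if eps == 0 or eps >= 0.5:                  # M1 stopping conditions
            break
        alpha = 0.5 * np.log((1 - eps) / eps)       # learner's vote weight
        w *= np.exp(-alpha * y * pred)              # up-weight the mistakes
        w /= w.sum()                                # renormalize D_{t+1}
        ensemble.append((alpha, model))
    return ensemble


def predict(ensemble, X):
    """Weighted-vote prediction of the boosted ensemble."""
    score = sum(alpha * model.predict(X) for alpha, model in ensemble)
    return np.sign(score)
```

With a pool such as a decision stump, Gaussian naive Bayes, and logistic regression, each round fits whichever learner is next in the rotation on the current instance weights, so no single "weak" learner choice has to be made up front — which is the motivation the abstract gives for AD.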
dc.embargo.terms: immediate
dc.faculty.department: Science informatique et génie électrique / Electrical Engineering and Computer Science
dc.identifier.uri: http://hdl.handle.net/10393/31146
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-3756
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: AdaBoost
dc.subject: Machine Learning
dc.subject: Data Stream
dc.title: Extending AdaBoost: Varying the Base Learners and Modifying the Weight Calculation
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Neves_De_Souza_Erico_2014_thesis.pdf
Size: 1.3 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.21 KB
Format: Item-specific license agreed upon to submission