Border sampling techniques in Machine Learning

Title: Border sampling techniques in Machine Learning
Authors: Li, Guichong
Date: 2010
Abstract: Border identification (BI), which is regarded as a sample selection technique in Machine Learning, was previously proposed to help learning systems focus on the most relevant portion of the training set so as to improve learning accuracy. However, the traditional BI implementation suffers from a serious limitation: it is only able to identify partial borders. We first propose a new method called Border Identification in Two Stages, denoted BI2, to overcome this limitation by identifying a full border. Based on BI2, we develop a new sample selection method, called Border Sampling, for supervised learning tasks. This is achieved by adopting the Progressive Learning technique to augment borders, by combining BI2 with the Markov Chain Monte Carlo technique to scale Border Sampling up to large datasets, and by adopting a novel geometric computation to improve algorithmic convergence. Further, we propose a novel meta-learning technique, called Cascading Customized Couple, for scaling up Bayes classifiers such as Naive Bayes by adopting a domain separation strategy. It is regarded as a wrapper sample selection method, whereas Border Sampling is shown in the thesis to be a filter sample selection method. Empirical results show two things. First, Border Sampling (BS) is an efficient and effective sample selection technique for training common classifiers such as Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Instance-based Learning (IBL), compared with previously proposed sample selection techniques such as the Condensed Nearest Neighbour rule and the Edited Nearest Neighbour rule. Second, the new meta-learning technique Cascading Customized Couple (CCC) outperforms previously proposed meta-learning techniques such as Bagging, AdaBoost, and MultiBoostAB for boosting Naive Bayes. We also apply our new techniques to a scientific application, explosion detection, by building an optimal classification model.
Collection: Thèses, 1910 - 2010 // Theses, 1910 - 2010
File: NR66251.PDF (7.9 MB, Adobe PDF)