k-Nearest Neighbour Classification of Datasets with a Family of Distances

dc.contributor.author: Hatko, Stan
dc.contributor.supervisor: Pestov, Vladimir
dc.date.accessioned: 2015-11-25T19:42:10Z
dc.date.available: 2015-11-25T19:42:10Z
dc.date.created: 2015
dc.date.issued: 2015
dc.degree.discipline: Sciences / Science
dc.degree.level: masters
dc.degree.name: MSc
dc.description.abstract: The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally, the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier. We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We discuss the limitations of Stone's lemma and Stone's theorem, particularly with respect to quasinorms and adaptively choosing a distance for k-NN based on the labelled sample. We show the universal consistency of a two-stage k-NN type classifier where we select the distance adaptively based on a split labelled sample and the query. We conclude by giving some examples of improvements in the accuracy of classifying various datasets using the above techniques.
dc.faculty.department: Mathématiques et statistique / Mathematics and Statistics
dc.identifier.uri: http://hdl.handle.net/10393/33361
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-3989
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Machine Learning
dc.subject: k-Nearest Neighbour Classifier
dc.subject: Universal Consistency
dc.subject: Data Science
dc.title: k-Nearest Neighbour Classification of Datasets with a Family of Distances
dc.type: Thesis
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Mathématiques et statistique / Mathematics and Statistics
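
The abstract describes running k-NN with a distance drawn from a family rather than with the Euclidean norm alone, and choosing that distance with the help of a split labelled sample. The Python sketch below is only an illustration of that general idea and is not the construction used in the thesis. It assumes the family is the l_p norms for a few candidate values of p, that labels are non-negative integers, and that the candidate p is picked by accuracy on a held-out half of the sample. The function names knn_predict and select_p are hypothetical.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k, p):
    """Majority vote among the k nearest training points under the l_p norm.

    Assumes train_y holds non-negative integer labels (needed by np.bincount).
    """
    dists = np.linalg.norm(train_x - query, ord=p, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = np.bincount(train_y[nearest])
    # Ties are broken in favour of the smallest label (np.argmax behaviour).
    return int(np.argmax(votes))


def select_p(x, y, k, candidate_ps, seed=0):
    """Pick p from candidate_ps using a random half/half split of the sample.

    One half plays the role of the training set and the other half is used
    only to score each candidate norm. This selection rule is an assumption
    made for illustration.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    half = len(x) // 2
    fit_idx, val_idx = order[:half], order[half:]

    best_p, best_acc = None, -1.0
    for p in candidate_ps:
        preds = np.array(
            [knn_predict(x[fit_idx], y[fit_idx], q, k, p) for q in x[val_idx]]
        )
        acc = float(np.mean(preds == y[val_idx]))
        if acc > best_acc:
            best_p, best_acc = p, acc
    return best_p, best_acc
```

A caller would first run select_p on the labelled sample with, say, candidate_ps = [1, 2, np.inf], and then classify new queries with knn_predict using the returned p. Whether a rule of this kind stays universally consistent is exactly the sort of question the thesis studies, so the sketch makes no such claim.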

Files

Original bundle

Name: Hatko_Stan_2015_thesis.pdf
Size: 1.49 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.07 KB
Format: Item-specific license agreed upon to submission