
Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease

dc.contributor.author: Duan, Haoyang
dc.contributor.supervisor: Pestov, Vladimir
dc.contributor.supervisor: Wells, George
dc.date.accessioned: 2014-05-15T14:22:47Z
dc.date.available: 2014-05-15T14:22:47Z
dc.date.created: 2014
dc.date.issued: 2014
dc.degree.discipline: Sciences / Science
dc.degree.level: masters
dc.degree.name: MSc
dc.description.abstract: From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease from Single-Nucleotide Polymorphisms (SNPs) in the Ontario Heart Genomics Study (OHGS). First, the thesis explains the k-Nearest Neighbour (k-NN) and Random Forest learning algorithms and gives a complete proof that k-NN is universally consistent in finite-dimensional normed vector spaces. Second, it introduces two dimensionality reduction techniques: Random Projections and a new method termed Mass Transportation Distance (MTD) Feature Selection. It then compares Random Projections combined with k-NN against MTD Feature Selection combined with Random Forest for predicting coronary artery disease. The results show that MTD Feature Selection with Random Forest outperforms Random Projections with k-NN: when MTD Feature Selection selects 3335 SNPs for classification, Random Forest achieves an accuracy of 0.6660 and an area under the ROC curve (AUC) of 0.8562 on the OHGS dataset. This AUC is considerably higher than the previous best of 0.608, obtained by Davies et al. (2010) on the same dataset.
dc.embargo.terms: immediate
dc.faculty.department: Mathématiques et statistique / Mathematics and Statistics
dc.identifier.uri: http://hdl.handle.net/10393/31113
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-3739
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: SNPs
dc.subject: GWAS
dc.subject: Data Science
dc.subject: Mass Transportation Distance
dc.subject: Dimensionality Reduction
dc.subject: Random Projections
dc.subject: Supervised Learning Theory
dc.subject: Coronary Artery Disease
dc.subject: K-Nearest Neighbour Classifier
dc.subject: Universal Consistency
dc.title: Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease
dc.type: Thesis
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Mathématiques et statistique / Mathematics and Statistics
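The abstract compares two pipelines, and the pure-Python sketch below illustrates their main ingredients under stated assumptions. It assumes (this is not confirmed by the record) that MTD Feature Selection scores each SNP by the mass transportation distance, i.e. the Wasserstein-1 or earth mover's distance, between its genotype distribution in cases and in controls, and keeps the highest-scoring SNPs; the thesis's exact definition may differ. It also shows a Gaussian random projection followed by a k-NN majority vote. Random Forest is not implemented here. All function names (`wasserstein_1d`, `rank_features`, `make_projection`, `project`, `knn_predict`) and the toy genotype data are hypothetical.

```python
import math
import random
from collections import Counter


def wasserstein_1d(xs, ys):
    """Wasserstein-1 (earth mover's) distance between two 1-D empirical samples.

    In one dimension this equals the integral of |F_x(t) - F_y(t)| dt; both
    empirical CDFs are step functions, so the integral is a finite sum over
    the merged sorted support.
    """
    support = sorted(set(xs) | set(ys))
    total = 0.0
    for a, b in zip(support, support[1:]):
        fx = sum(v <= a for v in xs) / len(xs)  # F_x is constant on [a, b)
        fy = sum(v <= a for v in ys) / len(ys)
        total += abs(fx - fy) * (b - a)
    return total


def rank_features(X, y, top):
    """Hypothetical MTD-style selection (an assumption, not the thesis's code):
    keep the `top` features whose case (label 1) and control (label 0)
    distributions are farthest apart in Wasserstein-1 distance."""
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(wasserstein_1d(cases, controls))
    # sorted() is stable even with reverse=True, so ties keep column order.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:top]


def make_projection(d, k, seed=0):
    """Random k x d matrix with N(0, 1/k) entries (Johnson-Lindenstrauss style)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]


def project(R, X):
    """Map each row x of X to R @ x (reduces dimension from d to k)."""
    return [[sum(r_i * x_i for r_i, x_i in zip(r, x)) for r in R] for x in X]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy genotypes in an assumed additive 0/1/2 coding; label 1 = case, 0 = control.
    X = [[2, 0, 1], [2, 1, 0], [1, 0, 2], [0, 1, 1], [0, 2, 0], [0, 1, 2]]
    y = [1, 1, 1, 0, 0, 0]
    print(rank_features(X, y, top=2))  # [0, 1]

    # Random projection to k = 2 dimensions, then 3-NN on the projected data.
    # The same matrix R must be applied to both training rows and queries.
    R = make_projection(d=3, k=2, seed=42)
    Z = project(R, X)
    print(knn_predict(Z, y, project(R, [[2, 0, 0]])[0], k=3))
```

On real SNP data the number of features is in the hundreds of thousands, so a vectorised library would replace these loops; the sketch only makes the two ideas concrete: ranking features by how far apart their class-conditional distributions are, versus compressing all features with a random linear map before a distance-based classifier.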

Files

Original bundle

Name: Duan_Haoyang_2014_thesis.pdf
Size: 741.74 KB
Format: Adobe Portable Document Format
Description: Master's thesis

License bundle

Name: license.txt
Size: 4.21 KB
Format: Item-specific license agreed to upon submission
Description: