
Imbalanced Data Classification with the K-Closest Resemblance Classifier for Remote Sensing and Social Media Texts

dc.contributor.author: Duan, Cheng
dc.contributor.supervisor: Inkpen, Diana
dc.date.accessioned: 2020-11-10T18:36:20Z
dc.date.available: 2020-11-10T18:36:20Z
dc.date.issued: 2020-11-10 [en_US]
dc.description.abstract: Data imbalance has been a challenge in many areas of automatic classification. Many popular approaches, including over-sampling, under-sampling, and the Synthetic Minority Oversampling Technique (SMOTE), have been developed and tested in previous research. A major problem with these techniques is that they try to solve the problem by modifying the original data rather than truly overcoming the imbalance and letting the classifiers learn. The imbalanced data challenge also exists in tasks such as remote sensing and depression detection, where researchers have tried to overcome it by adopting methods at the data pre-processing step. However, in remote sensing and depression detection tasks, the main interest is still in applying new classifiers such as deep learning models, which have powerful classification ability but still do not treat data imbalance as a prime factor behind lower classification performance. In this thesis, we demonstrate the performance of the K-Closest Resemblance (K-CR) classifier in evaluation experiments on an urban land cover classification dataset and on two depression detection datasets. The latter two datasets consist of social media texts (tweets); we therefore propose to adopt the feature selection technique Term Frequency - Category-Based Term Weights (TF-CBTW) together with various word embedding techniques (Word2Vec, FastText, GloVe, and the language model BERT). This feature selection method had not previously been applied in similar settings, and we show that it helps to improve the efficiency and the results of the K-CR classifier. Our three experiments show that K-CR can achieve comparable performance on the majority classes and better performance on the minority classes when compared to other classifiers such as Random Forest, K-Nearest Neighbour, Support Vector Machines, Multi-layer Perceptron, Convolutional Neural Networks, and Long Short-Term Memory. [en_US]
dc.identifier.uri: http://hdl.handle.net/10393/41424
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-25648
dc.language.iso: en [en_US]
dc.publisher: Université d'Ottawa / University of Ottawa [en_US]
dc.subject: K-CR [en_US]
dc.subject: Prototype-based Classifier [en_US]
dc.subject: Data Imbalance [en_US]
dc.subject: Feature Selection [en_US]
dc.subject: Remote Sensing [en_US]
dc.subject: Depression Detection [en_US]
dc.title: Imbalanced Data Classification with the K-Closest Resemblance Classifier for Remote Sensing and Social Media Texts [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Génie / Engineering [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: MSc [en_US]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en_US]

Files

Original bundle

Name: Duan_Cheng_2020_thesis.pdf
Size: 5.48 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission