Imbalanced Data Classification with the K-Closest Resemblance Classifier for Remote Sensing and Social Media Texts

Duan, Cheng

dc.contributor.author	Duan, Cheng
dc.contributor.supervisor	Inkpen, Diana
dc.date.accessioned	2020-11-10T18:36:20Z
dc.date.available	2020-11-10T18:36:20Z
dc.date.issued	2020-11-10	en_US
dc.description.abstract	Data imbalance has been a challenge in many areas of automatic classification. Many popular approaches, including over-sampling, under-sampling, and the Synthetic Minority Oversampling Technique (SMOTE), have been developed and tested in previous research. A major problem with these techniques is that they try to solve the problem by modifying the original data rather than truly overcoming the imbalance and letting the classifiers learn. The imbalanced data challenge also exists for tasks in areas such as remote sensing and depression detection. Researchers have made efforts to overcome the challenge by adopting methods at the data pre-processing step. However, in remote sensing and depression detection tasks, the main interest is still in applying new classifiers, such as deep learning models, which have powerful classification ability but still do not treat data imbalance as a prime factor in lower classification performance. In this thesis, we demonstrate the performance of the K-Closest Resemblance (K-CR) classifier in our evaluation experiments on an urban land cover classification dataset and on two depression detection datasets. The latter two datasets consist of social media texts (tweets); therefore, we propose to adopt a feature selection technique, Term Frequency - Category-Based Term Weights (TF-CBTW), and various word embedding techniques (Word2Vec, FastText, GloVe, and the language model BERT). This feature selection method has not been applied before in similar settings, and we show that it helps to improve the efficiency and the results of the K-CR classifier. Our three experiments show that K-CR can achieve comparable performance on the majority classes and better performance on minority classes when compared to other classifiers such as Random Forest, K-Nearest Neighbour, Support Vector Machines, Multi-layer Perceptron, Convolutional Neural Networks, and Long Short-Term Memory.	en_US
dc.identifier.uri	http://hdl.handle.net/10393/41424
dc.identifier.uri	http://dx.doi.org/10.20381/ruor-25648
dc.language.iso	en	en_US
dc.publisher	Université d'Ottawa / University of Ottawa	en_US
dc.subject	K-CR	en_US
dc.subject	Prototype-based Classifier	en_US
dc.subject	Data Imbalance	en_US
dc.subject	Feature Selection	en_US
dc.subject	Remote Sensing	en_US
dc.subject	Depression Detection	en_US
dc.title	Imbalanced Data Classification with the K-Closest Resemblance Classifier for Remote Sensing and Social Media Texts	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Génie / Engineering	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	MSc	en_US
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en_US

Files

Original bundle

Name: Duan_Cheng_2020_thesis.pdf
Size: 5.48 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission

Collections

- Thèses, 2011 - // Theses, 2011 -