
User Modeling in Social Media: Gender and Age Detection

dc.contributor.author: Daneshvar, Saman
dc.contributor.supervisor: Inkpen, Diana
dc.date.accessioned: 2019-08-21T19:19:51Z
dc.date.available: 2019-08-21T19:19:51Z
dc.date.issued: 2019-08-21 [en_US]
dc.description.abstract: Author profiling is a field within Natural Language Processing (NLP) concerned with identifying characteristics and demographic factors of authors, such as gender, age, location, native language, political orientation, and personality, by analyzing the style and content of their writing. Interest in author profiling is growing, with applications in marketing and advertising, opinion mining, personalization, recommendation systems, forensics, security, and defense. In this work, we build several classification models using NLP, Deep Learning, and classical Machine Learning techniques that identify the gender and age of a Twitter user from the text of their posts (tweets) on the platform. Our SVM gender classifier uses word and character n-grams as features, reduces their dimensionality with Latent Semantic Analysis (LSA), and classifies with a Support Vector Machine (SVM) using a linear kernel. At the PAN 2018 author profiling shared task, this model achieved the highest performance, with 82.21%, 82.00%, and 80.90% accuracy on the English, Spanish, and Arabic datasets, respectively. Our age classifier uses the same approach and was trained on a dataset of 11,160 Twitter users; the age classification experiments are preliminary. Our Deep Learning gender classifiers were trained and tested on English datasets. Our feedforward neural network, consisting of a word embedding layer, a flattening layer, and two densely connected layers, achieves 79.57% accuracy, and our bidirectional Long Short-Term Memory (LSTM) network achieves 76.85% accuracy on the gender classification task. [en_US]
dc.identifier.uri: http://hdl.handle.net/10393/39535
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-23778
dc.language.iso: en [en_US]
dc.publisher: Université d'Ottawa / University of Ottawa [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: ML [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: NLP [en_US]
dc.subject: User Modeling [en_US]
dc.subject: Author Profiling [en_US]
dc.subject: Gender Identification [en_US]
dc.subject: Age Detection [en_US]
dc.subject: Social Media [en_US]
dc.subject: Twitter [en_US]
dc.title: User Modeling in Social Media: Gender and Age Detection [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Génie / Engineering [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: MSc [en_US]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en_US]
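The SVM gender-classification pipeline summarized in the abstract (word and character n-gram features, LSA dimensionality reduction, linear-kernel SVM) can be sketched roughly as follows. This is a minimal sketch assuming scikit-learn; the helper name `build_gender_classifier`, the TF-IDF weighting, the n-gram ranges, and the number of LSA components are illustrative assumptions, not the settings reported in the thesis.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_gender_classifier(n_components: int = 300) -> Pipeline:
    """Hypothetical helper: n-gram features -> LSA -> linear SVM.

    All hyperparameters here are illustrative assumptions, not the
    values used in the thesis.
    """
    # Word and character n-grams, each TF-IDF weighted, concatenated
    # column-wise into a single sparse feature matrix.
    ngram_features = FeatureUnion([
        ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                        sublinear_tf=True)),
        ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                                        sublinear_tf=True)),
    ])
    return Pipeline([
        ("features", ngram_features),
        # LSA: truncated SVD projects the sparse TF-IDF matrix onto
        # n_components dense latent dimensions.
        ("lsa", TruncatedSVD(n_components=n_components, random_state=0)),
        # Linear-kernel SVM (liblinear implementation).
        ("svm", LinearSVC()),
    ])
```

In this sketch each input document would be one user's tweets concatenated into a single string, with one gender label per user: `clf = build_gender_classifier()`, then `clf.fit(user_texts, user_labels)` and `clf.predict(new_texts)`. `TruncatedSVD` is used because it works directly on sparse matrices, which is the usual way to apply LSA in scikit-learn; `n_components` must not exceed the number of n-gram features, so small corpora need a small value.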

Files

Original bundle

Name: Daneshvar_Saman_2019_thesis.pdf
Size: 3.96 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed to upon submission