Automated Detection of Substance Use Through Social Mining and its Prediction Ability in the Canadian Population

Ibrahim Swailum, Doaa

dc.contributor.author	Ibrahim Swailum, Doaa
dc.contributor.supervisor	Inkpen, Diana
dc.contributor.supervisor	Al Osman, Hussein
dc.date.accessioned	2025-08-05T19:06:10Z
dc.date.available	2025-08-05T19:06:10Z
dc.date.issued	2025-08-05
dc.description.abstract	According to the latest WHO report (2024), substance use disorders and environmental harms have increased significantly around the world. The report attributes 2.6 million deaths annually to alcohol consumption, representing 4.7% of all deaths worldwide, and 0.6 million deaths to psychoactive drug use. The number of drug users rose to 292 million in 2022, a 20% increase over 10 years. Automated detection of substance use through social media can be an effective and practical tool for observing the global substance use problem. Automated detection in online communication has multiple applications, including protecting at-risk people by predicting and monitoring early signs of risk in a timely manner. Our system can be used by individuals with authority (such as parents or doctors) to detect and monitor substance users. It could alert the relevant individuals to early signs of substance use in flagged posts so that they can take the necessary interventions. This thesis describes the process of classifying online posts to detect substance use problems as early as possible. We began by using two datasets of annotated social media posts to train several classification models that predict whether a post indicates signs of substance use. We assessed the performance of several traditional and recent deep learning models. CNN-based, RNN-based, BERT-based, and GPT models were found to be promising approaches for detecting substance users from their posts. GPT-4o with few-shot learning outperformed the other models, with an F1-score of 89.44%. We also built user-level detection models for two common substances (cannabis and alcohol).
For cannabis user detection, GPT-4o with few-shot learning was the best-performing model, with an F1-score of 85.22%, while DeBERTa-v3 was the best-performing model for alcohol user detection, with an F1-score of 65.50%. As a second objective, these models were used for the automated detection of substance use at the population level in Canada. Population-level substance use is commonly measured through surveys conducted by phone or in interviews; however, this approach is both time-consuming and expensive. Understanding Canadian trends in alcohol and drug use is crucial for developing and evaluating effective policies and programs at both the national and provincial levels. Examining social media posts can serve as a flexible alternative for identifying substance use problems across Canada. We detected population-level use of cannabis and alcohol from 2015 to 2018, based on representative samples. Then, we compared these results with the official Health Canada statistics for the same years for the two substances. We used the estimated reports from Health Canada until 2019. Given the lack of annotated data for several substances, such as alcohol, we proposed a data augmentation technique that increased the information available in the training phase by building several artificial training sets. Then, we applied the best generalized model (as mentioned above) for population-level detection. The population-level detection results for both cannabis and alcohol were promising for the tested years and comparable with the results of the Health Canada surveys. Cannabis user detection differed from the governmental estimates by 5% or less for the nine Canadian provinces included in this study. Similarly, alcohol user detection differed by 6.5% or less for the same group of provinces.
To the best of our knowledge, this is the first study to propose the detection of substance use through social media for an entire country.
dc.identifier.uri	http://hdl.handle.net/10393/50728
dc.identifier.uri	https://doi.org/10.20381/ruor-31296
dc.language.iso	en
dc.publisher	Université d'Ottawa / University of Ottawa
dc.subject	Artificial Intelligence
dc.subject	Natural Language Processing
dc.subject	Substance Use
dc.subject	Social Mining
dc.subject	Data Mining
dc.subject	Text Classification
dc.subject	Text Prediction
dc.title	Automated Detection of Substance Use Through Social Mining and its Prediction Ability in the Canadian Population
dc.type	Thesis	en
thesis.degree.discipline	Génie / Engineering
thesis.degree.level	Doctoral
thesis.degree.name	PhD
uottawa.department	Conception et d'innovation pédagogique en génie / Engineering Design and Teaching Innovation

Files

Original bundle

Name: Ibrahim_Swailum_Doaa_2025_thesis.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format

Collections

- Thèses, 2011 - // Theses, 2011 -