Automated Detection of Substance Use Through Social Mining and its Prediction Ability in the Canadian Population
| dc.contributor.author | Ibrahim Swailum, Doaa | |
| dc.contributor.supervisor | Inkpen, Diana | |
| dc.contributor.supervisor | Al Osman, Hussein | |
| dc.date.accessioned | 2025-08-05T19:06:10Z | |
| dc.date.available | 2025-08-05T19:06:10Z | |
| dc.date.issued | 2025-08-05 | |
| dc.description.abstract | According to the latest WHO report (2024), substance use disorders and their associated environmental harms have increased significantly around the world. The report highlights that alcohol consumption is responsible for 2.6 million deaths annually, representing 4.7% of all deaths worldwide, while psychoactive drug use accounts for 0.6 million deaths. The number of drug users rose to 292 million in 2022, a 20% increase over 10 years. Automated detection of substance use through social media can serve as an effective and practical observational tool for addressing this global problem. Automated detection in online communication has multiple applications, including helping and protecting at-risk people by predicting and monitoring early warning signs in a timely manner. Our system can be used by individuals with authority (such as parents or doctors) to detect and monitor users of different substances. It could alert the relevant individuals so that they can intervene when flagged posts show early signs of substance use. This thesis describes the process of classifying online posts to detect substance use problems as early as possible. We began by using two datasets of annotated social media posts to train several classification models that predict whether a post indicates signs of substance use. We assessed the performance of several traditional machine learning models and recent deep learning models. CNN-based, RNN-based, BERT-based, and GPT models all proved to be promising approaches for detecting substance users from their posts. GPT-4o with few-shot learning outperformed the other models, achieving an F1-score of 89.44%. We also built user-level detection models for two common substances: cannabis and alcohol. 
For cannabis user detection, GPT-4o with few-shot learning performed best, with an F1-score of 85.22%, while DeBERTa-v3 performed best for alcohol user detection, with an F1-score of 65.50%. As a second objective, we used these models to automatically detect substance use at the population level in Canada. Population-level substance use is commonly measured through surveys conducted by phone or interview; however, this approach is both time-consuming and expensive. Understanding Canadian trends in alcohol and drug use is crucial for developing and evaluating effective policies and programs at both the national and provincial levels. Examining social media posts can serve as a flexible alternative for identifying substance use problems across Canada. We estimated population-level cannabis and alcohol use from 2015 to 2018 based on representative samples. We then compared these results with Health Canada's official statistics for the same years for both substances, using Health Canada's estimated reports up to 2019. Given the lack of annotated data for some substances, such as alcohol, we proposed a data augmentation technique that enriches the training phase by building several artificial training sets. We then applied the best-generalizing model described above to population-level detection. The population-level results for both cannabis and alcohol were promising for the tested years and comparable to the results of the Health Canada surveys. Cannabis user detection differed by 5% or less from the government estimates for each of the nine Canadian provinces included in this study. Similarly, alcohol user detection differed by 6.5% or less for the same provinces. 
To the best of our knowledge, this is the first study to propose detecting substance use through social media across an entire country. | |
| dc.identifier.uri | http://hdl.handle.net/10393/50728 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31296 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.subject | Artificial Intelligence | |
| dc.subject | Natural Language Processing | |
| dc.subject | Substance Use | |
| dc.subject | Social Mining | |
| dc.subject | Data Mining | |
| dc.subject | Text Classification | |
| dc.subject | Text Prediction | |
| dc.title | Automated Detection of Substance Use Through Social Mining and its Prediction Ability in the Canadian Population | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Génie / Engineering | |
| thesis.degree.level | Doctoral | |
| thesis.degree.name | PhD | |
| uottawa.department | Conception et d'innovation pédagogique en génie / Engineering Design and Teaching Innovation | |
