Explainable Depression Detection Using Social Media Data
Publisher
Université d'Ottawa | University of Ottawa
Abstract
With advances in Machine Learning (ML) techniques and the wide availability of social media data, early intervention for mental health issues is becoming increasingly practicable. An important problem for Natural Language Processing (NLP) practitioners is the automatic, large-scale detection of mental disorders such as depression, and social media analysis is commonly used to tackle it. The rapid growth of user interaction on social media platforms has substantially increased the amount of publicly available data. The sheer volume of data and the level of personal information shared on such platforms have made the analysis of textual information to predict mental disorders such as depression a reliable preliminary step in psychometrics. However, it remains a challenge for computing systems to handle this huge amount of textual information and to understand the relationship between the content of the texts and the writers' actual mental health conditions.
In this study, we first proposed a system that searches for texts related to the depression symptoms listed in Beck's Depression Inventory (BDI) questionnaire, in order to extract relevant textual data from huge collections and rank it for further investigation. One query was constructed for each of the 21 symptoms on the BDI questionnaire, based on the corresponding question and its possible answers. Several methods for extracting relevant sentences from Reddit posts and comments were introduced. To rank the sentences by relevance, neural embedding vectors were computed as their representations, and the cosine similarity of each sentence embedding to each symptom-query embedding was calculated. Although the dataset contained only texts and no labels that could be used for training, our system obtained competitive results on four metrics while remaining computationally efficient, particularly for "precision at 10", which measures how many of the top 10 retrieved sentences are relevant. These advantages make it possible to adopt the system for the next task.
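The ranking step and the "precision at 10" metric described above can be illustrated with a minimal sketch. This is not the system from the study: the three-dimensional vectors are made-up stand-ins for neural embeddings, and the names `cosine_similarity`, `rank_sentences`, and `precision_at_k` are hypothetical helpers introduced only for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_sentences(query_vec, sentence_vecs):
    """Return sentence ids sorted by decreasing similarity to the query."""
    scored = [
        (sid, cosine_similarity(query_vec, vec))
        for sid, vec in sentence_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [sid for sid, _ in scored]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for sid in top_k if sid in relevant_ids)
    return hits / k


# Toy data (assumption): one symptom-query embedding and three sentences.
query = [1.0, 0.0, 1.0]
sentences = {
    "s1": [0.9, 0.1, 0.8],
    "s2": [0.0, 1.0, 0.0],
    "s3": [0.5, 0.5, 0.5],
}

ranking = rank_sentences(query, sentences)
score = precision_at_k(ranking, relevant_ids={"s1"}, k=2)
```

In this toy setting the sentence whose vector points most nearly in the same direction as the query is ranked first. In practice the vectors would come from a neural sentence encoder and the relevance judgments from human annotators, neither of which is specified here.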
We then addressed the even more challenging task of automatic depression level detection, using Reddit users' writings and the answers they voluntarily provided. Several explainable machine learning algorithms and several Large Language Models (LLMs) were applied in our experiments to provide both a prediction and an explanation for each BDI question. One of our proposed systems is based on glass-box models, which are inherently interpretable; another is based on LLMs, which can generate explanations for their predictions even though they are considered black boxes. By combining two LLMs for different questions, we achieved better performance than the state of the art on three of four metrics and remained competitive on the remaining one. In addition, our system is explainable on two levels: first, predicting the answers to the BDI questions provides clues about the possible symptoms that could lead to a clinical diagnosis of depression; second, our system can explain the predicted answer for each question.
Keywords
depression detection, social media analysis, information retrieval, natural language processing, explainable artificial intelligence, artificial intelligence, large language model, machine learning
