Predicting Depression Symptoms Based on Reddit Posts

Name
Kaire Koljal
Abstract
Using social media posts to predict mental health problems has become a popular topic in Natural Language Processing (NLP). Machine learning has been used for detecting a diagnosis or single symptoms associated with depression. As the clinical picture of depression can differ for people, it is better to detect symptoms instead of diagnosis from the social media posts. In this work, depression symptoms are predicted based on posts from Reddit page r/depression using NLP methods and multi-label classification. This work focuses on evaluating the quality of the annotations and analysing if such data can be used to train a predictive model. Each post is annotated by three annotators and the labels are aggregated in three ways to create three datasets that are used to train Transformers models. The results of this work reveal that on a small dataset with a lower annotation agreement, a majority vote over annotations gives the most reliable dataset and results. RoBERTa model shows the best learning and generalization ability in this work.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Kairit Sirts
Defence year
2022
 
PDF