COLING 2018

SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions

Abstract

Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labeling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions, along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental health conditions through their language.
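To illustrate what pattern-based identification of self-reported diagnoses can look like, the following is a minimal sketch and not the pattern set used to build SMHD. The condition terms, the diagnosis phrasing, and the names `CONDITION_TERMS`, `DIAGNOSIS_TEMPLATE`, and `find_self_reported_diagnoses` are all assumptions made for illustration.

```python
import re
from typing import Set

# Hypothetical condition terms; illustrative only, not the SMHD term lists.
CONDITION_TERMS = {
    "depression": r"depression|major depressive disorder",
    "adhd": r"adhd|attention deficit(?: hyperactivity)? disorder",
    "ptsd": r"ptsd|post[- ]traumatic stress disorder",
}

# Assumed first-person diagnosis phrasing. Requiring "I ... diagnosed with"
# directly before the condition term favors precision over recall.
DIAGNOSIS_TEMPLATE = (
    r"\bi (?:was|am|have been|got) (?:\w+ )?diagnosed with (?:{terms})\b"
)

PATTERNS = {
    name: re.compile(DIAGNOSIS_TEMPLATE.format(terms=terms), re.IGNORECASE)
    for name, terms in CONDITION_TERMS.items()
}


def find_self_reported_diagnoses(post: str) -> Set[str]:
    """Return the hypothetical condition labels whose pattern matches the post."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(post)}
```

A post such as "I was recently diagnosed with ADHD" would be labeled `adhd` under these assumed patterns, while "My brother was diagnosed with depression" would not match, because the statement is not in the first person.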
