Do Large Language Models Align with Core Mental Health Counseling Competencies?

Viet Cuong Nguyen; Mohammad Taher; Dongwan Hong; Vinicius Konkolics Possobom; Vibha Thirunellayi Gopalakrishnan; Ekta Raj; Zihang Li; Heather J. Soled; Michael L. Birnbaum; Srijan Kumar; Munmun De Choudhury

2025 NAACL NAACL 2025

Do Large Language Models Align with Core Mental Health Counseling Competencies?

Abstract

AbstractThe rapid evolution of Large Language Models (LLMs) presents a promising solution to the global shortage of mental health professionals. However, their alignment with essential counseling competencies remains underexplored. We introduce CounselingBench, a novel NCMHCE-based benchmark evaluating 22 general-purpose and medical-finetuned LLMs across five key competencies. While frontier models surpass minimum aptitude thresholds, they fall short of expert-level performance, excelling in Intake, Assessment & Diagnosis but struggling with Core Counseling Attributes and Professional Practice & Ethics. Surprisingly, medical LLMs do not outperform generalist models in accuracy, though they provide slightly better justifications while making more context-related errors. These findings highlight the challenges of developing AI for mental health counseling, particularly in competencies requiring empathy and nuanced reasoning. Our results underscore the need for specialized, fine-tuned models aligned with core mental health counseling competencies and supported by human oversight before real-world deployment. Code and data associated with this manuscript can be found at: https://github.com/cuongnguyenx/CounselingBench

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Healthcare & Medicine and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — clinical competency

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio