Mental health care faces a global access crisis — too few providers, too much stigma, and too little scalability. The emergence of conversational AI as a potential bridge has generated enormous research interest, but until now, no comprehensive map existed of where this technology actually works, where it fails, and whose needs it is genuinely designed to serve. That gap is exactly what this scoping review addresses, with implications for anyone betting on AI to democratize psychological support.
Drawing on 677 qualifying studies from an initial pool of over 10,000 records across 15 databases, the review reveals a field that is growing rapidly (publications have surged since 2020) but developing unevenly. Computer science dominates the research landscape with 449 studies, compared with 148 from medicine and 80 from the social sciences, a disciplinary skew that shapes how problems get framed and what counts as success. The distribution across the care continuum is similarly lopsided: intervention-stage applications account for 66% of studies and detection for 23%, while prevention receives just 8% and long-term maintenance a mere 3%. Mood disorders, anxiety, and stress conditions are the most studied; rarer, more complex conditions remain comparatively neglected. Large language models have become the dominant technology, especially for intervention and maintenance applications, while multimodal data inputs (voice, facial expression, behavioral signals) remain underdeveloped.
The prevention and maintenance blind spots are arguably the most consequential finding here. Effective mental health care is longitudinal; crisis intervention without ongoing support is a well-documented failure mode in clinical settings. A field overwhelmingly optimized for acute detection and short-term intervention may be replicating exactly the fragmented care model it was supposed to transcend. The heavy reliance on text-based data also raises questions about equity: populations with limited literacy, or those whose distress manifests behaviorally rather than verbally, risk being systematically underserved. This review is observational and descriptive by design, so it cannot establish which approaches actually improve patient outcomes. Nonetheless, its human-centered taxonomy offers a useful corrective framework for researchers and developers inclined to optimize for technological novelty over genuine clinical utility.