Six AI Chatbots Tested on Chronic Cough Advice: Reliability Gaps Revealed

Jun 30, 2026

For the roughly one in ten adults who live with chronic cough, the gap between a symptomatic episode and a specialist appointment is often filled by a search engine or a chatbot. Whether that digital intermediary is trustworthy enough to guide real health decisions is no longer a theoretical question — it is now measurable, and the answers are uneven enough to matter clinically.

A cross-sectional comparative assessment published in Digital Health evaluated six leading generative AI chatbots — ChatGPT-4o, ChatGPT-5, DeepSeek V3, Copilot, Gemini 2.5 Flash, and Perplexity — against 25 high-frequency chronic cough queries drawn from Google Trends and Chinese online health communities. Two clinical experts scored responses for accuracy, supplementary value, and completeness using European Respiratory Society guidelines as the reference standard. Reliability was quantified through four validated instruments: DISCERN, EQIP, JAMA, and GQS. Readability was measured across six metrics including Flesch-Kincaid Grade Level. Perplexity emerged as the top reliability performer (DISCERN: 51.00 ± 3.94; EQIP: 69.40 ± 6.34), while Copilot scored lowest on both scales — a statistically significant gap. Copilot, however, outperformed peers on readability, illustrating a recurring tension in health communication: accessibility and accuracy do not always travel together.
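To make the readability side of that tension concrete, here is a minimal Python sketch of the Flesch-Kincaid Grade Level formula, one of the six readability metrics the study names. It is an illustration only, not the study's scoring pipeline: the function names and the vowel-group syllable counter are assumptions made for this example, and dedicated readability tools use more careful syllable rules. The point it shows is that the score depends only on sentence length and syllable counts, so it carries no information about clinical accuracy.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption for this sketch): count vowel groups,
    then drop one for a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Treat runs of ., ! or ? as sentence boundaries (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    plain = "A cough can last for weeks. See a doctor if it does."
    dense = (
        "Persistent idiopathic cough necessitates comprehensive "
        "multidisciplinary evaluation incorporating bronchoscopy."
    )
    # Shorter sentences and shorter words give a lower grade level.
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

A lower grade level means the text is easier to read. Neither sample sentence above is scored on whether its advice is correct, which is why a separate reliability instrument such as DISCERN or EQIP is needed.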

This finding lands in a broader conversation about AI as a first-line health resource. Prior assessments of chatbot performance in other chronic-disease contexts (asthma, COPD, cardiovascular conditions) have similarly found that readability and reliability frequently trade off against each other, and that guideline adherence remains the weakest dimension across platforms. Practically, this means adults self-managing a chronic condition may receive responses that are easy to read but clinically incomplete or subtly misleading. The study's limitations include the use of a single guideline framework, a relatively small query set, and expert scoring that, while rigorous, carries inherent subjectivity. ChatGPT-5, the newest model tested, warrants particular attention in follow-up research because it is so recent. For health-conscious adults, the pragmatic takeaway is this: no current chatbot reliably substitutes for clinician-verified information, and response readability is a poor proxy for medical trustworthiness.

Source: Digital Health

For informational, non-clinical use. Synthesized analysis of published research — may contain errors. Not medical advice. Consult original sources and your physician.
