For the roughly one in ten adults who live with chronic cough, the gap between a symptomatic episode and a specialist appointment is often filled by a search engine or a chatbot. Whether that digital intermediary is trustworthy enough to guide real health decisions is no longer a theoretical question — it is now measurable, and the answers are uneven enough to matter clinically.

A cross-sectional comparative assessment published in Digital Health evaluated six leading generative AI chatbots (ChatGPT-4o, ChatGPT-5, DeepSeek V3, Copilot, Gemini 2.5 Flash, and Perplexity) against 25 high-frequency chronic cough queries drawn from Google Trends and Chinese online health communities. Two clinical experts scored each response for accuracy, supplementary value, and completeness, using European Respiratory Society guidelines as the reference standard. Reliability was quantified with four validated instruments: DISCERN, EQIP (Ensuring Quality Information for Patients), the JAMA benchmark criteria, and the Global Quality Score (GQS). Readability was measured across six metrics, including Flesch-Kincaid Grade Level. Perplexity was the top performer on reliability (DISCERN: 51.00 ± 3.94; EQIP: 69.40 ± 6.34), while Copilot scored lowest on both scales, a statistically significant gap. Copilot nonetheless outperformed its peers on readability, illustrating a recurring tension in health communication: accessibility and accuracy do not always travel together.
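
To make the readability side of that comparison concrete, here is a minimal Python sketch of the Flesch-Kincaid Grade Level named above. The formula is the standard published one; the syllable counter is a rough vowel-group heuristic assumed here for illustration, and the function names are hypothetical. Nothing in this sketch comes from the study's own tooling, which is not described in this summary.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Illustrative sentences only; these are not responses from the study.
    plain = "A cough that lasts more than eight weeks should be checked by a doctor."
    dense = "Chronic refractory cough necessitates comprehensive multidisciplinary evaluation."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

The score rises with longer sentences and longer words. A low grade level therefore says a response is easy to read, and says nothing about whether its content matches clinical guidelines.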

This finding lands in a broader conversation about AI as a first-line health resource. Prior assessments of chatbot performance in other chronic-disease contexts, including asthma, COPD, and cardiovascular conditions, have similarly found that readability and reliability frequently trade off against each other, and that guideline adherence remains the weakest dimension across platforms. In practice, adults self-managing a chronic condition may receive responses that are easy to read but clinically incomplete or subtly misleading. The study's limitations include reliance on a single guideline framework, a relatively small query set, and expert scoring that, while rigorous, is inherently subjective. Because ChatGPT-5 was the newest model tested, its results in particular should be re-examined in follow-up research. For health-conscious adults, the pragmatic takeaway is this: no current chatbot reliably substitutes for clinician-verified information, and response readability is a poor proxy for medical trustworthiness.