Emergency and primary care physicians face a diagnostic dilemma with ear, nose, and throat conditions that drives up healthcare costs and patient anxiety. Diagnostic uncertainty in these settings frequently leads to unnecessary specialist referrals and over-testing, creating bottlenecks in an already strained healthcare system while potentially delaying appropriate care for patients who truly need urgent intervention.
This comparative analysis tested four advanced AI language models against 100 practicing physicians across twelve validated ENT clinical scenarios. The physicians demonstrated strong diagnostic accuracy at 91.6% but revealed concerning gaps in clinical judgment: nearly one-third made inappropriate referral decisions for non-urgent cases, while only half recognized situations requiring immediate specialist attention. The AI systems showed comparable diagnostic performance and more consistent referral decisions.
The implications extend beyond accuracy metrics to questions about clinical workflow in resource-constrained healthcare environments. Current ENT referral patterns create months-long waiting lists while overwhelming specialists with cases that could be managed in primary care. This research suggests AI could serve as a clinical decision support tool that helps front-line physicians distinguish conditions requiring immediate specialist intervention from those manageable with conservative treatment. However, because the study was limited to twelve vignettes, it remains unclear how these results would hold up across the full spectrum of ENT pathology in real-world practice. The technology appears most promising not as a replacement for clinical judgment but as a standardizing force that could reduce the wide variability in referral patterns currently observed among physicians with similar training backgrounds.