Would ChatGPT Health Recognize Your Medical Emergency? New Study Raises Doubts
If you asked an AI chatbot whether your symptoms were an emergency, would it get it right? A new study finds OpenAI’s ChatGPT Health, a consumer chatbot designed to answer medical questions, misjudged more than half of serious medical emergencies.
In the study, published in Nature Medicine, researchers found the system failed to recognize 51.6% of emergency cases, often advising patients to seek care within 24 to 48 hours instead of going to the emergency department.
Stress-testing an AI health chatbot
To test ChatGPT Health, researchers created 60 clinician-written medical scenarios spanning 21 areas of medicine, from routine complaints to life-threatening conditions.
Each scenario was run through 16 variants that altered details such as patient demographics and context, generating 960 responses from the chatbot. The tool's triage recommendations were then compared with physicians' assessments based on clinical guidelines.
In this context, triage refers to determining how urgently someone should seek care, from managing symptoms at home to scheduling a doctor’s visit or seeking immediate emergency treatment.
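To make that design concrete, the sketch below shows, in Python, how such a stress test could be wired together. It is a minimal illustration only: the triage scale, the demographic and context variants, and the ask_chatbot stub are assumptions made for this example, not the study's actual protocol or OpenAI's API.

```python
from itertools import product

# Triage levels ordered from least to most urgent (an assumed scale for illustration).
TRIAGE_LEVELS = [
    "self-care at home",
    "see a doctor within 24-48 hours",
    "go to the emergency department",
]

def ask_chatbot(prompt: str) -> str:
    """Stub for a single-turn chatbot call. A real harness would call the
    model's API here; the fixed return value just keeps the sketch runnable."""
    return "see a doctor within 24-48 hours"

def under_triage_rate(symptoms: str, physician_label: str) -> float:
    """Fraction of prompt variants where the chatbot's recommendation is
    LESS urgent than the physicians' guideline-based label."""
    # 4 demographic x 4 context details = 16 variants per scenario,
    # mirroring the study's 60 scenarios x 16 variants = 960 responses.
    demographics = ["A 25-year-old woman", "A 25-year-old man",
                    "A 70-year-old woman", "A 70-year-old man"]
    contexts = ["The patient is alone at home.",
                "A family member says it is probably nothing.",
                "A friend urges rest and fluids.",
                "No one else is present."]
    target = TRIAGE_LEVELS.index(physician_label)
    variants = list(product(demographics, contexts))
    misses = 0
    for demo, ctx in variants:
        answer = ask_chatbot(f"{demo} reports {symptoms}. {ctx} How urgent is this?")
        if TRIAGE_LEVELS.index(answer) < target:
            misses += 1
    return misses / len(variants)

# Example: an emergency-level vignette. With the stub above, every variant
# under-triages, so the rate comes out to 1.0.
print(under_triage_rate("fruity breath, vomiting, and confusion",
                        "go to the emergency department"))
```

In practice a real chatbot returns free text rather than one of three fixed strings, so an actual harness would also need a step that maps each reply onto the triage scale before comparing it with the physicians' label.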
When urgent symptoms didn’t trigger an alarm
The study found that some life-threatening conditions were incorrectly treated as less urgent.
Among the missed cases were diabetic ketoacidosis, a dangerous complication of diabetes, and impending respiratory failure, both of which require immediate medical attention. According to the researchers, such delays could have serious consequences if patients relied on the guidance without seeking urgent care.
The system performed better when symptoms were unmistakable. Classic emergencies such as stroke and severe allergic reactions (anaphylaxis) were consistently recognized as requiring immediate treatment.
A wider pattern of triage inconsistencies
Aside from missed emergencies, the researchers found other signs of uneven performance across the test scenarios. In some non-urgent cases, the chatbot recommended medical care when it wasn’t necessary, suggesting a doctor’s appointment for symptoms that could typically be managed at home.
Additionally, the study showed that context could influence the chatbot's recommendations. When family members or friends in a scenario minimized a patient's symptoms, the tool was much more likely to suggest less urgent care in borderline cases, an effect the researchers described as anchoring bias.
There were also inconsistencies in suicide risk responses. Crisis support messages sometimes appeared when users described suicidal thoughts without mentioning a specific method, yet failed to appear when more concrete plans were described.
OpenAI responds to the findings
OpenAI said the study does not fully reflect how ChatGPT Health is designed to work in practice. A company spokesperson told CNBC that the tool is intended for ongoing conversations, where users can ask follow-up questions and provide additional context, rather than relying on a single prompt and response.
The company added that the AI tool remains limited in availability while it continues to refine safety and reliability before a broader rollout.
Experts urge caution on AI medical advice
Dr. Ashwin Ramaswamy, the study’s lead author, cautioned that tools like ChatGPT Health should not yet guide medical decisions without further testing. He said chatbots cannot currently be considered safe sources of medical advice on their own.
Experts also stressed the need for rigorous evaluation before such systems are widely deployed. Dr. John Mafi, an associate professor of medicine at UCLA Health, said technologies capable of influencing health decisions should undergo controlled trials to ensure their benefits outweigh potential risks.
Dr. Ethan Goh, executive director of the AI research network ARISE, added that while chatbots can be helpful, they should not be treated as substitutes for a physician’s judgment.