Late at night, long after most clinics have closed, people frequently reach for their phones rather than a doctor. In a dark room, the screen glows softly. Someone types their symptoms into a chatbot window: difficulty breathing, dizziness, and tightness in the chest.
The technology is oddly reassuring in that moment. Instant. Calm. Always accessible. Recent research, however, indicates that the reassurance may sometimes be misplaced.

| Category | Details |
|---|---|
| AI Tool | ChatGPT Health |
| Developer | OpenAI |
| Daily Health Queries | Approx. 40 million globally |
| Study Publication | Nature Medicine (2026) |
| Lead Research Institution | Icahn School of Medicine at Mount Sinai |
| Study Design | 60 clinical scenarios across 21 medical specialties |
| Total AI Interactions | About 960 responses tested |
| Major Finding | Failed to recommend emergency care in over 51% of serious cases |
| Key Concern | “Under-triage” of life-threatening symptoms |
| Reference Source | https://www.mountsinai.org |

A study published in Nature Medicine examined how well ChatGPT Health, an AI chatbot built to answer medical questions, judges when a patient needs urgent help. Researchers fed the system dozens of realistic patient scenarios, from minor ailments to genuine emergencies. What they found was disturbing: in more than half of the cases where doctors judged that a patient should go to the ER right away, the AI did not recommend emergency care.
Led by doctors from the Icahn School of Medicine at Mount Sinai, the researchers tested 60 clinical scenarios across 21 medical specialties and generated about 960 chatbot responses. On paper, the procedure looks nearly clinical itself, with controlled inputs and measured results. Behind those figures, however, are real situations: a diabetic crisis, an asthmatic struggling to breathe, or someone reporting suicidal thoughts.
The source of the problem may be the way these systems are trained. Large language models excel at producing fluent text, summarizing data, and imitating human speech. Judging medical urgency, however, calls for a different set of skills: clinical judgment, experience, and at times a cautious instinct that compels medical professionals to act even when the evidence is incomplete. That instinct is hard to program.
In one case, the chatbot correctly identified the early signs of respiratory distress but still recommended seeing a doctor later rather than seeking emergency care. The difference may seem small, but anyone who has watched an asthma attack worsen understands how quickly things can get out of hand. Minutes count.
Another case involved diabetic ketoacidosis, a potentially fatal complication of diabetes. All of the doctors who reviewed the scenario agreed that emergency care was necessary. Yet the AI sometimes recommended making a routine doctor's appointment.
It’s hard not to sense a slight conflict between technological optimism and reality when reading those results.
Healthcare has hailed artificial intelligence as a breakthrough. Hospitals use machine-learning systems to scan medical images, forecast patient decline, and speed up administrative tasks. Even doctors acknowledge that AI tools can sometimes summarize medical records faster than a human assistant.
Physicians who reviewed the research nonetheless pointed to what they call “under-triage.” In other words, the AI system sometimes underestimated the severity of a situation. And when a chatbot sounds confident and uses calm, soothing language, users may trust it more than is warranted.
It’s possible that technology developers are still underestimating the psychological side of this.
Users often treat conversational systems as advisors. A chatbot doesn’t seem hurried or worn out. It never stops. It answers every question. To someone worrying about symptoms late at night, that unending availability can feel almost therapeutic. As this trend develops, AI can at times become something like a virtual confidante.
One case raised in the debate surrounding the study involved a man who repeatedly turned to an AI chatbot for help with difficulty swallowing and chronic throat pain. According to the responses, the symptoms were unlikely to be cancer. When he eventually saw a doctor months later, he was diagnosed with stage-four esophageal cancer. It’s hard to ignore stories like that.
To be fair, developers of AI health tools frequently stress that their systems are not meant to replace medical professionals. ChatGPT Health itself includes warnings that the information it provides should not be used to make decisions about diagnosis or treatment. The tool’s creators say that it is still being improved and that independent research helps make it safer.
However, there can be a startlingly large discrepancy between design intent and actual behavior. Not everyone reads disclaimers. Even when it shouldn’t, a chatbot can seem authoritative when it answers with clarity and assurance.
Doctors say that patients now arrive having already used an AI system to check their symptoms. At times the advice was harmless. At other times it was misleading. And sometimes, doctors say, it delayed a visit that should have happened sooner.
Even so, it seems unlikely that AI will be abandoned in healthcare altogether. Millions of people have difficulty getting medical care quickly or live far from hospitals. AI tools can give them access to basic knowledge and guidance that they might not otherwise have. Even the researchers acknowledge that chatbots often do well in simple scenarios, such as questions about medications or common colds.
Medicine, however, is full of ambiguity: symptoms that appear mild but conceal something dangerous, or test results that look normal even as a patient deteriorates. Skilled medical professionals can read nonverbal cues, such as a patient’s tone of voice, posture, or breathing.
There is a sense that society is about to embark on a new phase of experimentation as this debate plays out in hospitals, tech companies, and research labs. Artificial intelligence is becoming more pervasive in everyday life, even in domains where errors have tangible repercussions.
It’s unclear if ChatGPT Health and similar systems will develop into trustworthy medical partners.
For now, one lesson is becoming increasingly clear. AI can help people weigh options, understand symptoms, and get answers at midnight. But when something really feels off, the simplest advice may still be the safest. Consult a physician.
