Should You Ask AI About Your Health? What the KFF Data and BMJ Research Actually Show in 2026

Q: Is it safe to ask AI about symptoms?

For general information, AI can be a useful starting point, but a 2026 BMJ Open audit found nearly half of AI health responses were problematic, and NIH research found that public users correctly identified conditions in only 35% of cases when using AI. Treat AI symptom information as a prompt for questions to bring to a doctor, not a diagnosis.

Q: Can AI give dangerous health advice?

Yes, and there are documented cases. NIH reviewed four major chatbots and found failures including omitting emergency safety information about miscarriage, recommending that infants drink water, and offering inappropriate reassurance for symptoms that needed more thorough evaluation.

Q: Is AI better or worse than WebMD or Google for health questions?

Differently risky rather than better or worse. AI gives conversational responses that feel more reliable, but the BMJ audit found that 49.6% of health responses were problematic, and AI's confident presentation makes its errors harder to spot than a clearly low-quality search result.

Roughly one in three American adults now turns to AI chatbots for health information at least monthly, according to KFF's 2026 tracking poll — and a separate BMJ Open audit published in April 2026 found that nearly half of the health answers those same chatbots gave were problematic, with about one in five rated as highly problematic or potentially harmful. Both of those statistics are true at the same time, and the tension between them is exactly what makes this a harder question than "yes you should" or "no you shouldn't." AI can be genuinely useful for understanding a diagnosis you just received, translating medical jargon, or knowing which questions to bring to your doctor. It can also confidently tell you the wrong medication dose or hallucinate a study that doesn't exist. Knowing which situation you're in is the whole problem.

This is the third piece in our ongoing series on AI and real life — after covering what happens to your data in ChatGPT and Gemini, and how the three major platforms compare on privacy. This one focuses on a question that KFF has been tracking with real polling data in 2026: are people using AI for health information, does it work, and what does the evidence actually say about when it's safe to do so?

Who's Actually Asking AI Health Questions — and Why

KFF's March 2026 Tracking Poll on Health Information and Trust, conducted among a nationally representative sample of 1,343 U.S. adults, found that about a third (32%) of adults are using AI chatbots for health information or advice. That includes 29% who've used AI for physical health questions and 16% for mental health information. According to OpenAI's own data from 2026, more than 40 million people globally turn to ChatGPT daily for health information. AI chatbots are also handling 1.6 to 1.9 million health insurance and billing questions per week.

The reason isn't hard to understand. Healthcare in the United States is expensive, appointments are hard to get, and Google search results have deteriorated to the point where the top results for many health queries are often ads or SEO content farms. Typing a question into an AI chatbot that responds in plain English and doesn't require a copay feels like a genuine improvement. A 2026 Gallup survey found that 73% of patients would prefer to consult their doctors for health information, but with costs increasing and appointments hard to schedule, AI fills the gap for people who have no immediate alternative.

What's striking in KFF's data is the trust gap. Even as usage climbs, two-thirds of all adults say they trust AI tools "not too much" or "not at all" to provide reliable health information. For mental health specifically, three in four say the same. People are using AI for health advice while simultaneously skeptical of it, which suggests they're not uncritically accepting what they read — but it also means they're making judgment calls about reliability that the evidence suggests they're not well-equipped to make.

What the Research Actually Shows About AI Health Accuracy

Study / Source	Finding	What It Means Practically
BMJ Open audit, April 2026 (5 chatbots, 250 health responses)	49.6% of responses were problematic; 19.6% were highly problematic or potentially harmful	Nearly one in five health answers from a popular chatbot could actively harm you if acted on
BMJ Open audit (same study)	Average reference completeness: 40%. No chatbot produced a fully accurate reference list.	When an AI cites a study to support a health claim, treat the citation as unverified: it may be incomplete or fabricated
NIH/PMC study on hallucination in clinical prompts	Major AI models hallucinated in 50–83% of cases when clinical prompts contained fabricated information	AI doesn't flag when your premise is wrong — it builds on it and produces confident nonsense
NIH/PMC study on real-world vs controlled use	AI alone reached 95% diagnostic accuracy, but members of the public using AI assistance identified the correct condition in only 35% of cases	An AI performing well on a test and a regular person using AI to make a health decision are two completely different situations
KFF Health Misinformation Tracking Poll	50% of AI users not confident they can tell fact from fiction in AI health responses	Users sense the risk but can't reliably identify when they're receiving bad information
NIH/PMC study (Claude, Gemini, ChatGPT, Llama)	Multiple failures: omitted safety information, included unsafe advice (e.g., infants drinking water), offered inappropriate reassurance without considering patient history	Real documented cases of dangerous health advice delivered confidently without any warning

The Specific Things AI Gets Wrong in Health Contexts

The BMJ Open audit is worth understanding in detail because it's the most systematic 2026 evaluation of health chatbots available. Researchers tested five popular AI chatbots, the same ones most people are using, on common health questions, and the 49.6% problematic response rate did not come from obscure edge cases. These were the kinds of health questions real users ask every day.
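As a quick sanity check, the audit's percentages can be converted back into response counts. This is only a minimal sketch: it assumes the 250-response total and the two rates exactly as reported in this article, and the function name is illustrative, not anything taken from the study.

```python
# Figures as reported in this article for the BMJ Open audit (assumed exact).
TOTAL_RESPONSES = 250
PROBLEMATIC_RATE = 0.496          # 49.6% rated problematic
HIGHLY_PROBLEMATIC_RATE = 0.196   # 19.6% rated highly problematic or potentially harmful


def implied_count(rate: float, total: int) -> int:
    """Convert a reported rate back to a whole number of responses."""
    return round(rate * total)


problematic = implied_count(PROBLEMATIC_RATE, TOTAL_RESPONSES)
highly_problematic = implied_count(HIGHLY_PROBLEMATIC_RATE, TOTAL_RESPONSES)

print(f"{problematic} of {TOTAL_RESPONSES} responses rated problematic")
print(f"{highly_problematic} of {TOTAL_RESPONSES} rated highly problematic or potentially harmful")
```

Under those assumptions the rates correspond to 124 and 49 responses out of 250, which is where "nearly half" and "about one in five" come from.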

Citation hallucination is the failure mode that catches people most off guard. The BMJ's own reporting on the study noted that reference quality was poor across every model tested, with an average completeness score of 40% and no chatbot producing a fully accurate reference list. The citations look plausible — real-sounding journal names, author names, volume numbers — but a meaningful fraction of them don't exist or don't say what the chatbot claims. If you're trying to verify an AI health claim by following its citations, you may be chasing sources that were invented.

The hallucination problem is worse when your starting premise is wrong. An NIH/PMC study found that when clinical prompts embedded fabricated information, leading AI models hallucinated in 50 to 83% of cases — confidently mentioning non-existent laboratory tests or diseases, building on the false premise rather than correcting it. In practical terms: if you describe your symptoms inaccurately (which people often do), or if you've read something wrong about your condition before asking the AI, the AI is likely to amplify the error rather than catch it.

The readability problem is an underappreciated risk. The same BMJ Open study found that all responses were graded as "difficult, equivalent to college graduate level." The people most likely to need simple, accessible health information (older adults, those with lower health literacy, non-native speakers) are the least equipped to judge whether a technically complex response is accurate, so the patients at highest risk of acting on bad AI health advice are the same ones for whom the responses are hardest to evaluate critically.

Where AI Health Questions Actually Work Well

This isn't a case where the evidence says "never use AI for anything health-related." The evidence is more specific than that, and being honest about where AI is genuinely useful matters as much as being honest about the risks.

AI chatbots perform well on general health education: explaining what a condition generally involves, translating medical terminology from a diagnosis letter, summarizing what a class of medications typically does, or helping you formulate the questions you want to bring to an appointment. These are tasks where the downside of a slightly imprecise answer is low, where the AI's broad training data is an asset rather than a liability, and where there's no time-sensitive decision riding on the accuracy of the response.

AI is also helpful for navigating the administrative side of healthcare — understanding what a prior authorization means, how an EOB (Explanation of Benefits) works, what a specific billing code typically represents. The 1.6 to 1.9 million weekly health insurance questions that AI chatbots handle are mostly in this category, and it's probably where AI adds the most concrete value with the lowest risk.

Where the risk rises sharply is anywhere a specific answer drives a specific action: dosing, diagnosis, triage, deciding whether a symptom needs emergency care, or evaluating whether a medication is safe to take with something else you're already on. These are exactly the contexts where AI's hallucination problem and inability to account for your individual health history create real danger — and they're also the questions people most want to ask because they're the ones where they most urgently need an answer.
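The split described in this section can be sketched as a simple lookup. This is a hedged illustration rather than a validated triage tool: the category labels and the `risk_tier` function are hypothetical names invented here, and treating any unlisted category as higher risk is a conservative assumption, not something the research prescribes.

```python
from enum import Enum


class Risk(Enum):
    LOWER = "lower"
    HIGHER = "higher"


# Hypothetical labels for the lower-risk uses described in the article.
LOWER_RISK = {
    "general_education",   # what a condition generally involves
    "terminology",         # translating jargon from a diagnosis letter
    "appointment_prep",    # drafting questions for a doctor
    "insurance_billing",   # prior authorizations, EOBs, billing codes
}

# Hypothetical labels for uses where a specific answer drives a specific action.
HIGHER_RISK = {"dosing", "diagnosis", "triage", "emergency", "drug_interaction"}


def risk_tier(category: str) -> Risk:
    """Return the risk tier for a question category.

    Anything not explicitly listed as lower risk is treated as higher risk
    (an assumption made for this sketch).
    """
    if category in LOWER_RISK:
        return Risk.LOWER
    return Risk.HIGHER


for example in ("terminology", "dosing", "something_else"):
    print(example, "->", risk_tier(example).value)
```

The design choice worth noting is the default: when a question doesn't clearly fit a lower-risk use, the sketch errs toward caution rather than toward convenience.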

Eight Rules for Using AI Health Questions Without Getting Burned

Never use AI for emergency triage. If you're wondering whether something is serious enough to call 911 or go to an ER, call 911 or go to an ER. AI chatbots have documented failures on emergency care scenarios, including giving dangerous advice about cardiopulmonary resuscitation.
Don't ask AI for specific dosing information. Medication dosing is highly individual and context-dependent, and AI's documented hallucination rate on clinical questions makes this a genuinely dangerous application. The same confident tone applies whether the dose is right or dangerously wrong.
Treat any cited study as unverified until you check it yourself. Given that average citation completeness across AI health responses is 40%, and no model produced a fully accurate reference list in the BMJ audit, any study an AI mentions should be independently verified before you rely on it.
Tell the AI your actual symptoms, not a diagnosis you've already concluded. The hallucination research shows AI builds on false premises rather than correcting them. If you start with "I think I have X," the AI is likely to confirm X rather than critically evaluate your reasoning.
Use AI to prepare for appointments, not replace them. Generating a list of questions to ask your doctor, understanding what a condition involves so you can have a more informed conversation, or translating terminology from a report — these are genuinely good uses with low risk.
Be especially careful with mental health questions. KFF found that 77% of adults distrust AI for mental health information specifically. Appropriate emotional support is a domain where AI's inability to understand individual context and its tendency toward generic reassurance are particularly problematic.
Don't let AI's confident tone substitute for verification. The research consistently finds that AI delivers accurate and inaccurate health information in the same confident register. The tone of a response tells you nothing about its accuracy.
Don't use AI if you can't verify what it tells you. If you're asking a health question and you have no way to evaluate the answer — you don't know enough about the topic to spot an error, and you're not going to follow up with a professional — that's a situation where the 49.6% problematic response rate from the BMJ audit should give you genuine pause.

Frequently Asked Questions

Is it safe to ask AI about symptoms?

For general information about what a symptom might indicate, AI can be a useful starting point, but a 2026 BMJ Open audit found nearly half of AI health responses were problematic, and NIH research found that public users correctly identified conditions in only 35% of cases when using AI, versus the 95% accuracy seen in controlled AI-only testing. Treat AI symptom information as a prompt for questions to bring to a doctor, not a diagnosis.

Can AI give dangerous health advice?

Yes, and there are documented cases. The NIH reviewed four major chatbots and found failures including omitting emergency safety information about miscarriage, recommending that infants drink water (which is dangerous), and offering inappropriate reassurance for symptoms that needed more thorough evaluation. These aren't hypothetical risks.

How many people use AI for health questions?

About 32% of U.S. adults use AI for health information, according to KFF's March 2026 nationally representative survey of 1,343 adults. OpenAI's own data indicates more than 40 million people globally turn to ChatGPT daily for health information.

Why do AI chatbots give wrong health information?

Three main reasons: hallucination (generating plausible-sounding but fabricated information, including fake citations), inability to account for individual patient history and context, and the tendency to build on false premises rather than correct them. The BMJ Open audit found an average citation completeness of only 40%, and no chatbot produced a fully accurate reference list.

Is AI better or worse than WebMD or Google for health questions?

Differently risky rather than straightforwardly better or worse. AI gives conversational, plain-English responses that feel more accessible than search results or static health pages, which makes it feel more reliable. But the BMJ audit's finding that 49.6% of health responses were problematic is a documented error rate with no direct equivalent for static reference sites. The confidence of AI presentation makes its errors harder to spot than a clearly biased or low-quality search result.