Key Summary
- Five popular platforms - ChatGPT, Gemini, Meta AI, Grok and DeepSeek - were assessed in the study.
- They were asked questions across five categories - cancer, vaccines, stem cells, nutrition, and athletic performance.
- Reference quality was noted to be poor, with an average completeness score of 40 per cent.
An analysis of five popular chatbots' responses to health and medicine questions shows that many of the responses were inaccurate or incomplete.
This highlights the health risks faced by users who are increasingly relying on these platforms.
The findings, published in the journal BMJ Open, show that nearly half of the responses were problematic as they presented a false equivalence between science-based and non-science-based claims.
Researchers from the UK, US, and Canada evaluated five popular platforms - ChatGPT, Gemini, Meta AI, Grok and DeepSeek - by asking each of them 10 open-ended and closed questions across five health categories - cancer, vaccines, stem cells, nutrition, and athletic performance.
"The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields," the authors wrote.
"Nearly half (49.6 percent) of responses were problematic: 30 percent somewhat problematic and 19.6 percent highly problematic," they said.
Chatbot performance was strongest on questions about cancer and vaccines, and weakest on stem cells, athletic performance and nutrition.
Responses were consistently presented with confidence and certainty, with few caveats or disclaimers, the study found.
Reference quality was noted to be poor, with an average completeness score of 40 per cent.
Chatbot hallucinations - creating false information and presenting it as fact - and fabricated citations meant that no chatbot provided a fully accurate reference list, the researchers said.
"Our findings regarding scientific accuracy, reference quality, and response readability highlight important behavioural limitations and the need to re-evaluate how AI chatbots are deployed in public-facing health and medical communication," the authors said.
"By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences. They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments," they said.
The researchers designed prompts to resemble common 'information-seeking' health and medical queries, language used in online misinformation, and language used in academic discourse.
The prompts were also used to stress-test the AI models and surface behavioural vulnerabilities by 'straining' them towards misinformation or contraindicated advice.
The information in the responses was scored for accuracy and completeness, with particular attention to whether a chatbot presented a false balance between science-based and non-science-based claims, regardless of the strength of the evidence.