A landmark investigation by consumer group Which? has revealed alarming flaws in major AI platforms like ChatGPT, Gemini, and Microsoft Copilot. Across 40 essential consumer questions—ranging from financial and legal issues to health and travel—AI responses were frequently inaccurate, unclear, or potentially harmful. ChatGPT scored just 64%, with Meta AI faring even worse at 55%. Only Perplexity exceeded 70%, leading the pack in utility and accuracy.
The study arrives amid surging AI adoption in the UK, where nearly half of adults now turn to AI for online information. But the findings show AI tools often cite outdated sources, misstate key details like ISA allowances, and even contradict NHS guidance. With AI hallucinations and sourcing issues still rampant, experts are urging users to treat these tools with caution.
Despite their shortcomings, AI systems continue to be widely trusted. Yet many users falsely believe AI responses are based solely on expert sources. Consumer advocates warn that blind trust is risky, especially in critical areas like health or finance. “These tools are not yet ready to replace professional advice,” said Andrew Laughlin of Which?.
Developers acknowledge the challenges. OpenAI and Google have pointed to improvements in their latest models and issued reminders to verify AI outputs. But as global concerns mount—from AI-enabled election misinformation to inconsistent mental health advice—the message is clear: oversight, education, and transparency must underpin the UK’s AI ambitions.
If the UK is to lead in responsible AI innovation, a robust focus on safety, accountability, and user awareness will be essential. This is not just a technological issue—it’s a public trust imperative.
Created by Amplify: AI-augmented, human-curated content.
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We've since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score: 9
Notes: The narrative is recent, published on 17 November 2025, which is also the earliest known publication date of substantially similar content, indicating high freshness. The report is based on a press release from Which?, and press-release-based reporting typically warrants a high freshness score. No discrepancies in figures, dates, or quotes were found, and no earlier versions show different information. The article pairs updated data with some recycled older material; the updated data supports the high freshness score, but the recycled material should still be flagged.
Quotes check
Score: 10
Notes: Aside from a single attributed remark from Andrew Laughlin of Which?, no direct quotes were identified in the provided text, suggesting the content is potentially original or exclusive.
Source reliability
Score: 8
Notes: The narrative originates from Which?, a reputable consumer advocacy group in the UK, lending credibility to the report. However, the article is published on Tech Digest, a site that may not be as widely recognised, which introduces some uncertainty.
Plausibility check
Score: 9
Notes: The claims about AI tools providing inaccurate and potentially harmful advice are plausible and align with ongoing discussions about AI reliability. The report highlights specific instances of misleading advice, such as incorrect ISA allowance figures and health guidance, which are verifiable. The tone and language are consistent with typical reporting on AI issues.
Overall assessment
Verdict (FAIL, OPEN, PASS): PASS
Confidence (LOW, MEDIUM, HIGH): HIGH
Summary: The narrative is recent and based on a press release from a reputable source, indicating high freshness and credibility. The quotes check found no evidence of recycled quotations, suggesting originality. The claims made are plausible and verifiable, with no significant issues identified.