A landmark investigation by consumer group Which? has revealed alarming flaws in major AI platforms like ChatGPT, Gemini, and Microsoft Copilot. Across 40 essential consumer questions—ranging from financial and legal issues to health and travel—AI responses were frequently inaccurate, unclear, or potentially harmful. ChatGPT scored just 64%, with Meta AI faring even worse at 55%. Only Perplexity exceeded 70%, leading the pack in utility and accuracy.
The study arrives amid surging AI adoption in the UK, where nearly half of adults now turn to AI for online information. But the findings show AI tools often cite outdated sources, misstate key details like ISA allowances, and even contradict NHS guidance. With AI hallucinations and sourcing issues still rampant, experts are urging users to treat these tools with caution.
Despite their shortcomings, AI systems continue to be widely trusted. Yet many users falsely believe AI responses are based solely on expert sources. Consumer advocates warn that blind trust is risky, especially in critical areas like health or finance. “These tools are not yet ready to replace professional advice,” said Andrew Laughlin of Which?.
Developers acknowledge the challenges. OpenAI and Google have pointed to improvements in their latest models and issued reminders to verify AI outputs. But as global concerns mount—from AI-enabled election misinformation to inconsistent mental health advice—the message is clear: oversight, education, and transparency must underpin the UK’s AI ambitions.
If the UK is to lead in responsible AI innovation, a robust focus on safety, accountability, and user awareness will be essential. This is not just a technological issue—it’s a public trust imperative.
Created by Amplify: AI-augmented, human-curated content.
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We've since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score: 9
Notes: The narrative is recent, published on 17 November 2025, which is also the earliest known publication date of substantially similar content, indicating high freshness. The report is based on a press release from Which?, and press-release-based reporting typically warrants a high freshness score. No discrepancies in figures, dates, or quotes were found, and no earlier versions show different information. The article pairs updated data with some recycled older material; the updated data supports the high freshness score, but the recycled material should still be flagged.
Quotes check
Score: 10
Notes: Aside from a single attributed remark from Andrew Laughlin of Which?, no direct quotes were identified in the provided text, suggesting the content is potentially original or exclusive.
Source reliability
Score: 8
Notes: The narrative originates from Which?, a reputable consumer advocacy group in the UK, lending credibility to the report. However, the article is published on Tech Digest, a site that may not be as widely recognised, which introduces some uncertainty.
Plausibility check
Score: 9
Notes: The claims about AI tools providing inaccurate and potentially harmful advice are plausible and align with ongoing discussions about AI reliability. The report highlights specific instances of misleading advice, such as incorrect ISA allowance figures and health guidance, which are verifiable. The tone and language are consistent with typical reporting on AI issues.
Overall assessment
Verdict (FAIL, OPEN, PASS): PASS
Confidence (LOW, MEDIUM, HIGH): HIGH
Summary: The narrative is recent and based on a press release from a reputable source, indicating high freshness and credibility. The quotes check found no evidence of recycled quotations, suggesting originality. The claims made are plausible and verifiable, with no significant issues identified.