ChatGPT: Evaluating the Truth in AI Responses

ChatGPT: Evaluating the Truth in AI Responses

By Matt Novak
Published August 8, 2025

Sam Altman, CEO of OpenAI, delivering remarks

Generative AI tools like OpenAI's ChatGPT are marketed as all-knowing digital assistants, yet they often struggle with simple queries. I recently tested this by asking GPT-5 how many U.S. states include the letter "R". Despite being an educated tool, its response was inaccurate, revealing its tendency to "hallucinate" or generate incorrect information.

Seeing others' experiences on Bluesky sparked this experiment. Users reported GPT-5 wrongly listing states not containing "R". When I queried it, ChatGPT inaccurately listed states like Illinois as having the letter and provided flawed counts.

GPT-5 Corrects Itself

Upon pointing out errors, the AI attempts to rectify them. For instance, when challenged about including Minnesota, ChatGPT admitted it lacked an "R", correcting the total. However, when bluffing by suggesting Vermont lacked an "R", it took the bait and mistakenly agreed.

The Cycle of AI Bluffing

Playing with AI's tendency to please, I challenged its accuracy with false corrections. While initially adhering to logic, sometimes it conceded errors where none existed, as seen with "Vermont" and "Oregon".

Generative AI Sales vs. Reality

OpenAI heralded GPT-5 as an evolutionary leap, claiming it to be less sycophantic and more intellectual than its predecessors. Yet, users find amusing flaws, like in labeling exercises involving maps and word counts.

Even rival AI systems, like xAI’s Grok, give varied state counts, showing how these tools still grapple with fundamental misconceptions. Their creators, such as Sam Altman, envision them as ultimate experts.

The Real Test of AI

Critics assert that AI like ChatGPT performs illogically in basic tasks, yet they are promised as sophisticated problem-solvers. Altman positions these models akin to PhD experts, available for any inquiry, shaping perceptions of AI's capabilities as limitless.

While generative AI models continue to improve, they need scrutiny as they may unintentionally disseminate misinformation. Consumers should remain cautious and verify outputs, using them as complements rather than replacements for human judgment.

The ultimate takeaway is a reminder to maintain skepticism about AI responses to ensure no real-world consequences from erroneous advice. The hope is that advancements will refine reliability, reducing "hallucinations" and enhancing accountability.