Elon Musk praises Anthropic's new AI model Opus 4.8 as company claims improved self-honesty and reasoning

Elon Musk has praised the latest AI model from Anthropic after the company unveiled Claude Opus 4.8, its newest flagship system. Responding to Anthropic’s announcement on X (formerly Twitter), Musk wrote, “Nice work.” Musk’s brief comment came as Anthropic highlighted improvements in reasoning, coding performance and long-running autonomous tasks in the new model. The launch has drawn attention across the AI industry as companies including Anthropic, OpenAI, Google and Musk’s own AI startup xAI continue to compete in the rapidly evolving AI market.

Anthropic highlights Claude Opus 4.8 upgrades

In the post, Anthropic said Claude Opus 4.8 offers sharper judgment, better self-awareness about its capabilities and an improved ability to work independently for longer periods without losing performance. The company also said the new model will be available at the same price as the previous version. According to benchmark results shared by Anthropic, Claude Opus 4.8 achieved 69.2% on SWE-Bench Pro, a test used to measure software engineering and coding abilities. The company also reported a score of 57.9% with tools on Humanity’s Last Exam, a benchmark designed to evaluate advanced reasoning across multiple subjects.
In agentic financial analysis, Anthropic said the model scored 53.9%, outperforming earlier Claude versions and competing models in the comparison chart it shared.

Anthropic claims Opus 4.8 is four times less likely to overlook flaws in its own code

A common issue among advanced AI models is their tendency to jump to conclusions, confidently claiming they have solved a problem even when the evidence is thin. Anthropic claims to have made a major breakthrough in fixing this with Opus 4.8. Citing early testers, Anthropic says that Opus 4.8 is significantly better at flagging uncertainties in its own work and is much less likely to make unsupported claims. Separately, Anthropic's internal evaluations found that the model is about four times less likely than its predecessor, Opus 4.7, to let flaws in code it has written go unremarked. The model is also claimed to have outperformed its competitors on several key industry benchmarks, particularly in financial analysis, reasoning and agentic coding.

About the Author: TOI Tech Desk

The TOI Tech Desk is a dedicated team of journalists committed to delivering the latest and most relevant news from the world of technology to readers of The Times of India. TOI Tech Desk’s news coverage spans a wide spectrum across gadget launches, gadget reviews, trends, in-depth analysis, exclusive reports and breaking stories that impact technology and the digital universe. Be it how-tos or the latest happenings in AI, cybersecurity, personal gadgets, platforms like WhatsApp, Instagram, Facebook and more, TOI Tech Desk brings the news with accuracy and authenticity.
