Apple Research Questions AI Reasoning Models Just Days Before WWDC

Monday June 9, 2025 5:49 am PDT by Tim Hardwick

A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI "reasoning" large-language models like OpenAI's o1 and Claude's thinking variants, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.

ml research apple
For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, and dropped to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduce their thinking effort as problems become more complex, suggesting fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The take-home of Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning capabilities. It suggests that LLMs don't scale reasoning like humans do, overthinking easy problems and thinking less for harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Tag: Apple Research

Popular Stories

Apple Just Released a New Accessory

Monday May 4, 2026 8:13 am PDT by Joe Rossignol

Apple today released a new Pride Edition Sport Loop for the Apple Watch. The band features a rainbow design with 11 colors of woven nylon yarns. The new Pride Edition Sport Loop is available to order now on Apple.com and in the Apple Store app in 40mm, 42mm, and 46mm sizes, and it will be available at Apple Store locations starting later this week. In the U.S., the band costs $49. There...

iOS 27 Will Add These New Features to Your iPhone

Saturday May 2, 2026 8:43 am PDT by Joe Rossignol

Apple is expected to unveil iOS 27 during its WWDC 2026 keynote on June 8, and there are already many rumored features and changes for iPhones. The first developer beta of iOS 27 will likely be available immediately following the keynote, and a public beta typically follows in July. Following beta testing, the software update should be released to all users with a compatible iPhone in...

• 93 comments

Apple Announces 2026 Pride Band Watch Face and iPhone Wallpaper Article 2

iOS 26.5 Coming Soon With These New Features

Monday May 4, 2026 8:40 am PDT by Joe Rossignol

iOS 26.5 is expected to be released next week, following more than a month of beta testing. The update is relatively minor, but there are a couple of new features and changes across the operating system that we have recapped below. iOS 26.5 lays the groundwork for end-to-end encryption for RCS in the Messages app and ads in the Apple Maps app, and it will include a new Pride wallpaper and a...

• 45 comments

Top Rated Comments

citysnaps

12 months ago

I don't find this surprising at all.

Score: 24 Votes (Like | Disagree)

zorinlynx

12 months ago

LLM GenAI is pretty garbage technology. The less time it takes people to realize this, the better.

Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.

Score: 22 Votes (Like | Disagree)

trip1ex

12 months ago

Breaking news. The people who pretended otherwise always had something to sell.

Score: 22 Votes (Like | Disagree)

turbineseaplane

12 months ago

“….and now here’s Ashley to talk about some new Genmoji!”

Score: 18 Votes (Like | Disagree)

Orange Bat

12 months ago

Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal peoples’ hard work and regurgitate that work as if the data is it’s own. We’re just living in an “AI bubble” that will burst sooner rather than later.

Score: 16 Votes (Like | Disagree)

Salty Pirate

12 months ago

So AI is nothing more than clever programing?

Score: 15 Votes (Like | Disagree)

Read All Comments

Apple Research Questions AI Reasoning Models Just Days Before WWDC

Popular Stories

Apple Just Released a New Accessory

iOS 27 Will Add These New Features to Your iPhone

iOS 26.5 Coming Soon With These New Features

Top Rated Comments

Next Article

Videos

Guides

Upcoming

Other Stories