Apple's New Paper on LLM Reasoning: Do LLMs Really Think?
A Summary of Apple's Recent LLM Paper: The Illusion of Thinking
This week, Apple published a research paper, “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” which landed like an earthquake in the AI community. The central question it addresses is timeless: Can AI models truly think? According to Apple, the answer is a clear no.
Even the most advanced language models—like OpenAI’s o3 or Google’s Gemini 2.5 Pro, which rely on sophisticated “Chain of Thought” (CoT) reasoning—aren’t actually thinking. They are merely recalling patterns from their training data, mimicking understanding rather than demonstrating it.
To test this, the researchers used a classic logic puzzle: A farmer needs to ferry a wolf, a goat, and a bundle of hay across a river using a boat that can carry only one of them at a time besides himself. He cannot leave the wolf alone with the goat, or the goat alone with the hay.
Current AI models can solve this riddle. But Apple’s researchers asked: Are they solving it by thinking, or just regurgitating a memorized answer?
To find out, they designed harder versions of this puzzle, adding more animals (like dogs or chickens) and altering the rules (e.g., the dog can't be left with the goat or wolf). This forced the AI into uncharted territory: combinations it likely never saw during training. If the AI were truly capable of reasoning, it should still succeed. But it didn’t.
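For contrast, here is what solving such a puzzle "by deduction" rather than by recall looks like in code. This is a minimal sketch, not anything from Apple's paper: a breadth-first search over boat trips, where the items and conflict pairs (ITEMS, CONFLICTS) are illustrative and can be extended with extra animals and rules without the solver ever having "seen" that variant before.

```python
from collections import deque

# Illustrative puzzle definition (not from the paper): items to ferry and
# pairs that may not be left together on a bank without the farmer present.
ITEMS = ("wolf", "goat", "hay")
CONFLICTS = {("wolf", "goat"), ("goat", "hay")}


def is_safe(bank, conflicts):
    """A bank without the farmer is safe if no conflicting pair is on it."""
    return not any(a in bank and b in bank for a, b in conflicts)


def solve(items=ITEMS, conflicts=CONFLICTS):
    """Breadth-first search over states (farmer's side, each item's side)."""
    start = ("L", frozenset((item, "L") for item in items))
    goal_items = frozenset((item, "R") for item in items)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (farmer, positions), path = queue.popleft()
        if farmer == "R" and positions == goal_items:
            return path
        other = "R" if farmer == "L" else "L"
        # The farmer crosses alone or with one item from his own bank.
        cargo_options = [None] + [i for i, side in positions if side == farmer]
        for cargo in cargo_options:
            new_positions = frozenset(
                (i, other if i == cargo else side) for i, side in positions
            )
            # Items left behind on the bank the farmer just departed.
            left_behind = {i for i, side in new_positions if side == farmer}
            if not is_safe(left_behind, conflicts):
                continue
            state = (other, new_positions)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo or "nothing", other)]))
    return None  # no legal sequence of crossings exists


if __name__ == "__main__":
    for step, (cargo, side) in enumerate(solve(), 1):
        print(f"Trip {step}: farmer takes {cargo} to the {side} bank")
```

The point of the sketch is that a search procedure generalizes for free: add a dog and a new conflict rule, and the same code still finds a valid crossing (or proves none exists), because it derives the answer from the rules rather than retrieving a remembered solution.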
Instead of solving these novel puzzles through deduction, the models broke down. Apple’s experiments revealed several key issues: