DeepSeek-R1 never ever relaxes...

So, I was testing DeepSeek-R1 with a math problem I found in a textbook for 9-year-olds (yes, really), and the model managed to crack it.

The problem was:

"Find two 3-digit palindromic numbers that add up to a 4-digit palindromic number. Note: the first digit of any of these numbers can't be 0."

R1 starts thinking...

Now, here’s where it gets interesting. R1 thought for a bit, found the correct answer in its <think></think> block, and then started writing it out, but made a mistake.

R1 makes a mistake...

Before even finishing its response, it caught its own error, backtracked, and corrected itself on the fly outside of the <think></think> block.

R1 corrects itself...

R1's final answer.

DeepSeek-R1's complete answer.

As for the problem itself, no other LLM I tried solved it, except OpenAI o1.

So now I’m wondering: what's holding the others back? Is it the tokenizer's weaknesses? The sampling parameters (they failed even with everything at the recommended settings)? Or maybe, just maybe, non-thinking LLMs are really that bad at math?
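
On the tokenizer hypothesis: it's easy to poke at how digits get chunked. The sketch below uses OpenAI's tiktoken library purely for illustration (DeepSeek ships its own tokenizer, so the exact splits R1 sees will differ):

```python
# Illustrative only: inspect how a BPE tokenizer splits numbers.
# Requires `pip install tiktoken`; cl100k_base is an OpenAI encoding,
# not DeepSeek's, so treat the output as an example of the phenomenon.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["474 + 747 = 1221", "palindromic numbers"]:
    tokens = enc.encode(text)
    print(text, "->", [enc.decode([t]) for t in tokens])
```

If a number ends up split across token boundaries, per-digit arithmetic gets harder for the model, which might partly explain the failures.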

Would love to hear thoughts on this.

Unsuccessful attempts by other models: