
Reasoning Models (o1, R1): Why Prompt Engineering is Dying

3 min read

🧠 System 1 vs. System 2

Psychologist Daniel Kahneman described human thinking in two systems:

  • System 1: Fast, instinctive, and emotional (e.g. recognizing a face, completing a sentence).
  • System 2: Slow, deliberative, and logical (e.g. solving an integral, designing a software architecture).

Until late 2024, LLMs like GPT-4o or Claude 3.5 were purely System 1. They were extremely advanced statistical prediction machines, but prone to hallucinations on logical tasks because they simply emitted the first token that seemed plausible.

With the arrival of OpenAI o1 and DeepSeek R1, AI has gained a System 2.

⛓️ Native Chain of Thought (CoT)

Previously, to get a good logical answer, we used Prompt Engineering tricks like “Let’s think step by step”. This forced the model to generate intermediate text to “guide” itself.

New reasoning models do this natively, with the chain of thought kept hidden (o1) or shown in full (R1). Before writing the first character of the answer, the model generates thousands of "thought tokens".
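The hidden-versus-visible difference shows up directly in the API response. As a rough illustration (the field names below follow what OpenAI and DeepSeek document publicly, but the response bodies here are mocked, not real captures): o1 only reports how many reasoning tokens it spent, while DeepSeek's `deepseek-reasoner` returns the chain of thought itself in `reasoning_content`.

```python
# Illustrative, mocked response shapes -- not real API captures.
# o1 hides its chain of thought; you only see the token count it consumed.
o1_response = {
    "choices": [{"message": {"content": "Use Bellman-Ford."}}],
    "usage": {
        "completion_tokens": 850,
        "completion_tokens_details": {"reasoning_tokens": 832},
    },
}

# DeepSeek R1 exposes the thinking itself as reasoning_content.
r1_response = {
    "choices": [{
        "message": {
            "content": "Use Bellman-Ford.",
            "reasoning_content": "BFS assumes unweighted edges... weights are "
                                 "negative, so Dijkstra fails too... Bellman-Ford.",
        }
    }],
}

def reasoning_visible(response: dict) -> bool:
    """Return True if the model's chain of thought is readable in the response."""
    message = response["choices"][0]["message"]
    return bool(message.get("reasoning_content"))

print(reasoning_visible(o1_response))  # False: o1's thoughts stay hidden
print(reasoning_visible(r1_response))  # True: R1 shows its work
```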

What happens during that wait time?

  1. Decomposition: Breaks the problem into sub-tasks.
  2. Hypothesis Generation: “I could use BFS for this graph… no, wait, weights are negative, better Bellman-Ford”.
  3. Verification: "If I use this variable here, I'll get a NullPointerException. Fix it".
  4. Final Answer: Only once it is confident does it emit the solution.
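The hypothesis in step 2 is worth unpacking: BFS only finds shortest paths on unweighted graphs, and negative edge weights break Dijkstra's greedy assumption, while Bellman-Ford handles them by relaxing every edge |V|−1 times. A minimal sketch of the algorithm the model would settle on:

```python
def bellman_ford(vertices, edges, source):
    """Shortest distances from source; handles negative edge weights.

    edges: list of (u, v, weight) tuples.
    Raises ValueError if a reachable negative cycle exists.
    """
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    # Relax every edge |V| - 1 times.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One extra pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist

edges = [("A", "B", 4), ("A", "C", 2), ("B", "C", -3), ("C", "D", 2)]
result = bellman_ford({"A", "B", "C", "D"}, edges, "A")
print(result["C"], result["D"])  # 1 3 -- the negative edge B->C pays off
```

Note that the shortest path to C goes A→B→C (4 − 3 = 1), not the "obvious" direct edge of weight 2: exactly the kind of trap a System 1 model falls into.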

💀 The End of Complex Prompt Engineering

This radically changes how we interact with AI.

Before (GPT-4):

“Act as a senior engineer. Write a Python script. Make sure to handle errors. Think step by step. Check that variables have descriptive names…”

Now (o1/R1):

“Write a Python script to migrate this DB.”

Because it can reason, the model already knows it must handle errors and use good names. You don't need to micromanage it. In fact, overly complex prompts can actually degrade a reasoning model's performance because they interfere with its own thought process.
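For illustration, this is roughly the kind of script a reasoning model produces from that one-line prompt: note the transaction, the error handling, and the descriptive names, none of which were requested explicitly. (A sketch only; it assumes SQLite databases, and the `users` table and column names are invented for the example.)

```python
import sqlite3

def migrate_users(source_path: str, target_path: str) -> int:
    """Copy the users table from one SQLite database to another.

    Runs inside a transaction so a failure leaves the target untouched.
    Returns the number of rows migrated.
    """
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(target_path)
    try:
        rows = source.execute("SELECT id, name, email FROM users").fetchall()
        target.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )
        target.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
        target.commit()
        return len(rows)
    except sqlite3.Error:
        # Undo any partial writes before propagating the failure.
        target.rollback()
        raise
    finally:
        source.close()
        target.close()
```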

⚖️ When to use what?

Don’t use a jackhammer to hang a picture.

| Task | Recommended Model | Why |
| --- | --- | --- |
| Generate text / emails | GPT-4o / Claude 3.5 Sonnet | Fast, creative, human tone. |
| Code autocomplete | Qwen 2.5 Coder / Copilot | Ultra-low latency. |
| Software architecture | o1 / DeepSeek R1 | Capable of seeing the "big picture" and avoiding logic errors. |
| Complex debugging | o1 / DeepSeek R1 | Can trace program state step by step. |
| Math / physics | o1 / DeepSeek R1 | Unbeatable. |

🚀 The Future

We are witnessing the transition from models that “talk” to models that “think”. Latency will increase (thinking takes time), but reliability will skyrocket. By 2025, measuring AI by how fast it types will be absurd; we will measure it by the quality of its decisions.

