
Large language models (LLMs) have made significant progress in natural language processing, excelling at tasks such as comprehension, generation, and reasoning. Challenges remain, however. Achieving robust reasoning often requires extensive supervised fine-tuning, which limits scalability and generalization. Issues such as poor readability and the trade-off between computational efficiency and reasoning complexity also persist, prompting researchers to seek new approaches.
DeepSeek-R1: A New Approach to LLM Reasoning
Recent work from DeepSeek-AI introduces DeepSeek-R1, a model designed to enhance reasoning abilities through reinforcement learning (RL). This effort produced two models:
- DeepSeek-R1-Zero, which is trained purely with RL and exhibits emergent reasoning behaviors such as long chain-of-thought (CoT) reasoning.
- DeepSeek-R1, which builds on its predecessor by incorporating a multi-stage training pipeline, addressing challenges such as readability and language mixing while maintaining high reasoning performance.
These models aim to overcome existing limitations by combining innovative RL techniques with structured training processes to achieve scalability and applicability.
Technological Innovation and Benefits
1. Reinforcement Learning on Reasoning Tasks: DeepSeek-R1-Zero employs RL without relying on supervised data. Using Group Relative Policy Optimization (GRPO), it optimizes the policy by scoring groups of sampled outputs against one another, significantly improving benchmark performance. For example, its AIME 2024 pass@1 score increased from 15.6% to 71.0% during training (see the GRPO sketch after this list).
2. Multi-Stage Training in DeepSeek-R1: DeepSeek-R1 incorporates cold-start data – thousands of curated CoT examples – to fine-tune the base model before reasoning-centric RL. The pipeline adds a language-consistency reward to ensure that outputs are both coherent and user-friendly (a toy version appears after this list).
3. Distillation for Small Models: To ease computational constraints, DeepSeek-AI distilled six smaller models (1.5B to 70B parameters) from DeepSeek-R1 onto Qwen and Llama architectures. These models maintain strong reasoning capabilities, with the 14B distilled model achieving a pass@1 score of 69.7% on AIME 2024, outperforming some much larger models (a minimal distillation sketch also follows this list).
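To make the GRPO idea concrete, here is a minimal sketch of the group-relative advantage and a clipped surrogate loss. It is illustrative only: the clipping threshold is a common PPO default rather than a published DeepSeek value, and the paper's full objective also includes a KL penalty against a reference policy, omitted here.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each sampled answer is scored against the
    mean and std of its own group of samples, with no learned value critic."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate over sequence log-probs; the full GRPO
    objective also adds a KL penalty to a reference policy (omitted here)."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Toy usage: 2 prompts, a group of 4 sampled answers each; rewards come from
# a rule-based checker (e.g., 1.0 if the final answer is correct).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
logp_old = torch.randn(2, 4)                     # stand-in log-probs
logp_new = logp_old + 0.01 * torch.randn(2, 4)   # after one policy update
loss = grpo_surrogate_loss(logp_new, logp_old, adv)
```

Because advantages are computed within each group of samples, no separate value network is needed, which is what makes this approach cheaper than standard PPO at LLM scale.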
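The language-consistency reward can be pictured with a toy proxy like the one below, which scores the fraction of ASCII tokens in a chain of thought. This is an assumption for illustration; the paper describes the reward as the proportion of target-language words in the CoT, combined with the accuracy reward during RL.

```python
def language_consistency_reward(cot_text: str) -> float:
    """Toy proxy: fraction of whitespace-separated tokens that are pure
    ASCII, standing in for 'proportion of target-language words' when the
    target language is English. Illustrative only, not the paper's exact
    reward function."""
    tokens = cot_text.split()
    if not tokens:
        return 0.0
    return sum(t.isascii() for t in tokens) / len(tokens)

# A mixed-language CoT gets a lower reward, discouraging language mixing.
print(language_consistency_reward("First, compute 2+2. 然后 check the result."))
```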
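Distillation here is plain supervised fine-tuning of a student on teacher-generated reasoning traces, not RL. A minimal sketch with Hugging Face Transformers follows; the dataset file name and its fields are hypothetical, and the student checkpoint is just one example (DeepSeek-AI reports curating roughly 800k samples for its released distilled models).

```python
# Hypothetical data: "teacher_traces.jsonl" with fields "prompt" and
# "teacher_trace" (a DeepSeek-R1-generated reasoning trace per prompt).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-14B"  # any open Qwen/Llama checkpoint can be the student
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(ex):
    # The student imitates the teacher's full output: plain next-token
    # prediction on prompt + trace, with no RL stage for distilled models.
    enc = tok(ex["prompt"] + ex["teacher_trace"], truncation=True, max_length=4096)
    enc["labels"] = enc["input_ids"].copy()
    return enc

ds = load_dataset("json", data_files="teacher_traces.jsonl")["train"]
ds = ds.map(tokenize, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distill-student",
                           per_device_train_batch_size=1, num_train_epochs=2),
    train_dataset=ds,
).train()
```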
Results: Performance Insights
The performance of DeepSeek-R1 is backed by benchmark results:
- Reasoning Benchmarks:
- AIME 2024: 79.8% pass@1, beating OpenAI's o1-mini.
- MATH-500: 97.3% pass@1, on par with OpenAI-o1-1217.
- GPQA Diamond: 71.5% pass@1, showing strength in knowledge-intensive, science-focused reasoning.
- Coding and STEM Tasks:
- Codeforces Elo rating: 2029, outperforming 96.3% of human participants.
- SWE-Bench Verified: 49.2% resolution rate, competitive with other leading models.
- General Capabilities:
- The model demonstrated strong generalization on the ArenaHard and AlpacaEval 2.0 benchmarks, with win rates of 92.3% and 87.6%, respectively.
Distilled Model Highlights: Distilled models such as DeepSeek-R1-Distill-Qwen-32B showed strong performance on AIME 2024 with a pass@1 score of 72.6%, demonstrating impressive scalability and practicality.
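For context on the metric used throughout these results, pass@1 is typically estimated by sampling several responses per problem and averaging per-sample correctness rather than taking a single greedy decode. A minimal sketch (the sample counts below are made up for the example):

```python
def pass_at_1(per_problem_flags):
    """Estimate pass@1: for each problem, average the 0/1 correctness of
    its k sampled responses, then average across problems."""
    per_problem = [sum(flags) / len(flags) for flags in per_problem_flags]
    return sum(per_problem) / len(per_problem)

# Example: 3 problems, 4 samples each.
print(pass_at_1([[1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 1, 1]]))  # ~0.667
```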
Conclusion: Refining Reasoning in AI
DeepSeek-AI's DeepSeek-R1 and DeepSeek-R1-Zero represent meaningful advances in reasoning capabilities for LLMs. By leveraging RL, cold-start data, and distillation techniques, these models address important limitations while promoting accessibility through open-source availability under the MIT License. The API (model=deepseek-reasoner) further increases usability for developers and researchers.
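A minimal sketch of calling the reasoner through the OpenAI-compatible Python client; the base URL and the separate reasoning_content field follow DeepSeek's public documentation at the time of writing, so verify against the current docs before use.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes lie below 100?"}],
)
msg = resp.choices[0].message
# The reasoner returns the chain of thought separately from the final answer.
print(getattr(msg, "reasoning_content", None))  # CoT, if exposed by the API
print(msg.content)                              # final answer
```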
Looking ahead, DeepSeek-AI plans to refine multilingual support, enhance software-engineering capabilities, and reduce prompt sensitivity. These efforts aim to establish DeepSeek-R1 as a robust solution for reasoning-centric AI applications. By integrating thoughtful training paradigms, DeepSeek-R1 demonstrates how AI can move toward solving increasingly complex challenges.