Together AI has released DeepSWE, a state-of-the-art, fully open-source software engineering agent trained entirely through reinforcement learning (RL). Built on top of the Qwen3-32B language model, DeepSWE achieves 59% accuracy on the SWE-bench Verified benchmark and 42.2% Pass@1, topping the leaderboard among open-weight models. The launch also represents a significant shift for Together AI, away from traditional pretraining pipelines and toward autonomous language agents that continuously learn and improve through real-world feedback.
Reinforcement learning meets code generation
DeepSWE is the result of post-training the Qwen3-32B foundation model using rLLM, Agentica's modular reinforcement learning framework tailored to language agents. Unlike traditional supervised fine-tuning approaches, rLLM enables agents to adapt to real-world workflows through experience. DeepSWE has been trained to solve complex software engineering tasks using a feedback-driven loop rather than a static dataset.
The training pipeline incorporates Agentica's R2E-Gym dataset, a software engineering benchmark designed for RL-style agent development. The framework focuses on training language models with action-oriented objectives, such as fixing bugs, completing functions, and editing code, rather than merely predicting next-token distributions. This aligns DeepSWE more closely with how human engineers iterate and learn from outcomes.
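To make the idea of an action-oriented objective concrete, here is a minimal sketch of an outcome-based reward. It assumes a binary reward that is 1.0 only when a candidate fix passes every test. The article does not describe DeepSWE's actual reward function, and `outcome_reward`, `buggy_add`, and `fixed_add` are hypothetical names used purely for illustration; none of this is rLLM's API.

```python
from typing import Callable, List, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def outcome_reward(candidate: Callable, tests: List[TestCase]) -> float:
    """Assumed binary reward: 1.0 if the candidate passes every test, else 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed attempt, not as a training error.
            return 0.0
    return 1.0


# Hypothetical "before" and "after" versions of a function under repair.
def buggy_add(a, b):
    return a - b


def fixed_add(a, b):
    return a + b


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

if __name__ == "__main__":
    print(outcome_reward(buggy_add, tests))
    print(outcome_reward(fixed_add, tests))
```

The point of the sketch is that the signal comes from whether the edited code works, not from how closely the generated tokens match a reference solution.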
Performance benchmarks and capabilities
On SWE-bench Verified, the most rigorous benchmark for software engineering agents, DeepSWE scores 59% with test-time scaling, significantly outperforming previous open-weight models. On Pass@1, which measures the probability that the agent resolves a problem correctly on its first attempt, DeepSWE reaches 42.2%.
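The two metrics can be illustrated with a short sketch. Pass@1 is computed here as the fraction of problems solved on the first attempt. Test-time scaling is shown as a generic best-of-n selection, where several candidate patches are sampled and the one with the highest verifier score is kept. This is an assumption for illustration only: the article does not say how DeepSWE scales at test time, and the function names and scores below are hypothetical.

```python
from typing import List, Sequence


def pass_at_1(first_attempt_solved: Sequence[bool]) -> float:
    """Fraction of problems whose first attempt was correct."""
    if not first_attempt_solved:
        raise ValueError("need at least one problem")
    return sum(first_attempt_solved) / len(first_attempt_solved)


def best_of_n(candidates: List[str], scores: List[float]) -> str:
    """Generic best-of-n: return the candidate with the highest verifier score."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and aligned")
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


if __name__ == "__main__":
    # Four hypothetical problems, two solved on the first try.
    print(pass_at_1([True, False, True, False]))
    # Three hypothetical patches with made-up verifier scores.
    print(best_of_n(["patch_a", "patch_b", "patch_c"], [0.2, 0.9, 0.4]))
```

Under this reading, the gap between the two reported numbers reflects the extra compute spent sampling and selecting among multiple attempts rather than a change to the model itself.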
These results underscore the power of RL-based training in improving agentic behavior, especially in domains that require iterative reasoning and precise outputs, such as code synthesis. The model's architecture, inherited from Qwen3-32B, enables it to scale effectively while remaining suitable for real-world applications.
Open source and reproducibility at its core
One of the standout features of this release is its complete transparency. Together AI and Agentica have open-sourced not only the DeepSWE model but also the entire training recipe, including the rLLM framework, the R2E-Gym dataset, and the training configuration scripts. This promotes reproducibility and invites the broader research and developer communities to extend or build on DeepSWE without restrictions.
Both DeepSWE and rLLM are publicly available for developers to use.
From language reasoners to language agents
DeepSWE marks a philosophical and practical shift: from building models that reason about language to building agents that learn through interaction. Traditional LLMs have shown strong reasoning abilities but often lack the ability to adapt to feedback or improve with use. Reinforcement learning enables these models not only to perform well at launch but also to get better over time, adapting to new problem distributions and domains.
This approach also opens the door to local deployment. Because DeepSWE is fully open-source and modular, it can be extended and retrained for organization-specific use cases. Developers and researchers can build their own agents on top of DeepSWE using rLLM to serve diverse domains such as web navigation, robotics, or autonomous research assistance.
Conclusion
DeepSWE is a milestone in the evolution of generative AI for software engineering. By applying reinforcement learning to large language models such as Qwen3-32B, Together AI is enabling a future in which agents are not just pretrained and deployed but continuously trained and improved. This leap from language understanding to action-oriented agency has important implications for programming, automation, and intelligent system design.
All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform claims more than 2 million monthly views, reflecting its popularity among readers.