Baidu's AI research team has just released ERNIE-4.5-21B-A3B-Thinking, a new reasoning-focused large language model designed around efficiency, long-context reasoning, and tool integration. As part of the ERNIE-4.5 family, the model uses a 21B-total-parameter Mixture-of-Experts (MoE) architecture with only about 3B active parameters per token, keeping it computationally efficient while maintaining competitive reasoning capability. Released under the Apache-2.0 license, it is accessible for both research and commercial deployment on Hugging Face.
What is the architectural design of ERNIE-4.5-21B-A3B-Thinking?
ERNIE-4.5-21B-A3B-Thinking is built on a Mixture-of-Experts backbone. Instead of activating all 21B parameters, the router selects a subset of experts for each token, resulting in roughly 3B active parameters per token. This structure reduces compute without sacrificing the specialization of individual experts. The research team applies a router orthogonalization loss and a token-balanced loss to encourage diverse expert activation and stable training.
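To make the sparse-activation idea concrete, here is a minimal, self-contained sketch of top-k expert routing in PyTorch. The expert count, top-k value, and dimensions are illustrative assumptions, not ERNIE's actual configuration, and the auxiliary losses mentioned above are omitted.

```python
# Minimal top-k MoE routing sketch (illustrative only; not Baidu's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                                      # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                         # route each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)                                   # toy batch of 16 token embeddings
print(TinyMoE()(tokens).shape)                                 # torch.Size([16, 64])
```

Only the selected experts run for each token, which is the mechanism that keeps per-token compute near the active-parameter budget rather than the total parameter count.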
This design offers a middle ground between small dense models and ultra-large systems. The research team's position is that roughly 3B active parameters per token hits a practical sweet spot in the trade-off between reasoning performance and deployment efficiency.
How does the model handle long contexts?
ERNIE-4.5-21B-A3B-Thinking supports a 128K context length. This allows the model to process very long documents, carry out extended multi-step reasoning, and work over structured sources such as academic papers or multi-file codebases.
The research team achieves this through progressive scaling of Rotary Position Embeddings (RoPE), increasing the frequency base from 10K to 500K during training. Additional optimizations, including FlashMask attention and memory-efficient scheduling, make these long-context operations practical.
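As a quick illustration of why raising the RoPE base helps at long range, the short sketch below compares the rotation wavelengths implied by a base of 10K versus 500K. The head dimension is an arbitrary assumption; this is not the model's actual embedding code.

```python
# Illustrative comparison of RoPE inverse frequencies for two base values.
# A larger base stretches the rotation wavelengths, so positional phases
# repeat far less often and much longer sequences stay distinguishable.
import numpy as np

def rope_inv_freq(base, head_dim=64):
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1.
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

for base in (10_000, 500_000):
    inv_freq = rope_inv_freq(base)
    max_wavelength = 2 * np.pi / inv_freq.min()   # longest positional wavelength
    print(f"base={base:>7}: longest wavelength ~ {max_wavelength:,.0f} positions")
```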
How does the training strategy support its reasoning?
The model follows the multi-stage recipe defined for the ERNIE-4.5 family:
- Stage I – Text-only pre-training builds the core language backbone, starting at an 8K context and expanding to 128K.
- Stage II – Vision training is skipped for this text-only variant.
- Stage III – Joint multimodal training is not used here, since A3B-Thinking is purely textual.
Post-training focuses on reasoning tasks. The research team employs Supervised Fine-Tuning (SFT) across mathematics, logic, coding, and science, followed by Progressive Reinforcement Learning (PRL). The reinforcement stages begin with logic, then expand to mathematics and programming, and finally to broader reasoning tasks. This is complemented by Unified Preference Optimization (UPO), which integrates preference learning with PPO to stabilize alignment and reduce reward hacking.
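For intuition only, here is a hedged sketch of what an objective combining preference learning with PPO could look like: a PPO-style clipped surrogate regularized by a DPO-style preference term. All function names, the weighting, and the toy tensors are assumptions for illustration; the source does not specify Baidu's actual UPO formulation.

```python
# Toy sketch in the spirit of "preference learning + PPO"; not Baidu's implementation.
import torch
import torch.nn.functional as F

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate over sequence log-probabilities."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss on (chosen, rejected) response pairs vs. a frozen reference model."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

def combined_loss(ppo_inputs, pref_inputs, pref_weight=0.5):
    """Hypothetical combination: RL surrogate regularized by a preference term."""
    return ppo_clip_loss(**ppo_inputs) + pref_weight * preference_loss(**pref_inputs)

# Toy usage with random tensors standing in for a batch of 4 rollouts / pairs.
ppo_inputs = dict(logp_new=torch.randn(4), logp_old=torch.randn(4), advantages=torch.randn(4))
pref_inputs = dict(logp_chosen=torch.randn(4), logp_rejected=torch.randn(4),
                   ref_logp_chosen=torch.randn(4), ref_logp_rejected=torch.randn(4))
print(combined_loss(ppo_inputs, pref_inputs))
```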
What role does tool use play in this model?
ERNIE-4.5-21B-A3B-Thinking supports structured tool and function calling, which is useful in scenarios where external computation or retrieval is required. Developers can integrate it via vLLM, Transformers 4.54+, and FastDeploy. This tool-use capability is particularly valuable for program synthesis, symbolic reasoning, and multi-agent workflows.
Built-in function calling lets the model reason over long contexts while dynamically invoking external APIs, a key requirement for applied reasoning in enterprise systems.
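Below is a minimal, hedged sketch of structured function calling against an OpenAI-compatible endpoint, such as one served by vLLM. The model ID, server URL, and the `get_weather` tool are illustrative assumptions; consult the model card for the exact serving flags and chat template.

```python
# Hypothetical function-calling request against an OpenAI-compatible server
# (for example, one started with `vllm serve`); model id and tool are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                          # hypothetical external API
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="baidu/ERNIE-4.5-21B-A3B-Thinking",           # assumed Hugging Face repo id
    messages=[{"role": "user", "content": "Should I pack an umbrella for Beijing tomorrow?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, the structured arguments arrive here.
print(response.choices[0].message.tool_calls)
```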
How does ERNIE-4.5-21B-A3B-Thinking perform on reasoning benchmarks?
The model shows strong performance across logical reasoning, mathematics, scientific QA, and programming tasks. In evaluations, it demonstrates:
- Improved accuracy on multi-step reasoning datasets, where longer chains of thought are required.
- Competitive results with larger dense models on STEM reasoning tasks.
- Stable performance on text generation and academic synthesis, benefiting from the extended-context training.
These results suggest that the MoE structure amplifies reasoning capability efficiently, without requiring trillion-scale dense parameter counts.

How does it compare with other reasoning-focused LLMs?
This release joins a landscape that includes OpenAI's o3, Anthropic's Claude 4, DeepSeek-R1, and Qwen3. Many of these competitors rely on dense architectures or much larger active parameter counts. Baidu's choice of a compact MoE with 3B active parameters offers a different balance:
- Scalability: sparse expert activation keeps compute overhead low as the model scales.
- Long-context readiness: the 128K context is trained in directly, not retrofitted.
- Commercial openness: the Apache-2.0 license reduces adoption friction for enterprises.
Summary
ERNIE-4.5-21B-A3B-Thinking demonstrates how deep reasoning can be achieved without massive dense parameter counts. By combining efficient MoE routing, 128K context training, and tool integration, Baidu's research team offers a model that balances research-grade reasoning with deployment viability.
Check out the model on Hugging Face and the paper. Feel free to check out our GitHub page for tutorials, code, and notebooks. Also, follow us on Twitter, join our 100k+ ML SubReddit, and subscribe to our newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable to a broad audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.