SwiReasoning is a decoding-time framework that lets a reasoning LLM decide when to think in latent space and when to write out an explicit chain of thought, using block-wise confidence estimated from entropy trends in the next-token distribution. The method is training-free and model-agnostic, and targets a Pareto-superior accuracy/efficiency trade-off on mathematics and STEM benchmarks. Reported results show +1.5%–2.8% average accuracy improvements with unlimited token budgets and +56%–79% average token-efficiency gains under limited budgets; on AIME’24/’25, it reaches maximum reasoning accuracy earlier than standard CoT.
What does SwiReasoning change at inference time?
A controller monitors the decoder's next-token entropy to form a block-wise confidence signal. When confidence is low (entropy is trending up), it enters latent reasoning: the model continues to reason without emitting tokens. When confidence recovers (entropy is trending down), it returns to explicit reasoning, emitting CoT tokens to consolidate and commit to a single path. A switch-count control caps the maximum number of thinking-block switches to suppress overthinking before the answer is finalized. This dynamic alternation is the main mechanism behind the reported accuracy-per-token gains.
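To make the mechanism concrete, below is a minimal, hypothetical sketch of such an entropy-trend controller in Python. Only the two knobs named later in the article (alpha and max_switch_count) come from the source; the smoothing, the trend test, and all class and method names here are illustrative assumptions, not the paper's implementation.

```python
import math

# Hypothetical sketch of an entropy-trend switching controller.
# `alpha` and `max_switch_count` mirror the flags mentioned in the article;
# everything else is an illustrative assumption.
class SwitchController:
    def __init__(self, alpha: float = 0.3, max_switch_count: int = 4):
        self.alpha = alpha                  # smoothing factor for the entropy trend
        self.max_switch_count = max_switch_count
        self.switches = 0
        self.ema_entropy = None             # block-wise (smoothed) confidence signal
        self.mode = "explicit"              # start in explicit chain-of-thought

    @staticmethod
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0.0)

    def step(self, next_token_probs):
        """Update the confidence signal and decide the reasoning mode for this step."""
        h = self.entropy(next_token_probs)
        prev = self.ema_entropy
        self.ema_entropy = h if prev is None else self.alpha * h + (1 - self.alpha) * prev
        if prev is None or self.switches >= self.max_switch_count:
            return self.mode                # no trend yet, or switch budget exhausted
        rising = self.ema_entropy > prev    # entropy trending up => confidence falling
        want = "latent" if rising else "explicit"
        if want != self.mode:
            self.mode = want
            self.switches += 1
        return self.mode
```

In a decoding loop, the returned mode would tell the generator whether to emit the next CoT token or to take a silent latent step.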

Results: Accuracy and efficiency on standard suites
The paper reports improvements on math and STEM reasoning tasks:
- Pass@1 (unlimited budget): accuracy improvements of up to +2.8% (math) and +2.0% (STEM) in Figure 1 and Table 1, with a +2.17% average gain over the baselines (CoT with sampling, CoT greedy, and Soft Thinking).
- Token efficiency (limited budget): average improvements of up to +79% (Figure 2). A broader comparison shows SwiReasoning achieves the highest token efficiency in 13/15 evaluations, with a +84% average improvement over CoT in those settings (Figure 4).
- Pass@k dynamics: with Qwen3-8B on AIME 2024/2025, maximum reasoning accuracy is reached up to 50% earlier on average than with CoT (Figure 5), indicating faster convergence to the ceiling with fewer sampled trajectories.
Why does switching help?
Explicit CoT is discrete and readable, but it locks into a single path prematurely, which can discard useful alternatives. Latent reasoning is continuous and information-dense per step, but fully latent strategies can spread probability mass and hinder convergence. SwiReasoning adds a confidence-guided alternation: latent steps extend exploration when the model is uncertain, while explicit phases exploit rising confidence to solidify a solution and spend tokens only when it is profitable. The switch-count control regulates the process by limiting oscillations and capping prolonged "silent" wandering, addressing both accuracy loss from diffusion and token waste from overthinking, which are cited as challenges for training-free latent methods.
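To illustrate the two phases, the sketch below contrasts an explicit step with a latent step. The latent-step mechanics shown here (feeding a probability-weighted mixture of token embeddings back into the model instead of a sampled token) follow the Soft Thinking line of work that the article uses as a baseline; treat this as an assumption about how a latent step can be realized, not as the paper's exact implementation.

```python
import torch

def explicit_step(probs: torch.Tensor, embedding: torch.nn.Embedding):
    # Commit to a single path: pick a token, emit it to the chain-of-thought,
    # and feed its embedding back for the next decoding step.
    token_id = torch.argmax(probs)
    return token_id, embedding(token_id)

def latent_step(probs: torch.Tensor, embedding: torch.nn.Embedding):
    # No token is emitted; the full next-token distribution is carried forward
    # as a "soft" embedding, keeping several candidate continuations alive.
    soft_embed = probs @ embedding.weight   # (vocab,) @ (vocab, d) -> (d,)
    return None, soft_embed
```

The explicit step spends a visible token to lock in progress; the latent step spends none, trading readability for continued exploration.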
Positioning vs. baselines
The project is compared against CoT with sampling, CoT greedy, and Soft Thinking, reporting a +2.17% average accuracy gain under unlimited budgets (Table 1) and consistent accuracy-per-token gains under budget constraints. Conceptually, the Pareto frontier shifts outward: higher accuracy at the same budget, or the same accuracy with fewer tokens, across several model families and scales. On AIME’24/’25, the pass@k curves show SwiReasoning reaching its performance ceiling with fewer samples than CoT, reflecting improved convergence behavior rather than merely a higher raw ceiling.


Key takeaways
- Training-free controller: SwiReasoning switches between latent reasoning and explicit chain-of-thought using block-wise confidence derived from next-token entropy trends.
- Efficiency gains: reports +56%–79% average token-efficiency improvements under limited budgets vs. CoT, with larger benefits when budgets are tighter.
- Accuracy lift: achieves +1.5%–2.8% average Pass@1 improvements on math/STEM benchmarks with unlimited budgets.
- Faster convergence: on AIME 2024/2025, reaches maximum reasoning accuracy earlier than CoT (better pass@k dynamics).
SwiReasoning is a useful step toward practical “reasoning policy” control at decode time: it is training-free, sits behind the tokenizer, and shows measurable gains on math/STEM suites by toggling between latent and explicit CoT using an entropy-trend confidence signal with a capped switch count. The open-source BSD implementation and explicit flags (--max_switch_count, --alpha) simplify replication and lower the barrier to stacking it with orthogonal efficiency layers (e.g., quantization, speculative decoding, KV-cache tricks). The method’s value proposition is accuracy per token rather than raw SOTA accuracy, which is operationally important for budget planning and batching.
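As a rough illustration of the accuracy-per-token framing, the toy sketch below defines a simple metric of this kind (not necessarily the paper's exact definition of token efficiency) and applies it to placeholder numbers used purely for demonstration.

```python
def accuracy_per_kilotoken(correct: int, total: int, tokens_generated: int) -> float:
    """Accuracy normalized by generation cost: correct-answer rate per 1k tokens."""
    accuracy = correct / total
    return accuracy / (tokens_generated / 1000.0)

# Placeholder numbers for two hypothetical runs on the same problem set;
# a switching policy that reaches similar accuracy with fewer tokens scores higher.
baseline = accuracy_per_kilotoken(correct=18, total=30, tokens_generated=240_000)
switched = accuracy_per_kilotoken(correct=19, total=30, tokens_generated=150_000)
print(f"baseline: {baseline:.4f}  switching: {switched:.4f}")
```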
Check out the paper, the project page, and the GitHub page for tutorials, code, and notebooks.