Alibaba’s Qwen team has released FP8-quantized checkpoints for its new Qwen3-Next-80B-A3B model in two post-training variants, Instruct and Thinking, targeting high-throughput inference with ultra-long context and MoE efficiency. The FP8 repos mirror the BF16 releases but package “fine-grained FP8” weights (block size 128) and note that SGLang and vLLM currently require main/nightly builds. The model cards carry over benchmarks from the original BF16 models; FP8 is provided “for convenience and performance,” not as a separately evaluated run.
What’s in the A3B stack?
Qwen3-Next-80B-A3B is a hybrid architecture: Gated DeltaNet (a linear/conv-style attention surrogate) interleaved with Gated Attention, combined with an ultra-sparse Mixture-of-Experts (MoE). Of the 80B total parameters, only ~3B are activated per token via routing over 512 experts (10 routed + 1 shared). The layout is specified as 48 layers organized into 12 blocks: 3×(Gated DeltaNet → MoE) followed by 1×(Gated Attention → MoE). Native context is 262,144 tokens, validated up to ~1,010,000 tokens using RoPE scaling (YaRN). Hidden size is 2048; attention uses 16 Q heads and 2 KV heads at head dim 256; DeltaNet uses 32 V and 16 QK linear heads at head dim 128.
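As a rough illustration of that schedule (the layer labels below are hypothetical, not Qwen’s source code), the 48-layer pattern can be generated as a simple repeating sequence:

```python
# Illustrative sketch of the published Qwen3-Next-80B-A3B layer schedule:
# 12 blocks, each 3x(Gated DeltaNet -> MoE) then 1x(Gated Attention -> MoE).
# Labels are made up for this sketch; Qwen's actual module names may differ.

def a3b_layer_schedule(num_blocks: int = 12) -> list[str]:
    schedule = []
    for _ in range(num_blocks):
        schedule += ["gated_deltanet+moe"] * 3   # linear-attention layers
        schedule += ["gated_attention+moe"] * 1  # full-attention layer
    return schedule

layers = a3b_layer_schedule()
assert len(layers) == 48
print(layers.count("gated_deltanet+moe"), "DeltaNet layers,",
      layers.count("gated_attention+moe"), "attention layers")
# -> 36 DeltaNet layers, 12 attention layers
```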
The Qwen team reports that the 80B-A3B base model beats Qwen3-32B on downstream tasks at ~10% of its training cost and delivers ~10× inference throughput beyond 32K context, driven by the low-activation MoE and multi-token prediction (MTP). The Instruct variant is non-reasoning (no <think> tags), while the Thinking variant enforces reasoning traces by default and is aimed at complex problems.
FP8 release: what exactly changed
The FP8 model cards state the quantization is “fine-grained FP8” with block size 128. Deployment differs slightly from BF16: both SGLang and vLLM require current main/nightly builds, and the cards provide example commands for 256K context and optional MTP. The Thinking FP8 card additionally recommends a reasoning parser flag (e.g., --reasoning-parser deepseek-r1 in SGLang, deepseek_r1 in vLLM). The releases retain Apache-2.0 licensing.
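To make “fine-grained FP8 with block size 128” concrete, here is a minimal PyTorch sketch of block-wise weight quantization: one scale per 128 consecutive weights, cast to the standard E4M3 format (max finite value ~448). This illustrates the general technique under those assumptions; it is not Qwen’s actual quantization code, and the exact block geometry in the checkpoints may differ.

```python
import torch

def quantize_fp8_blockwise(w: torch.Tensor, block: int = 128):
    """Fine-grained FP8 sketch: one scale per `block` consecutive weights,
    cast to float8_e4m3fn. Illustrative only, not Qwen's implementation."""
    out_features, in_features = w.shape
    assert in_features % block == 0
    blocks = w.reshape(out_features, in_features // block, block)
    # Per-block scale chosen so each block's max maps to the E4M3 max (~448).
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = amax / 448.0
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q, scale.squeeze(-1)

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Upcast and rescale, then flatten the block dimension back out.
    return (q.to(torch.float32) * scale.unsqueeze(-1)).reshape(q.shape[0], -1)

w = torch.randn(4, 256)
q, s = quantize_fp8_blockwise(w)
print((dequantize(q, s) - w).abs().max())  # small per-block quantization error
```

The per-block scales are what make the scheme “fine-grained”: a single outlier weight only degrades precision within its own 128-element block rather than across the whole tensor.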
Benchmarks (reported on BF16 weights)
The Instruct FP8 card repeats the BF16 comparison table, placing Qwen3-Next-80B-A3B-Instruct roughly on par with Qwen3-235B-A22B-Instruct-2507 on several knowledge/reasoning/coding benchmarks and clearly ahead on long-context workloads (up to 256K). The Thinking FP8 card cites AIME’25, HMMT’25, MMLU-Pro/Redux, and LiveCodeBench v6, where Qwen3-Next-80B-A3B-Thinking surpasses earlier Qwen3 Thinking releases (30B A3B-2507, 32B) and claims wins over Gemini-2.5-Flash-Thinking.

Training and post-training signals
The series is trained on ~15T tokens before post-training. Qwen highlights stability additions (zero-centered, weight-decayed layer norms, etc.) and the use of GSPO in RL post-training for the Thinking model to handle the hybrid attention + high-sparsity MoE combination. MTP is used to speed inference and improve the pretraining signal.
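For intuition on the MTP training signal, here is a generic sketch: alongside the usual next-token loss, an auxiliary head predicts the token after next, densifying supervision during pretraining and providing a draft for speculative decoding. The single extra head, its placement on trunk states, and the loss weight below are assumptions for illustration, not Qwen’s actual head design.

```python
import torch
import torch.nn.functional as F

# Generic multi-token-prediction sketch: trunk hidden states feed both the
# usual next-token head and an extra head predicting token t+2.
vocab, hidden, seq = 1000, 64, 16
h = torch.randn(2, seq, hidden)             # trunk hidden states (batch=2)
tokens = torch.randint(0, vocab, (2, seq))  # target token ids
head_next = torch.nn.Linear(hidden, vocab)  # predicts token t+1
head_mtp = torch.nn.Linear(hidden, vocab)   # extra head: predicts token t+2

loss_next = F.cross_entropy(
    head_next(h[:, :-1]).flatten(0, 1), tokens[:, 1:].flatten())
loss_mtp = F.cross_entropy(
    head_mtp(h[:, :-2]).flatten(0, 1), tokens[:, 2:].flatten())
loss = loss_next + 0.3 * loss_mtp  # 0.3 is an arbitrary example weight
```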
Why FP8 matters
On modern accelerators, FP8 activations/weights reduce memory-bandwidth pressure and resident footprint versus BF16, enabling longer sequences or larger batches at similar latency. Because A3B routes only ~3B parameters per token, FP8 compounds with MoE sparsity to extend the benefits at long context, especially when combined with speculative decoding via MTP as exposed in the serving flags. That said, quantization interacts with routing and attention variants; real-world acceptance rates for speculative decoding and end-task accuracy can vary with engine and kernel implementations, hence Qwen’s guidance to use current SGLang/vLLM builds and tune the speculative settings.
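A back-of-the-envelope calculation using the card’s parameter counts shows why the savings compound. This counts weight memory only, ignoring FP8 scale tensors, KV cache, and activations, so treat it as an order-of-magnitude sketch:

```python
# Rough weight-memory arithmetic for Qwen3-Next-80B-A3B.
TOTAL_PARAMS = 80e9   # total parameters (resident in memory)
ACTIVE_PARAMS = 3e9   # ~3B routed/shared params touched per token

for name, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    resident_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    per_token_gb = ACTIVE_PARAMS * bytes_per_param / 1e9
    print(f"{name}: ~{resident_gb:.0f} GB resident weights, "
          f"~{per_token_gb:.0f} GB weight traffic per token")
# BF16: ~160 GB resident weights, ~6 GB weight traffic per token
# FP8:  ~80 GB resident weights,  ~3 GB weight traffic per token
```

FP8 halves the resident footprint, and MoE sparsity means the per-token bandwidth bill is already ~27× smaller than a dense 80B model’s; the two effects multiply.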
Summary
Qwen’s FP8 releases make the 80B/3B-active A3B stack practical to serve at 256K context on mainstream engines, preserving the hybrid MoE design and the MTP path for high throughput. The model cards keep BF16 benchmarks, so teams should validate FP8 accuracy and latency on their own stacks, especially with reasoning parsers and speculative settings. Net result: lower memory and better concurrency without architectural regressions, positioned for long-context production workloads.
Check out the Qwen3-Next-80B-A3B model in its two post-training variants, Instruct and Thinking. Feel free to check out our GitHub page for tutorials, code, and notebooks. Also, follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and subscribe to our newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.