Black Forest Labs releases FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 [klein] expands the FLUX.2 line with sub-second generation and editing, a unified architecture for text-to-image and image-to-image, and deployment options ranging from local GPUs to cloud APIs, while maintaining state-of-the-art image quality.
From FLUX.2 [dev] to Interactive Visual Intelligence
FLUX.2 [dev] is a 32-billion-parameter rectified flow transformer for text-conditioned image generation and editing. It supports composition with multiple reference images and runs primarily on data-center-class accelerators. It is designed for maximum quality and flexibility, with long sampling schedules and high VRAM requirements.
FLUX.2 [klein] takes the same design direction and compresses it into smaller rectified flow transformers with 4 billion and 9 billion parameters. These models are distilled to very short sampling schedules, support the same text-to-image and multi-context editing tasks, and are optimized for sub-second response times on modern GPUs.
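The distilled and undistilled models differ mainly in how many integration steps they take along the same rectified-flow ODE. A minimal numpy sketch of Euler integration makes the contrast concrete; the toy velocity function below merely stands in for the actual flow transformer, and the step counts (4 vs. 50) follow the model cards:

```python
import numpy as np

def sample_rectified_flow(velocity_fn, shape, num_steps, seed=0):
    """Euler-integrate a rectified-flow ODE from pure noise (t=1) toward data (t=0).

    `velocity_fn(x, t)` stands in for the flow transformer; here it is a toy.
    Distilled klein models run 4 steps; the base models run a ~50-step schedule.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)            # start from Gaussian noise at t=1
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_cur)             # predicted velocity toward the data
        x = x + (t_next - t_cur) * v          # Euler step (t decreases, so dt < 0)
    return x

# Toy velocity field: dx/dt = x shrinks samples toward zero as t -> 0.
toy_velocity = lambda x, t: x

fast = sample_rectified_flow(toy_velocity, (4, 4), num_steps=4)    # distilled-style schedule
slow = sample_rectified_flow(toy_velocity, (4, 4), num_steps=50)   # base-style schedule
```

Step distillation trains the small model so that a handful of coarse Euler steps lands close to where the long schedule would, which is what makes the sub-second latency targets reachable.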
Model Family and Capabilities
The FLUX.2 [klein] family consists of four main open-weight variants that share the same architecture:
- FLUX.2 [klein] 4B
- FLUX.2 [klein] 9B
- FLUX.2 [klein] 4B base
- FLUX.2 [klein] 9B base
FLUX.2 [klein] 4B and 9B are step-distilled and guidance-distilled models. They use 4 sampling steps and are positioned as the fastest option for production and interactive workloads. FLUX.2 [klein] 9B combines a 9B flow model with an 8B Qwen3 text embedder and is described as the dominant small model on the quality-versus-latency Pareto frontier for text-to-image, single-context editing, and multi-context generation.
The base variants are undistilled versions with longer sampling schedules. The documentation describes them as base models that preserve the full training signal and provide high output diversity. They are intended for fine-tuning, LoRA training, research pipelines, and custom post-training workflows where control matters more than minimum latency.
All FLUX.2 [klein] models support three main functions in a single architecture: they can generate images from text, they can edit a single input image, and they can perform multi-context generation and editing, where multiple input images and a prompt jointly define the target output.
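One way to picture the unified design is that a single checkpoint dispatches on how many reference images accompany the prompt. The function name and signature below are hypothetical, not the actual FLUX.2 [klein] API, but they mirror how one prompt plus a variable number of images selects among the three tasks:

```python
from typing import List, Optional

def klein_task(prompt: str, images: Optional[List[object]] = None) -> str:
    """Illustrative dispatch over the three capabilities one klein checkpoint
    covers. Hypothetical interface -- not the published FLUX.2 [klein] API."""
    images = images or []
    if not images:
        return "text-to-image"          # the prompt alone defines the output
    if len(images) == 1:
        return "single-image editing"   # one reference image is edited
    return "multi-context generation"   # several references + prompt jointly condition the output

klein_task("a red fox")                          # -> "text-to-image"
klein_task("make it snowy", images=["photo"])    # -> "single-image editing"
```

The practical benefit is that all three modes share one set of weights and one serving path, rather than requiring separate generation and editing models.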
Latency, VRAM, and Quantized Variants
The FLUX.2 [klein] model page provides estimated end-to-end generation times on the GB200 and RTX 5090. FLUX.2 [klein] 4B is the fastest variant and is listed at about 0.3 to 1.2 seconds per image, depending on the hardware. FLUX.2 [klein] 9B targets about 0.5 to 2 seconds at higher quality. The base models require several seconds because they run a 50-step sampling schedule, but they expose more flexibility for custom pipelines.
The FLUX.2 [klein] 4B model card states that the model fits in approximately 13 GB of VRAM and is suitable for GPUs such as the RTX 3090 and RTX 4070. The FLUX.2 [klein] 9B card reports a requirement of around 29 GB of VRAM and targets hardware such as the RTX 4090. This means a single high-end consumer card can host the distilled variants with full-resolution sampling.
To extend access to more devices, Black Forest Labs also releases FP8 and NVFP4 versions of all FLUX.2 [klein] variants, developed in collaboration with NVIDIA. FP8 quantization is reported to be 1.6 times faster with 40 percent lower VRAM usage, and NVFP4 is reported to be 2.7 times faster with 55 percent lower VRAM usage on RTX GPUs, while keeping the core capabilities the same.
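The VRAM figures track the bits spent per weight. A back-of-envelope sketch below counts flow-model weights only; activations, the text embedder, and framework overhead are what push real usage toward the ~13 GB and ~29 GB figures on the model cards, and the published 40/55 percent reductions are end-to-end numbers rather than pure weight arithmetic:

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory in GB: params * bits / 8 bits-per-byte."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Illustrative weights-only footprints at common precisions.
for name, params in [("klein 4B", 4.0), ("klein 9B", 9.0)]:
    bf16 = weight_gb(params, 16)    # baseline 16-bit weights
    fp8 = weight_gb(params, 8)      # FP8 halves weight memory
    nvfp4 = weight_gb(params, 4)    # NVFP4 quarters it
    print(f"{name}: bf16 {bf16:.1f} GB, FP8 {fp8:.1f} GB, NVFP4 {nvfp4:.1f} GB")
```

For example, the 9B flow model drops from 18 GB of weights at bf16 to 9 GB at FP8 and 4.5 GB at NVFP4, which is why the quantized variants reach mid-range RTX cards.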
Benchmarks against other image models
Black Forest Labs evaluates FLUX.2 [klein] with Elo-style comparisons across text-to-image, single-context editing, and multi-context tasks. The performance charts place FLUX.2 [klein] on the Pareto frontiers of Elo score versus latency and Elo score versus VRAM. The commentary states that FLUX.2 [klein] matches or exceeds the quality of Qwen-based image models at a fraction of the latency and VRAM, and that it outperforms Z-Image while supporting integrated text-to-image and multi-context editing in a single architecture.
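A Pareto frontier over quality versus latency simply keeps the models no other model beats on both axes at once. The sketch below computes one; the (latency, score) tuples are illustrative placeholders, not published Elo data:

```python
def pareto_frontier(points):
    """Keep points not dominated on (latency, score): a point is dropped if
    some other point is at least as fast AND scores at least as high."""
    frontier = []
    for lat, score in points:
        dominated = any(l <= lat and s >= score and (l, s) != (lat, score)
                        for l, s in points)
        if not dominated:
            frontier.append((lat, score))
    return sorted(frontier)

# Hypothetical (latency_seconds, quality_score) tuples for small image models.
models = [(0.5, 1050), (1.2, 1060), (2.0, 1040), (0.8, 1000)]
pareto_frontier(models)  # -> [(0.5, 1050), (1.2, 1060)]
```

Being "on the frontier" is therefore a stronger claim than a single benchmark win: no competitor offers both lower latency and equal-or-better quality at the same time.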

Base variants trade some speed for full customization and fine tuning, which aligns with their role as foundation checkpoints for new research and domain specific pipelines.
Key Takeaways
- FLUX.2 [klein] is a compact rectified flow transformer family with 4B and 9B variants that supports text-to-image, single-image editing, and multi-reference generation in a unified architecture.
- The distilled FLUX.2 [klein] 4B and 9B models use 4 sampling steps and are optimized for sub-second inference on modern GPUs, while the undistilled base models use longer schedules and are intended for fine-tuning and research.
- Quantized FP8 and NVFP4 variants, built with NVIDIA, deliver up to 1.6x speedup with approximately 40 percent VRAM reduction for FP8 and up to 2.7x speedup with approximately 55 percent VRAM reduction for NVFP4 on RTX GPUs.
Check out the Technical Details, Repo, and Model Weights. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter. Wait! Are you on Telegram? Now you can also connect with us on Telegram.

Michael Sutter is a data science professional and holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michael excels in transforming complex datasets into actionable insights.