MiniMax M3 launches with 1M-token context for long-video AI reasoning
NVIDIA announced the availability of MiniMax M3, a 428B-parameter multimodal AI model, on its accelerated infrastructure. The model supports long-context reasoning and agentic workflows, enabling enterprise applications such as long-video understanding and extended coding sessions. Developers can deploy MiniMax M3 using NVIDIA TensorRT-LLM, SGLang, or vLLM, and customize it with the NVIDIA NeMo Framework.
Key Takeaways
- MiniMax Sparse Attention (MSA) replaces quadratic attention with a pre-filtering stage, delivering 15x faster decoding than previous implementations.
- The 428B-parameter Mixture-of-Experts (MoE) model natively processes text, images, and video, supporting up to 1M tokens of context.
- Deployment is supported on NVIDIA Blackwell through TensorRT-LLM, SGLang, and vLLM, with an MXFP8-quantized version available to reduce memory overhead.
- Integrated 12-hour agentic tests showed the model can autonomously execute experimental workflows, producing 18 code commits and 23 charts without human intervention.
- MiniMax M3 uses mixed-modality training from 'step 0' on 100 trillion interleaved tokens rather than aligning modalities post-training.
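The pre-filtering idea behind sparse attention, mentioned in the first takeaway, can be illustrated with a small sketch. This is not MiniMax's MSA implementation, whose details are not given here. It is a generic two-stage scheme under stated assumptions: keys are grouped into fixed-size blocks, each block is scored by its mean key, and exact softmax attention runs only over the top-scoring blocks. The names prefiltered_attend, block_size, and top_blocks are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Exact softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[i] for e, v in zip(exps, values)) for i in range(dim)]

def prefiltered_attend(query, keys, values, block_size, top_blocks):
    """Hypothetical two-stage attention: rank blocks cheaply, then attend inside the best ones."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Stage 1 (pre-filter): one score per block, using the mean key as a summary.
    # This sketch recomputes the means per call; a real system would cache them.
    def block_score(block):
        mean_key = [sum(keys[j][i] for j in block) / len(block) for i in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    kept = sorted(ranked[:top_blocks])
    idx = [j for b in kept for j in blocks[b]]

    # Stage 2: exact attention restricted to the surviving positions.
    out = attend(query, [keys[j] for j in idx], [values[j] for j in idx])
    return out, idx

# Toy data: the second block of keys aligns with the query, the first does not.
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
values = [[float(j)] for j in range(8)]
query = [0.0, 1.0]

out, idx = prefiltered_attend(query, keys, values, block_size=4, top_blocks=1)
print(idx)  # [4, 5, 6, 7]
print(out)  # [5.5]
```

With top_blocks fixed, the exact attention step touches at most block_size * top_blocks positions regardless of context length, which is the general reason pre-filtering reduces per-token decoding cost. Setting top_blocks to cover every block reproduces dense attention.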
Why It Matters
The release of M3 marks a shift toward unified multimodal architectures that treat video as a native input rather than an appended modality. By achieving 1/20th the per-token compute of previous generations at 1M-token context, MiniMax and NVIDIA are lowering the hardware barriers for long-form video analysis and autonomous agent teams. This moves the industry closer to 'always-on' AI agents capable of managing complex, multi-day media workflows. Watch for adoption rates of the 'Thinking' versus 'Non-thinking' API modes as developers balance reasoning depth against real-time latency requirements.
Additional Context
Following its June 2026 release, MiniMax M3 enters a highly competitive landscape in which 1M-token context has become the baseline for enterprise-grade multimodal models. Per Overchat AI (June 2026), M3 competes directly with Alibaba's Qwen 3.7 Max, though MiniMax maintains a distinct edge in natural prose and native image/video processing.
The launch follows a period of significant momentum that began in late 2025, when the Shanghai-based company reached a $4 billion valuation. MiniMax then debuted on the Hong Kong Stock Exchange in January 2026, raising approximately $618 million (HK$4.82 billion), per reporting from ElectroIQ (March 2026). Financial data from PitchBook and Substack (January 2026) indicates that MiniMax grew its revenue sevenfold between 2024 and 2025, to roughly $79 million. The company nonetheless continues to invest heavily in R&D, with spending estimated to reach 160% of sales in 2026. This aggressive reinvestment is aimed at challenging established Western flagships such as Claude Opus 4.7 and GPT-5.5.
Technically, M3's performance is tightly coupled with NVIDIA's Blackwell (B200) transition. Per 6g-ai (April 2026), Blackwell systems offer up to 15x higher inference performance for trillion-parameter models than the previous Hopper generation. By optimizing the MSA architecture specifically for Blackwell's Transformer Engine and FP4/FP8 precision formats, MiniMax is targeting high-load scenarios, such as full-repository code understanding and multi-hour video surveillance analysis, that were previously cost-prohibitive on older DGX H100 clusters.
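To give a rough sense of why reduced-precision formats matter for a model of this size, the following back-of-the-envelope sketch estimates weight storage for 428B parameters. It rests on assumptions not stated in the article: a BF16 baseline (16 bits per weight), every weight being quantized, and MXFP8 as defined in the OCP Microscaling specification (8-bit elements plus one 8-bit shared scale per block of 32). KV cache and activation memory are ignored, so real deployments need more than this.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 428e9  # parameter count quoted in the article

BF16_BITS = 16          # assumed baseline precision, not stated in the article
MXFP8_BITS = 8 + 8 / 32  # 8-bit element + amortized 8-bit scale per 32-element block

bf16_gb = weight_memory_gb(PARAMS, BF16_BITS)
mxfp8_gb = weight_memory_gb(PARAMS, MXFP8_BITS)

print(f"BF16 weights:  {bf16_gb:.1f} GB")   # BF16 weights:  856.0 GB
print(f"MXFP8 weights: {mxfp8_gb:.1f} GB")  # MXFP8 weights: 441.4 GB
print(f"Reduction:     {1 - mxfp8_gb / bf16_gb:.1%}")
```

Under these assumptions the quantized weights take a little over half the baseline footprint, with the shared block scales adding roughly 3% on top of a plain 8-bit encoding.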
Read full article at developer.nvidia.com
