SoTA Feed — Every open-weights release from the labs that matter

Ad: Read SoTA Feed without this slot — ad-free site plus a personal ad-free feed URL $3/month

MiniMax-M2.1

Dec 20, 2025 · MiniMax · license: other · view on Hugging Face ↗
230 GB · MoE: 229B total, ≈11B (≈11.1 GB) active


Join Our 💬 WeChat | 🧩 Discord community.
MiniMax Agent | ⚡️ API | MCP | MiniMax Website
🤗 Hugging Face | 🐙 GitHub | 🤖️ ModelScope | 📄 License: Modified-MIT

Meet MiniMax-M2.1

Today, we are handing MiniMax-M2.1 over to the open-source community. This release is more than just a parameter update; it is a significant step toward democratizing top-tier agentic capabilities.

M2.1 was built to shatter the stereotype that high-performance agents must remain behind closed doors. We have optimized the model specifically for robustness in coding, tool use, instruction following, and long-horizon planning. From automating multilingual software development to executing complex, multi-step office workflows, MiniMax-M2.1 empowers developers to build the next generation of autonomous applications—all while being fully transparent, controllable, and accessible.

We believe true intelligence should be within reach. M2.1 is our commitment to the future, and a powerful new tool in your hands.

How to Use

Benchmarks

MiniMax-M2.1 delivers a significant leap over M2 on core software engineering leaderboards. It shines particularly bright in multilingual scenarios, where it outperforms Claude Sonnet 4.5 and closely approaches Claude Opus 4.5.

BenchmarkMiniMax-M2.1MiniMax-M2Claude Sonnet 4.5Claude Opus 4.5Gemini 3 ProGPT-5.2 (thinking)DeepSeek V3.2
SWE-bench Verified74.069.477.280.978.080.073.1
Multi-SWE-bench49.436.244.350.042.7x37.4
SWE-bench Multilingual72.556.56877.565.072.070.2
Terminal-bench 2.047.930.050.057.854.254.046.4

We also evaluated MiniMax-M2.1 on SWE-bench Verified across a variety of coding agent frameworks. The results highlight the model's exceptional framework generalization and robust stability.

Furthermore, across specific benchmarks—including test case generation, code performance optimization, code review, and instruction following—MiniMax-M2.1 demonstrates comprehensive improvements over M2. In these specialized domains, it consistently matches or exceeds the performance of Claude Sonnet 4.5.

BenchmarkMiniMax-M2.1MiniMax-M2Claude Sonnet 4.5Claude Opus 4.5Gemini 3 ProGPT-5.2 (thinking)DeepSeek V3.2
SWE-bench Verified (Droid)71.368.172.375.2xx67.0
SWE-bench Verified (mini-swe-agent)67.061.070.674.471.874.260.0
SWT-bench69.332.869.580.279.780.762.0
SWE-Perf3.11.43.04.76.53.60.9
SWE-Review8.93.410.516.2xx6.4
OctoCodingbench26.113.322.836.222.9x26.0

To evaluate the model's full-stack capability to architect complete, functional applications "from zero to one," we established a novel benchmark: VIBE (Visual & Interactive Benchmark for Execution in Application Development). This suite encompasses five core subsets: Web, Simulation, Android, iOS, and Backend. Distinguishing itself from traditional benchmarks, VIBE leverages an innovative Agent-as-a-Verifier (AaaV) paradigm to automatically assess the interactive logic and visual aesthetics of generated applications within a real runtime environment.

MiniMax-M2.1 delivers outstanding performance on the VIBE aggregate benchmark, achieving an average score of 88.6—demonstrating robust full-stack development capabilities. It excels particularly in the VIBE-Web (91.5) and VIBE-Android (89.7) subsets.

BenchmarkMiniMax-M2.1MiniMax-M2Claude Sonnet 4.5Claude Opus 4.5Gemini 3 Pro
VIBE (Average)88.667.585.290.782.4
VIBE-Web91.580.487.389.189.5
VIBE-Simulation87.177.079.184.089.2
VIBE-Android89.769.287.592.278.7
VIBE-iOS88.039.581.290.075.8
VIBE-Backend86.767.890.898.078.7

MiniMax-M2.1 also demonstrates steady improvements over M2 in both long-horizon tool use and comprehensive intelligence metrics.

BenchmarkMiniMax-M2.1MiniMax-M2Claude Sonnet 4.5Claude Opus 4.5Gemini 3 ProGPT-5.2 (thinking)DeepSeek V3.2
Toolathlon43.516.738.943.536.441.735.2
BrowseComp47.444.019.637.037.865.851.4
BrowseComp (context management)62.056.926.157.859.270.067.6
AIME2583.078.088.091.096.098.092.0
MMLU-Pro88.082.088.090.090.087.086.0
GPQA-D83.078.083.087.091.090.084.0
HLE w/o tools22.212.517.328.437.231.422.2
LCB81.083.071.087.092.089.086.0
SciCode41.036.045.050.056.052.039.0
IFBench70.072.057.058.070.075.061.0
AA-LCR62.061.066.074.071.073.065.0
𝜏²-Bench Telecom87.087.078.090.087.085.091.0

Evaluation Methodology Notes:

Local Deployment Guide

Download the model from HuggingFace repository: https://huggingface.co/MiniMaxAI/MiniMax-M2.1

We recommend using the following inference frameworks (listed alphabetically) to serve the model:

SGLang

We recommend using SGLang to serve MiniMax-M2.1. Please refer to our SGLang Deployment Guide.

vLLM

We recommend using vLLM to serve MiniMax-M2.1. Please refer to our vLLM Deployment Guide.

Transformers

We recommend using Transformers to serve MiniMax-M2.1. Please refer to our Transformers Deployment Guide.

KTransformers

We recommend using KTransformers to serve MiniMax-M2.1. Please refer to KTransformers Deployment Guide

Other Inference Engines

Inference Parameters

We recommend using the following parameters for best performance: temperature=1.0, top_p = 0.95, top_k = 40. Default system prompt:

You are a helpful assistant. Your name is MiniMax-M2.1 and is built by MiniMax.

Tool Calling Guide

Please refer to our Tool Calling Guide.

Contact Us

Contact us at model@minimax.io.

← all releases