MiniMax-M2.5
Core Overview
MiniMax-M2.5 is MiniMax's flagship multimodal general-purpose large model, developed in-house and designed for high-throughput, low-latency production environments.
It delivers industry-leading coding and Agent performance and natively understands, generates, and integrates multiple modalities, including text, audio, images, video, and music. M2.5 aims to provide top-tier performance at very low cost, and it excels in complex task processing and professional office scenarios.
Key Features
- Industry-Leading Coding & Agent Capabilities: Achieves best-in-industry results on Multi-SWE-Bench, with more mature decision-making and more efficient token usage.
- Efficient Multimodal Processing: Natively supports the integration of text, audio, images, video, and music, providing a truly rich multimodal interactive experience.
- Ultra-Long Context Processing: Features a substantial context window of 197K tokens, optimized through reinforcement learning for precise task decomposition.
- High Throughput, Low Latency: Optimized for production, with 100 TPS and 50 TPS (tokens per second) versions. Pricing is 1/10 to 1/20 that of comparable models.
- Enhanced Office Scenarios: Significant capability improvements on professional office tasks, such as Word documents, PPT presentations, and financial modeling in Excel.
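To make the 100 TPS and 50 TPS figures concrete, the sketch below estimates how long a response of a given length would take to generate at each rate. It assumes TPS means output tokens per second and ignores network latency and time to first token; the function name is hypothetical and not part of any MiniMax SDK.

```python
def estimated_generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time, assuming a steady output rate (an idealization)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Compare the two advertised throughput tiers for a 2,000-token answer.
    for tps in (100, 50):
        seconds = estimated_generation_seconds(2_000, tps)
        print(f"{tps} TPS: about {seconds:.0f} s for 2,000 output tokens")
```

Under these assumptions, halving the throughput tier doubles the generation time, which matters most for long outputs and multi-step Agent runs.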
Best Use Cases
- Enterprise-Level Automated Workflows: Ideal for automation requiring fast multimodal processing and complex Agent decision-making.
- Software Development & Code Assistance: Industry-leading code generation and debugging, especially in large, complex codebases.
- Multimodal Content Creation: Innovative cross-modal generation (e.g., text-to-video or image-to-music integration).
- Advanced Office Document Processing: High-efficiency information extraction and modeling for Word, Excel, and PPT.
Capabilities and Limitations
| Capability | Detailed Description |
|---|---|
| Reasoning Ability | Extremely Strong. High decision-making maturity for multi-step Agent tasks. |
| Creative Ability | Extremely Strong. Proficient in multimodal creation and office document automation. |
| Multimodal Ability | Native Multimodal. Supports text, audio, images, video, and music. |
| Response Speed | Extremely Fast. Offers 100 TPS and 50 TPS versions for high-throughput needs. |
| Context Window | 197,000 Tokens |
| Max Output | 131,000 Tokens |
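The context window and max output limits in the table can be turned into a simple budget check before sending a request. The sketch below is illustrative only: it assumes that prompt and output tokens share the 197,000-token window, which this page does not state explicitly, and the function name is hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 197_000  # from the table above
MAX_OUTPUT_TOKENS = 131_000      # from the table above


def max_completion_budget(prompt_tokens: int) -> int:
    """Largest output length that fits, assuming input and output share the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    # The output is capped by both the remaining window and the max output limit.
    return max(0, min(remaining, MAX_OUTPUT_TOKENS))


if __name__ == "__main__":
    for prompt in (50_000, 100_000, 200_000):
        print(prompt, "->", max_completion_budget(prompt))
```

If the platform counts the limits differently, only the `remaining` calculation would need to change.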
Credits Usage
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) | Web Search (Credits/Use) | Billing Notes |
|---|---|---|---|---|---|---|
| MiniMax-M2.5 | 0.30 | 0.375 | 0.03 | 1.20 | - | - |
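As a worked example of the rates above, the following sketch estimates the credits for a single request. It takes the rates exactly as listed (credits per token) and assumes each token is billed under exactly one of the four categories; how the platform splits cached and uncached input is not described here, so treat that split as an assumption. The function and dictionary names are hypothetical.

```python
from decimal import Decimal

# Rates copied from the table above, in credits per token as listed.
RATES = {
    "input": Decimal("0.30"),
    "cache_write": Decimal("0.375"),
    "cache_read": Decimal("0.03"),
    "output": Decimal("1.20"),
}


def estimate_credits(
    input_tokens: int = 0,
    cache_write_tokens: int = 0,
    cache_read_tokens: int = 0,
    output_tokens: int = 0,
) -> Decimal:
    """Sum of rate * token count over the four billing categories."""
    usage = {
        "input": input_tokens,
        "cache_write": cache_write_tokens,
        "cache_read": cache_read_tokens,
        "output": output_tokens,
    }
    if any(count < 0 for count in usage.values()):
        raise ValueError("token counts must be non-negative")
    return sum((RATES[name] * count for name, count in usage.items()), Decimal("0"))


if __name__ == "__main__":
    print(estimate_credits(input_tokens=1_000, output_tokens=500))
```

`Decimal` is used instead of floats so that rates such as 0.375 and 0.03 add up exactly. At the listed rates, a cache read costs one tenth of a regular input token, so reusing a long cached prompt reduces the input portion of the bill accordingly.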