The AI cost curve didn’t just bend; it snapped. Chinese AI lab MiniMax just released M2.7, a model that scores within spitting distance of Claude Opus 4.6 and GPT-5.4 on coding benchmarks, runs on modest hardware, and costs $0.30 per million input tokens. That’s roughly 17x cheaper than Opus on input and about 12x cheaper on output.
But the price isn’t even the headline. The headline is how they built it: M2.7 helped build itself.
A Model That Does Its Own R&D
MiniMax didn’t just throw more data at a bigger model. They deployed an earlier version as what they call an “Agentic Researcher” — the AI handled 30-50% of its own training workflow. It analyzed its own training logs, identified failure patterns, generated synthetic data to patch weaknesses, and improved the code used to train it.
The result: a 30% performance jump without a proportional increase in human labor or compute.
This is “autoresearch” — the concept Andrej Karpathy flagged months ago as the next frontier. MiniMax is the first major lab to ship a production model that demonstrably used this approach. They’re being careful, noting M2.7 “deeply participated in its own evolution” rather than claiming full autonomy. Humans still set objectives and approved critical merges. But the trajectory is unmistakable.
The Numbers Don’t Lie
Third-party benchmarks from Artificial Analysis and independent developer testing paint a clear picture:
- SWE-Pro (coding): 56.22% — nearly matching GPT-5.4’s 57.7% and Opus 4.6’s 56.5%
- GDPval-AA (office productivity): 1,495 Elo — actually ahead of GPT-5.4 (1,480) and Opus (1,488)
- Intelligence Index: 49.6 vs. GPT-5.4’s 52.4 and Opus’s 51.2
- GPQA (PhD-level science reasoning): 88.4% — trailing Gemini 3.1’s 94.3%, but still strong
The cost gap is absurd. M2.7 charges $0.30/$1.20 per million tokens (input/output). Claude Opus charges $5.00/$15.00. GPT-5.4 sits at $2.50/$10.00.
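Those per-million-token prices are easy to turn into per-call costs. A minimal sketch using the figures quoted above (the helper function and the sample token counts are illustrative, not from any vendor SDK):

```python
# Per-million-token prices in USD, (input, output), as quoted above.
PRICES = {
    "M2.7": (0.30, 1.20),
    "Claude Opus": (5.00, 15.00),
    "GPT-5.4": (2.50, 10.00),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call, given token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A hypothetical agent call: 50k tokens in, 5k tokens out.
for model in PRICES:
    print(f"{model}: ${cost(model, 50_000, 5_000):.4f}")
```

On that hypothetical call, the ratio between Opus and M2.7 lands between the input multiple (~17x) and the output multiple (~12x), depending on the input/output mix of the workload.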
Independent testers at Kilo Code ran both M2.7 and Claude Opus through identical TypeScript challenges. Both found all planted bugs and security vulnerabilities. Opus produced more thorough fixes and twice as many tests — but M2.7 delivered what testers called “90% of the quality for 7% of the cost.” Total spend: $0.27 versus $3.67.
The catch? M2.7’s reliability rate sits at 69% compared to 100% for both GPT-5.4 and Opus. Frontier-capable, not frontier-consistent. For high-stakes production work, the established players still win. For high-volume agentic tasks where retries are cheap? The math just changed.
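That tradeoff can be made concrete. If each attempt succeeds with probability p and failures are retried independently, the expected number of attempts is 1/p, so the expected spend per completed task is the per-attempt cost divided by p. A rough sketch plugging in the article’s reliability and Kilo Code price figures (treating them as directly comparable, which is a simplification):

```python
def expected_cost_per_success(cost_per_attempt: float, reliability: float) -> float:
    """Expected spend to get one successful completion, retrying
    independently until success (geometric distribution: E[attempts] = 1/p)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return cost_per_attempt / reliability

m2_7 = expected_cost_per_success(0.27, 0.69)  # retries priced in
opus = expected_cost_per_success(3.67, 1.00)  # no retries needed
print(f"M2.7 with retries: ${m2_7:.2f}  vs  Opus: ${opus:.2f}")
```

Even after paying for retries, the cheap-but-flaky model wins by a wide margin — which is exactly why the math favors it for high-volume tasks and not for one-shot, high-stakes ones.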
230 Billion Parameters, 10 Billion Active
M2.7 uses a Sparse Mixture-of-Experts (MoE) architecture — 230 billion total parameters, but only 10 billion active per token. Think of it like a company with 230 specialists where any given task only needs 10 at their desks.
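A toy sketch of the sparse-MoE idea. The expert count, expert size, top-k value, and hash-based router below are all illustrative assumptions chosen so the totals mirror the 230B/10B split; MiniMax’s actual architecture (shared layers, learned gating, etc.) is more involved:

```python
# Illustrative stand-ins, NOT MiniMax's real configuration.
N_EXPERTS = 23
PARAMS_PER_EXPERT = 10_000_000_000  # 10B parameters each
TOP_K = 1                           # experts activated per token

def route(token: str) -> list[int]:
    """Pick TOP_K experts for a token. Real routers use a learned gating
    network over the token's hidden state; a hash stands in here."""
    scores = [(hash((token, e)) % 1000, e) for e in range(N_EXPERTS)]
    return [e for _, e in sorted(scores, reverse=True)[:TOP_K]]

total_params = N_EXPERTS * PARAMS_PER_EXPERT  # parameters stored in memory
active_params = TOP_K * PARAMS_PER_EXPERT     # parameters run per token
print(f"stored: {total_params / 1e9:.0f}B, active per token: {active_params / 1e9:.0f}B")

for tok in ["def", "return", "import"]:
    print(tok, "->", route(tok))
```

The key consequence: per-token compute scales with the active parameters, while memory scales with the total — which is why sparse models can punch above their inference budget.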
This matters for deployment. A quantized M2.7 can run inference on a single NVIDIA A30 GPU. For teams building autonomous coding agents or always-on digital workers, that’s frontier-grade intelligence without a million-dollar server rack.
MiniMax reports 97% skill adherence across 40+ agent skills and parity with Anthropic’s Sonnet 4.6 on agentic benchmarks. The architecture was built for the world of AI agents — not chatbots.
China Closed the Gap
For years, conventional wisdom held that Chinese AI labs trailed Silicon Valley by 12-18 months. That narrative just got a lot harder to maintain.
M2.7 launched on every major platform within hours — Ollama, OpenRouter, Vercel. MiniMax itself IPO’d on Hong Kong’s exchange in January 2026 and is competing for what they call the “efficiency frontier” — not the smartest model, but the best intelligence-per-dollar ratio in the world.
They’re not alone. Xiaomi’s MiMo-V2-Pro, released the same week, scored 49 on the Intelligence Index with a 1M token context window at $1/$3 pricing. Chinese labs are competing ferociously on cost-efficiency, and the implications for U.S. chip export controls are worth considering: if architectural innovation and recursive self-optimization deliver frontier performance with dramatically less compute, hardware restrictions lose their teeth.
Who Should Care
High-volume agentic workflows: Customer service, automated code review, document processing. Millions of API calls per day at 17x cost reduction with 90% quality retention? Transformative.
Local deployment: MoE architecture means viable inference on modest hardware. Teams needing on-premises AI for security or latency just got a frontier-capable option.
Development teams: At $0.30 per million input tokens versus Opus’s $5.00, you can iterate roughly 17x more for the same budget.
Where it falls short: Mission-critical applications needing 100% reliability. Complex reasoning where the Intelligence Index gap matters. Anything where you need the absolute best and cost is secondary.
The Question Nobody’s Answering
If M2.7 handled 30-50% of its own training workflow, what percentage will M2.8 handle? And M3.0?
We now have commercial proof that AI can meaningfully accelerate its own development. The company that proved it is selling access for less than a dollar per million tokens. Current AI governance frameworks have essentially nothing to say about recursive self-optimization.
The real disruption isn’t any single model. It’s the collapse of the price-performance curve. When frontier intelligence costs pennies per task, the bottleneck shifts from “can we afford AI?” to “can we build systems that use AI effectively?” The harness, the workflow, the integration — that’s where the value lives now.
And somewhere in Beijing, a model is already helping design its successor.