There’s a chart circulating in cybersecurity circles this week — the kind that makes seasoned professionals sit up straight and quietly close their laptop lids. The UK’s AI Security Institute just published data showing AI cyber capabilities have blown past every trend line they’ve drawn.

The models aren’t just getting better at hacking. They’re getting better at hacking faster than we can measure.

The Doubling Time That Keeps Shrinking

AISI tracks “cyber time horizons”: the length and complexity of the cyberattacks an AI can execute autonomously, benchmarked against human experts doing the same work. They’ve been fitting exponential curves to this data since late 2024.

In November 2025, AISI estimated that capabilities were doubling every eight months. By February 2026, that estimate had shrunk to 4.7 months. Already alarming.
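
The curve-fitting itself is simple: if task horizons grow exponentially, a straight-line fit in log space yields the doubling time. A minimal sketch, using made-up illustrative numbers rather than AISI’s actual data:

```python
import numpy as np

# Illustrative data only, NOT AISI's: months since a baseline date
# versus the task horizon (in hours) a model can handle autonomously.
months = np.array([0.0, 3.0, 6.0, 9.0, 12.0])
horizon_hours = np.array([1.0, 1.6, 2.5, 4.0, 6.3])

# Exponential growth is linear in log space: log2(h) = a*t + b,
# so the doubling time is 1/a months.
a, b = np.polyfit(months, np.log2(horizon_hours), 1)
doubling_time_months = 1.0 / a
print(f"estimated doubling time: {doubling_time_months:.1f} months")
```

With these synthetic points the fit lands near a 4.5-month doubling time; rerunning a fit like this each time a new model is evaluated is what produces a shrinking estimate.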

Then Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 showed up for evaluation. Both “substantially exceeded” the projected trend. The models now hit near-100% success rates on 12-hour expert-level attack simulations — so consistently that the benchmarks themselves are becoming obsolete.

Whether this is a new, even faster doubling time or a one-time jump remains unclear. Either answer is unsettling.

Mythos Clears Every Simulation — A First

Here’s the headline result: Claude Mythos Preview became the first AI model to complete both of AISI’s cyber ranges. These aren’t simple capture-the-flag puzzles. They’re full enterprise network attack simulations.

One range — “The Last Ones” — simulates a 32-step corporate network attack requiring roughly 20 hours of human expert time. Mythos completed the full chain in 6 out of 10 attempts. The earlier version managed just 3 out of 10.

Even more striking: Mythos also cracked “Cooling Tower,” a simulation targeting industrial control systems. No AI model had ever passed this test. Mythos did it in 3 out of 10 attempts.
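
One caveat when reading pass rates like 6 out of 10 and 3 out of 10: ten trials is a small sample. A standard Wilson score interval (my sketch, not part of AISI’s published methodology) shows how wide the uncertainty is:

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - margin, center + margin

low, high = wilson_interval(6, 10)  # Mythos on the 32-step range
print(f"6/10 is consistent with a true rate of {low:.0%} to {high:.0%}")
```

The interval spans roughly 31% to 83%, so the exact success rates are noisy. The qualitative jump is still real: a year ago the rate on these ranges was zero.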

A year ago, frontier models could barely handle multi-step exploits that would take a human an hour. Now they’re autonomously executing 20-hour attack chains against simulated industrial infrastructure. That’s not incremental progress. That’s a phase change.

GPT-5.5: Right Behind — And Publicly Available

While Mythos grabbed the dramatic firsts, GPT-5.5’s results might be more consequential. AISI found it performing at near-parity with Mythos on vulnerability finding across a 95-task benchmark.

The critical difference: Mythos is still in limited release. GPT-5.5 is generally available. As Bruce Schneier noted this week, these capabilities aren’t locked behind special access anymore.

GPT-5.5 completed the same 32-step corporate attack in 2 out of 10 attempts. Lower than Mythos’s 6, but a year ago no model could do this at all. Separate analysis found that a smaller, cheaper model with better scaffolding could match these results — meaning the capability is commoditizing, not just concentrating at the frontier.

XBOW’s Independent Verdict: “A Major Advance”

Offensive security firm XBOW provided crucial validation. After Anthropic invited a ten-person team of XBOW security experts to evaluate Mythos, the verdict was unambiguous: “This model is a major advance.”

Compared to Opus 4.6, Mythos cut false negatives in vulnerability detection by 42 percent. With source code access, that improvement hit 55 percent.

Where Mythos particularly shines is reading code. It found vulnerabilities in Chromium’s V8 sandbox — an area where every previous model produced nothing but false positives. XBOW noted a recurring theme: “Mythos Preview is impressive at writing code, but even more impressive at reading it.”

The reality check: the model still needs live system interaction to reach full potential. Source code analysis alone misses vulnerabilities that emerge from configuration, dependencies, and component interactions. AI is becoming an extraordinary code auditor, but autonomous pentesting still needs a “body” to go with the “brain.”

Microsoft’s MDASH: AI Is Already Finding Real Bugs

While AISI tests models in simulations, Microsoft proved the concept in production. This week’s Patch Tuesday included 16 CVEs discovered by MDASH — Microsoft’s Multi-Model Agentic Scanning Harness — including four critical remote code execution bugs in Windows networking and authentication.

That’s not a benchmark. Those are real vulnerabilities in production Windows code, found by AI, now being patched for billions of users.

Microsoft said AI-assisted vulnerability discovery is “expected to increase the scale of Patch Tuesday releases in the coming months.” Translation: the patch pipeline is about to get a lot busier.

What This Actually Means

For cybersecurity teams: The window between vulnerability discovery and exploitation is compressing violently. The UK’s NCSC has already published guidance urging organizations to prepare for a “vulnerability patch wave.” Patch management just went from important to existential.

For developers: Your code is about to face scrutiny at unprecedented scale. Mythos found vulnerabilities in V8 — one of the most audited codebases on Earth. If you’re shipping software assuming human code review catches everything, that assumption is dangerously outdated.

For businesses: Expect your attack surface to be mapped faster than your security team can respond manually. The cost of AI-powered offensive tools is dropping while capability rises. Budget-constrained attackers now get frontier-grade capability.

For everyone: The internet just got simultaneously more dangerous and more defensible. The same AI finding vulnerabilities for attackers finds them for defenders too. The question is whether defenders adopt fast enough.

The Uncomfortable Truth

Perhaps the most telling detail in AISI’s report: their test suite is becoming obsolete. Mythos hits near-100% success on their hardest tasks even with artificially constrained token budgets. They’re already building harder evaluations — networks that actively fight back — just to keep pace.

METR independently found a consistent 4.2-month doubling time on software engineering tasks, closely matching AISI’s cyber estimates. This isn’t a fluke in one domain. AI is getting better at understanding and manipulating complex systems across the board.
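
To make the 4.2-month figure concrete: compounding it forward a year, under the strong assumption that the trend simply continues, implies roughly a sevenfold capability multiplier.

```python
# If task horizons double every 4.2 months (METR's estimate) and the
# trend simply holds, the multiplier after a year is 2**(12/4.2).
doubling_months = 4.2
months_ahead = 12
multiplier = 2 ** (months_ahead / doubling_months)
print(f"~{multiplier:.1f}x growth in task horizon over {months_ahead} months")
```

A roughly 7x jump in autonomous task horizon is the arithmetic behind the next paragraph.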

A year from now, the capabilities making headlines this week will look quaint. The models will be faster, cheaper, and more autonomous. The only real variable is whether defenses — technical, institutional, and regulatory — can keep up with what’s already here.