There’s a new AI model that finds software vulnerabilities humans missed for decades — and its creators are so spooked by what it can do, they’re refusing to release it.
Anthropic just unveiled a preview of Claude Mythos, their most powerful model yet. But instead of the usual launch playbook — benchmarks, API waitlists, developer hype — the announcement came wrapped in a warning. Mythos is so capable at finding and exploiting bugs that Anthropic formed an emergency coalition of tech giants to deploy it defensively before the bad guys build something similar.
AI doesn’t just write code anymore. It breaks it.
What Makes Mythos Different
Claude Mythos sits above Opus, Anthropic’s previous top-tier model. It wasn’t specifically trained for cybersecurity — it’s a general-purpose frontier model that happens to be terrifyingly good at finding holes in software.
Originally codenamed “Capybara” internally, Mythos leaked last month when a draft blog post turned up in an unsecured public data lake. (The irony of a security-focused AI company having a data leak writes itself.) In that leaked document, Anthropic described Mythos as having capabilities that “far exceeded” their current models in coding, reasoning, and — critically — cybersecurity.
The kicker: Anthropic isn’t selling it. No API. No open-source release. Instead, they’re deploying it through a controlled initiative called Project Glasswing.
Project Glasswing: The Avengers of Cybersecurity
The coalition reads like a tech industry all-star roster. Twelve founding partners including Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Another 40+ organizations maintaining critical software infrastructure also got access.
The mission: use Mythos to scan proprietary and open-source systems for vulnerabilities before attackers find them. Anthropic is backing the effort with up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.
“This work is too important and too urgent to do alone,” said Cisco’s chief security officer Anthony Grieco. The numbers back him up.
Thousands of Zero-Days, Some Hiding for 27 Years
In just weeks of testing, Mythos identified thousands of zero-day vulnerabilities — security flaws that were previously unknown and unpatched. Many are classified as high-severity. Some have been hiding in code for decades; the oldest dates back 27 years.
These are bugs that human researchers, automated testing suites, and existing AI tools all missed. One highlighted example: Mythos found a flaw in video software that had been tested more than 5 million times by its creators. Five million tests, and the AI caught what none of them did.
“AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities,” Anthropic wrote. “The fallout — for economies, public safety, and national security — could be severe.”
These aren’t theoretical risks. These are bugs in every major operating system and web browser. Software billions of people use daily.
The Uncomfortable Truth About Dual-Use AI
Here’s the thing nobody wants to say out loud: the same capability that makes Mythos brilliant at finding bugs makes it equally brilliant at exploiting them. A model that identifies a zero-day can also write the exploit code. Defense and offense are the same skill — only intent separates them.
This is why Anthropic chose controlled release. As Anthropic Labs’ Mike Krieger explained at the HumanX conference: “We have a new model that we’re explicitly not releasing to the public.” Instead, they’re “arming cybersecurity specialists ahead of time.”
CrowdStrike CTO Elia Zaitsev was blunter: “The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI.”
And Anthropic isn’t alone. OpenAI is reportedly finalizing a similar model for release through its existing “Trusted Access for Cyber” program. The capability is proliferating. It won’t stay contained to responsible actors forever.
The Security Company That Can’t Stop Leaking
Anthropic has had a rough stretch. The Mythos leak itself was caused by human error — a draft left in a publicly accessible data lake. Then a botched Claude Code update exposed nearly 2,000 source code files. The cleanup attempt made things worse, taking down thousands of GitHub repositories.
The company building the most powerful cybersecurity AI in the world keeps leaking its own stuff. Not a great look. But it paradoxically reinforces their core message: software security is hard. If Anthropic — with all their resources and expertise — can’t prevent leaks, what hope does the average enterprise have without AI-powered defenses?
What This Actually Means
There’s a geopolitical dimension too. Anthropic and the Trump administration are locked in a legal battle after the Pentagon labeled the AI lab a supply-chain risk — reportedly in retaliation for Anthropic’s refusal to allow autonomous targeting or surveillance of U.S. citizens.
For the rest of us, the practical implications are significant:
Patch cycles accelerate. As AI discovers vulnerabilities faster, software companies face pressure to fix them faster. Expect more frequent updates and less tolerance for unpatched systems.
Open-source software gets a lifeline. That $100M in credits and $4M in donations target projects that power the internet but run on volunteer effort — like OpenSSL and the Linux kernel. They'll get security audits they could never afford.
The talent gap narrows. AI won’t replace human security researchers, but it dramatically multiplies their effectiveness. One researcher with Mythos-level tooling does what previously required a team.
The offense-defense balance shifts. Anthropic is giving defenders a head start. But as these capabilities spread, the advantage goes to whoever deploys first — good actor or bad.
AI’s First Real “Too Dangerous to Release” Moment
We’ve heard this rhetoric before. OpenAI said it about GPT-2 in 2019, and that decision aged poorly — the model turned out to be fairly tame. But Mythos feels different. This isn’t about generating text. It’s about a model that finds exploitable vulnerabilities in software hardened for decades, at a speed no human team can match.
Anthropic’s approach — controlled release, industry partnership, proactive disclosure — is arguably the most responsible thing an AI company has done with a genuinely dangerous capability. They’re putting $100M behind a concrete plan to use it for good before the capability inevitably proliferates.
But the clock is ticking. As AI labs race to build more capable models and nation-states invest billions in offensive cyber, this feels less like a one-time event and more like the opening chapter of a new era.
The defenders got a head start. Whether they can keep it is another question entirely.