Agentic Coding Can't Close the Mythos Remediation Gap

Contents

Remediation is the constraint now
Why an agent alone can't close it
The deterministic substrate
Red team, blue team
The code you own and the code you don't
The work ahead
Further reading

Key takeaways

Mythos shifted the discovery regime, not the remediation regime. It found nearly 300 new vulnerabilities in Firefox in a single sweep, against roughly 20 from prior Claude models. Discovery has taken a step change. The work that closes a finding has not.
Most enterprise exposure rides in on inherited code. Eighty to ninety percent of a typical enterprise application is OSS and third-party dependencies. A single CVE in a single library becomes a live vulnerability in every application that ships that library, which is why one disclosure multiplies across the estate the moment it lands.
Probabilistic fixes can’t close vulnerabilities. Forty-five percent of AI-generated code introduces a vulnerability, with a seventy-two percent failure rate for Java (Veracode 2025). Closing security findings with the same probabilistic process that introduces them is an unstable loop, and re-running fresh inference across two thousand repositories means paying to rediscover the same fix two thousand times.
The blue team is the coding agent plus the Moderne Platform. Mythos finds. A coding agent equipped with Moderne writes the fix as a deterministic recipe against the Lossless Semantic Tree. The Moderne Platform runs that recipe across every repository in the code estate, in parallel, with full audit trail.

Anthropic’s Mythos announcement was read across the industry as a discovery milestone, and it is one. In a single sweep it found close to 300 vulnerabilities in Firefox, where earlier Claude models found around 20. Generalized across the ecosystem, Anthropic describes vulnerabilities in the tens of thousands, most of them still undisclosed because nobody’s patched them yet. The ability to find weaknesses in software just took a step change.

That was April. By June, Anthropic had already made a Mythos-class model generally available as Fable 5 and moved its Project Glasswing partners onto an upgraded Mythos version. The Mythos milestone is now a standing condition: discovery at this level is broadly available, and it isn’t going back in the box.

The way I think about it is that Mythos (and now Fable) is the agentic red team. It’s a model that probes the estate and surfaces weaknesses at a scale no human research effort could match. The natural answer to an agentic red team is an agentic blue team, and the obvious candidate is the coding agent your developers already use: point it at the findings and let it close them. That instinct points in the right direction, yet it underestimates what standing up a blue team takes. Finding a weakness and closing it everywhere are different jobs, held to different standards, and an agent on its own struggles with that second part in particular.

I’ve spent most of the last several weeks talking with CISOs and platform engineering leaders, and the consequence they keep coming back to is what happens after a finding lands. Discovery has been getting faster for a decade, while the work that closes a finding hasn’t. Mythos widens that distance faster than anything before it. That distance is the gap a blue team has to cross, and it’s where the real work begins.

Remediation is the constraint now

The remediation gap predates Mythos by years. Veracode’s 2025 State of Software Security report puts the average time to fix an application vulnerability at 252 days, up 47% since 2020. Edgescan’s 2025 report puts the share of enterprise vulnerabilities still unpatched after a year at 45.4%. FIRST expects more than 59,000 CVEs in 2026, the first year to cross 50,000. None of those numbers describes a discovery shortfall. The industry has invested heavily in finding things, and it has succeeded: SAST, DAST, SCA, and ASPM all do their job.

What never scaled is everything that happens after a finding reaches a queue: the change to the code, the test cycle, the review, the deployment, the verification, and the record of what changed. A scanner can surface a thousand findings in an afternoon, but closing them is a project that runs for the rest of the year. Now Mythos has removed any remaining room to ignore this imbalance.

Why an agent alone can’t close it

Start with the obvious move: point a coding agent at the scanner output and let it generate fixes. Coding agents can certainly write a fix, but the trouble is that a probabilistic fix can’t give you the guarantee a vulnerability demands.

There are two reasons. The first is that AI-written code is itself a source of risk. Veracode’s 2025 GenAI Code Security Report found that 45% of AI-generated code introduced a security vulnerability, with the rate for Java reaching 72% across more than a hundred models. Closing vulnerabilities with the same probabilistic process that introduces them is an unstable loop.

The second reason matters more, and it’s the asymmetry between attack and defense. An attacker succeeds by finding a single instance of a vulnerability. A defender has to close every instance, everywhere it exists, or the business is still exposed. Those are very different jobs, and the distance between them is where remediation gets hard.

It helps to picture what one CVE actually looks like inside an enterprise. Usually it’s a single vulnerable open source library, and that library is pulled in by dozens or hundreds of applications across the portfolio. The one CVE becomes a live vulnerability in every application that depends on it, and every one of those occurrences has to close before the business is secure. This is the part that should worry people. Somewhere between 80 and 90 percent of a typical enterprise application is code the business didn’t write and doesn’t control (OSS and third-party dependencies), so most of the exposure rides in on dependencies you inherited, and a single disclosure multiplies across the estate the moment it lands.
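The fan-out described above can be sketched in a few lines. This is a minimal illustration, assuming a flat inventory of pinned versions; the application names, the `acme-http` library, and its version numbers are all hypothetical, and a real estate would also need transitive dependencies resolved.

```python
# Hypothetical inventory: application name -> {library: pinned version}.
INVENTORY = {
    "billing-api": {"acme-http": "1.2.0", "other-lib": "3.0.0"},
    "checkout-ui": {"acme-http": "1.2.0"},
    "reports-batch": {"other-lib": "3.0.0"},
    "partner-gateway": {"acme-http": "1.3.1"},
}


def affected_apps(inventory, library, vulnerable_versions):
    """Return the sorted names of apps that pin a vulnerable version of `library`."""
    return sorted(
        app
        for app, deps in inventory.items()
        if deps.get(library) in vulnerable_versions
    )


# One disclosure against one library version becomes one finding per application.
print(affected_apps(INVENTORY, "acme-http", {"1.2.0"}))
# ['billing-api', 'checkout-ui']
```

Every name in that result is a separate occurrence that has to close before the disclosure is actually handled.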

That’s the standard a blue team is held to, and it’s why an agent working on its own falls short. The agent reports success, and you spend the evening re-prompting it and re-running a scanner to check whether the fix actually landed, hoping the agent found every instance behind a wrapper, a helper, or in one of the hundreds of other repositories that pull the same dependency. The scanner answers with a pile of findings you now have to validate by hand, most of them false positives. Even after you’ve cleared them, you have no reliable way to know the fix reached every instance, and every instance it missed stays open to the same attack. When your CISO asks what the exposure is after the next disclosure and you have no record of what changed, where, and why, the honest answer is that you don’t know.

There’s an economic version of the same problem as well. Closing one CVE across two thousand repositories by running fresh inference against each one means paying to rediscover the same fix two thousand times. That’s a waste at any token price, and it scales the wrong way as the estate grows.
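The arithmetic behind that claim can be written out. This is a rough sketch only: the cost figures below are placeholder units chosen for illustration, not measurements, and the only number taken from the text is the two thousand repositories.

```python
def fresh_inference_cost(repos, cost_per_repo):
    """Total cost when the fix is rediscovered by inference in every repository."""
    return repos * cost_per_repo


def recipe_cost(repos, authoring_cost, run_cost_per_repo):
    """Total cost when the fix is authored once and then run deterministically."""
    return authoring_cost + repos * run_cost_per_repo


REPOS = 2000

# Placeholder units (assumptions, not data): authoring a recipe is assumed
# to cost several times one inference pass, and a deterministic run is
# assumed to cost a small fraction of one.
INFERENCE_PER_REPO = 100
RECIPE_AUTHORING = 500
RECIPE_RUN_PER_REPO = 1

print(fresh_inference_cost(REPOS, INFERENCE_PER_REPO))            # 200000
print(recipe_cost(REPOS, RECIPE_AUTHORING, RECIPE_RUN_PER_REPO))  # 2500
```

Whatever the real unit costs are, the first total grows with the full discovery cost per repository, while the second pays that cost once and grows only with the cheaper per-run cost.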

The deterministic substrate

In an IDE, a rename or extract-method refactoring transforms the code the same way every time. You trust a code change inside the IDE because it’s deterministic, but that determinism stops at the edge of the editor. OpenRewrite, the open source project we maintain, takes that determinism outside the IDE and makes it programmatic, so the same change runs identically across an entire portfolio. A recipe is a small, verifiable program that produces the same change every time it runs, against any codebase. That sameness is exactly what closing every instance of a vulnerability needs, and exactly what agent-generated patches can’t promise.
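The property at stake here is easy to state in code. The sketch below is a toy stand-in, not the OpenRewrite API: `upgrade_pin` is a hypothetical function that rewrites a pinned version in a simplified line-per-dependency manifest, and `acme-http` is an invented package. What it illustrates is that a change written as a pure function of its input gives the same output on every run, and that running it a second time changes nothing.

```python
import re


def upgrade_pin(manifest: str, package: str, fixed_version: str) -> str:
    """Rewrite `package==<any version>` to `package==<fixed_version>`.

    A pure function of its arguments: no randomness, no network, no clock.
    """
    pattern = re.compile(rf"^({re.escape(package)})==\S+$", re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}=={fixed_version}", manifest)


manifest = "acme-http==1.2.0\nother-lib==3.0.0\n"

first = upgrade_pin(manifest, "acme-http", "1.2.5")
second = upgrade_pin(manifest, "acme-http", "1.2.5")

print(first == second)                                      # True: same input, same output
print(upgrade_pin(first, "acme-http", "1.2.5") == first)    # True: re-running is a no-op
```

Because the output depends only on the input, the change can be reviewed once and then trusted wherever it runs, which is the guarantee a sampled patch cannot offer.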

Moderne’s recipes run against the Lossless Semantic Tree, a type-attributed, semantically complete representation of your code. That matters because the vulnerabilities worth worrying about rarely sit in one obvious place. They cross method boundaries, follow object fields, hide behind wrapper subclasses, and pass through helpers that hide where the value came from. Because the LST carries full type and field information, a recipe can follow the value through all of those shapes instead of catching only the easy case.
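A much smaller analogy shows why surface matching misses cases that a semantic representation catches. The sketch below uses Python's standard `ast` module rather than the LST, and it only resolves import aliases, which is far less than full type attribution. The sample source and the choice of `hashlib.md5` as the call to find are assumptions made for illustration.

```python
import ast
import re

SOURCE = '''
import hashlib
from hashlib import md5 as digest

def direct(data):
    return hashlib.md5(data).hexdigest()

def aliased(data):
    return digest(data).hexdigest()
'''


def text_matches(source):
    """Surface-level search: only sees the literal spelling of the call."""
    return len(re.findall(r"hashlib\.md5\(", source))


def resolved_matches(source, module="hashlib", name="md5"):
    """Resolve import aliases first, then count calls that reach the real target."""
    tree = ast.parse(source)
    module_aliases, name_aliases = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                if alias.name == name:
                    name_aliases.add(alias.asname or alias.name)

    count = 0
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in name_aliases:
            count += 1  # call through an imported alias, e.g. digest(...)
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == name
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            count += 1  # call through the module, e.g. hashlib.md5(...)
    return count


print(text_matches(SOURCE))      # 1
print(resolved_matches(SOURCE))  # 2
```

The text search finds only the call spelled the obvious way. Resolving names against the imports finds the aliased call too, and wrappers, fields, and helpers are harder versions of the same problem.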

In April I published a walkthrough of a recipe for CVE-2026-22732, a 9.1-severity Spring Security issue. The recipe took about an afternoon to write, and it recognizes five distinct patterns. Three of them are handled by surface-level pattern matching, but the other two are indirect, where the sink is reached through a helper or parked in a field. The recipe catches all five, runs the same way against any repository in a portfolio, and produces the same shape of fix each time.

This is the substrate the agent needs. With Moderne’s tools in hand, a coding agent both authors recipes and runs them: when the recipe a finding requires already exists, the agent applies it, and when it doesn’t, the agent writes it. The work can run through an agent over MCP, a platform team, or a single engineer at a terminal, and the result is the same: identical, verifiable, and auditable. This is deliberately agent-agnostic too. Copilot, Cursor, Windsurf, Devin, Codex, and Claude Code all reach the same tools the same way, and so do the open source agents teams are starting to adopt.

That gets you a fix you can trust in one codebase. Making it true across all of them is the other part of the blue team’s responsibility.

Red team, blue team

Put the pieces together and the architecture is easy to describe. Mythos finds a vulnerability. A coding agent equipped with Moderne writes the recipe that closes it. The Moderne Platform defends the perimeter, running that recipe across every repository in your code estate, in parallel, and keeping the record of what changed and where. That last step is the part of the blue team most teams underestimate, and it’s the part that meets the standard raised at the start. A fix that’s correct in one repository but never run against the other two thousand still leaves the perimeter open. Defending it means the same fix reaches every repository the weakness touches, and the audit trail left behind is what lets a CISO answer the exposure question with something better than a guess.
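The shape of that last step, one fix fanned out in parallel with a record kept for every repository, can be sketched as follows. This is an illustration under stated assumptions and not how the Moderne Platform is implemented: the estate is a hypothetical in-memory dictionary of manifests, `bump_pin` is a toy deterministic change, and `AuditRecord` is an invented structure.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    repo: str
    recipe: str
    changed: bool


def bump_pin(manifest: str) -> str:
    """Toy deterministic change: move a hypothetical package to its fixed version."""
    return manifest.replace("acme-http==1.2.0", "acme-http==1.2.5")


def run_across_estate(estate, recipe, recipe_name, max_workers=4):
    """Apply one recipe to every repository in parallel and record each outcome."""

    def apply(item):
        repo, manifest = item
        updated = recipe(manifest)
        return repo, updated, AuditRecord(repo, recipe_name, updated != manifest)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the audit trail is stable.
        results = list(pool.map(apply, sorted(estate.items())))

    new_estate = {repo: updated for repo, updated, _ in results}
    audit = [record for _, _, record in results]
    return new_estate, audit


estate = {
    "billing-api": "acme-http==1.2.0\n",
    "checkout-ui": "acme-http==1.2.0\nother-lib==3.0.0\n",
    "reports-batch": "other-lib==3.0.0\n",
}

_, audit = run_across_estate(estate, bump_pin, "bump-acme-http")
for record in audit:
    print(record.repo, record.changed)
```

The audit list records the repositories where nothing changed as well as the ones that were fixed, and that complete record is what turns the exposure question into a lookup.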

So the blue team has two parts that depend on each other. The agent writes the fix as a recipe, and the platform makes it true everywhere. The red team and the blue team turn out to be two parts of one problem that fit together by design.

I don’t think the industry has settled on vocabulary for this yet. The default framing is still “scanners” against “fixers”. A more accurate one is a discovery engine and a remediation engine, both running at AI-scale, both required. The remediation engine is the deterministic substrate and the platform that runs recipes across the estate, with a coding agent driving it.

Moderne has enabled us to adopt secure development practices that scale with our architecture.

— Jason Simpson, VP of Engineering, Choice Hotels

The code you own and the code you don’t

Code-level remediation falls into two groups. The first is third-party code: the libraries and frameworks you depend on, and the place where most of that inherited exposure lives. When a fixed version exists, a recipe upgrades to it across every repository that pulls the vulnerable package, with the same upgrade applied the same way everywhere. When a fixed version isn’t a realistic option because the framework is end of life and the upgrade is a multi-quarter migration you don’t have time for, Moderne can backport the fix to the version you’re running and distribute that.

The second group is first-party code. That’s the code your teams write. Here, recipes close the OWASP Top 10 classes wherever they appear and trace the data-flow and control-flow vulnerabilities through the wrappers, fields, and methods that hide them. They can also handle the migration off of the cryptography that post-quantum standards are about to retire.

The work ahead

The next year will be defined by the distance between what gets discovered and what gets closed. The organizations that come through it well are the ones that build a deterministic remediation pipeline before the next disclosure reaches them, not after.

That’s what we’re building. The agent tooling lets a coding agent author and run deterministic recipes against the LST, and the platform carries those recipes across the estate and defends it. Underneath sits a growing recipe library, with backpatching covering the frameworks an upgrade can’t reach in time. None of it removes the need for judgment, and none of it claims the whole of security. It moves the bottleneck off the one part of the job that never scaled.

If your team is working through the same problem, I’d be glad to compare notes.

Further reading

Code generation was the easy part

AI coding agents don't need bigger models. They need better tools.

Kotlin Recipes for OpenRewrite