There’s a quiet revolution happening in software development.
Developers today don’t just write code; they manage distributed systems, navigate layers of compliance, and battle legacy complexity. And they’re doing it all while trying to move faster and build better. The usual playbook of test automation, CI pipelines, and static analysis isn’t keeping up.
At Code Remix Summit, several sessions showed where things are headed: toward autonomous testing, codebase simplification, and smarter observability with continuous profiling.
This isn’t about hypothetical tooling. It’s about real teams using AI to generate tests, profile live systems, clean up legacy code, and reduce the daily drag developers face. If you care about developer productivity, it’s time to look at what’s actually working.
AI-generated tests without the rework
Interactive assistants like Copilot can increase output, but if that output isn’t trustworthy, it just shifts the burden downstream. Developers often end up having to rewrite AI-generated code, which is why rework has become a key metric for engineering velocity.
Autonomous testing flips the equation. In his session, Andy Piper, VP of Engineering at Diffblue, made the case for using agentic AI to generate self-verifying unit tests with no prompts, no review cycles, no manual intervention.
These agents are trained via reinforcement learning to:
- Understand code behavior
- Generate meaningful, maintainable tests
- Self-verify correctness
- Operate continuously across entire codebases
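Diffblue’s RL-trained agents are far more sophisticated than anything that fits in a snippet, but the core self-verifying loop can be sketched in miniature: capture a function’s observed behavior as assertions, then execute the generated test immediately to confirm it passes before anyone ever reviews it. A toy illustration in Python (the `slugify` function and its sample inputs are invented for the example):

```python
def generate_characterization_test(func, sample_inputs):
    # Record the function's current behavior: observed outputs
    # become the expected values of the generated test.
    cases = [(args, func(*args)) for args in sample_inputs]

    def generated_test():
        for args, expected in cases:
            assert func(*args) == expected, f"regression for inputs {args}"
        return True

    # Self-verification: run the test immediately after generating it,
    # so only a passing test is ever handed back.
    assert generated_test()
    return generated_test

# Hypothetical target function whose behavior we want to lock in.
def slugify(title):
    return "-".join(title.lower().split())

test_slugify = generate_characterization_test(
    slugify, [("Hello World",), ("Code Remix Summit",)]
)
```

A real agent also has to judge coverage, meaningfulness, and maintainability; the point here is only that the generated test verifies itself before any human intervention.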
This AI-powered test creation doesn’t just assist; it generates, verifies, and adapts tests for enterprise codebases on its own. As Piper points out, “Because it’s trusted, because it’s operating autonomously, it can operate at scale.”
This results in fewer regressions, fewer rewrites, and fewer hours lost to writing boilerplate and fixing bugs. That’s why Moderne partnered with Diffblue to develop an OpenRewrite recipe that delivers these autonomous testing capabilities to enterprises, which can run it at scale on the Moderne Platform. Piper explores this integration in his session:

From observability to understanding
Autonomous testing increases code confidence, but that doesn’t help the developer who gets paged at 3:00 AM and has to figure out what’s going wrong in production. That’s where continuous profiling comes in.
Spotify’s Mohammed Aboullaite opened his talk with this exact scenario. While logs, metrics, and traces (the traditional “three pillars” of observability) can often tell you what happened, they are rarely able to explain why it went wrong.
Continuous profiling fills that gap. Unlike traditional profilers, which are often too heavy to run in production, modern profilers operate with low overhead, collecting detailed live performance data such as CPU usage, memory allocation, and thread states directly from production systems.
The hope with continuous profiling and monitoring in general is to give us enough knowledge to react, or even better, more proactively avoid or minimize the outages and the issues. –Mohammed Aboullaite
By working directly against real production data, developers can identify performance bottlenecks faster, with fewer guess-and-check cycles in staging. It shortens the feedback loop and eases the cognitive burden on developers on call.
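Production-grade continuous profilers are typically built on eBPF or JVM-level agents; purely as a sketch of the sampling idea, a profiler can periodically snapshot thread stacks instead of instrumenting every call, which is what keeps the overhead low. A toy version in Python (the workload and all names are invented for illustration):

```python
import collections
import sys
import threading
import time

def sample_stacks(duration_s=0.2, interval_s=0.01):
    # Periodically snapshot every other thread's stack and count how
    # often each function appears. Sampling at a fixed interval, rather
    # than tracing every call, is what makes this cheap enough to run
    # continuously.
    counts = collections.Counter()
    own_ident = threading.get_ident()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for ident, frame in sys._current_frames().items():
            if ident == own_ident:
                continue  # skip the profiler's own stack
            while frame is not None:
                counts[frame.f_code.co_name] += 1
                frame = frame.f_back
        time.sleep(interval_s)
    return counts

# Simulated workload: a thread spinning in a CPU-bound loop.
stop = threading.Event()
def busy_work():
    total = 0
    while not stop.is_set():
        total += 1  # hot loop the samples should land in

worker = threading.Thread(target=busy_work)
worker.start()
profile = sample_stacks()
stop.set()
worker.join()
```

The counts in `profile` include `busy_work`, and aggregated over time that kind of signal is what answers the “why” behind a CPU spike, rather than just the “what” from metrics.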
In his session, Aboullaite outlines a basic continuous profiling architecture, from collecting to storing to analyzing to presenting data, and shares techniques, challenges, and strategies for managing data volume and cost:

Effective observability isn’t about more dashboards. It’s about correlating the right signals so teams can act with confidence.
Code cleanup at scale as a productivity strategy
Productivity isn’t just about building new things. Sometimes, it’s about simplifying unnecessary complexity. What’s the easiest way to simplify? Every developer’s favorite activity: deleting unused code!
In his session on codebase management, Pratik Patel, Java Champion and VP of Developer Relations at Azul Systems, called out the reality: engineers spend much of their time maintaining code they didn’t write or don’t need.
Referencing keynote speaker Dov Katz, Patel framed this as “capacity recovery”: the idea that teams can reclaim engineering time without changing team size or process.
You end up spending so much time trying to just maintain this code, but if you can get some time back to work on other stuff, or maybe just take a vacation occasionally, that’s great. –Pratik Patel
Patel described how Azul Code Inventory integrates with OpenRewrite to automate the removal of unused code, enabling enterprise codebase management using the Moderne Platform. Together, these tools can:
- Detect and identify dead or redundant code
- Mark it for deprecation
- Safely remove it from production
Patel recommends running this process in production, just as Aboullaite advocated for his monitoring tools. The overhead is similarly low, and observing production gives the most accurate picture of which code actually runs, and by extension, which code does not.
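Azul Code Inventory and OpenRewrite operate on JVM codebases at enterprise scale, but the underlying detect-then-diff idea can be sketched in a few lines: record which functions execute during a representative workload, then flag everything defined but never hit. A minimal Python illustration (all function names here are invented):

```python
import inspect
import sys

def used_helper():
    return "still needed"

def forgotten_helper():
    return "legacy"  # defined but never called: a dead-code candidate

def entry_point():
    return used_helper()

def find_dead_code_candidates(workload):
    # Trace which module-level functions run during the workload, then
    # diff against everything defined. Production tools observe real
    # traffic with far lower overhead, but the principle is the same:
    # what never runs is a candidate for removal.
    executed = set()

    def tracer(frame, event, arg):
        if event == "call" and frame.f_globals.get("__name__") == __name__:
            executed.add(frame.f_code.co_name)
        return None

    sys.settrace(tracer)
    try:
        workload()
    finally:
        sys.settrace(None)

    defined = {name for name, obj in list(globals().items())
               if inspect.isfunction(obj)}
    return defined - executed - {"find_dead_code_candidates"}

print(find_dead_code_candidates(entry_point))  # → {'forgotten_helper'}
```

The hard part at enterprise scale isn’t the diff; it’s collecting the “what ran” data safely from production over a long enough window, then removing the candidates without breaking anything, which is where the tooling above comes in.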
This isn’t theoretical. Companies like Netflix have used this codebase management method to uncover 20-30% of code in production that isn’t being used. Some enterprise teams have found 50% or more, even in actively maintained services.
The result isn’t just a cleaner codebase. With AI-powered codebase management, engineering teams can reduce maintenance work, reclaim focus, and speed up onboarding.
AI without trust is just noise
All of these tools, from autonomous testing and smarter profiling to code cleanup, only work if teams can trust the results.
In his session on preparing engineering teams for the genAI revolution, Ben Lloyd Pearson of LinearB explained how AI’s “high-accuracy but low-precision” output can introduce risk and friction if not managed carefully.
Your engineering team has universal truths for software quality, but introducing a high-accuracy and low-precision actor into your organization creates a lot of risk, primarily because humans are actually very poor at identifying errors for precision. –Ben Lloyd Pearson
Pearson shared research regarding the use of AI in the software development lifecycle (SDLC), with key development phases seeing the largest adoption to date.

He then highlighted key risks and bottlenecks on the path to AI-enabled engineering, including data silos and knowledge gaps, lack of feedback loops, and human-to-agent friction. He explained how companies like Meta and Google handle this not by removing humans, but by knowing when to keep them in the loop. Especially in high-stakes environments, validation isn’t optional. It’s part of the workflow.
To make AI work at enterprise scale, engineering orgs need:
- Well-organized, centralized knowledge to avoid context gaps
- Better orchestration between agents and workflows
- Strong feedback loops to continuously improve AI outputs
- Governance and auditability to ensure trust
As these disciplines evolve and AI workflows mature, the future will bring orchestrated, inter-agent workflows that are well integrated across SDLC phases. Pearson offers a simple ‘AI Journey’ map to help you understand your own usage and chart your course.
A new model for developer productivity
The sessions at Code Remix Summit pointed to a clear direction. The future of developer productivity isn’t about working faster; it’s about reducing unnecessary work. Developer time is too valuable to waste on rechecking AI code, digging through logs to understand an outage, or tiptoeing around ancient code that no one remembers writing.
AI isn’t the solution to every problem, but when applied thoughtfully, it can automate the work developers shouldn’t have to do and leave more space for the work only they can do.
Autonomous testing. Continuous profiling. Intelligent, AI-powered code cleanup. These are building blocks for a more sustainable developer experience that reduces toil, surfaces meaningful insights, and enables developers working in large enterprise codebases to spend time where it counts.
The path forward isn’t just technical. It’s cultural. But for those ready to embrace it, the payoff is real. Faster feedback, higher trust, and more room for developers to do what they do best: create.
To learn how Moderne is a key accelerator for developers by automating the most tedious work of code maintenance and migration, schedule a demo.
Links to videos:
- Architect's Guide to Managing a Code Base, Pratik Patel
- Are Your Engineering Teams Really Prepared for the GenAI Revolution?!, Ben Lloyd Pearson
- Continuous Profiling, the missing piece in your observability puzzle!, Mohammed Aboullaite
- Reducing Developer Toil: Autonomous AI for Enterprise Testing at Scale, Andy Piper