The 2026 State of Quality Engineering Report

How agentic AI is redefining the standards for delivering high-quality software at scale

Welcome to mabl’s seventh annual research report, The 2026 State of Quality Engineering. Each year, we survey a diverse range of software professionals to better understand how testing and quality engineering are evolving across numerous industries. This year, one finding rose above all others: AI has moved from pilot programs to production reality in software development, and the growing disparity between teams can’t be ignored.

For teams with strong quality engineering foundations, AI is increasingly a force multiplier: it makes them faster, more collaborative, and more effective. For teams without those foundations, AI is driving them into the same bottlenecks and roadblocks they’ve always faced, only faster.

There’s a lot to be optimistic about in this year’s data. AI is already delivering measurable gains in speed, collaboration, and customer satisfaction for software teams that have the right quality practices and infrastructure in place. For teams without them, the path is clear: the organizations seeing real ROI from AI aren’t out of reach. They’re making deliberate investments in quality infrastructure that any team can start making today.

About this survey

A total of 996 software quality professionals, software developers, and technology decision-makers across the US and UK participated in our 2026 survey, coming from organizations of all sizes and a broad range of industries.

[Figure: Survey demographics]

The AI Amplifier: Not All Boats Rise

AI doesn’t manufacture quality problems or solve them; it amplifies what’s already there. That’s the defining insight from this year’s data, and it reshapes how organizations should think about every AI investment they make in pursuit of consistently delivering high-quality software at speed.

Software teams with highly automated workflows are seeing dramatically better outcomes than those without. One glaring example: teams with more automated workflows reported an average customer satisfaction score of 91%, while teams with few reported 74%. These same high-maturity teams also report the clearest organizational benefits from AI: 63% say it has increased collaboration between QA and development, and 60% say it has accelerated their adoption of test automation.

[Chart: 63% of high-maturity teams say AI has increased collaboration between QA and development]

For teams still in the early stages of their automation journey, the picture is more complicated. Among teams using AI, 41% say it has improved code quality and reduced QA needs, while 37% say it has produced code faster but at lower quality, increasing the burden on QA. Both experiences are real, and the difference comes down to foundation.

Teams with strong test automation and QA embedded in the development lifecycle are seeing AI accelerate outcomes they were already achieving. Teams without those foundations are more likely to land in the second group, with AI adding to QA’s burden rather than easing it.

The organizations pulling ahead aren’t doing anything out of reach. Start with an honest assessment of your automation maturity, how embedded QA is in your development process, and whether your infrastructure could absorb a 10x increase in test volume. The teams already doing that math are the ones behind this report’s best numbers.

The Shift-Left Hangover

Shift-left testing was the right idea. For many organizations, the execution went too far.

The premise was sound: engage quality engineering earlier in the development lifecycle, catch defects before they compound, and make quality a shared concern rather than a final checkpoint. When implemented that way, with quality professionals and testing agents involved sooner rather than removed from the equation, shift-left works. What the data shows is that many organizations interpreted it differently, offloading testing responsibility onto developers wholesale, without a holistic and proactive strategy for consistently delivering high-quality software. Those are two very different approaches, and the outcomes reflect the difference.

[Chart: 84%]
[Chart: 43%]

The most mature organizations have built a version of shift-left that works: one where quality strategists are not replaced by developers but empowered to work alongside them. These teams have the infrastructure to make developer participation in testing meaningful, rather than burdensome or superficial.

For teams without that foundation, the consequences show up in talent, too. “Access to skilled quality talent” now ranks third among barriers to investing in quality, behind only budget and technology limitations. Teams that redistributed QA responsibilities during leaner times are discovering that rebuilding those capabilities is harder than they expected.

The Verification Gap

AI generates code faster than most teams can validate it, and the gap between generation and verification is widening. For now, that gap is largely being closed through manual review, a workload that grows alongside increased AI adoption.

Respondents reported spending approximately 20% of their working week manually verifying AI-generated tests and code. That’s a full working day, every week, dedicated to checking the work of tools designed to accelerate software delivery. And the problem compounds: asked what would happen if AI tools were to 10x the size of an automated test suite, 54% of respondents identified some form of manual human intervention as their primary bottleneck, either verifying that AI-generated tests are actually correct (33%) or manually fixing the tests that break as a result (21%).

[Chart: 35% say most production bugs are found by customers]
[Chart: 23% cite AI hallucinations or misinterpreted context]

The downstream cost is visible in the data. For the second consecutive year, 35% of respondents report that most production bugs are found by customers, not by internal testing. It’s noteworthy that a full year of additional investment in quality has not moved that number. When we examine why bugs reach production, 42% of respondents point to incomplete or missing requirements, while nearly a quarter cite AI hallucinations or misinterpreted context. The tools are generating at speed. The guardrails haven’t kept up.

[Chart: Top bottleneck]

Teams are already spending 20% of their working week manually verifying AI-generated tests and code, and the share of respondents who say customers find most production bugs (35%) is unchanged from last year. Validation capacity is strained, and AI is generating faster than its output can be checked.

That imbalance has a real cost, and the most visible one is customer experience. Bugs that reach production erode the trust customers place in a product, and that trust is far harder to rebuild than it is to protect.

The data points consistently in one direction: teams are already struggling to validate AI-generated code within their existing release schedules, and that pressure is only growing.

The Maintenance Wall

ph_image-broken-duotone

Test maintenance has been a significant pain point in every edition of this report. In 2026, it is getting worse, and the accelerating pace of AI-generated code means that teams still relying largely on manual maintenance processes will fall further behind.

For the second consecutive year, respondents ranked test maintenance as their most significant testing challenge, ahead of fixing defects, test analysis, and adding test coverage. This year, 41% named it their top pain point, a 6-point increase over 2024. As AI makes it faster and easier to generate new tests, the maintenance burden that comes with them grows in proportion.

[Chart: Top testing pain points]

When an automated test breaks, only 11% of teams have systems in place to fix it without human intervention. The majority use AI-assisted suggestions that still require human approval (35%), semi-autonomous healing that requires a manual restart (29%), or a human rewriting the test entirely (25%). For teams in higher automation maturity tiers, fully autonomous healing reaches 30%, nearly 4x the rate of less mature teams. Those teams are also the ones reporting the highest customer satisfaction scores, the fastest release cycles, and the greatest gains from AI adoption overall.
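To make these tiers concrete, here is a minimal, hypothetical sketch of the AI-assisted tier: a locator with ranked fallbacks, where a working fallback is surfaced as a suggestion for human approval rather than silently rewritten. The HealableLocator type, the resolve function, and the dictionary standing in for a rendered page are illustrative assumptions, not any particular product’s API.

```python
from dataclasses import dataclass, field

@dataclass
class HealableLocator:
    """A selector plus ranked fallbacks; healing promotes a working fallback."""
    primary: str
    fallbacks: list[str] = field(default_factory=list)

def resolve(locator: HealableLocator, page: dict[str, str]) -> str:
    """Return the text behind the first selector present on the page.

    `page` is a stand-in for a live browser page: a plain mapping of
    selector -> element text, so this sketch runs without any framework.
    """
    for selector in [locator.primary, *locator.fallbacks]:
        if selector in page:
            if selector != locator.primary:
                # AI-assisted tier: surface a suggestion for human approval
                # instead of silently rewriting the test.
                print(f"heal suggestion: {locator.primary!r} -> {selector!r}")
            return page[selector]
    raise LookupError(f"no candidate matched {locator.primary!r} or fallbacks")

# Usage: the primary selector broke after a UI change; a fallback still works.
page = {"[data-testid=checkout-total]": "$90.00"}
total = resolve(
    HealableLocator("#total", fallbacks=["[data-testid=checkout-total]"]),
    page,
)
assert total == "$90.00"
```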

Beneath the maintenance burden lies a quieter risk: intent drift. When AI tools fix a broken test, they typically address the locator or execution path, not the underlying quality intent. A test can be “healed” back to a passing state while silently losing fidelity to what it was originally designed to verify. This is a form of technical debt that is nearly invisible until it surfaces in production. The industry does not yet have standard ways to measure it.
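A contrived sketch of that failure mode, reusing the dictionary-as-page convention from the example above: a locator-only heal rebinds the test to a different element and weakens the check, so the suite stays green while the behavior the test was written to catch stays broken.

```python
# Rendered page state with a broken coupon: the discount was never applied.
page = {"#total": "$100.00", "#total-after-discount": "$100.00"}

def test_discount_applied(page):
    # Original intent: the displayed total must reflect the 10% coupon.
    assert page["#total-after-discount"] == "$90.00"

def test_discount_applied_after_heal(page):
    # The healed test binds to a different element with a weaker assertion;
    # it passes even though the coupon logic is broken.
    assert page["#total"].startswith("$")

test_discount_applied_after_heal(page)  # passes: the original intent is lost
# test_discount_applied(page)           # would fail, surfacing the real bug
```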

33% of respondents note that manual review of AI-generated tests is already the top barrier to scaling their test suites. At AI-generated volumes, the gap between what passes and what is actually correct becomes a critical risk.

The teams that are pulling ahead are the ones building systems to capture and enforce test intent, not relying on practitioners to manually catch drift every time a test is healed.
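As a hedged sketch of what capturing intent could look like, continuing the same hypothetical example: the test’s purpose is recorded as a machine-checkable invariant, and a proposed heal is accepted only if that invariant still holds against the page state. The TestIntent type and accept_heal gate are illustrative, not a description of any existing tool.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestIntent:
    """A machine-checkable record of what a test was written to verify."""
    description: str
    invariant: Callable[[dict], bool]  # evaluated against page state

# Captured once, when the test is authored, alongside the test itself.
discount_intent = TestIntent(
    description="checkout total reflects the 10% coupon",
    invariant=lambda page: page.get("#total-after-discount") == "$90.00",
)

def accept_heal(healed_test_passes: bool, intent: TestIntent, page: dict) -> bool:
    """Gate a proposed heal: a green result alone is never enough.

    The repaired test must pass AND the declared invariant must still hold,
    so healing cannot quietly trade correctness for a passing status.
    """
    return healed_test_passes and intent.invariant(page)

page = {"#total": "$100.00", "#total-after-discount": "$100.00"}
assert accept_heal(True, discount_intent, page) is False  # heal rejected
```

The design choice in this sketch is that the declared intent, not the test’s pass/fail status, is the gate on every heal, which is what keeps drift from accumulating silently.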

The Transparency Tax

AI adoption in testing is not being held back by resistance to change. What the data reflects is an industry still working out what responsible AI adoption in testing actually looks like: specifically, how to maintain the visibility and auditability that quality and compliance standards demand when autonomous systems are making consequential decisions about what passes and what fails.

Quality and security concerns tie as the top barriers to AI adoption in testing, each cited by 32% of respondents. Lack of trust in AI follows at 29%. These aren't vague anxieties. They reflect a real operational concern: organizations need to be able to explain, audit, and stand behind the decisions their testing tools are making, especially in regulated environments or customer-facing applications.

[Chart: 72% of development teams use AI in their workflows]
[Chart: 32% cite quality and security concerns as top barriers to AI adoption in testing]

The adoption gap between development and QA teams reflects this dynamic clearly. While 72% of development teams report using AI in their workflows, only 57% of QA teams say the same. When the tools accelerating code creation are outpacing the tools validating it, the result is a widening gap between how fast software is built and how confidently it can be shipped.

10% of respondents report that corporate policy currently prohibits AI adoption in their testing workflows altogether. The question for these organizations is not whether AI will eventually reach them. It is whether they will be in a position to adopt it effectively when it does, or whether they will spend the intervening time falling further behind.

Looking Forward

ph_rocket-duotone

Where investments are going, and where they need to go

The investment intent reflected in this year's data is meaningful, but what matters more than the size of the budget is where it goes. The organizations best positioned for the next wave of AI adoption are not simply spending more; they are spending deliberately, prioritizing the kind of autonomous testing infrastructure, governance tooling, and quality strategy that allows AI to be adopted at scale without sacrificing the reliability and auditability that enterprise software demands.

The risk lies in the balance of that spending. The data suggests that investment continues to favor code generation over quality validation, a ratio that compounds the problems documented throughout this report. The teams pulling ahead are investing in the outer loop at the same rate as the inner one: autonomous quality infrastructure, transparent AI-driven testing, and quality strategy that scales at the pace of development.

The QA role is evolving in real time to meet this moment. Practitioners are moving away from manual test execution and toward judgment-based work: exploratory testing, risk assessment, and quality strategy. That evolution is real, and it matters. The organizations navigating it most deliberately, with the right tools, the right team structures, and the right investment balance, are the ones the data consistently shows performing best on every dimension that matters: speed, stability, and customer satisfaction.

Quality is no longer a checkpoint at the end of a release cycle. It is the foundation on which everything else in the AI-driven development era is built.

 
