What Your Scanner Report Is Actually Telling You.

March 2026

Somewhere in your organization right now, there is a report with a number on it. It might say 47 issues. It might say 312. It was generated by an automated accessibility scanner, and the team that received it is treating it as a verdict. They are fixing the items on the list, checking them off, and preparing to report that the work is done.

The problem is not what the scanner found. The problem is what it didn't.

Understanding the gap between scanner output and a genuine accessibility evaluation is not a technical nuance. It is the difference between a defensible compliance posture and the kind of confidence that collapses the moment it meets a demand letter, a procurement questionnaire, or an actual user with a disability trying to complete a task on your site.

What Scanners Can and Cannot See

By the Numbers

  • WCAG 2.1 AA success criteria: 50 total (30 Level A + 20 Level AA)
  • Criteria testable by automation with high accuracy: approximately 13%
  • Criteria that cannot be detected by scanners at all: approximately 42%
  • Share of total issues found by automated tools (Deque study): 57% by volume

Automated accessibility tools work by parsing the DOM of a web page and checking it against a library of predefined rules. They can measure color contrast ratios mathematically, detect missing alt attributes on images, flag empty links, identify duplicate IDs, and verify that a document language is declared. These are binary, structural checks: the attribute is present or it is not. The ratio meets the threshold or it does not.
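The contrast check is a good illustration of why these rules are mechanical: WCAG 2.1 defines relative luminance and the contrast ratio numerically, so the test reduces to arithmetic. A minimal Python sketch of that calculation (illustrative, not any particular engine's implementation):

```python
def relative_luminance(rgb):
    """WCAG 2.1 relative luminance for an sRGB color given as (r, g, b) in 0-255."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); ranges from 1:1 to 21:1."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white: the maximum possible ratio, 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0

# Mid-gray (#777777) on white sits just below the 4.5:1 AA threshold
# for normal-size text, so a scanner flags it deterministically.
print(contrast_ratio((119, 119, 119), (255, 255, 255)) >= 4.5)  # False
```

The ratio either meets the threshold or it does not, which is exactly why automation handles this criterion so well.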

Research from Deque Systems, the company behind the widely used axe-core testing engine, found that automated scanning can test approximately 30% of WCAG success criteria: 15 of the 50 in the WCAG 2.1 AA standard.

A separate analysis from Accessible.org found that only about 13% of criteria can be flagged with high accuracy, meaning minimal false positives. Another roughly 45% of criteria fall into a gray zone where scanners can offer partial signals but cannot make a reliable determination. And approximately 42% of WCAG criteria cannot be detected by automated tools at all because they require human judgment about context, intent, or user experience.

A scanner can tell you that an image has an alt attribute. It cannot tell you whether the alt text is accurate, useful, or meaningful in context. The FTC's 2025 enforcement action against overlay vendor accessiBe documented exactly this kind of failure: one example cited an image of filet mignon on a ceramic plate that the AI-powered tool had labeled as "brown bread on white ceramic plate."

The alt text existed. It was wrong. A scanner would have marked it as passing.
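That failure mode falls naturally out of how presence-only rules are written. A minimal sketch using Python's standard-library HTML parser (illustrative, not any real scanner's rule) shows the documented failure sailing through, while a genuinely missing attribute is caught:

```python
from html.parser import HTMLParser

class AltPresenceChecker(HTMLParser):
    """A presence-only rule in the style of automated scanners: flag <img>
    elements that lack an alt attribute, without judging the text itself."""
    def __init__(self):
        super().__init__()
        self.failures = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.failures.append(self.getpos())

# The alt text here is wrong (the image is filet mignon), but because
# the attribute exists, the structural check reports no failure.
checker = AltPresenceChecker()
checker.feed('<img src="entree.jpg" alt="brown bread on white ceramic plate">')
print(checker.failures)  # [] -- inaccurate alt text passes

# A missing attribute, by contrast, is trivially detectable.
missing = AltPresenceChecker()
missing.feed('<img src="logo.png">')
print(len(missing.failures))  # 1
```

Judging whether the alt text is accurate would require the checker to know what the image actually depicts, which is precisely the context a DOM-level rule does not have.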

The Volume Problem

Here is where the numbers get interesting and where organizations often misunderstand their own risk posture. Deque's large-scale study, covering over 2,000 audits across 13,000 pages and nearly 300,000 individual issues, found that automated testing identified 57% of total accessibility issues by volume.

That number is meaningfully higher than the widely cited 20 to 30% figure. The reason is distribution. Not all WCAG criteria produce issues at equal rates. A small number of issue types account for the vast majority of real-world failures. Deque found that the top five issue categories accounted for over 78% of all issues discovered, and color contrast alone represented roughly 30% of total findings. Since contrast is one of the criteria that automation handles well, the sheer volume of contrast errors inflates the percentage of issues that scanners can catch.

This is important to understand because it means two things simultaneously. First, scanners are genuinely useful. They catch high-frequency issues efficiently and at scale. Dismissing them entirely would be a mistake. Second, the issues that scanners miss are disproportionately the ones that matter most to actual users with disabilities.

Scanners excel at catching the issues that occur most frequently. They are largely blind to the issues that block access most completely.

A missing alt attribute appears hundreds of times across a large site and is trivially detectable. A keyboard trap in a modal dialog appears once and renders an entire workflow unusable. A screen reader announcing form fields in an illogical order creates confusion that no automated rule can measure. An error message that provides no guidance to someone who cannot see the visual layout of the page is invisible to every scanner on the market.

What Requires Human Judgment

The 42% of WCAG criteria that automation cannot touch require something no algorithm currently provides: the ability to evaluate whether content is understandable, whether navigation is consistent, whether error recovery is helpful, and whether the overall experience is usable for someone relying on assistive technology.

These include criteria like whether captions accurately represent audio content (not just whether they exist), whether audio descriptions convey the visual information a sighted user would receive, whether a site provides multiple ways to locate content, and whether error messages offer useful correction guidance.

These are judgment calls. They require understanding context, recognizing intent, and evaluating quality.

They also include keyboard operability in complex interactive patterns: custom dropdown menus, date pickers, modal dialogs, carousels, tab interfaces, and drag-and-drop interactions. A scanner can verify that a button element is focusable. It cannot verify that pressing Enter activates it, that Escape closes it, that arrow keys navigate within it, or that focus returns to the correct element when the interaction is complete. These are the interactions where real users get trapped, lost, or locked out entirely.
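The limit of static checking can be sketched in a few lines. The heuristic below is a deliberately simplified assumption about how a focusability rule might look (real engines also weigh disabled state, visibility, and more); the point is that everything it can verify is structural:

```python
# Natively focusable HTML elements (simplified list for illustration).
FOCUSABLE_TAGS = {"a", "button", "input", "select", "textarea"}

def is_focusable(tag, attrs):
    """Static focusability heuristic of the kind a scanner can apply."""
    if attrs.get("tabindex") == "-1":
        return False  # explicitly removed from the tab order
    return tag in FOCUSABLE_TAGS or "tabindex" in attrs

# A <div> styled as a button with tabindex="0" is focusable, so it passes...
print(is_focusable("div", {"tabindex": "0", "role": "button"}))  # True

# ...but nothing in a static rule can verify that a keydown handler maps
# Enter or Space to activation, that Escape closes the dialog it opens,
# or that focus returns to the trigger afterward. Those are runtime
# behaviors, observable only by operating the interface.
print(is_focusable("span", {}))  # False -- this much, a scanner can catch
```

The gap between "focusable" and "operable" is exactly where manual keyboard testing takes over.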

The Overlay Lesson

If there is a single case study that illustrates the danger of treating automated output as a compliance strategy, it is the FTC's action against accessiBe. In April 2025, the Federal Trade Commission finalized a $1 million penalty against the overlay vendor for falsely claiming that its AI-powered plug-in could make any website WCAG-compliant within 48 hours.

The FTC's complaint documented that the tool failed to make basic components like menus, headings, tables, and images accessible to people using assistive technology. The agency barred the company from claiming that any automated product can make a website WCAG-compliant or ensure continued compliance unless it has evidence to support those claims. The order remains in effect for twenty years.

The accessiBe case is not an outlier. It is part of a broader pattern.

According to data from UsableNet, 119 defendants were sued in May 2025 alone while using third-party accessibility widgets. The presence of an overlay did not prevent litigation. In many cases, it may have invited it, because the overlay created a visible indicator that the organization was aware of accessibility as an issue while the underlying experience remained inaccessible.

What Your Scanner Report Does Not Tell You

A scanner report is a signal layer, not a conclusion

Scanner output is the first pass, not the final word. It is useful for identifying the most common structural issues at scale: missing alt attributes, contrast failures, empty links, missing form labels, incorrect heading hierarchy, and absent document language declarations. The WebAIM Million report found that 96% of all detectable errors on the top one million homepages fall into just six categories. If your scanner report addresses those six categories, you have handled the high-frequency surface issues. You have not evaluated accessibility.

A low error count is not the same as low exposure

A scanner report showing zero errors does not mean zero accessibility issues. It means zero issues were found among the subset of criteria the tool can test. Google Lighthouse, which runs on the axe-core engine, can report a perfect score of 100. That score reflects the results of the tests it ran. It says nothing about the 42% of criteria it cannot evaluate at all. A clean scan is a necessary starting point. It is not a conclusion.

The gap between automated coverage and actual coverage is where risk lives

The organizations with genuinely defensible accessibility postures use scanners for what they do well: fast, repeatable detection of structural issues across large codebases. But the distance between what automation covers and what full conformance requires is significant. Deque's own research shows that combining automated testing with guided manual evaluation raises total coverage to approximately 80% of issues. The remaining 20% requires assistive technology testing and real-user feedback. The question is not whether your scanner is good. It is whether your interpretation of its output accounts for what it was never designed to find.

The highest-consequence issues are not always the most numerous

Scanner reports often sort findings by severity as defined by the tool's internal logic. That severity rating may not reflect the actual impact on a real user. A missing document language declaration is a WCAG failure. A keyboard trap in your checkout flow is a lawsuit. The number of issues in your report matters far less than which issues block access, which create legal exposure, and which affect the critical paths your users depend on.

The Real Question

The question your scanner report cannot answer is the one that matters: can a person with a disability use your site to accomplish the task they came to accomplish? That question requires expertise, methodology, and manual evaluation. It requires someone who understands not just the WCAG criteria but the assistive technologies that real users depend on and the interaction patterns that either support or block them.

Your scanner gave you a signal. The signal is not the conclusion. What you do with the space between the two is what determines your actual accessibility posture, and your actual risk.

Ready to understand where you stand?

Whether you need a baseline assessment or a second opinion on your current accessibility position, the next step starts here.
