This morning, Victor from PayPal and I got into an exchange on Twitter regarding the ChromeVox extension. ChromeVox is a Chrome extension which provides screen reading functionality for blind users. Through keyboard commands, the user can navigate page content at different levels, such as object by object, heading by heading, form control by form control, etc.

Wait, you might say, but this is screen reader functionality I know from NVDA or JAWS on Windows, or VoiceOver on the Mac! And you’re absolutely right! The difference is that ChromeVox is working in the web content area of Chrome only, not in the menus or tool bars.

That in itself is not a bad thing at all. The problem comes with the way ChromeVox does what it does. But in order to understand that, we need to dive into a bit of history first.

Back in the late 1990s, the web became more important every day, and Windows was the dominant operating system. With the big browser war coming to an end with Microsoft’s Internet Explorer as the great winner, it was apparent that, in order to make the web accessible to blind users and those with other disabilities, assistive technologies had to deal with IE first and foremost. Other browsers like Netscape quickly lost relevance at that time in this regard, and Firefox wasn’t even thought about yet.

Microsoft took sort of a mixed approach. They were bound by their own first implementation of an accessibility API, called Microsoft Active Accessibility, or MSAA. MSAA had some severe limitations in that, while it knew very well how to make a button or checkbox accessible, it had no concept of document-level accessibility or concepts such as headings, paragraphs, hyperlinks etc. Microsoft managed to evolve MSAA bit by bit over time, giving it more role mappings (meaning different types of content had more vocabulary in saying what they were), but there were still severe limitations when it came to dealing with actual text and attributes of that text.

But screen readers wanted more than just knowing paragraphs, links, form fields and buttons. So the Windows assistive technologies were forced to use a combined approach of using some information provided by MSAA, and other information provided by the browser’s document object model (DOM). There were APIs to get to that DOM in IE since version 5.0, and for the most part, the way rich internet information is accessed in IE by screen readers has not changed since 1999. Screen readers still have to scrape at the DOM to get to all the information needed to make web content accessible in IE.

In 2001, Aaron Leventhal of Netscape, later IBM, started working on a first implementation of accessibility in the Mozilla project, which later became Firefox. To make it easier for assistive technology vendors to come aboard and support Mozilla in addition to IE, a decision was made to mimic what IE was exposing. That part of the interface is still in Firefox today and is used by some Windows screen readers, although we nowadays strongly evangelize for use of a proper API and hope to deprecate what we call the ISimpleDOM interfaces in the future.

In 2004/2005, accessibility people at Sun Microsystems and the GNOME foundation, as well as other parties, became interested in making Firefox accessible on the Linux desktop. However, this platform had no concept similar to the interfaces used on Windows, and a whole new and enhanced set of APIs had to be invented to satisfy the needs of the Linux accessibility projects. Other software packages like much of the GNOME desktop itself, and OpenOffice, also adopted these APIs and became accessible. While some basic concepts are still based on the foundation laid by MSAA, the APIs on Linux quickly surpassed these basics on a wide scale.

Around the same time, work began on the NVDA project, the first, and to date only, open-source screen reader on Windows. The NVDA project leaders were interested in making Firefox accessible, giving users a whole open-source combination of screen reader and browser. However, they were not planning on building the kind of scraping technology into NVDA that was used by other Windows screen readers, but wanted API-level access to all content right from the start. Out of this requirement, an extension to MSAA, called IAccessible2, was born, which enhanced MSAA with stuff already present in the Linux accessibility APIs. As a consequence, the two are very similar in capability and nomenclature.

In parallel to that, Apple had been developing their own set of APIs to make OS X accessible to people with visual impairments. Universal Access and the NSAccessibility protocol are the result, accessibility from an API level that also does not require the screen reader to scrape video display content to get to all the information. This protocol is in many ways very different in its details, but offers roughly the same capabilities.

Within Firefox, this meant that gaps which could previously only be plugged by using the browser DOM directly needed to be closed with proper API mappings. Over time, these became very rich and powerful. There is a platform-independent layer with all capabilities, and platform-specific wrappers on top which abstract and, on occasion, slightly modify the exposed information to make it suitable for each platform. The JavaScript bridges in both Firefox for Android and Firefox OS, which talk to TalkBack and a speech synthesizer respectively, use the platform-independent layer to access all information. Whenever we find the JavaScript code needs to access information from the DOM directly, we halt and plug the hole in the platform-independent APIs instead, since there will no doubt be a situation where NVDA or Orca could also run into that gap.
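To make the layering a bit more tangible, here is a minimal sketch in Python. It is not how Firefox is implemented; the `CoreAccessible` class, the `expose` function and the mapping table are names I made up for illustration, and the table only lists a handful of roles as the respective platform APIs spell them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreAccessible:
    """Platform-independent description of one node (hypothetical shape)."""
    role: str
    name: str


# Illustrative and incomplete: how each platform API names the same role.
PLATFORM_ROLES = {
    "ia2": {
        "button": "ROLE_SYSTEM_PUSHBUTTON",
        "link": "ROLE_SYSTEM_LINK",
        "heading": "IA2_ROLE_HEADING",
    },
    "atk": {
        "button": "ATK_ROLE_PUSH_BUTTON",
        "link": "ATK_ROLE_LINK",
        "heading": "ATK_ROLE_HEADING",
    },
    "mac": {
        "button": "AXButton",
        "link": "AXLink",
        "heading": "AXHeading",
    },
}


def expose(node, platform):
    """Thin wrapper: translate the core node into one platform's vocabulary."""
    return {"role": PLATFORM_ROLES[platform][node.role], "name": node.name}


heading = CoreAccessible(role="heading", name="Inbox")
for platform in ("ia2", "atk", "mac"):
    print(platform, expose(heading, platform))
```

The point of the design is that the interpretation of the page happens exactly once, in the core layer. The wrappers only translate vocabulary, so every consumer sees the same heading with the same name.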

So to recap: In IE, much information is gathered by looking at the browser DOM directly, even by NVDA, because there is no other way. In Firefox, some of the more legacy screen readers on Windows also use this technique, which Firefox provides as a compatibility measure, but all newer implementations like NVDA use our IAccessible2 implementation and no DOM access to give users the web experience.

Safari on OS X obviously uses the Apple NSAccessibility protocol. It has since been discontinued on Windows, and never had much MSAA support to speak of.

Google Chrome also exposes its information through Apple’s NSAccessibility protocol on OS X, and uses MSAA and IA2, at least to some degree, on Windows.

And what does ChromeVox use?

Here’s the big problem: ChromeVox uses DOM access exclusively to provide access to web content to blind users. It does, as far as I could tell, not use any of Chrome’s own accessibility APIs. On the contrary: The first thing ChromeVox does is set aria-hidden on the document node to make Chrome not expose the whole web site to VoiceOver or any other accessibility consumer on OS X or Windows. In essence, both Chrome and ChromeVox perform their own analysis of the HTML and CSS of a page to make up content. And the problem is: They do not match. An example is the three popup buttons at the top of Facebook where the number of friend requests, messages, and notifications are displayed. While Chrome exposes this information correctly to VoiceOver, ChromeVox only reads the button label if there is a count other than 0. Otherwise, the buttons sound like they are unlabeled.
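How can two interpreters of the same markup end up disagreeing? The following toy sketch in Python shows one way. To be clear, this is purely hypothetical: I do not know what rule ChromeVox actually applies to those Facebook buttons, and neither function below is Chrome’s or ChromeVox’s real code. The element dictionaries and both function names are invented for illustration.

```python
def name_via_api(element):
    """Interpretation A: prefer an explicit aria-label, else fall back to text."""
    return element.get("aria-label") or element.get("text", "")


def name_via_scraping(element):
    """Interpretation B: a scraper that only considers the element's text."""
    return element.get("text", "")


# The same hypothetical button, once with a count and once without.
with_count = {"aria-label": "Messages", "text": "3"}
without_count = {"aria-label": "Messages", "text": ""}

for button in (with_count, without_count):
    print(repr(name_via_api(button)), repr(name_via_scraping(button)))
```

With a count present, both interpretations at least announce something, although not the same thing. Without a count, interpretation B returns an empty string, and the button sounds unlabeled, while interpretation A still says “Messages”. Two analyses of one page only have to differ in one small rule for users to get two different experiences.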

In my half hour of testing, I found several pages where there were these inconsistencies between what Chrome exposes and what ChromeVox reads to users. An example quite to the contrary is the fact that Google Docs is only accessible if one uses Chrome and ChromeVox. What Chrome exposes to VoiceOver or NVDA is not sufficient to gain the same level of access to Google Docs.

If you are a web developer, you can imagine what this means! Even if you go through the trouble of testing your web site with Chrome and Safari to make sure they expose their information to VoiceOver, it is not guaranteed that ChromeVox users will benefit, too. Likewise, if you use ChromeVox exclusively, you have no certainty that the other APIs are able to cope with your content.

Web developers on Windows have also learned these lessons the hard way with different screen readers in different browsers: Because with IE, each screen reader is forced to do its own interpretation of HTML, at least on some level, results will undoubtedly differ.

There is a harmonization effort going on at the W3C level to make sure browsers interoperate on what they expose on each platform for different element types. However, if prominent testing tools like ChromeVox or some legacy screen readers on Windows hang on to using their own HTML interpretation methods even when there are APIs available to provide them with that information, this effort is made very difficult, and a big extra burden is put on those web developers who are making every effort to make their sites or web apps accessible.

When we started work at Mozilla to make both Firefox for Android and Firefox OS accessible, we made a conscious decision that these platforms needed to use the same API as the desktop variants. Why? Because we wanted to make absolutely sure that we deliver the same kind of information on any platform for any given HTML element or widget. Web developers can count on the fact that we at Mozilla will always ensure that if your stuff works in a desktop environment, it is highly likely that it will also speak properly on Firefox OS or through TalkBack in Firefox for Android. That is why our JS bridge does not use its own DOM access methods, so there are no interpretation gaps and API diversities.

And here’s my plea to all who still use their own DOM scraping methods on whichever platform: Stop using them! If you have an API made available by the browser, use that whenever possible! You will make your product less prone to breakage from changes or additions in the HTML spec and supported elements. As an example: Does anyone remember earlier versions of the most prominent screen reader on Windows which suddenly stopped exposing certain Facebook content because Facebook started using header and footer elements? The screen reader didn’t know about those and ignored everything contained within the opening and closing tags of those elements. It required users to buy an expensive upgrade to get a fix for that problem, if I remember correctly. NVDA and Orca users with Firefox, on the other hand, simply continued to enjoy full web access to Facebook when this change occurred, because Firefox accessibility already knew how to deal with the header and footer elements and told NVDA and Orca everything they needed to know.
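The header and footer story can be reproduced in a few lines. The sketch below, in Python, is my own reconstruction of the failure mode as described above and is not taken from any real screen reader: `NaiveScraper`, `scrape` and the `KNOWN_TAGS` list are hypothetical. The scraper only knows a fixed set of elements and skips everything inside an element it has never heard of. It is deliberately simplistic and, for instance, does not handle void elements such as `br` inside an unknown element.

```python
from html.parser import HTMLParser

# Hypothetical: the elements a scraper shipped before HTML5 might know about.
KNOWN_TAGS = {"html", "body", "div", "p", "a", "h1", "h2", "span"}


class NaiveScraper(HTMLParser):
    """Collects text, but skips everything inside elements it does not know."""

    def __init__(self):
        super().__init__()
        self.unknown_depth = 0  # greater than 0 while inside an unknown element
        self.spoken = []

    def handle_starttag(self, tag, attrs):
        # Once inside an unknown element, every nested element is skipped too.
        if self.unknown_depth or tag not in KNOWN_TAGS:
            self.unknown_depth += 1

    def handle_endtag(self, tag):
        if self.unknown_depth:
            self.unknown_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.unknown_depth:
            self.spoken.append(text)


def scrape(markup):
    """Return the list of text fragments the naive scraper would read out."""
    scraper = NaiveScraper()
    scraper.feed(markup)
    scraper.close()
    return scraper.spoken


old_page = "<body><div><h1>News Feed</h1></div><p>A status update</p></body>"
new_page = "<body><header><h1>News Feed</h1></header><p>A status update</p></body>"

print(scrape(old_page))  # the heading is read
print(scrape(new_page))  # the heading silently disappears
```

Nothing about the content changed between the two pages, only the wrapper element. A consumer of the browser’s accessibility API never sees that difference, because the browser already knows what a header element is.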

On the other hand, if you are using DOM scraping because you find that something is missing in the APIs provided by the browser, work with the browser vendor! Identify the gaps! Take the proposals to close those gaps to the standards bodies at the W3C or the HTML accessibility task force! If you’re encountering that gap, it is very likely others will, too!

And last but not least: Help provide web developers with a safer environment to work in! Web apps on the desktop and mobile operating systems are becoming more and more important every day! Help to ensure people can provide accessible solutions that benefit all by not making their lives unnecessarily hard!