Sometimes you have to use illegal WAI-ARIA to make stuff work

In this blog post, I’d like to recap an experience I just had while trying to apply some accessibility enhancements to the NoodleApp client.

The problem

NoodleApp uses keyboard shortcuts to allow users to switch back and forth between posts, messages etc. that are displayed on the screen. Using the j and k keys, one can move down and up through the lists respectively. However, this will only change a visual indicator, done in CSS, but not give any indication that a real focus change occurred. If one presses tab, for example, focus will move to the next item depending on where keyboard focus last was, and not where the j and k shortcuts took the user.

This is not new: Twitter uses similar shortcuts, too, and even GMail has them, allowing to move among message threads.

All of these implementations, however, only change a visual indicator. They neither adjust keyboard focus, nor do they communicate a focus change to screen readers. In addition, at least the screen readers on Windows would not immediately be able to use these keyboard shortcuts anyway, since their virtual buffers and quick navigation keys would be captured before they reached the web application.

The easy part

The easy part of the solution to the problem is this:

  1. Add tabindex=”0″ to the ol-element that comprises the whole list to make it keyboard focus-able and include it in the tab order at the order determined by the flow of elements. Since its default location is appropriate, 0 is the correct value here.
  2. Add tabindex=”-1″ to each child li element, of which each contains a single post. This is so they become focus-able, but are not included in the tab order individually. Such an extra tab stop is unnecessary here.
  3. Add a .focus() call to the next reachable message, which is determined in handling the j and k keys, to set focus to the element, or the first post when none is not focused yet, but the user presses j or k.

These give us keyboard focus-ability. What one can do now is press j or k to move through the list of posts, and then press tab to actually move into the post details and onto items such as the user name link or one of the reply, re-post etc. actions. Very handy if one only uses the keyboard to work NoodleApp.

The tricky, AKA screen reader part

All the above does not yet give us any speech when it comes to screen readers. Generic HTML list items, normally non-focus-able, are not something screen readers would speak on focus. Moreover, the list would not be treated as a widget anyway yet.

The latter is easily solved by just adding an appropriate role=”listbox” to the ol element we already added the tabindex+”0″ attribute to above. This causes screen readers on Windows to identify this list as a widget one can enter focus or forms mode on, allowing keys to pass directly to the browser and web app instead of being captured by the screen reader’s virtual buffer.

And here is where it gets nasty. According to the documentation on the listbox role, child elements have to be of role option.

OK, I thought. Great, let’s just add role=”option” to each li element, then.

In Firefox and NVDA, this worked nicely. Granted, there was not any useful speech yet, since the list item spoke all text contained within, giving me the user name and such a couple of times, but hey, for a start, that was not bad at all! NVDA switched into focus mode when it was supposed to, tabbing gave me the child accessibles, all was well.

And then came my test with Safari and VoiceOver on Mac OS X.

And what I found was that role=”option”, despite it being said that this could contain images, caused all child items to disappear. The text was concatenated, but the child accessibles were all flattened straight. Tabbing yielded silence, VoiceOver could interact with text that it then found was not there anyway, etc., etc.

So, while my solution worked great on Windows with Firefox and NVDA, Safari and VoiceOver, a popular combination among blind people, failed miserably.

The solution

I then tried some things to see what effect they would have on VoiceOver>

  • I just added an aria-label to the existing code to see if that would make things better. It did not.
  • I tried the tree and treeitem roles. Result: List was gone completely. Apparently VoiceOver and Safari do not support tree views at present.

Out of desperation, I then thought of the group role. Those list items are essentially grouping elements for several child widgets. So I changed role=”option” to role=”group” and made an aria-label (the name has to be specified by the author) containing the user name, post text and relative time stamp.

And miraculously, it works! It works in both Firefox and NVDA, and Safari and VoiceOver combinations. Screen reader users now get speech when they navigate through the list with j and k, after they have switched their screen reader to focus or forms mode.

Yes, I know it is illegal to have group elements as child elements of a listbox role. But the problem is: neither WAI-ARIA nor HTML5 give me an equivalent to a rich list item known, for example, to XUL. And there is no other equivalent. Grid, gridrow, treegrid, rowgroup etc. are all not applicable, since we are not dealing with tabular, editable content.

Moreover, I cannot even be sure which of the browser/screen reader combinations is right with regards to flattening or not flattening content in role=”option”. The spec is not a hundred percent clear, so either could be right or wrong.

So, to have a solution that works now in popular browser/screen reader combinations, I had to resort to this admittedly illegal construct. Fortunately, it works nicely! Next step is obviously to advocate for a widget type either in HTML or WAI-ARIA that is conceptually an option item, but can hold rich compound child content.

What am I doing this for, anyway?

You may ask yourself: “If I can just read through with my virtual cursor, wyh do I want to use the other navigation method?”

The answer is: Yes, you can read through your list of posts using the virtual cursor. Problem is: Once the view is refreshed, either because you’ve reached the bottom and it loads older posts, or because there were new posts arriving at the top and the view needed refreshing, you lose your place. Using forms/focus mode and the j and k keys will remember your choice even if you load older posts or newer posts arrive after you started reading. You can also use other quick keys like r to reply, f to follow a user, and more, documented at the top of the page of NoodleApp once you open it for the first time. This is not unimportant for efficient reading, to have a means for the screen reader to keep track of where you are. And the virtual buffer concept does not always make this easy with dynamic content.

If you have suggestions

Please feel free to comment if you feel that I’m going about this the wrong way altogether, or if you think there are existing roles more suitable for the task than what I’ve chosen. Just remember that it has to meet the above stated criteria of focus-ability and interact-ability on the browser, not the virtual buffer level.

The code

If you’re interested in looking at the actual code, my commit can be found on Github.

If you use the WAI-ARIA role “application”, please do so wisely!

This goes out to all web developers out there reading this blog and implementing widgets and other rich content in HTML, CSS and JavaScript! If you think of using the WAI-ARIA role “application” in your code, please do so with caution and care! And here’s why:

What is it?

“application” is one of the WAI-ARIA landmark roles. If you’d like to read up on landmarks, please go here. It is used to denote a region of a web application that is to be treated like a desktop application, not like a regular web page. In other words, if you use something that is not part of standard HTML, like a mashup widget, and your page has no resemblance to a classic document in, roughly, over 90% of its content, “application” is for you. If you, on the other hand, make up a user interface solely of elements that are part of standard HTML like selects, check boxes, text boxes etc., and in addition use only common compound widgets and lots of document-like content like hyperlinks, you will most probably not want to use “application” because browsers and assistive technologies provide a standard interaction model for these already and don’t need special support from you in that.

Why use it at all?

Traditionally, assistive technologies like screen readers for the blind convert a page’s content into a format that is easier for a person with a disability to comprehend. A two-column newspaper style text, for example, is reformatted so that the text flows from beginning to end like it would in a single-column document. Multi-column layouts of pages are streamlined so that there’s a structured flow a person who, for example, cannot see the screen, can understand.

For this, screen readers especially on Windows have adopted a two-part user interaction model:

virtual cursor or browse mode
This is the mode screen readers especially on Windows operate in when the user browses a web page. The term virtual cursor has been used since its inception in 1999 because this feels to the user like a document in, for example, WordPad or MS Word. The user walks the document using the arrow keys and has the text to them read via the speech synthesizer. In addition, semantic information is spoken to indicate whether a particular piece of text is a link, graphic, form element, part of a data table structure, list etc. In addition, several keys are captured by the assistive technology and are not processed by the browser. These allow navigation by headings, lists, links, tables, form elements and others. Usually, these are done via single letters. The visual focus may or may not follow the virtual cursor onto focusable items, depending on the assistive technology in use and its settings.
Forms mode or focus mode
This is a mode where the user interacts with elements that accept a form of data entry. This may be via entering text, checking a check box, selecting one of several possible radio buttons, or making a selection in a select element. This mode is either invoked by putting the virtual or browse focus onto such an element and pressing a key (usually Enter), or by the assistive technology switching to this mode automatically when the virtual focus encounters such an element. Others may only activate this mode automatically when specifically using the tab key. In this mode, all keys are passed through to the browser. It is as if you were sitting in front of your browser using it with the keyboard, and don’t have a screen reader running. Likewise, if the application focus leaves such an element that supports or requires direct keyboard interaction, if it was switched on automatically, it will be switched off and the user returned to browse/virtual mode. Note: Some elements like buttons and links do not require the user to switch into focus or forms mode, because for efficiency, screen readers allow activation of these elements directly from virtual/focus modes.

The challenge is that you may be creating widgets that require you to force the user into direct interaction with the browser. You know that your widget can best be used via the keyboard if the user is not in virtual mode. In addition, you know that you have no classic document content to display, but only use widgets and provide all necessary context through them. This is what role “application” is for. It is under your control whether the user is being thrown into focus mode once your widget gains keyboard focus. Also, contrary to standard focus mode, if an assistive technology encounters an area that is marked up with role “application”, it is usually not so easy to manually exit out of that mode to review the surrounding content in browse mode.

So when should I use it, and when not?

You do not use it if a set of controls only contains these widgets, that are all part of standard HTML. This also applies if you mark them up and create an interaction model using WAI-ARIA roles instead of standard HTML widgets:

  • text box. This also applies to password, search, tel and other newer input type derivates
  • textarea
  • check box
  • button
  • radio button (usually inside a fieldset/legend element wrapper)
  • select + option(s)
  • links, paragraphs, headings, and other things that are classic/native to documents on the web.

You also do not use the “application” role if your widget is one of the following more dynamic and non-native widgets. Screen readers and other assistive technologies that support WAI-ARIA will support switching between browse and focus modes for these by default, too:

  • tree view
  • slider
  • table that has focusable items and is being navigated via the arrow keys, for example a list of e-mail messages where you provide specific information. Other examples are interactive grids, tree grids etc.
  • A list of tabs (tab, tablist) where the user selects tabs via the left and right arrow keys. Remember that you have to implement the keyboard navigation model for this!
  • dialog and alertdialog. These cause screen readers to go into a sort of application mode implicitly once focus moves to a control inside them. Note that for these to work best, set the aria-describedby attribute of the element whose role is “dialog” to the id of the text that explains the dialog’s purpose, and set focus to the first interactive control when you open it.
  • toolbar and toolbarbuttons, menus and menu items, and similar

You only want to use role “application” if the content you’re providing consists of only interactive controls, and of those, mostly advanced widgets, that emulate a real desktop application. Note that, despite many things now being called a web application, most of the content these web applications work with are still document-based information, be it Facebook posts and comments, blogs, Twitter feeds, even accordeons that show and hide certain types of information dynamically. We primarily still deal with documents on the web, even though they may have a desktop-ish feel to them on the surface.

In short: The times when you actually will use role “applications” are probably going to be very rare cases!

So where do I put this thing?

Put it on the closest containing element of your widget, for example the parent div of your element that is your outer most widget element. If that outer div wraps only widgets that need the application interaction model, this will make sure focus mode is switched off once the user tabs out of this widget.

Only put it on the body element if your page consists solely of a widget or set of widgets that all need the focus mode to be turned on. If you have a majority of these widgets, but also have something you want the user to browse, use the role “document” on the outer-most element of this document-ish part of the page. It is the counterpart to “application” and will allow you to tell the screen reader to use browse mode for this part. Also make this element tabbable by setting a tabindex value on it so the user has a chance to reach it. As a rule of thumb: If your page consists of over 90 or even 95 percent of widgets, role “application” may be appropriate. Even then, find someone knowledgeable who can actually test two versions of this: One with and one without role “application” set to see which model works best.

NEVER put it on a widely containing element such as body if your page consists mostly of traditional widgets or page elements such as links that the user does not have to interact with in focus mode. This will cause huge headaches for any assistive technology user trying to use your site/app.

Some examples

The behavior that originally prompted me to write this is the newest version of the layout of Gmail. It looked to me like Gmail treats the whole thing as role “application”, causing the user to tab a zillion times before actually getting to the inbox message table. It turned out to be a bug which has been haunting us for a few months now, where parts of the accessible tree (the thing we create from HTML and CSS) gets detached from the rest, causing a very similar behavior in screen readers like role “application”. If virtual cursor was still active, the users could press t in their screen reader once and jump to that table right away instead of tabbing 30 or 40 times. A temporary solution to this problem seems to be this: Google provide us with keyboard shortcuts j and k. These can be disabled in the account settings of Gmail. That makes the bug disappear. Had Google used role “application” here, it would have been inappropriate. Gmail is, below the surface, still strongly document based and thus many traditional interaction modes do apply here.

An example where roles “application” and “document” are really used wisely is the current Yahoo! mail web interface. The table where the messages are being displayed in a list is marked up to be an “application”, because the arrow keys are used to navigate between messages, enter opens one etc. Once a mail is being displayed, everything around the actual mail header and text is an application, but the mail header and text are a document, so role “document” is used and initial focus is put there so the user can immediately start reading their mail in a familiar browsing fashion.

funny disclaimer: Yahoo! do not pay me for constantly calling them out as a good example of an accessible web app. They just did a bang-up job, that’s all! 🙂

Closing comments

If you have questions about this, feel free to post them here on the blog, I’ll do my best to answer them and also incorporate answers to updates to this post. You can also find the Google Group free-aria and discuss questions there with other web developers, standards experts and users. This is an important topic that, if done right, can provide all your users with a rich and pleasant experience, but if done wrong, can also cause headaches, complaints, and cause your web app to be less likeable by a sizeable number of users.

So use role “application” only if you absolutely have to, and tests show that this provides the better interaction model! Use it wisely when you do it, it’s worth the effort!

Update February 7

I’ve incorporated comments that were posted to the original version of this blog post into the post, making the statements even stronger hopefully. Thanks to Jamie for pointing out a couple of very important points that make the point against using role “application” in most cases even stronger!

Update Feb 9, 2012

Google’s accessibility team contacted me, stating that they actually don’t use role “application” in Gmail. After further investigation, it turns out that this seems to be another case of a bug we’ve been hunting for months, but which the main person who can fix it, cannot reproduce. I’ve thus updated the example section with the appropriate info. Thanks to Google for getting back to me and providing these details!