Advanced ARIA Tip #2: Accessible modal dialogs

One question that has come up more and more in recent months is how to create an accessible modal dialog with WAI-ARIA. So without further ado, here’s my take on the subject!

An example

To lay the groundwork for some terms we’ll be using, let’s look at an example. A dialog is basically a sub-window of an application that asks a question, requires some input, or lets the user set options, etc. Desktop applications have dialogs of all shapes and sizes, but some unique properties distinguish them from other windows. The primary one is that a dialog usually cannot be moved out of the main application window. Another is that, while the dialog is open, the application around it usually becomes inert: you cannot do anything with it as long as the dialog is open. Such a dialog is what we call modal. Yes, there are also non-modal dialogs, but those behave more like different panels in the main application window than like a dialog in the classic sense of the word.

Let’s look at a simple example: Open your favorite word processor, type in something or paste your lorem ipsum, and then just close the window. You’ll get a dialog that consists of the following:

  • A title that contains the name of the application or some phrase like “Save the document”.
  • Text that states that you haven’t saved the document and what you want to do now.
  • A series of buttons like “Save”, “Don’t Save”, “Cancel” etc.

Screen readers will speak the title, the text, and the currently focused button automatically when you perform the above steps with one of them running. This is important, so keep it in mind: the screen reader reads the title and the text automatically.

In the following example, we’ll call the title the “label”, and the text the “description”.

The simple Confirmation dialog

In HTML5 markup, this dialog could look something like this.

<div id="myDialog" role="dialog" aria-labelledby="myTitle" aria-describedby="myDesc">
  <div id="myTitle">Save "untitled" document?</div>
  <div id="myDesc">You have made changes to "untitled.txt" that have not been saved. What do you want to do?</div>
  <button id="saveMe" type="button">Save changes</button>
  <button id="discardMe" type="button">Discard changes</button>
  <button id="neverMind" type="button">Cancel</button>
</div>

Let’s go through this step by step:

  1. The first line declares a container that encompasses the whole dialog. HTML5 does have a dialog element, but it is not yet supported everywhere, and browser implementations can be unpredictable, so we will not use it. Instead, we’ll rely solely on WAI-ARIA to give us the semantics we need. The role is “dialog”.
  2. The aria-labelledby attribute references the ID of our title. This is important, since it gives the accessible object for the dialog the proper title, or label, or accessible name, which is spoken first by the screen reader when focus moves into the dialog.
  3. The aria-describedby attribute associates the main dialog text with the dialog as well; it becomes the accessible description of the dialog object when the markup is mapped to the platform accessibility API. The accessible description (accDescription) is additional information that enhances the name and is spoken after name, role, and states. This concept goes back to the very early Windows days and is supported by all modern screen readers: if they find an accDescription on a dialog’s accessible object, they’ll speak it. Linux and OS X adopted the concept later.
  4. Below the divs that contain the title and description texts, there are three buttons. These are regular button elements with a type of “button”, so they are considered valid even outside of a form element.

From this, there are some rules to be deduced:

  1. If you use the role “dialog”, you must use aria-labelledby to point to the element containing the visible dialog title. The aria-labelledby attribute must be on the same HTML element as the role “dialog” attribute.
  2. If your dialog has one main descriptive text, you must use aria-describedby to point to its element, also on the same element that has role “dialog”. You can also point to multiple text elements by separating their IDs with spaces.

One note about Safari on OS X: In my testing, I found that the dialog text would not be read immediately. Safari sometimes has a tendency to map aria-describedby in a funny way that makes the description end up as help text, spoken after a delay of several seconds, or retrievable via Ctrl+Option+Shift+H. Apple should really change this to something that gets picked up as dialog text. The fact that VoiceOver reads dialogs correctly in, for example, TextEdit shows that the provisions are there in principle; the information just needs to be mapped correctly from Safari.

The heavy lifting

OK, that was the easy part. For the following, you must remember that WAI-ARIA merely provides semantic information. It tells the screen reader what the elements represent. It does not automatically introduce any particular behavior. You have to use JavaScript for that. The following sections describe what you need to do, and why.

Initial focus

When you open your dialog, for example by setting it from display: none; to display: block;, or by inserting its markup into the DOM, you have to give it keyboard focus. Focus should be on the first keyboard focusable element. In our example above, keyboard focus would go to the “Save changes” button. Opening the dialog will not cause the browser to set focus in there automatically, so you’ll have to do it manually in JS via the .focus() method. Setting focus will cause the screen reader to recognize the change of context and detect that focus is now inside a dialog, and will do its magic to read the title and description of the dialog automatically.
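In code, this could look something like the following, a minimal sketch using the IDs from the example above; it assumes the dialog was hidden with display: none; in CSS:

function openDialog() {
  // Make the dialog visible.
  document.getElementById("myDialog").style.display = "block";
  // Manually move keyboard focus to the first focusable element.
  // The screen reader will announce the dialog's label and
  // description before it speaks the focused button.
  document.getElementById("saveMe").focus();
}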

Trapping keyboard focus inside the dialog

Since your dialog is most likely modal, you will probably have something in place that ignores mouse clicks outside of it. You will have to do something similar for keyboard focus. The common way to move around controls is via the tab key. So you’ll have to make sure your keyboard focus never leaves the dialog. A few rules for this:

  1. Create a natural tab order by arranging elements in the DOM in a logical order. That will cause the tab order to flow naturally without you having to do much. If you have to use the tabindex attribute for anything, give it a value of 0 so the element fits into the natural tab order.
  2. On the very last focusable element of your dialog, add a keydown handler that traps the forward Tab key and redirects it to the first element of the dialog. Otherwise, Tab would jump outside the dialog into the surrounding document or the browser UI, and we don’t want that.
  3. Likewise, add a keydown handler to the very first focusable element inside the dialog that traps the Shift+Tab combination, so the backwards tab order doesn’t leave the dialog either; see the sketch after this list.
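Here is one way these two handlers could look, assuming the example markup from above (a minimal sketch, not the only possible implementation):

var firstButton = document.getElementById("saveMe");
var lastButton = document.getElementById("neverMind");

// Trap forward Tab on the last focusable element and wrap around.
lastButton.addEventListener("keydown", function (event) {
  if (event.key === "Tab" && !event.shiftKey) {
    event.preventDefault();
    firstButton.focus();
  }
});

// Trap Shift+Tab on the first focusable element and wrap backwards.
firstButton.addEventListener("keydown", function (event) {
  if (event.key === "Tab" && event.shiftKey) {
    event.preventDefault();
    lastButton.focus();
  }
});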

Provide an escape route

As is customary, dialogs usually offer a way to escape out of them without making changes. Often, this does the same as the Cancel button. You’ll have to provide the same functionality in your dialog. So, make sure pressing Escape closes your dialog, usually discarding any changes.
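For example, a keydown handler on the dialog container can catch Escape presses from anywhere inside the dialog (closeDialog is a hypothetical helper we’ll sketch in the next section):

document.getElementById("myDialog").addEventListener("keydown", function (event) {
  if (event.key === "Escape") {
    // Same as activating the Cancel button: discard changes and close.
    closeDialog();
  }
});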

Restoring focus after closing the dialog

Of course, Escape isn’t the only way to close your dialog. You would also provide a button to accept changes and execute whatever functionality you desire. In any case, after you close the dialog, you must, you absolutely must, set focus back to the element that opened the dialog, or to another useful element from which the user of your app will most likely continue working. Never ever let keyboard focus go into Nowhere Land after the user dismisses a dialog you opened. In the worst case, it lands on the document or somewhere else far away from where the user’s context was before they opened the dialog. There is nothing more frustrating than having to spend ten or more seconds trying to find one’s last place again from where one left off.
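Extending the openDialog sketch from earlier, here is a minimal way to remember and restore the triggering element (again just a sketch; how you track the trigger is up to your app):

var triggerElement = null;

function openDialog(trigger) {
  // Remember where the user came from.
  triggerElement = trigger;
  document.getElementById("myDialog").style.display = "block";
  document.getElementById("saveMe").focus();
}

function closeDialog() {
  document.getElementById("myDialog").style.display = "none";
  // Never let focus go into Nowhere Land: return it to the
  // element that opened the dialog.
  if (triggerElement) {
    triggerElement.focus();
  }
}

A click handler on the button that opens the dialog would then call openDialog(event.currentTarget), for example.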

Other dialog cases

Of course, your dialogs will most likely be more complex than what I showed above. Here are some more hints on what to do in certain situations.

A dialog with only a single button

Sometimes, you’ll create a dialog just to let users know that something completed successfully. Just follow the rules laid out above with focus on the single Close or OK button, and make sure the title and description are fully wired up via ARIA attributes as shown.

A dialog without descriptive text

If you don’t have general descriptive text, well then that’s that. Just make sure your form inputs are properly labeled as is customary for HTML forms, and if you group things together, use fieldset and legend elements to provide context. Also, of course, you can always use aria-describedby on individual inputs or other focusable elements to provide additional contextual information that you visually put outside the main label for that element or such. It all comes down to listening to your dialog and seeing if you can still understand what you’re asking of the user.
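For example, a fragment of a made-up preferences dialog could look like this, with the input labeled normally and extra context attached via aria-describedby:

<div role="dialog" aria-labelledby="prefsTitle">
  <div id="prefsTitle">Preferences</div>
  <fieldset>
    <legend>Startup behavior</legend>
    <label for="homePage">Home page:</label>
    <input type="text" id="homePage" aria-describedby="homePageHint">
    <div id="homePageHint">Leave empty to start with a blank page.</div>
  </fieldset>
  <button type="button">OK</button>
</div>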

Multi-tabbed dialogs

Of course these are possible in HTML as well! You’ll probably not have general descriptive text, but yes, you can have a tablist/tabs/tabpanels construct inside a dialog just fine. You may then have aria-labelledby on each tabpanel pointing to its tab, and aria-describedby pointing to descriptive text within that panel that gives some context. Screen readers will pick that up when you move into the panel as well. Initial focus will then go either to the first tab in the tab list with its tab panel already open, or to the previously selected tab with its associated tab panel open, if the dialog has been opened before and you want to preserve that state when the user comes back. Your choice.
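The skeleton of such a dialog could look something like this (a sketch with invented IDs; the JavaScript that switches tabs and updates aria-selected is omitted):

<div role="dialog" aria-labelledby="settingsTitle">
  <div id="settingsTitle">Settings</div>
  <div role="tablist">
    <button type="button" role="tab" id="generalTab" aria-selected="true">General</button>
    <button type="button" role="tab" id="privacyTab" aria-selected="false">Privacy</button>
  </div>
  <div role="tabpanel" aria-labelledby="generalTab" aria-describedby="generalDesc">
    <div id="generalDesc">These settings apply to all new documents.</div>
    <label for="fontSize">Default font size:</label>
    <input type="text" id="fontSize">
  </div>
</div>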

What about role “alertdialog”?

Hmm… Uhh… Don’t use it. Seriously, I’ve been working with ARIA for over seven years now, and I still haven’t figured out why role “alertdialog” was ever invented. It has all the properties of a “dialog”, yet it inherits the aria-live attribute value of “assertive”. That causes its label, or title, to be spoken by a screen reader immediately upon its appearance, interrupting everything else.

But a correctly marked-up dialog will already speak all relevant information upon initial focus, and there should be no reason not to give a dialog keyboard focus when it opens. So having a dialog yell at the screen reader user would only make sense if one did not set focus to it initially, and that is, from a usability standpoint, generally not a good thing to do in my opinion. I know opinions may differ on this, but since you’re on my blog and I can give you my advice, it is this: Don’t use role “alertdialog”.

Other examples

Other fellow accessibility evangelists have created dialog examples that work similarly, but differ in a few ways from what I discussed above. Lisa Seeman’s dialog example, for instance, sets focus to the initial heading that contains the dialog title, and doesn’t associate the text above the fake login form with the dialog description. Also, in my testing, while Shift+Tab was trapped, Tab was not, so I could move outside the dialog into the browser UI.

HeydonWork’s example shows the technique of using role “document” inside the dialog to make screen readers switch back to their reading mode. Used cleverly, this technique can make the screen reader work in application and browse mode automatically wherever each is appropriate.

And the most complete and comprehensive implementation I’ve seen, and which worked in a lot of combinations I tested, is this example by Greg Kraus called Incredible Accessible Modal Window. Thanks, Sina, for the pointer, as I didn’t even know about this one when I wrote the initial version of this post.


Creating a modal dialog that is fully accessible requires some work, but it is not rocket science or even magic. The important things to be aware of have been highlighted throughout this article: associate both a label and descriptive text, which screen readers will then pick up, and always control where focus goes inside the dialog, and when it closes.

I hope that this introduction to the topic of modal dialogs on the web helps you create more accessible experiences in the future! And of course, your questions and comments are always welcome!

happy hacking!

Easy ARIA Tip #7: Use “listbox” and “option” roles when constructing AutoComplete lists

One question that comes up quite frequently is which roles to use for an auto-complete widget, or more precisely, for the container and the individual auto-complete items. Here’s my take on it. Let’s assume the following rough scenario (note that the auto-complete you have developed may not work in exactly the same way, but probably in a similar one):

Say your auto-complete consists of a textbox or textarea that has some auto-complete logic attached to it as you type. When auto-complete results appear, the following happens:

  1. The results are being collected and added to a list.
  2. The container gets all the items and is then popped into existence.
  3. The user can now either continue typing or press DownArrow to go into the list of items.
  4. Enter or Tab selects the current item, and focus is returned to the text field.

Note: If your widget does not support keyboard navigation yet, go back to it and add that. Without it, you’re shutting a considerable number of users out of the advantages you want to provide, and not only screen reader users.
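As a rough idea, the DownArrow transition from step 3 could be handled like this (a sketch; all IDs and the listIsOpen flag are invented for illustration):

// listIsOpen is a hypothetical flag your widget maintains when it
// shows or hides the results list.
var listIsOpen = false;

document.getElementById("autoCompleteInput").addEventListener("keydown", function (event) {
  if (event.key === "ArrowDown" && listIsOpen) {
    event.preventDefault();
    // Move focus to the first suggestion; the items carry tabindex="-1"
    // so they can receive focus.
    document.getElementById("resultsList").firstElementChild.focus();
  }
});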

The question now is: Which roles should the container and individual items get from WAI-ARIA? Some think it’s a list, others think it’s a menu with menu items. There may be more cases, but those are probably the two most common ones.

My advice: Use the listbox role for the container, and option for the individual auto-complete items the user can choose. The menubar, menu, and menuitem roles, plus the related menuitemcheckbox and menuitemradio roles, should be reserved for real menu bar, drop-down menu, or context menu scenarios. But why, you may ask?

The short version: Menus on Windows are a hell of a mess, and that’s historically rooted in the chaos that is the Win32 API. Take my word for it and stay out of that mess and the debugging hell that may come with it.

The long version: Windows has always known a so-called menu mode. That mode is in effect once a menu bar, a drop-down menu, or a context menu becomes active. This has been the case since at least the Windows 3.1/3.11 days, possibly even longer. To communicate the menu mode state to screen readers, Windows, or more precisely, Microsoft Active Accessibility, uses four events:

  1. SystemMenuStart: A menu bar just became active.
  2. SystemMenuPopupStart: If a SystemMenuStart event had been fired before, a drop-down menu just became active. If a SystemMenuStart event had not been fired before, a context menu just became active. If another SystemMenuPopupStart preceded this one, a sub menu just opened.
  3. SystemMenuPopupEnd: The popup just closed. Menu mode returns to either the previous Popup in the stack (closing of a sub menu), the menu bar, or falls out of menu mode completely.
  4. SystemMenuEnd: A menu bar just closed.

These events have to arrive in this exact order. Screen readers like JAWS or Window-Eyes rely heavily on this event order being correct, and they ignore everything that happens outside the menus once menu mode is active. Even NVDA, although its menu mode is not as strict as that of other, “older” screen readers, relies on the SystemMenuStart and SystemMenuPopupStart events to recognize when a menu gained focus, because opening a menu does not automatically focus any item by default. An exception is JAWS, which auto-selects the first item it can once it detects a context or start menu opening.

You can probably imagine what happens if the events get out of order, or are not all fired in a complete cycle: those screen readers that rely on the order get confused and stay in menu mode even when the menus have all closed, etc.

So, when web developers use one of the menu roles, they set this whole mechanism in motion, too. Because it is assumed that a menu system like that of a Windows desktop application is being implemented, browsers that implement WAI-ARIA also have to send these events to communicate the state of a menu, drop-down menu, context menu, or sub menu.

So, what happens in the case of our auto-complete example if you were to use the role menu on the container, and menuitem on the individual items? Let’s go back to our sequence from the beginning of the post:

  1. The user is focused in the text field and types something.
  2. Your widget detects that it has something to auto-complete, populates the list of items, applies role menuitem to each, and role menu to the container, and pops it up.
  3. This causes a SystemMenuPopupStart event to be fired.

The consequences of this event are rather devastating to the user. You just popped up the list of items, but you haven’t set focus to any of them yet. So technically and visually, focus is still in your text field, and the cursor is blinking away merrily.

But for a screen reader user, the context just changed completely. Because of the SystemMenuPopupStart event that got fired, screen readers now have to assume that focus went to a menu, and that just no item is selected yet. Worse, in the case of JAWS, the first item may even get selected automatically, producing potentially undesired side effects!

Moreover, the user may continue typing, even use the left and right arrow keys to check their spelling, but the screen reader will no longer read this to them, because their screen reader thinks it’s in menu mode and ignores all happenings outside the “menu”. And one last thing: Because you technically didn’t set focus to your list of auto-complete items, there is no easy way to dismiss that menu any more.

On the other hand, if you use listbox and option roles as I suggested, none of these problems occur. The list will be displayed, but because it doesn’t get focus yet, it doesn’t disturb the interaction with the text field. When focus gets into the list of items, by means of DownArrow, the transition will be clearly communicated, and when it is transitioning back to the text field, even when the list remains open, that will be recognized properly, too.
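Concretely, the container and items from our scenario could be marked up like this (a sketch; the tabindex="-1" makes each option focusable, for example from the DownArrow handler sketched earlier):

<input type="text" id="autoCompleteInput">
<ul role="listbox" id="resultsList">
  <li role="option" tabindex="-1">First suggestion</li>
  <li role="option" tabindex="-1">Second suggestion</li>
</ul>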

So even when you, as a sighted web developer, think that this is visually similar to a context menu or a popup menu or whatever you may want to call it, from a user interaction point of view it is much more like a list than a menu. A menu system should really be confined to an actual menu system, like the one you see in Google Docs. The side effects of the menu-related roles on Windows are just too severe for scenarios like auto-completes. And the reason for that lies in over 20 years of Windows legacy.

Some final notes: You can spice up your widget by letting the user know that auto-complete results are available, via text that gets spoken automatically if you put it in a text element that is moved outside the viewport and has the attribute aria-live="polite" applied to it. In addition, you can use aria-expanded="true" if you just popped up the list, and aria-expanded="false" if it is no longer there, both applied to your input or textarea element. And the showing and hiding of the auto-complete list should be done via display: none; or visibility: hidden; and their counterparts, or the list will appear somewhere in the user’s virtual buffer and cause confusion.
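Putting those last hints together, a sketch could look like this (the offscreen class and the announceResults function are made up for illustration; the actual list markup is omitted):

<input type="text" id="autoCompleteInput" aria-expanded="false">
<div id="resultsStatus" class="offscreen" aria-live="polite"></div>

And when the results list pops up, the widget could update both:

function announceResults(count) {
  // Communicate that the list just popped up.
  document.getElementById("autoCompleteInput").setAttribute("aria-expanded", "true");
  // Screen readers speak this automatically because of aria-live="polite".
  document.getElementById("resultsStatus").textContent =
    count + " suggestions available, press DownArrow to browse them.";
}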

A great example of all of this can be seen in the Tweet composition ContentEditable on Twitter.

I also sent a proposal for an addition to the Protocols and Formats Working Group at the W3C, because the example for an auto-complete in the WAI-ARIA authoring practices doesn’t cover more advanced scenarios, like the one on Twitter and others I’ve come across over time. I hope the powers that be follow my reasoning and make explicit recommendations regarding the roles that should and shouldn’t be used for auto-completes!

Why accessibility APIs matter

This morning, Victor from PayPal and I got into an exchange on Twitter regarding the ChromeVox extension. ChromeVox is a Chrome extension that provides screen reading functionality for blind users. Through keyboard commands, the user can navigate page content at different levels, for example object by object, heading by heading, or form control by form control.

Wait, you might say, but this is screen reader functionality I know from NVDA or JAWS on Windows, or VoiceOver on the Mac! And you’re absolutely right! The difference is that ChromeVox works only in the web content area of Chrome, not in the menus or toolbars.

That in itself is not a bad thing at all. The problem comes with the way ChromeVox does what it does. But in order to understand that, we need to dive into a bit of history first.

Back in the late 1990s, the web was becoming more important every day, and Windows was the dominant operating system. With the big browser war coming to an end and Microsoft’s Internet Explorer the great winner, it was apparent that, in order to make the web accessible to blind users and those with other disabilities, assistive technologies had to deal with IE first and foremost. Other browsers like Netscape quickly lost relevance in this regard, and Firefox wasn’t even thought of yet.

Microsoft took sort of a mixed approach. They were bound by their own first implementation of an accessibility API, called Microsoft Active Accessibility, or MSAA. MSAA had some severe limitations: while it knew very well how to make a button or checkbox accessible, it had no concept of document-level accessibility or of constructs such as headings, paragraphs, hyperlinks, etc. Microsoft managed to evolve MSAA bit by bit over time, giving it more role mappings (meaning different types of content had more vocabulary to say what they were), but there were still severe limitations when it came to dealing with actual text and the attributes of that text.

But screen readers wanted more than just paragraphs, links, form fields, and buttons. So the Windows assistive technologies were forced to take a combined approach, using some information provided by MSAA and other information provided by the browser’s document object model (DOM). There have been APIs to get at that DOM in IE since version 5.0, and for the most part, the way rich internet information is accessed in IE by screen readers has not changed since 1999. Screen readers still have to scrape the DOM to get at all the information needed to make web content accessible in IE.

In 2001, Aaron Leventhal of Netscape, later IBM, started working on a first implementation of accessibility in the Mozilla project, which later became Firefox. To make it easier for assistive technology vendors to come aboard and support Mozilla in addition to IE, a decision was made to mimic what IE was exposing. That set of interfaces is part of Firefox to this day and is still used by some Windows screen readers, although we nowadays strongly evangelize the use of a proper API and hope to deprecate what we call the ISimpleDOM interfaces in the future.

In 2004/2005, accessibility people at Sun Microsystems and the GNOME Foundation, as well as other parties, became interested in making Firefox accessible on the Linux desktop. However, this platform had no concept similar to the interfaces used on Windows, and a whole new and enhanced set of APIs had to be invented to satisfy the needs of the Linux accessibility projects. Other software packages, like much of the GNOME desktop itself and OpenOffice, also adopted these APIs and became accessible. While some basic concepts are still based on the foundation laid by MSAA, the APIs on Linux quickly surpassed these basics on a wide scale.

Around the same time, work began on the NVDA project, the first, and to date only, open-source screen reader on Windows. The NVDA project leaders were interested in making Firefox accessible, giving users an all open-source combination of screen reader and browser. However, they were not planning on building into NVDA the screen-scraping technology used by other Windows screen readers; they wanted API-level access to all content right from the start. Out of this requirement, an extension to MSAA called IAccessible2 was born, which enhanced MSAA with concepts already present in the Linux accessibility APIs. As a consequence, the two are very similar in capability and nomenclature.

In parallel to that, Apple had been developing their own set of APIs to make OS X accessible to people with visual impairments. Universal Access and the NSAccessibility protocol are the result: accessibility at the API level that likewise does not require the screen reader to scrape video display content to get at the information. This protocol is in many ways very different in its details, but it offers roughly the same capabilities.

Within Firefox, this meant that gaps that could previously only be plugged by using the browser DOM directly needed to be closed with proper API mappings. Over time, these became very rich and powerful. There is a platform-independent layer with all capabilities, and platform-specific wrappers on top which abstract and occasionally slightly modify the exposed information to make it suitable for each platform. Both Firefox for Android and Firefox OS, whose JavaScript bridges talk to TalkBack and a speech synthesizer respectively, use the platform-independent layer to access all information. Whenever we find that the JavaScript code needs to access information from the DOM directly, we halt and plug the hole in the platform-independent APIs instead, since there will no doubt be a situation where NVDA or Orca could run into that same gap.

So to recap: In IE, much information is gathered by looking at the browser DOM directly, even by NVDA, because there is no other way. In Firefox, some of the more legacy screen readers on Windows also use this technique, which Firefox provides as a compatibility measure, but all newer implementations like NVDA use our IAccessible2 implementation, and no DOM access, to give users the web experience.

Safari on OS X obviously uses the Apple NSAccessibility protocol. Safari for Windows has since been discontinued, and it never had much MSAA support to speak of.

Google Chrome also exposes its information through Apple’s NSAccessibility protocol on OS X, and uses MSAA and IA2, at least to some degree, on Windows.

And what does ChromeVox use?

Here’s the big problem: ChromeVox uses DOM access exclusively to provide access to web content for blind users. As far as I could tell, it does not use any of Chrome’s own accessibility APIs. On the contrary: the first thing ChromeVox does is set aria-hidden on the document node, making Chrome not expose the whole web site to VoiceOver or any other accessibility consumer on OS X or Windows. In essence, both Chrome and ChromeVox perform their own analysis of the HTML and CSS of a page to build up the content. And the problem is: they do not match. An example is the three popup buttons at the top of Facebook where the numbers of friend requests, messages, and notifications are displayed. While Chrome exposes this information correctly to VoiceOver, ChromeVox only reads the button label if there is a count other than 0. Otherwise, the buttons sound like they are unlabeled.

In my half hour of testing, I found several pages with such inconsistencies between what Chrome exposes and what ChromeVox reads to users. An example quite to the contrary is the fact that Google Docs is only accessible if one uses Chrome and ChromeVox: what Chrome exposes to VoiceOver or NVDA is not sufficient to gain the same level of access to Google Docs.

If you are a web developer, you can imagine what this means! Even if you go through the trouble of testing your web site with Chrome and Safari to make sure they expose their information to VoiceOver, it is not guaranteed that ChromeVox users will benefit, too. Likewise, if you test with ChromeVox exclusively, you have no certainty that the other APIs are able to cope with your content.

Web developers on Windows have also learned these lessons the hard way with different screen readers in different browsers: because with IE, each screen reader is forced to do its own interpretation of the HTML, at least on some level, results will undoubtedly differ.

There is a harmonization effort going on at the W3C level to make sure browsers interoperate on what they expose on each platform for different element types. However, if prominent testing tools like ChromeVox or some legacy screen readers on Windows hang on to using their own HTML interpretation methods even when there are APIs available to provide them with that information, this effort is made very very difficult and puts a big extra burden on those web developers who are making every effort to make their sites or web apps accessible.

When we started work at Mozilla to make both Firefox for Android and Firefox OS accessible, we made a conscious decision that these platforms needed to use the same API as the desktop variants. Why? Because we wanted to make absolutely sure that we deliver the same kind of information on any platform for any given HTML element or widget. Web developers can count on the fact that we at Mozilla will always ensure that if your stuff works in a desktop environment, it is highly likely that it will also speak properly on Firefox OS or through TalkBack in Firefox for Android. That is why our JS bridge does not use its own DOM access methods: there are no interpretation gaps and no API divergence.

And here’s my plea to all who still use their own DOM scraping methods on whichever platform: Stop using them! If the browser provides an API, use it whenever possible! You will make your product less prone to breakage from changes or additions to the HTML spec and its supported elements. As an example: Does anyone remember earlier versions of the most prominent screen reader on Windows suddenly no longer exposing certain Facebook content because Facebook had started using header and footer elements? The screen reader didn’t know about those and ignored everything contained within the opening and closing tags of those elements. If I remember correctly, it required users to buy an expensive upgrade to get a fix for that problem. NVDA and Orca users with Firefox, on the other hand, simply continued to enjoy full web access to Facebook when this change occurred, because Firefox accessibility already knew how to deal with the header and footer elements and told NVDA and Orca everything they needed to know.

On the other hand, if you are using DOM scraping because you find that something is missing in the APIs provided by the browser, work with the browser vendor! Identify the gaps! Take the proposals to close those gaps to the standards bodies at the W3C or the HTML accessibility task force! If you’re encountering that gap, it is very likely others will, too!

And last but not least: Help provide web developers with a safer environment to work in! Web apps on the desktop and mobile operating systems are becoming more and more important every day! Help to ensure people can provide accessible solutions that benefit all by not making their lives unnecessarily hard!