VoiceOver’s spatial logic is eluding me

One way to gain insight into the accessibility of a web page is to hear how a screen reader reads the page out loud. The VoiceOver screen reader provides training in its use. The training shows how to navigate on a page, but the example raises more questions than it answers.


After attending Learners Guild for 40 weeks and blogging about it 40 times, I hiatused for 3 months while attending a week-long accessibility conference in San Diego, obediently cramming for a Google job interview (Google instructs its applicants to cram, and how to cram), applying and interviewing for other software-development and accessibility jobs, adding to my technical skills, doing some pro bono (sometimes uninvited and perhaps unwelcome) accessibility consulting under the rubric of geezer.pro, and moving from Oakland to Seattle. While I may comment elsewhere on some of this, now let me muse about one specific anomaly I have discovered while studying accessibility.

Screen readers

If you are blind, have limited eyesight, are wearing a compress around your face, are cooking dinner, are driving a car, or are on the beach in blinding sunlight, you may want to have a web page read to you out loud. In some of the above situations you have the use of a keyboard, and then you may prefer to have a screen reader do the reading.

A screen reader has the advantages of simulating a human speaker well (at least in some languages); letting you adjust the volume, pitch, speed, and voice personality; and responding to dozens of keyboard commands that you can learn to give. There are several screen readers on the market.


The first screen reader I studied is VoiceOver, Macintosh version. It provides a built-in training course.

The VoiceOver training, on panel 5, lets you experiment with movement on a page. You can use the arrow keys of your keyboard to make the focus (i.e., where you are) move.

Problem 1

Here’s the first problem. Arrow keys point in 4 directions. Things on a page are laid out at arbitrary places in a 2-dimensional space. Moreover, those things aren’t points; they are arbitrary shapes. An arrow key, when pressed, moves the focus from one thing to another on the page. But, if your only possible commands are left, right, up, and down, the effect of issuing one of those commands may be unpredictable.
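
To make that unpredictability concrete, here is a minimal sketch in Python. The layout, the item names A, B, and C, and both heuristics are hypothetical; none of this is taken from VoiceOver. It only shows that two plausible meanings of “right” can pick different targets from the same starting item.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """A rectangular item on the page (hypothetical model)."""
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def right_by_center_distance(current: Item, items: List[Item]) -> Optional[Item]:
    """Heuristic 1: the item whose center is nearest, among centers farther right."""
    cx, cy = current.center
    candidates = [i for i in items if i is not current and i.center[0] > cx]
    return min(
        candidates,
        key=lambda i: hypot(i.center[0] - cx, i.center[1] - cy),
        default=None,
    )


def right_by_row_overlap(current: Item, items: List[Item]) -> Optional[Item]:
    """Heuristic 2: the leftmost item that starts at or beyond the current
    item's right edge and overlaps it vertically."""
    def overlaps_vertically(i: Item) -> bool:
        return i.y < current.y + current.h and current.y < i.y + i.h

    candidates = [
        i for i in items
        if i is not current
        and i.x >= current.x + current.w
        and overlaps_vertically(i)
    ]
    return min(candidates, key=lambda i: i.x, default=None)


# Hypothetical layout: B is far away on the same row as A;
# C is close to A but slightly below it.
a = Item("A", x=0, y=0, w=10, h=10)
b = Item("B", x=40, y=0, w=10, h=10)
c = Item("C", x=15, y=12, w=10, h=10)
layout = [a, b, c]

print(right_by_center_distance(a, layout).name)
print(right_by_row_overlap(a, layout).name)
```

In this layout the first heuristic moves from A to C and the second moves from A to B, so a user who cannot inspect the rule cannot know what the right arrow will do.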

Here is the territory in which panel 5 asks you to practice moving the focus with the arrow keys:

VoiceOver assigns these items to cells in an imaginary grid. Within that grid, the arrow keys do predictable things. But you don’t see the grid, even if you can see. So you need to guess how it is laid out.

As VoiceOver interprets this form, its items are assigned as follows:

Imputed grid in navigation practice in VoiceOver training
Comments: Show all
Hide all
Settings: Announce alert messages
Speak selected text
Speak text under the mouse
Learn More… Set Option
Quit Go Back Continue

Two nonobvious rules apply:

  • VoiceOver skips blank cells.
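  • VoiceOver wraps: after the last column of a row, the right arrow moves to the start of the next row, and after the last row of a column, the down arrow moves to the top of the next column.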

So, now you can see that starting in the upper left corner and pressing the down arrow gives you “Comments:”, “Settings:”, “Quit”, “Show all”, etc. Or, if you press the right arrow instead, you get a sequence that includes “Set Option”, “Quit”, and “Go Back” in that order. The grid arrangement and flow aren’t obvious, and predictability would decrease further if more items were displayed in a non-rectangular arrangement.
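
The two rules can be sketched in a few lines of Python. This is a model of the behavior described above, not VoiceOver’s actual implementation. The column positions in `GRID` are my assumption: they are chosen only to be consistent with the two sequences just described, and the exact column of items such as “Learn More…” is a guess. The function names `traversal` and `press` are hypothetical, and what happens after the very last item is not covered here, so the sketch simply stops.

```python
from typing import List, Optional

Grid = List[List[Optional[str]]]

# ASSUMED layout (None marks a blank cell). Only the contents of column 1
# and the row order are implied by the text; other columns are guesses.
GRID: Grid = [
    ["Comments:", "Show all", None],
    [None, "Hide all", None],
    ["Settings:", "Announce alert messages", None],
    [None, "Speak selected text", None],
    [None, "Speak text under the mouse", None],
    [None, "Learn More…", "Set Option"],
    ["Quit", "Go Back", "Continue"],
]


def traversal(grid: Grid, key: str) -> List[str]:
    """Return the order in which repeated presses of one arrow key visit items.

    "right" walks each row left to right, then wraps to the next row.
    "down" walks each column top to bottom, then wraps to the next column.
    Blank cells are skipped in both cases.
    """
    rows, cols = len(grid), len(grid[0])
    if key == "right":
        cells = [grid[r][c] for r in range(rows) for c in range(cols)]
    elif key == "down":
        cells = [grid[r][c] for c in range(cols) for r in range(rows)]
    else:
        raise ValueError(f"unsupported key: {key!r}")
    return [cell for cell in cells if cell is not None]


def press(grid: Grid, current: str, key: str) -> Optional[str]:
    """Return the item focused after one key press, or None at the end."""
    order = traversal(grid, key)
    position = order.index(current)
    return order[position + 1] if position + 1 < len(order) else None


print(traversal(GRID, "down"))
print(traversal(GRID, "right"))
```

With this assumed grid, the down-arrow order begins “Comments:”, “Settings:”, “Quit”, “Show all”, and the right-arrow order contains “Set Option”, “Quit”, “Go Back” consecutively, matching the sequences above.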

Problem 2

This is the second problem: you might reasonably ask why the navigation should be dictated by placement in the window at all. It shouldn’t. By the usual principle of user-centered design, navigation should follow whatever order does the user the most good.

If you see the display but have difficulty reading the text, then navigating visually may be your preferred rule. You see where an item is and can make a reasonable guess about the arrow key(s) that will get you to that item.

But, if you cannot see the display at all, why would you care where the items are? You would likely want to navigate in a logical order, which could be spatially vertical, horizontal, circular, or anything else. The very idea of using 4 keys to navigate might clash with your understanding of the decision space you are in. You might be logically in a tree, and it might be most efficient for you to type a minimally distinguishing string of characters naming your current node’s preferred child node. I confronted a similar situation months ago, wondering about navigation with the tab key through the keys of a calculator. There it seemed dubious to follow the sequence 7, 8, 9, 4, 5, 6, 1, 2, 3 merely because calculator buttons are arranged that way.
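
As an illustration of the tree idea, here is a minimal Python sketch of choosing a child node by typing a minimally distinguishing string. The child names in `CHILDREN` and the function names are hypothetical, and this is not a feature I am attributing to VoiceOver. Matching is assumed to be case-insensitive.

```python
from typing import Dict, List, Optional


def distinguishing_prefixes(children: List[str]) -> Dict[str, str]:
    """Map each child name to its shortest prefix that no sibling shares."""
    prefixes: Dict[str, str] = {}
    for name in children:
        lowered = name.lower()
        siblings = [other.lower() for other in children if other != name]
        # Fall back to the full name if it is itself a prefix of a sibling.
        prefixes[name] = lowered
        for length in range(1, len(lowered) + 1):
            candidate = lowered[:length]
            if not any(s.startswith(candidate) for s in siblings):
                prefixes[name] = candidate
                break
    return prefixes


def choose_child(children: List[str], typed: str) -> Optional[str]:
    """Return the single child matching what was typed, or None if the
    typed string matches no child or more than one."""
    matches = [c for c in children if c.lower().startswith(typed.lower())]
    return matches[0] if len(matches) == 1 else None


# Hypothetical children of the current node.
CHILDREN = ["Settings", "Search", "Save", "Quit"]

print(distinguishing_prefixes(CHILDREN))
print(choose_child(CHILDREN, "sa"))
```

Here “q” is enough to reach Quit, while Settings, Search, and Save need “set”, “sea”, and “sa”. No arrow key and no notion of position is involved.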


WCAG 2.1 criterion 2.4.3 says that “focusable components receive focus in an order that preserves meaning and operability”. That there always is such an order, and that it is the same order for all users, seem implausible. This illustrates that compliance with codified accessibility standards requires judgment and testing and can often be disputed.


In case it isn’t obvious, these are ruminations, not pronouncements. I’m sharing my puzzlements with you as I learn. I have much accessibility lore yet to learn. If you can correct or amplify anything here, feel free to join the discussion below.
