Making the Quip editor accessible

By Joyce Zhu

In our last blog post, we gave an overview of the work that Quip’s dedicated accessibility team has been doing since 2020. But the story didn’t start there!

Back in 2018, I was part of a brief initial effort spun up by Quip to explore and fix our most glaring accessibility-related issues. We quickly realized that the worst of the roadblocks were fundamental challenges with the centerpiece of the product: our multiplayer document editor, which allows for all sorts of embedded rich objects (such as user mentions, embedded spreadsheets, and live apps).

ContentEditable support = false

Quip’s editor is based around an article node with contenteditable=true set on it. This single HTML attribute caused some browsers and screen readers to suddenly stop supporting even the most commonly used ARIA attributes for manipulating the accessibility tree. Our initial explorations quickly found that our problems wouldn’t be solved just by fixing markup that didn’t meet WCAG guidelines: we would also need to actively uncover (and work around) idiosyncrasies in screen reader and rendering engine implementations that didn’t behave according to standards.

Given limited time, the 2018 team decided to focus on more modest goals. For one thing, we unblocked screen reader users from at least reading document content by introducing a keyboard shortcut (now Cmd+Opt+R or Ctrl+Alt+R) that removed the contenteditable attribute on the editor HTML node, disabling editing but allowing for a much better reading experience. We also built affordances that let screen reader users semantically navigate many oddly-structured types of editor content.
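
As a rough illustration of that read-mode toggle (a sketch, not Quip's actual code), the logic might look like the following. The names EditableNode, ShortcutKeyPress, isReadModeShortcut, and toggleReadMode are hypothetical; the two interfaces mirror only the parts of the DOM Element and keydown event APIs that the sketch needs, so it can run outside a browser.

```typescript
// Hypothetical sketch: names and structure are illustrative, not Quip's code.

// The subset of the DOM Element API this sketch relies on.
interface EditableNode {
  getAttribute(attrName: string): string | null;
  setAttribute(attrName: string, value: string): void;
  removeAttribute(attrName: string): void;
}

// The subset of a keydown event needed to recognize the shortcut.
interface ShortcutKeyPress {
  code: string; // physical key, e.g. "KeyR" (stable even when Opt changes the typed character)
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
}

// True for Cmd+Opt+R on a Mac, or Ctrl+Alt+R elsewhere.
function isReadModeShortcut(press: ShortcutKeyPress, isMac: boolean): boolean {
  const modifierHeld = isMac ? press.metaKey : press.ctrlKey;
  return press.code === "KeyR" && press.altKey && modifierHeld;
}

// Flips the editor between editable and read-only.
// Returns true when the editor is read-only afterwards.
function toggleReadMode(editor: EditableNode): boolean {
  if (editor.getAttribute("contenteditable") === "true") {
    editor.removeAttribute("contenteditable");
    return true;
  }
  editor.setAttribute("contenteditable", "true");
  return false;
}
```

In a browser, a keydown listener on the editor would call isReadModeShortcut and, on a match, pass the editor element to toggleReadMode.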

As an example: for performance reasons, all bulleted list sections in Quip, regardless of how many nested sublists they contain, are rendered as a single flat ul element. Each li child is indented to the appropriate level using CSS, which limits how much of the DOM is touched during indentation changes. However, screen readers have no awareness of the visual hierarchy of sublists that results from this indentation. To resolve this, we constructed a parallel DOM structure to express that hierarchy via aria-owns.
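
To make the idea concrete, here is a minimal sketch of how a parent/child hierarchy could be derived from a flat list of indented items. It is one plausible approach under stated assumptions, not a description of Quip's implementation: FlatListItem, ListHierarchy, computeAriaOwns, and ariaOwnsValue are hypothetical names, and each li is assumed to carry a unique id and a known indent level.

```typescript
// Hypothetical sketch: names and data shapes are illustrative, not Quip's code.

interface FlatListItem {
  id: string;     // id of the <li> element
  indent: number; // 0 for top level, 1 for the first sublist level, and so on
}

interface ListHierarchy {
  roots: string[];                // ids of top-level items, in document order
  owns: { [parentId: string]: string[] }; // direct children of each parent
}

// Walks the flat list once, keeping a stack of "open" ancestors.
function computeAriaOwns(items: FlatListItem[]): ListHierarchy {
  const roots: string[] = [];
  const owns: { [parentId: string]: string[] } = {};
  const ancestors: FlatListItem[] = []; // shallowest first

  for (const item of items) {
    // Close every ancestor that is not strictly shallower than this item.
    while (
      ancestors.length > 0 &&
      ancestors[ancestors.length - 1].indent >= item.indent
    ) {
      ancestors.pop();
    }
    if (ancestors.length === 0) {
      roots.push(item.id);
    } else {
      const parentId = ancestors[ancestors.length - 1].id;
      if (!owns[parentId]) {
        owns[parentId] = [];
      }
      owns[parentId].push(item.id);
    }
    ancestors.push(item);
  }
  return { roots, owns };
}

// aria-owns takes a space-separated list of element ids.
function ariaOwnsValue(childIds: string[]): string {
  return childIds.join(" ");
}
```

Each entry in owns could then back one node in the parallel structure, with ariaOwnsValue producing the attribute value that points at the real li elements.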

While we managed to make navigating basic content types much clearer in read-only circumstances, we were still left with the unsolved problem of a coherent editing experience. When the current accessibility team spun up in 2020, we knew we had to focus on finding a technical approach that would solve this problem while minimally disturbing our vast and complex editor codebase.

The roads not taken

Before work started, we conducted a comprehensive survey of screen reader behavior in other document editing products. The approach we found in many of these products avoided fixing behavior or markup inside the document editor itself, instead choosing to use a separate off-screen aria-live="assertive" region to interrupt whatever the screen reader usually reports after an action. For example, navigating down a line or typing a character would replace the text in this live region with that line of text or that character.

This approach initially appealed to us, as it would allow us to abstract accessibility behavior from the rest of the codebase, and our team would only have to maintain a single HTML node instead of dealing with the intricacies of existing editor objects. However, we ultimately decided not to head down this path. From an engineering standpoint, we were wary of how much micromanaging we would need to do: everything a user could possibly do in the editor would need to manually update the content of the live region. Between the complexity of our editor code and how many other engineers were modifying it regularly, this would have created a breeding ground for bugs.

More importantly, we received clear community feedback that this approach wasn’t ideal for real screen reader users for several reasons. For one, our lack of direct control over the timing of the screen reader queue meant that users might end up with an unreliable reading experience. Moreover, this approach forces a single screen reader experience onto all users—we would override all of their preferences, as well as potentially deviate from their expectations about how different actions and elements are announced.

Another approach we floated but quickly abandoned was the idea of creating a simpler, less dynamic markdown-only editor. Though this might have been a lot simpler as a short-term solution, we realized it would ultimately create a second-class experience with many drawbacks for our users while introducing a new divergent surface for us to continually maintain and update.

We made a conscious decision to do the right thing by our users (and our future selves) even if it would be hard in the short-term: take advantage of what the ARIA spec supported and fix the behavior of the elements actually in the DOM.

Remediating the editor

Once we ruled out “easy fixes”, we returned to investigating why screen reader users were so disoriented when using our editor. We boiled it down to three basic problems:

  • we lacked consistent, robust keyboard-only interaction patterns for rich objects
  • there were gaps in a 1:1 mapping between arrow keystrokes and character movement through text, and
  • we did not clearly, semantically identify all the different types of editor content that can end up in a document

Both our 2018 discoveries and our discussions with PAC (Prime Access Consulting) made it clear we would have the best chance of success if we initially optimized for a specific set of browser/screen reader combinations. Given our particular user base, we selected Chrome+NVDA (Windows) and Safari+VoiceOver (Mac) and got to work.

Interacting with rich objects

The Quip editor is capable of hosting a variety of rich (non-plaintext) objects: some in-line (such as user mentions), others embedded objects (such as spreadsheets). The interaction patterns for these objects were originally designed with just mouse users in mind, which resulted in entire Quip features being unusable for keyboard-only users. Among many other things, non-mouse users couldn’t examine a user’s profile hovercard, edit the URL associated with linkified text, or use our calendar widget to select a reminder date! To fix this, we needed to define and implement consistent keyboard behavior for all of our rich objects.

For most interactive HTML elements, such as buttons or links, the standard interaction pattern involves focusing the element and pressing Enter or Space to trigger the associated action. Unfortunately, our rich objects were inside a document editor where those two keystrokes are reserved for character insertion. Moreover, several of our objects actually trigger different behavior for clicking on vs. hovering over them: we wanted to provide the ability to trigger both actions via the keyboard.

After many conversations with PAC, we landed on a consistent model wherein rich objects in Quip have primary and secondary actions associated with them. When the caret is located inside or on the object, primary actions are triggered by pressing Mod+Enter, while secondary actions are triggered via Shift+Enter. Building out this framework in the codebase rapidly opened up entire swathes of functionality for users who can’t rely on a mouse.

Demo of keyboard interaction with a date mention in Quip
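
The primary/secondary model can be sketched as a small key-to-action resolver. This is an illustration only: ActionKeyPress, RichObjectAction, and resolveRichObjectAction are hypothetical names, and the sketch assumes that "Mod" means Cmd on a Mac and Ctrl elsewhere.

```typescript
// Hypothetical sketch: names are illustrative, not Quip's code.

type RichObjectAction = "primary" | "secondary" | null;

// The subset of a keydown event the resolver needs.
interface ActionKeyPress {
  key: string; // e.g. "Enter"
  shiftKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
}

// Decides which rich-object action, if any, a key press should trigger.
// Assumption: "Mod" is Cmd on a Mac and Ctrl on other platforms.
function resolveRichObjectAction(
  press: ActionKeyPress,
  caretInRichObject: boolean,
  isMac: boolean
): RichObjectAction {
  if (!caretInRichObject || press.key !== "Enter") {
    return null; // plain Enter and all other keys keep their editor meaning
  }
  const modHeld = isMac ? press.metaKey : press.ctrlKey;
  if (modHeld && !press.shiftKey) {
    return "primary";
  }
  if (press.shiftKey && !modHeld) {
    return "secondary";
  }
  return null;
}
```

Returning null for plain Enter keeps that keystroke free for inserting a line break, which is the constraint that ruled out the standard Enter/Space activation pattern.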

Consistent caret movement

Before our work on the editor, all inline rich objects (e.g., user and document mentions) were treated as immutable entities in the editor, and we didn’t allow users to move their cursors inside — if you pressed the right arrow when your cursor was immediately to the left of one of these objects, the cursor would skip to its end. To help enforce this, we had contenteditable=false set on the elements representing these objects inside the larger editor DOM.

Demo of how keyboard interaction with a date mention and a user mention worked in Quip before work began. Both are skipped over with a single arrow key press.

PAC helped us realize that this behavior was highly undesirable and inconsistent for screen reader users. What if a user encountered an incredibly long object title, such as “Folder for Q1 Financial Results and Planning for Q2”? With our previous behavior, screen reader users would hear the entire thing read out as one fast aural blob, and would not be able to review it at their own pace. As a result, we decided to enforce a new invariant: one left or right arrow keystroke would always move the caret by one character position in the editor.

Our discussions also surfaced an important accessibility design principle to keep in mind: different groups of people may genuinely prefer different behavior! In this case, some users preferred the clarity and efficiency of arrow navigation skipping over immutable inline objects. To allow users to choose the experience that worked best for them, we introduced two preferences in the Accessibility tab of the Quip user settings: “Improve screen reader support” and “Improve keyboard navigation”. Navigability into inline rich objects became the first significant bit of behavior controlled by the keyboard user preference.
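
The difference between the two behaviors can be sketched with a simplified model of a line of text. Everything here is hypothetical: InlineSegment and moveCaret are illustrative names, the caret is modeled as a flat character offset, and a "character" is simplified to one UTF-16 code unit.

```typescript
// Hypothetical sketch: a simplified model, not Quip's editor code.

interface InlineSegment {
  text: string;
  isRichObject: boolean; // true for mentions, dates, and similar inline objects
}

// Returns the caret offset after one left (-1) or right (+1) arrow press.
// With allowCaretInsideObjects = true, the caret always moves exactly one
// position. With false, it skips across a rich object in a single press.
function moveCaret(
  segments: InlineSegment[],
  offset: number,
  direction: -1 | 1,
  allowCaretInsideObjects: boolean
): number {
  let total = 0;
  for (const seg of segments) {
    total += seg.text.length;
  }
  // One step, clamped to the bounds of the line.
  const stepped = Math.max(0, Math.min(total, offset + direction));
  if (allowCaretInsideObjects) {
    return stepped;
  }
  // Skipping behavior: if the step lands strictly inside a rich object,
  // jump to that object's far edge instead.
  let start = 0;
  for (const seg of segments) {
    const end = start + seg.text.length;
    if (seg.isRichObject && stepped > start && stepped < end) {
      return direction === 1 ? end : start;
    }
    start = end;
  }
  return stepped;
}
```

For the line "Hi @Joyce!", where "@Joyce" is a mention spanning offsets 3 to 9, a right arrow press at offset 3 moves to 4 when the caret may enter objects and to 9 when it may not.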

We removed the contenteditable=false attribute from our rich objects to permit this arrow navigation, which meant we then needed to manually prevent users from editing the actual object content (it would be super weird to edit the text of a “Reminder” to “Reminddd”!) while their cursors were inside the relevant text. In the current implementation, if a keystroke which would normally insert a character fires inside one of these objects, a JavaScript event handler fires off an aria-live announcement that edits are banned for the current caret position. As a bonus, we’ve been able to reuse this codepath to enable users to move the caret inside of locked or Live Pasted content inside the editor without fear of accidental edits.
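
A minimal sketch of that guard, under stated assumptions: EditKeyPress, EditDecision, wouldInsertCharacter, and decideEdit are hypothetical names, the announcement wording is invented for illustration, and the test for "would insert a character" is a rough heuristic rather than the real editor logic.

```typescript
// Hypothetical sketch: names, heuristic, and wording are illustrative only.

// The subset of a keydown event the guard needs.
interface EditKeyPress {
  key: string; // e.g. "d", "ArrowRight", "Enter"
  ctrlKey: boolean;
  metaKey: boolean;
}

interface EditDecision {
  blocked: boolean;            // caller should cancel the default action
  announcement: string | null; // text for an aria-live region, if any
}

// Rough heuristic: printable keys report a single-character `key`, and
// Ctrl/Cmd combinations are shortcuts rather than text input.
function wouldInsertCharacter(press: EditKeyPress): boolean {
  return press.key.length === 1 && !press.ctrlKey && !press.metaKey;
}

// Blocks character insertion while the caret sits in protected content
// (a rich object or a locked section) and supplies an announcement.
function decideEdit(
  press: EditKeyPress,
  caretInProtectedContent: boolean
): EditDecision {
  if (caretInProtectedContent && wouldInsertCharacter(press)) {
    return { blocked: true, announcement: "This content cannot be edited." };
  }
  return { blocked: false, announcement: null };
}
```

In a browser, the keydown handler would call preventDefault when blocked is true and write the announcement into an aria-live region. Navigation keys such as the arrows pass through untouched, so the caret can still move freely inside the object.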

Enforcing this caret movement behavior also involved hiding the decorative icon characters in many of our rich objects (e.g., the alarm clock icon for reminders). We made various attempts to set aria-hidden on just the icon or to give it a helpful aria-label, but unfortunately we ran into our recurring issue of basic ARIA semantics getting utterly ignored inside a contenteditable node. As a result, we decided to just remove the icons from the DOM for users with the screen reader preference turned on.

Demo of how keyboard interaction with mentions works now. The caret can be moved into each mention, and decorative icons are hidden.

Finally, we created a more manageable navigation pattern for embedded media in the editor — complex widgets such as spreadsheets or live apps that are rendered as blocks of content rather than inline with text. These caused a great deal of keyboard navigation frustration, either by being completely non-interactive or by trapping the user’s focus without a clear or expedient way to return to the rest of the document. We decided to let users easily move over embedded media objects in the course of document navigation, while also creating a consistent pattern to move into and out of their content if desired.

Now, embedded media objects are represented by a single caret stop in the editor. If a user wishes to interact with the inner content of an embedded media object while its frame is selected, they can use the Mod+Enter primary action keyboard shortcut to move keyboard focus into a modal that allows them to explore and interact with the content. Closing that modal using Escape reselects the object’s frame and returns focus to the editor node.

Demo of keyboard navigation into and out of a spreadsheet in Quip
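
The navigation pattern for embedded media amounts to a three-state machine, sketched below. The state and input names (MediaFocusState, MediaInput, nextMediaFocusState) are hypothetical and chosen for illustration; they are not taken from Quip's codebase.

```typescript
// Hypothetical sketch: states and inputs are illustrative, not Quip's code.

type MediaFocusState = "editor" | "frameSelected" | "inModal";
type MediaInput = "caretOntoFrame" | "caretOffFrame" | "modEnter" | "escape";

// Returns the next focus state for an embedded media object.
// Inputs that have no meaning in the current state leave it unchanged.
function nextMediaFocusState(
  state: MediaFocusState,
  input: MediaInput
): MediaFocusState {
  if (state === "editor") {
    // Arrowing onto the object selects its frame as a single caret stop.
    return input === "caretOntoFrame" ? "frameSelected" : "editor";
  }
  if (state === "frameSelected") {
    if (input === "modEnter") {
      return "inModal"; // primary action moves focus into the modal
    }
    if (input === "caretOffFrame") {
      return "editor"; // the user simply moved past the object
    }
    return "frameSelected";
  }
  // state === "inModal": only Escape leaves, and it reselects the frame.
  return input === "escape" ? "frameSelected" : "inModal";
}
```

Because the only exit from inModal is Escape back to frameSelected, the user always lands on a known caret stop instead of being trapped inside the widget or dropped at an arbitrary point in the document.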

Expressing editor semantics

List indentation semantics improvements from 2018 aside, we still had an entire world of confusingly labeled or unlabeled editor structures that had no direct equivalents in supported ARIA roles or standard HTML (for example, background colors for text or inline formulae). We decided to address these using aria-roledescription attributes, which provide more detailed descriptions of rich content where possible. For example, our date objects will read out “date button” now.
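
As a small illustration of the attribute in use, the sketch below builds the attributes for a date object. It is an assumption-laden example rather than Quip's markup: AttributeMap, dateObjectAttributes, and toAttributeString are hypothetical names, and the choice of role="button" with a role description of "date button" is a guess at one way to produce the announcement described above.

```typescript
// Hypothetical sketch: names and attribute values are illustrative only.

type AttributeMap = { [attrName: string]: string };

// Attributes for an inline date object. The role stays a standard ARIA
// role; aria-roledescription only changes how screen readers name it.
function dateObjectAttributes(): AttributeMap {
  return {
    role: "button",
    "aria-roledescription": "date button",
  };
}

// Serializes attributes for inspection, e.g. in a unit test or a log line.
function toAttributeString(attrs: AttributeMap): string {
  return Object.keys(attrs)
    .map((attrName) => `${attrName}="${attrs[attrName]}"`)
    .join(" ");
}
```

Keeping a valid role underneath matters because aria-roledescription refines how an existing role is announced; it does not create a new role on its own.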

The rest of the work to express clear semantics got as close to a standard “fix the markup to match the ARIA spec” project as we could manage, including:

  • wrapping our checklists in the checkbox role and setting aria-checked values accordingly
  • adding aria-live announcements when users navigate into locked or magic pasted sections
  • inserting headings to improve navigation and orient users to the presence of the document outline and various dialogs
  • applying a pattern inspired by the ARIA 1.2 combobox spec to our inline autocomplete experience.
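
To illustrate the first item in the list above, here is a minimal sketch of deriving checkbox semantics from a checklist item's state. ChecklistItemState and checklistItemAttributes are hypothetical names, not Quip's code.

```typescript
// Hypothetical sketch: names are illustrative, not Quip's code.

interface ChecklistItemState {
  checked: boolean;
}

// Maps a checklist item's state to the ARIA attributes that expose it.
// aria-checked also allows "mixed", which a two-state checklist does not need.
function checklistItemAttributes(
  item: ChecklistItemState
): { role: string; "aria-checked": string } {
  return {
    role: "checkbox",
    "aria-checked": item.checked ? "true" : "false",
  };
}
```

The important detail is that aria-checked is always present and always a string: an unchecked item is announced as "not checked" only when the attribute is explicitly "false".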

Drawbacks

Throughout this project, we found ourselves sandwiched between our pre-existing complex codebase, which constrained how much of the DOM structure we could realistically change, and an ARIA spec that wasn’t always supported to the letter by every browser and screen reader we cared about. Even though we started off targeting only two browser and screen reader combinations, we ran into a lot of upstream bugs. A few examples include aria-roledescription being flaky in VoiceOver+Safari, the ::marker pseudo-element we try to use for rendering list bullets being unstable in various browsers, and NVDA+Chrome reporting off-by-one caret errors when moving over elements with role=button. We file, track, and follow up on these bugs whenever we can, but we unfortunately have no control over when or whether they get addressed, and we often must resort to bespoke hacks to get the best behavior we can given the current state of affairs.

Credit

A huge thanks to Christina Xu, the accessibility team PM, for being willing to dive directly into the deepest of technical weeds; Ben Cronin, who diligently implemented and tested the bulk of these thorny changes; Gabriel Adomnicai, for lending his expertise in the editor codebase; and Sina Bahram and James Scholes from Prime Access Consulting, for all their advocacy and aid in figuring out consistent interaction patterns.