Copyright © 2021-2022 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document outlines accessibility-related user needs, requirements and scenarios for natural language interfaces. These user needs should influence accessibility requirements in related specifications and in the design of applications that include natural language interfaces. The concept of a natural language interface is first clarified. User needs and associated requirements are then described.
This document is not a collection of baseline requirements. Some requirements may be implemented at a system or platform level and others at the application level.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Accessible Platform Architectures Working Group as a Group Draft Note using the Note track.
To comment, please open a new issue in the WAI-Adapt GitHub repository. If it's not feasible for you to use GitHub, send comments in plain text e-mail to [email protected] and include [NAUR] at the beginning of the subject line of your email. Please include your comments in the body of the message, not as a binary attachment, which we will be unable to process. Please send comments by 16 October 2022.
Group Draft Notes are not endorsed by W3C nor its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The W3C Patent Policy does not carry any licensing requirements or commitments on this document.
This document is governed by the 2 November 2021 W3C Process Document.
A natural language interface is a user interface in which the user and the system communicate via a natural (human) language. The user provides input via speech or some other method, and the system generates responses in the form of utterances delivered by speech, text or some other method.
Systems that provide natural language interfaces often support spoken interaction. In this case, speech recognition processes the user's input, and speech synthesis generates spoken responses. However, the use of speech is not essential to a natural language interface.
Typical examples of natural language interfaces include:
These examples are not definitive. Variations of the examples and applications that do not fit these patterns are possible.
Natural language interfaces can be made accessible to users with disabilities at the platform and application levels via multiple modes of input and output. For example, some users with physical disabilities may need speech input, while others may need a keyboard, switch input, an eye tracking system, or some combination.
Similarly, natural language output may be spoken or visually displayed as text. These and other requirements are detailed below. These requirements may best be satisfied by an assistive technology. For example, a chat bot that lacks a spoken interface may satisfy a user's need for speech input via a browser or operating system dictation function.
For some disability types, the requirements for authors and designers are straightforward. At the heart of current accessibility testing are technical code specifications that map to accessibility requirements and can be tested and verified as true or false. For other disability types, support may be better described as a continuum than as a binary model, and in some of these areas the criteria may not be clear. A user interface that is responsive and can be personalized to support shifting user needs is a good example.
Current work in accessibility guidelines and standards is moving toward accommodating these new ways of measuring more subjective accessibility requirements that support the needs of people with disabilities but may not be easily measured in a binary fashion.
With this in mind, in the context of natural language interfaces, it is especially important that application design support the cognitive needs of users, particularly if the interface includes speech input, because speech input is cognitively taxing in several ways.
Speech input commands must be quickly called to mind, which requires cognitive effort for experienced users and more effort for new users. Speech input also taxes attention. Similar to type-ahead results, speech input results must be watched to make sure that the computer has not made a wording mistake that may be difficult to figure out later. At the same time, the language centre of the brain, used for speech input, is also used for many of the thought processes that go into things people do on computers such as writing or coding. Good design can mitigate the extra cognitive effort and is doubly important for those who have learning or cognitive disabilities. Good practices such as discoverability, ease of use and simple affordances are important considerations in making natural language interaction viable for all users and may require particular understanding when designing these interfaces.
For example, there are particular challenges for people with cognitive disabilities using interfaces that rely on memory. The design should accommodate this need by providing step-by-step instructions. Reminding users how many steps they have completed and how many more steps they need to complete supports the user's memory, rather than relying on it.
When speech input is used, it is important to provide command prompts if needed, so users do not have to rely on their memories to come up with commands at the same time as they are completing steps. There are other user needs patterns relating to supporting the needs of people with cognitive disabilities that can be referred to in 'Making Content Usable for People with Cognitive and Learning Disabilities'. [content-usable]
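The two practices described above (reporting progress through a multi-step task and offering command prompts at each step) can be illustrated with a minimal sketch. The `Step` structure, the `build_prompt` function, and the wording of the prompt are hypothetical and are not drawn from any specification; a real system would also need to localize the text and let users turn hints off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a multi-step task (hypothetical structure)."""
    instruction: str
    commands: tuple[str, ...]  # phrases the user may say or type at this step


def build_prompt(steps: list[Step], current: int) -> str:
    """Build the prompt for steps[current].

    The prompt states how many steps are done and how many remain, then
    lists the available commands, so the user does not have to recall
    either from memory.
    """
    if not 0 <= current < len(steps):
        raise IndexError("current step is out of range")
    step = steps[current]
    total = len(steps)
    remaining = total - current
    hints = ", ".join(step.commands)
    return (
        f"You have completed {current} of {total} steps; "
        f"{remaining} remaining. Now: {step.instruction} "
        f"You can say: {hints}."
    )


# Example task: sending a short message in three steps.
steps = [
    Step("Choose a recipient.", ("Alice", "Bob")),
    Step("Dictate your message.", ("start dictation", "cancel")),
    Step("Confirm sending.", ("send", "cancel")),
]
print(build_prompt(steps, 1))
```

The same prompt text could be delivered by speech, shown on screen, or both, depending on the output modes the user has chosen.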
Voice user interfaces (VUIs) using speech, such as those found on a range of commercially available devices for home and mobile use, represent a part of the stack that makes up natural language interfaces. This document aims to identify accessibility-related user needs and requirements for VUIs and to indicate further areas of work and research in terms of how they relate to new standards like WCAG 3 and other emerging technologies.
Natural language interfaces frequently occur as components of larger user interfaces and systems. For example, a chat bot may be included in a web application. A natural language interface may be an essential part of a multi-modal application that uses a combination of language and gestural inputs. An example would be an interactive navigation tool that allows the user to issue spoken commands and to interact with a graphical map with a pointing device.
The scope of this document is largely confined to the accessibility of the natural language aspect of the overall user interface. It is concerned with the accessibility of natural language interactions to users with disabilities.
Behind these interfaces there are services that provide core processing, evaluation and content. This document aims to look at these services and determine to what degree they can and should support the needs of people with disabilities, what the system requirements are, and where further research is needed.
Ideally, by satisfying system requirements, developers of platforms and applications offering natural language interfaces can meet corresponding user needs. Currently, no stance is taken in this document regarding which needs are best satisfied at the platform level, by an assistive technology, or in the development of applications, but this will change as the document develops. These architectural considerations are left to be decided by system designers, and therefore there may be requirements in accessible system design that they need to be aware of. Often, these decisions also depend on the services provided by the underlying operating system or by the web platform.
If natural language interaction is provided as part of a system that also offers other styles of interaction, this document should be read in combination with guidance provided elsewhere which is relevant to the other interface and service aspects. Notably,
As a general principle, the entire interface of a system or application needs to be accessible to users with disabilities. If only the natural language interaction component is accessible, some users will be unable to complete tasks successfully. For example, a smart agent that answers a user's questions by searching the web for information and then displaying it on screen is only accessible as a whole if both the interaction and the presentation of the information satisfy the user's access needs. If the on-screen information is not accessible, then the user cannot complete the task of acquiring and understanding the information requested.
The term 'user needs' in this document relates to what people with various disabilities need to successfully use natural language interfaces. User needs are dependent on the context in which an application is used, including the user's capabilities and the environmental conditions in which interaction with the interface takes place. For example, a spoken interaction would be inaccessible to a person who is deaf, or to a hearing person situated in a noisy environment. Although disability-related needs are the focus of this document, the user needs described here are not limited to people with specific types of disability. The capabilities of users vary greatly. They include a variety of physical, sensory, learning and cognitive abilities that should be taken into account in the design of platforms and applications.
This section outlines a variety of user needs and system requirements that can satisfy them.
To achieve adequate security, voice identification may need to be combined with other factors of authentication.
In some cases, this requirement can be met simply by using authentication mechanisms provided by the underlying operating system or browser environment.
This requirement can often be met by supporting the input methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple input mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker may support only speech input, whereas the same smart agent running on a mobile system such as a phone or tablet may support text input via a keyboard or any device capable of emulating a keyboard.
See the requirement to support a keyboard interface specified in WCAG 2.1 [WCAG21], success criterion 2.1.1.
This requirement can often be met by supporting the output methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple output mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker supports only audio/sound output, whereas the same smart agent running on a mobile system, such as a phone or tablet, may support a visual display as well, and be compatible with braille devices.
Support for braille displays is assumed to be provided by a screen reader running under the device's operating system. Therefore, support for keyboard input and textual output is the stated requirement for the natural language interface itself, leaving interaction with the braille hardware to the operating system on which the user interface is run.
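As a rough illustration of how the available output mechanisms can differ by device, the following sketch selects output modes by intersecting what a device supports with what a user prefers. The device names, the `DEVICE_OUTPUT` table, and the `select_outputs` function are assumptions made for this example only. Braille is deliberately absent from the table: as noted above, it is assumed to be handled by a screen reader that consumes the textual output.

```python
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


# Hypothetical capability table; real devices would report this themselves.
DEVICE_OUTPUT = {
    "smart_speaker": {Modality.SPEECH},
    "phone": {Modality.SPEECH, Modality.TEXT},
}


def select_outputs(device: str, preferred: list[Modality]) -> list[Modality]:
    """Return the user's preferred output modes that the device supports,
    keeping the user's order of preference.

    An empty result means the interface cannot meet this user's needs on
    this device, which the application should detect and report.
    """
    supported = DEVICE_OUTPUT[device]
    return [mode for mode in preferred if mode in supported]


# A user who needs text output (for example, to read it on a braille display):
print(select_outputs("phone", [Modality.TEXT]))
print(select_outputs("smart_speaker", [Modality.TEXT]))
```

An empty list for the smart speaker makes the gap explicit, so the application can direct the user to a device or companion interface that offers textual output.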
At present, it is generally infeasible to implement REQ 10a and REQ 10b with sufficient reliability and accuracy to be useful. Sign language processing (including automatic recognition, translation, and production of sign languages) involves challenging research problems. See [Bragg-et-al] for details. These two requirements are nevertheless stated here to encourage further research and development efforts.
Sign languages vary by country and region. Therefore, multiple sign languages may need to be supported, depending on the intended audience of the system.
REQ 13b is only an appropriate strategy if the system's confidence measure is strongly correlated with its actual recognition accuracy for people with speech-related disabilities. This correlation should be established empirically, in a variety of real use contexts, before relying on this approach. Otherwise, the system's prompting for input to be repeated or for confirmation will be insufficiently associated with cases of genuine recognition error.
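One way to check this empirically is sketched below, assuming that the strategy in question prompts the user to repeat or confirm input whenever recognition confidence falls below a threshold. The threshold value, the function names, and the sample data are all hypothetical. Given logged pairs of confidence score and whether the recognition was actually correct, the sketch reports how many real errors the prompting would have caught and how many prompts were actually warranted.

```python
from typing import Optional


def needs_confirmation(confidence: float, threshold: float = 0.6) -> bool:
    """Prompt for repetition or confirmation when confidence is low.
    The threshold of 0.6 is an arbitrary placeholder."""
    return confidence < threshold


def confirmation_quality(
    samples: list[tuple[float, bool]], threshold: float = 0.6
) -> tuple[Optional[float], Optional[float]]:
    """Evaluate the prompting strategy on logged (confidence, was_correct) pairs.

    Returns (error_recall, prompt_precision):
      error_recall     - share of recognition errors that triggered a prompt
      prompt_precision - share of prompts that corresponded to a real error
    Either value is None when its denominator is zero.
    """
    prompted = [ok for conf, ok in samples if needs_confirmation(conf, threshold)]
    error_count = sum(1 for _, ok in samples if not ok)
    caught = sum(1 for ok in prompted if not ok)
    error_recall = caught / error_count if error_count else None
    prompt_precision = caught / len(prompted) if prompted else None
    return error_recall, prompt_precision


# Invented sample: the error at confidence 0.7 is missed, and the correct
# input at confidence 0.4 triggers an unnecessary prompt.
samples = [(0.9, True), (0.8, True), (0.5, False), (0.4, True), (0.7, False)]
print(confirmation_quality(samples))
```

Low values on either measure, when the samples come from people with speech-related disabilities in real use contexts, would suggest that the confidence measure is not a reliable trigger for this population.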
To ensure this user interface is accessible, it should satisfy relevant accessibility requirements drawn from this document or elsewhere. For example, a system could provide spoken commands, and a settings dialogue in a graphical user interface, as alternative mechanisms for configuring speech properties.
In some cases, this requirement can be met by capabilities of the operating system or browsing environment.
See the text spacing requirement specified in WCAG 2.1 [WCAG21], success criterion 1.4.12.
This need is particularly applicable to systems which can serve a wide range of requests, such as personal assistants. All users need to know how to interact with a system to start using it. It is important that people with cognitive disabilities can easily access designs that make two things obvious: what the system does and how to set about doing it.
Commands for performing a variety of functions typically supported by speech interfaces used for telephony and multimedia applications are standardized in [ETSI-ES-202-076].
See WCAG 2.1 [WCAG21], success criterion 3.3.4.
The mode of operation described in requirement 19d may be distracting or anxiety-provoking for some users. Therefore, it should be optional.
See WCAG 2.1 [WCAG21], success criteria 2.2.1, 2.2.3, and 2.2.6.
See WCAG 2.1 [WCAG21], success criteria 3.1.3, 3.1.4, and 3.1.5.
See WCAG 2.1 [WCAG21], success criteria 3.3.1, 3.3.3, 3.3.4, and 3.3.6.
See WCAG 2.1 [WCAG21], success criterion 3.3.6.
The purpose of this multimodal presentation of text is to enhance comprehension of the material, especially by people with learning disabilities that affect reading.
Information presented graphically must also be available as text. See '6.2 Means of input and output' above.
This work is supported by the EC-funded WAI-Guide Project.