This document describes requirements for the Extensible MultiModal Annotation language (EMMA) specification under development in the W3C Multimodal Interaction Activity. EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application-specific data with information such as confidence scores, time stamps, input mode (e.g. keystrokes, speech or pen), alternative recognition hypotheses, and partial recognition results. EMMA is a target data format for the Semantic Interpretation specification being developed in the Voice Browser Activity, which describes annotations to speech grammars for extracting application-specific data as a result of speech recognition. EMMA supersedes earlier work on the Natural Language Semantics Markup Language in the Voice Browser Activity.
Status of this Document
This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. The latest status of this document series is maintained
at the
W3C.
W3C's Multimodal
Interaction Activity is developing specifications for extending
the Web to support multiple modes of interaction. This document
provides the basis for guiding and evaluating subsequent work on a
specification for a data format (EMMA) that acts as an exchange
mechanism between input processors and interaction management
components in a multimodal application. These components are
introduced in the W3C Multimodal
Interaction Framework.
This document is a NOTE made available by the W3C for archival purposes, and is not expected to undergo frequent changes. Publication of this NOTE by W3C indicates no endorsement by W3C, the W3C Team, or any W3C Members. A list of current W3C technical reports and publications, including Recommendations, Working Drafts, and Notes, can be found at http://www.w3.org/TR/.
This document has been produced as part of the W3C Multimodal Interaction
Activity,
following the procedures set out for the W3C Process. The
authors of this document are members of the Multimodal Interaction
Working Group (W3C Members
only). This is a Royalty Free Working Group, as described in
W3C's Current
Patent Practice NOTE. Working Group participants are required
to provide patent
disclosures.
Please send comments about this document to the public mailing
list: [email protected] (public
archives). To subscribe, send an email to <[email protected]>
with the word subscribe in the subject line (include the
word unsubscribe if you want to unsubscribe).
Table of Contents
- Introduction
- 1. Scope of EMMA
- 2. Data model requirements
- 3. Annotation requirements
- 4. Integration with other work
Introduction
Extensible MultiModal Annotation language (EMMA) is the markup language
used to represent human input to a multimodal application.
As such, it may be seen in terms of the W3C Multimodal Interaction Framework
as the exchange mechanism between
user input devices and the interaction management capabilities of an application.
General Principles
An EMMA document can be considered to hold three types of data:
- instance data
  The slots and values corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
- data model
  The constraints on the structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
- metadata
  Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime.
Given the assumptions above about the nature of data represented
in an EMMA document, the following general principles apply to the design of EMMA:
- The main prescriptive content of the EMMA specification will consist of metadata: EMMA will provide a means to express the metadata annotations which require standardization. (Notice, however, that such annotations may express the relationship among all the types of data within an EMMA document.)
- The instance and its data model are assumed to be specified in XML, but EMMA will remain agnostic to the XML format used to express them. (The instance XML is assumed to be sufficiently structured to enable the association of annotative data.)
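To make these three types of data concrete, the following is a purely illustrative sketch of what an EMMA document might look like. The element and attribute names (emma:emma, emma:interpretation, mode, confidence) are hypothetical, invented here for exposition, and are not drawn from any specification:

  <emma:emma xmlns:emma="http://www.example.org/emma">
    <!-- metadata: annotations added by the input processor at runtime -->
    <emma:interpretation mode="speech" confidence="0.75"
                         start="11:59:45" end="11:59:47">
      <!-- instance data: application-specific slots and values,
           possibly constrained by an (implicit or explicit) data model -->
      <flight>
        <origin>Boston</origin>
        <destination>Denver</destination>
      </flight>
    </emma:interpretation>
  </emma:emma>

In this sketch the flight element is the instance, its (here unstated) data model would constrain what a flight may contain, and the attributes on emma:interpretation are the metadata.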
The following sections apply these principles in terms of the scope of EMMA, the requirements on the contents and syntax of data models and annotations, and the integration of EMMA with other work.
1. Scope of EMMA
EMMA must be able to represent the following kinds of input:
- 1.1 input in any human language
- 1.2 input from the modalities and devices specified in the next section
- input reflecting the results of the following processes:
  - 1.3 token interpretation from signal (e.g. speech+SRGS)
  - 1.4 semantic interpretation from token/signal (e.g. text+NL parsing/speech+SRGS+SI)
- input gained in any of the following ways:
  - 1.5 single modality input
  - 1.6 sequential modality input, that is, single-modality inputs presented in sequence
  - 1.7 simultaneous modality input (as defined in the main MMI requirements document)
  - 1.8 composite modality input (as defined in the main MMI requirements document)
EMMA must be able to represent input from the following modalities, devices and architectures:
- human language input modalities
  - 1.9 text
  - 1.10 speech
  - 1.11 handwriting
  - 1.12 other modalities identified by the MMI Requirements document as required
  - 1.13 combinations of the above modalities
- devices
  - 1.14 telephones (i.e. no device processing; proxy agent)
  - 1.15 thin clients (i.e. limited device processing)
  - 1.16 rich clients (i.e. powerful device processing)
  - 1.17 everything in this range
- known and foreseeable network configurations
  - 1.18 architectures
  - 1.19 protocols
  - 1.20 extensibility to further devices and modalities
Representation of output and other uses
EMMA is considered primarily as a representation of user input, and it is in this context that the rest of this document defines the requirements on EMMA. Given that the focus of EMMA is on meta information, sufficient need is not seen at this stage to define standard annotations for system output or for general message content between system components. However, the following requirement is included to ensure that EMMA may still be used in these cases where necessary.
- 1.21 The following uses of EMMA must not be precluded:
  - a representation from which system output markup may be generated;
  - a language for general-purpose communication among system components.
Ease of use and portability
- 1.22 EMMA content must be accessible via standard means (e.g. XPath; see the sketch following this list).
- 1.23 Queries on EMMA content must be easy to author.
- 1.24 The EMMA specification must enable portability of EMMA documents across applications.
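As an indication of what requirements 1.22 and 1.23 imply, a consumer of the hypothetical markup sketched earlier could select content with ordinary XPath expressions; the element and attribute names remain illustrative only:

  //emma:interpretation[@confidence > 0.5]
      (all interpretations above a confidence threshold)
  //emma:interpretation[@mode = 'speech']/flight/destination
      (the destination slot of spoken input)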
2. Data model requirements
Data model content
The following requirements apply to the use of data models in EMMA documents:
- 2.1 use of a data model and constraints must be possible, for the purposes of validation and interoperability
- 2.2 use of a data model will not be required; in other words, it must be possible to rely on an implicit data model
- 2.3 it must be possible in a single EMMA document to associate different data models with different instances
It is assumed that the combination and decomposition of data models will be supported by data model description formats (e.g. XML Schema), and that the comparison of data models is enabled by standard XML comparison mechanisms (e.g. use of XSLT, XPath). Therefore this functionality is not considered a requirement on EMMA data modelling.
Data model description formats
The following requirements apply to the description format of data models used in EMMA documents:
- 2.4 existing standard formats must be able to be used, for example:
  - arbitrary XML
  - XML Schema
  - XForms
- 2.5 no single description format is required
The use of a data model in EMMA is for the purpose of validating an EMMA instance against the constraints of a data model. Since Web applications today use different formats to specify data models (e.g. XML Schema, XForms, RELAX NG), the principle that EMMA does not require a single format enables EMMA to be used in a variety of application contexts. The concern that this may lead to problems of interoperability has been discussed, and will be reviewed during production of the specification.
- 2.6 data model declarations must be able to be specified inline or referenced (see the sketch below)
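As an illustration of requirement 2.6, a data model might either travel with the document or be referenced externally. The dataModel attribute and emma:model element below are hypothetical names invented for this sketch, not part of any specification:

  <!-- referenced: the instance is validated against an external schema -->
  <emma:interpretation dataModel="http://www.example.com/models/flight.xsd">
    <flight>...</flight>
  </emma:interpretation>

  <!-- inline: the data model declaration is carried within the EMMA document -->
  <emma:model>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <!-- schema constraining the structure and content of the flight instance -->
    </xsd:schema>
  </emma:model>

Consistent with requirement 2.5, the referenced format here happens to be XML Schema, but XForms or another description format could equally be used.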
3. Annotation requirements
Annotation content
EMMA must enable the specification of the following features. For each annotation feature, "local" annotation is assumed: that is, the association of the annotation may be at any level within the instance structure, and not only at the highest level.
- General meta data
  - 3.1 lack of input
  - 3.2 uninterpretable input
  - 3.3 identification of input source
  - 3.4 time stamps
  - 3.5 relative positioning of input events (NB: this requirement is covered explicitly by time stamps, but reflects use of EMMA in environments in which time stamping may not be possible)
  - 3.6 temporal grouping of input events
  - 3.7 human language of input
  - 3.8 identification of input modality
- Annotational structure
  - 3.9 association to the corresponding instance element annotated
  - 3.10 reference to data model definition
  - 3.11 composite multimodal input: representation of input from multiple modalities
- Recognition (signal --> tokens processing)
  - 3.12 reference to signal
  - 3.13 reference to processing used (e.g. SRGS grammar)
  - 3.14 tokens of utterance
  - 3.15 ambiguity: this enables a tree-based representation of local ambiguity; that is, alternatives are expressible for given nodes in the structure (see the sketch following this list)
  - 3.16 confidence scores of recognition
- Interpretation (tokens --> semantic processing)
  - 3.17 tokens of utterance
  - 3.18 reference to processing used (e.g. SRGS)
  - 3.19 ambiguity
  - 3.20 confidence scores of interpretation
- Recognition and Interpretation (signal --> semantic processing)
  - 3.21 union of Recognition/Interpretation features (e.g. SRGS + SI)
- Modality-dependent annotations
  - 3.22 EMMA must be extensible to annotations which are specific to particular modalities, e.g. those of speech or handwriting input
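The following hypothetical sketch brings several of these annotations together: time stamps (3.4), tokens of utterance (3.14), ambiguity (3.15, 3.19) and confidence scores (3.16, 3.20). A grouping element holds two alternative interpretations of the same utterance; all element and attribute names are invented for illustration:

  <emma:one-of start="11:59:45" end="11:59:47">
    <!-- first recognition hypothesis, with its own tokens and confidence -->
    <emma:interpretation tokens="flights to boston" confidence="0.6">
      <destination>Boston</destination>
    </emma:interpretation>
    <!-- competing hypothesis for the same input event -->
    <emma:interpretation tokens="flights to austin" confidence="0.4">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>

Because such a grouping element could in principle appear at any node within the instance structure, the same mechanism would support the tree-based, "local" ambiguity described in 3.15, rather than only whole-utterance alternatives.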
4. Integration with other work
- 4.1 Where such alignment is appropriate, EMMA must enable the use and integration of widely adopted standard specifications and features. The following activities are considered most relevant in this respect:
- W3C activities
  - MMI activities
    - MMI general requirements
    - Events subgroup requirements
    - Integration subgroup requirements
    - Ink subgroup requirements
  - Voice Browser activities
    - SRGS: EMMA must enable the representation of results from speech recognition using SRGS
    - SI: EMMA must enable the representation of results from speech recognition using SRGS with SI output
  - Other W3C activities
    - Relevant XML-related activities
    - RDF working group
- Other organizations and standards