Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 27 June 2013 Second Public Working Draft of "EMMA: Extensible MultiModal Annotation markup language Version 1.1". It has been produced by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.
This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) together with annotations for confidence scores, timestamps, input medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework.
The EMMA: Extensible Multimodal Annotation 1.0 specification was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard, and extensive feedback has come in regarding desired new features and clarifications requested for existing features. The W3C Multimodal Interaction Working Group examined a range of different use cases for extensions of the EMMA specification and published a W3C Note on Use Cases for Possible Future EMMA Features [EMMA Use Cases]. In this working draft of EMMA 1.1, we have developed a set of new features based on feedback from implementers and have also added clarifying text in a number of places throughout the specification. The new features include: support for adding human annotations (emma:annotation, emma:annotated-tokens), support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref), support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref), extensions to emma:grammar to enable inline specification of grammars, a new mechanism for indicating which grammars are active (emma:grammar-active, emma:active), support for non-XML semantic payloads (emma:result-format), support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref), and a new attribute to complement the emma:medium and emma:mode attributes that enables specification of the modality used to express an input (emma:expressed-through).
The changes from the last working draft are:

- The emma:location element was added for specification of the location of the device or sensor which captured the input.
- A ref attribute was added to a number of elements, allowing for shorter EMMA documents which use URIs to point to content stored outside of the document: emma:one-of, emma:sequence, emma:group, emma:info, emma:parameters, emma:lattice.
- emma:partial-content is introduced, which indicates whether the content in an element with ref is the full content, or whether it is partial and more can be retrieved by following the URI in ref.
- The emma:emma element is extended with doc-ref and prev-doc attributes that indicate where the document can be retrieved from and where the previous document in a sequence of inputs can be retrieved from.
- emma:lattice is also extended so that an EMMA document can contain both an N-best list and a lattice side by side.

Changes from EMMA 1.0 can also be found in Appendix F.
Comments are welcome on [email protected] (archive). See W3C mailing list and archive usage guidelines.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly. Informative parts within normative sections are identified by "Informative" labels.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
- emma:model element
- emma:derived-from element and emma:derivation element
- emma:grammar element
- emma:grammar-active element
- emma:info element
- emma:endpoint-info element and emma:endpoint element
- emma:process-model element
- emma:parameters and emma:parameter elements
- emma:annotation element
- emma:location element
- emma:tokens attribute
- emma:process attribute
- emma:no-input attribute
- emma:uninterpreted attribute
- emma:lang attribute
- emma:signal and emma:signal-size attributes
- emma:media-type attribute
- emma:confidence attribute
- emma:source attribute
- emma:medium, emma:mode, emma:function, emma:verbal, emma:device-type, and emma:expressed-through attributes
- emma:hook attribute
- emma:cost attribute
- emma:endpoint-role, emma:endpoint-address, emma:port-type, emma:port-num, emma:message-id, emma:service-name, emma:endpoint-pair-ref, emma:endpoint-info-ref attributes
- emma:grammar element: emma:grammar-ref attribute
- emma:model element: emma:model-ref attribute
- emma:dialog-turn attribute
- emma:result-format attribute
- emma:info element: emma:info-ref attribute
- emma:process-model element: emma:process-model-ref attribute
- emma:parameters element: emma:parameter-ref attribute
- emma:annotated-tokens attribute
- emma:partial-content
- emma:hook and SRGS (Informative)

This section is Informative.
This document presents an XML specification for EMMA, an Extensible MultiModal Annotation markup language, responding to the requirements documented in Requirements for EMMA [EMMA Requirements]. This markup language is intended for use by systems that provide semantic interpretations for a variety of inputs, including but not necessarily limited to, speech, natural language text, GUI and ink input.
It is expected that this markup will be used primarily as a standard data interchange format between the components of a multimodal system; in particular, it will normally be automatically generated by interpretation components to represent the semantics of users' inputs, not directly authored by developers.
The language is focused on annotating single inputs from users, which may be either from a single mode or a composite input combining information from multiple modes, as opposed to information that might have been collected over multiple turns of a dialog. The language provides a set of elements and attributes that are focused on enabling annotations on user inputs and interpretations of those inputs.
An EMMA document can be considered to hold three types of data:
instance data
Application-specific markup corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
data model
Constraints on structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
metadata
Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime. In EMMA 1.1 annotations may also result from transcription and other activities by human annotators.
Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:
- Instance data may be represented in XML or in another format indicated by the emma:result-format attribute. EMMA will remain agnostic to the specific details of the format. (If it is XML, the instance data is assumed to be sufficiently structured to enable the association of annotative data.)
- Application- and vendor-specific annotations may be represented within the emma:info element (Section 4.1.5).

The annotations of EMMA should be considered 'normative' in the sense that if an EMMA component produces annotations as described in Section 3 and Section 4, these annotations must be represented using the EMMA syntax. The Multimodal Interaction Working Group may address in later drafts the issues of modularization and profiling; that is, which sets of annotations are to be supported by which classes of EMMA component.
The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component, where input is to be taken in the general sense of a meaningful user input in any modality supported by the platform. The reader should refer to the sample architecture in W3C Multimodal Interaction Framework [MMI Framework], which shows EMMA conveying content between user input modality components and an interaction manager.
Components that generate EMMA markup:
Components that use EMMA include:
Although not a primary goal of EMMA, a platform may also choose to use this general format as the basis of a general semantic result that is carried along and filled out during each stage of processing. In addition, future systems may also potentially make use of this markup to convey abstract semantic content to be rendered into natural language by a natural language generation component.
- Together with emma:time-ref-uri, emma:time-ref-anchor-point allows you to specify whether the referenced anchor is the start or end of the interval.
- URI-valued attributes take values of the anyURI primitive as defined in XML Schema Part 2: Datatypes Second Edition, Section 3.2.17 [SCHEMA2].

This section is Informative.
As noted above, the main components of an interpreted user input in EMMA are the instance data, an optional data model, and the metadata annotations that may be applied to that input. The realization of these components in EMMA is as follows:
An EMMA interpretation is the primary unit for holding user input as interpreted by an EMMA processor. As will be seen below, multiple interpretations of a single input are possible.
EMMA provides a simple structural syntax for the organization of interpretations and instances, and an annotative syntax to apply the annotation to the input data at different levels.
An outline of the structural syntax and annotations found in EMMA documents is as follows. A fuller definition may be found in the description of individual elements and attributes in Section 3 and Section 4.
- The root element, emma:emma, holds EMMA version and namespace information, and provides a container for one or more of the following interpretation and container elements (Section 3.1).
- The emma:interpretation element contains a given interpretation of the input and holds application-specific markup (Section 3.2).
- emma:one-of is a container for one or more interpretation elements or container elements and denotes that these are mutually exclusive interpretations (Section 3.3.1).
- emma:group is a general container for one or more interpretation elements or container elements. It can be associated with arbitrary grouping criteria (Section 3.3.2).
- emma:sequence is a container for one or more interpretation elements or container elements and denotes that these are sequential in time (Section 3.3.3).
- The emma:lattice element is used to contain a series of emma:arc and emma:node elements that define a lattice of words, gestures, meanings or other symbols. The emma:lattice element appears within the emma:interpretation element (Section 3.4).
- The emma:literal element is used as a wrapper when the application semantics is a string literal (Section 3.5).
- Annotations such as emma:derived-from, emma:endpoint-info, and emma:info are represented as elements so that they can occur more than once within an element and can contain internal structure (Section 4.1).
- Annotations such as emma:start, emma:end, emma:confidence, and emma:tokens are represented as attributes. They can appear on emma:interpretation elements. Some can appear on container elements, lattice elements, and elements in the application-specific markup (Section 4.2).

From the defined root node emma:emma, the structure of an EMMA document consists of a tree of EMMA container elements (emma:one-of, emma:sequence, emma:group) terminating in a number of interpretation elements (emma:interpretation). The emma:interpretation elements serve as wrappers for either application namespace markup describing the interpretation of the user's input, or an emma:lattice element, or an emma:literal element. A single emma:interpretation may also appear directly under the root node.
The EMMA elements emma:emma, emma:interpretation, emma:one-of, and emma:literal and the EMMA attributes emma:no-input, emma:uninterpreted, emma:medium, and emma:mode are required of all implementations. The remaining elements and attributes are optional and may be used in some implementations and not others, depending on the specific modalities and processing being represented.
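A minimal conforming EMMA document, using only the required elements and attributes, might look like the following sketch (the answer element and its namespace are illustrative application markup, not part of EMMA):

```xml
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- emma:medium and emma:mode are required annotations -->
  <emma:interpretation id="int1" emma:medium="tactile" emma:mode="keys">
    <answer>yes</answer>
  </emma:interpretation>
</emma:emma>
```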
To illustrate this, here is an example of an EMMA document representing input to a flight reservation application. In this example there are two speech recognition results and associated semantic representations of the input. The system is uncertain whether the user meant "flights from Boston to Denver" or "flights from Austin to Denver". The annotations to be captured are timestamps and confidence scores for the two inputs.
Example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:confidence="0.75"
emma:tokens="flights from boston to denver">
<origin>Boston</origin>
<destination>Denver</destination>
</emma:interpretation>
<emma:interpretation id="int2" emma:confidence="0.68"
emma:tokens="flights from austin to denver">
<origin>Austin</origin>
<destination>Denver</destination>
</emma:interpretation>
</emma:one-of>
</emma:emma>
Attributes on the root emma:emma element indicate the version and namespace. The emma:emma element contains an emma:one-of element which contains a disjunctive list of possible interpretations of the input. The actual semantic representation of each interpretation is within the application namespace. In the example here, the application-specific semantics involves the elements origin and destination, indicating the origin and destination cities for looking up a flight. The timestamp is the same for both interpretations, and it is annotated using values in milliseconds in the emma:start and emma:end attributes on the emma:one-of. The confidence scores and tokens associated with each of the inputs are annotated using the EMMA annotation attributes emma:confidence and emma:tokens on each of the emma:interpretation elements.
Attributes in EMMA cascade from a containing emma:one-of element to the individual interpretations. In the example above, the emma:start, emma:end, emma:medium, and emma:mode attributes are all specified once on emma:one-of but apply to both of the contained emma:interpretation elements. This is an important mechanism as it limits the need to repeat annotations. More details on the scope of annotations among EMMA structural elements, and also on the scope of annotations within derivations, where multiple different processing stages apply to an input, can be found in Section 4.3.
Many EMMA elements allow for content to be specified either inline or by reference using the ref attribute. This is an important mechanism as it allows EMMA documents to be less verbose while still allowing the EMMA consumer to access content from an external document, possibly on a remote server. For example, in the case of emma:grammar, a grammar can either be specified inline within the element, or the ref attribute on emma:grammar can indicate the location where the grammar document can be retrieved. Similarly, with emma:model a data model can be specified inline or by reference through the ref attribute. A ref attribute can also be used on the EMMA container elements emma:sequence, emma:one-of, emma:group, and emma:lattice. In these cases, the ref attribute provides a pointer to a portion of an external EMMA document, possibly on a remote server. This can be achieved using URI ID references to pick out a particular element within the external EMMA document. One use case for ref with the container elements is to allow for inline content to be partial and for the ref to provide access to the full content. For example, in the case of emma:one-of, an EMMA document delivered to an EMMA consumer could contain an abbreviated list of interpretations, e.g. the top 3, while the emma:one-of element accessible through the URI in ref includes a more inclusive list of 20 emma:interpretation elements. The emma:partial-content attribute MUST be used on the partially specified element if the ref refers to a more fully specified element. The ref attribute can also be used on emma:info, emma:parameters, and emma:annotation. The use of ref on specific elements is described and exemplified in the section describing each element.
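The partial-content use case described above might be sketched as follows; the URI and the application elements are illustrative:

```xml
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- Only the top interpretation is inline; the full N-best list
       can be retrieved by following the URI in ref. -->
  <emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice"
      emma:partial-content="true"
      ref="http://example.com/results/emma123.xml#nbest1">
    <emma:interpretation id="int1" emma:confidence="0.90"
        emma:tokens="flights to denver">
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```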
An EMMA data model expresses the constraints on the structure and content of instance data, for the purposes of validation. As such, the data model may be considered as a particular kind of annotation (although, unlike other EMMA annotations, it is not a feature pertaining to a specific user input at a specific moment in time, it is rather a static and, by its very definition, application-specific structure). The specification of a data model in EMMA is optional.
Since Web applications today use different formats to specify data models, e.g. XML Schema Part 1: Structures Second Edition [XML Schema Structures], XForms 1.0 (Second Edition) [XFORMS], RELAX NG Specification [RELAX-NG], etc., EMMA itself is agnostic to the format of data model used.
Data model definition and reference is defined in Section 4.1.1.
An EMMA attribute is qualified with the EMMA namespace prefix if the attribute can also be used as an in-line annotation on elements in the application's namespace. Most of the EMMA annotation attributes in Section 4.2 are in this category. An EMMA attribute is not qualified with the EMMA namespace prefix if the attribute only appears on EMMA elements. This convention is applied consistently throughout the examples in this specification.
Attributes from other namespaces are permissible on all EMMA elements. As an example, xml:lang may be used to annotate the human language of character data content.
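For example, xml:lang could be used to mark the language of character data within an interpretation; this is a sketch, and the French content is illustrative:

```xml
<emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
  <emma:literal xml:lang="fr-FR">bonjour</emma:literal>
</emma:interpretation>
```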
This section defines elements in the EMMA namespace which provide the structural syntax of EMMA documents.
emma:emma
Annotation | emma:emma |
---|---|
Definition | The root element of an EMMA document. |
Children | The emma:emma element MUST immediately contain a
single emma:interpretation element or EMMA container
element: emma:one-of , emma:group ,
emma:sequence . It MAY also contain an optional single
emma:derivation element. It MAY also contain multiple
optional emma:grammar , emma:model ,
emma:endpoint-info , emma:info ,
emma:process-model , emma:parameters , and
emma:annotation elements. It
MAY also contain a single emma:location
element. |
Attributes |
|
Applies to | None |
The root element of an EMMA document is named emma:emma. It holds a single emma:interpretation or EMMA container element (emma:one-of, emma:sequence, emma:group). It MAY also contain a single emma:derivation element containing earlier stages of the processing of the input (see Section 4.1.2). It MAY also contain multiple optional emma:grammar, emma:model, emma:endpoint-info, emma:info, emma:process-model, emma:parameters, and emma:annotation elements.
It MAY hold attributes for information pertaining to EMMA itself, along with any namespaces which are declared for the entire document, and any other EMMA annotative data. The emma:emma element and other elements and attributes defined in this specification belong to the XML namespace identified by the URI "http://www.w3.org/2003/04/emma". In the examples, the EMMA namespace is generally declared using the attribute xmlns:emma on the root emma:emma element. EMMA processors MUST support the full range of ways of declaring XML namespaces as defined by Namespaces in XML 1.1 (Second Edition) [XMLNS].
Application markup MUST be declared either in an explicit application namespace or in an undefined namespace by setting xmlns="".
For example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma"> .... </emma:emma>
or
<emma version="1.1" xmlns="http://www.w3.org/2003/04/emma"> .... </emma>
The optional attributes doc-ref and prev-doc MAY be used on emma:emma in order to indicate the location where the EMMA document comprising that emma:emma element can be retrieved from, and the location of the previous EMMA document in a sequence of interactions. One important use case for doc-ref is client side logging: a client receiving an EMMA document can record the URI found in doc-ref in a log file instead of a local copy of the whole EMMA document. The prev-doc attribute provides a mechanism for tracking a sequence of EMMA documents representing the results of processing distinct turns of interaction by an EMMA processor.
In the following example, doc-ref provides a URI which indicates where the EMMA document embodied in this emma:emma can be retrieved from.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
doc-ref="http://example.com/trainapp/user123/emma0727080512.xml">
<emma:interpretation id="int1"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:signal="http://example.com/audio/input678.amr"
emma:process="http://example.com/asr/params.xml"
emma:tokens="trains to london tomorrow">
<destination>London</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:emma>
In the following example, again doc-ref indicates where the EMMA document can be retrieved from, but in addition prev-doc indicates where the previous EMMA document can be retrieved from.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
doc-ref="http://example.com/trainapp/user123/emma0730080512.xml"
prev-doc="http://example.com/trainapp/user123/emma0727080512.xml">
<emma:interpretation id="int1"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:signal="http://example.com/audio/input679.amr"
emma:process="http://example.com/asr/params.xml"
emma:tokens="from cambridge">
<origin>Cambridge</origin>
</emma:interpretation>
</emma:emma>
EMMA processors may use a number of different techniques to determine the prev-doc. It may, for example, be determined based on the session: in a session of interaction, a server processing requests can track the previous EMMA result for a client and indicate it in prev-doc. Alternatively, the URI of the last EMMA result could be passed in as a parameter in a request to an EMMA processor and returned in prev-doc with the next result.
emma:interpretation
Annotation | emma:interpretation |
---|---|
Definition | The emma:interpretation element acts as a wrapper
for application instance data or lattices. |
Children | The emma:interpretation element MUST immediately
contain either application instance data, or a single
emma:lattice element, or a single
emma:literal element; in the case of uninterpreted
input or no input, emma:interpretation
MUST be empty. It MAY also contain multiple
optional emma:derived-from
elements and an optional single
emma:info element. It MAY also
contain multiple optional emma:annotation elements. It
MAY also contain multiple emma:parameters elements. It
MAY also contain a single optional emma:grammar-active
element. It MAY also contain a single
emma:location element. |
Attributes |
|
Applies to | The emma:interpretation element is legal only as a
child of emma:emma , emma:group ,
emma:one-of , emma:sequence , or
emma:derivation . |
The emma:interpretation element holds a single interpretation represented in application-specific markup, or a single emma:lattice element, or a single emma:literal element.

The emma:interpretation element MUST be empty if it is marked with emma:no-input="true" (Section 4.2.3). The emma:interpretation element MUST be empty if it has been annotated with emma:uninterpreted="true" (Section 4.2.4) or emma:function="recording" (Section 4.2.11).
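For instance, an interpretation representing silence (no input) would be an empty element carrying the relevant annotations; the annotations shown here are a sketch:

```xml
<emma:interpretation id="int1" emma:no-input="true"
    emma:medium="acoustic" emma:mode="voice"/>
```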
Attributes:

- id: an xsd:ID value that uniquely identifies the interpretation within the EMMA document.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
  ...
  </emma:interpretation>
</emma:emma>
While emma:medium and emma:mode are optional on emma:interpretation, note that all EMMA interpretations must be annotated for emma:medium and emma:mode; either these attributes must appear directly on emma:interpretation, or they must appear on an ancestor emma:one-of node, or they must appear on an earlier stage of the derivation listed in emma:derivation.
emma:one-of element

Annotation | emma:one-of |
---|---|
Definition | A container element indicating a disjunction among a collection of mutually exclusive interpretations of the input. |
Children | The emma:one-of element MUST immediately contain a
collection of one or more emma:interpretation elements
or container elements (emma:one-of ,
emma:group , emma:sequence ), UNLESS it is annotated with ref . It
MAY also contain multiple optional
emma:derived-from elements and multiple
emma:info elements. It
MAY also contain multiple optional emma:annotation
elements. It MAY also contain multiple optional
emma:parameters elements. It MAY also contain a single
optional emma:grammar-active element. It MAY also contain a single emma:lattice
element containing the lattice result for the same input.
It MAY also contain a single
emma:location element. |
Attributes |
|
Applies to | The emma:one-of element MAY only appear as a child
of emma:emma , emma:one-of ,
emma:group , emma:sequence , or
emma:derivation . |
The emma:one-of element acts as a container for a collection of one or more interpretation (emma:interpretation) or container elements (emma:one-of, emma:group, emma:sequence), and denotes that these are mutually exclusive interpretations.

An N-best list of choices in EMMA MUST be represented as a set of emma:interpretation elements contained within an emma:one-of element. For instance, a series of different recognition results in speech recognition might be represented in this way.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
ref="http://www.example.com/i156/emma.xml#r1">
<emma:interpretation id="int1">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
The function of the emma:one-of element is to represent a disjunctive list of possible interpretations of a user input. A disjunction of possible interpretations of an input can be the result of different kinds of processing or ambiguity. One source is multiple results from a recognition technology such as speech or handwriting recognition. Multiple results can also occur from parsing or understanding natural language. Another possible source of ambiguity is the application of multiple different kinds of recognition or understanding components to the same input signal. For example, a single ink input signal might be processed by both handwriting recognition and gesture recognition. Another is the use of more than one recording device for the same input (multiple microphones).
The optional ref attribute indicates a location where a copy of the content within the emma:one-of element can be retrieved from an external document, possibly located on a remote server.
In order to make explicit these different kinds of multiple interpretations and allow for concise statement of the annotations associated with each, the emma:one-of element MAY appear within another emma:one-of element. If emma:one-of elements are nested then they MUST indicate the kind of disjunction using the attribute disjunction-type. The values of disjunction-type are recognition, understanding, multi-device, and multi-process. For the most common use case, where there are multiple recognition results and some of them have multiple interpretations, the top-level emma:one-of is disjunction-type="recognition" and the embedded emma:one-of has the attribute disjunction-type="understanding".
As an example, if, in an interactive flight reservation application, recognition yielded 'Boston' or 'Austin', and each had a semantic interpretation as either the assertion of a city name or the specification of a flight query with the city as the destination, this would be represented as follows in EMMA:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of disjunction-type="recognition"
emma:start="12457990" emma:end="12457995"
emma:medium="acoustic" emma:mode="voice">
<emma:one-of disjunction-type="understanding"
emma:tokens="boston">
<emma:interpretation>
<assert><city>boston</city></assert>
</emma:interpretation>
<emma:interpretation>
<flight><dest><city>boston</city></dest></flight>
</emma:interpretation>
</emma:one-of>
<emma:one-of disjunction-type="understanding"
emma:tokens="austin">
<emma:interpretation>
<assert><city>austin</city></assert>
</emma:interpretation>
<emma:interpretation>
<flight><dest><city>austin</city></dest></flight>
</emma:interpretation>
</emma:one-of>
</emma:one-of>
</emma:emma>
EMMA MAY explicitly represent ambiguity resulting from different
processes, devices, or sources using embedded
emma:one-of
and the disjunction-type
attribute. Multiple different interpretations resulting from
different factors MAY also be listed within a single unstructured
emma:one-of
though in this case it is more difficult or
impossible to recover the sources of the ambiguity if required by
later stages of processing. If there is no embedding in
emma:one-of
, then the disjunction-type
attribute is not required. If the disjunction-type
attribute is missing then by default the source of disjunction is
unspecified.
The example case above could also be represented as:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of emma:start="12457990" emma:end="12457995"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation emma:tokens="boston">
<assert><city>boston</city></assert>
</emma:interpretation>
<emma:interpretation emma:tokens="boston">
<flight><dest><city>boston</city></dest></flight>
</emma:interpretation>
<emma:interpretation emma:tokens="austin">
<assert><city>austin</city></assert>
</emma:interpretation>
<emma:interpretation emma:tokens="austin">
<flight><dest><city>austin</city></dest></flight>
</emma:interpretation>
</emma:one-of>
</emma:emma>
But in this case information about which interpretations resulted from speech recognition and which resulted from language understanding is lost.
A list of emma:interpretation
elements within an
emma:one-of
MUST be sorted best-first by some measure
of quality. The quality measure is emma:confidence
if
present; otherwise, the quality metric is platform-specific.
With embedded emma:one-of
structures there is no
requirement for the confidence scores within different
emma:one-of
to be on the same scale. For example, the
scores assigned by handwriting recognition might not be comparable
to those assigned by gesture recognition. Similarly, if multiple
recognizers are used there is no guarantee that their confidence
scores will be comparable. For this reason the ordering requirement
on emma:interpretation
within emma:one-of
only applies locally to sister emma:interpretation
elements within each emma:one-of
. There is no
requirement on the ordering of embedded emma:one-of
elements within a higher emma:one-of
element.
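Because sister emma:interpretation elements within each emma:one-of are sorted best-first, a consumer that only needs the top hypothesis can simply take the first emma:interpretation child. The following sketch is illustrative only (not part of this specification) and uses Python's standard ElementTree:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

doc = """<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.9"/>
    <emma:interpretation id="int2" emma:confidence="0.7"/>
  </emma:one-of>
</emma:emma>"""

root = ET.fromstring(doc)
one_of = root.find(f"{{{EMMA_NS}}}one-of")

# Sister interpretations are sorted best-first, so the first child
# is the top hypothesis; no re-sorting is needed.
best = one_of.find(f"{{{EMMA_NS}}}interpretation")
print(best.get("id"))                        # int1
print(best.get(f"{{{EMMA_NS}}}confidence"))  # 0.9
```

Since confidence scores in different emma:one-of containers need not share a scale, such a selection should only ever compare sister interpretations, never interpretations from different containers.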
While emma:medium
and emma:mode
are
optional on emma:one-of
, note that all EMMA
interpretations must be annotated for emma:medium
and
emma:mode
, so either these annotations must appear
directly on all of the contained emma:interpretation
elements within the emma:one-of
, or they must appear
on the emma:one-of
element itself, or they must appear
on an ancestor emma:one-of
element, or they must
appear on an earlier stage of the derivation listed in
emma:derivation
.
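One way to apply this rule is to look for emma:medium and emma:mode on the interpretation itself and, failing that, on each ancestor container in turn. The sketch below is illustrative only (the effective_annotation helper is not defined by this specification) and omits the emma:derivation case:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

doc = """<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1"/>
  </emma:one-of>
</emma:emma>"""

root = ET.fromstring(doc)

def effective_annotation(root, target, attr):
    """Return the emma:* attribute from target or its nearest ancestor."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get(f"{{{EMMA_NS}}}{attr}")
        if value is not None:
            return value
        node = parents.get(node)
    return None

interp = root.find(f".//{{{EMMA_NS}}}interpretation")
print(effective_annotation(root, interp, "medium"))  # acoustic
print(effective_annotation(root, interp, "mode"))    # voice
```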
An important use case for ref
on
emma:one-of
is to allow an EMMA processor to return an
abbreviated list of container elements such as
emma:interpretation
within an emma:one-of
and use the ref
attribute to provide a reference to a
more fully specified set. In these cases, the
emma:one-of
MUST be annotated with the
emma:partial-content="true"
attribute.
In the following example the EMMA document received
has the two interpretations within emma:one-of
. The
emma:partial-content="true"
annotation indicates
that there are more interpretations, which can be retrieved by
accessing the URI in ref
:
"http://www.example.com/emma_021210_10.xml#r1"
.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
      ref="http://www.example.com/emma_021210_10.xml#r1"
      emma:partial-content="true">
    <emma:interpretation id="int1"
        emma:tokens="from boston to denver" emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2"
        emma:tokens="from austin to denver" emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Where the document at
"http://www.example.com/emma_021210_10.xml" is as follows, and
there are two more interpretations within the
emma:one-of
with id "r1".
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
      emma:partial-content="false">
    <emma:interpretation id="int1"
        emma:tokens="from boston to denver" emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2"
        emma:tokens="from austin to denver" emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int3"
        emma:tokens="from tustin to denver" emma:confidence="0.3">
      <origin>Tustin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int4"
        emma:tokens="from tustin to dallas" emma:confidence="0.1">
      <origin>Tustin</origin>
      <destination>Dallas</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
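A receiving component can test the emma:partial-content annotation to decide whether the container it holds is complete or whether the URI in ref must be dereferenced for the full set. A minimal sketch (the needs_dereference helper is illustrative, not part of this specification); actual retrieval would be an ordinary HTTP fetch of the ref URI:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

doc = """<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" ref="http://www.example.com/emma_021210_10.xml#r1"
               emma:partial-content="true">
    <emma:interpretation id="int1"/>
    <emma:interpretation id="int2"/>
  </emma:one-of>
</emma:emma>"""

root = ET.fromstring(doc)
one_of = root.find(f"{{{EMMA_NS}}}one-of")

def needs_dereference(element):
    """True when the container holds only part of its content and the
    full set must be retrieved from the URI in the ref attribute."""
    partial = element.get(f"{{{EMMA_NS}}}partial-content") == "true"
    return partial and element.get("ref") is not None

print(needs_dereference(one_of))  # True
```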
It is also possible to specify a lattice of results
alongside an N-best list of interpretations in
emma:one-of
. A single emma:lattice
element can appear as a child of emma:one-of
and
contains a lattice representation of the processing of the same
input resulting in the interpretations that appear within the
emma:one-of
. In this example, there are two N-best
results and the emma:lattice
enumerates two more as it
includes arcs for "tomorrow" vs "today".
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:tokens="flights from boston to denver tomorrow">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
<emma:interpretation id="int2" emma:tokens="flights from austin to denver tomorrow">
<origin>Austin</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
<emma:lattice initial="1" final="7">
<emma:arc from="1" to="2">flights</emma:arc>
<emma:arc from="2" to="3">from</emma:arc>
<emma:arc from="3" to="4">boston</emma:arc>
<emma:arc from="3" to="4">austin</emma:arc>
<emma:arc from="4" to="5">to</emma:arc>
<emma:arc from="5" to="6">denver</emma:arc>
<emma:arc from="6" to="7">today</emma:arc>
<emma:arc from="6" to="7">tomorrow</emma:arc>
</emma:lattice>
</emma:one-of>
</emma:emma>
emma:group element
Annotation | emma:group |
---|---|
Definition | A container element indicating that a number of interpretations of distinct user inputs are grouped according to some criteria. |
Children | The emma:group element MUST immediately contain a
collection of one or more emma:interpretation elements
or container elements: emma:one-of ,
emma:group , emma:sequence . It MAY also
contain an optional single
emma:group-info element. It MAY also contain
multiple optional emma:derived-from
elements and multiple emma:info
elements. It MAY also contain
multiple optional emma:annotation elements. It MAY
also contain multiple optional emma:parameters
elements. It MAY also contain a single optional
emma:grammar-active element. It MAY also contain a single emma:location
element. |
Attributes |
|
Applies to | The emma:group element is legal only as a child of
emma:emma , emma:one-of ,
emma:group , emma:sequence , or
emma:derivation . |
The emma:group
element is used to indicate that the
contained interpretations are from distinct user inputs that are
related in some manner. emma:group
MUST NOT be used
for containing the multiple stages of processing of a single user
input. Those MUST be contained in the emma:derivation
element instead (Section
4.1.2). For groups of inputs in temporal order the more
specialized container emma:sequence
MUST be used
(Section
3.3.3). The following example shows three
interpretations derived from the speech input "Move this ambulance
here" and the tactile input related to two consecutive points on a
map.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group id="grp" emma:start="1087995961542" emma:end="1087995964542">
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice">
      <action>move</action>
      <object>ambulance</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:interpretation id="int2"
        emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation id="int3"
        emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
The emma:one-of
and emma:group
containers MAY be nested arbitrarily.
Like emma:one-of
, the contents of
emma:group
may be partial, as indicated by
emma:partial-content="true"
; the full set of group
members can be retrieved by accessing the element referenced in
ref
.
emma:group-info element
Annotation | emma:group-info |
---|---|
Definition | The emma:group-info element contains or references
criteria used in establishing the grouping of interpretations in an
emma:group element. |
Children | The emma:group-info element MUST either
immediately contain inline instance data specifying grouping
criteria or have the attribute ref referencing the
criteria. |
Attributes |
|
Applies to | The emma:group-info element is legal only as a
child of emma:group . |
Sometimes it may be convenient to indirectly associate a given
group with information, such as grouping criteria. The
emma:group-info
element might be used to make explicit
the criteria by which members of a group are associated. In the
following example, a group of two points is associated with a
description of grouping criteria based upon a sliding temporal
window of two seconds duration.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group id="grp">
    <emma:group-info>
      <ex:mode>temporal</ex:mode>
      <ex:duration>2s</ex:duration>
    </emma:group-info>
    <emma:interpretation id="int1" emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
The emma:group-info
element MAY also be used to refer to a
named grouping criterion by external reference, for
instance:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group id="grp">
    <emma:group-info ref="http://www.example.com/criterion42"/>
    <emma:interpretation id="int1" emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
emma:sequence element
Annotation | emma:sequence |
---|---|
Definition | A container element indicating that a number of interpretations of distinct user inputs are in temporal sequence. |
Children | The emma:sequence element MUST immediately contain
a collection of one or more emma:interpretation
elements or container elements: emma:one-of ,
emma:group , emma:sequence . It MAY also
contain multiple optional
emma:derived-from elements and multiple
emma:info elements. It
MAY also contain multiple optional emma:annotation
elements. It MAY also contain multiple optional
emma:parameters elements. It MAY also contain a single
optional emma:grammar-active element. It MAY also contain a single emma:location
element. |
Attributes |
|
Applies to | The emma:sequence element is legal only as a child
of emma:emma , emma:one-of ,
emma:group , emma:sequence , or
emma:derivation . |
The emma:sequence
element is used to indicate that
the contained interpretations are sequential in time, as in the
following example, which indicates that two points made with a pen
are in temporal order.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence id="seq1">
    <emma:interpretation id="int1" emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>
The emma:sequence
container MAY be combined with
emma:one-of
and emma:group
in arbitrary
nesting structures. The order of children in the content of the
emma:sequence
element corresponds to a sequence of
interpretations. This ordering does not imply any particular
definition of sequentiality. EMMA processors are expected therefore
to use the emma:sequence
element to hold
interpretations which are either strictly sequential in nature
(e.g. the end-time of an interpretation precedes the start-time of
its follower), or which overlap in some manner (e.g. the start-time
of a follower interpretation precedes the end-time of its
precedent). It is possible to use timestamps to provide fine
grained annotation for the sequence of interpretations that are
sequential in time (see Section
4.2.10).
In the following more complex example, a sequence of two pen
gestures in emma:sequence
and a speech input in
emma:interpretation
is contained in an
emma:group
.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group id="grp">
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice">
      <action>move</action>
      <object>this-battleship</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:sequence id="seq1">
      <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
        <x>0.253</x>
        <y>0.124</y>
      </emma:interpretation>
      <emma:interpretation id="int3" emma:medium="tactile" emma:mode="ink">
        <x>0.866</x>
        <y>0.724</y>
      </emma:interpretation>
    </emma:sequence>
  </emma:group>
</emma:emma>
Like emma:one-of
, the contents of emma:sequence may be
partial, as indicated by emma:partial-content="true"
;
the full set of sequence members can be retrieved by accessing the element
referenced in ref
.
In addition to providing the ability to represent N-best lists
of interpretations using emma:one-of
, EMMA also
provides the capability to represent lattices of words or other
symbols using the emma:lattice
element. Lattices
provide a compact representation of large lists of possible
recognition results or interpretations for speech, pen, or
multimodal inputs.
In addition to providing a representation for lattice output from speech recognition, another important use case for lattices is for representation of the results of gesture and handwriting recognition from a pen modality component. Lattices can also be used to compactly represent multiple possible meaning representations. Another use case for the lattice representation is for associating confidence scores and other annotations with individual words within a speech recognition result string.
Lattices are compactly described by a list of transitions between nodes. For each transition the start and end nodes MUST be defined, along with the label for the transition. Initial and final nodes MUST also be indicated. The following figure provides a graphical representation of a speech recognition lattice which compactly represents eight different sequences of words.
which expands to:
a. flights to boston from portland today please
b. flights to austin from portland today please
c. flights to boston from oakland today please
d. flights to austin from oakland today please
e. flights to boston from portland tomorrow
f. flights to austin from portland tomorrow
g. flights to boston from oakland tomorrow
h. flights to austin from oakland tomorrow
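The expansion above can be computed mechanically by enumerating every labeled path from the initial node to the final node. The following sketch is illustrative only and works over plain (from, to, label) triples corresponding to the arcs of the example lattice:

```python
# Arcs of the example lattice: (from, to, label).
ARCS = [
    (1, 2, "flights"), (2, 3, "to"),
    (3, 4, "boston"), (3, 4, "austin"),
    (4, 5, "from"),
    (5, 6, "portland"), (5, 6, "oakland"),
    (6, 7, "today"), (7, 8, "please"),
    (6, 8, "tomorrow"),
]

def expand(arcs, initial, final):
    """Enumerate every label sequence from the initial to the final node."""
    out = {}
    for src, dst, label in arcs:
        out.setdefault(src, []).append((dst, label))
    paths = []
    def walk(node, words):
        if node == final:
            paths.append(" ".join(words))
            return
        for nxt, label in out.get(node, []):
            walk(nxt, words + [label])
    walk(initial, [])
    return paths

paths = expand(ARCS, 1, 8)
print(len(paths))  # 8
```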
emma:lattice, emma:arc, emma:node elements
Annotation | emma:lattice |
---|---|
Definition | An element which encodes a lattice representation of user input. |
Children | The emma:lattice element MUST immediately contain
one or more emma:arc elements and zero or more
emma:node elements. |
Attributes |
|
Applies to | The emma:lattice element is legal only as a child
of the emma:interpretation and
emma:one-of elements. |
Annotation | emma:arc |
Definition | An element which encodes a transition between two nodes in a
lattice. The label associated with the arc in the lattice is
represented in the content of emma:arc . |
Children | The emma:arc element MUST immediately contain
either character data or a single application namespace element or
be empty, in the case of epsilon transitions. It MAY contain an
emma:info element containing application or vendor
specific annotations. It MAY contain zero or more optional
emma:annotation elements containing annotations made
by a human annotator. |
Attributes |
|
Applies to | The emma:arc element is legal only as a child of
the emma:lattice element. |
Annotation | emma:node |
Definition | An element which represents a node in the lattice. The
emma:node elements are not required to describe a
lattice but might be added to provide a location for annotations on
nodes in a lattice. There MUST be at most one
emma:node specification for each numbered node in the
lattice. |
Children | An OPTIONAL emma:info element for application or
vendor specific annotations on the node. It MAY contain zero
or more optional emma:annotation elements containing
annotations made by a human annotator. |
Attributes |
|
Applies to | The emma:node element is legal only as a child of
the emma:lattice element. |
In EMMA, a lattice is represented using an element
emma:lattice
, which has attributes
initial
and final
for indicating the
initial and final nodes of the lattice. For the lattice
below, this will be: <emma:lattice
initial="1" final="8"/>
. The nodes are numbered with
integers. If there is more than one distinct final node in the
lattice the nodes MUST be represented as a space separated list in
the value of the final
attribute e.g.
<emma:lattice initial="1" final="9 10 23"/>
.
There MUST only be one initial node in an EMMA lattice. Each
transition in the lattice is represented as an element
emma:arc
with attributes from
and
to
which indicate the nodes where the transition
starts and ends. The arc's label is represented as the content of
the emma:arc
element and MUST be well-formed
character or XML content. In the example here the contents are
words. Empty (epsilon) transitions in a lattice MUST be represented
in the emma:lattice
representation as
emma:arc
empty elements, e.g.
<emma:arc from="1" to="8"/>
.
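A consumer reading these attributes has to allow for a space-separated list of final nodes while treating initial as a single node. An illustrative sketch:

```python
import xml.etree.ElementTree as ET

# final may hold a space-separated list when the lattice has more
# than one distinct final node; initial is always a single node.
lattice = ET.fromstring(
    '<emma:lattice xmlns:emma="http://www.w3.org/2003/04/emma" '
    'initial="1" final="9 10 23"/>')

initial = int(lattice.get("initial"))
finals = [int(n) for n in lattice.get("final").split()]
print(initial, finals)  # 1 [9, 10, 23]
```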
The example speech lattice above would be represented in EMMA markup as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:medium="acoustic" emma:mode="voice">
<emma:lattice initial="1" final="8">
<emma:arc from="1" to="2">flights</emma:arc>
<emma:arc from="2" to="3">to</emma:arc>
<emma:arc from="3" to="4">boston</emma:arc>
<emma:arc from="3" to="4">austin</emma:arc>
<emma:arc from="4" to="5">from</emma:arc>
<emma:arc from="5" to="6">portland</emma:arc>
<emma:arc from="5" to="6">oakland</emma:arc>
<emma:arc from="6" to="7">today</emma:arc>
<emma:arc from="7" to="8">please</emma:arc>
<emma:arc from="6" to="8">tomorrow</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
Alternatively, if we wish to represent the same information as
an N-best list using emma:one-of,
we would have the
more verbose representation:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="interp1">
<text>flights to boston from portland today please</text>
</emma:interpretation>
<emma:interpretation id="interp2">
<text>flights to boston from portland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp3">
<text>flights to austin from portland today please</text>
</emma:interpretation>
<emma:interpretation id="interp4">
<text>flights to austin from portland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp5">
<text>flights to boston from oakland today please</text>
</emma:interpretation>
<emma:interpretation id="interp6">
<text>flights to boston from oakland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp7">
<text>flights to austin from oakland today please</text>
</emma:interpretation>
<emma:interpretation id="interp8">
<text>flights to austin from oakland tomorrow</text>
</emma:interpretation>
</emma:one-of>
</emma:emma>
The lattice representation avoids the need to enumerate all of
the possible word sequences. Also, as detailed below, the
emma:lattice
representation enables placement of
annotations on individual words in the input.
For use cases involving the representation of gesture/ink
lattices and use cases involving lattices of semantic
interpretations, EMMA allows for application namespace elements to
appear within emma:arc
.
For example a sequence of two gestures, each of which is recognized as either a line or a circle, might be represented as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:medium="tactile" emma:mode="ink">
<emma:lattice initial="1" final="3">
<emma:arc from="1" to="2">
<circle radius="100"/>
</emma:arc>
<emma:arc from="2" to="3">
<line length="628"/>
</emma:arc>
<emma:arc from="1" to="2">
<circle radius="200"/>
</emma:arc>
<emma:arc from="2" to="3">
<line length="1256"/>
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
As an example of a lattice of semantic interpretations, in a travel application where the source is either "Boston" or "Austin" and the destination is either "Newark" or "New York", the possibilities might be represented in a lattice as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:medium="acoustic" emma:mode="voice">
<emma:lattice initial="1" final="3">
<emma:arc from="1" to="2">
<source city="boston"/>
</emma:arc>
<emma:arc from="2" to="3">
<destination city="newark"/>
</emma:arc>
<emma:arc from="1" to="2">
<source city="austin"/>
</emma:arc>
<emma:arc from="2" to="3">
<destination city="new york"/>
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
The emma:arc
element MAY contain either an
application namespace element or character data. It MUST NOT
contain combinations of application namespace elements and
character data. However, an emma:info
element MAY
appear within an emma:arc
element alongside character
data, in order to allow for the association of vendor or
application specific annotations on a single word or symbol in a
lattice. Also an emma:annotation
element may appear as
a child of emma:arc
or emma:node
indicating human annotations on the arc or node.
So, in summary, there are four groupings of content that can
appear within emma:arc
:
a. Character data alone, for example a word in a speech lattice.
b. Character data together with an emma:info
element providing vendor or application specific annotations that apply to
the character data.
c. A single application namespace element.
d. Empty content, representing an epsilon transition.
The ref
attribute on emma:lattice
can
be used for cases where the lattice is not returned in the
document, but is made accessible through ref
, or for
cases where the lattice is partial and a full lattice is available
on the server.
For example the following emma:lattice
does not
contain any emma:arc
elements but ref
indicates where the lattice can be retrieved from.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice id="l1" initial="1" final="8"
        emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1"/>
  </emma:interpretation>
</emma:emma>
The document on the server in this case could, for example, be as follows.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice id="l1" initial="1" final="8"
        emma:partial-content="false">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="5" to="6">oakland</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="7" to="8">please</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
Similarly the emma:lattice
could contain some arcs but
not all, and point through ref
to the full lattice.
In this case the EMMA document received is a pruned lattice and the
full lattice can be retrieved by accessing the external document
indicated in ref
.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice id="l1" initial="1" final="8"
        emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The encoding of lattice arcs as XML elements
(emma:arc
) enables arcs to be annotated with metadata
such as timestamps, costs, or confidence scores:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
 http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2"
          emma:start="1087995961542" emma:end="1087995962042"
          emma:cost="30">
        flights
        <emma:annotation id="label3" annotator="john_smith"
            time="2011-11-10T09:00:21" type="emotion"
            confidence="1.0" reference="false">
          <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
            <emotion>
              <category set="everyday" name="angry"/>
              <modality medium="acoustic" mode="voice"/>
            </emotion>
          </emotionml>
        </emma:annotation>
      </emma:arc>
      <emma:arc from="2" to="3"
          emma:start="1087995962042" emma:end="1087995962542"
          emma:cost="20">
        to
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995962542" emma:end="1087995963042"
          emma:cost="50">
        boston
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995963042" emma:end="1087995963742"
          emma:cost="60">
        austin
      </emma:arc>
      ...
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The following EMMA attributes MAY be placed on
emma:arc
elements: absolute timestamps
(emma:start
, emma:end
), relative
timestamps ( emma:offset-to-start
,
emma:duration
), emma:confidence
,
emma:cost
, the human language of the input
(emma:lang
), emma:medium
,
emma:mode
, emma:source
, and
emma:annotated-tokens
. The use case for
emma:medium
, emma:mode
, and
emma:source
is for lattices which contains content
from different input modes. The emma:arc
element MAY
also contain an emma:info
element for specification of
vendor and application specific annotations on the arc. The
emma:arc
and emma:node
elements can also
contain optional emma:annotation
elements containing
annotations made by human annotators. For example, in the example
above emma:annotation
is used to indicate manual
annotation of emotion on the word 'flights'.
The timestamps that appear on emma:arc
elements do
not necessarily indicate the start and end of the arc itself. They
MAY indicate the start and end of the signal corresponding to the
label on the arc. As a result there is no requirement that the
emma:end
timestamp on an arc going into a node should
be equivalent to the emma:start
of all arcs going out
of that node. Furthermore there is no guarantee that the left to
right order of arcs in a lattice will correspond to the temporal
order of the input signal. The lattice representation is an
abstraction that represents a range of possible interpretations of
a user's input and is not intended to necessarily be a
representation of temporal order.
Costs are typically application and device dependent. There are a variety of ways that individual arc costs might be combined to produce costs for specific paths through the lattice. This specification does not standardize the way for these costs to be combined; it is up to the applications and devices to determine how such derived costs would be computed and used.
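Purely for illustration, one common convention treats the cost of a path as the sum of its arc costs and prefers the path with the lowest total. Nothing in this specification mandates this convention; the sketch below simply shows one possible combination scheme over a fragment of the example lattice:

```python
# Arc fragment of the example lattice: (from, to, label, cost).
ARCS = [
    (1, 2, "flights", 30), (2, 3, "to", 20),
    (3, 4, "boston", 50), (3, 4, "austin", 60),
]

def best_path(arcs, initial, final):
    """Cheapest label sequence under the summed-cost convention."""
    out = {}
    for src, dst, label, cost in arcs:
        out.setdefault(src, []).append((dst, label, cost))
    best = None
    def walk(node, words, total):
        nonlocal best
        if node == final:
            if best is None or total < best[0]:
                best = (total, words)
            return
        for nxt, label, cost in out.get(node, []):
            walk(nxt, words + [label], total + cost)
    walk(initial, [], 0)
    return best

total, words = best_path(ARCS, 1, 4)
print(total, words)  # 100 ['flights', 'to', 'boston']
```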
For some lattice formats, it is also desirable to annotate the
nodes in the lattice themselves with information such as costs. For
example in speech recognition, costs might be placed on nodes as a
result of word penalties or redistribution of costs. For this
purpose EMMA also provides an emma:node
element which
can host annotations such as emma:cost
. The
emma:node
element MUST have an attribute
node-number
which indicates the number of the node.
There MUST be at most one emma:node
specification for
a given numbered node in the lattice. In our example, if there were
a cost of 100 on the final state, it could be represented
as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:medium="acoustic" emma:mode="voice">
<emma:lattice initial="1" final="8">
<emma:arc
from="1"
to="2"
emma:start="1087995961542"
emma:end="1087995962042"
emma:cost="30">
flights
</emma:arc>
<emma:arc
from="2"
to="3"
emma:start="1087995962042"
emma:end="1087995962542"
emma:cost="20">
to
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995962542"
emma:end="1087995963042"
emma:cost="50">
boston
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995963042"
emma:end="1087995963742"
emma:cost="60">
austin
</emma:arc>
...
<emma:node node-number="8" emma:cost="100"/>
</emma:lattice>
</emma:interpretation>
</emma:emma>
The relative timestamp mechanism in EMMA is intended to provide
temporal information about arcs in a lattice in relative terms
using offsets in milliseconds. In order to do this the absolute
time MAY be specified on emma:interpretation
; both
emma:time-ref-uri
and
emma:time-ref-anchor-point
apply to
emma:lattice
and MAY be used there to set the anchor
point for offsets to the start of the absolute time specified on
emma:interpretation
. The offset in milliseconds to the
beginning of each arc MAY then be indicated on each
emma:arc
in the emma:offset-to-start
attribute.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:start="1087995961542" emma:end="1087995963042"
emma:medium="acoustic" emma:mode="voice">
<emma:lattice emma:time-ref-uri="#interp1"
emma:time-ref-anchor-point="start"
initial="1" final="4">
<emma:arc
from="1"
to="2"
emma:offset-to-start="0">
flights
</emma:arc>
<emma:arc
from="2"
to="3"
emma:offset-to-start="500">
to
</emma:arc>
<emma:arc
from="3"
to="4"
emma:offset-to-start="1000">
boston
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
Note that the offset for the first emma:arc
MUST
always be zero since the EMMA attribute
emma:offset-to-start
indicates the number of
milliseconds from the anchor point to the start of the piece
of input associated with the emma:arc
, in this case
the word "flights".
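As a non-normative illustration, the absolute start of each arc's signal can be recovered by adding the offset to the anchor point. The anchor here is the emma:start value of the interpretation referenced by emma:time-ref-uri with emma:time-ref-anchor-point="start", as in the example above.

```python
# Non-normative: resolving emma:offset-to-start values against the
# anchor point set on emma:lattice.

anchor = 1087995961542  # emma:start of #interp1 (anchor-point="start")
offsets = {"flights": 0, "to": 500, "boston": 1000}  # emma:offset-to-start

# Absolute start of each arc's signal = anchor + offset in milliseconds.
absolute_starts = {word: anchor + off for word, off in offsets.items()}
print(absolute_starts["boston"])  # 1087995962542
```

The recovered values match the absolute emma:start timestamps in the earlier lattice example.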
emma:literal element

Annotation | emma:literal |
---|---|
Definition | An element that contains string literal output. |
Children | String literal |
Attributes | An optional emma:result-format attribute. |
Applies to | The emma:literal element is a child of emma:interpretation . |
Certain EMMA processing components produce semantic results in
the form of string literals without any surrounding application
namespace markup. These MUST be placed within the EMMA element
emma:literal
within emma:interpretation
.
For example, if a semantic interpreter simply returned "boston"
this could be represented in EMMA as:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="r1"
emma:medium="acoustic" emma:mode="voice">
<emma:literal>boston</emma:literal>
</emma:interpretation>
</emma:emma>
Note that a raw recognition result of a sequence of words from
speech recognition is also a kind of string literal and can be
contained within emma:literal
. For example,
recognition of the string "flights to san francisco" can be
represented in EMMA as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="r1"
emma:medium="acoustic" emma:mode="voice">
<emma:literal>flights to san francisco</emma:literal>
</emma:interpretation>
</emma:emma>
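As a non-normative illustration, a consuming component can extract such a literal with a standard XML parser; the namespace URI is the EMMA namespace used throughout this specification.

```python
# Non-normative: extracting the string literal from an EMMA result
# with Python's standard library XML parser.
import xml.etree.ElementTree as ET

EMMA_NS = "{http://www.w3.org/2003/04/emma}"

doc = """<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="r1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:literal>flights to san francisco</emma:literal>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(doc)
literal = root.find(f".//{EMMA_NS}literal")
print(literal.text)  # flights to san francisco
```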
This section defines annotations in the EMMA namespace including both attributes and elements. The values are specified in terms of the data types defined by XML Schema Part 2: Datatypes Second Edition [XML Schema Datatypes].
emma:model element

Annotation | emma:model |
---|---|
Definition | The emma:model element either references or provides inline the data model for the instance data. |
Children | If a ref attribute is not specified then this element contains the data model inline. |
Attributes | An id attribute, and a ref attribute containing a URI referencing an external data model, used when the data model is not provided inline. |
Applies to | The emma:model element MAY appear only as a child of emma:emma . |
The data model that may be used to express constraints on the structure and content of instance data is specified as one of the annotations of the instance. Specifying the data model is OPTIONAL, in which case the data model can be said to be implicit. Typically the data model is pre-established by the application.
The data model is specified with the emma:model
annotation defined as an element in the EMMA namespace. If the data
model for the contents of an emma:interpretation
,
container elements, or application namespace element is to be
specified in EMMA, the attribute emma:model-ref
MUST
be specified on the emma:interpretation
, container
element, or application namespace element. Note that since multiple
emma:model
elements might be specified under the
emma:emma
it is possible to refer to multiple data
models within a single EMMA document. For example, different
alternative interpretations under an emma:one-of
might
have different data models. In this case, an
emma:model-ref
attribute would appear on each
emma:interpretation
element in the N-best list with
its value being the id
of the emma:model
element for that particular interpretation.
The data model is closely related to the interpretation data,
and is typically specified as the annotation related to the
emma:interpretation
or emma:one-of
elements.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:model id="model1" ref="http://example.com/models/city.xml"/>
<emma:interpretation id="int1" emma:model-ref="model1"
emma:medium="acoustic" emma:mode="voice">
<city> London </city>
<country> UK </country>
</emma:interpretation>
</emma:emma>
The emma:model
annotation MAY reference any element
or attribute in the application instance data, as well as any EMMA
container element (emma:one-of
,
emma:group
, or emma:sequence
).
The data model annotation MAY be used to either reference an
external data model with the ref
attribute or provide
a data model as in-line content. Either a ref
attribute or in-line data model (but not both) MUST be
specified.
Note that unlike the use of ref
on e.g. emma:one-of
it is not possible in EMMA to
provide a partial specification of the data model inline and use
emma:partial-content="true"
to indicate that the full
data model is available from the URI in
ref
.
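A non-normative sketch of a conformance check for the constraint that emma:model carries either a ref attribute or inline content, but not both; the helper name is ours, not part of this specification.

```python
# Non-normative: check that emma:model has exactly one of a ref
# attribute or an inline data model (child content).
import xml.etree.ElementTree as ET

def model_is_valid(model_elem):
    has_ref = "ref" in model_elem.attrib
    has_inline = len(model_elem) > 0  # any child element counts as inline
    return has_ref != has_inline      # exactly one of the two

ok = ET.fromstring('<model id="m1" ref="http://example.com/models/city.xml"/>')
bad = ET.fromstring('<model id="m2" ref="uri"><schema/></model>')
print(model_is_valid(ok), model_is_valid(bad))  # True False
```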
emma:derived-from element and emma:derivation element

Annotation | emma:derived-from |
---|---|
Definition | An empty element which provides a reference to the interpretation from which the element it appears on was derived. |
Children | None |
Attributes | A resource attribute containing a URI reference to the source interpretation, and an optional boolean composite attribute (default false). |
Applies to | The emma:derived-from element is legal only as a child of emma:interpretation , emma:one-of , emma:group , or emma:sequence . |

Annotation | emma:derivation |
---|---|
Definition | An element which contains interpretation and container elements representing earlier stages in the processing of the input. |
Children | One or more emma:interpretation , emma:one-of , emma:sequence , or emma:group elements. |
Attributes | None |
Applies to | The emma:derivation element MAY appear only as a child of the emma:emma , emma:interpretation , emma:one-of , emma:group , and emma:sequence elements. |
Instances of interpretations are in general derived from other instances of interpretation in a process that goes from raw data to increasingly refined representations of the input. The derivation annotation is used to link any two interpretations that are related by representing the source and the outcome of an interpretation process. For instance, a speech recognition process can return the following result in the form of raw text:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="raw"
emma:medium="acoustic" emma:mode="voice">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
</emma:emma>
A first interpretation process will produce:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="better"
emma:medium="acoustic" emma:mode="voice">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:emma>
A second interpretation process, aware of the current date, will be able to produce a more refined instance, such as:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="best"
emma:medium="acoustic" emma:mode="voice">
<origin>Boston</origin>
<destination>Denver</destination>
<date>20030315</date>
</emma:interpretation>
</emma:emma>
The interaction manager might need to have access to the three
levels of interpretation. The emma:derived-from
annotation element can be used to establish a chain of derivation
relationships as in the following example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:medium="acoustic" emma:mode="voice">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>20030315</date>
</emma:interpretation>
</emma:emma>
The emma:derivation
element MAY be used as a
container for representations of the earlier stages in the
interpretation of the input. The emma:derivation
element MAY appear only as a child of the emma:emma
,
emma:interpretation
, emma:one-of
,
emma:group
, emma:sequence
elements. That
is, it can be a child of emma:emma
, or any container
element except literal or lattice. If emma:derivation
appears within a container it MUST apply to that specific element,
or to a descendant of that element. The latest stage of processing
MUST be a direct child of emma:emma
.
The resource attribute on emma:derived-from
is a
URI which can reference IDs in the current or other EMMA documents.
Since emma:derivation
elements can appear in multiple
different places, EMMA processors MUST use the
emma:derived-from
element to identify earlier stages
of the processing of an input, rather than the document structure.
The option to have emma:derivation
in locations other
than directly under emma:emma
is provided to make the
document more transparent and improve human readability.
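As a non-normative sketch, a consumer can reconstruct the derivation chain for the three-stage example above by following resource references alone, independent of where the emma:derivation containers appear:

```python
# Non-normative: follow emma:derived-from resource references (IDs)
# to reconstruct a derivation chain, ignoring document structure.

# id -> id of the source interpretation, from resource="#..." values
derived_from = {"best": "better", "better": "raw"}

def derivation_chain(interp_id):
    """Return the chain from the latest stage back to the raw input."""
    chain = [interp_id]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain

print(derivation_chain("best"))  # ['best', 'better', 'raw']
```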
In the following example, emma:sequence
is used to
represent a sequence of two spoken inputs and each has its own
emma:derivation
element containing the previous stage
of processing.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation id="nlu1">
      <emma:derived-from resource="#raw1" composite="false"/>
      <origin>Boston</origin>
      <emma:derivation>
        <emma:interpretation id="raw1"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>flights from boston</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
    <emma:interpretation id="nlu2">
      <emma:derived-from resource="#raw2" composite="false"/>
      <destination>Denver</destination>
      <emma:derivation>
        <emma:interpretation id="raw2"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>to denver</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>
In addition to representing sequential derivations, the EMMA
emma:derived-from
element can also be used to capture
composite derivations. Composite derivations involve combination of
inputs from different modes.
In order to indicate whether an emma:derived-from
element describes a sequential derivation step or a composite
derivation step, the emma:derived-from
element has an
attribute composite
which has a boolean value. A
composite emma:derived-from
MUST be marked as
composite="true"
while a sequential
emma:derived-from
element is marked as
composite="false"
. If this attribute is not specified
the value is false
by default.
In the following composite derivation example the user said "destination" using the voice mode and circled Boston on a map using the ink mode:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="voice1"
        emma:start="1087995961500" emma:end="1087995962542"
        emma:process="http://example.com/myasr.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic" emma:mode="voice"
        emma:function="dialog" emma:verbal="true"
        emma:lang="en-US" emma:tokens="destination">
      <rawinput>destination</rawinput>
    </emma:interpretation>
    <emma:interpretation id="ink1"
        emma:start="1087995961600" emma:end="1087995964000"
        emma:process="http://example.com/mygesturereco.xml"
        emma:source="http://example.com/pen/wacom123"
        emma:signal="http://example.com/signals/ink5.inkml"
        emma:confidence="0.5"
        emma:medium="tactile" emma:mode="ink"
        emma:function="dialog" emma:verbal="false">
      <rawinput>Boston</rawinput>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="multimodal1"
      emma:confidence="0.3"
      emma:start="1087995961500" emma:end="1087995964000"
      emma:medium="acoustic tactile" emma:mode="voice ink"
      emma:function="dialog" emma:verbal="true"
      emma:lang="en-US" emma:tokens="destination">
    <emma:derived-from resource="#voice1" composite="true"/>
    <emma:derived-from resource="#ink1" composite="true"/>
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>
In this example, the multimodal interpretation carries two
emma:derived-from
elements, one pointing to the speech
input and one pointing to the pen gesture.
The only constraints the EMMA specification places on the
annotations that appear on a composite input are that the
emma:medium
attribute MUST contain the union of the
emma:medium
attributes on the combining inputs,
represented as a space delimited set of nmtokens
as
defined in Section
4.2.11, and that the emma:mode
attribute MUST
contain the union of the emma:mode
attributes on the
combining inputs, represented as a space delimited set of
nmtokens
as defined in Section
4.2.11. In the example above this means that the
emma:medium
value is "acoustic tactile"
and the emma:mode
attribute is "voice
ink"
. How all other annotations are handled is author
defined. In the following paragraph, informative examples on how
specific annotations might be handled are given.
With reference to the illustrative example above, this paragraph
provides informative guidance regarding the determination of
annotations (beyond emma:medium
and
emma:mode
on a composite multimodal interpretation).
Generally the timestamp on a combined input should contain the
intervals indicated by the combining inputs. For the absolute
timestamps emma:start
and emma:end
this
can be achieved by taking the earlier of the
emma:start
values
(emma:start="1087995961500"
in our example) and the
later of the emma:end
values
(emma:end="1087995964000"
in the example). The
determination of relative timestamps for composite inputs is more complex;
informative guidance is given in Section
4.2.10.4. Generally speaking the emma:confidence
value will be some numerical combination of the confidence scores
assigned to the combining inputs. In our example, it is the result
of multiplying the voice and ink confidence scores
(0.3
). In other cases there may not be a confidence
score for one of the combining inputs and the author may choose to
copy the confidence score from the input which does have one.
Generally, for emma:verbal
, if either of the inputs
has the value true
then the multimodal interpretation
will also be emma:verbal="true"
as in the example. In
other words the annotation for the composite input is the result of
an inclusive OR of the boolean values of the annotations on the
inputs. If an annotation is only specified on one of the combining
inputs then it may in some cases be assumed to apply to the
multimodal interpretation of the composite input. In the example,
emma:lang="en-US"
is only specified for the speech
input, and this annotation appears on the composite result also.
Similarly in our example, only the voice has
emma:tokens
and the author has chosen to annotate the
combined input with the same emma:tokens
value. In
this example, the emma:function
is the same on both
combining inputs and the author has chosen to use the same
annotation on the composite interpretation.
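The merging conventions just described can be sketched as follows. Only the emma:medium and emma:mode unions are required by this specification; the min/max timestamps, multiplied confidence, and OR-ed emma:verbal flag reproduce the author choices in the example above and are not normative. The function name and data shapes are our own.

```python
# Non-normative: one possible merge of annotations for a composite
# multimodal interpretation, following the example above.

def merge_composite(a, b):
    return {
        "medium": f'{a["medium"]} {b["medium"]}',  # union (normative)
        "mode": f'{a["mode"]} {b["mode"]}',        # union (normative)
        "start": min(a["start"], b["start"]),      # earliest start
        "end": max(a["end"], b["end"]),            # latest end
        "confidence": round(a["confidence"] * b["confidence"], 6),
        "verbal": a["verbal"] or b["verbal"],      # inclusive OR
    }

voice = {"medium": "acoustic", "mode": "voice",
         "start": 1087995961500, "end": 1087995962542,
         "confidence": 0.6, "verbal": True}
ink = {"medium": "tactile", "mode": "ink",
       "start": 1087995961600, "end": 1087995964000,
       "confidence": 0.5, "verbal": False}

print(merge_composite(voice, ink))
```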
In annotating derivations of the processing of the input, EMMA
provides the flexibility of both coarse-grained and fine-grained
annotation of relations among interpretations. For example, when
relating two N-best lists represented as emma:one-of
elements, either there can be a single emma:derived-from
element
under emma:one-of
referring to the ID of the
emma:one-of
for the earlier processing stage:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:one-of id="nbest1"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1">
<res>from boston to denver on march eleven two thousand three</res>
</emma:interpretation>
<emma:interpretation id="int2">
<res>from austin to denver on march eleven two thousand three</res>
</emma:interpretation>
</emma:one-of>
</emma:derivation>
<emma:one-of id="nbest2">
<emma:derived-from resource="#nbest1" composite="false"/>
<emma:interpretation id="int1b">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2b">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
Or there can be a separate emma:derived-from
element on each emma:interpretation
element referring
to the specific emma:interpretation
element it was
derived from.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest2">
<emma:interpretation id="int1b">
<emma:derived-from resource="#int1" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2b">
<emma:derived-from resource="#int2" composite="false"/>
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
<emma:derivation>
<emma:one-of id="nbest1"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1">
<res>from boston to denver on march eleven two thousand three</res>
</emma:interpretation>
<emma:interpretation id="int2">
<res>from austin to denver on march eleven two thousand three</res>
</emma:interpretation>
</emma:one-of>
</emma:derivation>
</emma:emma>
Section
4.3 provides further examples of the use of
emma:derived-from
to represent sequential derivations
and addresses the issue of the scope of EMMA annotations across
derivations of user input.
emma:grammar element

Annotation | emma:grammar |
---|---|
Definition | An element used to indicate the grammar used in processing the input. The grammar MUST either be specified inline OR referenced using the ref attribute. |
Children | In the case of inline specification of the grammar, this element contains an element with the specification of the grammar. |
Attributes | An id attribute, a ref attribute containing a URI referencing the grammar (when the grammar is not specified inline), and an optional grammar-type attribute containing a MIME type. |
Applies to | The emma:grammar element is legal only as a child of the emma:emma element. |
The grammar that was used to derive the EMMA result MAY be
specified with the emma:grammar
annotation defined as
an element in the EMMA namespace. The emma:grammar-ref
attribute appears on the specific interpretation and references the
appropriate emma:grammar
element. The
emma:grammar
element MUST either contain a
representation of the grammar inline OR have a ref
attribute which contains a URI referencing the grammar used in
processing the input. The optional attribute
grammar-type
on emma:grammar
contains a
MIME type indicating the format of the specified grammar. For
example an SRGS grammar in the XML format SHOULD be annotated as
grammar-type="application/srgs-xml"
. The namespace of
an inline grammar MUST be specified.
In the following example, there are three interpretations. Each
interpretation is annotated with emma:grammar-ref
to
indicate the grammar that resulted in that interpretation. The two
emma:grammar
elements indicate the URI for the grammar
using the ref
attribute. Both grammars are SRGS XML
grammars and so are annotated as
grammar-type="application/srgs-xml"
.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation id="int3" emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
In the following example, there are two interpretations, each
from a different grammar, and the SRGS grammars used to derive the
interpretations are specified inline each as a child of an
emma:grammar
element. The namespace of the inline
grammars is indicated explicitly on each.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.1" root="state" mode="voice"
        tag-format="semantics/1.0">
      <rule id="state" scope="public">
        <one-of>
          <item>California<tag>out="CA";</tag></item>
          <item>New Jersey<tag>out="NJ";</tag></item>
          <item>New York<tag>out="NY";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.1" root="city" mode="voice"
        tag-format="semantics/1.0">
      <rule id="city" scope="public">
        <one-of>
          <item>Calgary<tag>out="YYC";</tag></item>
          <item>San Francisco<tag>out="SFO";</tag></item>
          <item>Boston<tag>out="BOS";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:tokens="California"
        emma:grammar-ref="gram1">
      <emma:literal>CA</emma:literal>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:tokens="Calgary"
        emma:grammar-ref="gram2">
      <emma:literal>YYC</emma:literal>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Non-XML grammar formats, such as the SRGS ABNF format, MUST be
contained within <![CDATA[ ... ]]>
. Care should
be taken in platforms generating EMMA to avoid conflicts between
id
values in the EMMA markup and those in any inline
grammars. Authors should be aware that there could be conflicts
between id
values used in different embedded inline
grammars within an EMMA document.
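A non-normative sketch of a check a platform might run to detect such id collisions; the document fragment and helper logic are illustrative only.

```python
# Non-normative: detect id collisions between EMMA markup and
# embedded inline grammars.
import xml.etree.ElementTree as ET
from collections import Counter

doc = """<emma xmlns:g="http://www.w3.org/2001/06/grammar">
  <grammar id="gram1">
    <g:rule id="gram1"/>
  </grammar>
</emma>"""

# Count every id attribute, regardless of element namespace.
ids = Counter(e.get("id") for e in ET.fromstring(doc).iter() if e.get("id"))
duplicates = [i for i, n in ids.items() if n > 1]
print(duplicates)  # ['gram1']
```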
Note that unlike the use of ref
on e.g. emma:one-of
it is not possible in EMMA to
provide a partial specification of the grammar inline and use
emma:partial-content="true"
to indicate that the full
grammar is available from the URI in ref
.
emma:grammar-active element

Annotation | emma:grammar-active |
---|---|
Definition | An element used to indicate the grammars active during the processing of an input. |
Children | A list of emma:active elements, one for each grammar currently active. |
Attributes | |
Applies to | emma:interpretation , emma:one-of , emma:group , emma:sequence |

Annotation | emma:active |
---|---|
Definition | An element specifying a particular grammar active during the processing of an input. |
Children | None |
Attributes | An emma:grammar-ref attribute referencing the active grammar. |
Applies to | emma:grammar-active |
The default when multiple emma:grammar
elements are
specified under emma:emma
is to assume that all
grammars are active for all of the interpretations specified in the
top level of the current EMMA document. In certain use cases, such
as documents containing results from different microphones or
different modalities, this may not be the case and the set of
grammars active for a specific interpretation or set of
interpretations should be annotated explicitly using
emma:grammar-active
. Each grammar which is active is
indicated by an emma:active element which MUST have an
emma:grammar-ref
annotation pointing to the specific
grammar. For example, to make explicit the fact that both grammars,
gram1
and gram2
, are active for all three
N-best interpretations in the following example, an
emma:grammar-active element appears as a child of the
emma:one-of
.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:grammar-active>
      <emma:active emma:grammar-ref="gram1"/>
      <emma:active emma:grammar-ref="gram2"/>
    </emma:grammar-active>
    <emma:interpretation id="int1" emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation id="int3" emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
The use of an element for each active grammar allows for more
complex use cases where specific metadata is associated with each
active grammar. For example, a weighting or other parameters
associated with each active grammar could be specified within an
emma:info
within emma:active
.
emma:info element

Annotation | emma:info |
---|---|
Definition | The emma:info element acts as a container for vendor and/or application specific metadata regarding a user's input. |
Children | One or more elements in the application namespace providing metadata about the input. |
Attributes | An optional id attribute and an optional indexed attribute. |
Applies to | The emma:info element is legal only as a child of the EMMA elements emma:emma , emma:interpretation , emma:group , emma:one-of , emma:sequence , emma:arc , emma:node , or emma:annotation . |
In Section
4.2, a series of attributes are defined for representation of
metadata about user inputs in a standardized form. EMMA also
provides an extensibility mechanism for annotation of user inputs
with vendor or application specific metadata not covered by the
standard set of EMMA annotations. The element
emma:info
MUST be used as a container for these
annotations, UNLESS they are explicitly covered by
emma:endpoint-info
. For example, if an input to a
dialog system needed to be annotated with the number that the call
originated from, the caller's state, some indication of the type of
customer, and the name of the service, these pieces of information
could be represented within emma:info
as in the
following example:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info id="info_details">
    <caller_id>
      <phone_number>2121234567</phone_number>
      <state>NY</state>
    </caller_id>
    <customer_type>residential</customer_type>
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <emma:one-of id="r1"
      emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
It is important to have an EMMA container element for
application/vendor specific annotations since EMMA elements provide
a structure for representation of multiple possible interpretations
of the input. As a result it is cumbersome to state
application/vendor specific metadata as part of the application
data within each emma:interpretation
. An element is
used rather than an attribute so that internal structure can be
given to the annotations within emma:info
.
In addition to emma:emma
, emma:info
MAY also appear as a child of other structural elements such as
emma:interpretation
, emma:one-of
and so
on. When emma:info
appears as a child of one of these
elements the application/vendor specific annotations contained
within emma:info
are assumed to apply to all of the
emma:interpretation
elements within the containing
element. The semantics of conflicting annotations in
emma:info
, for example when different values are found
within emma:emma
and emma:interpretation
,
are left to the developer of the vendor/application specific
annotations.
There may be more than one emma:info
element. One
of the functions of this is to enable specific interpretations
to indicate which emma:info
applies to them using
emma:info-ref
. If emma:info
has the optional id
attribute then the
emma:info-ref
attribute (Section
4.2.19) can be used on emma:interpretation
and
other container elements to indicate that a particular set of
application/vendor specific annotations apply to a particular
interpretation.
The emma:info
element can therefore have either
positional scope (applies to the element it appears in and the
interpretations within it), or index scope where
emma:info-ref
attributes are used to show which
interpretations a particular emma:info
applies to. In
order to distinguish emma:info elements that have positional vs.
index scope the indexed attribute must be used. The attribute
indexed=true
indicates that the emma:info
it appears on does not have positional scope and instead is
referenced using emma:info-ref
. The attribute
indexed=false
indicates than an emma:info
has positional scope. The default value if indexed
is
not specified is false
. The indexed
attribute is required if and only if there is an
emma:info-ref
that refers to the id
of
the emma:info
.
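As a non-normative illustration, the scoping rules above can be sketched in Python. The sketch shows how an EMMA consumer might collect the emma:info elements that apply to a given interpretation; the resolution logic is an assumption about consumer behavior, not part of this specification.

```python
# Non-normative sketch: which emma:info elements apply to a given
# emma:interpretation. An emma:info with indexed="false" (the default)
# has positional scope and applies to the interpretations under its
# parent; indexed="true" means it applies only where an emma:info-ref
# attribute names its id.
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

def info_for_interpretation(root, interp):
    applicable = []
    # Map each element to its parent so we can walk up the ancestors.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = interp
    while node is not None:
        for info in node.findall(f"{{{EMMA}}}info"):
            if info.get("indexed", "false") == "false":
                applicable.append(info)          # positional scope
        node = parents.get(node)
    ref = interp.get(f"{{{EMMA}}}info-ref")      # index scope
    if ref is not None:
        for info in root.iter(f"{{{EMMA}}}info"):
            if info.get("id") == ref:
                applicable.append(info)
    return applicable
```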
The ref
attribute can also be used on
emma:info
instead of placing the application/vendor
specific annotations inline. For example, assuming the example
above was available at
http://example.com/examples/123/emma.xml
, the EMMA
document delivered to an EMMA consumer could be:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:info ref="http://example.com/examples/123/emma.xml#info_details"/>
<emma:one-of id="r1"
emma:start="1087995961542"
emma:end="1087995963542"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:confidence="0.75">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2" emma:confidence="0.68">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
A ref
on emma:info
can
also be used to point to an external document, not necessarily an
EMMA document, containing additional annotations on the
interpretation. For example, it could be used to point to an XML
document providing a list of the specifications of the input
device.
emma:endpoint-info element and emma:endpoint element

Annotation | emma:endpoint-info |
---|---|
Definition | The emma:endpoint-info element acts as a container
for all application specific annotation regarding the communication
environment. |
Children | One or more emma:endpoint elements. |
Attributes |
|
Applies to | The emma:endpoint-info element is legal only as a
child of emma:emma . |
Annotation | emma:endpoint |
Definition | The element acts as a container for application specific endpoint information. |
Children | Elements in the application namespace providing metadata about the input. |
Attributes |
|
Applies to | emma:endpoint-info |
In order to conduct multimodal interaction, there is a need in
EMMA to specify the properties of the endpoint that receives the
input which leads to the EMMA annotation. This allows subsequent
components to utilize the endpoint properties as well as the
annotated inputs to conduct meaningful multimodal interaction. EMMA
element emma:endpoint
can be used for this purpose. It
can specify the endpoint properties based on a set of common
endpoint property attributes in EMMA, such as
emma:endpoint-address
, emma:port-num
,
emma:port-type
, etc. (Section
4.2.14). Moreover, it provides an extensible annotation
structure that allows the inclusion of application and vendor
specific endpoint properties.
Note that the usage of the term "endpoint" in this context is different from the way that the term is used in speech processing, where it refers to the end of a speech input. As used here, "endpoint" refers to a network location which is the source or recipient of an EMMA document.
In multimodal interaction, multiple devices can be used and each
device can open multiple communication endpoints at the same time.
These endpoints are used to transmit and receive data, such as raw
input, EMMA documents, etc. The EMMA element
emma:endpoint
provides a generic representation of
endpoint information which is relevant to multimodal interaction.
It allows the annotation to be interoperable, and it eliminates the
need for EMMA processors to create their own specialized
annotations for existing protocols, potential protocols or yet
undefined private protocols that they may use.
Moreover, emma:endpoint-info
provides a container
to hold all annotations regarding the endpoint information,
including emma:endpoint
and other application and
vendor specific annotations that are related to the communication,
allowing the same communication environment to be referenced and
used in multiple interpretations.
Note that EMMA provides two locations (i.e.
emma:info
and emma:endpoint-info
) for
specifying vendor/application specific annotations. If the
annotation is specifically related to the description of the
endpoint, then the vendor/application specific annotation SHOULD be
placed within emma:endpoint-info
, otherwise it SHOULD
be placed within emma:info
.
The following example illustrates the annotation of endpoint reference properties in EMMA.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:ex="http://www.example.com/emma/port">
<emma:endpoint-info id="audio-channel-1">
<emma:endpoint id="endpoint1"
emma:endpoint-role="sink"
emma:endpoint-address="135.61.71.103"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint2"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
<emma:endpoint id="endpoint2"
emma:endpoint-role="source"
emma:endpoint-address="136.62.72.104"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint1"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
</emma:endpoint-info>
<emma:interpretation id="int1"
emma:start="1087995961542" emma:end="1087995963542"
emma:endpoint-info-ref="audio-channel-1"
emma:medium="acoustic" emma:mode="voice">
<destination>Chicago</destination>
</emma:interpretation>
</emma:emma>
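As a non-normative illustration, an EMMA consumer might pair up the endpoints in the example above by following the emma:endpoint-pair-ref annotations. The matching strategy below is an assumption about consumer behavior:

```python
# Non-normative sketch: match (source, sink) emma:endpoint pairs via
# the emma:endpoint-pair-ref attribute.
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

def endpoint_pairs(root):
    endpoints = {ep.get("id"): ep
                 for ep in root.iter(f"{{{EMMA}}}endpoint")}
    pairs = []
    for ep in endpoints.values():
        if ep.get(f"{{{EMMA}}}endpoint-role") != "source":
            continue
        # Look up the peer endpoint named by emma:endpoint-pair-ref.
        peer = endpoints.get(ep.get(f"{{{EMMA}}}endpoint-pair-ref"))
        if peer is not None and peer.get(f"{{{EMMA}}}endpoint-role") == "sink":
            pairs.append((ep, peer))             # (source, sink)
    return pairs
```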
The ex:app-protocol
is provided by the application
or the vendor specification. It specifies that the application
layer protocol used to establish the speech transmission from the
"source" port to the "sink" port is Session Initiation Protocol
(SIP). This is specific to SIP based VoIP communication, in which
the actual media transmission and the call signaling that controls
the communication sessions, are separated and typically based on
different protocols. In the above example, the Real-time
Transport Protocol (RTP) is used in the media transmission
between the source port and the sink port.
emma:process-model element

Annotation | emma:process-model |
---|---|
Definition | An element used to indicate the model used in processing the
input. The model must be referenced using the ref
attribute which is URI valued. |
Children | None. |
Attributes |
|
Applies to | The emma:process-model is legal only as a child of
the emma:emma element. |
The model that was used to derive the EMMA result MAY be
specified with the emma:process-model
annotation
defined as an element in the EMMA namespace. The
emma:process-model-ref
attribute appears on the
specific interpretation and references the appropriate
emma:process-model
element. The
emma:process-model
element MUST have a
ref
attribute which contains a URI referencing the
model used in processing the input. Unlike
emma:grammar
, emma:process-model
does not
allow for inline specification of a model. For each
emma:process-model
element there MUST be an
emma:process-model-ref
in the document whose value is
the id
of that emma:process-model
. The
emma:process-model
element cannot have positional
scope.
The emma:process-model
element MUST have an
attribute type
containing a string
indicating the type of model referenced. The value of type is drawn
from an open set including
{svm,crf,neural_network,hmm...}
.
Examples of potential uses of emma:process-model
include referencing the model used for handwriting recognition or a
text classification model used for natural language understanding.
The emma:process-model
annotation SHOULD be used for
input processing models that are not grammars. Grammars SHOULD be
referenced or specified inline using emma:grammar
.
Some input processing modules may utilize both a recognition model
and a grammar. For example, for handwriting recognition of
electronic ink a neural network might be used for character
recognition while a language model or grammar is used to constrain
the word or character sequences recognized. In this case, the
neural network SHOULD be referenced using
emma:process-model
and the grammar or language model
using emma:grammar
.
In the following example, there are two interpretations. The EMMA document in this example is produced by a computer vision system doing object recognition. The first interpretation is generated by a process model for vehicle recognition and the second competing interpretation is generated by a process model for person recognition.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:process-model id="pm1"
type="neural_network"
ref="http://example.com/vision/vehicle"/>
<emma:process-model id="pm2"
type="neural_network"
ref="http://example.com/vision/people"/>
<emma:one-of id="r1"
emma:start="1087995961542"
emma:end="1087995961542"
emma:medium="visual"
emma:mode="image"
emma:process="http://example.com/mycompvision1.xml">
<emma:interpretation id="int1"
emma:confidence="0.9"
emma:process-model-ref="pm1">
<object>aircraft</object>
</emma:interpretation>
<emma:interpretation id="int2"
emma:confidence="0.1"
emma:process-model-ref="pm2">
<object>person</object>
</emma:interpretation>
</emma:one-of>
</emma:emma>
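A consumer receiving this document can resolve the emma:process-model-ref annotation on each interpretation to the corresponding emma:process-model element. A non-normative Python sketch:

```python
# Non-normative sketch: resolve an interpretation's
# emma:process-model-ref to the emma:process-model element whose id
# it names; returns None when no reference is present or resolvable.
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

def process_model_for(root, interp):
    ref = interp.get(f"{{{EMMA}}}process-model-ref")
    if ref is None:
        return None
    for model in root.iter(f"{{{EMMA}}}process-model"):
        if model.get("id") == ref:
            return model
    return None
```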
emma:parameters and emma:parameter elements

Annotation | emma:parameters |
---|---|
Definition | An element used to indicate a set of parameters used to configure a processor used in producing an EMMA result. |
Children | Any number of emma:parameter elements |
Attributes |
|
Applies to | The emma:parameters MAY appear only as a child of
the emma:emma , emma:interpretation ,
emma:one-of , emma:group , and
emma:sequence elements. |
Annotation | emma:parameter |
Definition | An element used to indicate a specific parameter in the configuration of a processor used in producing an EMMA result. |
Children | None |
Attributes |
|
Applies to | The emma:parameter is legal only as a child of the
emma:parameters element. |
A set of parameters that were used to configure the EMMA
processor that produces an EMMA result MAY be specified with the
emma:parameters
annotation defined as an element in
the EMMA namespace. The emma:parameter-ref
attribute
(Section 4.2.21)
appears on the specific emma:interpretation
or other
container element and references the appropriate
emma:parameters
element. For example, typical
parameters for speech recognition such as confidence thresholds,
speed vs. accuracy, timeouts, settings for endpointing, etc. can be
included in emma:parameters
.
For each emma:parameters
element there MUST be an
emma:parameter-ref
in the document whose value is the
id
of that emma:parameters
. The
emma:parameters
element cannot have positional
scope.
The optional attribute api-ref
on
emma:parameter
and emma:parameters
specifies the API from which the name and value of a parameter
(or the names and values of a set of parameters) are drawn. Its
value is a string from an open set including:
{vxml2.1, vxml2.0, MRCPv2, MRCPv1, html+speech,
OpenCV...}. A parameter's name
and
value
are from the API specified in
api-ref
on the emma:parameter
element if
present. Otherwise, they are from the API specified in
api-ref
, if present, on the surrounding
emma:parameters
element. If the api-ref
is not defined on either emma:parameter
or
emma:parameters
the API that the name(s) and value(s)
are drawn from is undefined.
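The lookup order described above can be sketched as follows (non-normative; the helper accepts any mapping-like object with a get method, such as an ElementTree element or a plain dictionary):

```python
# Non-normative sketch of the api-ref lookup order for parameters:
# the api-ref on emma:parameter takes precedence; otherwise the one
# on the surrounding emma:parameters applies; None means the API the
# name and value are drawn from is undefined.

def parameter_api(parameter, parameters):
    return parameter.get("api-ref") or parameters.get("api-ref")
```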
In the following example, the interpretation is annotated with
emma:parameter-ref
to indicate the set of processing
parameters that resulted in that interpretation. These are
contained within an emma:parameters
under
emma:emma
. The API for the first two parameters is
inherited from emma:parameters
and is
"vxml2.1"
. The API for the third parameter is vendor
specific and specified directly in api-ref
on that
emma:parameter
element.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:parameters id="parameters1" api-ref="vxml2.1">
<emma:parameter name="confidencelevel" value=".9"/>
<emma:parameter name="completetimeout" value=".3s"/> <emma:parameter name="word_confusion_network_confidence" value="YES" api-ref="x-acme-recognizer"/>
</emma:parameters>
<emma:interpretation id="int1" emma:parameter-ref="parameters1"
    emma:medium="acoustic" emma:mode="voice"
    emma:process="http://example.com/asr">
<origin>Boston</origin>
</emma:interpretation>
</emma:emma>
Note that in an EMMA document describing a multimodal input or a
derivation with multiple steps there may be multiple different
emma:parameters
elements specifying the parameters
used for each specific mode or processing stage. The relationship
between an emma:parameters element and the container
element and the container
element it applies to is captured by the
emma:parameter-ref
attribute.
Instead of specifying parameters inline the
ref
attribute can be used to provide a URI reference
to an external document containing the parameters. This could be
either a pointer to an emma:parameters
element within
an EMMA document, or it can be a reference to a non-EMMA document
containing a specification of the parameters. In the following
example, the emma:parameters
element contains a
reference to a separate parameters document.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:parameters id="parameters1" api-ref="vxml2.1" ref="http://example.com/mobile/asr/params.xml">
</emma:parameters>
<emma:interpretation id="int1" emma:parameter-ref="parameters1"
emma:medium="acoustic" emma:mode="voice"
emma:process="http://example.com/asr">
<origin>Boston</origin>
</emma:interpretation>
</emma:emma>
emma:annotation element

Annotation | emma:annotation |
---|---|
Definition | The emma:annotation element acts as a
container for annotations of user inputs made by human
labellers. |
Children | One or more elements providing annotations of the input.
May also contain a single emma:info
element. |
Attributes |
|
Applies to | The emma:annotation element is legal only as
a child of the EMMA elements emma:emma ,
emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , or emma:node . |
In many spoken and multimodal applications, at some time after user interactions have taken place, human labellers are used to provide annotation of the input. For example, for speech input the most common annotation is to transcribe the actual words spoken by the user by listening to the audio. The correct semantic interpretation of the input may also be annotated. Labellers may also annotate other aspects of the input such as the emotional state of the user.
To provide support for augmenting logged EMMA documents with
human annotations, the EMMA markup provides the
emma:annotation
element. Multiple instances of this
element can appear as children of the various EMMA containers. In
examples with emma:one-of
and multiple
emma:interpretation
elements,
emma:annotation
will generally appear as a child of
emma:one-of
as it is an annotation of the signal
rather than of the specific interpretation hypotheses encoded in
the individual interpretations. The emma:annotation
element can also be used to annotate arcs and states in lattices
by including it in emma:arc
and
emma:node
.
In addition to id
, the emma:annotation
element provides a series of optional attributes that MAY be used
to provide metadata regarding the annotation. The
annotator
attribute contains a string indicating the
name or other identifier of the annotator. The type
attribute indicates the kind of annotation and has an open set of
values {transcription, semantics, emotion ...}
. The
time
attribute on emma:annotation
does
not have any relation to the time of the input itself, rather it
indicates the date and time that the annotation was made. The
emma:confidence
attribute is a value between 0 and 1
indicating the annotator's confidence in their annotation. The
reference
attribute is a boolean which indicates
whether the annotation is the reference annotation
for the interpretation, as opposed to some other annotation of the
input. For example, if the interpretation in the EMMA document is a
speech recognition result, annotation of the reference string
SHOULD have reference="true"
, while an annotation of
the emotional state of the user should be annotated as
reference="false".
Further metadata regarding the
annotation can be captured by using emma:info
within
emma:annotation
.
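As a non-normative illustration, a log-analysis tool might group the emma:annotation children of a container by type and pick out the reference annotation(s):

```python
# Non-normative sketch: collect the human annotations attached to an
# EMMA container element, grouped by their type, and select those
# marked as the reference annotation.
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

def annotations_by_type(container):
    result = {}
    for ann in container.findall(f"{{{EMMA}}}annotation"):
        result.setdefault(ann.get("type"), []).append(ann)
    return result

def reference_annotations(container):
    return [ann for ann in container.findall(f"{{{EMMA}}}annotation")
            if ann.get("reference") == "true"]
```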
In addition to specifying annotations inline, the
ref attribute on the
emma:annotation
element can be used to refer to an
external document containing the annotation content.
In the following example, the EMMA document contains an N-best
list with two recognition hypotheses and their semantic
representations. Under emma:one-of
there are three
different annotations all made by different annotators on different
days and times. The first is the transcription; it indicates that
in fact neither of the N-best results was correct and the actual
utterance spoken was "flights from austin to denver tomorrow". The
second annotation (label2
) contains the annotated
semantic interpretation of the reference string. The third
annotation contains an additional piece of metadata captured by a
human labeller, specifically it captures the fact that based on the
audio, the user's emotional state was angry. Here as an
illustration we utilize EmotionML markup.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1"
      emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice"
      emma:function="dialog" emma:verbal="true"
      emma:signal="http://example.com/signals/audio457.wav">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver today">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>today</date>
    </emma:interpretation>
    <emma:annotation id="label1" annotator="joe_bloggs"
        time="2011-10-26T21:32:52" type="transcription"
        emma:confidence="0.9" reference="false">
      <emma:literal>flights from austin to denver tomorrow</emma:literal>
    </emma:annotation>
    <emma:annotation id="label2" annotator="mary_smith"
        time="2011-10-27T12:00:21" type="semantics"
        emma:confidence="1.0" reference="true">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:annotation>
    <emma:annotation id="label3" annotator="tim_black"
        time="2011-11-10T09:00:21" type="emotion"
        emma:confidence="1.0" reference="false">
      <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
        <emotion>
          <category set="everyday" name="angry"/>
          <modality medium="acoustic" mode="voice"/>
        </emotion>
      </emotionml>
    </emma:annotation>
  </emma:one-of>
</emma:emma>
In addition to this more powerful mechanism for adding human
annotation to a document, EMMA also provides a shorthand
emma:annotated-tokens
attribute for the common use
case of adding reference transcriptions to an EMMA document (Section
4.2.22) .
Note that 'annotation' as used in the
emma:annotation
element and the
emma:annotated-tokens
attribute refers only to
annotations made in a post process by human labellers to indicate
what the correct processing (reference) of an input should have
been or to annotate other aspects of the input. This differs from
the general sense of annotation as used more broadly in the
specification as in the title "Extensible MultiModal Annotation",
which refers in general to metadata provided about an input either
by an EMMA processor or by a human labeller. The many annotation
elements and attributes in EMMA are used to indicate metadata
captured regarding an input. The emma:annotation
element and emma:annotated-tokens
attribute are
specifically for the addition of information provided by human
labellers.
Annotations such as the EmotionML in the example above can also be stored in separate files
and referenced on an emma:annotation
element using
ref
. Like emma:parameters
, a partial
specification of the annotation can be provided inline and
emma:partial-content="true"
provides an indication
that the full annotation can be accessed at ref
.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:start="1087995961542"
emma:end="1087995963542"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:signal="http://example.com/signals/audio457.wav"
emma:confidence="0.75">
<emma:interpretation id="int1" emma:confidence="0.75"
emma:tokens="flights from boston to denver tomorrow">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation> <emma:annotation id="label3"
annotator="tim_black"
time="2011-11-10T09:00:21"
type="emotion"
emma:confidence="1.0"
reference="false"
ref="http://example.com/2011/11/10/emotion123.xml">
</emma:annotation>
</emma:one-of>
</emma:emma>
emma:location element

Annotation | emma:location |
---|---|
Definition | The emma:location element acts as a
container for information about the location of a user input, more
precisely, information about the location of a capture device such
as a mobile device. |
Children | none |
Attributes |
|
Applies to | The emma:location element is legal only as a
child of the EMMA elements emma:emma ,
emma:interpretation , emma:group ,
emma:one-of , emma:sequence . |
Many mobile devices and sensors are equipped with
geolocation capabilities and information about where a unimodal or
multimodal event occurred can be very useful both for
interpretation and logging. Annotating interpretations with
location information in EMMA is achieved with the
emma:location
element. The emma:location
element indicates the location of the capture device. In many cases
the device location and the user location will be identical, as in
the case where the user is carrying a mobile device. In other use
cases (e.g. cameras capturing distant motion, far field microphone
arrays) the user may be distant from the device location.
Capturing the location of the user or other source of signal is
beyond the scope of the emma:location
annotation. Note
that emma:location
is not intended as a general
semantic representation for location information, e.g. a gesture
made a location on a map or a spoken location, these rather are
part of the interpretation and should be contained within
emma:interpretation
rather than the
emma:location
annotation element. The location
information in emma:location
represents a point in
space. Since a device or sensor may be moving during the capture of
an input, the location may not be the same at the beginning and end of
an input. For this reason, the emma:location
information is defined to be relative to the beginning of the
capture. Note though that the bearing of the sensor can be
annotated using the emma:heading
and
emma:speed
attributes on emma:location
.
The emma:location
element represents the location of
a single capture device. Use cases where multiple input devices or
sensors are involved in the capture of the input can be represented
as composite inputs with an emma:location
element
annotation on each of the interpretations that are composed.
The Multimodal Interaction Working Group invites comments on use cases
that may require a finer-grained representation of location
metadata.
The emma:location
attributes are based
on the W3C Geolocation API [Geolocation]
specification, with the addition of attributes for a description of
the location and address information. The formats of the attributes
from the Geolocation API are as defined in that specification.
Specifically, they are:
The geographic coordinate reference system used by the attributes is the World Geodetic System (2d) [WGS84]. No other reference system is supported.
The emma:latitude
and
emma:longitude
attributes are geographic coordinates
of the capture device at the beginning of the capture. They MUST be
specified in decimal degrees.
The emma:altitude
attribute denotes the
height of the position at the beginning of the capture. It MUST be
specified in meters above the [WGS84]
ellipsoid, or as provided by the device's geolocation
implementation. If the implementation cannot provide altitude
information, the value of this attribute MUST be the empty
string.
The emma:accuracy
attribute denotes the
accuracy of the latitude and longitude coordinates. It MUST be
specified in meters. The value of the emma:accuracy
attribute MUST be a non-negative real number.
The emma:altitudeAccuracy
attribute is
specified in meters. If the implementation cannot provide altitude
information, the value of this attribute MUST be the empty string.
Otherwise, the value of the emma:altitudeAccuracy
attribute MUST be a non-negative real number.
The emma:accuracy
and
emma:altitudeAccuracy
values in an EMMA document SHOULD
correspond to a 95% confidence level.
The emma:heading
attribute denotes the
direction of travel of the capture device at the beginning of the
capture, and is specified in degrees, where 0° ≤ heading < 360°,
counting clockwise relative to the true north. If the
implementation cannot provide heading information, the value of
this attribute MUST be the empty string. If the capture device is
stationary (i.e. the value of the speed attribute is 0), then the
value of the emma:heading
attribute MUST be the empty
string.
The emma:speed
attribute denotes the
magnitude of the horizontal component of the capture device's
velocity at the beginning of the capture, and MUST be specified in
meters per second. If the implementation cannot provide speed
information, the value of this attribute MUST be the empty string.
Otherwise, the value of the emma:speed
attribute MUST
be a non-negative real number.
The emma:description
attribute is an
arbitrary string describing the location of the capture device at
the beginning of the capture.
The emma:address
attribute is an
arbitrary string describing the address of the capture device at
the beginning of the capture.
The internal formats of the
emma:description
and the emma:address
attributes are not defined in this specification.
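The constraints above lend themselves to a simple conformance check. The following non-normative Python sketch flags violations of the numeric and empty-string rules (treating an absent attribute like an empty one is an assumption of this sketch):

```python
# Non-normative sketch: check the constraints on emma:location
# attributes described above. An empty string means the geolocation
# implementation could not provide the value.
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

def location_problems(location):
    def num(name):
        value = location.get(f"{{{EMMA}}}{name}", "")
        return None if value == "" else float(value)

    problems = []
    # accuracy, altitudeAccuracy and speed must be non-negative reals
    # when present.
    for name in ("accuracy", "altitudeAccuracy", "speed"):
        value = num(name)
        if value is not None and value < 0:
            problems.append(f"{name} must be non-negative")
    heading, speed = num("heading"), num("speed")
    if heading is not None and not 0 <= heading < 360:
        problems.append("heading must satisfy 0 <= heading < 360")
    if speed == 0 and heading is not None:
        problems.append("heading must be empty when speed is 0")
    return problems
```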
The following example shows the location information for an input spoken at the W3C MIT office.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:location emma:latitude="42.361860"
      emma:longitude="-71.091840"
      emma:altitude="6.706"
      emma:accuracy="20.5"
      emma:altitudeAccuracy="1.6"
      emma:heading=""
      emma:speed=""
      emma:description="W3C MIT office"
      emma:address="32 Vassar Street, Cambridge, MA 02139 USA"/>
  <emma:interpretation id="nlu1"
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights from boston to denver">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
emma:tokens attribute

Annotation | emma:tokens |
---|---|
Definition | An attribute of type xsd:string holding a sequence
of input tokens. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data. |
The emma:tokens
annotation holds a list of input
tokens. In the following description, the term tokens is
used in the computational and syntactic sense of units of
input, and not in the sense of XML tokens. The value
held in emma:tokens
is the list of the tokens of input
as produced by the processor which generated the EMMA document;
there is no language associated with this value.
In the case where a grammar is used to constrain input, the
value will correspond to tokens as defined by the grammar. So for
an EMMA document produced by input to a SRGS grammar [SRGS],
the value of emma:tokens
will be the list of words
and/or phrases that are defined as tokens in SRGS (see
Section 2.1 of [SRGS]).
Items in the emma:tokens
list are delimited by white
space and/or quotation marks for phrases containing white space.
For example:
emma:tokens="arriving at 'Liverpool Street'"
where the three tokens of input are arriving, at and Liverpool Street.
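Since this delimiting convention follows shell-style quoting, a consumer can split an emma:tokens value with Python's shlex module. A non-normative sketch:

```python
# Non-normative sketch: split an emma:tokens value on white space,
# keeping quoted phrases (e.g. 'Liverpool Street') together as
# single tokens.
import shlex

def split_emma_tokens(value):
    return shlex.split(value)
```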
The emma:tokens
annotation MAY be applied not just
to the lexical words and phrases of language but to any level of
input processing. Other examples of tokenization include phonemes,
ink strokes, gestures and any other discrete units of input at any
level.
Examples:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:tokens="From Cambridge to London tomorrow"
emma:medium="acoustic" emma:mode="voice">
<origin emma:tokens="From Cambridge">Cambridge</origin>
<destination emma:tokens="to London">London</destination>
<date emma:tokens="tomorrow">20030315</date>
</emma:interpretation>
</emma:emma>
emma:process attribute
Annotation | emma:process |
---|---|
Definition | An attribute of type xsd:anyURI referencing the
process used to generate the interpretation. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence |
A reference to the information concerning the processing that
was used for generating an interpretation MAY be made using the
emma:process
attribute. For example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:medium="acoustic" emma:mode="voice">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mysemproc1.xml">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
<emma:derived-from resource="#raw"/>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/mysemproc2.xml">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
<emma:derived-from resource="#better"/>
</emma:interpretation>
</emma:emma>
The process description document, referenced by the
emma:process
annotation MAY include information on the
process itself, such as grammar, type of parser, etc. EMMA is not
normative about the format of the process description document.
Note that while the emma:process
attribute may
refer to a document that describes the process, the URI syntax
itself can be used to briefly describe the process within the EMMA
document without actually referring to an external document. For
example, the results of a natural language understanding component
could be annotated as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="nlu1"
emma:medium="acoustic"
emma:mode="voice"
emma:tokens="flights from boston to denver tomorrow please"
emma:process="http://nlu/classifier=svm&amp;model=travel&amp;output=xml">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:emma>
In this case the emma:process
attribute indicates
that the process is natural language understanding
(nlu
) that the classifier used is a support vector
machine (svm
), that the specific model is the
'travel
' model and the required output was
'xml
'. Note that none of the specific values used
within the URI here are standardized. This simply illustrates how a
URI can be used to provide a detailed process description.
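A consumer could unpack such a descriptive URI with standard URI parsing. The key=value convention in the path is purely illustrative (EMMA does not standardize the internal structure of the URI), so the following Python sketch is an assumption-laden example, not a required interface:

```python
from urllib.parse import urlparse

def parse_process_uri(uri):
    """Unpack an emma:process URI whose path carries '&'-separated
    key=value descriptors.  This convention is illustrative only;
    EMMA does not standardize the structure of the URI."""
    parsed = urlparse(uri)
    fields = {}
    for part in parsed.path.lstrip("/").split("&"):
        key, sep, value = part.partition("=")
        if sep:  # keep only well-formed key=value descriptors
            fields[key] = value
    return parsed.netloc, fields

component, fields = parse_process_uri(
    "http://nlu/classifier=svm&model=travel&output=xml")
print(component, fields)
# nlu {'classifier': 'svm', 'model': 'travel', 'output': 'xml'}
```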
emma:no-input attribute
Annotation | emma:no-input |
---|---|
Definition | Attribute holding xsd:boolean value that is true
if there was no input. |
Applies to | emma:interpretation |
The case of lack of input MUST be annotated as follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:no-input="true"
emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
If the emma:interpretation
is annotated with
emma:no-input="true"
then the
emma:interpretation
MUST be empty.
emma:uninterpreted attribute
Annotation | emma:uninterpreted |
---|---|
Definition | Attribute holding xsd:boolean value that is true
if no interpretation was produced in response to the
input |
Applies to | emma:interpretation |
An emma:interpretation
element representing input
for which no interpretation was produced MUST be
annotated with emma:uninterpreted="true"
. For
example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1" emma:uninterpreted="true"
emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
The notation for uninterpreted input MAY refer to any possible
stage of interpretation processing, including raw transcriptions.
For instance, no interpretation would be produced for stages
performing pure signal capture such as audio recordings. Likewise,
if a spoken input was recognized but cannot be parsed by a language
understanding component, it can be tagged as
emma:uninterpreted
as in the following example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="understanding"
emma:process="http://example.com/mynlu.xml"
emma:uninterpreted="true"
emma:tokens="From Cambridge to London tomorrow"
emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
The emma:interpretation
MUST be empty if the emma:interpretation
element
is annotated with emma:uninterpreted="true"
.
emma:lang attribute
Annotation | emma:lang |
---|---|
Definition | An attribute of type xsd:language indicating the
language for the input. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data. |
The emma:lang
annotation is used to indicate the
human language for the input that it annotates. The values of the
emma:lang
attribute are language identifiers as
defined by IETF Best Current Practice 47 [BCP47].
For example, emma:lang="fr"
denotes French, and
emma:lang="en-US"
denotes US English.
emma:lang
MAY be applied to any
emma:interpretation
element. Its annotative scope
follows the annotative scope of these elements. Unlike the
xml:lang
attribute in XML, emma:lang
does
not specify the language used by element contents or attribute
values.
The following example shows the use of emma:lang
for annotating an input interpretation.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:lang="fr"
emma:medium="acoustic" emma:mode="voice">
<answer>arretez</answer>
</emma:interpretation>
</emma:emma>
Many kinds of input including some inputs made through pen,
computer vision, and other kinds of sensors are inherently
non-linguistic. Examples include drawing areas, arrows etc. using a
pen and music input for tune recognition. If these non-linguistic
inputs are annotated with emma:lang
then they MUST be
annotated as emma:lang="zxx"
. For example, pen input
where a user circles an area on a map display could be represented as
follows where emma:lang="zxx"
indicates that the ink
input is not in any human language.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="pen1"
emma:medium="tactile"
emma:mode="ink"
emma:lang="zxx">
<location>
<type>area</type>
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
</location>
</emma:interpretation>
</emma:emma>
If there is no information about whether the source input is in a
particular human language (and, if so, which language), then any
emma:lang annotation on that input MUST be
emma:lang="". Furthermore, in cases
where there is no explicit emma:lang annotation, and
none is inherited from a higher element in the document, the
default value for emma:lang is "", meaning
that there is no information about whether the source input is in a
language and, if so, which language.
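Resolution of the effective emma:lang, including inheritance from a higher element and the empty-string default, can be sketched as follows. This is a non-normative Python illustration using the standard library, not a required algorithm:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
LANG = "{%s}lang" % EMMA_NS

def effective_lang(root, target):
    """Resolve the effective emma:lang of `target` by walking down
    from `root`, inheriting the nearest ancestor's annotation.
    When no annotation applies, the default is "" (no information
    about the input's language)."""
    def walk(elem, inherited):
        current = elem.get(LANG, inherited)
        if elem is target:
            return current
        for child in elem:
            found = walk(child, current)
            if found is not None:
                return found
        return None

    resolved = walk(root, "")
    return "" if resolved is None else resolved

doc = ET.fromstring(
    '<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">'
    '<emma:one-of id="nbest1" emma:lang="fr">'
    '<emma:interpretation id="int1"/>'
    '</emma:one-of>'
    '</emma:emma>')
interp = doc.find(".//{%s}interpretation" % EMMA_NS)
print(effective_lang(doc, interp))  # inherited from emma:one-of: fr
```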
The xml:lang
and emma:lang
attributes
serve uniquely different and equally important purposes. The role
of the xml:lang
attribute in XML 1.0 is to indicate
the language used for character data content in an XML element or
document. In contrast, the emma:lang
attribute is used
to indicate the language employed by a user when entering an input.
Critically, emma:lang
annotates the language of the
signal originating from the user rather than the specific tokens
used at a particular stage of processing. This is most clearly
illustrated through consideration of an example involving multiple
stages of processing of a user input. Consider the following
scenario: EMMA is being used to represent three stages in the
processing of a spoken input to a system for ordering products.
The user input is in Italian; after speech recognition, it is
first translated into English, and then a natural language
understanding system converts the English translation into a
product ID (which is not in any particular language). Since the
input signal is a user speaking Italian, the emma:lang
will be emma:lang="it"
on all of these three stages of
processing. The xml:lang
attribute, in contrast, will
initially be "it"
, after translation the
xml:lang
will be "en-US"
, and after
language understanding it will be "zxx"
since the
product ID is non-linguistic content. The following are examples of
EMMA documents corresponding to these three processing stages,
abbreviated to show the critical attributes for discussion here.
Note that <transcription>
,
<translation>
, and
<understanding>
are application namespace elements, not part of the EMMA markup.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<transcription xml:lang="it">condizionatore</transcription>
</emma:interpretation>
</emma:emma>
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<translation xml:lang="en-US">air conditioner</translation>
</emma:interpretation>
</emma:emma>
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<understanding xml:lang="zxx">id1456</understanding>
</emma:interpretation>
</emma:emma>
In order to handle inputs involving multiple
languages, such as through code switching, the
emma:lang attribute MAY contain several language identifiers
separated by spaces.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:tokens="please stop arretez s'il vous plait"
emma:lang="en fr"
emma:medium="acoustic" emma:mode="voice">
<command> CANCEL </command>
</emma:interpretation>
</emma:emma>
emma:signal and emma:signal-size attributes
Annotation | emma:signal |
---|---|
Definition | An attribute of type xsd:anyURI referencing the
input signal. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence ,
and application instance data. |
Annotation | emma:signal-size |
Definition | An attribute of type xsd:nonNegativeInteger
specifying the size in eight bit octets of the referenced
source. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence ,
and application instance data. |
A URI reference to the signal that originated the input
recognition process MAY be represented in EMMA using the
emma:signal
annotation. For example, in the case
of speech recognition, the emma:signal
attribute is
the annotation used to reference the audio that was recognized. The
MIME type of the audio can be indicated using emma:media-type
.
Here is an example where the reference to a speech signal is
represented using the emma:signal
annotation on the
emma:interpretation
element:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"
emma:signal="http://example.com/signals/sg23.bin"
emma:medium="acoustic" emma:mode="voice">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
The emma:signal-size
annotation can be used to
declare the exact size of the associated signal in 8-bit octets. An
example of the use of an EMMA document to represent a recording,
with emma:signal-size
indicating the size is as
follows:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"
emma:medium="acoustic"
emma:mode="voice"
emma:function="recording"
emma:uninterpreted="true"
emma:signal="http://example.com/signals/recording.mpg"
emma:signal-size="82102"
emma:duration="10000">
</emma:interpretation>
</emma:emma>
emma:media-type attribute
Annotation | emma:media-type |
---|---|
Definition | An attribute of type xsd:string holding the MIME
type associated with the signal's data format. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence ,
emma:endpoint , and application instance
data. |
The data format of the signal that originated the input MAY be
represented in EMMA using the emma:media-type
annotation. An initial set of MIME media types is defined by [RFC2046].
Here is an example where the media type for the ETSI ES 202 212
audio codec for Distributed Speech Recognition (DSR) is applied to
the emma:interpretation
element. The example also
specifies an optional sampling rate of 8 kHz and maxptime of 40
milliseconds.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"
emma:signal="http://example.com/signals/signal.dsr"
emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40"
emma:medium="acoustic" emma:mode="voice">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
emma:confidence attribute
Annotation | emma:confidence |
---|---|
Definition | An attribute of type xsd:decimal in range 0.0 to
1.0, indicating the processor's confidence in the result. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence ,
emma:annotation , and application instance data. |
The confidence score in EMMA indicates the processor's or
annotator's confidence in the assignment of the
interpretation to the input, and if confidence is annotated
on an input it MUST be given as the value of
emma:confidence
. The confidence score MUST be a number
in the range from 0.0 to 1.0 inclusive. A value of 0.0 indicates
minimum confidence, and a value of 1.0 indicates maximum
confidence. Note that emma:confidence
represents not
only the confidence of a speech recognizer, but more generally
the confidence of whatever processor was responsible for
creating the EMMA result, based on whatever evidence it has. For a
natural language interpretation, for example, this might include
semantic heuristics in addition to speech recognition scores.
Moreover, the confidence score values do not have to be interpreted
as probabilities. In fact, confidence score values are
platform-dependent, since their computation is likely to differ
between platforms and different EMMA processors. Confidence scores
are annotated explicitly in EMMA in order to provide this
information to the subsequent processes for multimodal interaction.
The example below illustrates how confidence scores are annotated
in EMMA.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest1"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="meaning1" emma:confidence="0.6">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="meaning2" emma:confidence="0.4">
<location> Austin </location>
</emma:interpretation>
</emma:one-of>
</emma:emma>
In addition to its use as an attribute on the EMMA
interpretation and container elements, the
emma:confidence
attribute MAY also be used to assign
confidences to elements in instance data in the application
namespace. This can be seen in the following example, where the
<destination>
and <origin>
elements have confidences.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="meaning1" emma:confidence="0.6"
emma:medium="acoustic" emma:mode="voice">
<destination emma:confidence="0.8"> Boston</destination>
<origin emma:confidence="0.6"> Austin </origin>
</emma:interpretation>
</emma:emma>
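A consumer might read these scores as follows. This is a minimal, non-normative Python sketch that also enforces the required 0.0 to 1.0 range; the document parsed is the example above:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
APP_NS = "http://www.example.com/example"
CONF = "{%s}confidence" % EMMA_NS

def confidence(elem):
    """Return an element's emma:confidence as a float, or None when
    the annotation is absent.  Conformant values lie in [0.0, 1.0]."""
    raw = elem.get(CONF)
    if raw is None:
        return None
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError("emma:confidence out of range: %s" % raw)
    return value

doc = ET.fromstring(
    '<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" '
    'xmlns="http://www.example.com/example">'
    '<emma:interpretation id="meaning1" emma:confidence="0.6" '
    'emma:medium="acoustic" emma:mode="voice">'
    '<destination emma:confidence="0.8">Boston</destination>'
    '<origin emma:confidence="0.6">Austin</origin>'
    '</emma:interpretation>'
    '</emma:emma>')
interp = doc.find("{%s}interpretation" % EMMA_NS)
dest = interp.find("{%s}destination" % APP_NS)
print(confidence(interp), confidence(dest))  # 0.6 0.8
```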
Although in general instance data can be represented in XML using a combination of elements and attributes in the application namespace, EMMA does not provide a standard way to annotate processors' confidences in attributes. Consequently, instance data that is expected to be assigned confidences SHOULD be represented using elements, as in the above example.
emma:source attribute
Annotation | emma:source |
---|---|
Definition | An attribute of type xsd:anyURI referencing the
source of input. |
Applies to | emma:interpretation , emma:one-of ,
emma:group , emma:sequence , and
application instance data. |
The source of an interpreted input MAY be represented in EMMA as
a URI resource using the emma:source
annotation. Here
is an example that shows different input sources for different
input interpretations.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:myapp="http://www.example.com/myapp">
<emma:one-of id="nbest1"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="intp1"
emma:source="http://example.com/microphone/NC-61">
<myapp:destination>Boston</myapp:destination>
</emma:interpretation>
<emma:interpretation id="intp2"
emma:source="http://example.com/microphone/NC-4024">
<myapp:destination>Austin</myapp:destination>
</emma:interpretation>
</emma:one-of>
</emma:emma>
The start and end times for input MAY be indicated using either
absolute timestamps or relative timestamps. Both are in
milliseconds for ease in processing timestamps. Note that the
ECMAScript Date object's getTime()
function is a
convenient way to determine the absolute time.
emma:start, emma:end attributes
Annotation | emma:start, emma:end |
---|---|
Definition | Attributes of type
xsd:nonNegativeInteger indicating the absolute
starting and ending times of an input in terms of the number of
milliseconds since 1 January 1970 00:00:00 GMT |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , and application instance
data |
Here is an example of a timestamp for an absolute time.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:start="1087995961542"
emma:end="1087995963542"
emma:medium="acoustic" emma:mode="voice">
<destination>Chicago</destination>
</emma:interpretation>
</emma:emma>
The emma:start
and emma:end
annotations on an input MAY be identical, however the
emma:end
value MUST NOT be less than the
emma:start
value.
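The constraint above, and the duration implied by a pair of absolute timestamps, can be checked mechanically. A minimal, non-normative sketch:

```python
def check_timestamps(start, end):
    """Validate emma:start/emma:end (milliseconds since
    1 January 1970 00:00:00 GMT) and return the implied duration.
    Equal values are permitted; emma:end MUST NOT be less than
    emma:start."""
    if end < start:
        raise ValueError("emma:end must not be less than emma:start")
    return end - start

# The absolute-timestamp example above spans two seconds:
print(check_timestamps(1087995961542, 1087995963542))  # 2000
```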
emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start attributes
Annotation | emma:time-ref-uri |
---|---|
Definition | Attribute of type xsd:anyURI indicating the URI
used to anchor the relative timestamp. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:lattice , and application instance
data |
Annotation | emma:time-ref-anchor-point |
Definition | Attribute with a value of start or
end , defaulting to start . It indicates
whether to measure the time from the start or end of the interval
designated with emma:time-ref-uri . |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:lattice , and application instance
data |
Annotation | emma:offset-to-start |
Definition | Attribute of type xsd:integer ,
defaulting to zero. It specifies the offset in milliseconds for the
start of input from the anchor point designated with
emma:time-ref-uri and
emma:time-ref-anchor-point |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , and application instance
data |
Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.
The reference interval is designated with the
emma:time-ref-uri attribute. This MAY be combined with the
emma:time-ref-anchor-point attribute to specify
whether the anchor point is the start or end of this interval. The
start of an input relative to this anchor point is then specified
with the emma:offset-to-start attribute.
Here is an example where the referenced input is in the same document:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:sequence>
<emma:interpretation id="int1"
emma:medium="acoustic" emma:mode="voice">
<origin>Denver</origin>
</emma:interpretation>
<emma:interpretation id="int2"
emma:medium="acoustic" emma:mode="voice"
emma:time-ref-uri="#int1"
emma:time-ref-anchor-point="start"
emma:offset-to-start="5000">
<destination>Chicago</destination>
</emma:interpretation>
</emma:sequence>
</emma:emma>
Note that the reference point refers to an input, but not necessarily to a complete input. For example, if a speech recognizer timestamps each word in an utterance, the anchor point might refer to the timestamp for just one word.
The absolute and relative timestamps are not mutually exclusive; that is, it is possible to have both relative and absolute timestamp attributes on the same EMMA container element.
Timestamps of inputs collected by different devices will be subject to variation if the times maintained by the devices are not synchronized. This concern is outside of the scope of the EMMA specification.
emma:duration attribute
Annotation | emma:duration |
---|---|
Definition | Attribute of type
xsd:nonNegativeInteger , defaulting to zero. It
specifies the duration of the input in milliseconds. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , and application instance
data |
The duration of an input in milliseconds MAY be specified with
the emma:duration
attribute. The
emma:duration
attribute MAY be used either in
combination with timestamps or independently, for example in the
annotation of speech corpora.
In the following example, the duration of the signal that gave
rise to the interpretation is indicated using
emma:duration
.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:duration="2300"
emma:medium="acoustic" emma:mode="voice">
<origin>Denver</origin>
</emma:interpretation>
</emma:emma>
This section is informative.
The following table provides guidance on how to determine the values of relative timestamps on a composite input.
emma:time-ref-uri |
If the reference interval URI is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
emma:time-ref-anchor-point |
If the anchor value is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
emma:offset-to-start |
Given that the emma:time-ref-uri and
emma:time-ref-anchor-point are the same for both
combining inputs, then the emma:offset-to-start for
the combination should be the lesser of the two. If they are not
the same then relative timestamps will have to be resolved to
absolute timestamps in order to determine the combined
timestamp. |
emma:duration |
Given that the emma:time-ref-uri and
emma:time-ref-anchor-point are the same for both
combining inputs, then the emma:duration is calculated
as follows. Add together the emma:offset-to-start and
emma:duration for each of the inputs. Take whichever
of these is greater and subtract from it the lesser of the
emma:offset-to-start values in order to determine the
combined duration. If emma:time-ref-uri and
emma:time-ref-anchor-point are not the same then
relative timestamps will have to be resolved to absolute timestamps
in order to determine the combined timestamp. |
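Under the assumption of a shared emma:time-ref-uri and emma:time-ref-anchor-point, the offset and duration guidance above reduces to a simple min/max computation. A non-normative Python sketch:

```python
def combine_relative(offset1, duration1, offset2, duration2):
    """Combine the relative timestamps of two inputs that share the
    same emma:time-ref-uri and emma:time-ref-anchor-point (otherwise
    both must first be resolved to absolute timestamps).

    The composite input starts at the lesser offset and ends at the
    later of the two endpoints; the combined duration is the
    difference between those two values."""
    start = min(offset1, offset2)
    end = max(offset1 + duration1, offset2 + duration2)
    return start, end - start

# Two inputs starting 5000 ms and 6500 ms after the anchor point,
# lasting 2000 ms and 1000 ms respectively:
print(combine_relative(5000, 2000, 6500, 1000))  # (5000, 2500)
```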
emma:medium, emma:mode, emma:function, emma:verbal attributes
Annotation | emma:medium |
---|---|
Definition | An attribute of type xsd:nmtokens
which contains a space delimited set of values from the
set {acoustic , tactile ,
visual }. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:endpoint , and application instance data |
Annotation | emma:mode |
Definition | An attribute of type xsd:nmtokens
which contains a space delimited set of values from an
open set of values including: {voice ,
dtmf , ink , gui ,
keys , video , photograph ,
...}. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:endpoint , and application instance data |
Annotation | emma:function |
Definition | An attribute of type xsd:string constrained to
values in the open set {recording ,
transcription , dialog ,
verification , ...}. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data |
Annotation | emma:verbal |
Definition | An attribute of type xsd:boolean . |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data |
Annotation | emma:device-type |
Definition | The type of device, or list of types of device through
which the input is captured. An attribute of type
xsd:nmtokens which contains a space delimited set of
values from an open set of values including:
{microphone , touchscreen ,
mouse , keypad , keyboard ,
pen , joystick , touchpad ,
scanner , camera_2d ,
camera_3d , thumbwheel ...}. |
Applies to | emma:interpretation ,
emma:group , emma:one-of ,
emma:sequence , and application instance
data |
Annotation | emma:expressed-through |
Definition | The modality, or list of modalities, through which the
interpretation is expressed. An attribute of type
xsd:nmtokens which contains a space delimited set of
values from an open set of values including: {gaze ,
face , head , torso ,
hands , leg , locomotion ,
posture , physiology , ...}. |
Applies to | emma:interpretation ,
emma:group , emma:one-of ,
emma:sequence , and application instance
data |
EMMA provides two properties for the annotation of input
modality. One indicating the broader medium or channel
(emma:medium
) and another indicating the specific mode
of communication used on that channel (emma:mode
). The
input medium is defined from the user's perspective and indicates
whether they use their voice (acoustic
), touch
(tactile
), or visual appearance/motion
(visual
) as input. Tactile includes most
hands-on input device types such as pen, mouse, keyboard, and
touch screen. Visual is used for camera input.
emma:medium = space delimited sequence of values from the set:
[acoustic|tactile|visual]
The mode property provides the ability to distinguish between
different modes of communication that may be within a particular
medium. For example, in the tactile medium, modes include
electronic ink (ink
), and pointing and clicking on a
graphical user interface (gui
).
emma:mode = space delimited sequence of values from the set: [voice|dtmf|ink|gui|keys|video|photograph| ... ]
The emma:medium
classification is based on the
boundary between the user and the device that they use. For
emma:medium="tactile"
the user physically touches the
device in order to provide input. For
emma:medium="visual"
the user's movement is captured
by sensors (cameras, infrared) resulting in an input to the system.
In the case where emma:medium="acoustic"
the user
provides input to the system by producing an acoustic signal. Note
then that DTMF input will be classified as
emma:medium="tactile"
since in order to provide DTMF
input the user physically presses keys on a keypad.
In order to clarify the difference between
emma:medium
and emma:mode
consider the
following examples of different ways to capture drawn input. If the
user input consists of drawing it will be classified as
emma:mode="ink"
. If the user physically draws on a
touch sensitive screen then the input is classified as
emma:medium ="tactile"
since the user interacts with
the system by direct contact. If instead the user draws on a
tabletop and their input is captured by a camera mounted above (or
below) the surface then the input is emma:medium
="visual"
. Similarly, drawing on a large screen display
using hand gestures made in space and sensed with a camera will be
classified as emma:mode="ink"
and emma:medium
="visual"
.
While emma:medium
and emma:mode
are
optional on specific elements such as
emma:interpretation
and emma:one-of
, note
that all EMMA interpretations must be annotated for
emma:medium
and emma:mode
, so these
attributes must appear either directly on
emma:interpretation, on an ancestor
emma:one-of node, or at an earlier
stage of the derivation listed in emma:derivation.
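For instance, the required annotations can be stated once on a containing emma:one-of and thereby cover every contained interpretation. The following is a minimal illustrative sketch (the application payload and confidence values are invented for this example):

```xml
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- medium and mode are annotated once on the ancestor emma:one-of -->
  <emma:one-of id="nbest" emma:medium="acoustic" emma:mode="voice">
    <!-- both interpretations are covered by the emma:medium="acoustic"
         and emma:mode="voice" annotations on the containing element -->
    <emma:interpretation id="int1" emma:confidence="0.7">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.3">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```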
The emma:device-type
annotation can be used to
indicate the specific type of device used to capture the input.
This allows for differentiation of multiple different
tactile inputs within the ink mode, such
as touchscreen, pen, and mouse.
emma:device-type = space delimited sequence of values from the set:
[microphone|keypad|keyboard|touchscreen|touchpad|
mouse|pen|joystick|thumbwheel|
camera_2d|camera_3d|scanner... ]
The emma:device-type
attribute SHOULD be used to
indicate the general category of the sensor used to capture the
input. The specific model number or characteristics SHOULD be
captured instead using emma:process
(Section
4.2.2).
Orthogonal to the mode, user inputs can also be classified with respect to their communicative function. This enables a simpler mode classification.
emma:function = [recording|transcription|dialog|verification| ... ]
For example, speech can be used for recording (e.g. voicemail), transcription (e.g. dictation), dialog (e.g. interactive spoken dialog systems), and verification (e.g. identifying users through their voiceprints).
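A dictation result, for example, would carry the transcription function annotation. The following is an illustrative sketch (the payload element is invented for this example):

```xml
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- dictated speech: the input transcribes a message rather than
       driving an interactive dialog -->
  <emma:interpretation id="int1"
      emma:medium="acoustic" emma:mode="voice"
      emma:function="transcription">
    <text>meet me at noon</text>
  </emma:interpretation>
</emma:emma>
```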
EMMA also supports an additional property
emma:verbal
which distinguishes verbal use of an input
mode from non-verbal. This MAY be used to distinguish the use of
electronic ink to convey handwritten commands from the use of
electronic ink for symbolic gestures such as circles and arrows.
Handwritten commands, such as writing "downtown" in order to
change a map display to show the downtown area, are classified as verbal
(emma:function="dialog" emma:verbal="true"
). Pen
gestures (arrows, lines, circles, etc), such as circling a
building, are classified as non-verbal dialog
(emma:function="dialog" emma:verbal="false"
). The use
of handwritten words to transcribe an email message is classified
as transcription (emma:function="transcription"
emma:verbal="true"
).
emma:verbal = [true|false]
Handwritten words and ink gestures are typically recognized using different kinds of recognition components (handwriting recognizer vs. gesture recognizer) and the verbal annotation will be added by the recognition component which classifies the input. The original input source, a pen in this case, will not be aware of this difference. The input source identifier will tell you that the input was from a pen of some kind but will not tell you if the mode of input was handwriting (show downtown) or gesture (e.g. circling an object or area).
Here is an example of the EMMA annotation for a pen input where the user's ink is recognized as either a word ("Boston") or as an arrow:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="nbest1">
    <emma:interpretation id="interp1" emma:confidence="0.6"
        emma:medium="tactile" emma:mode="ink"
        emma:device-type="pen"
        emma:function="dialog" emma:verbal="true">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation id="interp2" emma:confidence="0.4"
        emma:medium="tactile" emma:mode="ink"
        emma:device-type="pen"
        emma:function="dialog" emma:verbal="false">
      <direction>45</direction>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Here is an example of the EMMA annotation for a spoken command which is recognized as either "Boston" or "Austin":
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of>
    <emma:interpretation id="interp1" emma:confidence="0.6"
        emma:medium="acoustic" emma:mode="voice"
        emma:device-type="microphone"
        emma:function="dialog" emma:verbal="true">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation id="interp2" emma:confidence="0.4"
        emma:medium="acoustic" emma:mode="voice"
        emma:device-type="microphone"
        emma:function="dialog" emma:verbal="true">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
The following table shows the relationship between the medium, mode, and function properties and serves as an aid for classifying inputs. For the dialog function it also shows some examples of the classification of inputs as verbal vs. non-verbal.
Medium | Device-type | Mode | Function: recording | Function: dialog | Function: transcription | Function: verification
---|---|---|---|---|---|---
acoustic | microphone | voice | audiofile (e.g. voicemail) | spoken command / query / response (verbal = true) | dictation | speaker recognition |
singing a note (verbal = false) | ||||||
tactile | keypad | dtmf | audiofile / character stream | typed command / query / response (verbal = true) | text entry (T9-tegic, word completion, or word grammar) | password / pin entry |
command key "Press 9 for sales" (verbal = false) | ||||||
keyboard | dtmf | character / key-code stream | typed command / query / response (verbal = true) | typing | password / pin entry | |
command key "Press S for sales" (verbal = false) | ||||||
pen | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | signature, handwriting recognition | |
gesture (e.g. circling building) (verbal = false) | ||||||
gui | N/A | tapping on named button (verbal = true) | soft keyboard | password / pin entry | ||
drag and drop, tapping on map (verbal = false) | ||||||
touchscreen | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | signature, handwriting recognition | |
gesture (e.g. circling building) (verbal = false) | ||||||
gui | N/A | tapping on named button (verbal = true) | soft keyboard | password / pin entry | ||
drag and drop, tapping on map (verbal = false) | ||||||
mouse | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | N/A | |
gesture (e.g. circling building) (verbal = false) | ||||||
gui | N/A | clicking named button (verbal = true) | soft keyboard | password / pin entry | ||
drag and drop, clicking on map (verbal = false) | ||||||
joystick | ink | trace,sketch | gesture (e.g. circling building) (verbal = false) | N/A | N/A | |
gui | N/A | pointing, clicking button / menu (verbal = false) | soft keyboard | password / pin entry | ||
visual | scanner | photograph | image | handwritten command / query / response (verbal = true) | optical character recognition, object/scene recognition (markup, e.g. SVG) | N/A |
drawings and images (verbal = false) | ||||||
camera_2d | photograph | image | objects (verbal = false) | visual object/scene recognition | face id, retinal scan | |
camera_2d | video | movie | sign language (verbal = true) | audio/visual recognition | face id, gait id, retinal scan | |
face / hand / arm / body gesture (e.g. pointing, facing) (verbal = false) |
The emma:expressed-through
attribute describes the
modality through which an input is produced, usually by a human
being. This differs from the specific mode of communication
(emma:mode
) and the broader channel or medium
(emma:medium
). For example in the case where a user
provides ink input on a touchscreen using their hands the input
would be classified as emma:medium="tactile"
,
emma:mode="ink"
, and
emma:expressed-through="hands"
. The
emma:expressed-through
attribute is not specific about
the sensors used for observing the modality. These can be specified
using emma:medium
and emma:mode
attributes.
This mechanism allows for more fine-grained annotation of the
specific body part that is analyzed in the assignment of an EMMA
result. For example, in an emotion recognition task using computer
vision techniques on video camera input,
emma:medium="visual"
and
emma:mode="video"
. If the face is being analyzed to
determine the result then
emma:expressed-through="face"
while if the body motion
is being analyzed then
emma:expressed-through="locomotion"
.
The list of values provided covers a broad range of modalities through which inputs may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.
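The emotion-recognition case described above can be sketched as follows (the application payload and confidence value are invented for this illustration):

```xml
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- video camera input analyzed with computer vision; the result
       is derived from analysis of the user's face -->
  <emma:interpretation id="int1"
      emma:medium="visual"
      emma:mode="video"
      emma:expressed-through="face"
      emma:confidence="0.8">
    <emotion>joy</emotion>
  </emma:interpretation>
</emma:emma>
```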
emma:hook attribute

Annotation | emma:hook |
---|---|
Definition | An attribute of type xsd:string constrained to
values in the open set {voice , dtmf ,
ink , gui , keys ,
video , photograph , ...} or the wildcard
any |
Applies to | Application instance data |
The attribute emma:hook
MAY be used to mark the
elements in the application semantics within an
emma:interpretation
which are expected to be
integrated with content from input in another mode to yield a
complete interpretation. The emma:mode
to be
integrated at that point in the application semantics is indicated
as the value of the emma:hook
attribute. The possible
values of emma:hook
are the list of input modes that
can be values of emma:mode
(see Section
4.2.11). In addition to these, the value of
emma:hook
can also be the wildcard any
indicating that the other content can come from any source. The
annotation emma:hook
differs in semantics from
emma:mode
as follows. Annotating an element in the
application semantics with emma:mode="ink"
indicates
that that part of the semantics came from the ink
mode. Annotating an element in the application semantics with
emma:hook="ink"
indicates that part of the semantics
needs to be integrated with content from the ink
mode.
To illustrate the use of emma:hook
consider an
example composite input in which the user says "zoom in here" in
the speech input mode while drawing an area on a graphical display
in the ink input mode. The fact that the
location
element needs to come from the
ink
mode is indicated by annotating this application
namespace element using emma:hook
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:medium="acoustic" emma:mode="voice">
<command>
<action>zoom</action>
<location emma:hook="ink">
<type>area</type>
</location>
</command>
</emma:interpretation>
</emma:emma>
For more detailed explanation of this example see Appendix C.
emma:cost attribute

Annotation | emma:cost |
---|---|
Definition | An attribute of type xsd:decimal in range 0.0 to
10000000, indicating the processor's cost or weight associated with
an input or part of an input. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , emma:node , and application
instance data. |
The cost annotation in EMMA indicates the weight or cost
associated with a user's input or part of their input. The most
common use of emma:cost
is for representing the costs
encoded on a lattice output from speech recognition or other
recognition or understanding processes. emma:cost
MAY
also be used to indicate the total cost associated with particular
recognition results or semantic interpretations.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="meaning1" emma:cost="1600">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="meaning2" emma:cost="400">
<location>Austin</location>
</emma:interpretation>
</emma:one-of>
</emma:emma>
emma:endpoint-role, emma:endpoint-address,
emma:port-type, emma:port-num,
emma:message-id, emma:service-name,
emma:endpoint-pair-ref,
emma:endpoint-info-ref attributes

Annotation | emma:endpoint-role |
---|---|
Definition | An attribute of type xsd:string constrained to
values in the set {source , sink ,
reply-to , router }. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-address |
Definition | An attribute of type xsd:anyURI that uniquely
specifies the network address of the
emma:endpoint . |
Applies to | emma:endpoint |
Annotation | emma:port-type |
Definition | An attribute of type xsd:QName that specifies the
type of the port. |
Applies to | emma:endpoint |
Annotation | emma:port-num |
Definition | An attribute of type xsd:nonNegativeInteger that
specifies the port number. |
Applies to | emma:endpoint |
Annotation | emma:message-id |
Definition | An attribute of type xsd:anyURI that specifies the
message ID associated with the data. |
Applies to | emma:endpoint |
Annotation | emma:service-name |
Definition | An attribute of type xsd:string that specifies the
name of the service. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-pair-ref |
Definition | An attribute of type xsd:anyURI that specifies the
pairing between sink and source endpoints. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-info-ref |
Definition | An attribute of type xsd:IDREF referring to the
id attribute of an emma:endpoint-info
element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data. |
The emma:endpoint-role
attribute specifies the role
that the particular emma:endpoint
performs in
multimodal interaction. The role value sink
indicates
that the particular endpoint is the receiver of the input data. The
role value source
indicates that the particular
endpoint is the sender of the input data. The role value
reply-to
indicates that the particular
emma:endpoint
is the intended endpoint for the reply.
The same emma:endpoint-address
MAY appear in multiple
emma:endpoint
elements, provided that the same
endpoint address is used to serve multiple roles, e.g. sink,
source, reply-to, router, etc., or is associated with multiple
interpretations.
The emma:endpoint-address
specifies the network
address of the emma:endpoint
, and
emma:port-type
specifies the port type of the
emma:endpoint
. The emma:port-num
annotates the port number of the endpoint (e.g. the typical port
number for an http endpoint is 80). The
emma:message-id
annotates the message ID information
associated with the annotated input. This meta information is used
to establish and maintain the communication context for both
inbound processing and outbound operation. The service
specification of the emma:endpoint
is annotated by
emma:service-name
which contains the definition of the
service that the emma:endpoint
performs. The matching
of the sink
endpoint and its pairing
source
endpoint is annotated by the
emma:endpoint-pair-ref
attribute. One sink endpoint
MAY link to multiple source endpoints through
emma:endpoint-pair-ref
. Further binding of the
emma:endpoint
is possible by using the annotation of
emma:group
(see Section
3.3.2).
The emma:endpoint-info-ref
attribute associates the
EMMA result in the container element with an
emma:endpoint-info
element.
The following example illustrates the use of these attributes in multimodal interactions where multiple modalities are used.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/emma/port">
  <emma:endpoint-info id="audio-channel-1">
    <emma:endpoint id="endpoint1"
        emma:endpoint-role="sink"
        emma:endpoint-address="135.61.71.103"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint2"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
    <emma:endpoint id="endpoint2"
        emma:endpoint-role="source"
        emma:endpoint-address="136.62.72.104"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint1"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
  </emma:endpoint-info>
  <emma:endpoint-info id="ink-channel-1">
    <emma:endpoint id="endpoint3"
        emma:endpoint-role="sink"
        emma:endpoint-address="http://emma.example/sink"
        emma:endpoint-pair-ref="endpoint4"
        emma:port-num="80"
        emma:port-type="http"
        emma:message-id="uuid:2e5678"
        emma:service-name="travel"
        emma:mode="ink"/>
    <emma:endpoint id="endpoint4"
        emma:endpoint-role="source"
        emma:endpoint-address="http://emma.example/source"
        emma:endpoint-pair-ref="endpoint3"
        emma:port-num="80"
        emma:port-type="http"
        emma:message-id="uuid:2e5678"
        emma:service-name="travel"
        emma:mode="ink"/>
  </emma:endpoint-info>
  <emma:group>
    <emma:interpretation id="int1"
        emma:start="1087995961542"
        emma:end="1087995963542"
        emma:endpoint-info-ref="audio-channel-1"
        emma:medium="acoustic" emma:mode="voice">
      <destination>Chicago</destination>
    </emma:interpretation>
    <emma:interpretation id="int2"
        emma:start="1087995961542"
        emma:end="1087995963542"
        emma:endpoint-info-ref="ink-channel-1"
        emma:medium="tactile" emma:mode="ink">
      <location>
        <type>area</type>
        <points>34.13 -37.12 42.13 -37.12 ... </points>
      </location>
    </emma:interpretation>
  </emma:group>
</emma:emma>
emma:grammar element: emma:grammar-ref attribute

Annotation | emma:grammar-ref |
---|---|
Definition | An attribute of type xsd:IDREF referring to the
id attribute of an emma:grammar
element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
emma:active . |
The emma:grammar-ref
attribute associates the EMMA
result in the container element with an emma:grammar
element. The emma:grammar-ref
attribute is also
used on emma:active
elements within
emma:grammar-active
in order to indicate which
grammars are active during the processing of an input (Section 4.1.4).
The following example shows the use of
emma:grammar-ref
on the container element
emma:interpretation
and on the
emma:active
element:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:grammar-active>
      <emma:active emma:grammar-ref="gram1"/>
      <emma:active emma:grammar-ref="gram2"/>
    </emma:grammar-active>
    <emma:interpretation id="int1" emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation id="int3" emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
emma:model element: emma:model-ref attribute

Annotation | emma:model-ref |
---|---|
Definition | An attribute of type xsd:IDREF referring to the
id attribute of an emma:model
element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data. |
The emma:model-ref
annotation associates the EMMA
result in the container element with an emma:model
element.
Example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:model id="model1" ref="someURI"/>
<emma:model id="model2" ref="anotherURI"/>
<emma:one-of id="r1"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:model-ref="model1">
<origin>Boston</origin>
</emma:interpretation>
<emma:interpretation id="int2" emma:model-ref="model1">
<origin>Austin</origin>
</emma:interpretation>
<emma:interpretation id="int3" emma:model-ref="model2">
<command>help</command>
</emma:interpretation>
</emma:one-of>
</emma:emma>
emma:dialog-turn attribute

Annotation | emma:dialog-turn |
---|---|
Definition | An attribute of type xsd:string referring to the
dialog turn associated with a given container element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , and emma:sequence . |
The emma:dialog-turn
annotation associates the EMMA
result in the container element with a dialog turn. The syntax and
semantics of dialog turns is left open to suit the needs of
individual applications. For example, some applications might use
an integer value, where successive turns are represented by
successive integers. Other applications might combine a name of a
dialog participant with an integer value representing the turn
number for that participant. Ordering semantics for comparison of
emma:dialog-turn
is deliberately unspecified and left
for applications to define.
Example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:dialog-turn="u8"
emma:medium="acoustic" emma:mode="voice">
<quantity>3</quantity>
</emma:interpretation>
</emma:emma>
emma:result-format attribute

Annotation | emma:result-format |
---|---|
Definition | An attribute of type xsd:string containing a MIME
type which indicates the representation used in the application
semantics that appears within the contained
emma:interpretation . |
Applies to | emma:interpretation , emma:literal ,
emma:group , emma:one-of , and
emma:sequence . |
Typically, the application semantics contained within EMMA is in
XML format, as can be seen in examples throughout the
specification. The application semantics can also be a simple
string, contained within emma:literal
. EMMA also
accommodates other semantic representation formats such as JSON
(JavaScript Object Notation [JSON]) using CDATA within
emma:literal. The function of the
emma:result-format
attribute is to make explicit the
specific format of the semantic representation. The value is a MIME
type. The value to generally be used for XML semantic
representations is text/xml
. If
emma:result-format
is not specified, the assumed
default is text/xml
. If a more specific XML MIME type
is being used then this should be indicated explicitly in
emma:result-format
, e.g. for RDF the
emma:result-format
would be
application/rdf+xml
. In the following example, the
application semantic representation is JSON and the MIME type
application/json
appears in
emma:result-format
indicating to an EMMA processor
what to expect within the contained emma:literal
.
<emma:emma
version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:confidence="0.75"
emma:medium="acoustic"
emma:mode="voice"
emma:verbal="true"
emma:function="dialog"
emma:result-format="application/json">
<emma:literal>
<![CDATA[
{
drink: {
liquid:"coke",
drinksize:"medium"},
pizza: {
number: "3",
pizzasize: "large",
topping: [ "pepperoni", "mushrooms" ]
}
}
]]>
</emma:literal>
</emma:interpretation>
</emma:emma>
Note that while many of the examples of semantic representation
in the specification are simple lists of attributes and values,
EMMA interpretations can contain arbitrarily complex semantic
representations. XML representation can be used for the payload, so
representations can be nested, have attributes, and ID references
can be used to capture aspects of the interpretation such as
variable binding or co-reference. Also using
emma:result-format
and emma:literal
as
above, other kinds of logical representations and notations, not
necessarily XML, can also be carried as EMMA payloads.
emma:info element: emma:info-ref attribute

Annotation | emma:info-ref |
---|---|
Definition | An attribute of type xsd:IDREF referring to
the id attribute of an emma:info
element. |
Applies to | emma:interpretation ,
emma:group , emma:one-of ,
emma:sequence , and application instance
data. |
The emma:info-ref
annotation associates the EMMA
result in the container element with a particular
emma:info
element. This allows a single
emma:info
block of application and vendor specific
annotations to apply to multiple different members of an
emma:one-of
or emma:group
or
emma:sequence
. Alternatively, emma:info
could appear separately as a child of each
emma:interpretation
. The benefit of using
emma:info-ref
is it avoids the need to repeat the same
block of emma:info
for multiple different
interpretations.
Example:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:info id="info1">
<customer_type>residential</customer_type>
<service_name>acme_travel_service</service_name>
</emma:info>
<emma:info id="info2">
<customer_type>residential</customer_type>
<service_name>acme_pizza_service</service_name>
</emma:info>
<emma:one-of id="r1" emma:start="1087995961542"
emma:end="1087995963542"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:confidence="0.75"
emma:tokens="flights from boston to denver tomorrow"
emma:info-ref="info1">
<origin>Boston</origin>
<destination>Denver</destination>
</emma:interpretation>
<emma:interpretation id="int2" emma:confidence="0.68"
emma:tokens="pizza with pepperoni and onions"
emma:info-ref="info2">
<order>pizza</order>
<topping>pepperoni</topping>
<topping>onion</topping>
</emma:interpretation>
<emma:interpretation id="int3" emma:confidence="0.38"
emma:tokens="pizza with peppers and cheese"
emma:info-ref="info2">
<order>pizza</order>
<topping>pepperoni</topping>
<topping>cheese</topping>
</emma:interpretation>
</emma:one-of>
</emma:emma>
emma:process-model element: emma:process-model-ref attribute

Annotation | emma:process-model-ref |
---|---|
Definition | An attribute of type xsd:IDREF referring to the
id attribute of an emma:process-model
element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence , and
application instance data. |
The emma:process-model-ref
annotation associates
the EMMA result in the container element with an
emma:process-model
element. In the following example
the specific models used to produce two different object recognition
results based on an image as input are indicated on the
interpretations using emma:process-model-ref
which
references an emma:process-model
element under
emma:emma
whose ref
attribute contains
a URI identifying the particular model used.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:process-model id="pm1"
type="neural_network"
ref="http://example.com/vision/vehicle"/>
<emma:process-model id="pm2"
type="neural_network"
ref="http://example.com/vision/people"/>
<emma:one-of id="r1"
emma:start="1087995961542"
emma:end="1087995961542"
emma:medium="visual"
emma:mode="image"
emma:process="http://example.com/mycompvision1.xml">
<emma:interpretation id="int1"
emma:confidence="0.9"
emma:process-model-ref="pm1">
<object>aircraft</object>
</emma:interpretation>
<emma:interpretation id="int2"
emma:confidence="0.1"
emma:process-model-ref="pm2">
<object>person</object>
</emma:interpretation>
</emma:one-of>
</emma:emma>
emma:parameters element: emma:parameter-ref attribute

Annotation | emma:parameter-ref |
---|---|
Definition | An attribute of type xsd:IDREF referring to the
id attribute of an emma:parameters
element. |
Applies to | emma:interpretation , emma:group ,
emma:one-of , and emma:sequence . |
The emma:parameter-ref
annotation associates the
EMMA result(s) in the container element it appears on with an
emma:parameters
element that specifies a series of
parameters used to configure the processor that produced those
result(s). This allows a set of parameters to be specified once in
an EMMA document and referred to by multiple different
interpretations. Different configurations of parameters can be
associated with different interpretations. In the example below,
there are two emma:parameters
elements and in the
N-best list of alternative interpretations within
emma:one-of
each emma:interpretation
references the relevant set of parameters using
emma:parameter-ref
.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
  <emma:parameters id="parameters1" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".5"/>
    <emma:parameter name="sensitivity" value=".6"/>
  </emma:parameters>
  <emma:parameters id="parameters2" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".7"/>
    <emma:parameter name="sensitivity" value=".3"/>
  </emma:parameters>
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      emma:process="http://example.com/myasr1.xml">
    <emma:interpretation id="int1" emma:parameter-ref="parameters1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:parameter-ref="parameters2">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
emma:annotated-tokens
attributeAnnotation | emma:annotated-tokens |
---|---|
Definition | An attribute of type xsd:string holding the
reference sequence of tokens determined by a human annotator |
Applies to | emma:interpretation , emma:group ,
emma:one-of , emma:sequence ,
emma:arc , and application instance data. |
The emma:annotated-tokens
attribute holds a list of
input tokens. In the following description, the term tokens
is used in the computational and syntactic sense of units of
input, and not in the sense of XML tokens. The value
held in emma:annotated-tokens
is the list of the
tokens of input as determined by a human annotator. For example, in
case of speech recognition this will contain the reference string.
The emma:annotated-tokens
annotation MAY be applied
not just to the lexical words and phrases of language but to any
level of input processing. Other examples of tokenization include
phonemes, ink strokes, gestures and any other discrete units of
input at any level.
In the following example, a speech recognizer has processed an
audio input signal and the hypothesized string is "from cambridge
to london tomorrow" contained in emma:tokens
. A human
labeller has listened to the audio and added the reference string
"from canterbury to london today" in the
emma:annotated-tokens
attribute.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:signal="http://example.com/audio/input678.amr"
emma:process="http://example.com/asr/params.xml"
emma:tokens="from cambridge to london tomorrow"
emma:annotated-tokens="from canterbury to london today">
<origin>Cambridge</origin>
<destination>London</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:emma>
In order to provide metadata on the annotation such as the name
of the annotator or time of annotation, the more powerful emma:annotation
element mechanism should be used. This also allows for structured
annotations such as labelling of a semantic interpretation in
XML.
emma:partial-content
Annotation | emma:partial-content |
---|---|
Definition | An attribute of type xsd:boolean indicating whether the content of an element is partial and the full element can be retrieved by dereferencing the URI indicated in the ref attribute on the same element |
Applies to | emma:one-of, emma:group, emma:sequence, emma:lattice, emma:info, emma:annotation, emma:parameters and application instance data |
The emma:partial-content
attribute is
required on the element it applies to when the content contained
within the element is a subset of the content contained within the
element referred to through the ref
attribute on the
same element. If the local element is empty but a full document
can be retrieved from the server, then emma:partial-content
must be true. If the element is empty and the element on the
server is also empty, then emma:partial-content must be
false. If emma:partial-content is not specified, its default
value is false.
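As a non-normative illustration, the rules above can be summarized in a small helper. The function name and the list-of-child-names representation are assumptions of this sketch, not part of the specification:

```python
# Illustrative only: decide the emma:partial-content value for an
# element whose full content lives at the URI in its ref attribute.

def partial_content_value(local_children, remote_children):
    """Return "true" or "false" per the emma:partial-content rules."""
    if not local_children:
        # Empty local element: "true" if the server-side element has
        # content, "false" if it is empty as well.
        return "true" if remote_children else "false"
    # Otherwise "true" when the local content is a proper subset of
    # the content retrievable through ref.
    return "true" if set(local_children) < set(remote_children) else "false"
```
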
The emma:derived-from
element (Section
4.1.2) can be used to capture both sequential and composite
derivations. This section concerns the scope of EMMA annotations
across sequential derivations of user input connected
using the emma:derived-from
element (Section
4.1.2). Sequential derivations involve processing steps that do
not involve multimodal integration, such as applying natural
language understanding and then reference resolution to a speech
transcription. EMMA derivations describe only single turns of user
input and are not intended to describe a sequence of dialog
turns.
For example, an EMMA document could contain
emma:interpretation
elements for the transcription,
interpretation, and reference resolution of a speech input,
utilizing the id
values: raw
,
better
, and best
respectively:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:process="http://example.com/myasr1.xml"
emma:medium="acoustic" emma:mode="voice">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mynlu1.xml">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/myrefresolution1.xml">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
Each member of the derivation chain is linked to the previous
one by an emma:derived-from
element (Section
4.1.2), which has an attribute resource
that
provides a pointer to the emma:interpretation
from
which it is derived. The emma:process
annotation (Section
4.2.2) provides a pointer to the process used for each stage of
the derivation.
The following EMMA example represents the same derivation as above but with a more fully specified set of annotations:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.8"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml"
      emma:source="http://example.com/microphone/NC-61"
      emma:signal="http://example.com/signals/sg23.wav"
      emma:confidence="0.8"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:tokens="from boston to denver tomorrow"
      emma:lang="en-US">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>
EMMA annotations on earlier stages of the derivation often
remain accurate at later stages of the derivation. Although this
can be captured in EMMA by repeating the annotations on each
emma:interpretation
within the derivation, as in the
example above, there are two disadvantages of this approach to
annotation. First, the repetition of annotations makes the
resulting EMMA documents significantly more verbose. Second, EMMA
processors used for intermediate tasks such as natural language
understanding and reference resolution will need to read in all of
the annotations and write them all out again.
EMMA overcomes these problems by assuming that annotations on
earlier stages of a derivation automatically apply to later stages
of the derivation unless a new value is specified. Later stages of
the derivation essentially inherit annotations from earlier stages
in the derivation. For example, if there was an
emma:source
annotation on the transcription
(raw
) it would also apply to the later stages of the
derivation such as the result of natural language understanding
(better
) or reference resolution
(best
).
Because of the assumption in EMMA that annotations have scope over later stages of a sequential derivation, the example EMMA document above can be equivalently represented as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml"
        emma:confidence="0.8">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>
The fully specified derivation illustrated above is equivalent to the reduced form derivation following it where only annotations with new values are specified at each stage. These two EMMA documents MUST yield the same result when processed by an EMMA processor.
The emma:confidence
annotation is respecified on
the better
interpretation. This indicates the
confidence score for natural language understanding, whereas
emma:confidence
on the raw
interpretation
indicates the speech recognition confidence score.
In order to determine the full set of annotations that apply to
an emma:interpretation
element, an EMMA processor or
script needs to access the annotations directly on that element and,
for any that are not specified, follow the reference in the
resource
attribute of the
emma:derived-from
element to add in annotations from
earlier stages of the derivation.
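A minimal, non-normative sketch of this resolution walk, using Python's standard ElementTree; the function name and the dictionary representation of the result are illustrative assumptions:

```python
# Non-normative sketch: resolve the effective emma:* annotations for an
# interpretation by walking emma:derived-from links back through the
# derivation chain. Later stages take precedence over earlier ones.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

def effective_annotations(doc, interp_id):
    """Map local annotation names to values, honouring inheritance."""
    by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}
    resolved = {}
    current = by_id[interp_id]
    while current is not None:
        for name, value in current.attrib.items():
            # Keep only emma-namespaced attributes not already set by
            # a later stage of the derivation.
            if name.startswith("{%s}" % EMMA_NS) and name not in resolved:
                resolved[name] = value
        link = current.find("{%s}derived-from" % EMMA_NS)
        current = by_id.get(link.get("resource").lstrip("#")) if link is not None else None
    return {name.split("}")[1]: value for name, value in resolved.items()}

# Cut-down version of the raw/best derivation above.
doc = ET.fromstring(
    '<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma">'
    '<emma:derivation>'
    '<emma:interpretation id="raw" emma:mode="voice" emma:confidence="0.6"/>'
    '</emma:derivation>'
    '<emma:interpretation id="best" emma:confidence="0.8">'
    '<emma:derived-from resource="#raw"/>'
    '</emma:interpretation>'
    '</emma:emma>')
annotations = effective_annotations(doc, "best")
```

Here the respecified emma:confidence of the best stage wins, while emma:mode is inherited from the raw stage.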
The EMMA annotations break down into three groups with respect to their scope in sequential derivations. One group of annotations always holds true for all members of a sequential derivation. A second group is always respecified on each stage of the derivation. A third group may or may not be respecified.
Classification | Annotation |
---|---|
Applies to whole derivation | emma:signal, emma:signal-size, emma:dialog-turn, emma:source, emma:medium, emma:mode, emma:function, emma:verbal, emma:lang, emma:tokens, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration |
Specified at each stage of derivation | emma:derived-from, emma:process |
May be respecified | emma:confidence, emma:cost, emma:grammar-ref, emma:model-ref, emma:no-input, emma:uninterpreted |
One potential problem with this annotation scoping mechanism is
that earlier annotations could be lost if earlier stages of a
derivation were dropped in order to reduce message size. This
problem can be overcome by considering annotation scope at the
point where earlier derivation stages are discarded and populating
the final interpretation in the derivation with all of the
annotations which it could inherit. For example, if the
raw
and better
stages were dropped the
resulting EMMA document would be:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="best"
      emma:start="1087995961542"
      emma:end="1087995963542"
      emma:process="http://example.com/myrefresolution1.xml"
      emma:source="http://example.com/microphone/NC-61"
      emma:signal="http://example.com/signals/sg23.wav"
      emma:confidence="0.8"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:tokens="from boston to denver tomorrow"
      emma:lang="en-US">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>
Annotations on an emma:one-of
element are assumed
to apply to all of the container elements within the
emma:one-of
.
If an emma:one-of
appears within another
emma:one-of
then annotations on the parent
emma:one-of
are assumed to apply to the children of
the child emma:one-of
.
Annotations on emma:group
or
emma:sequence
do not apply to their child
elements.
The contents of this section are normative.
A document is a Conforming EMMA Document if it meets both the following conditions:
The EMMA specification and these conformance criteria provide no designated size limits on any aspect of EMMA documents. There are no maximum values on the number of elements, the amount of character data, or the number of characters in attribute values.
Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications.
The EMMA namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation [XMLNS]. Future work by W3C is expected to address ways to specify conformance for documents involving multiple namespaces.
An EMMA processor is a program that can process and/or generate Conforming EMMA documents.
In a Conforming EMMA Processor, the XML parser MUST be able to parse and process all XML constructs defined by XML 1.1 [XML] and Namespaces in XML [XMLNS]. It is not required that a Conforming EMMA Processor uses a validating XML parser.
A Conforming EMMA Processor MUST correctly understand and apply the semantics of each markup element or attribute as described by this document.
There is, however, no conformance requirement with respect to performance characteristics of the EMMA Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of output produced by the processor. No statement is made regarding the size of input that an EMMA Processor is required to support.
This section is Normative.
This section defines the formal syntax for EMMA documents in terms of a normative XML Schema.
The schema provided here is for the EMMA 1.0 Recommendation. No schema yet exists for the EMMA 1.1 Working Draft, as it is a work in progress.
There are both an XML Schema and RELAX NG Schema for the EMMA markup. The latest version of the XML Schema for EMMA is available at http://www.w3.org/TR/emma/emma.xsd and the RELAX NG Schema can be found at http://www.w3.org/TR/emma/emma.rng.
For stability it is RECOMMENDED that you use the dated URIs available at http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd and http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng.
This section is Normative.
The media type associated with the EMMA: Extensible MultiModal Annotation markup language specification is "application/emma+xml" and the filename suffix is ".emma" as defined in Appendix B.1 of the EMMA: Extensible Multimodal Annotation specification.
emma:hook and SRGS

This section is Informative.
One of the most powerful aspects of multimodal interfaces is their ability to provide support for user inputs which are distributed over the available input modes. These composite inputs are contributions made by the user within a single turn which have component parts in different modes. For example, the user might say "zoom in here" in the speech mode while drawing an area on a graphical display in the ink mode. One of the central motivating factors for this kind of input is that different kinds of communicative content are best suited to different input modes. In the example of a user drawing an area on a map and saying "zoom in here", the zoom command is easiest to provide in speech but the spatial information, the specific area, is easier to provide in ink.
Enabling composite multimodality is critical in ensuring that multimodal systems support more natural and effective interaction for users. In order to support composite inputs, a multimodal architecture must provide some kind of multimodal integration mechanism. In the W3C Multimodal Interaction Framework [MMI Framework], multimodal integration can be handled by an integration component which follows the application of speech understanding and other kinds of interpretation procedures for individual modes.
Given the broad range of different techniques being employed for multimodal integration and the extent to which this is an ongoing research problem, standardization of the specific method or algorithm used for multimodal integration is not appropriate at this time. In order to facilitate the development and interoperation of different multimodal integration mechanisms, EMMA provides markup enabling application-independent specification of elements in the application markup where content from another mode needs to be integrated. These representation 'hooks' can then be used by different kinds of multimodal integration components and algorithms to drive the process of multimodal integration. In the processing of a composite multimodal input, the result of applying a mode-specific interpretation component to each of the individual modes will be EMMA markup describing the possible interpretation of that input.
One way to build an EMMA representation of a spoken input such
as "zoom in here" is to use grammar rules in the W3C Speech
Recognition Grammar Specification [SRGS]
with Semantic Interpretation [SISR]
tags to build the application semantics, including the
emma:hook
attribute. In this approach, ECMAScript [ECMAScript]
is used to build up an object representing the
semantics; the resulting ECMAScript object is then translated to
XML.
For our example case of "zoom in here", the following SRGS rule could be used. The Semantic Interpretation for Speech Recognition specification [SISR] provides a reserved property _nsprefix for indicating the namespace to be used with an attribute.
<rule id="zoom">
  zoom in here
  <tag>
    $.command = new Object();
    $.command.action = "zoom";
    $.command.location = new Object();
    $.command.location._attributes = new Object();
    $.command.location._attributes.hook = new Object();
    $.command.location._attributes.hook._nsprefix = "emma";
    $.command.location._attributes.hook._value = "ink";
    $.command.location.type = "area";
  </tag>
</rule>
Application of this rule will result in the following ECMAScript object being built.
command: {
  action: "zoom"
  location: {
    _attributes: {
      hook: {
        _nsprefix: "emma"
        _value: "ink"
      }
    }
    type: "area"
  }
}
SI processing in an XML environment would generate the following document:
<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>
This XML fragment might then appear within an EMMA document as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="voice1"
      emma:medium="acoustic"
      emma:mode="voice">
    <command>
      <action>zoom</action>
      <location emma:hook="ink">
        <type>area</type>
      </location>
    </command>
  </emma:interpretation>
</emma:emma>
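The SISR-object-to-XML translation shown above can be sketched, non-normatively, as follows. A nested dict stands in for the ECMAScript object, using the _attributes/_nsprefix/_value conventions from the example; the function name is an assumption of this sketch:

```python
# Non-normative sketch of translating the SISR-built ECMAScript object
# to XML, using a nested dict in place of the ECMAScript object.

def to_xml(name, value):
    """Serialize a nested SISR-style dict to an XML string."""
    if not isinstance(value, dict):
        return "<%s>%s</%s>" % (name, value, name)
    attrs = ""
    for attr_name, attr in value.get("_attributes", {}).items():
        # _nsprefix gives the namespace prefix, _value the attribute value.
        prefix = attr.get("_nsprefix")
        qname = "%s:%s" % (prefix, attr_name) if prefix else attr_name
        attrs += ' %s="%s"' % (qname, attr["_value"])
    children = "".join(
        to_xml(child, v) for child, v in value.items() if child != "_attributes")
    return "<%s%s>%s</%s>" % (name, attrs, children, name)

# The object built by the "zoom" rule above.
command = {
    "action": "zoom",
    "location": {
        "_attributes": {"hook": {"_nsprefix": "emma", "_value": "ink"}},
        "type": "area",
    },
}
fragment = to_xml("command", command)
```

Applied to the object built by the "zoom" rule, this yields the <command> fragment shown earlier, with emma:hook="ink" on the <location> element.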
The emma:hook
annotation indicates that this speech
input needs to be combined with ink input such as the
following:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="pen1"
      emma:medium="tactile"
      emma:mode="ink">
    <location>
      <type>area</type>
      <points>42.1345 -37.128 42.1346 -37.120 ... </points>
    </location>
  </emma:interpretation>
</emma:emma>
This representation could be generated by a pen modality component performing gesture recognition and interpretation. The input to the component would be an Ink Markup Language specification [INKML] of the ink trace and the output would be the EMMA document above.
The combination will result in the following EMMA document for the combined speech and pen multimodal input.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic tactile"
      emma:mode="voice ink"
      emma:process="http://example.com/myintegrator.xml">
    <emma:derived-from resource="http://example.com/voice1.emma/#voice1" composite="true"/>
    <emma:derived-from resource="http://example.com/pen1.emma/#pen1" composite="true"/>
    <command>
      <action>zoom</action>
      <location>
        <type>area</type>
        <points>42.1345 -37.128 42.1346 -37.120 ... </points>
      </location>
    </command>
  </emma:interpretation>
</emma:emma>
There are two components to the process of integrating these two
pieces of semantic markup. The first is to ensure that the two are
compatible; that is, that no semantic constraints are violated. The
second is to fuse the content from the two sources. In our example,
the <type>area</type>
element is intended
to indicate that this speech command requires integration with an
area gesture rather than, for example, a line gesture, which would
have the subelement <type>line</type>
.
This constraint needs to be enforced by whatever mechanism is
responsible for multimodal integration.
Many different techniques could be used for achieving this
integration of the semantic interpretation of the pen input, a
<location>
element, with the corresponding
<location>
element in the speech. The
emma:hook
attribute simply serves to indicate the
existence of this relationship.
One way to achieve both the compatibility checking and fusion of
content from the two modes is to use a well-defined general purpose
matching mechanism such as unification. Graph unification
[Graph unification] is a mathematical operation defined
over directed acyclic graphs which captures both of the components
of integration in a single operation: the applications of the
semantic constraints and the fusing of content. One possible
semantics for the emma:hook
markup indicates that
content from the required mode needs to be unified with that
position in the application semantics. In order to unify, two
elements must not have any conflicting values for subelements or
attributes. This procedure can be defined recursively so that
elements within the subelements must also not clash and so on. The
result of unification is the union of all of the elements and
attributes of the two elements that are being unified.
In addition to the unification operation, in the resulting
emma:interpretation
the emma:hook
attribute needs to be removed and the emma:mode
attribute changed to the list of the modes of the individual
inputs, e.g. "voice ink".
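The recursive unification just described can be sketched, non-normatively, on nested dicts standing in for the XML application semantics (a simplification; real integrators work over the XML or graph structure):

```python
# Non-normative sketch of recursive unification: two structures unify
# when no subelement (at any depth) has conflicting values; the result
# is the union of their content.

def unify(a, b):
    """Return the unification of a and b, or None if they clash."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None  # atomic values must match exactly
    result = dict(a)
    for key, b_value in b.items():
        if key in result:
            merged = unify(result[key], b_value)
            if merged is None:
                return None  # a clash anywhere makes unification fail
            result[key] = merged
        else:
            result[key] = b_value  # union in content unique to b
    return result

# "zoom in here" speech semantics (hook removed) and the area gesture:
speech = {"command": {"action": "zoom", "location": {"type": "area"}}}
ink = {"command": {"location": {"type": "area", "points": "42.1345 -37.128"}}}
combined = unify(speech, ink)
```

A speech constraint of type "area" unifies with an area gesture and picks up its points, while unifying against a "line" gesture fails, which is exactly the compatibility check described above.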
Instead of the unification operation, for a specific application
semantics, integration could be achieved using some other algorithm
or script. The benefit of using the unification semantics for
emma:hook
is that it provides a general purpose
mechanism for checking the compatibility of elements and fusing
them, whatever the specific elements are in the application
specific semantic representation.
The benefit of using the emma:hook
annotation for
authors is that it provides an application independent method for
indicating where integration with content from another mode is
required. If a general purpose integration mechanism is used, such
as the unification approach described above, authors should be able
to use the same integration mechanism for a range of different
applications without having to change the integration rules or
logic. For each application the speech grammar rules [SRGS]
need to assign emma:hook
to the appropriate elements
in the semantic representation of the speech. The general purpose
multimodal integration mechanism will use the
emma:hook
annotations in order to determine where to
add in content from other modes. Another benefit of the
emma:hook
mechanism is that it facilitates
interoperability among different multimodal integration components,
so long as they are all general purpose and utilize
emma:hook
in order to determine where to integrate
content.
The following provides a more detailed example of the use of the
emma:hook
annotation. In this example, spoken input is
combined with two ink gestures. The semantic
representation assigned to the spoken input "send this file to
this" indicates two locations where content is required from ink
input using emma:hook="ink"
:
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="voice2"
emma:medium="acoustic"
emma:mode="voice"
emma:tokens="send this file to this"
emma:start="1087995961500"
emma:end="1087995963542">
<command>
<action>send</action>
<arg1>
<object emma:hook="ink">
<type>file</type>
<number>1</number>
</object>
</arg1>
<arg2>
<object emma:hook="ink">
<number>1</number>
</object>
</arg2>
</command>
</emma:interpretation>
</emma:emma>
The user gesturing on the two locations on the display can be
represented using emma:sequence
:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence id="ink2">
    <emma:interpretation
        emma:start="1087995960500"
        emma:end="1087995960900"
        emma:medium="tactile"
        emma:mode="ink">
      <object>
        <type>file</type>
        <number>1</number>
        <id>test.pdf</id>
      </object>
    </emma:interpretation>
    <emma:interpretation
        emma:start="1087995961000"
        emma:end="1087995961100"
        emma:medium="tactile"
        emma:mode="ink">
      <object>
        <type>printer</type>
        <number>1</number>
        <id>lpt1</id>
      </object>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>
A general purpose unification-based multimodal integration
algorithm could use the emma:hook
annotation as
follows. It identifies the elements marked with
emma:hook
in document order. For each of those in
turn, it attempts to unify the element with the corresponding
element in order in the emma:sequence
. Since none of
the subelements conflict, the unification goes through and as a
result, we have the following EMMA for the composite result:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="multimodal2"
      emma:medium="acoustic tactile"
      emma:mode="voice ink"
      emma:tokens="send this file to this"
      emma:process="http://example.com/myintegration.xml"
      emma:start="1087995960500"
      emma:end="1087995963542">
    <emma:derived-from resource="http://example.com/voice2.emma/#voice2" composite="true"/>
    <emma:derived-from resource="http://example.com/ink2.emma/#ink2" composite="true"/>
    <command>
      <action>send</action>
      <arg1>
        <object>
          <type>file</type>
          <number>1</number>
          <id>test.pdf</id>
        </object>
      </arg1>
      <arg2>
        <object>
          <type>printer</type>
          <number>1</number>
          <id>lpt1</id>
        </object>
      </arg2>
    </command>
  </emma:interpretation>
</emma:emma>
This section is Informative.
The W3C Document Object Model [DOM] defines platform and language neutral interfaces that gives programs and scripts the means to dynamically access and update the content, structure and style of documents. DOM Events define a generic event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event.
This section of the EMMA specification extends the DOM Event interface for use with events that describe interpreted user input in terms of a DOM Node for an EMMA document.
// File: emma.idl
#ifndef _EMMA_IDL_
#define _EMMA_IDL_

#include "dom.idl"
#include "views.idl"
#include "events.idl"

#pragma prefix "dom.w3c.org"

module emma {
  typedef dom::DOMString DOMString;
  typedef dom::Node Node;

  interface EMMAEvent : events::UIEvent {
    readonly attribute dom::Node node;
    void initEMMAEvent(in DOMString typeArg,
                       in boolean canBubbleArg,
                       in boolean cancelableArg,
                       in Node node);
  };
};

#endif // _EMMA_IDL_
This section is Informative.
Since the publication of the EMMA 1.0 Recommendation, the following changes have been made.
- emma:annotation element for specification of human annotations on the input (4.1.9)
- emma:process-model for specifying a non-grammar model used in processing of the input (4.1.7)
- emma:parameters, emma:parameter for specification of a set of parameters used to configure a processor (4.1.8)
- emma:grammar-active, emma:active elements for specifying the specific grammars in a set that were active for a particular interpretation or set of interpretations (4.1.4)
- emma:expressed-through for specification of the modalities used in order to express an input (4.2.11)
- emma:result-format for specification of the specific format type for EMMA semantic payloads (4.2.18)
- emma:info-ref for referencing the emma:info that applies to an interpretation or set of interpretations (4.2.19)
- emma:info elements
- emma:process-model-ref for referencing the emma:process-model that applies to an interpretation or set of interpretations (4.2.20)
- emma:parameter-ref for referencing the emma:parameters that applies to an interpretation or set of interpretations (4.2.21)
- emma:annotated-tokens shorthand method for adding reference transcription without needing full emma:annotation (4.2.22)
- emma:medium and emma:mode (4.2.11)
- emma:one-of
- emma:process can be used as syntax rather than actual reference to process description (4.2.2)
- emma:signal (4.2.6)
- emma:annotation on lattice emma:arc
- emma:annotation and emma:annotated-tokens
- emma:grammar-ref on use of emma:grammar-ref on emma:active to indicate which grammars are active
- emma:process-model and emma:parameters are required to have index scope and cannot have scope over interpretations based on their position in the document
- emma:device-type to 4.2.11 and extended example and added to tables of relevant elements
- ref to several more elements enabling documents to refer to content on the server: emma:info, emma:parameters, emma:one-of, emma:group, emma:sequence, emma:lattice
- src attribute on emma:annotation with ref to keep it consistent with other elements that allow for reference to remote content, and added an example with Emotion ML
- emma:location element enabling annotation of the location of the device capturing the input (4.1.10)
- prev-doc and doc-ref attributes to emma:emma
- emma:partial-content attribute (4.2.23)

This section is Informative.
The editors would like to recognize the contributions of the current and former members of the W3C Multimodal Interaction Group (listed in alphabetical order):