Please refer to the errata for this document, which may include normative corrections.
See also translations.
Copyright © 2014 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the Recommendation of "Emotion Markup Language 1.0". It has been published by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.
Comments are welcome on the Working Group's public mailing list (archive). See W3C mailing list and archive usage guidelines.
This specification has been widely reviewed (see the Candidate Recommendation Disposition of Comments and the Last Call Working Draft Disposition of Comments) and satisfies the Working Group's technical requirements. A list of implementations is included in the EmotionML Implementation Report. The Working Group made several editorial changes to the 16 April 2013 Proposed Recommendation. Changes from the Proposed Recommendation can be found in Appendix C. There are no substantial changes since the Proposed Recommendation.
This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
For more information about the Multimodal Interaction Activity, please see the Multimodal Interaction Activity statement.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly. The examples in the main body are also informative.
This section is informative.
Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.
This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.
As for any standard format, the first and main goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.
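Use cases for EmotionML can be grouped into three broad types: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.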
Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modeling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.
Concrete examples of existing technology that could apply EmotionML include:
The Emotion Incubator Group has listed 39 individual use cases for an EmotionML.
A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to make the same mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made irrespective of the question whether these states are the most appropriate for the intended application. Crucially, the available alternatives that have been developed in the affective science literature are not sufficiently known, resulting in dead-end situations after the initial steps of work. Careful consideration of states to study and of representations for describing them can help avoid such situations.
EmotionML makes scientific concepts of emotions practically applicable. This can help potential users to identify the suitable representations for their respective applications.
Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. More fundamentally, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused on. In short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.
Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, while allowing users to "plug in" vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification to provide a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their custom vocabularies.
An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.
For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.
Terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.
The following sections describe the syntax of the main elements of EmotionML.
<emotionml> element
Annotation | <emotionml> |
---|---|
Definition | The root element of an EmotionML document. |
Children | The element MAY contain one or more <emotion> elements. It MAY contain a single <info> element. It MAY contain one or more <vocabulary> elements. |
Attributes | |
Occurrence | This is the root element -- it cannot occur as a child of any other EmotionML element. |
The root element of a standalone EmotionML document MUST be <emotionml>. It MAY contain a single <info> element, providing document-level metadata.
The <emotionml> element MUST define the EmotionML namespace: 'http://www.w3.org/2009/10/emotionml'.
The <emotionml> element MAY contain arbitrary plain text. See annotation of text as an example.
Standalone EmotionML documents usually serve one or both of the following two purposes:
- <emotion> elements into a single document;
- <emotion> annotations in the same or other documents.
Example:
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"> ... </emotionml>
or
<emo:emotionml version="1.0" xmlns:emo="http://www.w3.org/2009/10/emotionml"> ... </emo:emotionml>
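The two forms above differ only in the prefix bound to the EmotionML namespace. As a non-normative illustration, the following minimal Python sketch shows how a consumer might check the root element regardless of prefix. The function name `is_emotionml_root` and the sample strings are assumptions made for this example; only the namespace URI comes from this specification.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def is_emotionml_root(xml_text: str) -> bool:
    """Return True if the document root is <emotionml> in the EmotionML namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree expands qualified names to "{namespace}local-name",
    # so the prefix chosen by the author does not matter.
    return root.tag == f"{{{EMOTIONML_NS}}}emotionml"

# Hypothetical sample documents for illustration.
default_ns_doc = f'<emotionml version="1.0" xmlns="{EMOTIONML_NS}"/>'
prefixed_doc = f'<emo:emotionml version="1.0" xmlns:emo="{EMOTIONML_NS}"/>'
no_ns_doc = '<emotionml version="1.0"/>'

print(is_emotionml_root(default_ns_doc))
print(is_emotionml_root(prefixed_doc))
print(is_emotionml_root(no_ns_doc))
```

Comparing the expanded name rather than the literal tag text avoids rejecting documents that use a prefix such as `emo:`.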
Note: One of the envisaged uses of EmotionML is to be used in the context of other markup languages. In such cases, there will be no <emotionml> root element, but <emotion> elements will be used directly in other markup -- see Examples of possible use with other markup languages.
<emotion> element
Annotation | <emotion> |
---|---|
Definition | This element represents a single emotion annotation. |
Children | All children are optional. However, the <emotion> element MUST contain at least one <category> or <dimension> or <appraisal> or <action-tendency> element. If present, the following child element can occur only once: If present, the following child elements may occur one or more times: Note, that the There are no constraints on the combinations of children that are allowed. There are no constraints on the order in which children occur. Note that an |
Attributes | |
Occurrence | as a child of <emotionml>, or in any markup using EmotionML. |
The <emotion> element represents an individual emotion annotation. No matter how simple or complex its substructure is, it represents a single statement about the emotional content of some annotated item. Where several statements about the emotion in a certain context are to be made, several <emotion> elements MUST be used. See Examples of emotion annotation for illustrations of this issue.
An <emotion> element MAY have an id attribute, allowing for a unique reference to the individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency> or their children directly. For this reason, these elements do not allow for an id attribute.
The <emotion> element MAY contain arbitrary plain text. See annotation of text as an example.
NOTE: For each type of vocabulary (category, dimension, appraisal and action tendency), only one declared vocabulary of that type can exist within a given document subtree, with local declaration taking precedence over global declaration. For example, if both a global category vocabulary (on <emotionml>) and a local category vocabulary (on <emotion>) are declared, the local one "hides" the global one, so that the local category vocabulary is the only declared category vocabulary within the given <emotion> element.
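The precedence rule in the note above can be sketched as a simple lookup. The following non-normative Python example assumes that vocabularies are declared through the `category-set`, `dimension-set`, `appraisal-set` and `action-tendency-set` attributes, as in the examples of this specification; the function name `declared_vocabulary` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def declared_vocabulary(root, emotion, kind):
    """Return the vocabulary URI in effect for `emotion`, or None if none is declared.

    `kind` is one of "category", "dimension", "appraisal", "action-tendency".
    A declaration on <emotion> hides one on the enclosing <emotionml>.
    """
    attribute = f"{kind}-set"
    local = emotion.get(attribute)
    return local if local is not None else root.get(attribute)

# Hypothetical document: a global "big6" declaration, overridden locally once.
doc = ET.fromstring(f"""
<emotionml version="1.0" xmlns="{EMOTIONML_NS}"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion><category name="anger"/></emotion>
  <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="satisfied"/>
  </emotion>
</emotionml>
""")
first, second = doc.findall(f"{{{EMOTIONML_NS}}}emotion")

print(declared_vocabulary(doc, first, "category"))   # global declaration applies
print(declared_vocabulary(doc, second, "category"))  # local declaration hides it
```

The sketch only resolves which URI applies; checking that a name is actually an item of that vocabulary would additionally require retrieving the vocabulary definition.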
Whereas it is possible to use <emotion> elements in a standalone <emotionml> document, a typical use case is expected to be embedding an <emotion> into some other markup -- see Examples of possible use with other markup languages.
<category> element
Annotation | <category> |
---|---|
Definition | Description of an emotion or a related state using a category. |
Children | <trace>: A <category> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | One or more <category> elements MAY occur as a child of <emotion>. For any given category name in the set, zero or one occurrence is allowed within an <emotion> element, i.e. a category with name "x" MUST NOT appear twice in one <emotion> element. |
<category> describes an emotion or a related state in terms of a category name, given as the value of the name attribute.
If the <category> element is used, a category vocabulary MUST be declared (see <emotion> and <emotionml>), and the category name as given in the name attribute MUST be an item in the declared category vocabulary.
Different declared category vocabularies can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.
The intensity of an emotion category MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. A <category> MUST NOT contain both a value attribute and a <trace> element.
Examples:
In the following example, the emotion category "satisfied" is being annotated; it must be contained in the definition of the emotion category vocabulary located at http://www.w3.org/TR/emotion-voc/xml#everyday-categories, which is one of the category vocabularies provided in [Vocabularies for EmotionML].
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="satisfied"/> </emotion>
The following is an annotation of an interpersonal stance "distant" which must be defined in the custom category set at the URI given in the category-set attribute:
<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml#voc"> <category name="distant"/> </emotion>
In the following example, an emotion is described by several categories, each being present with different values of intensity. The category set used is the "big six" set described in [Vocabularies for EmotionML].
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="sadness" value="0.3"/> <category name="anger" value="0.8"/> <category name="fear" value="0.3"/> </emotion>
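The occurrence constraint (a category with a given name MUST NOT appear twice in one <emotion>) lends itself to a mechanical check. The following non-normative Python sketch illustrates one way to detect violations; the function name `duplicated_names` and both sample annotations are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def duplicated_names(emotion, kind="category"):
    """Return the sorted names used more than once by `kind` children of one <emotion>."""
    names = [
        child.get("name")
        for child in emotion.findall(f"{{{EMOTIONML_NS}}}{kind}")
        if child.get("name") is not None
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)

# Hypothetical annotations: the first is fine, the second repeats "anger".
valid = ET.fromstring(f"""
<emotion xmlns="{EMOTIONML_NS}" category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="sadness" value="0.3"/>
  <category name="anger" value="0.8"/>
</emotion>""")
invalid = ET.fromstring(f"""
<emotion xmlns="{EMOTIONML_NS}" category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="anger" value="0.3"/>
  <category name="anger" value="0.8"/>
</emotion>""")

print(duplicated_names(valid))
print(duplicated_names(invalid))
```

Because the same constraint is stated for dimensions, appraisals and action tendencies, the `kind` parameter lets the same function be reused for those elements.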
<dimension> element
Annotation | <dimension> |
---|---|
Definition | One or more <dimension> elements jointly describe an emotion or a related state according to an emotion dimension vocabulary. |
Children | <trace>: A <dimension> MUST contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <dimension> elements occur as children of <emotion>. For any given dimension name in the set, zero or one occurrence is allowed within an <emotion> element, i.e. a dimension with name "x" MUST NOT appear twice in one <emotion> element. |
One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions.
If the <dimension> element is used, a dimension vocabulary MUST be declared (see <emotion> and <emotionml>), and the dimension name as given in the name attribute MUST be an item in the declared dimension vocabulary. Different declared dimension vocabularies can be used, depending on the requirements of the use case.
The position on an emotion dimension MUST be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. A <dimension> MUST NOT contain both a value attribute and a <trace> element.
Examples:
One of the most widespread sets of emotion dimensions used (sometimes by different names) is the combination of valence, arousal and potency. The following example is a state of rather low arousal, very positive valence, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation. The example uses the Pleasure-Arousal-Dominance (PAD) vocabulary from [Vocabularies for EmotionML]:
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal --> <dimension name="pleasure" value="0.9"/><!-- very high positive valence --> <dimension name="dominance" value="0.8"/><!-- relatively high potency --> </emotion>
In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single dimension "friendliness".
<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml#voc"> <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person --> </emotion>
The usual way to represent the intensity of an emotion would be the value attribute of a <category>. However, if only the intensity of an emotion is annotated, but not its nature, this can be done by using an "intensity" dimension. Thus, an emotional state's "strength" or "intensity" can be described independently from categorical or dimensional descriptions, as shown by the following example.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <dimension name="intensity" value="0.2"/><!-- not in a strong emotional state --> </emotion>
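Unlike <category>, a <dimension> MUST carry either a value attribute or a <trace> child, and neither element may carry both. The following non-normative Python sketch illustrates that difference. The function name `value_trace_problem` is hypothetical, and the `freq` and `samples` attributes on <trace> are assumed from the <trace> definition elsewhere in this specification, which is not reproduced in this section.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def value_trace_problem(element, required):
    """Describe a violation of the value/<trace> rules, or return None.

    `required` is True for <dimension> (MUST have one of the two) and
    False for <category>, <appraisal> and <action-tendency> (MAY have one).
    """
    has_value = element.get("value") is not None
    has_trace = element.find(f"{{{EMOTIONML_NS}}}trace") is not None
    if has_value and has_trace:
        return "both a value attribute and a <trace> child are present"
    if required and not (has_value or has_trace):
        return "neither a value attribute nor a <trace> child is present"
    return None

# Hypothetical fragments; the <trace> attributes are assumptions (see lead-in).
static = ET.fromstring(f'<dimension xmlns="{EMOTIONML_NS}" name="intensity" value="0.2"/>')
empty = ET.fromstring(f'<dimension xmlns="{EMOTIONML_NS}" name="intensity"/>')
both = ET.fromstring(
    f'<dimension xmlns="{EMOTIONML_NS}" name="intensity" value="0.2">'
    '<trace freq="10Hz" samples="0.1 0.2 0.3"/></dimension>'
)

print(value_trace_problem(static, required=True))
print(value_trace_problem(empty, required=True))
print(value_trace_problem(both, required=True))
```

Passing `required=False` models the more permissive rule for <category>, <appraisal> and <action-tendency>, where omitting both is allowed.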
<appraisal> element
Annotation | <appraisal> |
---|---|
Definition | One or more <appraisal> elements jointly describe an emotion or a related state according to an emotion appraisal vocabulary. |
Children | <trace>: An <appraisal> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <appraisal> elements occur as children of <emotion>. For any given appraisal name in the set, zero or one occurrence is allowed within an <emotion> element, i.e. an appraisal with name "x" MUST NOT appear twice in one <emotion> element. |
One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals.
If the <appraisal> element is used, an appraisal vocabulary MUST be declared (see <emotion> and <emotionml>), and the appraisal name as given in the name attribute MUST be an item in the declared appraisal vocabulary. Different declared appraisal vocabularies can be used, depending on the requirements of the use case.
The degree to which an appraisal is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. An <appraisal> MUST NOT contain both a value attribute and a <trace> element.
Examples:
One of the most widespread sets of emotion appraisals used is the appraisals set proposed by Klaus Scherer, covering aspects of novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Using Scherer's appraisals from [Vocabularies for EmotionML], the following example is a state arising from the evaluation of an unpredicted and quite unpleasant event:
<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.8"/> <appraisal name="intrinsic-pleasantness" value="0.2"/> </emotion>
In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining the single appraisal "likelihood".
<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml#voc"> <appraisal name="likelihood" value="0.8"/><!-- a very predictable event --> </emotion>
<action-tendency> element
Annotation | <action-tendency> |
---|---|
Definition | One or more <action-tendency> elements jointly describe an emotion or a related state according to an emotion action tendency vocabulary. |
Children | <trace>: An <action-tendency> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <action-tendency> elements occur as children of <emotion>. For any given action tendency name in the set, zero or one occurrence is allowed within an <emotion> element, i.e. an action tendency with name "x" MUST NOT appear twice in one <emotion> element. |
One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action tendencies.
If the <action-tendency> element is used, an action tendency vocabulary MUST be declared (see <emotion> and <emotionml>), and the action tendency name as given in the name attribute MUST be an item in the declared action tendency vocabulary. Different declared action tendency vocabularies can be used, depending on the requirements of the use case.
The degree to which an action tendency is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. An <action-tendency> MUST NOT contain both a value attribute and a <trace> element.
Examples:
One well known use of action tendencies is by N. Frijda. This model uses a number of action tendencies that are low level, diffuse behaviors from which more concrete actions could be determined. It is provided in [Vocabularies for EmotionML]. An example of someone attempting to attract someone they like by being confident, strong and attentive might look like this:
<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies"> <action-tendency name="approach" value="0.7"/> <!-- get close --> <action-tendency name="being-with" value="0.8"/> <!-- be happy --> <action-tendency name="attending" value="0.7"/> <!-- pay attention --> <action-tendency name="dominating" value="0.7"/> <!-- be assertive --> </emotion>
In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a robot who works in a factory and uses a custom set of action-tendencies, defining example actions for a robot. In the example, the robot has very low battery, so it needs to get ready to charge its battery and stop its work of picking up boxes.
<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml#voc"> <action-tendency name="charge-battery" value="0.9"/> <!-- need to charge battery soon --> <action-tendency name="pickup-boxes" value="0.3"/> <!-- feeling tired, avoid work --> </emotion>
confidence attribute
Annotation | confidence |
---|---|
Definition | The degree of confidence or probability that the emotion representation carrying this attribute is correct. The value of the confidence attribute MUST be a floating point number in the closed interval [0, 1]. |
Occurrence | An optional attribute of <category>, <dimension>, <appraisal> and <action-tendency> elements. |
Confidence MAY be indicated separately for each of the Representations of emotions and related states. For example, the confidence that the <category> is assumed correctly is independent from the confidence that the position on a dimension is correctly indicated.
Rooted in the tradition of statistics, confidence is given as a value in the interval from 0 to 1, resembling a probability. In this respect, the confidence is a Scale value.
Examples:
In the following, one simple example is provided for each element that can carry a confidence attribute.
The first example indicates a very high confidence that surprise is the emotion to annotate.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="surprise" confidence="0.95"/> </emotion>
The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive pleasure may or may not be correct.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="arousal" value="0.8" confidence="0.9"/> <dimension name="pleasure" value="0.6" confidence="0.3"/> </emotion>
Finally, an example for the case of intensity: A high confidence is given that the emotion has a low intensity.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <dimension name="intensity" value="0.1" confidence="0.8"/> </emotion>
Note that an emotion annotation can combine the above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <category name="bored" confidence="0.1"/> <dimension name="intensity" value="0.1" confidence="0.8"/> </emotion>
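The requirement that confidence be a floating point number in the closed interval [0, 1] can be checked when reading an annotation. The following is a minimal, non-normative Python sketch; the function name `parse_confidence` and its choice to raise `ValueError` are assumptions made for this example, not behavior defined by this specification.

```python
import xml.etree.ElementTree as ET

def parse_confidence(element):
    """Return the confidence of `element` as a float, or None if the attribute is absent.

    Raises ValueError if the value is not a number in the closed interval [0, 1].
    """
    raw = element.get("confidence")
    if raw is None:
        return None  # the attribute is optional
    value = float(raw)  # raises ValueError for non-numeric text
    if not 0.0 <= value <= 1.0:  # also rejects NaN, which fails every comparison
        raise ValueError(f"confidence {raw!r} is outside [0, 1]")
    return value

# Hypothetical fragments (namespace omitted, since only the attribute is read).
surprise = ET.fromstring('<category name="surprise" confidence="0.95"/>')
unrated = ET.fromstring('<category name="surprise"/>')

print(parse_confidence(surprise))
print(parse_confidence(unrated))
```

Returning `None` for a missing attribute keeps "no confidence stated" distinct from a confidence of 0.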
expressed-through attribute
Annotation | expressed-through |
---|---|
Definition | The modality, or list of modalities, through which the emotion is expressed. An attribute of type xsd:NMTOKENS which contains a space delimited set of values from an open set of values including: {gaze, face, head, torso, gesture, leg, voice, text, locomotion, posture, physiology, ...}. |
Occurrence | An optional attribute of <emotion> elements. |
The expressed-through attribute describes the modality through which an emotion is produced, usually by a human being. It is not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The expressed-through attribute is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.
The list of values provided covers a broad range of modalities through which emotions may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.
The expressed-through attribute is not specific about the sensors used for observing the modality. These can be specified using the <info> element, or by the emma:mode attribute in an enclosing [EMMA] document.
Example:
In the following example the emotion is expressed through the voice.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" expressed-through="voice"> <category name="satisfied"/> </emotion>
In case of multimodal expression of an emotion, a list of space separated modalities can be indicated in the expressed-through attribute, like in the following example in which the two values "face" and "voice" are used.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" expressed-through="face voice"> <category name="satisfied"/> </emotion>
See also the examples in sections 5.1.2 Automatic recognition of emotions, 5.1.3 Generation of emotion-related system behavior and 5.2.3 Use with SMIL.
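Because expressed-through holds a space delimited list drawn from an open set, a consumer would typically split the attribute into tokens and treat unlisted tokens as custom values rather than errors. The following non-normative Python sketch illustrates this; the names `SUGGESTED_MODALITIES`, `modalities` and `custom_modalities` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The values listed in the definition of expressed-through; the set is open,
# so other tokens are permitted and are not treated as errors here.
SUGGESTED_MODALITIES = {
    "gaze", "face", "head", "torso", "gesture", "leg",
    "voice", "text", "locomotion", "posture", "physiology",
}

def modalities(emotion):
    """Return the list of modality tokens, or an empty list if the attribute is absent."""
    return emotion.get("expressed-through", "").split()

def custom_modalities(emotion):
    """Return the tokens that are not among the suggested values."""
    return [token for token in modalities(emotion) if token not in SUGGESTED_MODALITIES]

# Hypothetical fragments (namespace omitted, since only the attribute is read).
multimodal = ET.fromstring('<emotion expressed-through="face voice"/>')
fine_grained = ET.fromstring('<emotion expressed-through="eyes voice"/>')

print(modalities(multimodal))
print(custom_modalities(fine_grained))
```

Calling `split()` without arguments tolerates repeated or surrounding whitespace in the attribute value.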
<info> element
Annotation | <info> |
---|---|
Definition | This element can be used to annotate arbitrary metadata. |
Children | The <info> element MAY contain arbitrary plain text as well as any elements with a namespace different from the EmotionML namespace, 'http://www.w3.org/2009/10/emotionml'. It MUST NOT contain any elements in the EmotionML namespace. |
Attributes | |
Occurrence | A single <info> element MAY occur as a child of |
This element can contain arbitrary XML data in a different namespace (one option could be [RDF] data), either on a document global level or on a local "per annotation element" level.
Several initiatives of standardizing metadata exist, such as [IMDI] and [CLARIN]. Metadata may contain information on a large spectrum of elements such as: location description (continent, country, address), content type (e.g., genre, task, modalities), session (title, a recording date, a group of participants); each participant may be defined by her role in the session (e.g. annotator, filmer), her name, her social family role, etc.
Examples:
In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:classifiers="http://www.example.com/meta/classify/" xmlns:origin="http://www.example.com/meta/local/" category-set="http://www.w3.org/TR/emotion-voc/xml#big6" version="1.0"> <info> <classifiers:classifier classifiers:name="GMM"/> </info> <emotion> <info><origin:localization value="bavarian"/></info> <category name="happiness"/> </emotion> <emotion> <info><origin:localization value="swabian"/></info> <category name="sadness"/> </emotion> </emotionml>
The following example uses the IMDI metadata language to represent information about the annotator who produced the emotion annotation in the current document, in a global <info> element.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:imdi="http://www.mpi.nl/IMDI/Schema/IMDI" version="1.0"> <info> <imdi:Actors> <imdi:Actor> <imdi:Role>Annotator</imdi:Role> <imdi:Name>John</imdi:Name> <imdi:FullName>John Smith Junior</imdi:FullName> <imdi:Code>JS</imdi:Code> <imdi:FamilySocialRole>Teacher</imdi:FamilySocialRole> ... </imdi:Actor> </imdi:Actors> </info> ... <emotion>...</emotion> <emotion>...</emotion> </emotionml>
The following example illustrates how <info> can be used for annotating information on sensors through which an affective signal has been detected. In the global <info> section, the sensors used in the particular scenario are specified. Apart from their ID, information on the modality observed by this sensor is provided as well as information on the confidence for that sensor. In this example, the modality "posture" is observed by a camera and a chair equipped with pressure sensors. For some reason it is decided that emotion estimates based on camera data should be trusted more than those based on chair data. Within the <emotion> elements, <info> is used to specify which sensor has been used to calculate the actual emotion value.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:sensors="http://www.example.com/meta/sensors/" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" version="1.0"> <info> <sensors:sensor id="camera1" confidence="0.9" expressed-through="posture"/> <sensors:sensor id="chair" confidence="0.3" expressed-through="posture"/> ... </info> <emotion expressed-through="posture"> <info> <sensors:sensor idref="camera1"/> </info> <category name="angry"/> </emotion> <emotion expressed-through="posture"> <info> <sensors:sensor idref="chair"/> </info> <category name="neutral"/> </emotion> </emotionml>
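The constraint that <info> MUST NOT contain elements in the EmotionML namespace can be checked by inspecting expanded element names. The following non-normative Python sketch assumes that the constraint applies to all descendants of <info>, not only to its direct children; that reading, the function name `emotionml_elements_in_info` and the sample fragments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def emotionml_elements_in_info(info):
    """Return the local names of EmotionML-namespace elements found below <info>."""
    prefix = f"{{{EMOTIONML_NS}}}"
    return [
        element.tag[len(prefix):]
        for element in info.iter()
        # iter() yields <info> itself first; skip it, it is allowed to be EmotionML.
        if element is not info and element.tag.startswith(prefix)
    ]

# Hypothetical fragments: foreign-namespace metadata is fine, <category> is not.
allowed = ET.fromstring(
    f'<info xmlns="{EMOTIONML_NS}" xmlns:sensors="http://www.example.com/meta/sensors/">'
    '<sensors:sensor idref="camera1"/></info>'
)
forbidden = ET.fromstring(
    f'<info xmlns="{EMOTIONML_NS}"><category name="angry"/></info>'
)

print(emotionml_elements_in_info(allowed))
print(emotionml_elements_in_info(forbidden))
```

An unprefixed element inside <info> inherits the default namespace, which is how an accidental EmotionML element such as the <category> above can slip in.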
<reference> element
Annotation | <reference> |
---|---|
Definition | References may be used to relate the emotion annotation to the "rest of the world", more specifically to the emotional expression, the experiencing subject, the trigger, and the target of the emotion. |
Children | None |
Attributes | |
Occurrence | Multiple <reference> elements MAY occur as children of <emotion>. |
A <reference> element provides a link to media as a URI [RFC 3986]. The semantics of references are described by the role attribute which, if present, MUST have one of four values:
- "expressedBy" indicates that the reference points to observable behavior expressing the emotion. This is the default value if the role attribute is not explicitly stated;
- "experiencedBy" indicates that the reference points to the subject experiencing the emotion;
- "triggeredBy" indicates that the reference points to an emotion-eliciting event that caused an emotion and/or related appraisals;
- "targetedAt" indicates that the reference points to an object towards which an emotion-related action, or action tendency, is directed.
For reference targets representing a period of time, start and end time MAY be denoted by using the media fragments syntax, as explained in section 2.4.2.4.
The media-type attribute MAY be used to differentiate between different media types such as audio, video, text, etc.
There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.
Examples:
The following example illustrates the reference to two different URIs having
a different role
with respect to the emotion: one reference points
to the emotion's expression, a video clip showing a user expressing the
emotion; the other reference points to the trigger that caused the emotion, in
this case another video clip that was seen by the person who expressed the
emotion.
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/> </emotion>
Several references may follow as children of one
<emotion>
tag, even having the same role
; for
example, the following annotation refers to a portion of a video and to
physiological sensor data, both of which expressed the emotion:
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/> </emotion>
It is possible to explicitly indicate the MIME type of the item that the reference refers to:
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/> </emotion>
Annotation | start , end |
---|---|
Definition | Attributes to denote the starting and ending absolute times. They
MUST be of type xsd:nonNegativeInteger and indicate the
number of milliseconds since 1 January 1970 00:00:00 GMT. |
Occurrence | The attributes MAY occur inside an <emotion>
element. |
start
and end
attributes denote the absolute
starting and ending times at which an emotion or related state happened. This might be
used for example with an "emotional diary" application.
Examples:
In the following example, the emotion category "surprise" is annotated,
immediately followed by the category "happiness". The start
and
end
attributes specify for each emotion
element the
absolute beginning and ending times.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647332000" end="1268647334000"> <category name="surprise"/> </emotion> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647334001" end="1268647336000"> <category name="happiness"/> </emotion>
The end
value MUST be greater than or equal to the
start
value.
The ECMAScript Date object's getTime() function is a way to determine the absolute time.
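Outside ECMAScript, the same millisecond value can be computed in other ways. The following Python sketch is one possible approach, not part of EmotionML; the helper name to_epoch_ms is hypothetical. It converts a timezone-aware date into a value suitable for start or end, reproducing the start value of the "surprise" example above.

```python
from datetime import datetime, timedelta, timezone

# Reference point used by the start and end attributes.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Return whole milliseconds since 1 January 1970 00:00:00 GMT."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


start = to_epoch_ms(datetime(2010, 3, 15, 10, 2, 12, tzinfo=timezone.utc))
end = start + 2000  # two seconds later
assert end >= start  # end MUST be greater than or equal to start
```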
Annotation | duration |
---|---|
Definition | Attribute of type xsd:nonNegativeInteger ,
defaulting to zero. It specifies the duration of the event in
milliseconds. |
Occurrence | This attribute MAY occur inside an <emotion>
element. |
The duration of an input in milliseconds MAY be specified with the
duration
attribute. The duration
attribute MAY be
used either in combination with the start
or
offset-to-start
attribute or independently.
A start
or offset-to-start
attribute together with
the duration
attribute set to zero MAY be used to indicate a
single timestamp on a time axis.
Note that the specification doesn't enforce consistency. Although it would be redundant, the presence of both duration
and end
attributes is
possible. If this leads to inconsistency, the responsibility lies with the EmotionML producer.
Examples:
In the following example, the start
and duration
of the emotion category "surprise" are annotated:
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647200000" duration="130"> <category name="surprise"/> </emotion>
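Because the specification leaves consistency between duration and end to the producer, a producer might choose to check its own output before writing it. The following Python sketch shows one such check; the function names implied_end and timing_is_consistent are hypothetical and not defined by EmotionML.

```python
from typing import Optional


def implied_end(start: int, duration: int = 0) -> int:
    """End time in milliseconds implied by start and duration."""
    if start < 0 or duration < 0:
        raise ValueError("start and duration must be non-negative")
    return start + duration


def timing_is_consistent(start: int, duration: int,
                         end: Optional[int] = None) -> bool:
    """True if no explicit end is given, or if it equals start + duration."""
    return end is None or end == implied_end(start, duration)
```

For the example above, implied_end(1268647200000, 130) yields 1268647200130. A duration of zero yields the start value itself, which corresponds to a single timestamp on the time axis.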
Annotation | time-ref-uri |
---|---|
Definition | Attribute of type xsd:anyURI indicating the URI used to
anchor the relative timestamp. |
Occurrence | This attribute MAY occur inside an <emotion>
element. |

Annotation | time-ref-anchor-point |
---|---|
Definition | Attribute with a value of start or end ,
defaulting to start . It indicates whether to measure the
time from the start or end of the interval designated with
time-ref-uri . |
Occurrence | This attribute MAY occur inside an <emotion>
element. |

Annotation | offset-to-start |
---|---|
Definition | Attribute of type xsd:integer , defaulting
to zero. It specifies the offset in milliseconds for the start of input
from the anchor point designated with
time-ref-uri and
time-ref-anchor-point . |
Occurrence | This attribute MAY occur inside an <emotion>
element. |
Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.
The reference interval is designated with the time-ref-uri attribute. This MAY be combined with the time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the offset-to-start attribute.
The time-ref-uri
attribute can point to a custom-defined
timestamp or can be, for example, a session identifier.
Examples:
Here is an example where the emotion "surprise" occurs two seconds after the reference time point:
<emotion id="referenceAnger" category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647332000" end="1268647334000"> <category name="anger"/> </emotion>
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" time-ref-uri="#referenceAnger" offset-to-start="2000"> <category name="surprise"/> </emotion>
It is possible to mix absolute and relative time points in an emotionml statement. An emotional diary application for example might want to relate emotional statements to events that are marked up somewhere else. Note that a time consistency check is not part of the specification:
<exampleXMLElementWithId id="specialEvent"/>
<emotion id="yesterday" category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647332000" end="1268647334000" time-ref-uri="#specialEvent" offset-to-start="2000"> <category name="anger"/> </emotion>
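A consumer that has already resolved the interval designated by time-ref-uri could compute the absolute start of a relatively timed annotation as sketched below in Python. This is an illustration under that assumption; the function name resolve_relative_start is hypothetical, and how the reference interval is looked up is left open.

```python
def resolve_relative_start(ref_start: int, ref_end: int,
                           offset_to_start: int = 0,
                           anchor_point: str = "start") -> int:
    """Absolute start in milliseconds, given the reference interval.

    anchor_point mirrors time-ref-anchor-point (default "start");
    offset_to_start mirrors offset-to-start (default zero, may be negative).
    """
    if anchor_point not in ("start", "end"):
        raise ValueError("anchor_point must be 'start' or 'end'")
    anchor = ref_start if anchor_point == "start" else ref_end
    return anchor + offset_to_start
```

With the "referenceAnger" interval from the first example (start 1268647332000, end 1268647334000) and an offset of 2000, the default anchor point gives 1268647334000.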
Annotation | URI fragment: t |
---|---|
Definition | Attributes to denote the start and end point of an annotation in a media stream. Allowed values must conform to the Media Fragments Specification [Media Fragments]. |
Occurrence | The URI fragment MAY occur in the uri attribute of a
<reference> element. |
Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval whereas the end time is considered to be the first time point that is not part of the interval. If a single number only is given, this is the begin time.
Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.
Examples:
In the following example, the emotion category "happiness" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myAudio.wav#t=3,9"/> </emotion>
In the following example, the emotion category "happiness" is displayed in a video file called "myVideo.avi". The interval is given as SMPTE timecodes at 30 frames per second, resulting in the time interval [120,121.5) in seconds.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/> </emotion>
A final example specifies the interval for a video file in real-world clock time: a one-minute interval on 26 July 2009, starting at 11:19:01 UTC.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/> </emotion>
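To show how the SMPTE example arrives at the interval [120,121.5), the following Python sketch converts 30 frames-per-second timecodes to seconds. It is a simplified illustration: the function names are hypothetical, only the hh:mm:ss and hh:mm:ss:ff forms are handled, both begin and end are assumed to be present, and drop-frame variants are not covered.

```python
from typing import Tuple


def smpte_to_seconds(timecode: str, frames_per_second: float = 30.0) -> float:
    """Convert an hh:mm:ss or hh:mm:ss:ff timecode to seconds."""
    parts = timecode.split(":")
    if len(parts) not in (3, 4):
        raise ValueError("expected hh:mm:ss or hh:mm:ss:ff")
    hours, minutes, seconds = (int(p) for p in parts[:3])
    frames = int(parts[3]) if len(parts) == 4 else 0
    return hours * 3600 + minutes * 60 + seconds + frames / frames_per_second


def smpte30_interval(fragment: str) -> Tuple[float, float]:
    """Parse a 't=smpte-30:begin,end' fragment into (begin, end) seconds."""
    prefix = "t=smpte-30:"
    if not fragment.startswith(prefix):
        raise ValueError("not a smpte-30 temporal fragment")
    begin, end = fragment[len(prefix):].split(",")
    return smpte_to_seconds(begin), smpte_to_seconds(end)


interval = smpte30_interval("t=smpte-30:0:02:00,0:02:01:15")
```

Here the end timecode 0:02:01:15 is 121 seconds plus 15 frames at 30 frames per second, i.e. 121.5 seconds.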
Scale values are needed to represent content in <category>
, <dimension>
, <appraisal>
and <action-tendency>
elements, as well as in
confidence
.
Representations of scale values can be static or dynamic. A static, constant
scale value is represented using the value
attribute; for dynamic
values, their evolution over time is expressed using the
<trace>
element.
value attribute

Annotation | value |
---|---|
Definition | Representation of a static scale value. The value of a
value attribute MUST be a floating point value from the
closed interval [0, 1]. |
Occurrence | The <dimension> element MUST
contain either a value attribute or a
<trace> element; <category> , <appraisal> and <action-tendency> MAY contain
either a value attribute or a <trace>
element. |
The value
attribute represents a static scale value of the
enclosing element.
Conceptually, a scale can represent concepts that vary from "nothing" to "a
lot" (unipolar scales), or concepts that vary between two opposites, from "very
negative" to "very positive" (bipolar scales). Both are represented in
EmotionML using floating point values from the closed interval [0, 1]. The min
and max values of the scale SHOULD be interpreted as the extreme values, for
both unipolar and bipolar scales. For example in a
<category>
, a value="0"
SHOULD be interpreted
to mean absolutely no emotion (emotionless); a value="1.0"
SHOULD
be interpreted to mean emotion at maximum intensity (pure uncontrolled
emotion). For bipolar scales, such as the valence dimension, a value of 0
represents the most negative possible value, whereas a value of 1 represents
the most positive value possible. The neutral middle point of the scale is at
0.5.
Here are several examples for the usage of scales with EmotionML.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions"> <dimension name="arousal" value="0.4"/> <!-- a bit less than average arousal --> <dimension name="valence" value="0.6"/> <!-- a bit above average valence --> </emotion> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="angry" value="0.5"/> <!-- anger at medium intensity --> </emotion> <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.9"/> <!-- appraisal as a very sudden event --> </emotion> <emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies"> <action-tendency name="approach" value="0.3"/> <!-- a rather weak tendency to approach --> </emotion>
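Applications that internally represent bipolar scales on a different range need to map their values onto [0, 1] before writing EmotionML. The following Python sketch assumes, purely for illustration, an internal range of [-1, 1]; the function name to_emotionml_scale and the default range are not part of the specification.

```python
def to_emotionml_scale(x: float, low: float = -1.0, high: float = 1.0) -> float:
    """Linearly map x from [low, high] onto the closed interval [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= x <= high:
        raise ValueError("x lies outside the source range")
    return (x - low) / (high - low)
```

Under this assumption, an internal valence of 0.0 maps to the neutral middle point 0.5, while -1.0 and 1.0 map to the extreme values 0 and 1.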
Further examples of the value
attribute can be found in the
context of the <category>
, <dimension>
, <appraisal>
and <action-tendency>
elements.
<trace> element

Annotation | <trace> |
---|---|
Definition | Representation of the time evolution of a dynamic scale value. |
Children | None |
Attributes |
|
Occurrence | The |
A <trace>
element represents the time course of a scale
value.
The freq
attribute indicates the sampling frequency at which
the values listed in the samples
attribute are given.
NOTE: The <trace>
representation requires a periodic
sampling of values. In order to represent values that are sampled
aperiodically, separate <emotion>
annotations with
appropriate timing information and individual value
attributes may
be used.
Examples:
The following example illustrates the use of a trace to represent an episode of fear during which the emotion's intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="fear"> <trace freq="10Hz" samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/> </category> </emotion>
The following example combines a trace of the appraisal "suddenness" with a global confidence that the values represent the facts properly. There is a sudden peak of suddenness; the annotator is reasonably certain that the annotation is correct:
<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" confidence="0.75"> <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/> </appraisal> </emotion>
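A consumer typically needs to recover the time position of each sample from the freq and samples attributes. The following Python sketch shows one way to do so, assuming a freq value of the form used in the examples above (a number followed by "Hz") and space-separated samples; the function name trace_points is hypothetical.

```python
from typing import List, Tuple


def trace_points(freq: str, samples: str) -> List[Tuple[float, float]]:
    """Pair each sample with its offset in seconds from the first sample."""
    if not freq.endswith("Hz"):
        raise ValueError("freq is expected to end in 'Hz'")
    hz = float(freq[:-2])
    if hz <= 0:
        raise ValueError("sampling frequency must be positive")
    # Sample i is taken i / hz seconds after the first sample.
    return [(i / hz, float(v)) for i, v in enumerate(samples.split())]


# The "fear" trace from the first example above.
points = trace_points(
    "10Hz",
    "0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85",
)
```

For the "fear" example, the 16 samples at 10 Hz span offsets from 0.0 to 1.5 seconds.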
EmotionML markup MUST refer to one or more vocabularies to be used for
representing emotion-related states, as specified in the context of the <emotionml>
and <emotion>
elements. Due to the lack of agreement in the community, the EmotionML specification does not define a single default set that would apply if no set is indicated. Instead, the user MUST explicitly state the set of descriptor names used.
The document [Vocabularies for EmotionML] provides a number of emotion vocabularies which are likely to be of general interest. In order to promote interoperability, users SHOULD verify if one of the vocabularies defined in that document is suitable for their application. If that is not the case, users can define their own custom vocabularies as defined in the present section.
The syntax for defining emotion vocabularies is based on the element
<vocabulary>
and its child <item>
.
<vocabulary> element

Annotation | <vocabulary> |
---|---|
Definition | Contains the definition of an emotion vocabulary. |
Children | A <vocabulary> element MUST contain one or more <item> elements. A
<vocabulary> element MAY contain a single <info> element, providing
arbitrary metadata about the vocabulary itself. |
Attributes |
|
Occurrence | One or more |
Vocabulary definitions, when present, occur as direct children of the
document root element <emotionml>
. It
is possible to refer to a vocabulary defined in the same or in a separate
EmotionML document, through URIs specified by the values of the attributes
category-set
, dimension-set
,
appraisal-set
and action-tendency-set
of the <emotion>
and <emotionml>
elements.
The value of the type
attribute explicitly states whether the
vocabulary represents category names, dimension elements, appraisal elements or
action tendency elements.
<item> element

Annotation | <item> |
---|---|
Definition | Represents the definition of one vocabulary item, associated with a
value which can be used in the "name" attribute of <category> , <dimension> , <appraisal> or <action-tendency> (depending on
the type of vocabulary being defined). |
Children | An <item> element MAY contain a single <info> element, providing
arbitrary metadata about the vocabulary item. |
Attributes |
|
Occurrence | One or more <item> elements occur as direct
children of a <vocabulary>
element. |
An <item>
represents the definition of one vocabulary
item. A <vocabulary>
MUST contain at
least one <item>
element.
Examples:
In the following example, three vocabularies are wrapped into a single
EmotionML document. Their id
attributes are: "big6",
"fsre-dimensions" and "frijda-subset". They are used to represent categories,
dimensions and action tendencies respectively. The first
<emotion>
element specifies the emotion vocabularies used
through the attributes category-set
and
action-tendency-set
, while the second <emotion>
element uses the attribute dimension-set
.
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"> <!-- Vocabulary definitions --> <vocabulary type="category" id="big6"> <item name="anger"/> <item name="disgust"/> <item name="fear"/> <item name="happiness"/> <item name="sadness"/> <item name="surprise"/> </vocabulary> <vocabulary type="dimension" id="fsre-dimensions"> <item name="valence"/> <item name="potency"/> <item name="arousal"/> <item name="unpredictability"/> </vocabulary> <vocabulary type="action-tendency" id="frijda-subset"> <item name="approach"/> <item name="avoidance"/> <item name="rejecting"/> </vocabulary> <!-- Emotion elements --> <emotion category-set="#big6" action-tendency-set="#frijda-subset"> <category name="fear"/> <action-tendency name="approach" value="0.0"/> <action-tendency name="avoidance" value="0.9"/> </emotion> <emotion dimension-set="#fsre-dimensions"> <dimension name="arousal" value="0.3"/> </emotion> </emotionml>
EmotionML refers to emotion vocabularies using the
category-set
, dimension-set
,
appraisal-set
and action-tendency-set
attributes of
<emotion>
and <emotionml>
. A vocabulary can be referred
to using the Fragment Identifier syntax described in [RFC 3023]. As described in the [XPointer Framework] Shorthand Pointer
notation, the value of the vocabulary's id
attribute is to be used
as the fragment identifier.
The following example demonstrates the use of this notation. Note that the
EmotionML document residing at http://www.w3.org/TR/emotion-voc/xml defines an
emotion category vocabulary with an attribute id="big6"
.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
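A processor can separate such a reference into the URI of the defining document and the vocabulary's id using any standard URI library. The following Python sketch uses urldefrag from the standard library; the variable names are illustrative only.

```python
from urllib.parse import urldefrag

# Split the vocabulary reference into document URI and fragment identifier.
document_uri, vocabulary_id = urldefrag(
    "http://www.w3.org/TR/emotion-voc/xml#big6"
)

# A same-document reference such as "#big6" has an empty document URI.
local_uri, local_id = urldefrag("#big6")
```

The fragment obtained this way is the value to compare against the id attribute of the <vocabulary> elements in the referenced document.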
The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.
The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [XML-NS10] or 1.1 [XML-NS11], depending on the version of XML being used).
The EmotionML schema is designed to validate the structural integrity of an
EmotionML document or document fragment, but cannot verify whether the emotion
descriptors used in the name
attribute of
<category>
, <dimension>
,
<appraisal>
and <action-tendency>
are
consistent with the vocabularies indicated in the respective
category-set
, dimension-set
,
appraisal-set
and action-tendency-set
attributes.
It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.
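As an illustration of such a check, the following Python sketch verifies category names against vocabularies defined in the same document. It is a minimal sketch under several assumptions: only <category> elements and same-document references of the form "#id" are handled, externally defined vocabularies are skipped, and the function name unknown_category_names is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{http://www.w3.org/2009/10/emotionml}"


def unknown_category_names(document: str) -> List[str]:
    """List <category> names missing from the locally referenced vocabulary."""
    root = ET.fromstring(document)
    vocabularies = {
        vocab.get("id"): {item.get("name") for item in vocab.findall(NS + "item")}
        for vocab in root.findall(NS + "vocabulary")
    }
    problems = []
    for emotion in root.findall(NS + "emotion"):
        # category-set may be given on <emotion> or on the <emotionml> root.
        ref = emotion.get("category-set") or root.get("category-set") or ""
        if not ref.startswith("#"):
            continue  # external vocabularies are out of scope for this sketch
        allowed = vocabularies.get(ref[1:], set())
        for category in emotion.findall(NS + "category"):
            if category.get("name") not in allowed:
                problems.append(category.get("name"))
    return problems


SAMPLE = """<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
  <vocabulary type="category" id="big6">
    <item name="anger"/> <item name="fear"/>
  </vocabulary>
  <emotion category-set="#big6"> <category name="fear"/> </emotion>
  <emotion category-set="#big6"> <category name="boredom"/> </emotion>
</emotionml>"""

problems = unknown_category_names(SAMPLE)
```

In this sample, "boredom" is reported because it is not an item of the locally defined "big6" vocabulary.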
There are also a variety of test assertions that cannot be validated by the schema and are the responsibility of the EmotionML producer.
A document is a Conforming Stand-Alone EmotionML Document if it meets both the following conditions:
A document fragment is a Conforming EmotionML Fragment if it conforms to the criteria for Conforming EmotionML Documents after adding a surrounding emotionml root element.
A Conforming EmotionML Processor must correctly understand and apply the semantics of each markup element as described by this document.
There is, however, no conformance requirement with respect to performance characteristics of the EmotionML Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of output produced by the processor. No statement is made regarding the size of input that an EmotionML Processor is required to support.
An EmotionML Producer is an EmotionML Processor that can produce Conforming EmotionML Documents.
An EmotionML Consumer is an EmotionML Processor that can parse and process Conforming EmotionML Documents.
When a Conforming EmotionML Consumer encounters markup that it cannot interpret, e.g. misspelled element names or attribute values like category or dimension set values that it doesn't know about, it may:
This section is informative.
A part of Lewis Carroll's "Alice's Adventures in Wonderland" gets annotated with emotions.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <info> <meta:doc>Example adapted from (Zhang, Black &amp; Sproat 2003) http://www.cs.cmu.edu/~awb/papers/eurospeech2003/esper.pdf </meta:doc> </info> <emotion> <category name="Disgust" value="0.82"/> ‘Come, there’s no use in crying like that!’ </emotion> said Alice to herself rather sharply; <emotion> <category name="Anger" value="0.57"/> ‘I advise you to leave off this minute!’ </emotion> </emotionml>
Note that, although the text may be scattered, each statement applies to the whole text. In the following example, "Peter" is angry and disgusted at the same time:
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <emotion> <category name="Disgust" value="0.82"/> Peter was angry <category name="Anger" value="0.57"/> and disgusted. </emotion> </emotionml>
An image gets annotated with several emotion categories at the same time, but different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml"> <info> <meta:media-type>image</meta:media-type> <meta:media-id>disgust</meta:media-id> <meta:media-set>JACFEE-database</meta:media-set> <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://davidmatsumoto.com/content/2004%20hall%20and%20matsumoto.pdf </meta:doc> </info> <emotion> <category name="Disgust" value="0.82"/> <category name="Contempt" value="0.35"/> <category name="Anger" value="0.12"/> <category name="Surprise" value="0.53"/> </emotion> </emotionml>
Example 1: Annotation of a whole video: several emotions are annotated with different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/humaine-database-labels.xml"> <info> <meta:media-type>video</meta:media-type> <meta:media-name>ed1_4</meta:media-name> <meta:media-set>humaine database</meta:media-set> <meta:coder-set>JM-AB-UH</meta:coder-set> </info> <emotion> <category name="Amusement" value="0.52"/> <category name="Irritation" value="0.63"/> <category name="Relaxed" value="0.02"/> <category name="Frustration" value="0.87"/> <category name="Calm" value="0.21"/> <category name="Friendliness" value="0.28"/> </emotion> </emotionml>
Example 2: Annotation of a video segment, where two emotions are annotated for overlapping but not identical timespans.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/emotv-labels.xml"> <info> <meta:media-type>video</meta:media-type> <meta:media-name>ext-03</meta:media-name> <meta:media-set>EmoTV</meta:media-set> <meta:coder>4</meta:coder> </info> <emotion> <category name="irritation" value="0.46"/> <reference uri="file:ext03.avi?t=3.24,15.4"/> </emotion> <emotion> <category name="despair" value="0.48"/> <reference uri="file:ext03.avi?t=5.15,17.9"/> </emotion> </emotionml>
This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.
It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (absolute start time is 1006526160000 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.
The next entry of observed emotions occurs about 6 minutes later (absolute start time is 1006526520000 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a short glimpse of anger; for the visual and IR cameras the signal was below their individual thresholds, so there are no entries from them.
For simplicity, all devices use categorical annotations and the same set of categories. Obviously it would be possible, and even likely, that different devices from different manufacturers provide their data annotated with different emotion sets.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> ... <emotion start="1006526160000" expressed-through="face"> <!--the first modality detects excitement. It is a camera observing the face. A URI to the database is provided to access the video stream.--> <category name="excited"/> <reference uri="http://www.example.com/facedb#t=26,98"/> </emotion> <emotion start="1006526160000" expressed-through="facial-skin-color"> <!--the second modality detects anger. It is an IR camera observing the face. A URI to the database is provided to access the video stream.--> <category name="angry"/> <reference uri="http://www.example.com/skindb#t=23,108"/> </emotion> <emotion start="1006526160000" expressed-through="physiology"> <!--the third modality detects excitement again. It is a wearable device monitoring physiological changes in the body. A URI to the database is provided to access the data stream.--> <category name="excited"/> <reference uri="http://www.example.com/physiodb#t=19,101"/> </emotion> <emotion start="1006526520000" expressed-through="physiology"> <category name="angry"/> <reference uri="http://www.example.com/physiodb2#t=2,6"/> </emotion> ... </emotionml>
Note that handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions will be determined on the time stamp.
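One way a consumer might determine parallel occurrences from the time stamp is to group <emotion> elements by their start attribute. The following Python sketch does this for a condensed version of the example above; the function names group_by_start and as_utc are hypothetical, and grouping by identical start values is only one possible interpretation of "parallel".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timedelta, timezone

NS = "{http://www.w3.org/2009/10/emotionml}"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def group_by_start(document: str) -> dict:
    """Map each start value to a list of (modality, category name) pairs."""
    groups = defaultdict(list)
    for emotion in ET.fromstring(document).findall(NS + "emotion"):
        start = emotion.get("start")
        category = emotion.find(NS + "category")
        if start is None or category is None:
            continue  # this sketch only handles timed categorical annotations
        groups[int(start)].append(
            (emotion.get("expressed-through"), category.get("name"))
        )
    return dict(groups)


def as_utc(ms: int) -> datetime:
    """Convert milliseconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


SAMPLE = """<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion start="1006526160000" expressed-through="face"><category name="excited"/></emotion>
  <emotion start="1006526160000" expressed-through="facial-skin-color"><category name="angry"/></emotion>
  <emotion start="1006526160000" expressed-through="physiology"><category name="excited"/></emotion>
  <emotion start="1006526520000" expressed-through="physiology"><category name="angry"/></emotion>
</emotionml>"""

groups = group_by_start(SAMPLE)
```

The first group contains the three annotations starting at 14:36 on 23 November 2001; the second contains the single physiology entry six minutes later.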
The MPEG-4 standard offers 68 parameters, called Facial Animation Parameters (FAPs), to animate a 3D facial model. 66 of these are low-level parameters that act on the facial feature points defining a 3D facial model: they specify how these feature points are displaced, simulating muscular contraction. The remaining two, FAP1 and FAP2, refer to viseme and expression respectively. FAP2 corresponds to one of the six basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). The expressions associated with the six emotions are defined by textual descriptions [Ostermann, 2002].
In emotion theory, the idea of mixing emotions to create new emotions is disputed. For the purposes of facial expression modeling, however, it is possible to simulate different emotions as linear combinations of the six basic facial expressions. MPEG-4 allows the linear combination of any two of these expressions: emotion_1 * intensity_1 + emotion_2 * intensity_2. For example, [Raouzaiou et al., 2005] found that the expressions of depression and guilt can be obtained by combinations of fear and sadness with different intensities, while the expression of suspicion is obtained by combining anger and disgust.
In EmotionML it is possible to represent the emotional input to an MPEG-4
based facial animation system using multiple <category>
elements, for example as follows.
<emotion xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <!-- attempt to express suspicion as a combination of anger and disgust --> <category name="anger" value="0.5"/> <category name="disgust" value="0.3"/> </emotion>
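The linear combination emotion_1 * intensity_1 + emotion_2 * intensity_2 described above can be sketched numerically as follows. The displacement vectors in this Python example are invented placeholders, not actual MPEG-4 FAP values, and the function name blend_expressions is hypothetical; the intensities correspond to the value attributes in the example above.

```python
from typing import List, Sequence


def blend_expressions(expression_1: Sequence[float], intensity_1: float,
                      expression_2: Sequence[float], intensity_2: float) -> List[float]:
    """Component-wise expression_1 * intensity_1 + expression_2 * intensity_2."""
    if len(expression_1) != len(expression_2):
        raise ValueError("expressions must have the same number of components")
    for intensity in (intensity_1, intensity_2):
        if not 0.0 <= intensity <= 1.0:
            raise ValueError("intensities must lie in [0, 1]")
    return [intensity_1 * a + intensity_2 * b
            for a, b in zip(expression_1, expression_2)]


# Placeholder displacement vectors, invented for illustration only.
ANGER = [1.0, 0.0]
DISGUST = [0.0, 1.0]

suspicion = blend_expressions(ANGER, 0.5, DISGUST, 0.3)
```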
The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The robot is in a global state of high arousal, negative pleasure and low dominance, i.e. a negative state of distress paired with some urgency but quite limited power to influence the situation. It has a tendency to seek a recharge and to avoid picking up boxes. However, sensor data displays an unexpected obstacle on the way to the charging station. This triggers planning of expressive behavior of frowning. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the Robot architecture.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata"> <info> <meta:name>robbie the robot example</meta:name> </info> <!-- Robot's current global state configuration: negative, active, powerless --> <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="pleasure" value="0.2"/> <dimension name="arousal" value="0.8"/> <dimension name="dominance" value="0.3"/> </emotion> <!-- Robot's action tendencies: want to recharge --> <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml"> <action-tendency name="charge-battery" value="0.9"/> <action-tendency name="seek-shelter" value="0.7"/> <action-tendency name="pickup-boxes" value="0.1"/> </emotion> <!-- Appraised value of incoming event: obstacle detected, appraised as novel and unpleasant --> <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.8" confidence="0.4"/> <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/> <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/> </emotion> <!-- Robot's planned facial gestures: will frown --> <emotion category-set="http://www.example.com/custom/robot-emotions.xml" expressed-through="face"> <category name="frustration"/> <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/> </emotion> </emotionml>
One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML, EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.
This section illustrates the concept using three existing W3C markup languages: EMMA, SSML, and SMIL.
EMMA is made for representing arbitrary analysis results; one of them could be the emotional state. The following example represents an analysis of a non-verbal vocalization; its emotion is described as a low-intensity state, maybe boredom.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml"> <emma:interpretation emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice" emma:verbal="false"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="bored" value="0.1" confidence="0.1"/> </emotion> </emma:interpretation> </emma:emma>
In the following example, the EMMA <emma:derivation>
element is used to represent multiple emotion interpretations associated with
audio and video media sources. The first and the third interpretations specify
the same emotion category, "content", while the result of the second one is
"amused". The consolidated emotion is the result of some processing made on the
interpretations included in the derivation element. In this case it is
"content", which is the most frequent category within the available
interpretations.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml"> <emma:derivation> <emma:interpretation id="text1" emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice" emma:verbal="true" emma:signal="http://example.com/signals/emo123.wav" emma:process="http://example.com/text_analysis.xml"> <emma:literal>I feel happy</emma:literal> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.7" confidence="0.7"/> </emotion> </emma:interpretation> <emma:interpretation id="voice1" emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice" emma:verbal="false" emma:signal="http://example.com/signals/emo123.wav" emma:process="http://example.com/voice_analysis.xml"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="amused" value="0.4" confidence="0.5"/> </emotion> </emma:interpretation> <emma:interpretation id="video1" emma:start="1245790090000" emma:end="1245790100000" emma:mode="video" emma:verbal="false" emma:signal="http://example.com/signals/emo123.mpg" emma:process="http://example.com/video_analysis.xml"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.5" confidence="0.7"/> </emotion> </emma:interpretation> </emma:derivation> <emma:interpretation id="multimodal1" emma:start="1245790094000" emma:end="1245790100000" emma:medium="acoustic visual" emma:mode="voice video"> <emma:derived-from resource="#text1" composite="true"/> <emma:derived-from resource="#voice1" composite="true"/> <emma:derived-from resource="#video1" composite="true"/> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.6" confidence="0.7"/> </emotion> </emma:interpretation> </emma:emma>
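The consolidation step in this example selects the most frequent category among the interpretations. A minimal Python sketch of that selection is shown below; the function name most_frequent_category is hypothetical, and a real system might additionally weight categories by their value or confidence.

```python
from collections import Counter
from typing import Sequence


def most_frequent_category(categories: Sequence[str]) -> str:
    """Return the category that occurs most often; ties go to the first seen."""
    if not categories:
        raise ValueError("at least one interpretation is required")
    return Counter(categories).most_common(1)[0][0]


# Categories of the interpretations "text1", "voice1" and "video1".
consolidated = most_frequent_category(["content", "amused", "content"])
```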
EmotionML could be used with the Speech Synthesis Markup Language (SSML) as follows.
It is possible with [SSML 1.1] to use arbitrary markup belonging to a different namespace anywhere in an SSML document; only SSML processors that support the markup would take it into account. Therefore, it is possible to insert EmotionML below, for example, an <s> element representing a sentence; the intended meaning is that the enclosing sentence should be spoken with the given emotion, in this case a moderately worried tone of voice:
<?xml version="1.0"?>
<speak version="1.1"
    xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:emo="http://www.w3.org/2009/10/emotionml"
    xml:lang="en-US">
  <s>
    <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
      <emo:category name="worried" value="0.4"/>
    </emo:emotion>
    Do you need help?
  </s>
</speak>
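As a rough illustration of how an SSML processor that supports the markup might pick up the emotion of a sentence, the following Python sketch reads the EmotionML children of each <s> element. The helper name `sentence_emotions` is hypothetical, and the sketch makes the simplifying assumption that the text to be spoken is the character data directly inside <s>.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
EMO_NS = "http://www.w3.org/2009/10/emotionml"

SSML_DOC = f"""<speak version="1.1" xmlns="{SSML_NS}" xmlns:emo="{EMO_NS}" xml:lang="en-US">
  <s>
    <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
      <emo:category name="worried" value="0.4"/>
    </emo:emotion>
    Do you need help?
  </s>
</speak>"""


def sentence_emotions(ssml_xml):
    """Yield (sentence text, {category name: value}) for each <s> element."""
    root = ET.fromstring(ssml_xml)
    for s in root.iter(f"{{{SSML_NS}}}s"):
        emotions = {}
        for emotion in s.findall(f"{{{EMO_NS}}}emotion"):
            for cat in emotion.findall(f"{{{EMO_NS}}}category"):
                # A category may omit its value; None then means "no intensity given".
                value = cat.get("value")
                emotions[cat.get("name")] = float(value) if value is not None else None
        # Simplification: use only the character data directly inside <s>,
        # i.e. its leading text plus the tail text after each child element.
        parts = [s.text or ""] + [child.tail or "" for child in s]
        yield " ".join("".join(parts).split()), emotions


for text, emotions in sentence_emotions(SSML_DOC):
    print(text, emotions)
```

A processor that does not support EmotionML would skip the `emo:emotion` subtree and speak the sentence without any emotional coloring.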
Using EmotionML for the use case of generating system behavior requires elements of scheduling and surface form realization which are not part of EmotionML. Necessarily, this use case relies on other languages to provide the needed functionality. This is in line with the aim of EmotionML to serve as a specialized plug-in language.
This example illustrates the idea in terms of a simplified version of a storytelling application. A virtual agent tells a story using voice and facial animation. The expression in face and voice is influenced by the rendering engine in terms of EmotionML. The engine in this example uses SMIL [SMIL] for defining the temporal relation between events; EmotionML is used via SMIL's generic <ref> element. In general, it is the engine that knows how to render the emotion in the virtual agent's expressive capabilities. To override this, the second <emotion> contains an explicit request to realize the emotional expression using both face and voice modalities.
ridinghood.smil:
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head>
  ...
  </head>
  <body>
    <par dur="8s">
      <img src="file:forest.jpg"/>
      <smilText>The little girl was enjoying the walk in the forest.</smilText>
      <ref src="file:ridinghood.emotionml#emotion1"/>
    </par>
    <par dur="5s">
      <img src="file:wolf.jpg"/>
      <smilText>Suddenly a dark shadow appeared in front of her.</smilText>
      <ref src="file:ridinghood.emotionml#emotion2"/>
    </par>
  </body>
</smil>
ridinghood.emotionml:
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
    appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>
  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
    <appraisal name="suddenness" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>
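How a rendering engine might follow a SMIL <ref> into the EmotionML document can be sketched as below. The function names `resolve_emotion` and `requested_modalities` are hypothetical, the document is passed in as a string rather than loaded from the `file:` URI, and matching the fragment against the `id` attribute is an assumption about how such an engine would resolve the reference.

```python
import xml.etree.ElementTree as ET

EMO_NS = "http://www.w3.org/2009/10/emotionml"

EMOTIONML_DOC = f"""<emotionml xmlns="{EMO_NS}"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
    appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>
  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
    <appraisal name="suddenness" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>"""


def resolve_emotion(ref_src, emotionml_xml):
    """Return the <emotion> whose id equals the fragment of a SMIL <ref src>."""
    _document_uri, _sep, fragment = ref_src.partition("#")
    root = ET.fromstring(emotionml_xml)
    for emotion in root.iter(f"{{{EMO_NS}}}emotion"):
        if emotion.get("id") == fragment:
            return emotion
    return None  # no <emotion> with that id


def requested_modalities(emotion):
    """Return the modalities listed in expressed-through, or [] if the engine may choose."""
    return (emotion.get("expressed-through") or "").split()


wolf = resolve_emotion("file:ridinghood.emotionml#emotion2", EMOTIONML_DOC)
print(requested_modalities(wolf))
```

An empty list, as for `emotion1`, leaves the choice of modalities to the engine; a non-empty list corresponds to the explicit request made by the second <emotion>.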
Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.
The authors wish to acknowledge the contributions of all members of the Multimodal Interaction Working Group, the Emotion Markup Language Incubator Group and the Emotion Incubator Group, as well as the participants in the W3C Workshop on EmotionML, in particular the following persons (in alphabetical order):
This section is Normative.
This section defines the formal syntax for EmotionML documents in terms of a normative XML Schema.
The latest version of the XML Schema for Conforming Stand-Alone EmotionML Documents is available at http://www.w3.org/TR/emotionml/emotionml.xsd. The latest version of the XML Schema for Conforming EmotionML Document Fragments is available at http://www.w3.org/TR/emotionml/emotionml-fragments.xsd.
For stability, it is RECOMMENDED that you use the dated URIs available at http://www.w3.org/TR/2014/REC-emotionml-20140522/emotionml.xsd and http://www.w3.org/TR/2014/REC-emotionml-20140522/emotionml-fragments.xsd, respectively.
This section is Normative.
This appendix registers a new MIME media type, "application/emotionml+xml".
The "application/emotionml+xml" media type is registered with IANA at http://www.iana.org/assignments/media-types/application/.
Type name: application
Subtype name: emotionml+xml
Required parameters: None.
Optional parameters: charset. This parameter has identical semantics to the charset parameter of the application/xml media type as specified in [RFC 3023] or its successor.
Encoding considerations: By virtue of EmotionML content being XML, it has the same considerations when sent as "application/emotionml+xml" as does XML. See RFC 3023 (or its successor), section 3.2.
Security considerations: EmotionML elements may include arbitrary URIs. Therefore the security issues of [RFC 3986], section 7, should be considered.
In addition, because of the extensibility features for EmotionML, it is possible that "application/emotionml+xml" will describe content that has security implications beyond those described here. However, if the processor follows only the normative semantics of this specification, this content will be ignored. Only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.
Interoperability considerations: This specification describes processing semantics that dictate the required behavior for dealing with, among other things, unrecognized elements.
Because EmotionML is extensible, conformant "application/emotionml+xml" processors MAY expect that content received is well-formed XML, but processors SHOULD NOT assume that the content is valid EmotionML or expect to recognize all of the elements and attributes in the document.
Published specification: This media type registration is extracted from Appendix B of the "Emotion Markup Language (EmotionML) 1.0" specification.
Additional information:
Magic number(s): There is no single initial octet sequence that is always present in EmotionML documents.
File extension(s): EmotionML documents are most often identified with the extension ".emotionml".
Macintosh file type code(s): TEXT
Person & email address to contact for further information: Kazuyuki Ashimura, <[email protected]>.
Intended usage: COMMON
Restrictions on usage: None.
Author: The EmotionML specification is a work product of the World Wide Web Consortium's Multimodal Interaction Working Group.
Change controller: The W3C has change control over these specifications.
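The interoperability guidance in this registration (expect well-formed XML, but do not assume validity or full recognition) could be followed by a consumer along the lines of the Python sketch below. The function name `read_descriptors` and the set `KNOWN_DESCRIPTORS` are hypothetical, and rejecting a payload that is not well-formed is one possible policy rather than behavior mandated here.

```python
import xml.etree.ElementTree as ET

EMO_NS = "http://www.w3.org/2009/10/emotionml"
# Emotion descriptors named in this specification; anything else is skipped.
KNOWN_DESCRIPTORS = {"category", "dimension", "appraisal", "action-tendency"}


def read_descriptors(payload):
    """Parse an application/emotionml+xml payload, skipping unrecognized content."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return None  # not well-formed XML: reject the payload as a whole
    found = []
    for emotion in root.iter(f"{{{EMO_NS}}}emotion"):
        for child in emotion:
            # ElementTree tags look like "{namespace}local"; split them apart.
            namespace, _sep, local = child.tag.rpartition("}")
            if namespace == "{" + EMO_NS and local in KNOWN_DESCRIPTORS:
                found.append((local, child.get("name"), child.get("value")))
            # Otherwise: unrecognized element, ignored rather than treated as an error.
    return found


payload = (
    f'<emotionml xmlns="{EMO_NS}" xmlns:x="http://example.com/ext">'
    '<emotion><category name="afraid" value="0.9"/><x:custom foo="bar"/></emotion>'
    "</emotionml>"
)
print(read_descriptors(payload))
```

Because the extension element in the `http://example.com/ext` namespace is never processed, this consumer stays within the case described under the security considerations in which additional content is simply ignored.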
For documents labeled as "application/emotionml+xml", the fragment identifier notation is exactly that for "application/xml", as specified in RFC 3023.
This section is informative.
This section summarizes the changes since the Proposed Recommendation of 16 April 2013.
<emotion> element
duration and end attributes
Note that we fixed several errors and typos in the Vocabularies for EmotionML Working Group Note based on the recent public comments and republished the document.
This section summarizes the changes since the Candidate Recommendation of 10 May 2012.
This section summarizes the changes since the Last Call Working Draft of 07 April 2011.
category-set etc. The term is defined in the glossary and is used in the attribute occurrence definitions for <emotionml> and <emotion> as well as the emotion representations <category>, <dimension>, <appraisal>, and <action-tendency>;
<emotionml> and <emotion> may contain arbitrary text;
version, value, <trace>, confidence, <info>, freq;
The <category> element was harmonized with the other emotion descriptors to allow a value attribute or a <trace> child element indicating the intensity of that category. Multiple <category> elements are now allowed within a single <emotion> to reflect the possible co-presence of these categories. The <intensity> element was removed since the usual use is now covered by the value attribute in <category>.
The value attribute or the <trace> child element was made optional for <appraisal> and <action-tendency> elements, in order to allow for the possibility to merely represent the fact that a certain appraisal or action tendency is present, irrespective of its intensity.
duration and relative timestamps was added.
<info> element for representing metadata.
emma:mode attribute in [EMMA], the modality attribute was renamed to expressed-through.
<dimension>, <appraisal> and <action-tendency> elements with a name attribute.
start and end attributes to represent absolute time, and Media Fragment URIs to refer to portions of media files.
<info> element, in synchrony with EMMA.
The <link> element was renamed to <reference> to avoid a name clash with the <link> element in HTML, which has a different scope and syntax.