Emotion Markup Language (EmotionML) 1.0

W3C Recommendation 22 May 2014

This version: http://www.w3.org/TR/2014/REC-emotionml-20140522/
Latest version: http://www.w3.org/TR/emotionml/
Previous version: http://www.w3.org/TR/2013/PR-emotionml-20130416/

Editors: Felix Burkhardt (Deutsche Telekom AG); Marc Schröder (until July 2012, while with DFKI GmbH)
Authors (in alphabetic order): Paolo Baggia (while at Loquendo, currently Nuance Communications); Catherine Pelachaud (Telecom ParisTech); Christian Peter (Fraunhofer Gesellschaft); Enrico Zovato (while at Loquendo, currently Nuance Communications)

Please refer to the errata for this document, which may include normative corrections.

Abstract

As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the Recommendation of "Emotion Markup Language 1.0". It has been published by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.

Comments are welcome at [email protected] (archive). See W3C mailing list and archive usage guidelines.

This specification has been widely reviewed (see the Candidate Recommendation Disposition of Comments and the Last Call Working Draft Disposition of Comments) and satisfies the Working Group's technical requirements. A list of implementations is included in the EmotionML Implementation Report. The Working Group made several editorial changes to the 16 April 2013 Proposed Recommendation. Changes from the Proposed Recommendation can be found in Appendix C. There are no substantial changes since the Proposed Recommendation.

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

For more information about the Multimodal Interaction Activity, please see the Multimodal Interaction Activity statement.

Conventions of this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly. The examples in the main body are also informative.

1 Introduction

This section is informative.

Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.

This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.

1.1 Reasons for defining an Emotion Markup Language

As for any standard format, the first and main goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.

Use cases for EmotionML can be grouped into three broad types:

Manual annotation of material involving emotionality, such as annotation of videos, of speech recordings, of faces, of texts, etc.;
Automatic recognition of emotions from sensors, including physiological sensors, speech recordings, facial expressions, etc., as well as from multi-modal combinations of sensors;
Generation of emotion-related system responses, which may involve reasoning about the emotional implications of events, emotional prosody in synthetic speech, facial expressions and gestures of embodied agents or robots, the choice of music and colors of lighting in a room, etc.

Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modeling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.

Concrete examples of existing technology that could apply EmotionML include:

Opinion mining / sentiment analysis in Web 2.0, to automatically track customers' attitudes regarding a product across blogs;
Affective monitoring, such as ambient assisted living applications for the elderly, fear detection for surveillance purposes, or using wearable sensors to test customer satisfaction;
Character design and control for games and virtual worlds;
Social robots, such as guide robots engaging with visitors;
Expressive speech synthesis, generating synthetic speech with different emotions, such as happy or sad, friendly or apologetic; expressive synthetic speech would for example make more information available to blind and partially sighted people, and enrich their experience of the content;
Emotion recognition (e.g., for spotting angry customers in speech dialog systems);
Support for people with disabilities, such as educational programs for people with autism. EmotionML can be used to make the emotional intent of content explicit. This would enable people with learning disabilities (such as Asperger's Syndrome) to realize the emotional context of the content;
EmotionML can be used for media transcripts and captions. Where emotions are marked up to help deaf or hearing impaired people who cannot hear the soundtrack, more information is made available to enrich their experience of the content.

The Emotion Incubator Group has listed 39 individual use cases for an EmotionML.

A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to repeat mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made regardless of whether these states are the most appropriate for the intended application. Crucially, the alternatives that have been developed in the affective science literature are not sufficiently well known, resulting in dead-end situations after the initial steps of work. Careful consideration of the states to study and of the representations for describing them can help avoid such situations.

EmotionML makes scientific concepts of emotions practically applicable. This can help potential users to identify the suitable representations for their respective applications.

1.2 The challenge of defining a generally usable Emotion Markup Language

Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. More fundamentally, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused on. In short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.

Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, while allowing users to "plug in" vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification to provide a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their custom vocabularies.

An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.

For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.

1.3 Glossary of terms

The terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.

Action tendency: Emotions have a strong influence on the motivational state of a subject. Emotion theory associates emotions with a small set of so-called action tendencies, e.g., avoidance (related to fear), rejecting (related to disgust), etc. Action tendencies can be viewed as a link between the outcome of an appraisal process and actual actions.
Affect / Affective state: In the scientific literature, the term "affect" is often used as a general term covering a range of phenomena called "affective states", including emotions, moods, attitudes, etc. Proponents of the term consider it to be more generic than "emotion", in the sense that it covers both acute and long-term, specific and unspecific states. In this report, the term "affect" is avoided so that the scope of the intended markup language is more easily accessible to the non-expert; the term "affective state" is used interchangeably with "emotion-related state".
Appraisal: The term "appraisal" is used in the scientific literature to describe the evaluation process leading to an emotional response. Triggered by an "emotion-eliciting event", an individual carries out an automatic, subjective assessment of the event, in order to determine the relevance of the event to the individual. This assessment is carried out along a number of "appraisal dimensions" such as the novelty, pleasantness or goal conduciveness of the event.
Attitude: In psychology, "attitude" is related to the global evaluation of an object, such as a person, an object, oneself, etc. Attitude is considered to include an emotional component, as well as cognition and behavior. However, the term "attitude" is sometimes used with slightly different meanings, such as speaking style ("he said it with a certain attitude") or more generally personal lifestyle ("she has quite an attitude"). Because of this ambiguity, this specification avoids the term.
Declared vocabulary: The emotion vocabulary for emotion representations declared in a given EmotionML document context. There are four types of vocabulary: category vocabularies for category-based representations, dimension vocabularies for dimension-based representations, appraisal vocabularies for appraisal-based representations, and action tendency vocabularies for action tendency-based representations. Only vocabulary items defined in the declared vocabulary of the given type can be used for representations of that type. This is to ensure that annotations are well-defined by making explicit the vocabulary from which they are taken.
Emotion: In this report, the term "emotion" is used in a very broad sense, covering both intense and weak states, short and long term, with and without event focus. This meaning is intended to reflect the understanding of the term "emotion" by the general public. In the scientific literature on emotion theories, the term "emotion" or "fullblown emotion" refers to intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight". This reading of the term seems inappropriate for the vast majority of human-machine interaction contexts, in which more subtle states dominate; therefore, where this reading is intended, the term "fullblown emotion" is used in this report.
Emotion-related state: A cover term for the broad range of phenomena intended to be covered by this specification. In the scientific literature, several kinds of emotion-related or affective states are distinguished, see Emotions and related states in the final report of the Emotion Incubator Group.
Emotion dimensions: A small number of continuous scales describing the most basic properties of an emotion. Often three dimensions are used: valence (sometimes named pleasure), arousal (or activity/activation), and potency (sometimes called control, power or dominance). However, sometimes two, or more than three, dimensions are used.
Fullblown emotion: Intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight".

2 Elements of Emotion Markup

The following sections describe the syntax of the main elements of EmotionML.

2.1 Document structure

2.1.1 Document root: The `<emotionml>` element

Annotation	`<emotionml>`
Definition	The root element of an EmotionML document.
Children	The element MAY contain one or more `<emotion>` elements. It MAY contain a single `<info>` element. It MAY contain one or more `<vocabulary>` elements.
Attributes	Required: Namespace declaration for EmotionML, see EmotionML namespace. `version` indicates the version of the specification to be used for the document. Documents using this specification MUST use `1.0` for the value. Optional: `category-set` declares a global category vocabulary (see also `<category>`). The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="category"`, as specified in Defining vocabularies for representing emotions. `dimension-set` declares a global dimension vocabulary (see also `<dimension>`). The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="dimension"`, as specified in Defining vocabularies for representing emotions. `appraisal-set` declares a global appraisal vocabulary (see also `<appraisal>`). The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="appraisal"`, as specified in Defining vocabularies for representing emotions. `action-tendency-set` declares a global action tendency vocabulary (see also `<action-tendency>`). The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="action-tendency"`, as specified in Defining vocabularies for representing emotions.
Occurrence	This is the root element -- it cannot occur as a child of any other EmotionML element.

The root element of a standalone EmotionML document MUST be <emotionml>. It MAY contain a single <info> element, providing document-level metadata.

The <emotionml> element MUST define the EmotionML namespace: 'http://www.w3.org/2009/10/emotionml'.

The <emotionml> element MAY contain arbitrary plain text. See annotation of text as an example.

Standalone EmotionML documents usually serve one or both of the following two purposes:

to wrap a number of <emotion> elements into a single document;
to define emotion vocabularies for use with <emotion> annotations in the same or other documents.

Example:

<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
 ...
</emotionml>

<emo:emotionml version="1.0" xmlns:emo="http://www.w3.org/2009/10/emotionml">
 ...
</emo:emotionml>
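
As an informative illustration of the root-element requirements above, the following Python sketch checks the element name, the namespace and the `version` value using only the standard library. The function name `check_root` and the wording of the reported problems are assumptions made for this example; they are not part of EmotionML. The sketch is not a substitute for validation against the schema.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"


def check_root(xml_text):
    """Return a list of problems found on the document root (empty if none).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # ElementTree expands qualified names to "{namespace}local-name", so a
    # default namespace and a prefix such as "emo:" look identical here.
    if root.tag != f"{{{EMOTIONML_NS}}}emotionml":
        problems.append(f"unexpected root element: {root.tag}")
    if root.get("version") != "1.0":
        problems.append("version attribute must be '1.0'")
    return problems


# The two forms from the examples above, plus a root without the namespace.
default_ns = f'<emotionml version="1.0" xmlns="{EMOTIONML_NS}"/>'
prefixed = f'<emo:emotionml version="1.0" xmlns:emo="{EMOTIONML_NS}"/>'
no_ns = '<emotionml version="1.0"/>'

for text in (default_ns, prefixed, no_ns):
    print(check_root(text))
```

Both namespace forms shown in the examples are treated alike, because the check compares the expanded element name rather than the prefix.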

Note: EmotionML is also envisaged for use within other markup languages. In such cases, there will be no <emotionml> root element, but <emotion> elements will be used directly in other markup -- see Examples of possible use with other markup languages.

2.1.2 A single emotion annotation: The `<emotion>` element

Annotation	`<emotion>`
Definition	This element represents a single emotion annotation.
Children	All children are optional. However, the `<emotion>` element MUST contain at least one `<category>` or `<dimension>` or `<appraisal>` or `<action-tendency>` element. If present, the following child element can occur only once: `<info>`. If present, the following child elements may occur one or more times: `<category>`; `<dimension>`; `<appraisal>`; `<action-tendency>`; `<reference>`. Note that the `<emotion>` element MUST NOT contain two of the aforementioned child elements that belong to different sets of one kind, or two child elements with the same name. It is NOT possible to mix elements from different `category-set`, `dimension-set`, `appraisal-set` or `action-tendency-set` vocabularies. There are no constraints on the combinations of children that are allowed. There are no constraints on the order in which children occur. Note that an `<emotion>` element MAY contain arbitrary text, as shown in the text annotation example.
Attributes	Optional: `version` indicates the version of the specification to be used for the `<emotion>` and its descendants. Documents using this specification MUST use `1.0` for the value. The value of the `version` attribute defaults to "1.0". `id`, a unique identifier for the emotion, of type `xsd:ID`. `category-set` declares a local category vocabulary (see also `<category>`) for the current `<emotion>` element. The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="category"`, as specified in Defining vocabularies for representing emotions. `dimension-set` declares a local dimension vocabulary (see also `<dimension>`) for the current `<emotion>` element. The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="dimension"`, as specified in Defining vocabularies for representing emotions. `appraisal-set` declares a local appraisal vocabulary (see also `<appraisal>`) for the current `<emotion>` element. The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="appraisal"`, as specified in Defining vocabularies for representing emotions. `action-tendency-set` declares a local action tendency vocabulary (see also `<action-tendency>`) for the current `<emotion>` element. The attribute MUST be of type `xsd:anyURI` and MUST refer to the ID of a `<vocabulary>` element defining an emotion vocabulary with `type="action-tendency"`, as specified in Defining vocabularies for representing emotions. `start`, `end`, `duration`, `time-ref-uri`, `time-ref-anchor-point` and `offset-to-start` provide information about the times at which an emotion happened, as defined in Timestamps. `expressed-through`, the modality, or list of modalities, through which the emotion is expressed.
Occurrence	as a child of `<emotionml>`, or in any markup using EmotionML.

The <emotion> element represents an individual emotion annotation. No matter how simple or complex its substructure is, it represents a single statement about the emotional content of some annotated item. Where several statements about the emotion in a certain context are to be made, several <emotion> elements MUST be used. See Examples of emotion annotation for illustrations of this issue.
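
The following informative Python sketch illustrates two points made above: separate statements are written as separate <emotion> elements, and each <emotion> needs at least one representation child. The helper name `has_representation` is an assumption of this example, and the third <emotion> in the sample document is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"
REPRESENTATIONS = ("category", "dimension", "appraisal", "action-tendency")


def has_representation(emotion):
    """True if the <emotion> has at least one representation child.

    Hypothetical helper for illustration only.
    """
    return any(
        emotion.find(f"{{{NS}}}{name}") is not None for name in REPRESENTATIONS
    )


# Two separate statements are expressed as two <emotion> elements.
# The third <emotion> carries only <info> and is therefore incomplete.
doc = ET.fromstring(f"""
<emotionml version="1.0" xmlns="{NS}"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion><category name="anger"/></emotion>
  <emotion><category name="sadness"/></emotion>
  <emotion><info/></emotion>
</emotionml>
""")

results = [has_representation(e) for e in doc.findall(f"{{{NS}}}emotion")]
print(results)
```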

An <emotion> element MAY have an id attribute, allowing for a unique reference to the individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency> or their children directly. For this reason, these elements do not allow for an id attribute.

The <emotion> element MAY contain arbitrary plain text. See annotation of text as an example.

NOTE: For each type of vocabulary (category, dimension, appraisal and action tendency), only one declared vocabulary of that type can exist within a given document subtree, with local declaration taking precedence over global declaration. For example, if both a global category vocabulary (on <emotionml>) and a local category vocabulary (on <emotion>) are declared, the local one "hides" the global one, so that the local category vocabulary is the only declared category vocabulary within the given <emotion> element.
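
The precedence rule in the note above can be sketched as a simple lookup: consult the <emotion> element first and fall back to the <emotionml> root. This is an informative illustration only; the function name `declared_vocabulary` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"


def declared_vocabulary(root, emotion, kind):
    """Return the vocabulary URI of the given kind in effect for an <emotion>.

    kind is one of "category", "dimension", "appraisal", "action-tendency".
    A local declaration on <emotion> hides a global one on <emotionml>.
    Returns None if no vocabulary of that kind is declared.
    Hypothetical helper for illustration only.
    """
    attr = f"{kind}-set"
    return emotion.get(attr) or root.get(attr)


root = ET.fromstring(f"""
<emotionml version="1.0" xmlns="{NS}"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion><category name="anger"/></emotion>
  <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="satisfied"/>
  </emotion>
</emotionml>
""")

first, second = root.findall(f"{{{NS}}}emotion")
print(declared_vocabulary(root, first, "category"))   # global declaration applies
print(declared_vocabulary(root, second, "category"))  # local declaration hides it
```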

Whereas it is possible to use <emotion> elements in a standalone <emotionml> document, a typical use case is expected to be embedding an <emotion> into some other markup -- see Examples of possible use with other markup languages.

2.2 Representations of emotions and related states

2.2.1 The `<category>` element

Annotation	`<category>`
Definition	Description of an emotion or a related state using a category.
Children	`<trace>`: A `<category>` MAY contain either a `value` attribute or a `<trace>` element.
Attributes	Required: `name`, the name of the category, which MUST be contained in the declared category vocabulary (see below). Optional: `value`: A `<category>` MAY contain either a `value` attribute or a `<trace>` element. `confidence`, the annotator's confidence that the annotation given for this category is correct.
Occurrence	One or more `<category>` elements MAY occur as a child of `<emotion>`. For any given category name in the set, zero or one occurrence is allowed within an `<emotion>` element, i.e. a category with name "x" MUST NOT appear twice in one `<emotion>` element.

<category> describes an emotion or a related state in terms of a category name, given as the value of the name attribute.

If the <category> element is used, a category vocabulary MUST be declared (see <emotion> and <emotionml>), and the category name as given in the name attribute MUST be an item in the declared category vocabulary.

Different declared category vocabularies can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.

The intensity of an emotion category MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. A <category> MUST NOT contain both a value attribute and a <trace> element.

Examples:

In the following example, the emotion category "satisfied" is being annotated; it must be contained in the definition of the emotion category vocabulary located at http://www.w3.org/TR/emotion-voc/xml#everyday-categories, which is one of the category vocabularies provided in [Vocabularies for EmotionML].

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="satisfied"/>
</emotion>

The following is an annotation of an interpersonal stance "distant" which must be defined in the custom category set at the URI given in the category-set attribute:

<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml#voc">
    <category name="distant"/>
</emotion>

In the following example, an emotion is described by several categories, each being present with different values of intensity. The category set used is the "big six" set described in [Vocabularies for EmotionML].

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="sadness" value="0.3"/>
    <category name="anger" value="0.8"/>
    <category name="fear" value="0.3"/>
</emotion>
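
The two constraints on category names (membership in the declared vocabulary, and at most one occurrence per <emotion>) can be checked as in the following informative Python sketch. It assumes that the items of the "big six" vocabulary have been copied by hand into the set `BIG6`; a real processor would read them from the `<vocabulary>` definition referenced by `category-set`. The helper name `check_categories` is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "http://www.w3.org/2009/10/emotionml"

# Assumption: vocabulary items copied by hand for this sketch, not fetched
# from the vocabulary definition that category-set points to.
BIG6 = {"anger", "disgust", "fear", "happiness", "sadness", "surprise"}


def check_categories(emotion, vocabulary):
    """Return a list of problems with the <category> children of an <emotion>."""
    names = [c.get("name") for c in emotion.findall(f"{{{NS}}}category")]
    problems = [
        f"'{n}' is not in the declared vocabulary"
        for n in names
        if n not in vocabulary
    ]
    problems += [
        f"'{n}' appears {count} times"
        for n, count in Counter(names).items()
        if count > 1
    ]
    return problems


good = ET.fromstring(f"""
<emotion xmlns="{NS}" category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="sadness" value="0.3"/>
    <category name="anger" value="0.8"/>
    <category name="fear" value="0.3"/>
</emotion>
""")

bad = ET.fromstring(f"""
<emotion xmlns="{NS}" category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="anger"/>
    <category name="anger"/>
    <category name="satisfied"/>
</emotion>
""")

print(check_categories(good, BIG6))
print(check_categories(bad, BIG6))
```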

2.2.2 The `<dimension>` element

Annotation	`<dimension>`
Definition	One or more `<dimension>` elements jointly describe an emotion or a related state according to an emotion dimension vocabulary.
Children	`<trace>`: A `<dimension>` MUST contain either a `value` attribute or a `<trace>` element.
Attributes	Required: `name`, the name of the dimension, which MUST be contained in the declared dimension vocabulary (see below). `value`: A `<dimension>` MUST contain either a `value` attribute or a `<trace>` element. Optional: `confidence`, the annotator's confidence that the annotation given for this dimension is correct.
Occurrence	`<dimension>` elements occur as children of `<emotion>`. For any given dimension name in the set, zero or one occurrence is allowed within an `<emotion>` element, i.e. a dimension with name "x" MUST NOT appear twice in one `<emotion>` element.

One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions.

If the <dimension> element is used, a dimension vocabulary MUST be declared (see <emotion> and <emotionml>), and the dimension name as given in the name attribute MUST be an item in the declared dimension vocabulary. Different declared dimension vocabularies can be used, depending on the requirements of the use case.

The position on an emotion dimension MUST be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. A <dimension> MUST NOT contain both a value attribute and a <trace> element.

Examples:

One of the most widespread sets of emotion dimensions used (sometimes by different names) is the combination of valence, arousal and potency. The following example is a state of rather low arousal, very positive valence, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation. The example uses the Pleasure-Arousal-Dominance (PAD) vocabulary from [Vocabularies for EmotionML]:

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal -->
    <dimension name="pleasure" value="0.9"/><!-- very high positive valence -->
    <dimension name="dominance" value="0.8"/><!-- relatively high potency    -->
</emotion>

In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single dimension "friendliness".

<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml#voc">
    <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person -->
</emotion>

The usual way to represent the intensity of an emotion would be the value attribute of a <category>. However, if only the intensity of an emotion is annotated, but not its nature, this can be done by using an "intensity" dimension. Thus, an emotional state's "strength" or "intensity" can be described independently from categorical or dimensional descriptions, as shown by the following example.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <dimension name="intensity" value="0.2"/><!-- not in a strong emotional state -->
</emotion>
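
The rule that a `value` attribute and a <trace> child exclude each other, and that one of the two is required for <dimension> but optional for <category>, can be sketched as follows. This is an informative illustration; the helper name `check_value_or_trace` and its `required` flag are assumptions of this example, and the `freq` and `samples` attributes of <trace> are defined elsewhere in this specification.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"


def check_value_or_trace(element, required):
    """Return a problem description, or None if the element is acceptable.

    required=True models <dimension>, which needs exactly one of the two;
    required=False models elements where both may be absent.
    Hypothetical helper for illustration only.
    """
    has_value = element.get("value") is not None
    has_trace = element.find(f"{{{NS}}}trace") is not None
    if has_value and has_trace:
        return "both value and <trace> present"
    if required and not (has_value or has_trace):
        return "neither value nor <trace> present"
    return None


static = ET.fromstring(f'<dimension xmlns="{NS}" name="arousal" value="0.3"/>')
dynamic = ET.fromstring(
    f'<dimension xmlns="{NS}" name="arousal">'
    '<trace freq="10Hz" samples="0.1 0.5 0.9"/></dimension>'
)
empty = ET.fromstring(f'<dimension xmlns="{NS}" name="arousal"/>')
both = ET.fromstring(
    f'<dimension xmlns="{NS}" name="arousal" value="0.3">'
    '<trace freq="10Hz" samples="0.1 0.5 0.9"/></dimension>'
)
bare_category = ET.fromstring(f'<category xmlns="{NS}" name="anger"/>')

for element in (static, dynamic, empty, both):
    print(check_value_or_trace(element, required=True))
print(check_value_or_trace(bare_category, required=False))
```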

2.2.3 The `<appraisal>` element

Annotation	`<appraisal>`
Definition	One or more `<appraisal>` elements jointly describe an emotion or a related state according to an emotion appraisal vocabulary.
Children	`<trace>`: An `<appraisal>` MAY contain either a `value` attribute or a `<trace>` element.
Attributes	Required: `name`, the name of the appraisal, which MUST be contained in the declared appraisal vocabulary (see below). Optional: `value`: An `<appraisal>` MAY contain either a `value` attribute or a `<trace>` element. `confidence`, the annotator's confidence that the annotation given for this appraisal is correct.
Occurrence	`<appraisal>` elements occur as children of `<emotion>`. For any given appraisal name in the set, zero or one occurrence is allowed within an `<emotion>` element, i.e. an appraisal with name "x" MUST NOT appear twice in one `<emotion>` element.

One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals.

If the <appraisal> element is used, an appraisal vocabulary MUST be declared (see <emotion> and <emotionml>), and the appraisal name as given in the name attribute MUST be an item in the declared appraisal vocabulary. Different declared appraisal vocabularies can be used, depending on the requirements of the use case.

The degree to which an appraisal is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. An <appraisal> MUST NOT contain both a value attribute and a <trace> element.

Examples:

One of the most widespread sets of emotion appraisals used is the appraisals set proposed by Klaus Scherer, covering aspects of novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Using Scherer's appraisals from [Vocabularies for EmotionML], the following example is a state arising from the evaluation of an unpredicted and quite unpleasant event:

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
    <appraisal name="suddenness" value="0.8"/>
    <appraisal name="intrinsic-pleasantness" value="0.2"/>
</emotion>

In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining the single appraisal "likelihood".

<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml#voc">
    <appraisal name="likelihood" value="0.8"/><!-- a very predictable event -->
</emotion>

2.2.4 The `<action-tendency>` element

Annotation	`<action-tendency>`
Definition	One or more `<action-tendency>` elements jointly describe an emotion or a related state according to an emotion action tendency vocabulary.
Children	`<trace>`: An `<action-tendency>` MAY contain either a `value` attribute or a `<trace>` element.
Attributes	Required: `name`, the name of the action tendency, which MUST be contained in the declared action tendency vocabulary (see below). Optional: `value`: An `<action-tendency>` MAY contain either a `value` attribute or a `<trace>` element. `confidence`, the annotator's confidence that the annotation given for this action tendency is correct.
Occurrence	`<action-tendency>` elements occur as children of `<emotion>`. For any given action tendency name in the set, zero or one occurrence is allowed within an `<emotion>` element, i.e. an action tendency with name "x" MUST NOT appear twice in one `<emotion>` element.

One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action tendencies.

If the <action-tendency> element is used, an action tendency vocabulary MUST be declared (see <emotion> and <emotionml>), and the action tendency name as given in the name attribute MUST be an item in the declared action tendency vocabulary. Different declared action tendency vocabularies can be used, depending on the requirements of the use case.

The degree to which an action tendency is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element. An <action-tendency> MUST NOT contain both a value attribute and a <trace> element.

Examples:

A well-known account of action tendencies is that of N. Frijda. His model uses a number of action tendencies that are low-level, diffuse behaviors from which more concrete actions could be determined. The corresponding vocabulary is provided in [Vocabularies for EmotionML]. An example of someone attempting to attract someone they like by being confident, strong and attentive might look like this:

<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies">
    <action-tendency name="approach" value="0.7"/>   <!-- get close -->
    <action-tendency name="being-with" value="0.8"/> <!-- be happy -->
    <action-tendency name="attending" value="0.7"/>  <!-- pay attention -->
    <action-tendency name="dominating" value="0.7"/> <!-- be assertive -->
</emotion>

In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a factory robot, using a custom set of action tendencies that defines example actions for the robot. In the example, the robot's battery is very low, so it needs to get ready to recharge and to stop its work of picking up boxes.

<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml#voc">
    <action-tendency name="charge-battery" value="0.9"/> <!-- need to charge battery soon -->
    <action-tendency name="pickup-boxes" value="0.3"/>   <!-- feeling tired, avoid work -->
</emotion>

2.3 Meta-information

2.3.1 The `confidence` attribute

Annotation	`confidence`
Definition	The degree of confidence or probability that the emotion representation carrying this attribute is correct. The value of the `confidence` attribute MUST be a floating point number in the closed interval [0, 1].
Occurrence	An optional attribute of `<category>`, `<dimension>`, `<appraisal>` and `<action-tendency>` elements.

Confidence MAY be indicated separately for each of the Representations of emotions and related states. For example, the confidence that the <category> is assumed correctly is independent from the confidence that the position on a dimension is correctly indicated.

Following statistical tradition, confidence is given on an interval from 0 to 1, resembling a probability. In this respect, confidence is a Scale value.

Examples:

In the following, one simple example is provided for each element that can carry a confidence attribute.

The first example indicates a very high confidence that surprise is the emotion to annotate.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="surprise" confidence="0.95"/> 
</emotion>

The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive pleasure may or may not be correct.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <dimension name="arousal" value="0.8" confidence="0.9"/>
    <dimension name="pleasure" value="0.6" confidence="0.3"/>
</emotion>

Finally, an example for the case of intensity: A high confidence is given that the emotion has a low intensity.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>

An emotion annotation can combine the representations above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <category name="bored" confidence="0.1"/>
    <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>

2.3.2 The `expressed-through` attribute

Annotation	`expressed-through`
Definition	The modality, or list of modalities, through which the emotion is expressed. An attribute of type `xsd:NMTOKENS` which contains a space delimited set of values from an open set of values including: {`gaze`, `face`, `head`, `torso`, `gesture`, `leg`, `voice`, `text`, `locomotion`, `posture`, `physiology`, ...}.
Occurrence	An optional attribute of `<emotion>` elements.

The expressed-through attribute describes the modality through which an emotion is produced, usually by a human being. It is not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The expressed-through attribute is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.

The list of values provided covers a broad range of modalities through which emotions may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.
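
As a rough illustration of how a consumer might handle the open set, the following Python sketch splits an expressed-through value into the modalities listed above and any additional, finer-grained ones. The function name split_modalities is hypothetical; the value "eyes" is taken from the paragraph above as an example of a custom modality.

```python
# Modality values listed in the definition of expressed-through (an open set).
SUGGESTED_MODALITIES = {
    "gaze", "face", "head", "torso", "gesture", "leg",
    "voice", "text", "locomotion", "posture", "physiology",
}


def split_modalities(expressed_through):
    """Split an expressed-through value into (suggested, custom) modality lists."""
    tokens = expressed_through.split()  # xsd:NMTOKENS is whitespace-separated
    suggested = [t for t in tokens if t in SUGGESTED_MODALITIES]
    custom = [t for t in tokens if t not in SUGGESTED_MODALITIES]
    return suggested, custom


print(split_modalities("face voice"))
print(split_modalities("voice eyes"))
```

Custom values are reported rather than rejected, because the list is explicitly open.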

The expressed-through attribute is not specific about the sensors used for observing the modality. These can be specified using the <info> element, or by the emma:mode attribute in an enclosing [EMMA] document.

Example:

In the following example the emotion is expressed through the voice.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" 
         expressed-through="voice">
    <category name="satisfied"/>
</emotion>

For multimodal expression of an emotion, a list of space-separated modalities can be given in the expressed-through attribute, as in the following example, which uses the two values "face" and "voice".

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" 
         expressed-through="face voice">
    <category name="satisfied"/>
</emotion>

2.3.3 The `<info>` element

Annotation	`<info>`
Definition	This element can be used to annotate arbitrary metadata.
Children	The `<info>` element MAY contain arbitrary plain text as well as any elements with a namespace different from the EmotionML namespace, 'http://www.w3.org/2009/10/emotionml'. It MUST NOT contain any elements in the EmotionML namespace.
Attributes	Optional: `id`, a unique identifier for the info element, of type `xsd:ID`.
Occurrence	A single `<info>` element MAY occur as a child of the `<emotionml>` root tag to indicate global metadata, i.e. the annotations are valid for the document scope; an `<emotion>` element to indicate local metadata that is only valid for that `<emotion>` element; a `<vocabulary>` element, providing arbitrary metadata about the vocabulary itself; an `<item>` element, providing arbitrary metadata about the vocabulary item.

This element can contain arbitrary XML data in a different namespace (one option could be [RDF] data), either on a document global level or on a local "per annotation element" level.

Several initiatives for standardizing metadata exist, such as [IMDI] and [CLARIN]. Metadata may cover a wide range of information, such as: location description (continent, country, address), content type (e.g., genre, task, modalities), session (title, recording date, group of participants); each participant may be described by her role in the session (e.g. annotator, filmer), her name, her social family role, etc.

Examples:

In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
        xmlns:classifiers="http://www.example.com/meta/classify/"
        xmlns:origin="http://www.example.com/meta/local/"
        category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
        version="1.0">
    <info>
        <classifiers:classifier classifiers:name="GMM"/>
    </info>

    <emotion>
        <info><origin:localization value="bavarian"/></info>
        <category name="happiness"/>
    </emotion>

    <emotion>
        <info><origin:localization value="swabian"/></info>
        <category name="sadness"/>
    </emotion>
</emotionml>

The following example uses the IMDI metadata language to represent information about the annotator who produced the emotion annotation in the current document, in a global <info> element.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"  
           xmlns:imdi="http://www.mpi.nl/IMDI/Schema/IMDI"
           version="1.0">
<info>
  <imdi:Actors>
      <imdi:Actor>
           <imdi:Role>Annotator</imdi:Role>
           <imdi:Name>John</imdi:Name>
           <imdi:FullName>John Smith Junior</imdi:FullName>
           <imdi:Code>JS</imdi:Code>
           <imdi:FamilySocialRole>Teacher</imdi:FamilySocialRole>
          ...
      </imdi:Actor>
  </imdi:Actors>
</info>
 ...
<emotion>...</emotion>
<emotion>...</emotion>
</emotionml>

The following example illustrates how <info> can be used for annotating information on sensors through which an affective signal has been detected. In the global <info> section, the sensors used in the particular scenario are specified. Apart from their ID, information on the modality observed by this sensor is provided as well as information on the confidence for that sensor. In this example, the modality "posture" is observed by a camera and a chair equipped with pressure sensors. For some reason it is decided that emotion estimates based on camera data should be trusted more than those based on chair data. Within the <emotion> elements, <info> is used to specify which sensor has been used to calculate the actual emotion value.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    xmlns:sensors="http://www.example.com/meta/sensors/"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
    version="1.0">

  <info>
    <sensors:sensor id="camera1" confidence="0.9" expressed-through="posture"/>
    <sensors:sensor id="chair" confidence="0.3" expressed-through="posture"/>
    ...
  </info>

  <emotion expressed-through="posture">
    <info>
      <sensors:sensor idref="camera1"/>
    </info>
    <category name="angry"/>
  </emotion>

  <emotion expressed-through="posture">
    <info>
      <sensors:sensor idref="chair"/>
    </info>
    <category name="neutral"/>
  </emotion>

</emotionml>

2.4 References and time

2.4.1 The `<reference>` element

Annotation	`<reference>`
Definition	References may be used to relate the emotion annotation to the "rest of the world", more specifically to the emotional expression, the experiencing subject, the trigger, and the target of the emotion.
Children	None
Attributes	Required: `uri`, a URI identifying the actual reference target. The attribute MUST be of type `xsd:anyURI`. The URI MAY be extended by a media fragment, as explained in section 2.4.2.4. Optional: `role`, the type of relation between the emotion and the external item referred to; the value MUST be one of "`expressedBy`" (default), "`experiencedBy`", "`triggeredBy`", "`targetedAt`". `media-type`, an attribute of type `xsd:string` holding the MIME type of the data that the `uri` attribute points to.
Occurrence	Multiple `<reference>` elements MAY occur as children of `<emotion>`.

A <reference> element provides a link to media as a URI [RFC 3986]. The semantics of references are described by the role attribute which, if present, MUST have one of four values:

"expressedBy" indicates that the reference points to observable behavior expressing the emotion. This is the default value if the role attribute is not explicitly stated;
"experiencedBy" indicates that the reference points to the subject experiencing the emotion;
"triggeredBy" indicates that the reference points to an emotion-eliciting event that caused an emotion and/or related appraisals;
"targetedAt" indicates that the reference points to an object towards which an emotion-related action, or action tendency, is directed.

For reference targets representing a period of time, start and end time MAY be denoted by using the media fragments syntax, as explained in section 2.4.2.4.

The media-type attribute MAY be used to differentiate between different media types such as audio, video, text, etc.

There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.
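
The role semantics described above, including the default value, can be sketched in Python as follows. This is a minimal illustration and not a normative processing model; the function name reference_roles is hypothetical, and the sample markup reuses example URIs from this section.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
ALLOWED_ROLES = {"expressedBy", "experiencedBy", "triggeredBy", "targetedAt"}


def reference_roles(emotion):
    """Return (uri, role) pairs for all <reference> children, applying the default role."""
    pairs = []
    for ref in emotion.findall(f"{{{EMOTIONML_NS}}}reference"):
        role = ref.get("role", "expressedBy")  # default when role is not stated
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        pairs.append((ref.get("uri"), role))
    return pairs


emotion = ET.fromstring(
    f'<emotion xmlns="{EMOTIONML_NS}">'
    '<reference uri="http://www.example.com/data/video/v1.avi?t=2,13"/>'
    '<reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>'
    '</emotion>'
)

for uri, role in reference_roles(emotion):
    print(role, uri)
```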

Examples:

The following example illustrates the reference to two different URIs having a different role with respect to the emotion: one reference points to the emotion's expression, a video clip showing a user expressing the emotion; the other reference points to the trigger that caused the emotion, in this case another video clip that was seen by the person who expressed the emotion.

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>
</emotion>

Several references may follow as children of one <emotion> tag, even having the same role; for example, the following annotation refers to a portion of a video and to physiological sensor data, both of which expressed the emotion:

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/>
</emotion>

It is possible to explicitly indicate the MIME type of the item that the reference refers to:

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/>
</emotion>

2.4.2 Timestamps

2.4.2.1 Absolute time

Annotation	`start`, `end`
Definition	Attributes to denote the starting and ending absolute times. They MUST be of type `xsd:nonNegativeInteger` and indicate the number of milliseconds since 1 January 1970 00:00:00 GMT.
Occurrence	The attributes MAY occur inside an `<emotion>` element.

start and end attributes denote the absolute starting and ending times at which an emotion or related state happened. This might be used for example with an "emotional diary" application.

Examples:

In the following example, the emotion category "surprise" is annotated, immediately followed by the category "happiness". The start and end attributes specify for each emotion element the absolute beginning and ending times.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647332000" end="1268647334000">
    <category name="surprise"/>
</emotion>
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647334001" end="1268647336000">
    <category name="happiness"/>
</emotion>

The end value MUST be greater than or equal to the start value.

The ECMAScript Date object's getTime() function is a way to determine the absolute time.
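
Outside ECMAScript, the same value can be computed from any timezone-aware clock. The following Python sketch is one possible way to do so; the function names to_epoch_ms and check_interval are hypothetical, and the chosen date is simply the one that corresponds to the start value used in the first example above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment):
    """Milliseconds since 1 January 1970 00:00:00 GMT for a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def check_interval(start, end):
    """Raise ValueError if end precedes start."""
    if end < start:
        raise ValueError("end must be greater than or equal to start")


# 15 March 2010, 10:02:12 UTC
start = to_epoch_ms(datetime(2010, 3, 15, 10, 2, 12, tzinfo=timezone.utc))
end = start + 2000  # two seconds later
check_interval(start, end)
print(start, end)
```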

2.4.2.2 Duration

Annotation	`duration`
Definition	Attribute of type `xsd:nonNegativeInteger`, defaulting to zero. It specifies the duration of the event in milliseconds.
Occurrence	This attribute MAY occur inside an `<emotion>` element.

The duration of an input in milliseconds MAY be specified with the duration attribute. The duration attribute MAY be used either in combination with the start or offset-to-start attribute or independently.

A start or offset-to-start attribute together with the duration attribute set to zero MAY be used to indicate a single timestamp on a time axis.

Note that the specification doesn't enforce consistency. Although it would be redundant, the presence of both duration and end attributes is possible. If this leads to inconsistency, the responsibility lies with the EmotionML producer.
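
A producer that wants to guard against such inconsistencies could run a check along the following lines before emitting a document. This Python sketch is only an illustration: the function name timing_is_consistent is hypothetical, and treating a missing attribute as "nothing to compare" is an assumption of the sketch, not a rule of this specification.

```python
def timing_is_consistent(attrs):
    """Check the start, end and duration attributes of an <emotion>.

    attrs is a dict of attribute names to string values. If any of the three
    attributes is absent, there is nothing to compare and True is returned.
    """
    if not all(key in attrs for key in ("start", "end", "duration")):
        return True
    return int(attrs["start"]) + int(attrs["duration"]) == int(attrs["end"])


print(timing_is_consistent(
    {"start": "1268647200000", "duration": "130", "end": "1268647200130"}
))
```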

Examples:

In the following example, the start and duration of the emotion category "surprise" are annotated:

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647200000" duration="130">
    <category name="surprise"/>
</emotion>

2.4.2.3 Relative time

Annotation	`time-ref-uri`
Definition	Attribute of type `xsd:anyURI` indicating the URI used to anchor the relative timestamp.
Occurrence	This attribute MAY occur inside an `<emotion>` element.
Annotation	`time-ref-anchor-point`
Definition	Attribute with a value of `start` or `end`, defaulting to `start`. It indicates whether to measure the time from the start or end of the interval designated with `time-ref-uri`.
Occurrence	This attribute MAY occur inside an `<emotion>` element.
Annotation	`offset-to-start`
Definition	Attribute of type `xsd:integer`, defaulting to zero. It specifies the offset in milliseconds for the start of input from the anchor point designated with `time-ref-uri` and `time-ref-anchor-point`.
Occurrence	This attribute MAY occur inside an `<emotion>` element.

Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.

The reference interval is designated with the time-ref-uri attribute. This MAY be combined with the time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the offset-to-start attribute.

The time-ref-uri attribute can point to a custom-defined timestamp or can be, for example, a session identifier.
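
When the absolute start and end of the reference interval are known, the anchoring rule above reduces to simple arithmetic. The following Python sketch illustrates it; the function name resolve_start is hypothetical, and how the reference interval's times are obtained from time-ref-uri is left to the application.

```python
def resolve_start(anchor_start, anchor_end, offset_to_start=0, anchor_point="start"):
    """Absolute start in milliseconds, derived from a reference interval.

    anchor_start and anchor_end are the absolute times of the interval
    designated by time-ref-uri; anchor_point and offset_to_start mirror the
    time-ref-anchor-point and offset-to-start attributes and their defaults.
    """
    if anchor_point not in ("start", "end"):
        raise ValueError("time-ref-anchor-point must be 'start' or 'end'")
    anchor = anchor_start if anchor_point == "start" else anchor_end
    return anchor + offset_to_start


# Two seconds after the start of a reference interval.
print(resolve_start(1268647332000, 1268647334000, offset_to_start=2000))
```

Because offset-to-start is of type xsd:integer, negative offsets are possible, for example half a second before the end of the reference interval.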

Examples:

Here is an example where the emotion "surprise" occurs two seconds after the reference time point:

<emotion id="referenceAnger" category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
     start="1268647332000" end="1268647334000">
  <category name="anger"/>
</emotion>

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         time-ref-uri="#referenceAnger" offset-to-start="2000">
    <category name="surprise"/>
</emotion>

It is possible to mix absolute and relative time points in an emotionml statement. An emotional diary application for example might want to relate emotional statements to events that are marked up somewhere else. Note that a time consistency check is not part of the specification:

<exampleXMLElementWithId id="specialEvent"/>


<emotion id="yesterday" category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
     start="1268647332000" end="1268647334000" time-ref-uri="#specialEvent" offset-to-start="2000">
  <category name="anger"/>
</emotion>

2.4.2.4 Timing in media

Annotation	`URI fragment: t`
Definition	Denotes the start and end point of an annotation in a media stream. Allowed values must conform to the Media Fragments Specification [Media Fragments].
Occurrence	The URI fragment MAY occur in the `uri` attribute of a `<reference>` element.

Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval whereas the end time is considered to be the first time point that is not part of the interval. If a single number only is given, this is the begin time.

Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes, [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.

Examples:

In the following example, the emotion category "happiness" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myAudio.wav#t=3,9"/>
</emotion>

In the following example, the emotion category "happiness" is displayed in a video file called "myVideo.avi" in SMPTE values, resulting in the time interval [120,121.5).

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/>
</emotion>
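
The interval [120,121.5) stated above follows from converting each SMPTE timecode to seconds at 30 frames per second. The following Python sketch reproduces that arithmetic. It is a simplified illustration: the function name smpte_to_seconds is hypothetical, and drop-frame formats and sub-frame fields are deliberately not handled.

```python
def smpte_to_seconds(timecode, frames_per_second):
    """Convert 'hh:mm:ss' or 'hh:mm:ss:ff' to seconds for a non-drop-frame rate."""
    parts = [int(part) for part in timecode.split(":")]
    if len(parts) == 3:
        hours, minutes, seconds = parts
        frames = 0
    elif len(parts) == 4:
        hours, minutes, seconds, frames = parts
    else:
        raise ValueError(f"unsupported timecode: {timecode!r}")
    return hours * 3600 + minutes * 60 + seconds + frames / frames_per_second


begin = smpte_to_seconds("0:02:00", 30)
end = smpte_to_seconds("0:02:01:15", 30)
print(begin, end)
```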

The last example expresses the interval in real-world clock time: a one-minute interval in a video file, starting on 26 July 2009 at 11:19:01 UTC.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/>
</emotion>

2.5 Scale values

Scale values are needed to represent content in <category>, <dimension>, <appraisal> and <action-tendency> elements, as well as in confidence.

Representations of scale values can be static or dynamic. A static, constant scale value is represented using the value attribute; for dynamic values, their evolution over time is expressed using the <trace> element.

2.5.1 The `value` attribute

Annotation	`value`
Definition	Representation of a static scale value. The value of a `value` attribute MUST be a floating point value from the closed interval [0, 1].
Occurrence	The `<dimension>` element MUST contain either a `value` attribute or a `<trace>` element; `<category>`, `<appraisal>` and `<action-tendency>` MAY contain either a `value` attribute or a `<trace>` element.

The value attribute represents a static scale value of the enclosing element.

Conceptually, a scale can represent concepts that vary from "nothing" to "a lot" (unipolar scales), or concepts that vary between two opposites, from "very negative" to "very positive" (bipolar scales). Both are represented in EmotionML using floating point values from the closed interval [0, 1]. The min and max values of the scale SHOULD be interpreted as the extreme values, for both unipolar and bipolar scales. For example in a <category>, a value="0" SHOULD be interpreted to mean absolutely no emotion (emotionless); a value="1.0" SHOULD be interpreted to mean emotion at maximum intensity (pure uncontrolled emotion). For bipolar scales, such as the valence dimension, a value of 0 represents the most negative possible value, whereas a value of 1 represents the most positive value possible. The neutral middle point of the scale is at 0.5.
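
Applications that internally represent bipolar concepts on a signed axis may need to convert between the two conventions. The following Python sketch shows one such mapping; the target range [-1, 1] and the function name bipolar_to_signed are assumptions made for illustration and are not defined by EmotionML.

```python
def bipolar_to_signed(value):
    """Map a bipolar scale value from [0, 1] to [-1, 1], so that 0.5 maps to 0."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("scale values must lie in the closed interval [0, 1]")
    return 2.0 * value - 1.0


print(bipolar_to_signed(0.5))  # the neutral middle point
```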

Here are several examples for the usage of scales with EmotionML.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions">
    <dimension name="arousal" value="0.4"/> <!-- a bit less than average arousal -->
    <dimension name="valence" value="0.6"/> <!-- a bit above average valence -->
</emotion>

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="angry" value="0.5"/> <!-- anger at medium intensity -->
</emotion>

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
    <appraisal name="suddenness" value="0.9"/> <!-- appraisal as a very sudden event -->
</emotion>

<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies">
    <action-tendency name="approach" value="0.3"/> <!-- a rather weak tendency to approach -->
</emotion>

Further examples of the value attribute can be found in the context of the <category>, <dimension>, <appraisal> and <action-tendency> elements.

2.5.2 The `<trace>` element

Annotation	`<trace>`
Definition	Representation of the time evolution of a dynamic scale value.
Children	None
Attributes	Required: `freq`, a sampling frequency in Hz. The value of this attribute MUST be a positive floating point number, formatted e.g. "10" or "10.5", followed by optional blank followed by "Hz". `samples`, a space-separated list of floating point values from the closed interval [0, 1] representing the scale value of the enclosing element as it changes over time.
Occurrence	The `<dimension>` element MUST contain either a `value` attribute or a `<trace>` element; `<category>`, `<appraisal>` and `<action-tendency>` MAY contain either a `value` attribute or a `<trace>` element.

A <trace> element represents the time course of a scale value.

The freq attribute indicates the sampling frequency at which the values listed in the samples attribute are given.

NOTE: The <trace> representation requires a periodic sampling of values. In order to represent values that are sampled aperiodically, separate <emotion> annotations with appropriate timing information and individual value attributes may be used.
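
Because sampling is periodic, the time of each sample follows directly from its position and the sampling frequency. The following Python sketch derives (time, value) pairs from the two attributes. It is illustrative only: the function name parse_trace is hypothetical, and the regular expression is a simplified reading of the freq format (digits, optional decimal part, at most one blank, then "Hz") rather than the pattern used in the schema.

```python
import re

# Simplified reading of the freq format, e.g. "10Hz", "10.5Hz" or "10 Hz".
FREQ_PATTERN = re.compile(r"^(\d+(?:\.\d+)?) ?Hz$")


def parse_trace(freq, samples):
    """Return (time_in_ms, value) pairs for the freq and samples attributes."""
    match = FREQ_PATTERN.match(freq)
    if match is None or float(match.group(1)) <= 0:
        raise ValueError(f"invalid freq: {freq!r}")
    period_ms = 1000.0 / float(match.group(1))
    values = [float(sample) for sample in samples.split()]
    if any(not 0.0 <= value <= 1.0 for value in values):
        raise ValueError("samples must lie in the closed interval [0, 1]")
    # The first sample is placed at time 0 relative to the start of the trace.
    return [(index * period_ms, value) for index, value in enumerate(values)]


for time_ms, value in parse_trace("10Hz", "0.1 0.1 0.15"):
    print(time_ms, value)
```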

Examples:

The following example illustrates the use of a trace to represent an episode of fear during which the emotion's intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="fear">
    <trace freq="10Hz"
           samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/>
  </category>
</emotion>

The following example combines a trace of the appraisal "suddenness" with a global confidence that the values represent the facts properly. There is a sudden peak of suddenness; the annotator is reasonably certain that the annotation is correct:

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
  <appraisal name="suddenness" confidence="0.75">
    <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/>
  </appraisal>
</emotion>

3 Defining vocabularies for representing emotions

EmotionML markup MUST refer to one or more vocabularies to be used for representing emotion-related states, as specified in the context of the <emotionml> and <emotion> elements. Due to the lack of agreement in the community, the EmotionML specification does not provide a single default set that would apply if no set is indicated. Instead, the user MUST explicitly state the set of descriptor names used.

The document [Vocabularies for EmotionML] provides a number of emotion vocabularies which are likely to be of general interest. In order to promote interoperability, users SHOULD verify if one of the vocabularies defined in that document is suitable for their application. If that is not the case, users can define their own custom vocabularies as defined in the present section.

3.1 Mechanism for defining vocabularies

The syntax for defining emotion vocabularies is based on the element <vocabulary> and its child <item>.

3.1.1 The `<vocabulary>` element

Annotation	`<vocabulary>`
Definition	Contains the definition of an emotion vocabulary.
Children	A `<vocabulary>` element MUST contain one or more `<item>` elements. A `<vocabulary>` element MAY contain a single `<info>` element, providing arbitrary metadata about the vocabulary itself.
Attributes	Required: `type`, MUST be one of "`category`", "`dimension`", "`appraisal`" or "`action-tendency`". `id`, a unique vocabulary identifier of type `xsd:ID`.
Occurrence	One or more `<vocabulary>` elements MAY occur as direct children of an `<emotionml>` element.

Vocabulary definitions, when present, occur as direct children of the document root element <emotionml>. It is possible to refer to a vocabulary defined in the same or in a separate EmotionML document, through URIs specified by the values of the attributes category-set, dimension-set, appraisal-set and action-tendency-set of the <emotion> and <emotionml> elements.

The value of the type attribute explicitly states whether the vocabulary represents category names, dimension elements, appraisal elements or action tendency elements.

3.1.2 The `<item>` element

Annotation	`<item>`
Definition	Represents the definition of one vocabulary item, associated with a value which can be used in the "name" attribute of `<category>`, `<dimension>`, `<appraisal>` or `<action-tendency>` (depending on the type of vocabulary being defined).
Children	An `<item>` element MAY contain a single `<info>` element, providing arbitrary metadata about the vocabulary item.
Attributes	Required: `name`: a name for the item, used to refer to this item. An `<item>` MUST NOT have the same name as any other `<item>` within the same `<vocabulary>`.
Occurrence	One or more `<item>` elements occur as direct children of a `<vocabulary>` element.

An <item> represents the definition of one vocabulary item. A <vocabulary> MUST contain at least one <item> element.

Examples:

In the following example, three vocabularies are wrapped into a single EmotionML document. Their id attributes are: "big6", "fsre-dimensions" and "frijda-subset". They are used to represent categories, dimensions and action tendencies respectively. The first <emotion> element specifies the emotion vocabularies used through the attributes category-set and action-tendency-set, while the second <emotion> element uses the attribute dimension-set.

<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
   
    <!-- Vocabulary definitions -->
   
    <vocabulary type="category" id="big6">
        <item name="anger"/>
        <item name="disgust"/>
        <item name="fear"/>
        <item name="happiness"/>
        <item name="sadness"/>
        <item name="surprise"/>
    </vocabulary>

    <vocabulary type="dimension" id="fsre-dimensions">
        <item name="valence"/>
        <item name="potency"/>
        <item name="arousal"/>
        <item name="unpredictability"/>
    </vocabulary>

    <vocabulary type="action-tendency" id="frijda-subset">
        <item name="approach"/>
        <item name="avoidance"/>
        <item name="rejecting"/>
    </vocabulary>

    <!-- Emotion elements -->
   
    <emotion category-set="#big6" action-tendency-set="#frijda-subset">
        <category name="fear"/>
        <action-tendency name="approach" value="0.0"/>
        <action-tendency name="avoidance" value="0.9"/>
    </emotion>

    <emotion dimension-set="#fsre-dimensions">
        <dimension name="arousal" value="0.3"/>
    </emotion>

</emotionml>

3.2 Mechanism for referring to vocabularies

EmotionML refers to emotion vocabularies using the category-set, dimension-set, appraisal-set and action-tendency-set attributes of <emotion> and <emotionml>. A vocabulary can be referred to using the Fragment Identifier syntax described in [RFC 3023]. As described in the [XPointer Framework] Shorthand Pointer notation, the value of the vocabulary's id attribute is to be used as the fragment identifier.

The following example demonstrates the use of this notation. Note that the EmotionML document residing at http://www.w3.org/TR/emotion-voc/xml defines an emotion category vocabulary with an attribute id="big6".

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
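As an informal illustration of this mechanism, the following Python sketch splits a vocabulary reference into the document URI and the fragment identifier, and looks up a vocabulary by its id attribute in an already-parsed document. The helper names split_vocabulary_ref and find_vocabulary are hypothetical and not part of this specification; fetching a remote vocabulary document is deliberately left out.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"


def split_vocabulary_ref(ref):
    """Split a vocabulary reference into (document URI, vocabulary id)."""
    doc_uri, fragment = urldefrag(ref)
    return doc_uri, fragment


def find_vocabulary(root, vocab_id):
    """Return the <vocabulary> element whose id equals vocab_id, or None."""
    for vocab in root.iter(f"{{{EMOTIONML_NS}}}vocabulary"):
        if vocab.get("id") == vocab_id:
            return vocab
    return None


# A small document that defines one vocabulary locally
doc = ET.fromstring(
    '<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">'
    '<vocabulary type="category" id="big6">'
    '<item name="anger"/><item name="fear"/>'
    '</vocabulary>'
    '</emotionml>'
)

# Reference to a vocabulary in another document
doc_uri, vocab_id = split_vocabulary_ref("http://www.w3.org/TR/emotion-voc/xml#big6")

# Reference to a vocabulary in the same document: the document part is empty
local_uri, local_id = split_vocabulary_ref("#big6")
vocab = find_vocabulary(doc, local_id)
```

An empty document part indicates that the vocabulary is expected in the referring document itself; a non-empty one would first have to be retrieved and parsed.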

4 Conformance

4.1 EmotionML namespace

The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.

4.2 Use with other namespaces

The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [XML-NS10] or 1.1 [XML-NS11], depending on the version of XML being used).

4.3 Schema validation and processor validation of EmotionML documents

The EmotionML schema is designed to validate the structural integrity of an EmotionML document or document fragment, but cannot verify whether the emotion descriptors used in the name attribute of <category>, <dimension>, <appraisal> and <action-tendency> are consistent with the vocabularies indicated in the respective category-set, dimension-set, appraisal-set and action-tendency-set attributes.

It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.

There are also a variety of test assertions that cannot be validated by the schema but are the responsibility of the EmotionML producer.
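The kind of consistency check that the schema cannot perform might be sketched in Python as follows. This is only an illustration under stated assumptions: the function name undeclared_names is hypothetical, only vocabularies defined in the same document (references of the form "#id") are checked, and the sketch assumes that a set attribute on <emotion> takes precedence over the one on <emotionml>.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"
DESCRIPTORS = ("category", "dimension", "appraisal", "action-tendency")


def undeclared_names(root):
    """List (descriptor type, name) pairs not found in the referenced local vocabulary."""
    # Map each vocabulary id to the set of item names it defines
    vocabularies = {
        v.get("id"): {item.get("name") for item in v.iter(NS + "item")}
        for v in root.iter(NS + "vocabulary")
    }
    problems = []
    for emotion in root.iter(NS + "emotion"):
        for kind in DESCRIPTORS:
            # Assumption: the attribute on <emotion> wins over the one on <emotionml>
            ref = emotion.get(kind + "-set") or root.get(kind + "-set")
            if ref is None or not ref.startswith("#"):
                continue  # external vocabularies are not fetched in this sketch
            allowed = vocabularies.get(ref[1:], set())
            for descriptor in emotion.findall(NS + kind):
                if descriptor.get("name") not in allowed:
                    problems.append((kind, descriptor.get("name")))
    return problems


doc = ET.fromstring("""
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
    <vocabulary type="category" id="big6">
        <item name="anger"/>
        <item name="fear"/>
    </vocabulary>
    <emotion category-set="#big6">
        <category name="fear"/>
    </emotion>
    <emotion category-set="#big6">
        <category name="Fear"/>
    </emotion>
</emotionml>
""")

problems = undeclared_names(doc)
```

In this sketch the second <emotion> is reported because "Fear" differs in case from the vocabulary item "fear"; both elements would pass schema validation.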

4.4 Conforming EmotionML Documents

A document is a Conforming Stand-Alone EmotionML Document if it meets both the following conditions:

It is a well-formed XML document [XML §2.1] conforming to Namespaces in XML [XML-NS10].
It is a valid XML document [XML §2.8] which adheres to the specification described in this document (EmotionML Specification) including the constraints expressed in the Schema (see Appendix A) and having an emotionml root element as specified in Section 2.1.

4.5 Conforming EmotionML Fragments

A document fragment is a Conforming EmotionML Fragment if it conforms to the criteria for Conforming EmotionML Documents after adding a surrounding emotionml root element.

4.6 Conforming EmotionML Processors

A Conforming EmotionML Processor must correctly understand and apply the semantics of each markup element as described by this document.

There is, however, no conformance requirement with respect to performance characteristics of the EmotionML Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of output produced by the processor. No statement is made regarding the size of input that an EmotionML Processor is required to support.

4.6.1 Conforming EmotionML Producers

An EmotionML Producer is an EmotionML Processor that can produce Conforming EmotionML Documents.

4.6.2 Conforming EmotionML Consumers

An EmotionML Consumer is an EmotionML Processor that can parse and process Conforming EmotionML Documents.

When a Conforming EmotionML Consumer encounters markup that it cannot interpret, e.g. misspelled element names or attribute values like category or dimension set values that it doesn't know about, it may:

ignore the non-standard elements and/or attributes
or, process the non-standard elements and/or attributes
or, reject the document containing those elements and/or attributes
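Two of these options, ignoring and rejecting, can be illustrated with a minimal Python sketch. The function read_categories and its on_unknown parameter are hypothetical, and the set of known child elements of <emotion> used here is an assumption of the sketch; the option of processing non-standard markup is application-specific and is not modeled.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"

# Assumption: child elements of <emotion> that this sketch knows how to interpret
KNOWN_CHILDREN = {
    NS + name
    for name in ("category", "dimension", "appraisal", "action-tendency", "reference", "info")
}


def read_categories(emotion, on_unknown="ignore"):
    """Collect category names, either ignoring or rejecting unknown child elements."""
    names = []
    for child in emotion:
        if child.tag not in KNOWN_CHILDREN:
            if on_unknown == "reject":
                raise ValueError(f"cannot interpret element {child.tag}")
            continue  # "ignore": skip the element and carry on
        if child.tag == NS + "category":
            names.append(child.get("name"))
    return names


# The second child has a misspelled element name
emotion = ET.fromstring(
    '<emotion xmlns="http://www.w3.org/2009/10/emotionml">'
    '<category name="fear"/>'
    '<categroy name="anger"/>'
    '</emotion>'
)

ignored = read_categories(emotion)  # the misspelled element is skipped
```

Calling read_categories(emotion, on_unknown="reject") on the same input raises ValueError instead of skipping the misspelled element.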

5 Examples

This section is informative.

5.1 Examples of emotion annotation

5.1.1 Manual annotation of emotional material

Annotation of text

A part of Lewis Carroll's "Alice's Adventures in Wonderland" gets annotated with emotions.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
   <info>
      <meta:doc>Example adapted from (Zhang, Black &amp; Sproat 2003) http://www.cs.cmu.edu/~awb/papers/eurospeech2003/esper.pdf
      </meta:doc>
   </info>

<emotion>
  <category name="Disgust" value="0.82"/> 
    ‘Come, there’s no use in crying like that!’
</emotion> 
  said Alice to herself rather sharply;
<emotion>
  <category name="Anger" value="0.57"/> 
   ‘I advise you to leave off this minute!’ 
</emotion>
</emotionml>

Note that, although the text may be scattered, each statement applies to the whole text. In the following example, "Peter" is angry and disgusted at the same time:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">

<emotion>
  <category name="Disgust" value="0.82"/> 
Peter was angry
  <category name="Anger" value="0.57"/> 
and disgusted.
</emotion>
</emotionml>

Annotation of static images

An image gets annotated with several emotion categories at the same time, but with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml">
   <info>
      <meta:media-type>image</meta:media-type>
      <meta:media-id>disgust</meta:media-id>
      <meta:media-set>JACFEE-database</meta:media-set>
      <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://davidmatsumoto.com/content/2004%20hall%20and%20matsumoto.pdf
      </meta:doc>
   </info>

   <emotion>
       <category name="Disgust" value="0.82"/>
       <category name="Contempt" value="0.35"/>
       <category name="Anger" value="0.12"/>
       <category name="Surprise" value="0.53"/>
   </emotion>
</emotionml>

Annotation of videos

Example 1: Annotation of a whole video: several emotions are annotated with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/humaine-database-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ed1_4</meta:media-name>
        <meta:media-set>humaine database</meta:media-set>
        <meta:coder-set>JM-AB-UH</meta:coder-set>
    </info>
    <emotion>
        <category name="Amusement" value="0.52"/>
        <category name="Irritation" value="0.63"/>
        <category name="Relaxed" value="0.02"/>
        <category name="Frustration" value="0.87"/>
        <category name="Calm" value="0.21"/>
        <category name="Friendliness" value="0.28"/>
    </emotion>
</emotionml>

Example 2: Annotation of a video segment, where two emotions are annotated for overlapping but not identical timespans.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/emotv-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ext-03</meta:media-name>
        <meta:media-set>EmoTV</meta:media-set>
        <meta:coder>4</meta:coder>
    </info>

    <emotion>
        <category name="irritation" value="0.46"/>
        <reference uri="file:ext03.avi?t=3.24,15.4"/>
    </emotion>
    <emotion>
        <category name="despair" value="0.48"/>
        <reference uri="file:ext03.avi?t=5.15,17.9"/>
    </emotion>
</emotionml>

5.1.2 Automatic recognition of emotions

This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.

It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (absolute start time is 1006526160000 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.

The next entry of observed emotions occurs about 6 minutes later (absolute start time is 1006526520000 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a short glimpse of anger; for the visual and IR cameras the signal was below their individual thresholds, so they produced no entry.

For simplicity, all devices use categorical annotations and the same set of categories. Obviously it would be possible, and even likely, that different devices from different manufacturers provide their data annotated with different emotion sets.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
 ...
<emotion start="1006526160000" expressed-through="face">
  <!--the first modality detects excitement.
      It is a camera observing the face. A URI to the database
      is provided to access the video stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/facedb#t=26,98"/>
</emotion>

<emotion start="1006526160000" expressed-through="facial-skin-color">
  <!--the second modality detects anger. It is an IR camera
      observing the face. A URI to the database
      is provided to access the video stream.-->
  <category name="angry"/>
  <reference uri="http://www.example.com/skindb#t=23,108"/>
</emotion>

<emotion start="1006526160000" expressed-through="physiology">
  <!--the third modality detects excitement again. It is a
      wearable device monitoring physiological changes in the
      body. A URI to the database
      is provided to access the data stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/physiodb#t=19,101"/>
</emotion>

<emotion start="1006526520000" expressed-through="physiology">
  <category name="angry"/>
  <reference uri="http://www.example.com/physiodb2#t=2,6"/>
</emotion>
 ...
</emotionml>

Note that handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions are determined from the time stamps.
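The time-stamp handling assumed in this example can be sketched in Python as follows. The sketch converts the absolute start times (milliseconds since 1 January 1970 00:00:00 GMT) into calendar times and groups the annotations that share a start time. The helper names to_utc and group_by_start are hypothetical, and grouping by identical start value is only one possible reading of "parallel occurrence".

```python
from collections import defaultdict
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"


def to_utc(ms):
    """Convert milliseconds since 1 January 1970 00:00:00 GMT to an aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def group_by_start(root):
    """Group (modality, category) pairs by the start attribute of their <emotion>."""
    groups = defaultdict(list)
    for emotion in root.iter(NS + "emotion"):
        start = int(emotion.get("start"))
        category = emotion.find(NS + "category").get("name")
        groups[start].append((emotion.get("expressed-through"), category))
    return dict(groups)


# The four annotations of the example, reduced to start time, modality and category
doc = ET.fromstring("""
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion start="1006526160000" expressed-through="face">
    <category name="excited"/>
  </emotion>
  <emotion start="1006526160000" expressed-through="facial-skin-color">
    <category name="angry"/>
  </emotion>
  <emotion start="1006526160000" expressed-through="physiology">
    <category name="excited"/>
  </emotion>
  <emotion start="1006526520000" expressed-through="physiology">
    <category name="angry"/>
  </emotion>
</emotionml>
""")

groups = group_by_start(doc)
first_start = to_utc(1006526160000)   # 23 November 2001, 14:36 GMT
second_start = to_utc(1006526520000)  # six minutes later
```

The first group contains the three parallel annotations from the face camera, the IR camera and the physiology sensor; the second group contains the single later annotation from the physiology sensor.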

5.1.3 Generation of emotion-related system behavior

Generation of facial expressions in an MPEG-4 face model

The MPEG-4 standard offers 68 parameters, called Facial Animation Parameters (FAPs), to animate a 3D facial model. 66 of these parameters correspond to low level parameters. These parameters act on the facial feature points defining a 3D facial model. They specify how these feature points are displaced. They simulate muscular contraction. On the other hand, two FAPs, namely FAP1 and FAP2, refer respectively to viseme and expression. FAP2 corresponds to one of the six basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). The expressions associated with the six emotions are defined by textual descriptions [Ostermann, 2002].

In emotion theory, the idea of mixing emotions to create new emotions is disputed. For the purposes of facial expression modeling, however, it is possible to simulate different emotions as linear combinations of the six basic facial expressions. MPEG-4 allows the linear combination of any two of these expressions: emotion_1 * intensity_1 + emotion_2 * intensity_2. For example, [Raouzaiou et al., 2005] found that the expressions of depression and guilt can be obtained by combinations of fear and sadness with different intensities, while the expression of suspicion is obtained by combining anger and disgust.

In EmotionML it is possible to represent the emotional input to an MPEG-4 based facial animation system using multiple <category> elements, for example as follows.

<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <!-- attempt to express suspicion as a combination of anger and disgust -->
  <category name="anger" value="0.5"/>
  <category name="disgust" value="0.3"/>
</emotion>
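The linear combination emotion_1 * intensity_1 + emotion_2 * intensity_2 described above could be computed from such markup roughly as in the following Python sketch. The displacement vectors in BASIC_EXPRESSIONS are invented placeholder values, not actual MPEG-4 parameters, and the function name blend is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"

# Hypothetical displacement vectors for two basic expressions (placeholder values,
# not taken from the MPEG-4 standard)
BASIC_EXPRESSIONS = {
    "anger": [1.0, 0.0, -0.5],
    "disgust": [0.0, 1.0, 0.5],
}


def blend(emotion):
    """Return the intensity-weighted sum of the expressions named by <category> elements."""
    categories = emotion.findall(NS + "category")
    if len(categories) > 2:
        raise ValueError("only two expressions can be combined in this sketch")
    result = [0.0, 0.0, 0.0]
    for category in categories:
        intensity = float(category.get("value"))
        expression = BASIC_EXPRESSIONS[category.get("name")]
        result = [r + intensity * e for r, e in zip(result, expression)]
    return result


emotion = ET.fromstring("""
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <!-- attempt to express suspicion as a combination of anger and disgust -->
  <category name="anger" value="0.5"/>
  <category name="disgust" value="0.3"/>
</emotion>
""")

blended = blend(emotion)  # 0.5 * anger + 0.3 * disgust, component by component
```

The value attribute of each <category> plays the role of the intensity factor; how the resulting vector is mapped onto the facial model is up to the animation system.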

Generation of robot behavior

The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The robot is in a global state of high arousal, negative pleasure and low dominance, i.e. a negative state of distress paired with some urgency but quite limited power to influence the situation. It has a tendency to seek a recharge and to avoid picking up boxes. However, sensor data reveals an unexpected obstacle on the way to the charging station. This triggers planning of expressive behavior of frowning. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the robot architecture.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata">
    <info>
        <meta:name>robbie the robot example</meta:name>
    </info>

    <!-- Robot's current global state configuration: negative, active, powerless -->
    <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
        <dimension name="pleasure" value="0.2"/>
        <dimension name="arousal" value="0.8"/>
        <dimension name="dominance" value="0.3"/>
    </emotion>

    <!-- Robot's action tendencies: want to recharge -->
    <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
        <action-tendency name="charge-battery" value="0.9"/>
        <action-tendency name="seek-shelter" value="0.7"/>
        <action-tendency name="pickup-boxes" value="0.1"/>
    </emotion>

    <!-- Appraised value of incoming event: obstacle detected, appraised as novel and unpleasant -->
    <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
        <appraisal name="suddenness" value="0.8" confidence="0.4"/>
        <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/>
        <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/>
    </emotion>

    <!-- Robot's planned facial gestures: will frown -->
    <emotion category-set="http://www.example.com/custom/robot-emotions.xml"
        expressed-through="face">
        <category name="frustration"/>
        <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/>
    </emotion>
</emotionml>

5.2 Examples of possible use with other markup languages

One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML, EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.

This section illustrates the concept using three existing W3C markup languages: EMMA, SSML, and SMIL.

5.2.1 Use with EMMA

EMMA is made for representing arbitrary analysis results; one of them could be the emotional state. The following example represents an analysis of a non-verbal vocalization; its emotion is described as a low-intensity state, maybe boredom.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.w3.org/2009/10/emotionml">
    <emma:interpretation emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice" emma:verbal="false">

        <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <category name="bored" value="0.1" confidence="0.1"/>
        </emotion>

    </emma:interpretation>
</emma:emma>

In the following example, the EMMA <emma:derivation> element is used to represent multiple emotion interpretations associated with audio and video media sources. The first and the third interpretations specify the same emotion category, "content", while the result of the second one is "amused". The consolidated emotion is the result of some processing performed on the interpretations included in the derivation element. In this case it is "content", which is the most frequent category within the available interpretations.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml">
 
    <emma:derivation>

        <emma:interpretation id="text1" emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice"
                emma:verbal="true" emma:signal="http://example.com/signals/emo123.wav"  
                emma:process="http://example.com/text_analysis.xml">
            <emma:literal>I feel happy</emma:literal>
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.7" confidence="0.7"/>
            </emotion>
        </emma:interpretation>

        <emma:interpretation id="voice1" emma:start="1245790094000" emma:end="1245790095000" emma:mode="voice"
                emma:verbal="false" emma:signal="http://example.com/signals/emo123.wav"
                emma:process="http://example.com/voice_analysis.xml">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="amused" value="0.4" confidence="0.5"/>
            </emotion>
        </emma:interpretation>

        <emma:interpretation id="video1" emma:start="1245790090000" emma:end="1245790100000" emma:mode="video"
                emma:verbal="false" emma:signal="http://example.com/signals/emo123.mpg"
                emma:process="http://example.com/video_analysis.xml">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.5" confidence="0.7"/>
            </emotion>
        </emma:interpretation>
  
    </emma:derivation>


    <emma:interpretation id="multimodal1" emma:start="1245790094000" emma:end="1245790100000"
            emma:medium="acoustic visual" emma:mode="voice video">
        <emma:derived-from resource="#text1" composite="true"/>
        <emma:derived-from resource="#voice1" composite="true"/>
        <emma:derived-from resource="#video1" composite="true"/>
        <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <category name="content" value="0.6" confidence="0.7"/>
        </emotion>
    </emma:interpretation>

</emma:emma>
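The consolidation step used in this example, choosing the most frequent category among the interpretations in the derivation, might be sketched in Python as follows. The function name consolidate is hypothetical, the input is a reduced version of the document above, and a real system might also take the value and confidence attributes into account.

```python
from collections import Counter
import xml.etree.ElementTree as ET

EMMA = "{http://www.w3.org/2003/04/emma}"
EMO = "{http://www.w3.org/2009/10/emotionml}"


def consolidate(derivation):
    """Return the category name that occurs most often among the interpretations."""
    names = [
        category.get("name")
        for interpretation in derivation.findall(EMMA + "interpretation")
        for category in interpretation.iter(EMO + "category")
    ]
    return Counter(names).most_common(1)[0][0]


doc = ET.fromstring("""
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.w3.org/2009/10/emotionml">
    <emma:derivation>
        <emma:interpretation id="text1">
            <emotion><category name="content" value="0.7" confidence="0.7"/></emotion>
        </emma:interpretation>
        <emma:interpretation id="voice1">
            <emotion><category name="amused" value="0.4" confidence="0.5"/></emotion>
        </emma:interpretation>
        <emma:interpretation id="video1">
            <emotion><category name="content" value="0.5" confidence="0.7"/></emotion>
        </emma:interpretation>
    </emma:derivation>
</emma:emma>
""")

consolidated = consolidate(doc.find(EMMA + "derivation"))
```

With the three interpretations above, "content" occurs twice and "amused" once, so the consolidated category is "content".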

5.2.2 Use with SSML

EmotionML could be used with the Speech Synthesis Markup Language (SSML) as follows.

It is possible with [SSML 1.1] to use arbitrary markup belonging to a different namespace anywhere in an SSML document; only SSML processors that support the markup would take it into account. Therefore, it is possible to insert EmotionML below, for example, an <s> element representing a sentence; the intended meaning is that the enclosing sentence should be spoken with the given emotion, in this case a moderately worried tone of voice:

<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <s>
        <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <emo:category name="worried" value="0.4"/>
        </emo:emotion>

        Do you need help?
    </s>
</speak>

5.2.3 Use with SMIL

Using EmotionML for the use case of generating system behavior requires elements of scheduling and surface form realization which are not part of EmotionML. Necessarily, this use case relies on other languages to provide the needed functionality. This is in line with the aim of EmotionML to serve as a specialized plug-in language.

This example illustrates the idea in terms of a simplified version of a storytelling application. A virtual agent tells a story using voice and facial animation. The expression in face and voice is influenced by the rendering engine in terms of EmotionML. The engine in this example uses SMIL [SMIL] for defining the temporal relation between events; EmotionML is used via SMIL's generic <ref> element. In general it is the engine which knows how to render the emotion in the virtual agent's expressive capabilities. To override this, the second <emotion> contains an explicit request to realize the emotional expression using both face and voice modalities.

ridinghood.smil:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head> ... </head>
  <body>
    <par dur="8s">
      <img src="file:forest.jpg"/>
      <smilText>The little girl was enjoying the walk in the forest.</smilText>
      <ref src="file:ridinghood.emotionml#emotion1"/>
    </par>
    <par dur="5s">
      <img src="file:wolf.jpg"/>
      <smilText>Suddenly a dark shadow appeared in front of her.</smilText>
      <ref src="file:ridinghood.emotionml#emotion2"/>
    </par>

  </body>
</smil>

ridinghood.emotionml:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
    appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">

  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>

  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
    <appraisal name="suddenness" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>

Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.
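How an engine might resolve the <ref> elements of the SMIL document to the corresponding <emotion> elements can be sketched in Python as follows. The function name emotions_for_refs is hypothetical; the sketch assumes that both documents are already parsed, uses reduced versions of them, and matches only the fragment identifier against the id attribute of <emotion>.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

SMIL = "{http://www.w3.org/ns/SMIL}"
EMO = "{http://www.w3.org/2009/10/emotionml}"


def emotions_for_refs(smil_root, emotionml_root):
    """Return, in document order, the <emotion> each SMIL <ref> points to (or None)."""
    by_id = {e.get("id"): e for e in emotionml_root.iter(EMO + "emotion")}
    resolved = []
    for ref in smil_root.iter(SMIL + "ref"):
        _, fragment = urldefrag(ref.get("src"))
        resolved.append(by_id.get(fragment))
    return resolved


smil_doc = ET.fromstring("""
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par><ref src="file:ridinghood.emotionml#emotion1"/></par>
    <par><ref src="file:ridinghood.emotionml#emotion2"/></par>
  </body>
</smil>
""")

emotion_doc = ET.fromstring("""
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>
  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
  </emotion>
</emotionml>
""")

resolved = emotions_for_refs(smil_doc, emotion_doc)
categories = [e.find(EMO + "category").get("name") for e in resolved]
modalities = [e.get("expressed-through") for e in resolved]
```

A missing expressed-through attribute (None in this sketch) leaves the choice of modality to the engine, while "face voice" on the second emotion is the explicit request described above.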

6 References

6.1 Normative references

EMMA: EMMA: Extensible MultiModal Annotation markup language version 1.0, Michael Johnston, Editor. World Wide Web Consortium, W3C Recommendation 10 February 2009.
Media Fragments URI: Media Fragments URI 1.0, Raphaël Troncy et al., Editors. World Wide Web Consortium, W3C Proposed Recommendation 15 March 2012.
RDF: RDF/XML Syntax Specification (Revised), Dave Beckett, Editor. World Wide Web Consortium, W3C Recommendation 10 February 2004.
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Editor. IETF RFC 2119, March 1997.
RFC 2326: Real Time Streaming Protocol (RTSP), H. Schulzrinne et al., Editors. IETF RFC 2326, April 1998.
RFC 3023: XML Media Types, M. Murata et al., Editors. IETF RFC 3023, January 2001.
RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee et al., Editors. IETF RFC 3986, January 2005.
SMIL: Synchronized Multimedia Integration Language (SMIL) Version 3.0, Dick Bulterman et al., Editors. W3C Recommendation, 1 December 2008.
SMPTE: SMPTE RP 136 Time and Control Codes for 24, 25 or 30 Frame-Per-Second Motion-Picture Systems.
SSML: Speech Synthesis Markup Language (SSML) Version 1.0, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Recommendation, 7 September 2004.
SSML 1.1: Speech Synthesis Markup Language (SSML) Version 1.1, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Recommendation, 7 September 2010.
XML: Extensible Markup Language (XML) 1.0 (Fifth Edition), Tim Bray et al., Editors. World Wide Web Consortium, 26 November 2008. This version of the XML 1.0 Recommendation is http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version of XML 1.0 is available at http://www.w3.org/TR/REC-xml/.
XML-NS10: Namespaces in XML 1.0 (Third Edition), Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 8 December 2009.
XML-NS11: Namespaces in XML 1.1 (Second Edition), Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 2006.
XML Schema: XML Schema Part 1: Structures Second Edition, Henry S. Thompson et al., Editors. World Wide Web Consortium, W3C Recommendation, 2004.
XPointer Framework: XPointer Framework, Paul Grosso et al., Editors. World Wide Web Consortium, W3C Recommendation, 2003.

6.2 Informative references

CLARIN: CLARIN Metadata Infrastructure for Language Resources and Technology, Version 5, D. Broeder et al., Editors. Common Language Resources and Technology Infrastructure Report, 4 February 2009.
Emotion Incubator Group: W3C Emotion Incubator Group, M. Schröder, E. Zovato, H. Pirker, C. Peter, F. Burkhardt, Editors. Final Report of the Emotion Incubator Group at the World Wide Web Consortium, 10 July 2007.
Emotion Markup Language Incubator Group: Elements of an EmotionML 1.0, M. Schröder, Editor. Final Report of the Emotion Markup Language Incubator Group at the World Wide Web Consortium, 20 November 2008.
EmotionML Requirements: Emotion Markup Language: Requirements with Priorities. F. Burkhardt and M. Schröder. W3C Incubator Group Report, 13 May 2008.
IMDI: IMDI Editor version 3.2, B. Hellwig and D. van Uytvanck. ISLE Metadata Initiative Report, 19 June 2007.
Ortony et al., 1988: Ortony, A., Clore, G. L., & Collins, A. (1988). The Cognitive Structure of Emotion. Cambridge, UK: Cambridge University Press.
Ostermann, 2002: Ostermann, J. (2002). Face Animation in MPEG-4. In: MPEG-4 Facial Animation - The Standard Implementation and Applications (I.S. Pandzic and R. Forchheimer, eds.), pp. 17-55. England: Wiley.
Raouzaiou et al., 2005: Raouzaiou, A., Spyrou, E., Karpouzis, K. and Kollias, S. (2005). Emotion Synthesis: an Intermediate Expressions’ Generator System in the MPEG-4 Framework. International Workshop VLBV05, 15-16 September 2005, Sardinia, Italy.
Vocabularies for EmotionML: Vocabularies for EmotionML. M. Schröder and C. Pelachaud, Editors. W3C Working Group Note, 10 May 2012.

7 Acknowledgments

The authors wish to acknowledge the contributions by all members of the Multimodal Interaction Working Group, the Emotion Markup Language Incubator Group and the Emotion Incubator Group, as well as the participants in the W3C Workshop on EmotionML, in particular the following persons (in alphabetic order):

Kazuyuki Ashimura, W3C
Andrew Breen, Nuance Communications
Roddy Cowie, Queen's University Belfast
Deborah Dahl, Conversational Technologies
Sarah Jane Delany, Dublin Institute of Technology
Dylan Evans, University College Cork
Nestor Garay Vitoria, University of the Basque Country
Alain Giboin, INRIA Sophia Antipolis
Bill Jarrold, SRI International
Michael Johnston, AT&T
Kostas Karpouzis, Image, Video and Multimedia Systems Lab (IVML-NTUA)
Myriam Lamolle, University of Paris VIII
Tim Llewellyn, nViso
Jean-Claude Martin, CNRS
Alessandro Oltramari, CNR
Hannes Pirker, Austrian Research Institute for Artificial Intelligence
Björn Schuller, Technische Universität München
Jianhua Tao, Chinese Academy of Sciences
Ian Wilson, Emotion AI
Gill Windall, University of Greenwich
Idoia Zearreta, University of the Basque Country

Appendix A. XML Schema

This section is Normative.

This section defines the formal syntax for EmotionML documents in terms of a normative XML Schema.

The latest version of the XML Schema for Conforming Stand-Alone EmotionML Documents is available at http://www.w3.org/TR/emotionml/emotionml.xsd. The latest version of the XML Schema for Conforming EmotionML Document Fragments is available at http://www.w3.org/TR/emotionml/emotionml-fragments.xsd.

For stability it is RECOMMENDED that you use the dated URI available at http://www.w3.org/TR/2014/REC-emotionml-20140522/emotionml.xsd and http://www.w3.org/TR/2014/REC-emotionml-20140522/emotionml-fragments.xsd, respectively.

Appendix B. MIME type

This section is Normative.

This appendix registers a new MIME media type, "application/emotionml+xml".

The "application/emotionml+xml" media type is registered with IANA at http://www.iana.org/assignments/media-types/application/.

B.1 Registration of MIME media type application/emotionml+xml

MIME media type name:

application

MIME subtype name:

emotionml+xml

Required parameters:

None.

Optional parameters:

charset: This parameter has identical semantics to the charset parameter of the application/xml media type as specified in [RFC 3023] or its successor.

Encoding considerations:

By virtue of EmotionML content being XML, it has the same considerations when sent as "application/emotionml+xml" as does XML. See RFC 3023 (or its successor), section 3.2.

Security considerations:

EmotionML elements may include arbitrary URIs. Therefore the security issues of [RFC 3986], section 7, should be considered.

In addition, because of the extensibility features for EmotionML, it is possible that "application/emotionml+xml" will describe content that has security implications beyond those described here. However, if the processor follows only the normative semantics of this specification, this content will be ignored. Only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.

Interoperability considerations:

This specification describes processing semantics that dictate the required behavior for dealing with, among other things, unrecognized elements.

Because EmotionML is extensible, conformant "application/emotionml+xml" processors MAY expect that content received is well-formed XML, but processors SHOULD NOT assume that the content is valid EmotionML or expect to recognize all of the elements and attributes in the document.

Published specification:

This media type registration is extracted from Appendix B of the "Emotion Markup Language (EmotionML) 1.0" specification.

Additional information:

Magic number(s):: There is no single initial octet sequence that is always present in EmotionML documents.
File extension(s):: EmotionML documents are most often identified with the extension ".emotionml".
Macintosh File Type Code(s):: TEXT

Person & email address to contact for further information:

Kazuyuki Ashimura, <[email protected]>.

Intended usage:

COMMON

Restrictions on usage:

None.

Author:

The EmotionML specification is a work product of the World Wide Web Consortium's Multimodal Interaction Working Group.

Change controller:

The W3C has change control over these specifications.

B.2 Fragment Identifiers

For documents labeled as "application/emotionml+xml", the fragment identifier notation is exactly that for "application/xml", as specified in RFC 3023.

Appendix C: Changes

This section is informative.

Changes in the Recommendation

This section summarizes the changes since the Proposed Recommendation of 16 April 2013.

We added clarifications to the following sections:
- 2.1.2: Note on children of <emotion> element
- 2.4.2.2: Note on co-existence of duration and end attributes
- 2.4.2.3: Note on the mixture of absolute and relative time points; The example code is also updated.
- 4.3: Note on schema validation
- 5.1.1: Note on having more than one emotion at the same time; Another example is also added for clarification purposes.
We changed the status of the MIME type registration in Appendix B and noted that the EmotionML MIME type is now registered with IANA.
We fixed a few typos and updated the EmotionML Schema file (emotionml-fragments.xsd).

Note that we fixed several errors and typos in the Vocabularies for EmotionML Working Group Note based on the recent public comments and republished the document.

Changes in the Proposed Recommendation

This section summarizes the changes since the Candidate Recommendation of 10 May 2012.

We changed the prefix of the namespace in all examples consistently to "emo".
We fixed a typo in the example vocabularies from "agnostic" to "agonistic".
We clarified in the spec and the xsd that only a single optional blank is allowed before "Hz".
We clarified that "The value of this attribute MUST be a positive floating point number, formatted e.g. "10" or "10.5", followed by optional blank followed by "Hz"".
We replaced "\d+(\.\d*)?\s*Hz" by "\d+(\.\d+)? ?Hz" in the xsd to restrict the permitted whitespace to a single blank.
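The effect of this pattern change can be illustrated with a small Python sketch. It is only an approximation: Python's re module is used in place of an XML Schema processor, fullmatch stands in for the implicit anchoring of schema patterns, and Python's \s covers slightly more characters than the schema's. The function name is_valid_freq is hypothetical.

```python
import re

# Pattern before and after the change, transcribed for Python's re module
OLD_PATTERN = re.compile(r"\d+(\.\d*)?\s*Hz")
NEW_PATTERN = re.compile(r"\d+(\.\d+)? ?Hz")


def is_valid_freq(value, pattern=NEW_PATTERN):
    """Check a frequency value; fullmatch mimics the implicit anchoring of XSD patterns."""
    return pattern.fullmatch(value) is not None


samples = ["10Hz", "10.5 Hz", "10  Hz", "10\tHz", "10.Hz"]
accepted_old = [s for s in samples if is_valid_freq(s, OLD_PATTERN)]
accepted_new = [s for s in samples if is_valid_freq(s)]
```

Under these assumptions the old pattern accepts all five samples, whereas the new pattern accepts only "10Hz" and "10.5 Hz": two blanks, a tab, and a trailing decimal point without digits are no longer allowed.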

Changes in the Candidate Recommendation

This section summarizes the changes since the Last Call Working Draft of 07 April 2011.

The use cases motivating EmotionML now mention several ways in which EmotionML can benefit people with disabilities;
The timestamp values used in the examples throughout the document now correctly use milliseconds rather than seconds;
The concept of a 'declared vocabulary' was introduced to clarify the requirements and scope for the references to vocabularies used, such as category-set etc. The term is defined in the glossary and is used in the attribute occurrence definitions for <emotionml> and <emotion> as well as the emotion representations <category>, <dimension>, <appraisal>, and <action-tendency>;
The specification now clarifies that <emotionml> and <emotion> may contain arbitrary text;
The formulations for occurrence constraints and value ranges for several attributes and elements were clarified, notably version, value, <trace>, confidence, <info>, and freq;
Section 3.2 was added to make fully explicit the mechanism to be used for referring to emotion vocabularies;
The Conformance section was expanded to include explicit definitions of conforming EmotionML documents and processors;
An XML Schema for EmotionML was added in Appendix A;
A MIME-type for EmotionML is defined in Appendix B.

Changes in the Last Call Working Draft (07 April 2011)

Changes specific to EmotionML
- The document now distinguishes between normative and non-normative sections.
- A mechanism for defining emotion vocabularies was specified.
- The definitions of emotion vocabularies were moved from the specification into a dedicated W3C Working Draft [Vocabularies for EmotionML].
- The <category> element was harmonized with the other emotion descriptors to allow a value attribute or a <trace> child element indicating the intensity of that category. Multiple <category> elements are now allowed within a single <emotion> to reflect the possible co-presence of these categories. The <intensity> element was removed since its usual use is now covered by the value attribute in <category>.
- The specification of scale values through the value attribute or the <trace> child element was made optional for <appraisal> and <action-tendency> elements, so that an annotation can merely state that a certain appraisal or action tendency is present, irrespective of its intensity.
- A mechanism for indicating duration and relative timestamps was added.
- More examples were added to illustrate possible uses of the <info> element for representing metadata.
- An example was added to illustrate the use of EmotionML in the context of predicting mixed emotions in an MPEG-4-based facial animation model.
- An example was added to illustrate the use of EMMA for representing the derivation of a consolidated emotion from individual emotion observations.
- All examples were updated to be consistent with [Vocabularies for EmotionML] where appropriate.
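
The harmonized <category> element described in the list above can be illustrated with a small reader. This is a minimal sketch, not part of the specification: the sample document, the function name `category_intensities`, and the chosen category names and values are assumptions, and the namespace and vocabulary URIs are not stated in this appendix.

```python
import xml.etree.ElementTree as ET

# Assumed EmotionML namespace URI (not stated in this appendix).
EMO_NS = "http://www.w3.org/2009/10/emotionml"

# Illustrative annotation: two co-present categories in one <emotion>,
# each carrying its intensity in the value attribute.
DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" version="1.0"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion>
    <category name="sadness" value="0.3"/>
    <category name="anger" value="0.8"/>
  </emotion>
</emotionml>
"""


def category_intensities(xml_text):
    """Return one {category name: intensity or None} dict per <emotion>."""
    root = ET.fromstring(xml_text)
    result = []
    for emotion in root.iter(f"{{{EMO_NS}}}emotion"):
        categories = {}
        for category in emotion.findall(f"{{{EMO_NS}}}category"):
            value = category.get("value")
            # None records that the category is present without a stated intensity.
            categories[category.get("name")] = float(value) if value is not None else None
        result.append(categories)
    return result


print(category_intensities(DOC))
```

The sketch reads only the value attribute; a <trace> child element, the other way of expressing intensity mentioned above, is not handled here.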

Consistency issues with other W3C specifications
- To avoid confusion with the emma:mode attribute in [EMMA], the modality attribute was renamed to expressed-through.

Changes in Working Draft 2 (29 July 2010)

Changes specific to EmotionML
- A mechanism for pointing to emotion vocabularies was agreed. An emotion vocabulary is now identified by a URI in the attributes category-set, dimension-set, appraisal-set and action-tendency-set of <emotion> or, in the sense of a document-wide default, of <emotionml>. The consistency of an EmotionML annotation with the indicated vocabulary must be verified by an EmotionML processor; it cannot be verified using Schema validation. The section on validation was updated accordingly.
- A collection of emotion vocabularies was compiled which may be useful defaults for many users. The list is incomplete and not fully developed, but is already published in this form to elicit feedback.
- The notion of Scale values was simplified to only allow for continuous values in the range [0;1].
- The notion of a confidence-trace was dropped for simplicity.
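
Because consistency with the indicated vocabulary cannot be checked by Schema validation, a processor has to perform that check itself. The following is a minimal sketch of such a check under several assumptions: the vocabulary URI and its item names are hypothetical, the vocabulary is held in a local table rather than resolved from the URI, only <category> elements with a value attribute are examined, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed EmotionML namespace URI (not stated in this appendix).
EMO_NS = "http://www.w3.org/2009/10/emotionml"

# Hypothetical vocabulary table; a real processor would resolve the URI instead.
KNOWN_VOCABULARIES = {
    "http://www.example.com/vocab#mood": {"calm", "tense"},
}

DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" version="1.0"
           category-set="http://www.example.com/vocab#mood">
  <emotion><category name="calm" value="0.4"/></emotion>
  <emotion><category name="bored" value="1.5"/></emotion>
</emotionml>
"""


def check_emotion(emotion, default_set=None):
    """Return a list of problems found in one <emotion> element."""
    problems = []
    # The attribute on <emotion> takes precedence over the document-wide default.
    set_uri = emotion.get("category-set", default_set)
    allowed = KNOWN_VOCABULARIES.get(set_uri)
    for category in emotion.findall(f"{{{EMO_NS}}}category"):
        name = category.get("name")
        if allowed is None:
            problems.append(f"unknown vocabulary: {set_uri}")
        elif name not in allowed:
            problems.append(f"{name!r} not in {set_uri}")
        value = category.get("value")
        # Scale values are restricted to the continuous range [0;1].
        if value is not None and not 0.0 <= float(value) <= 1.0:
            problems.append(f"value {value} outside [0;1]")
    return problems


def check_document(xml_text):
    """Check every top-level <emotion> against the declared category vocabulary."""
    root = ET.fromstring(xml_text)
    default_set = root.get("category-set")
    problems = []
    for emotion in root.findall(f"{{{EMO_NS}}}emotion"):
        problems.extend(check_emotion(emotion, default_set))
    return problems


for problem in check_document(DOC):
    print(problem)
```

The same approach would extend to dimension-set, appraisal-set and action-tendency-set; they are left out to keep the sketch short.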

Consistency issues with other W3C specifications
- The syntax for representing dimensions, appraisals and action tendencies was changed to be more in line with the expectation that user-defined strings figure in attribute values rather than element or attribute names. The specification now defines <dimension>, <appraisal> and <action-tendency> elements with a name attribute.
- The representation of time was synchronized with EMMA: the specification now uses start and end attributes to represent absolute time, and Media Fragment URIs to refer to portions of media files.
- Metadata is now represented using an <info> element, in synchrony with EMMA.
- The <link> element was renamed to <reference> to avoid a name clash with the <link> element in HTML, which has a different scope and syntax.
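
The syntax changes listed above can be seen together in one small annotation. The following sketch is illustrative only: the document content, the vocabulary and media URIs, and the function name `summarise` are assumptions, and the namespace URI is not stated in this appendix. It reads <dimension> elements identified by their name attribute, the start and end attributes for absolute time, and a <reference> whose URI carries a media fragment.

```python
import xml.etree.ElementTree as ET

# Assumed EmotionML namespace URI (not stated in this appendix).
EMO_NS = "http://www.w3.org/2009/10/emotionml"

# Illustrative annotation; the vocabulary and media URIs are hypothetical.
DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" version="1.0"
           dimension-set="http://www.example.com/vocab#dims">
  <emotion start="1268647200000" end="1268647330000">
    <dimension name="arousal" value="0.7"/>
    <dimension name="valence" value="0.2"/>
    <reference uri="http://www.example.com/clip.avi#t=3,9"/>
  </emotion>
</emotionml>
"""


def summarise(xml_text):
    """Collect dimensions, duration and referenced URIs for each <emotion>."""
    root = ET.fromstring(xml_text)
    summaries = []
    for emotion in root.findall(f"{{{EMO_NS}}}emotion"):
        start, end = emotion.get("start"), emotion.get("end")
        # start and end are treated as millisecond timestamps in this sketch.
        duration = int(end) - int(start) if start and end else None
        summaries.append({
            "dimensions": {
                d.get("name"): float(d.get("value"))
                for d in emotion.findall(f"{{{EMO_NS}}}dimension")
                if d.get("value") is not None
            },
            "duration_ms": duration,
            "references": [
                r.get("uri") for r in emotion.findall(f"{{{EMO_NS}}}reference")
            ],
        })
    return summaries


print(summarise(DOC))
```

Keeping user-defined strings such as "arousal" in attribute values means the reader needs no knowledge of the vocabulary to parse the document; only the element names defined by the specification appear in the code.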
