W3C

HTML+RDFa

A mechanism for embedding RDF in HTML

W3C Working Draft 04 March 2010

This Version
http://www.w3.org/TR/2010/WD-rdfa-in-html-20100304/
Latest Version
http://www.w3.org/TR/rdfa-in-html/
Previous Versions
http://www.w3.org/TR/2009/WD-rdfa-in-html-20091015/
Authors (alphabetical order):
Ben Adida (Chair, RDFa Task Force, Creative Commons)
Mark Birbeck (Editor, XHTML+RDFa and inventor of RDFa concept, Web Backplane Ltd.)
Shane McCarron (Editor, XHTML+RDFa, Applied Testing and Technology, Inc.)
Steven Pemberton (Chair, XHTML2 and RDFa Task Force member, CWI)
Manu Sporny (Editor, HTML+RDFa and RDFa Task Force member, Digital Bazaar, Inc.)

Abstract

This specification defines rules and guidelines for adapting the RDFa in XHTML: Syntax and Processing (RDFa) specification for use in the HTML5 and XHTML5 members of the HTML family. The rules defined in this specification apply not only to HTML5 documents in non-XML and XML mode, but also to HTML4 and XHTML documents interpreted through the HTML5 parsing rules.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Working Draft of the "HTML+RDFa: A mechanism for embedding RDF in HTML" specification for review by W3C members and other interested parties.

This Working Draft includes the following changes:

If you wish to make comments regarding this document, please send them to [email protected] (subscribe, archives) or to [email protected] (subscribe, archives), or submit them using the W3C's public bug database.

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification should note the status, and are encouraged to join the RDFa Working Group.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.

The latest stable version of the editor's draft of this specification is always available on the W3C CVS server. The latest editor's working copy (which may contain unfinished text in the process of being prepared) is also available. A Diff-marked version is available.

This specification has been jointly developed by the RDFa Task Force and the HTML Working Group and is currently being published by the HTML Working Group to further discussions there.

This specification is an extension to the HTML5 language. All normative content in the HTML5 specification, unless specifically overridden by this specification, is intended to be the basis for this specification.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of contents

  1. 1 Introduction
    1. 1.1 History
  2. 2 Parsing Model
    1. 2.1 Modifying the Input Document
  3. 3 Conformance Requirements
    1. 3.1 Document Conformance
    2. 3.2 User Agent Conformance
    3. 3.3 RDFa Processor Conformance
  4. 4 Modifications to XHTML+RDFa
    1. 4.1 Specifying the language for a literal
    2. 4.2 Invalid XMLLiteral values
    3. 4.3 xmlns:-Prefixed Attributes
  5. 5 Extensions to the HTML5 Syntax
    1. 5.1 The RDFa Attributes and Valid Values
    2. 5.2 Conformance Criteria for xmlns:-Prefixed Attributes
    3. 5.3 Preserving Namespaces via Coercion to Infoset
  6. 6 Infoset-based Processors
    1. 6.1 Processing Namespaced RDFa Attributes
    2. 6.2 Processing RDFa Attributes
  7. 7 DOM Level 2-based Processors
    1. 7.1 Processing Namespaced RDFa Attributes
    2. 7.2 Processing RDFa Attributes
  8. 8 References
    1. 8.1 Normative References
    2. 8.2 Non-Normative References

1 Introduction

This section is informative.

Today's web is built predominantly for human consumption. Even as machine-readable data begins to permeate the web, it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, web browsers can provide only minimal assistance to humans in parsing and processing web data: browsers only see presentation information. RDFa is intended to solve the problem of machine-readable data in HTML documents. RDFa provides a set of HTML attributes to augment visual data with machine-readable hints. Using RDFa, authors may turn their existing human-visible text and links into machine-readable data without repeating content.

1.1 History

In early 2004, Mark Birbeck published a document named [XHTMLRDF] via the XHTML2 Working Group wherein he laid the groundwork for what would eventually become RDFa (The Resource Description Framework in Attributes).

In 2006, the work was co-sponsored by the Semantic Web Deployment Working Group, which began to formalize a technology to express semantic data in XHTML. This technology was successfully developed, reached consensus at the W3C, and was later published as an official W3C Recommendation. While HTML provides a mechanism to express the structure of a document (title, paragraphs, links), RDFa provides a mechanism to express the meaning in a document (people, places, events).

The document, titled "RDFa in XHTML: Syntax and Processing" [XHTML+RDFa], defined a set of attributes and rules for processing those attributes that resulted in the output of machine-readable semantic data. While the document applied to XHTML, the attributes and rules were always intended to operate across any tree-based structure containing attributes on tree nodes (such as HTML4, SVG and ODF).

While RDFa was initially specified for use in XHTML, adoption by a number of large organizations on the Web spurred RDFa's use in non-XHTML languages. Its use in HTML4, before an official specification was developed for those languages, caused concern regarding document conformance.

Over the years, the members of the RDFa Task Force [RDFaTF] had discussed the possibility of applying the same attributes and processing rules outlined in the XHTML+RDFa specification to all HTML family documents. By design, a unified semantic data expression mechanism across all HTML and XHTML family documents was squarely in the realm of possibility.

This section describes the modifications to the original XHTML+RDFa specification that permit the use of RDFa in all HTML family documents. By using the attributes and processing rules described in the XHTML+RDFa specification and heeding the minor changes in this section, authors can expect to generate markup that produces the same semantic data output in HTML4, HTML5 and XHTML5.

2 Parsing Model

This section is normative.

Section 5.5: Sequence, of the [XHTML+RDFa] specification defines a generic processing model for extracting RDF from a tree-based model. The method of transforming an input document into a model suited for the RDFa processing rules is intentionally not defined in the XHTML+RDFa specification. The method of transformation was intended to be defined in the implementation language, in this case, this section of the HTML+RDFa specification.

The HTML5 and XHTML5 DOMs are each a super-set of the tree-based model on which the RDFa processing rules operate. Therefore, a mapping mechanism to translate from a DOM to a tree-model is not necessary. The HTML5 and XHTML5 DOM, or equivalent data structure, should be used as input to the RDFa processing rules. The normative language for construction of the HTML5 DOM and XHTML5 DOM is contained in the HTML5 specification.

2.1 Modifying the Input Document

This section is informative.

RDFa's tree-based processing rules, outlined in Section 5.5: Sequence of the XHTML+RDFa specification, allow an input document to be automatically corrected, cleaned-up, re-arranged, or modified in any way that is approved by the host language prior to processing. For example, element nesting issues in HTML documents may be corrected before the input document is translated into the DOM, a valid tree-based model, on which the RDFa processing rules will operate.

Any mechanism that generates a data structure equivalent to the HTML5 or XHTML5 DOM, such as the html5lib library, may be used as the mechanism to construct the tree-based model provided as input to the RDFa processing rules.

3 Conformance Requirements

This section is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3.1 Document Conformance

In order for a document to claim that it is a conforming HTML+RDFa document, it must provide the facilities described as mandatory in this section. The document conformance criteria are listed below, of which only a subset are mandatory:

  1. All document conformance requirements stated as mandatory in the HTML5 specification must be met.
  2. There should be a version attribute on the html element. The value of the version attribute should be "HTML+RDFa 1.0" if the document is a non-XML mode document, or "XHTML+RDFa 1.0" if the document is an XML mode document.
  3. There may be a link element contained in the head element that contains profile for the rel attribute and http://www.w3.org/1999/xhtml/vocab for the href attribute.
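
The two optional markers in items 2 and 3 can be detected mechanically. The following is a minimal sketch, assuming a non-XML mode document and using Python's standard html.parser module as a stand-in for a real HTML5 parser; the names ConformanceHints and check are hypothetical and are not defined by this specification. It does not test the mandatory HTML5 conformance requirements of item 1.

```python
from html.parser import HTMLParser

VOCAB_URI = "http://www.w3.org/1999/xhtml/vocab"


class ConformanceHints(HTMLParser):
    """Collects the optional HTML+RDFa markers (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.version = None            # value of the version attribute on html
        self.has_profile_link = False  # link rel="profile" found inside head
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.version = attrs.get("version")
        elif tag == "head":
            self._in_head = True
        elif tag == "link" and self._in_head:
            if attrs.get("rel") == "profile" and attrs.get("href") == VOCAB_URI:
                self.has_profile_link = True

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def check(markup):
    """Return (has expected version, has profile link) for non-XML mode markup."""
    hints = ConformanceHints()
    hints.feed(markup)
    hints.close()
    return hints.version == "HTML+RDFa 1.0", hints.has_profile_link
```

Because both markers are only recommended or optional, a result of (False, False) does not by itself make a document non-conforming.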

3.2 User Agent Conformance

A conforming RDFa user agent must:

3.3 RDFa Processor Conformance

A conforming RDFa Processor must implement all of the mandatory features specified in the XHTML+RDFa specification. It must also support any mandatory features specified in this specification.

4 Modifications to XHTML+RDFa

This section is normative.

The [XHTML+RDFa] Recommendation is the base document on which this specification builds. XHTML+RDFa specifies the attributes, in Section 2.1: The RDFa Attributes, and processing model, in Section 5: Processing Model, for extracting RDF from an XHTML document. This section specifies changes to the attributes and processing model defined in XHTML+RDFa in order to support extracting RDF from HTML documents.

The requirements and rules, as specified in XHTML+RDFa and further modified in this document, apply to all HTML5 documents. The RDFa Processor operating on HTML and XHTML documents, specifically the resulting DOMs, must apply the same processing rules for both types of serializations and DOMs.

4.1 Specifying the language for a literal

The lang attribute must be processed in the same manner as the xml:lang attribute is in the XHTML+RDFa specification, Section 5.5: Sequence, step #3.

The rules for determining the language of a node are specified in the section titled The lang and xml:lang attributes in the HTML5 specification.

If an author is editing an HTML fragment and is unsure of the final encapsulating MIME type for their markup, it is suggested that the author specify both lang and xml:lang where the value in both attributes is exactly the same.
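
As an illustration only, the following sketch resolves the language of a node by walking up to the nearest ancestor that carries lang or xml:lang. It assumes well-formed XML input parsed with Python's xml.etree.ElementTree, and it assumes that xml:lang is consulted before lang when both are present; the authoritative rules are those in the HTML5 specification. The function name language_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_of(root, target):
    """Return the nearest language declared on target or one of its ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        # Assumption: xml:lang is checked first, then lang.
        for name in (XML_LANG, "lang"):
            if name in node.attrib:
                return node.attrib[name]
        node = parents.get(node)
    return None  # no language information found
```

When an author follows the suggestion above and gives lang and xml:lang the same value, the order in which the two attributes are consulted does not affect the result.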

4.2 Invalid XMLLiteral values

When generating literals of type XMLLiteral, the processor must ensure that the output XMLLiteral is a namespace well-formed XML fragment. A namespace well-formed XML fragment has the following properties:

If the input is not a namespace well-formed XML fragment, the processor must transform the input text in a way that satisfies the well-formedness rules described in this section. If a sequence of characters cannot be transformed into a namespace well-formed XML fragment, the triple containing the XMLLiteral must not be generated.

An RDFa Processor that transforms the XML fragment must use the Coercing an HTML DOM into an Infoset rules, as specified in the HTML5 specification, prior to generating the triple containing the XMLLiteral. The serialization algorithm that must be used for generating the XMLLiteral is normatively defined in the Serializing XHTML Fragments section of the HTML5 specification.

Transformation to a namespace well-formed XML fragment is required because an application that consumes XMLLiteral data expects that data to be a namespace well-formed XML fragment.

The transformation requirement does not apply to input data that are text-only, such as literals that contain a datatype attribute with an empty value (""), or input data that contain only text nodes.

An example transformation demonstrating the preservation of namespace values is provided below. The → symbol is used to denote that the line is a continuation of the previous line and is included purely for the purposes of readability:

<p xmlns:ex="http://example.org/vocab#"      
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 Two rectangles (the example markup for them are stored in a triple):
 <svg xmlns="http://www.w3.org/2000/svg" property="ex:markup" datatype="rdf:XMLLiteral">
→ <rect width="300" height="100" 
→ style="fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)"/>
→ <rect width="50" height="50" 
→ style="fill:rgb(255,0,0);stroke-width:2; 
→ stroke:rgb(0,0,0)"/></svg>
</p>
   
The markup above should produce the following triple:
<> 
   <http://example.org/vocab#markup>
      "<rect xmlns=\"http://www.w3.org/2000/svg\" width=\"300\" 
→ height=\"100\" style=\"fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)\"/>
→ <rect xmlns=\"http://www.w3.org/2000/svg\" width=\"50\" 
→ height=\"50\" style=\"fill:rgb(255,0,0);stroke-width:2; 
→ stroke:rgb(0,0,0)\"/>"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
   
Note the preservation of the SVG namespace by injecting a new xmlns attribute. Since the ex and rdf namespaces are not used in either rect element, they are not preserved in the XMLLiteral.
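
The namespace injection shown above can be sketched in a few lines. This is not the normative Serializing XHTML Fragments algorithm; it is a simplified illustration that assumes the input has already been coerced into well-formed XML, that attributes are not namespaced, and that Python's xml.etree.ElementTree is an acceptable stand-in for the DOM. The function names split_tag, serialize and xml_literal are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def serialize(elem, inherited_ns=None):
    """Serialize elem, declaring xmlns wherever the namespace changes."""
    uri, local = split_tag(elem.tag)
    out = ["<", local]
    if uri != inherited_ns:
        # Inject the namespace that would otherwise be lost.
        out.append(" xmlns=" + quoteattr(uri or ""))
    for name, value in elem.attrib.items():
        # Assumption: attribute names carry no namespace.
        out.append(" " + name + "=" + quoteattr(value))
    children = list(elem)
    if not children and not elem.text:
        out.append("/>")
    else:
        out.append(">" + escape(elem.text or ""))
        for child in children:
            out.append(serialize(child, uri))
            out.append(escape(child.tail or ""))
        out.append("</" + local + ">")
    return "".join(out)


def xml_literal(elem):
    """Serialize the content of elem (not elem itself) as an XMLLiteral string."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(serialize(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)
```

Called on the svg element of the example, xml_literal emits each rect element with its own xmlns declaration for the SVG namespace, while the unused ex and rdf declarations never appear in the output.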

4.3 xmlns:-Prefixed Attributes

While this section outlines xmlns: processing in RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.

CURIE prefix mappings specified using attributes prepended with xmlns: must be processed using the rules specified in Section 5.4, CURIE and URI Processing, contained in the XHTML+RDFa specification.

Since CURIE prefix mappings have been specified using xmlns:, and since HTML attribute names are case-insensitive, CURIE prefix names declared using the xmlns:attribute-name pattern xmlns:<PREFIX>="<URI>" should be specified using only lower-case characters. For example, the text "xmlns:" and the text in "<PREFIX>" should be lower-case only. This is to ensure that prefix mappings are interpreted in the same way between HTML (case-insensitive attribute names) and XHTML (case-sensitive attribute names) document types.
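
The difference in attribute name handling can be observed with two parsers from the Python standard library. This is only a demonstration under stated assumptions: html.parser is not an HTML5 parser, but it lower-cases attribute names in a similar way, while xml.dom.minidom preserves case as an XML parser does. The helper names html_attribute_names and xml_attribute_names are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString

MARKUP = '<div xmlns:Audio="http://purl.org/media/audio#"></div>'


class AttrNames(HTMLParser):
    """Records attribute names as reported by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _ in attrs)


def html_attribute_names(markup):
    parser = AttrNames()
    parser.feed(markup)
    parser.close()
    return parser.names


def xml_attribute_names(markup):
    attrs = parseString(markup).documentElement.attributes
    return [attrs.item(i).name for i in range(attrs.length)]
```

With the mixed-case declaration above, the HTML route yields the prefix audio and the XML route yields the prefix Audio, so a CURIE such as Audio:title would resolve in only one of the two serializations. Declaring prefixes in lower case avoids the mismatch.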

5 Extensions to the HTML5 Syntax

Status: ISSUE-41 (decentralized extensibility) blocks progress to Last Call

This section is normative.

There are a few changes that are required to the HTML5 specification in order to fully support RDFa. The following sub-sections outline the necessary modifications to the base HTML5 specification.

5.1 The RDFa Attributes and Valid Values

All RDFa attributes and valid values (including CURIEs), as listed in Section 2.1: The RDFa Attributes, are conforming when used in an HTML5 or XHTML5 document.

5.2 Conformance Criteria for xmlns:-Prefixed Attributes

While this section outlines xmlns: conformance criteria for HTML+RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.

Since RDFa uses attributes starting with xmlns: to specify CURIE prefixes, it is important that any attribute starting with a case-insensitive match on the text string "xmlns:" be preserved in the DOM or other tree-like model that is passed to the RDFa Processor. While it is specified that HTML5 must preserve these attributes in the DOM, it must also accept these attributes as conforming in non-XML HTML5. For documents conforming to this specification, attributes with names that have the case-insensitive prefix "xmlns:" are conforming in both HTML5 and XHTML5.

5.3 Preserving Namespaces via Coercion to Infoset

This section needs feedback from the user agent vendors to ensure that this feature does not conflict with user agent architecture and has no technical reason that it cannot be implemented.

Since RDFa currently depends on the xmlns: pattern to declare prefix mappings, it is imperative that namespace information declared in non-XML mode HTML5 documents is mapped to an Infoset correctly. In order to ensure this mapping is performed correctly, the "Coercing an HTML DOM into an infoset" rules defined in [HTML5] must be modified to include the following rule:

If the XML API is namespace-aware, the tool must ensure that proper ([namespace name], [local name], [normalized value]) namespace tuples are created when converting the non-XML mode DOM into an Infoset.

For example, given the following input text:

       <div xmlns:audio="http://purl.org/media/audio#">
    
The div element above, when coerced from an HTML DOM into an Infoset, should contain an attribute in the [namespace attributes] list with a [namespace name] set to "http://www.w3.org/2000/xmlns/", a [local name] set to audio, and a [normalized value] of "http://purl.org/media/audio#".
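
A minimal sketch of the proposed rule follows. It assumes that the attributes of a non-XML mode DOM element are available as a plain mapping from attribute name to value, and that the value needs no further normalization; the function name namespace_tuples is hypothetical.

```python
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
PREFIX = "xmlns:"


def namespace_tuples(attributes):
    """Build ([namespace name], [local name], [normalized value]) tuples.

    attributes maps attribute names, as found in a non-XML mode DOM
    element, to their values.
    """
    tuples = []
    for name, value in attributes.items():
        # Case-insensitive match on "xmlns:" followed by a non-empty prefix name.
        if name.lower().startswith(PREFIX) and len(name) > len(PREFIX):
            tuples.append((XMLNS_NAMESPACE, name[len(PREFIX):], value))
    return tuples
```

Attributes that do not match the xmlns: pattern are ignored by this sketch and would be coerced by the existing HTML5 rules.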

6 Infoset-based Processors

This section is informative.

While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content from processors operating on an XML Information Set are provided below.

6.1 Processing Namespaced RDFa Attributes

Extracting namespaced RDFa attributes while operating from within an Infoset-based RDFa processor can be achieved using the following algorithm:

While processing an element as described in [XHTML+RDFA], Section 5.5, Step #2:

  1. For each attribute in the [namespace attributes] list that has a [prefix] value, create a [URI mapping] by storing the [local name] as the value to be mapped, and the [normalized value] as the value to map.
  2. For each attribute in the [attributes] list that has no value for [prefix] and a [local name] that starts with xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [normalized value] as the value to map.

    Note: This step is unnecessary if the Infoset coercion rules preserve namespaces specified in non-XML mode.

To demonstrate, assume that the following markup is processed by an Infoset-based RDFa processor:

<div xmlns:audio="http://purl.org/media/audio#" ...

After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.

6.2 Processing RDFa Attributes

There are a number of non-prefixed attributes that are associated with RDFa Processing in HTML5. If an XML Information Set based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.

While processing an element as described in [XHTML+RDFA], Section 5.5, Step #4 through Step #9:

  1. For each RDFa attribute in the [attributes] list that has a [prefix] with no value or a [prefix] with the value of http://www.w3.org/1999/xhtml, extract and use the [normalized value].
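
The extraction step can be sketched as a simple filter. The AttributeItem class below is a hypothetical stand-in for an Infoset attribute information item and is not part of any real XML API; the set of attribute names follows Section 2.1 of the XHTML+RDFa specification.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The RDFa attributes listed in Section 2.1 of XHTML+RDFa.
RDFA_ATTRIBUTES = {
    "about", "content", "datatype", "href", "property",
    "rel", "resource", "rev", "src", "typeof",
}


@dataclass
class AttributeItem:
    """Hypothetical model of an Infoset attribute information item."""
    prefix: Optional[str]
    local_name: str
    normalized_value: str


def rdfa_values(attributes):
    """Return the RDFa attributes whose [prefix] is empty or the XHTML value."""
    return {
        item.local_name: item.normalized_value
        for item in attributes
        if item.local_name in RDFA_ATTRIBUTES
        and item.prefix in (None, XHTML_NS)
    }
```

Attributes that share a local name with an RDFa attribute but carry any other [prefix] are skipped, so they cannot contribute triples.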

7 DOM Level 2-based Processors

This section is informative.

This mechanism should be double-checked against all of the RDFa Javascript implementations to ensure correctness.

While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content from processors operating in a DOM2 environment are provided below.

7.1 Processing Namespaced RDFa Attributes

Extracting namespaced RDFa attributes while operating from within a DOM Level 2 based RDFa processor can be achieved using the following algorithm:

While processing each [Element] as described in [XHTML+RDFA], Section 5.5, Step #2:

  1. For each [Attr] in the [Node.attributes] list that has a [namespace prefix] value of xmlns, create a [URI mapping] by storing the [local name] as the value to be mapped, and the [Node.nodeValue] as the value to map.
  2. For each [Attr] in the [Node.attributes] list that has a [namespace prefix] value of null and a [local name] that starts with xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [Node.nodeValue] as the value to map.

    Note: This step is unnecessary if the XML and non-XML mode DOMs are namespace consistent.

To demonstrate, assume that the following markup is processed by a DOM2-based RDFa processor:

<div xmlns:audio="http://purl.org/media/audio#" ...

After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.
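
The two steps of the algorithm can be sketched as follows. The Attr class is a hypothetical, simplified model of a DOM Level 2 [Attr] node carrying only the three properties the algorithm reads; it is not a real DOM binding, and actual DOM implementations may report these properties differently for non-XML mode documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attr:
    """Hypothetical, simplified model of a DOM Level 2 Attr node."""
    namespace_prefix: Optional[str]
    local_name: str
    node_value: str


def uri_mappings(attributes):
    """Build the prefix -> URI mappings for one element's attribute list."""
    mappings = {}
    for attr in attributes:
        if attr.namespace_prefix == "xmlns":
            # Step 1: namespace-aware DOM, e.g. from an XML mode document.
            mappings[attr.local_name] = attr.node_value
        elif attr.namespace_prefix is None and attr.local_name.startswith("xmlns:"):
            # Step 2: non-XML mode DOM, where "xmlns:audio" is a plain name.
            mappings[attr.local_name[len("xmlns:"):]] = attr.node_value
    return mappings
```

Both representations of the div element above produce the same mapping from audio to http://purl.org/media/audio#, which is the consistency the algorithm is meant to provide.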

7.2 Processing RDFa Attributes

There are a number of non-prefixed attributes that are associated with RDFa processing in HTML5. If a DOM2-based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.

While processing an element as described in [XHTML+RDFA], Section 5.5, Step #4 through Step #9:

  1. For each RDFa attribute in the [Node.attributes] list that has a [namespace prefix] that is null or a [namespace prefix] with the value of http://www.w3.org/1999/xhtml, extract and use the [Node.nodeValue] as the value.

8 References

8.1 Normative References

[HTML5] (currently not a REC)
The HTML5 Specification, Ian Hickson. W3C, Work in Progress.
[RFC2119]
RFC2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997.
[XHTML+RDFA]
RDFa in XHTML: Syntax and Processing, Mark Birbeck, Shane McCarron, Steven Pemberton. W3C, October 2008.
[DOM2]
Document Object Model (DOM) Level 2 Core Specification, Arnaud Le Hors, Philippe Le Hégaret, Lauren Wood, Gavin Nicol, Jonathan Robie, Mike Champion, Steve Byrne. W3C, November 2000.
[INFOSET]
XML Information Set (Second Edition), John Cowan, Richard Tobin. W3C, February 2004.

8.2 Non-Normative References

[XHTMLRDF]
XHTML and RDF, Mark Birbeck. W3C, February 2004.