Copyright © 2010 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification defines rules and guidelines for adapting the RDF in XHTML: Syntax and Processing (RDFa) specification for use in the HTML5 and XHTML5 members of the HTML family. The rules defined in this specification not only apply to HTML5 documents in non-XML and XML mode, but also to HTML4 and XHTML documents interpreted through the HTML5 parsing rules.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Working Draft of the "HTML+RDFa: A mechanism for embedding RDF in HTML" specification for review by W3C members and other interested parties.
This Working Draft includes the following changes:
If you wish to make comments regarding this document, please send them to [email protected] (subscribe, archives) or to [email protected] (subscribe, archives), or submit them using the W3C's public bug database.
Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification should note the status, and are encouraged to join the RDFa Working Group.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.
The latest stable version of the editor's draft of this specification is always available on the W3C CVS server. The latest editor's working copy (which may contain unfinished text in the process of being prepared) is also available. A Diff-marked version is available.
This specification has been jointly developed by the RDFa Task Force and the HTML Working Group and is currently being published by the HTML Working Group to further discussions there.
This specification is an extension to the HTML5 language. All normative content in the HTML5 specification, unless specifically overridden by this specification, is intended to be the basis for this specification.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is informative.
Today's web is built predominantly for human consumption. Even as machine-readable data begins to permeate the web, it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, web browsers can provide only minimal assistance to humans in parsing and processing web data: browsers only see presentation information. RDFa is intended to solve the problem of machine-readable data in HTML documents. RDFa provides a set of HTML attributes to augment visual data with machine-readable hints. Using RDFa, authors may turn their existing human-visible text and links into machine-readable data without repeating content.
In early 2004, Mark Birbeck published a document named [XHTMLRDF] via the XHTML2 Working Group wherein he laid the groundwork for what would eventually become RDFa (The Resource Description Framework in Attributes).
In 2006, the work was co-sponsored by the Semantic Web Deployment Working Group, which began to formalize a technology to express semantic data in XHTML. This technology was successfully developed, reached consensus at the W3C, and was later published as an official W3C Recommendation. While HTML provides a mechanism to express the structure of a document (title, paragraphs, links), RDFa provides a mechanism to express the meaning in a document (people, places, events).
The document, titled "RDF in XHTML: Syntax and Processing" [XHTML+RDFa], defined a set of attributes and rules for processing those attributes that resulted in the output of machine-readable semantic data. While the document applied to XHTML, the attributes and rules were always intended to operate across any tree-based structure containing attributes on tree nodes (such as HTML4, SVG and ODF).
While RDFa was initially specified for use in XHTML, adoption by a number of large organizations on the Web spurred RDFa's use in non-XHTML languages. Its use in HTML4, before an official specification was developed for those languages, caused concern regarding document conformance.
Over the years, the members of the RDFa Task Force [RDFaTF] had discussed the possibility of applying the same attributes and processing rules outlined in the XHTML+RDFa specification to all HTML family documents. By design, a unified semantic data expression mechanism across all HTML and XHTML family documents was squarely in the realm of possibility.
This section describes the modifications to the original XHTML+RDFa specification that permit the use of RDFa in all HTML family documents. By using the attributes and processing rules described in the XHTML+RDFa specification and heeding the minor changes in this section, authors can expect to generate markup that produces the same semantic data output in HTML4, HTML5 and XHTML5.
This section is normative.
Section 5.5: Sequence, of the [XHTML+RDFa] specification defines a generic processing model for extracting RDF from a tree-based model. The method of transforming an input document into a model suited for the RDFa processing rules is intentionally not defined in the XHTML+RDFa specification. The method of transformation was intended to be defined in the implementation language, in this case, this section of the HTML+RDFa specification.
The HTML5 and XHTML5 DOMs are each a super-set of the tree-based model on which the RDFa processing rules operate. Therefore, a mapping mechanism to translate from a DOM to a tree-model is not necessary. The HTML5 and XHTML5 DOM, or equivalent data structure, should be used as input to the RDFa processing rules. The normative language for construction of the HTML5 DOM and XHTML5 DOM is contained in the HTML5 specification.
This section is informative.
RDFa's tree-based processing rules, outlined in Section 5.5: Sequence of the XHTML+RDFa specification, allow an input document to be automatically corrected, cleaned-up, re-arranged, or modified in any way that is approved by the host language prior to processing. For example, element nesting issues in HTML documents may be corrected before the input document is translated into the DOM, a valid tree-based model, on which the RDFa processing rules will operate.
Any mechanism that generates a data structure equivalent to the HTML5 or XHTML5 DOM, such as the html5lib library, may be used as the mechanism to construct the tree-based model provided as input to the RDFa processing rules.
This section is normative.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
In order for a document to claim that it is a conforming HTML+RDFa document, it must provide the facilities described as mandatory in this section. The document conformance criteria are listed below, of which only a subset are mandatory:
A version attribute on the html element. The value of the version attribute should be "HTML+RDFa 1.0" if the document is a non-XML mode document, or "XHTML+RDFa 1.0" if the document is an XML mode document.
A link element contained in the head element that contains profile for the rel attribute and http://www.w3.org/1999/xhtml/vocab for the href attribute.
A conforming RDFa user agent must:
A conforming RDFa Processor must implement all of the mandatory features specified in the XHTML+RDFa specification. It must also support any mandatory features specified in this specification.
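As an illustration of the version attribute value described in the document conformance criteria above, the following minimal sketch selects the suggested value for a given document mode. The function names and the dictionary representation of the html element's attributes are assumptions made for this example and are not defined by this specification.

```python
def expected_version_value(xml_mode: bool) -> str:
    """Return the suggested value of the version attribute on the html element."""
    # "XHTML+RDFa 1.0" for XML mode documents, "HTML+RDFa 1.0" otherwise.
    return "XHTML+RDFa 1.0" if xml_mode else "HTML+RDFa 1.0"


def has_suggested_version(html_attributes: dict, xml_mode: bool) -> bool:
    """Check whether the html element's attributes carry the suggested value.

    `html_attributes` is a hypothetical name-to-value mapping of the
    attributes found on the html element.
    """
    return html_attributes.get("version") == expected_version_value(xml_mode)
```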
This section is normative.
The [XHTML+RDFa] Recommendation is the base document on which this specification builds. XHTML+RDFa specifies the attributes, in Section 2.1: The RDFa Attributes, and processing model, in Section 5: Processing Model, for extracting RDF from an XHTML document. This section specifies changes to the attributes and processing model defined in XHTML+RDFa in order to support extracting RDF from HTML documents.
The requirements and rules, as specified in XHTML+RDFa and further modified in this document, apply to all HTML5 documents. The RDFa Processor operating on HTML and XHTML documents, specifically the resulting DOMs, must apply the same processing rules for both types of serializations and DOMs.
The lang attribute must be processed in the same manner as the xml:lang attribute is in the XHTML+RDFa specification, Section 5.5: Sequence, step #3.
If an author is editing an HTML fragment and is unsure of the final encapsulating MIME type for their markup, it is suggested that the author specify both lang and xml:lang where the value in both attributes is exactly the same.
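The language handling described above could be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name and the mapping-based attribute representation are hypothetical, and checking xml:lang before lang when both are present is an assumption of this sketch (the text above only suggests that authors give both attributes the same value).

```python
from typing import Mapping, Optional


def current_language(attributes: Mapping[str, str],
                     inherited: Optional[str] = None) -> Optional[str]:
    """Determine the language that applies to an element.

    `attributes` maps attribute names to values for the current element;
    `inherited` is the language in effect on the parent element, if any.
    """
    # Assumption: xml:lang is consulted first, then lang. Both are treated
    # identically, which is the point of the rule above.
    for name in ("xml:lang", "lang"):
        if name in attributes:
            return attributes[name]
    # Neither attribute is present, so the inherited language still applies.
    return inherited
```

When an author follows the suggestion of specifying both attributes with the same value, the order in which they are consulted does not affect the result.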
When generating literals of type XMLLiteral, the processor must ensure that the output XMLLiteral is a namespace well-formed XML fragment. A namespace well-formed XML fragment has the following properties:
The xmlns attribute, as well as all currently active attributes starting with xmlns:, must be preserved in the generated XMLLiteral. This preservation must be accomplished by placing all active namespaces in each top-level element in the generated XMLLiteral, taking care not to overwrite pre-existing namespace values.
An RDFa Processor that transforms the XML fragment must use the Coercing an HTML DOM into an Infoset rules, as specified in the HTML5 specification, prior to generating the triple containing the XMLLiteral. The serialization algorithm that must be used for generating the XMLLiteral is normatively defined in the Serializing XHTML Fragments section of the HTML5 specification.
Transformation to a namespace well-formed XML fragment is required because an application that consumes XMLLiteral data expects that data to be a namespace well-formed XML fragment.
The transformation requirement does not apply to input data that are text-only, such as literals that contain a datatype attribute with an empty value (""), or input data that contain only text nodes.
An example transformation demonstrating the preservation of namespace values is provided below. The → symbol is used to denote that the line is a continuation of the previous line and is included purely for the purposes of readability:
<p xmlns:ex="http://example.org/vocab#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
Two rectangles (the example markup for them are stored in a triple):
<svg xmlns="http://www.w3.org/2000/svg" property="ex:markup" datatype="rdf:XMLLiteral">
→ <rect width="300" height="100"
→ style="fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)"/>
→ <rect width="50" height="50"
→ style="fill:rgb(255,0,0);stroke-width:2;
→ stroke:rgb(0,0,0)"/></svg>
</p>
The markup above should produce the following triple:
<> <http://example.org/vocab#markup> "<rect xmlns=\"http://www.w3.org/2000/svg\" width=\"300\"
→ height=\"100\" style=\"fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)\"/>
→ <rect xmlns=\"http://www.w3.org/2000/svg\" width=\"50\"
→ height=\"50\" style=\"fill:rgb(255,0,0);stroke-width:2;
→ stroke:rgb(0,0,0)\"/>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
Note the preservation of the SVG namespace by injecting a new xmlns attribute. Since the ex and rdf namespaces are not used in either rect element, they are not preserved in the XMLLiteral.
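The namespace injection shown in the example above could be sketched with Python's xml.dom.minidom as follows. This is an illustrative sketch only: the function name is hypothetical, the choice of which namespace declarations to pass in is left to the caller, and the second element's pre-existing value is invented to show that existing declarations are not overwritten.

```python
from xml.dom import minidom


def inject_namespaces(top_level_elements, namespaces):
    """Copy namespace declarations onto each top-level element.

    `namespaces` maps attribute names such as "xmlns" or "xmlns:ex" to URIs.
    A declaration that is already present on an element is left untouched.
    """
    for element in top_level_elements:
        for attr_name, uri in namespaces.items():
            if not element.hasAttribute(attr_name):
                element.setAttribute(attr_name, uri)


# Two stand-ins for the top-level elements of a generated XMLLiteral.
doc = minidom.Document()
first = doc.createElement("rect")
first.setAttribute("width", "300")
second = doc.createElement("rect")
# Hypothetical pre-existing declaration that must not be overwritten.
second.setAttribute("xmlns", "http://example.org/already-set")

inject_namespaces([first, second], {"xmlns": "http://www.w3.org/2000/svg"})
```

After the call, the first element carries the SVG namespace declaration while the second keeps the value it already had.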
xmlns:-Prefixed Attributes
While this section outlines xmlns: processing in RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.
CURIE prefix mappings specified using attributes prepended with xmlns: must be processed using the rules specified in Section 5.4, CURIE and URI Processing, contained in the XHTML+RDFa specification.
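To illustrate how a prefix mapping declared with xmlns: is used, the following minimal sketch expands a prefixed CURIE by concatenating the mapped URI with the reference. The function name and the dictionary of mappings are assumptions for this example, and the sketch deliberately covers only the simple prefix:reference case; it is not a substitute for the full rules in the XHTML+RDFa specification.

```python
from typing import Mapping, Optional


def resolve_curie(curie: str, uri_mappings: Mapping[str, str]) -> Optional[str]:
    """Expand a prefix:reference CURIE using the given prefix mappings.

    Returns None when the value has no colon or the prefix is unknown.
    """
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in uri_mappings:
        return None
    # The expanded URI is the mapped URI followed by the reference.
    return uri_mappings[prefix] + reference
```

With a mapping from ex to http://example.org/vocab#, the CURIE ex:markup expands to http://example.org/vocab#markup.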
Since CURIE prefix mappings have been specified using xmlns:, and since HTML attribute names are case-insensitive, CURIE prefix names declared using the xmlns: attribute-name pattern xmlns:<PREFIX>="<URI>" should be specified using only lower-case characters. For example, the text "xmlns:" and the text in "<PREFIX>" should be lower-case only. This is to ensure that prefix mappings are interpreted in the same way between HTML (case-insensitive attribute names) and XHTML (case-sensitive attribute names) document types.
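An authoring tool could flag prefix declarations that do not follow the lower-case guidance above. The sketch below is one hypothetical way to do so; the function name is an assumption, and the input is simply a list of attribute names as written by the author.

```python
def mixed_case_prefix_declarations(attribute_names):
    """Return xmlns:-style attribute names that are not entirely lower-case."""
    return [
        name for name in attribute_names
        # Match "xmlns:" case-insensitively, then report any name that
        # would change if it were converted to lower-case.
        if name.lower().startswith("xmlns:") and name != name.lower()
    ]
```

For the names xmlns:audio, XMLNS:Audio, xmlns:FOAF and class, only XMLNS:Audio and xmlns:FOAF are reported.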
Status: ISSUE-41 (decentralized extensibility) blocks progress to Last Call
This section is normative.
There are a few changes that are required to the HTML5 specification in order to fully support RDFa. The following sub-sections outline the necessary modifications to the base HTML5 specification.
All RDFa attributes and valid values (including CURIEs), as listed in Section 2.1: The RDFa Attributes, are conforming when used in an HTML5 or XHTML5 document.
xmlns:-Prefixed Attributes
While this section outlines xmlns: conformance criteria for HTML+RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.
Since RDFa uses attributes starting with xmlns: to specify CURIE prefixes, it is important that any attribute starting with a case-insensitive match on the text string "xmlns:" be preserved in the DOM or other tree-like model that is passed to the RDFa Processor. While it is specified that HTML5 must preserve these attributes in the DOM, it must also accept these attributes as conforming in non-XML HTML5. For documents conforming to this specification, attributes with names that have the case insensitive prefix "xmlns:" are conforming in both HTML5 and XHTML5.
This section needs feedback from the user agent vendors to ensure that this feature does not conflict with user agent architecture and has no technical reason that it cannot be implemented.
Because RDFa is currently dependent on the xmlns: pattern to declare prefix mappings, it is imperative that namespace information declared in non-XML mode HTML5 documents is mapped to an Infoset correctly. In order to ensure this mapping is performed correctly, the "Coercing an HTML DOM into an infoset" rules defined in [HTML5] must be modified to include the following rule:
If the XML API is namespace-aware, the tool must ensure that proper ([namespace name], [local name], [normalized value]) namespace tuples are created when converting the non-XML mode DOM into an Infoset.
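The rule above could be sketched as follows for a single attribute taken from a non-XML mode DOM. The type and function names are hypothetical, the attribute value is assumed to be already normalized, and http://www.w3.org/2000/xmlns/ is the namespace name reserved for namespace declarations.

```python
from typing import NamedTuple, Optional

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


class NamespaceAttribute(NamedTuple):
    namespace_name: str
    local_name: str
    normalized_value: str


def coerce_prefix_declaration(name: str, value: str) -> Optional[NamespaceAttribute]:
    """Build a namespace tuple for an attribute named xmlns:<PREFIX>.

    Returns None for attributes that are not prefix declarations.
    """
    head, separator, local_name = name.partition(":")
    # The "xmlns:" part is matched case-insensitively.
    if head.lower() != "xmlns" or not separator or not local_name:
        return None
    # Assumption: `value` is passed through unchanged as the normalized value.
    return NamespaceAttribute(XMLNS_NAMESPACE, local_name, value)
```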
For example, given the following input text:
<div xmlns:audio="http://purl.org/media/audio#">
The div element above, when coerced from an HTML DOM into an Infoset, should contain an attribute in the [namespace attributes] list with a [namespace name] set to "http://www.w3.org/2000/xmlns/", a [local name] set to audio, and a [normalized value] of "http://purl.org/media/audio#".
This section is informative.
While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content for processors operating on an XML Information Set are provided below.
Extracting namespaced RDFa attributes while operating from within an Infoset-based RDFa processor can be achieved using the following algorithm:
While processing an element as described in [XHTML+RDFA], Section 5.5, Step #2:
For each attribute whose [local name] starts with xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [normalized value] as the value to map.
To demonstrate, assume that the following markup is processed by an Infoset-based RDFa processor:
<div xmlns:audio="http://purl.org/media/audio#" ...
After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.
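The mapping step demonstrated above could be sketched as follows. This is an illustration under stated assumptions: attributes are represented as hypothetical ([local name], [normalized value]) pairs, and the rule is taken to apply to attributes whose [local name] starts with xmlns:.

```python
from typing import Dict, Iterable, Tuple


def uri_mappings_from_attributes(
        attributes: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Create URI mappings from (local name, normalized value) pairs."""
    marker = "xmlns:"
    mappings: Dict[str, str] = {}
    for local_name, normalized_value in attributes:
        if local_name.startswith(marker):
            # Strip the "xmlns:" characters to obtain the value to be mapped.
            mappings[local_name[len(marker):]] = normalized_value
    return mappings
```

Applied to the div element above, the result contains a single mapping from audio to http://purl.org/media/audio#.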
There are a number of non-prefixed attributes that are associated with RDFa Processing in HTML5. If an XML Information Set based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.
While processing an element as described in [XHTML+RDFA], Section 5.5, Step #4 through Step #9:
http://www.w3.org/1999/xhtml, extract and use the [normalized value].
This section is informative.
This mechanism should be double-checked against all of the RDFa JavaScript implementations to ensure correctness.
While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content for processors operating in a DOM2 environment are provided below.
Extracting namespaced RDFa attributes while operating from within a DOM Level 2 based RDFa processor can be achieved using the following algorithm:
While processing each [Element] as described in [XHTML+RDFA], Section 5.5, Step #2:
For each attribute with a namespace prefix of xmlns, create a [URI mapping] by storing the [local name] as the value to be mapped, and the [Node.nodeValue] as the value to map.
For each attribute whose [local name] starts with xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [Node.nodeValue] as the value to map.
To demonstrate, assume that the following markup is processed by a DOM2-based RDFa processor:
<div xmlns:audio="http://purl.org/media/audio#" ...
After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.
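The DOM-based extraction demonstrated above could be sketched with Python's xml.dom.minidom, which exposes prefix, localName, nodeName and nodeValue on attribute nodes. The function name is hypothetical, and the sketch assumes two cases: a namespace-aware DOM reports the declaration with a prefix of xmlns and a local name of audio, while a DOM that is not namespace-aware reports an un-prefixed attribute whose full name starts with xmlns:.

```python
from xml.dom import minidom


def uri_mappings_from_dom_attrs(attrs):
    """Create URI mappings from DOM attribute nodes."""
    marker = "xmlns:"
    mappings = {}
    for attr in attrs:
        if attr.prefix == "xmlns":
            # Namespace-aware case: the local name is the prefix being mapped.
            mappings[attr.localName] = attr.nodeValue
        elif attr.prefix is None and attr.nodeName.startswith(marker):
            # Non-namespace-aware case. minidom derives localName by splitting
            # on ":", so nodeName is used here to see the full attribute name.
            mappings[attr.nodeName[len(marker):]] = attr.nodeValue
    return mappings


# Namespace-aware parse of the example markup (closed so that it is well-formed).
doc = minidom.parseString('<div xmlns:audio="http://purl.org/media/audio#"/>')
mappings = uri_mappings_from_dom_attrs(doc.documentElement.attributes.values())
```

Both branches yield the same mapping for the example markup, which is the behaviour the two rules are intended to guarantee.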
There are a number of non-prefixed attributes that are associated with RDFa processing in HTML5. If a DOM2-based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.
While processing an element as described in [XHTML+RDFA], Section 5.5, Step #4 through Step #9:
http://www.w3.org/1999/xhtml, extract and use the [Node.nodeValue] as the value.