Copyright ©2003, 2004, 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document presents GRDDL, a mechanism for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT.
The previous version of this work was released in April 2004 as a W3C Coordination Group Note by the Semantic Web Coordination Group, as it was relevant to issues that were postponed by the RDF Core Working Group: rdfms-validating-embedded-rdf and faq-html-compliance. It has since proved relevant to Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8 as well. A related design history and rationale discusses the contribution of this design to those TAG issues. This 16 May 2005 version is released as a W3C Team Submission for consideration by the community.
This design started with a sketch in May 2003. There are now multiple implementations, including an online service, and a growing test suite. A log of changes is appended.
Please send review comments, implementation experience reports, etc. to [email protected], the mailing list of the RDF in XHTML task force of the Semantic Web Best Practices and Deployment Working Group and the HTML Working Group; the mailing list has a public archive.
By publishing this document, Dan Connolly and Dominique Hazaël-Massieux have made a formal submission to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. Please consult the complete list of acknowledged W3C Team Submissions.
Data formats like XML and XHTML are used in the Web for a large spectrum of purposes, from poetry and drama to spreadsheets and databases. The information in a poem may be rich and subtle; we might use a computer to pick out the author's name, but themes and opposing forces are not readily computable. When extracting data from documents, preserving meaning is important: if a document says "It is highly unlikely that the king was over twenty years old" and a computation returns "the king was over twenty years old," that computation does not preserve meaning.
The Resource Description Framework[RDFC04] codifies certain forms of data—simple logical statements like age(king, 20)—and specifies basic rules for preserving meaning. The framework includes a constrained XML concrete syntax, but it also includes an abstract syntax. GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents.
For example, Dublin Core meta-data can be written in an HTML dialect[RFC2731] that has a clear correspondence to an encoding in RDF/XML[DCRDF]. The correspondence can be expressed in an XSLT transformation, dc-extract.xsl:
Figure: Transforming HTML meta-data to RDF/XML
The transformation preserves the author's meaning, provided the author understood the conventions of this dialect. But an author may have accidentally conformed to the syntactic conventions without any knowledge of Dublin Core at all. In that case, the mapping most likely does not preserve the author's meaning. In GRDDL, documents contain explicit references to the conventions that the author used to encode data.
A reference to http://www.w3.org/2003/g/data-view from the profile attribute (cf. section 7.4.4.3 Meta data profiles of [HTML4]) of an XHTML document[XHTML] indicates that links of type transformation relate the document to transformations that preserve its meaning.
For example, this document not only follows the conventions of [RFC2731], but it explicitly uses the GRDDL profile and links to a transformation that extracts the meta-data in RDF/XML in a way that preserves the meaning of the document:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject" content="ADAM; Simple Search; Index+; prototype" />
    ...
  </head>
  ...
</html>
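How a consumer might find these transformation links can be sketched in a few lines. This is a minimal illustration using Python's standard library, not part of the GRDDL design itself; the function name `profile_transformations` and the base URI used in the usage lines are hypothetical. It assumes the document is well-formed XHTML, treats the profile attribute and the rel attribute as whitespace-separated lists (as HTML 4 defines them), and resolves relative href values against the document's URI. A document that links several transformations yields several URIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def profile_transformations(xhtml_text, base_uri):
    """Return the transformation URIs linked from an XHTML document that
    uses the GRDDL profile; return [] if the profile is not present."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The profile attribute may name several whitespace-separated profiles.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        # rel may list several link types; only "transformation" matters here.
        if "transformation" in link.get("rel", "").split():
            href = link.get("href")
            if href:
                found.append(urljoin(base_uri, href))
    return found


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head profile="http://www.w3.org/2003/g/data-view">'
        '<title>Some Document</title>'
        '<link rel="transformation" href="dc-extract.xsl" />'
        '</head><body/></html>'
    )
    # Relative hrefs resolve against the (hypothetical) document URI.
    print(profile_transformations(sample, "http://example.org/docs/some.html"))
    # prints ['http://example.org/docs/dc-extract.xsl']
```

Checking the profile before reading the links reflects the point made above: without the profile, a rel="transformation" link carries no licence to treat the transformation's output as the document's meaning.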
In the figure below, the arrow labelled info relates a document to an abstract notion of the information contained in the document. It shows that the RDF data extracted via the dc-extract.xsl transformation is part of the information contained in the document:
This is what the data looks like in RDF/XML:
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="">
    <dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject>
  </rdf:Description>
</rdf:RDF>
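The correspondence between the meta element and the dc:subject statement can be sketched outside XSLT to make the mapping explicit. This is a hypothetical Python re-expression of the core idea only, not the dc-extract.xsl stylesheet: the function name `dc_meta_triples` is invented, it assumes the "DC." prefix denotes the Dublin Core 1.1 element set, and it handles only unqualified names such as DC.Subject. Because rdf:about="" denotes the document itself, each statement's subject is the document's own URI.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: the "DC." prefix maps to the Dublin Core 1.1 element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_meta_triples(xhtml_text, doc_uri):
    """Map <meta name="DC.Element" content="..."/> elements to
    (subject, predicate, literal) tuples about the document itself."""
    root = ET.fromstring(xhtml_text)
    triples = []
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        name = meta.get("name", "")
        content = meta.get("content")
        prefix, _, element = name.partition(".")
        # Skip non-DC names and qualified names such as DC.Date.Created.
        if prefix.upper() != "DC" or not element or "." in element or content is None:
            continue
        # RDF property names are lower case, e.g. DC.Subject -> dc:subject.
        triples.append((doc_uri, DC_NS + element.lower(), content))
    return triples
```

Applied to the example document above with a document URI of http://example.org/doc, this yields the single statement (http://example.org/doc, http://purl.org/dc/elements/1.1/subject, "ADAM; Simple Search; Index+; prototype"), matching the RDF/XML shown.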
Note that an XHTML document may conform to a number of dialects simultaneously and link to more than one decoding algorithm:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
    <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
    <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" />
    <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" />
    ...
The GRDDL profile mechanism is a special case of GRDDL designed to fit within the DTD-based syntax of XHTML. The general form of GRDDL is an attribute suitable for use with a wide variety of XML dialects.
The transformation attribute in the http://www.w3.org/2003/g/data-view# namespace on the root element of an XML document refers to a list of transformations that preserve the document's meaning.
The value of the grddl:transformation attribute designates a list of algorithms by URI reference (cf. section 4.4.1 URI references of [WEBARCH]).
In some dialect of XHTML not constrained by DTD syntax, the above example can be written:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:data-view="http://www.w3.org/2003/g/data-view#"
      data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
        http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
        http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
    ...
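Reading the general, attribute-based form is simpler still, since only the root element needs to be inspected. The following Python sketch is illustrative only; `root_transformations` is a hypothetical name, and it assumes the URI references are whitespace-separated, as in the example above, and relative ones resolve against the document's URI. Because the attribute is looked up by namespace name rather than by prefix, data-view:transformation and grddl:transformation are treated identically.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri):
    """Return the URI references listed in the data-view:transformation
    attribute on the root element, resolved against base_uri."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as {namespace}localname,
    # so the prefix chosen in the document does not matter.
    value = root.get(f"{{{DATA_VIEW_NS}}}transformation", "")
    # Assumption: the list is whitespace-separated, as in the example.
    return [urljoin(base_uri, ref) for ref in value.split()]
```

An attribute on any element other than the root is ignored by this sketch, in line with the statement that the attribute is read from the root element.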
Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace or XHTML profile. Consider this privacy policy written in P3Q, a contrived analog to P3P[P3P]:
<POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
  <EXPIRY max-age="604800"/>
  ...
The namespace document for P3Q relates the grokP3Q.xsl transformation to all P3Q documents:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dataview="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
    <dataview:namespaceTransformation rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/>
  </rdf:Description>
</rdf:RDF>
That is an example of the general case:
Likewise for XHTML profiles:
Note that statements gleaned from namespace documents and profile documents are a part of their meaning; these documents need not be written in RDF/XML directly.
Consider a purchase order whose namespace document is an XML Schema, where the XML Schema bears a data-view:transformation attribute licensing extraction of statements that include namespaceTransformation statements:
Analogously, consider a profile document whose information content includes, by way of a GRDDL transformation, a profileTransformation relationship:
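For the simplest case, where the namespace or profile document is itself RDF/XML as in the P3Q example, the dialect-wide lookup can be sketched as follows. This Python sketch is hypothetical and deliberately narrow: `root_namespace` and `dialect_transformations` are invented names, and the parser understands only the plain rdf:Description pattern shown in the P3Q namespace document, not full RDF/XML. A namespace or profile document in another format would first need its own statements gleaned, as described above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree writes qualified tags as {namespace}localname.
    return tag[1:].partition("}")[0] if tag.startswith("{") else None


def dialect_transformations(rdfxml_text, dialect_uri, prop="namespaceTransformation"):
    """Return objects of data-view:<prop> (namespaceTransformation or
    profileTransformation) stated about dialect_uri.

    Handles only rdf:Description elements with an absolute rdf:about
    and rdf:resource objects, as in the P3Q example."""
    root = ET.fromstring(rdfxml_text)
    found = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        if desc.get(f"{{{RDF_NS}}}about") != dialect_uri:
            continue
        for arc in desc.findall(f"{{{DATA_VIEW_NS}}}{prop}"):
            target = arc.get(f"{{{RDF_NS}}}resource")
            if target:
                found.append(target)
    return found
```

For the P3Q policy, `root_namespace` yields http://www.w3.org/2004/01/rdxh/p3q-ns-example; passing that URI and the namespace document to `dialect_transformations` yields the grokP3Q.xsl transformation. For an XHTML profile, the same lookup would use the profile URI and prop="profileTransformation".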
The transformation link type refers to a transformation algorithm that should have available representations in widely supported formats. We expect most consumers to support XSLT version 1[XSLT1] for the foreseeable future, though XSLT2[XSLT2] deployment is increasing. While JavaScript, C, or any other programming language could technically express the relevant information, XSLT is specifically designed to express XML to XML transformations and has some good safety characteristics.
Transformation algorithms should be well-defined functions whose only input is the source document. The use of the XSLT document() function to incorporate other data at transformation time is an error.
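A consumer or stylesheet author might want to detect this error before running a transformation. The Python sketch below is a heuristic under stated assumptions, not a conformance checker: `document_calls` is a hypothetical name, and it simply scans every attribute value (where XSLT 1.0 places its XPath expressions and attribute value templates) for a call to document(). It does not parse XPath, so a string literal containing "document(" would be a false positive; names such as my-document( or ex:document( are deliberately not matched.

```python
import re
import xml.etree.ElementTree as ET

# A call to document(...) not preceded by a name character, so that
# extension functions such as ex:document( or my-document( are not matched.
DOCUMENT_CALL = re.compile(r"(?<![\w.:-])document\s*\(")


def document_calls(xslt_text):
    """Return (element tag, attribute name, value) for every attribute
    whose value appears to call document(); an empty list means none found."""
    root = ET.fromstring(xslt_text)
    hits = []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if DOCUMENT_CALL.search(value):
                hits.append((elem.tag, attr, value))
    return hits
```

A non-empty result flags the stylesheet as depending on data other than the source document, which is the situation described above as an error.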
RFC 2046, in section 9. Security Considerations says:
Implementors should pay special attention to the security implications of any media types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the "application/postscript" type may serve as a model for considering other media types with remote execution capabilities.
Given the expressive power of XSLT, and the possibility of accessing external resources from an XSLT style sheet (e.g. through the document() function or the xsl:import mechanism), implementors should take appropriate measures to prevent malicious use of this mechanism.
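One narrow measure, offered purely as an illustration, is to check every stylesheet a transformation would pull in through xsl:import or xsl:include against a fetch policy before dereferencing it. In the Python sketch below, the allowlists `ALLOWED_SCHEMES` and `ALLOWED_HOSTS` and the function names are hypothetical choices, not part of GRDDL; a real deployment would combine this with other safeguards such as sandboxing and resource limits.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlsplit

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Hypothetical policy: the schemes and hosts this consumer is willing to fetch.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"www.w3.org"}


def is_allowed(uri):
    """True if uri uses an allowed scheme and host; rejects e.g. file: URIs."""
    parts = urlsplit(uri)
    return parts.scheme in ALLOWED_SCHEMES and (parts.hostname or "") in ALLOWED_HOSTS


def disallowed_imports(xslt_text, stylesheet_uri):
    """Return resolved xsl:import and xsl:include targets that the policy
    forbids (imports first, then includes)."""
    root = ET.fromstring(xslt_text)
    bad = []
    for local in ("import", "include"):
        for elem in root.iter(f"{{{XSL_NS}}}{local}"):
            # Relative hrefs resolve against the stylesheet's own URI.
            target = urljoin(stylesheet_uri, elem.get("href", ""))
            if not is_allowed(target):
                bad.append(target)
    return bad
```

Under this policy, an import of common.xsl from a stylesheet at www.w3.org is accepted, while an include of file:///etc/passwd is reported and would not be fetched.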
Informative references
An example homepage with Dublin Core, GeoURL, RSS, Creative Commons, etc. demonstrates several transformations and dialects.
The authors provide a pair of online services on an experimental, best-effort basis:
Client-side implementations are also in development:
Implementation experience to date suggests investigating the following issues:
A collection of test cases is in development. The original announcement was 02 Feb 2005. As of this writing ($Revision: 1.9 $ of $Date: 2005/05/16 20:37:52 $) they include:
Changes since the Apr 2004 release:
$Log: Overview.html,v $
Revision 1.9 2005/05/16 20:37:52 connolly
SOTD tweak w.r.t. prev ver
Revision 1.8 2005/05/16 20:36:33 connolly
added previous version link noted author's draft in changes section
Revision 1.7 2005/05/16 20:32:49 connolly
- figure markup tweak
- SOTD CG to WG
- hid bib fodder
Revision 1.6 2005/05/16 20:25:34 connolly
SOTD
Revision 1.5 2005/05/16 20:15:34 connolly
copyright years
Revision 1.4 2005/05/16 20:14:52 connolly
standard TeamSubmission icon markup, copyright markup
Revision 1.3 2005/05/16 20:12:57 connolly
- pubrules:
- this version/latest version
- move CVS keywords to meta
- added team submission stylesheet
Revision 1.2 2005/05/16 20:02:29 connolly
copy of http://www.w3.org/2004/01/rdxh/spec.html 1.74 2005/04/20 20:54:10
Revision 1.74 2005/04/20 20:54:10 connolly
tm fix
Revision 1.73 2005/04/20 20:43:31 connolly
- a computation, not an
Revision 1.72 2005/04/20 20:37:44 connolly
- revised abstract
- added P3P ref
- spell-check
Revision 1.71 2005/04/20 17:53:59 connolly
"Implementation Experience" section becomes "Software and Services"
Revision 1.70 2005/04/20 17:50:31 connolly
- re-worked namespace doc section
- moved GRDDL transformations section down near security considerations
- moved open issues to implementation experience section
- noted May 2003 sketch in change history
- reduced scope of "Example Use Cases" section
- added missing . in SOTD; removed extra blank line in example
Revision 1.69 2005/04/20 17:25:15 connolly
brought grddl-xml section inline with revised intro etc. added example
Revision 1.68 2005/04/20 17:13:54 connolly
re-worked "GRDDL for XHTML" section w.r.t. information content separated "GRDDL transformations" out as its own section
Revision 1.67 2005/04/20 16:27:54 connolly
smoothed out intro a bit
Revision 1.66 2005/04/20 16:07:59 connolly
re-worked SOTD in preparation for release as team submission
Revision 1.65 2005/03/25 23:03:14 connolly
- reworking intro...
- not done... changing platforms to work on another figure
- reduced indentation for examples
- succeeded in usinig object for svg/png illustration
Revision 1.64 2005/03/24 16:05:55 connolly
- TR base stylesheet
- note some issues
Revision 1.63 2005/03/24 05:08:19 connolly
merged a couple paras in SOTD
Revision 1.62 2005/03/24 04:47:59 connolly
oops; figure was inside example div; fixed removed extra paren in test case appendix
Revision 1.61 2005/03/24 04:35:59 connolly
added several illustrations which clarify quite a bit and suggest different ways of explaining/specifying things
Revision 1.60 2005/03/24 03:23:49 connolly
getting feedback on diagrams
Revision 1.59 2005/03/23 23:50:28 connolly
working on examples, figures
Revision 1.58 2005/03/23 18:53:51 connolly
removed overly specific reference to HTML in the XML section
Revision 1.57 2005/03/22 23:49:22 connolly
- moved supplementary material from under TOC to appendixes
Revision 1.56 2005/03/22 23:34:46 connolly
- dropped issue about namespace-qualifying the rel value; conflicting profiles doesn't seem like a big concern
- dropped issue about xsl:import; seems covered elsewhere
- specified URI by reference to webarch
- added References section
- removed "a few issues remain" from SOTD
- added a class for editorial issues
Revision 1.55 2005/03/22 23:07:05 connolly
- specified profileTransformation along with namespaceTransformation
- added test cases appendix (thought about linking testable assertions to test cases, but didn't follow thru)
- linked changelog from "This Version"
- demangled Dom's name (again. argh. wish nxml-mode and mule would get along)
- copyright 2005 too.
- moved bulk of implementation stuff from SOTD to an appendix
Revision 1.54 2005/03/22 22:17:42 connolly
added changelog
----------------------------
revision 1.53
date: 2004/12/07 23:19:58; author: connolly; state: Exp; lines: +5 -5
interpreter renamed transformation
----------------------------
revision 1.52
date: 2004/06/09 14:06:20; author: connolly; state: Exp; lines: +3 -3
demangle Dom's name
----------------------------
revision 1.51
date: 2004/06/09 13:59:00; author: connolly; state: Exp; lines: +3 -3
typo
----------------------------
revision 1.50
date: 2004/06/09 13:57:46; author: connolly; state: Exp; lines: +10 -10
rework abstract
----------------------------
revision 1.49
date: 2004/06/09 13:47:20; author: connolly; state: Exp; lines: +27 -8
- beefed up abstract
- pointed to client-side implementations
----------------------------
revision 1.48
date: 2004/04/13 20:54:41; author: connolly; state: Exp; lines: +4 -17
remove some SOTD boilerplate
----------------------------
revision 1.47
date: 2004/04/13 20:53:37; author: connolly; state: Exp; lines: +11 -8
now that the TR version is published, revert status, stylesheet, changelog
----------------------------