Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. Authors may explicitly associate documents with transformation algorithms, typically represented in XSLT, using a link element in the head of the document. Alternatively, the information needed to obtain the transformation may be held in an associated metadata profile document or namespace document. Clients reading the document can follow their nose using techniques described in the GRDDL specification to discover the appropriate transformations. This document uses a number of examples from the GRDDL Use Cases document to illustrate in detail the techniques GRDDL provides for associating documents with appropriate instructions for extracting any embedded data.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a First Public Working Draft of the GRDDL Primer. The GRDDL design was first released as a W3C technical report in April 2004. This document was developed by the GRDDL Working Group, which was chartered in July 2006 to review the specification and develop use cases, tutorial materials, and tests. The first few examples in this draft have been worked out in detail, though the examples later in the document are still under discussion. The Working Group expects to advance GRDDL to Recommendation Status, though this primer may end up as a separate Working Group Note.
GRDDL is intended to contribute to addressing Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8 as well as issues postponed by the RDF Core working group such as rdfms-validating-embedded-rdf and faq-html-compliance.
Please send comments about this document to the public-grddl-comments mailbox (with public archive). A log of changes is maintained for the convenience of editors and reviewers.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
GRDDL provides a relatively inexpensive set of mechanisms for bootstrapping RDF content from uniform XML dialects in such a way as to shift the burden of formulating RDF to transformation algorithms written specifically for these dialects. XML Transformation languages such as XSLT are quite versatile in their ability to process, manipulate, and generate XML and the use of XSLT to generate XHTML from single-purpose XML vocabularies is historically celebrated as a powerful idiom for separating structured content from presentation.
GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). The way in which GRDDL empowers authors of web content can be considered somewhat analogous to allowing a non-native speaker to learn the spoken form of a new language first, before attempting to master its written form - rather than trying to learn both simultaneously.
GRDDL works through associating transformations with an individual document either through direct inclusion of references or indirectly through profile documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them. For XML formats the transformations are commonly expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0 then it is preferable to use that format since all GRDDL processors should be capable of interpreting an XSLT 1.0 document.
This document may be read in conjunction with the GRDDL Use Cases document, which describes a series of common scenarios for which GRDDL may be suitable. Readers who want complete technical detail on the GRDDL mechanism should refer to the GRDDL Working Draft.
In this document the term HTML is used to refer to the XHTML dialect of HTML.
To introduce GRDDL concepts, the following section explores how GRDDL can be used to satisfy the scheduling use case. In this use case Jane, a frequent traveller, is trying to schedule a meeting with three of her friends.
GRDDL provides a number of ways for GRDDL Transformations to be associated with content, each of which is appropriate in different situations. The simplest method for authors of HTML content is to embed a reference to the transformations using a link element in the head of the document.
Microformats are simple conventions for embedding semantic markup for a specific domain in human-readable documents. In our example one of Jane's friends has marked up their schedule using the hCalendar microformat. The hCalendar microformat uses HTML class attributes to associate event related semantics with elements in the markup:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Robin's Schedule</title>
  </head>
  <body>
    <ol class="schedule">
      <li>2006
        <ol>
          <li class="vevent">
            <strong class="summary">Fashion Expo</strong> in
            <span class="location">Paris, France</span>:
            <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to
            <abbr class="dtend" title="2006-10-22">22</abbr>
          </li>
          <li class="vevent">
            <strong class="summary">New line review</strong> in
            <span class="location">Köln, Germany</span>:
            <abbr class="dtstart" title="2006-10-26">Oct 26</abbr> to
            <abbr class="dtend" title="2006-10-27">27</abbr>
          </li>
          <li class="vevent">
            <strong class="summary">Clothing 2006</strong> in
            <span class="location">Rome, Italy</span>:
            <abbr class="dtstart" title="2006-12-01">Dec 1</abbr> to
            <abbr class="dtend" title="2006-12-05">5</abbr>
          </li>
        </ol>
      </li>
      <li>2007
        <ol>
          <li class="vevent">
            <strong class="summary">Diva Awards</strong> in
            <span class="location">Los Angeles, USA</span>:
            <abbr class="dtstart" title="2007-01-06">Jan 6</abbr> to
            <abbr class="dtend" title="2007-01-08">8</abbr>
          </li>
          <li class="vevent">
            <strong class="summary">Board Review</strong> in
            <span class="location">New York, USA</span>:
            <abbr class="dtstart" title="2007-02-23">Feb 23</abbr> to
            <abbr class="dtend" title="2007-02-24">24</abbr>
          </li>
        </ol>
      </li>
    </ol>
  </body>
</html>
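To make the role of the class attributes concrete, the following is a minimal sketch in Python of how a program might read the hCalendar properties out of markup like Robin's. It is not the GRDDL transformation itself: the function names and the trimmed SAMPLE string are assumptions made for illustration, and only the four class names used in the schedule above are handled.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of a single event from the schedule markup.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ol>
      <li class="vevent">
        <strong class="summary">Fashion Expo</strong> in
        <span class="location">Paris, France</span>:
        <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to
        <abbr class="dtend" title="2006-10-22">22</abbr>
      </li>
    </ol>
  </body>
</html>"""

PROPERTIES = ("summary", "location", "dtstart", "dtend")


def has_class(element, name):
    """Return True if name is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


def extract_events(xhtml_text):
    """Return one dict per element that carries the 'vevent' class."""
    root = ET.fromstring(xhtml_text)
    events = []
    for node in root.iter():
        if not has_class(node, "vevent"):
            continue
        event = {}
        for child in node.iter():
            for prop in PROPERTIES:
                if has_class(child, prop):
                    # abbr elements keep the machine-readable value in title;
                    # other elements fall back to their text content.
                    text = "".join(child.itertext()).strip()
                    event[prop] = child.get("title") or text
        events.append(event)
    return events


for event in extract_events(SAMPLE):
    print(event)
```

The class names alone are enough to recover the event data, which is why a generic transformation can be written once for all hCalendar pages.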
To explicitly relate the data in this document to the RDF data model the
author needs to make two changes. First she needs to add a profile attribute
to the head element to denote that her document contains GRDDL metadata. In
HTML, profiles are used to link documents to descriptions of the metadata
schemes they employ. The profile URI for GRDDL is http://www.w3.org/2003/g/data-view
and by including this URI in her document Robin is declaring that the
metadata in her markup can be interpreted using GRDDL.
The resulting HTML might look like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
  </head>
  <body>
    ...
Then she needs to add a link element containing the reference to the specific instructions for converting HTML containing hCalendar patterns into RDF. She can either write her own instructions or re-use an existing set. The link element contains the token transformation in the rel attribute and the URI of the instructions for extracting RDF in the href attribute:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
    <link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal"/>
  </head>
  <body>
    ...
The profile URI in the resulting document signals that the receiver of the document may look for link elements with a rel attribute containing the token transformation and use any or all of those links to determine how to extract the data as RDF.
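The discovery step just described can be sketched in a few lines of Python. This is an illustrative outline under simplifying assumptions, not a conforming GRDDL client: the function name find_transformations is hypothetical, relative href values are not resolved against a base URI, and the sample is a shortened copy of Robin's page.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Shortened copy of Robin's page with an empty body.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
    <link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal"/>
  </head>
  <body/>
</html>"""


def find_transformations(xhtml_text):
    """Return the href of every link whose rel contains 'transformation'.

    An empty list is returned unless the head declares the GRDDL profile.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None:
        return []
    # The profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    uris = []
    for link in head.iter(XHTML + "link"):
        rel_tokens = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            uris.append(href)
    return uris


print(find_transformations(SAMPLE))
```

Both attributes are treated as token lists rather than exact strings, because a head element may name several profiles and a link may carry several rel values.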
Another way to associate GRDDL instructions with a document is to reference those transformations from a profile document that is itself referenced in the head of the HTML. This method can be more convenient for the content author, but it requires that the profile document contain GRDDL metadata and be accessible to the GRDDL client.
In our example another of Jane's friends, David, has chosen to mark up his schedule using Embedded RDF:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://purl.org/NET/erdf/profile">
    <title>Where Am I</title>
    <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
  </head>
  <body>
    <p class="-cal-Vevent" id="tiddlywinks">
      From <span class="cal-dtstart" title="2006-10-07">7 October, 2006</span>
      to <span class="cal-dtend" title="2006-10-13">12 October, 2006</span>
      I will be attending the
      <span class="cal-summary">National Tiddlywinks Championship</span> in
      <span class="cal-location">Bognor Regis, England</span>
    </p>
    <p class="-cal-Vevent" id="holiday">
      Then I'm <span class="cal-summary">on holiday</span> in the
      <span class="cal-location">Cayman Islands</span> between
      <span class="cal-dtstart" title="2006-11-14">14 November, 2006</span> and
      <span class="cal-dtend" title="2007-01-02">1 January, 2007</span>
    </p>
    <p class="-cal-Vevent" id="award">
      I'm back in the US on
      <span class="cal-dtstart" title="2007-01-08">the 8th January</span> to
      <span class="cal-summary">pick up a lifetime achievement award from the world gamers association</span>.
      This time the ceremony is in <span class="cal-location">Los Angeles</span>.
      I'll be flying home on the <span class="cal-dtend" title="2007-01-11">10th</span>
    </p>
  </body>
</html>
Note that in this document the profile attribute does not contain a reference to the GRDDL profile. Instead it references the standard profile URI for Embedded RDF which does contain the GRDDL metadata. Anyone wishing to get the RDF data out of David's page can fetch the Embedded RDF profile URI to obtain the following profile document:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body>
    <p>
      <a rel="profileTransformation" href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a>
    </p>
  </body>
</html>
This document contains a reference to the GRDDL profile which again indicates that it may contain link elements with references to GRDDL instructions that can be applied. Note that these instructions are applied to this profile document, not David's document. Because the client is inspecting a profile document it expects that the instructions identified by http://www.w3.org/2003/g/glean-profile are for producing a list of URIs identifying instructions to be applied to David's HTML document. Those instructions are identified in the profile document using links with a rel attribute of profileTransformation.
In this case the profile transformation refers to a stylesheet that can convert HTML containing Embedded RDF into RDF/XML. This stylesheet can be applied to David's document to obtain the equivalent RDF triples.
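The indirection through a profile document can be illustrated with a short Python sketch. It deliberately takes a shortcut: a GRDDL client would apply the glean-profile instructions to the profile document, whereas this sketch reads the profileTransformation links directly from the markup. The function name profile_transformations is hypothetical, and PROFILE_DOC is a copy of the Embedded RDF profile shown above.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Copy of the Embedded RDF profile document shown earlier.
PROFILE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body>
    <p>
      <a rel="profileTransformation"
         href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a>
    </p>
  </body>
</html>"""


def profile_transformations(profile_text):
    """List the URIs a profile nominates for documents that cite it.

    Simplification: the anchors are read directly instead of running the
    glean-profile transformation over the profile document.
    """
    root = ET.fromstring(profile_text)
    found = []
    for anchor in root.iter(XHTML + "a"):
        href = anchor.get("href")
        if "profileTransformation" in anchor.get("rel", "").split() and href:
            found.append(href)
    return found


# These are the instructions to apply to David's page, not to the profile.
print(profile_transformations(PROFILE_DOC))
```

Keeping the two rel values apart matters: transformation names instructions for the document that contains the link, while profileTransformation names instructions for documents that cite the profile.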
This section is not worked out in as much detail as the sections above. In particular, the relationship between XFN and FOAF is still under study. Stay tuned for future drafts, or better yet, send us suggested improvements.
In this section the guitar review use case is used to explain more fully the role of GRDDL in aggregating data from a variety of different sources.
Stephan is an avid guitar player. Stephan wishes to buy a new guitar, so he decides to check reviews. There are various special interest publications online which feature musical instrument reviews, and there could be blogs which contain reviews by individuals. Among the reviewers there may be friends of Stephan and people whose opinion Stephan values (e.g. well-known musicians and people whose reviews Stephan has found useful in the past). There may also be reviews planted by instrument manufacturers which offer very biased views.
First, Stephan needs to get a list of people he considers trusted sources into some sort of machine-readable document. One choice would be FOAF (Friend of a Friend), a popular RDF vocabulary for describing social networks of friends and personal data. Other choices include vCard/RDF. The question is how to get these values. Microformats define simple formats whose data can easily be converted from HTML to RDF through the use of GRDDL. To extract vCard/RDF from HTML he uses an XSLT stylesheet to transform the hCard encoded HTML document.
<address class="vcard" id="smith-stephan">
<a href="http://example.org/ssmith" class="fn url">Stephan Smith</a>
</address>
Applying the XSLT to this snippet of HTML produces the following RDF:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://example.org/ssmith">
    <vCard:FN>Stephan Smith</vCard:FN>
    <vCard:URL>http://example.org/ssmith</vCard:URL>
  </rdf:Description>
</rdf:RDF>
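The mapping from the hCard snippet to these statements can also be sketched procedurally. The Python below is a hedged illustration rather than the stylesheet Stephan uses: the function names are hypothetical, only the fn and url classes are handled, and the choice of the url href as the subject is an assumption made to mirror the rdf:about value in the RDF/XML above.

```python
import xml.etree.ElementTree as ET

VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

SNIPPET = """<address class="vcard" id="smith-stephan">
  <a href="http://example.org/ssmith" class="fn url">Stephan Smith</a>
</address>"""


def class_tokens(element):
    """Return the space-separated class tokens of an element."""
    return element.get("class", "").split()


def hcard_to_triples(fragment):
    """Return (subject, predicate, object) tuples for each vcard element."""
    root = ET.fromstring(fragment)
    triples = []
    for card in root.iter():
        if "vcard" not in class_tokens(card):
            continue
        # Assumption: the first href on a 'url' element names the subject.
        subject = next(
            (n.get("href") for n in card.iter()
             if "url" in class_tokens(n) and n.get("href")),
            None,
        )
        if subject is None:
            continue
        for node in card.iter():
            tokens = class_tokens(node)
            if "fn" in tokens:
                text = "".join(node.itertext()).strip()
                triples.append((subject, VCARD + "FN", text))
            if "url" in tokens and node.get("href"):
                triples.append((subject, VCARD + "URL", node.get("href")))
    return triples


for triple in hcard_to_triples(SNIPPET):
    print(triple)
```

Each tuple corresponds to one property element inside the rdf:Description above.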
Another microformat that allows for more information to be gleaned from the document is XFN, the XHTML Friends Network. XFN outlines relationships between individuals using a controlled set of values in the rel attributes of links. Examples of such relationships are friend, colleague, and co-worker.
<ul>
<li><a href="http://peter.example.org/" rel="met friend colleague">Peter Smith</a></li>
<li><a href="http://john.example.org/" rel="met">John Doe</a></li>
<li><a href="http://paul.example.org/" rel="met">Paul Revere</a></li>
</ul>
Since XFN relationships are embedded in anchor (a) elements, they can be expressed in RDF in a variety of ways. Given a document marked up with XFN, a GRDDL transformation can extract RDF data about Stephan's friends. These descriptions would allow an RDF spider (a scutter) to follow links to additional RDF content that may include vCard and FOAF descriptions.
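As one of the possible ways of expressing XFN links in RDF, the Python sketch below emits a triple per recognised rel token, with the page as subject and the link target as object. This modelling is an assumption chosen for brevity (a transformation might instead describe people and their homepages), the function name xfn_triples is hypothetical, and the page URI is borrowed from the query examples in this section.

```python
import xml.etree.ElementTree as ET

XFN = "http://gmpg.org/xfn/11#"

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}

BLOGROLL = """<ul>
  <li><a href="http://peter.example.org/" rel="met friend colleague">Peter Smith</a></li>
  <li><a href="http://john.example.org/" rel="met">John Doe</a></li>
  <li><a href="http://paul.example.org/" rel="met">Paul Revere</a></li>
</ul>"""


def xfn_triples(fragment, page_uri):
    """Return (page, xfn-property, target) tuples for each XFN rel token."""
    root = ET.fromstring(fragment)
    triples = []
    for anchor in root.iter("a"):
        href = anchor.get("href")
        if not href:
            continue
        for token in anchor.get("rel", "").split():
            # Tokens outside the XFN vocabulary are ignored.
            if token in XFN_VALUES:
                triples.append((page_uri, XFN + token, href))
    return triples


page = "http://stephans-homepage.org/blogroll/xfn/"
for triple in xfn_triples(BLOGROLL, page):
    print(triple)
```

Filtering against the controlled vocabulary keeps unrelated rel values, such as nofollow, out of the extracted data.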
On the Guitar site, there are product reviews for each guitar. The guitars are also marked up with microformats, so using GRDDL we can extract machine-readable data about each item. Along with manufacturer data, each member of the site can also leave feedback about the item in the form of a review, using a microformat like hReview that we can also convert to RDF.
With all of these tools we can find Stephan's friends and find the guitar reviews that those friends created. Using GRDDL we can glean information about the guitar in the form of product specifications supplied by the manufacturer and reviews from site members. Once we have this data as RDF we can run queries on it using SPARQL. SPARQL (The SPARQL Protocol and RDF Query Language) is a query language for RDF.
If Stephan is looking for a guitar with a specific review rating or higher from his group of friends, we now have enough data in RDF to do just that:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
SELECT DISTINCT ?name ?rating
FROM <http://example.org/guitar/1234/>
WHERE {
?x rev:reviewer ?reviewer ;
rev:rating ?rating .
FILTER (?rating > "2") .
?reviewer foaf:name ?name .
}
The first restriction on the data can be a check on review data to make sure it includes reviewers and ratings. Once we have all the matching reviews, we can then restrict the data so that the reviews are all those by Stephan's friends. From the XFN links in Stephan's page which identify people Stephan trusts, we can match URIs to other locations where they have been asserted (the guitar review page for instance).
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX xfn: <http://gmpg.org/xfn/11#>
SELECT DISTINCT ?name ?rating ?xfnhomepage ?foafhomepage
FROM <http://example.org/guitar/1234/>
FROM <http://stephans-homepage.org/blogroll/xfn/>
WHERE {
?x rev:reviewer ?reviewer ;
rev:rating ?rating .
FILTER (?rating > "2") .
?reviewer foaf:name ?name ;
foaf:homepage ?foafhomepage .
?y xfn:friend ?xperson .
?xperson foaf:homepage ?xfnhomepage .
FILTER (?xfnhomepage = ?foafhomepage)
}
SPARQL results can be obtained as XML or JSON and can easily be consumed by another application. That application can display the results on screen, email them to Stephan, or search the web for the best prices on the short list of guitars.
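As a sketch of how an application might consume such results, the Python below flattens a document in the SPARQL JSON results format into plain dictionaries. The RESULTS string is invented sample data for illustration only, not output from a real endpoint, and the function name result_rows is hypothetical.

```python
import json

# Hypothetical result document in the SPARQL JSON results format.
RESULTS = """{
  "head": {"vars": ["name", "rating"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Peter Smith"},
     "rating": {"type": "literal", "value": "4"}}
  ]}
}"""


def result_rows(results_text):
    """Return one {variable: value} dict per solution in the result set."""
    document = json.loads(results_text)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable may be unbound in a given solution, so test for it.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


for row in result_rows(RESULTS):
    print(row["name"], row["rating"])
```

The rows are ordinary dictionaries, so they can be passed on to whatever code formats an email or looks up prices.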
This concludes the GRDDL Primer. Full technical detail of the GRDDL mechanism may be found in the corresponding Gleaning Resource Descriptions from Dialects of Languages (GRDDL) Working Draft.
The editor would like to thank the following Working Group members for authoring this document:
This document is a product of the GRDDL Working Group.
Changes since the WG decision to publish on 27 Sep include
$Log: Overview.html,v $
Revision 1.6 2018/10/09 13:29:22 denis
  fix validation of xhtml documents
Revision 1.5 2017/10/02 10:32:21 denis
  add fixup.js to old specs
Revision 1.4 2006/10/03 19:51:59 jean-gui
  Fixed an encoding issue with Dom's name
Revision 1.3 2006/10/03 16:16:05 connolly
  removed editor's draft blurb from status section
Revision 1.2 2006/10/03 15:42:34 jean-gui
  Removed some editor's draft markup
Revision 1.1 2006/10/03 15:13:04 jean-gui
  Renamed primer.html to Overview.html
Revision 1.1 2006/10/03 15:11:54 jean-gui
  /TR/2006/WD-grddl-primer-20061002/
Revision 1.16 2006/10/02 22:51:19 connolly
  turned public-grddl-comments mailbox into a link
Revision 1.15 2006/09/30 00:38:47 connolly
  note in the status section that some examples are incomplete
Revision 1.14 2006/09/30 00:35:01 connolly
  removed some links to the glossary that were copied from the use cases document
  updated link to suda.co.uk
Revision 1.13 2006/09/30 00:27:26 connolly
  fix link from title page to acknowledgements section
Revision 1.12 2006/09/30 00:26:10 connolly
  update parts of the status section that are different between use cases and primer
Revision 1.11 2006/09/30 00:24:34 connolly
  - remove "previous version" link to talis copy from title page
  - move pubrules check to status section
  - expand change log to give full audit trail since WG decision
  - remove XHTML 1.1 icon, since pubrules requires 1.0 :-/
Revision 1.10 2006/09/29 23:54:08 hhalpin
  fixed minor errors and links
revision 1.9 date: 2006/09/29 23:20:05; author: hhalpin; state: Exp; lines: +5 -90
  primer chnages for pubrules
----------------------------
revision 1.8 date: 2006/09/29 23:10:58; author: hhalpin; state: Exp; lines: +1 -1
  primer changes again
----------------------------
revision 1.7 date: 2006/09/29 23:07:42; author: hhalpin; state: Exp; lines: +170 -42
  primer changes again
----------------------------
revision 1.6 date: 2006/09/29 22:43:53; author: hhalpin; state: Exp; lines: +2 -2
  primer changes again
  spelling errors
----------------------------
revision 1.5 date: 2006/09/29 22:35:39; author: hhalpin; state: Exp; lines: +6 -7
  primer changes again
----------------------------
revision 1.4 date: 2006/09/29 22:33:00; author: hhalpin; state: Exp; lines: +33 -70
  primer changes
----------------------------
Revision 1.3 2006/09/29 22:05:17 connolly
  "under construction" sign atop the section with XFN in it
Revision 1.2 2006/09/29 19:49:46 connolly
  copied from devcvs v 1.4 2006/09/29 19:00:43 idavis
Revision 1.4 2006/09/29 19:00:43 idavis
  Fixed formatting of CVS log at end of document
----------------------------
revision 1.3 date: 2006/09/29 18:58:18; author: idavis; state: Exp; lines: +22 -13
  Revised abstract to align more with use cases; checked in supporting HTML and PNG files
----------------------------
revision 1.2 date: 2006/09/29 18:22:17; author: idavis; state: Exp; lines: +591 -437
  Inserted current, latest and previous version links; revised abstract completely; normalised to linefeed line endings
----------------------------
revision 1.1 date: 2006/09/29 16:38:15; author: connolly; state: Exp;
  6180 2006-09-27 13:29:57Z http://research.talis.com/2006/grddl-wg/primer.html