Steven Pemberton, CWI/W3C, Amsterdam
These slides are in XHTML. They use the CSS media 'projection' to allow them to be displayed full-screen.
Senior researcher at CWI, the Dutch National Research Centre for Mathematics and Computer Science.
Involved in the Web from the beginning: organised two workshops at the first Web Conference in 1994
Chair of the HTML and Forms working groups at W3C
Co-author of CSS, HTML4, XHTML1, XForms, XML Events, XHTML2, etc.
Using a clever mutation of the <link> and <meta> elements, and the addition of a 'role' attribute, XHTML2 allows authors to layer real semantics on top of documents, semantics with a clear relationship to RDF, so that XHTML2 can be properly integrated into the semantic web.
But by layering semantics on top of XHTML in this way, a lot of special-purpose formats are rendered unnecessary.
This talk discusses the XHTML2 approach to Metadata.
The NITF Tutorial states:
Web authors use HTML to describe the display of their pages. NITF, on the other hand, is designed to describe the substance of news article,
This is actually not true: HTML was designed as a structure-defining language.
The browser manufacturers, in classic Marking Behaviour and not understanding the structure-defining design of HTML, went and added presentation features.
XHTML2 is the next iteration in the HTML family.
XHTML1 addressed the problems of turning HTML into an XML application.
XHTML2 addresses the remaining identified problems in HTML4/XHTML1
In designing XHTML2, a number of design aims were kept in mind to help direct the design. These included:
As generic XML as possible: if a facility exists in XML, try to use that rather than duplicating it. This means that XHTML2 already works to a large extent in existing browsers (the main missing functionality being XForms and XML Events).
More structure, less presentation: use stylesheets for defining presentation.
More usability: within the constraints of XML, try to make the language easy to write, and make the resulting documents easy to use.
More accessibility: 'designing for our future selves' – the design should be as inclusive as possible.
Better internationalization.
Better forms: after a decade of experience, we now know how to make forms a better experience.
Less scripting: achieving functionality through scripting is difficult for the author and restricts the type of user agent you can use to view the document. We have tried to identify current typical usage, and include those usages in markup.
More device independence: new devices coming online, such as telephones, PDAs, tablets and televisions, make it imperative to have a design that allows you to author once and render in different ways on different devices, rather than authoring a new version of the document for each type of device.
Better semantics: integrate XHTML into the Semantic Web.
Keep old communities happy
Keep new communities happy
Integration with RDF/Semantic Web
Readable and writable by the HTML community
Flexible, extensible
News distribution is all about content and metadata.
NewsML for instance is essentially a big metadata wrapper round XHTML.
The question is: where should the metadata go, and how should it be expressed?
What we have done is craftily mutated <meta> and <link> so that they look more or less the same to the HTML author, but now have a clear relationship to RDF.
Then we generalised.
This was originally proposed in a white paper, RDF/A (warning: details have changed since this was published). After much work in a joint Semantic Web/HTML WG task force, it was adopted into XHTML2. That work is still not quite finished, since one detail (bnodes) still has to be finalised.
Extend the meta element:
Allow content in the meta element
The name attribute is now called property, and can hold a namespaced value (a QName)
Add an about attribute, that defaults to the current document
Example:
<meta property="dc:creator">Steven Pemberton</meta>
This is also still allowed:
<meta property="dc:creator" content="Steven Pemberton"/>
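The two spellings above carry the same value. As a minimal sketch of how a consumer might read either form, here is a Python fragment using the standard-library xml.etree.ElementTree. The helper name meta_value, and the rule that a content attribute takes precedence over element content, are assumptions of this sketch and are not taken from the XHTML2 draft.

```python
import xml.etree.ElementTree as ET

def meta_value(meta):
    """Return the value carried by a <meta> element (hypothetical helper).

    Assumption of this sketch: a content attribute, if present,
    wins over the element's text content.
    """
    if "content" in meta.attrib:
        return meta.attrib["content"]
    return (meta.text or "").strip()

# The two forms shown on the slide.
new_style = ET.fromstring('<meta property="dc:creator">Steven Pemberton</meta>')
old_style = ET.fromstring('<meta property="dc:creator" content="Steven Pemberton"/>')

print(meta_value(new_style))
print(meta_value(old_style))
```

Both calls yield the same string, so downstream code need not care which form the author chose.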
Extend the link element slightly:
Allow the rel and rev attributes to hold namespaced values.
Add an about attribute.
Example:
<link rel="dc:rights" href="http://example.com/terms/contract123"/>
Add a role attribute applicable to any element, that specifies a semantic role for that element.
Examples:
<p role="nitf:byline">By Joseph P. Reporter</p>
<p role="prism:copyright">© Copyright 2001, Wanderlust Publications. All rights reserved.</p>
Having done that, we then allow all the attributes of <link> and <meta> on any element.
This was already allowed:
This work is licensed under the <a rel="dc:rights" href="http://creativecommons.org/licenses/by/2.0/"> Creative Commons Attribution License</a>.
but you can also say things like this:
<body> <h property="title">My Life and Times</h> ...
which makes the top level heading and the title of the document the same thing, so they never get out of step.
The about attribute allows you to describe other documents, but also parts of the current document.
Example:
<meta about="#p123" ...
One usage of many is to allow richer metadata than the title attribute allows. Now we can just say that
<p id="p123" title="whatever">
is equivalent to:
<p id="p123"> <meta about="#p123" property="title">whatever</meta>
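The equivalence can be checked mechanically. The following Python sketch reduces both spellings to the same (subject, property, value) statement. The helper title_statements is hypothetical, and the sample fragments close the <p> element and add body text so that they parse as XML.

```python
import xml.etree.ElementTree as ET

def title_statements(root):
    """Collect (about, 'title', value) statements from both spellings.

    Hypothetical helper for illustration only.
    """
    found = set()
    for el in root.iter():
        # Shorthand: a title attribute on an element that has an id.
        if "title" in el.attrib and "id" in el.attrib:
            found.add(("#" + el.attrib["id"], "title", el.attrib["title"]))
        # Longhand: <meta about="#id" property="title">value</meta>.
        if el.tag == "meta" and el.get("property") == "title" and "about" in el.attrib:
            found.add((el.attrib["about"], "title", (el.text or "").strip()))
    return found

short_form = ET.fromstring('<p id="p123" title="whatever">text</p>')
long_form = ET.fromstring(
    '<p id="p123"><meta about="#p123" property="title">whatever</meta>text</p>'
)

print(title_statements(short_form) == title_statements(long_form))
```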
<meta property="newsml:Identification"> <meta property="newsml:ProviderId">Reuters.com</meta> <meta property="newsml:DateId">20050524</meta> ...
<p><span content="2005-05-23">Yesterday</span>, <span rel="references" href="..." property="foaf:fullName" content="Tony Blair" >the prime minister</span> travelled to ... </p>
Because of the layered semantics, some formats are now strictly speaking unnecessary.
For instance, an RSS file is a separate document that describes something else, so you have to dual-author, or otherwise ensure that both forms stay mutually up to date.
However, RSS is just a simple hypertext language. You could get the same effect by just marking up the very document you are describing:
<h role="rss:title">... <p role="rss:description">...
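To illustrate, a feed reader could pull the RSS-like fields straight out of the page by looking at role values. This Python sketch assumes a hypothetical fragment (the headline and summary text are invented for illustration) and a hypothetical helper feed_fields. It is not a description of how any existing aggregator works.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the text content is invented for illustration.
PAGE = """<body>
  <h role="rss:title">Example headline</h>
  <p role="rss:description">Example summary of the story.</p>
  <p>Body text that carries no role.</p>
</body>"""

def feed_fields(xml_text, prefix="rss:"):
    """Map each role with the given prefix to the element's text content."""
    fields = {}
    for el in ET.fromstring(xml_text).iter():
        role = el.get("role", "")
        if role.startswith(prefix):
            # Use the local part of the role ("title", "description") as key.
            fields[role[len(prefix):]] = "".join(el.itertext()).strip()
    return fields

print(feed_fields(PAGE))
```

Elements without a role are ignored, so the same document serves both the human reader and the feed consumer.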
Finally I should say something about media, in particular images.
In XHTML2, the src attribute (and its related attributes) may be applied to any element, not just <img>, with the implication that they should be considered equivalent:
<p src="map.png">Turn left out of the station, walk straight on to the High Street, and turn right</p>
<img type="image/jpeg" src="gates.jpg"> Bill Gates makes speech. </img>
We can now say that <meta> and <link> define RDF triples:
The URL for the predicate is obtained by concatenating the namespace URL from the prefix to the other part of the value.
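The concatenation rule can be sketched in a few lines of Python. The PREFIXES table, the helper names expand and triples, and the base URL are assumptions of this sketch: in a real document the prefix bindings would come from its xmlns declarations. Only prefixed values are handled, and rev is ignored.

```python
import xml.etree.ElementTree as ET

# Assumed prefix binding (the Dublin Core elements namespace).
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

def expand(qname, prefixes=PREFIXES):
    """Concatenate the namespace URL for the prefix with the local part."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local

def triples(xml_text, base=""):
    """Return (subject, predicate, object) tuples for <meta> and <link>."""
    out = []
    for el in ET.fromstring(xml_text).iter():
        # The about attribute defaults to the current document.
        subject = el.get("about", base)
        if el.tag == "meta" and "property" in el.attrib:
            value = el.get("content", (el.text or "").strip())
            out.append((subject, expand(el.attrib["property"]), value))
        elif el.tag == "link" and "rel" in el.attrib:
            out.append((subject, expand(el.attrib["rel"]), el.attrib["href"]))
    return out

HEAD = """<head>
  <meta property="dc:creator">Steven Pemberton</meta>
  <link rel="dc:rights" href="http://example.com/terms/contract123"/>
</head>"""

for triple in triples(HEAD, base="http://example.com/doc"):
    print(triple)
```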
A parallel development, GRDDL, can be used to extract the RDF triples from an XHTML2 document.
You can explain it using HTML concepts.
If you don't care, you can just ignore it.
It doesn't require you to learn how to use RDF to be able to benefit from it.
You can build up layers of semantics, and slowly add them to existing content.
The RDF community get their triples without the HTML community having to learn RDF.
You can layer new semantics on top of XHTML2 without having to define a new document type.
XHTML2 is going to last call Really Soon
More details: www.w3.org/TR/xhtml2, www.w3.org/MarkUp (and add /Group to the end if you are a member company).
This talk: www.w3.org/2005/Talks/05-steven-Metadata-in-XHTML2