- From: Henry S. Thompson <[email protected]>
- Date: Tue, 24 Oct 2006 22:17:24 +0100
- To: [email protected]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On its telephone conference earlier today, the TAG agreed to open a
new issue, TagSoupIntegration-54. This message contains a first draft
of the description of this issue for the issues list [1]. Comments and
suggested changes are invited, as experience to date suggests that
getting a satisfactory definition of exactly what's at issue here is
tricky.
- --------
Is the indefinite persistence of 'tag soup' HTML* consistent with a
sound architecture for the Web? If so (and the going-in assumption
is that it _is_ so), what changes, if any, to fundamental Web
technologies are necessary to integrate 'tag soup' with SGML-valid
HTML and well-formed XML?
Heretofore W3C official policy has been not only to encourage the
'withering away' of non-XML content on the Web, but to insist on it.
The possibility of a change in this policy, towards one of at least
tolerance of, and perhaps even support for, SGML-valid HTML and even
'tag soup', has recently been advocated and taken seriously in various
quarters. The TAG does not make policy, and it is off-topic for this
list to discuss policy issues, but the TAG definitely _does_ consider
architectural issues, and such a change would undoubtedly raise a
number of questions for Web architecture.
The TAG is interested in exploring ways in which 'tag soup' HTML and
SGML-valid HTML can be thoroughly integrated with the XML-orientated
Web, enjoying its many benefits as much as is possible. Among the
topics to be explored in this connection are:
* Can we standardize a series of "as if" propositions for non-XML HTML:
   1) Treat it "as if" it had been processed by [some formalization
      of] 'tidy -asxhtml';
   2) Treat it "as if" it had a default namespace declaration
      determined by its media type;
   3) Treat it "as if" it was the serialization of the DOM produced by
      [some formalization of] common browser error recovery
      strategies?
* Can we successfully apply to non-XML web content the modularization
and composition stories under development for well-formed XML
documents which mix namespaces (e.g. SVG, MathML, RDF) with XHTML?
* In particular, can our rather more tentative understanding of what
is meant by "self-describing documents" likewise be applied to
non-XML plus SVG. . .?
* Should "as if" number (2) above be extended (contra recent TAG
finding Authoritative Metadata [2]) to include some form of
'sniffing'?
* Can we leverage the common-sense understanding of the phrase "the
HTML P element" as some kind of abstraction over language/version
details and exploit some of our developing understanding of
versioning to manage the relationship between 'tag soup',
SGML-valid HTML and well-formed XHTML?
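As a purely illustrative sketch of "as if" number (2) above, the
following shows one way a processor might assign a default namespace
from the media type alone. The lookup table and both function names
are hypothetical, chosen only for this example; no specification
defines this mapping, and the case-folding of element names is
likewise an assumption. Expanded names are written in the informal
{namespace}local notation.

```python
from typing import Optional

# Hypothetical mapping from media type to the default namespace a
# consumer would behave "as if" the document had declared. The
# namespace URIs are real; the table itself is an assumption.
DEFAULT_NAMESPACE_BY_MEDIA_TYPE = {
    "text/html": "http://www.w3.org/1999/xhtml",
    "application/xhtml+xml": "http://www.w3.org/1999/xhtml",
    "image/svg+xml": "http://www.w3.org/2000/svg",
}


def default_namespace(content_type: str) -> Optional[str]:
    """Return the assumed default namespace for a Content-Type value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return DEFAULT_NAMESPACE_BY_MEDIA_TYPE.get(media_type)


def expanded_name(content_type: str, local_name: str) -> str:
    """Build a {namespace}local name for an unprefixed element."""
    namespace = default_namespace(content_type)
    if namespace is None:
        # Unknown media type: leave the name unqualified.
        return local_name
    # Assumption: names are folded to lower case, since non-XML HTML
    # treats element names case-insensitively.
    return "{" + namespace + "}" + local_name.lower()


print(expanded_name("text/html; charset=utf-8", "P"))
```

Under these assumptions a 'tag soup' <P> served as text/html and an
XHTML <p> would receive the same expanded name, which is the kind of
convergence the "as if" proposition is after. Note that the sketch
trusts the declared media type and does no 'sniffing'.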
*By 'tag soup' HTML is meant documents which are not well-formed
XHTML, or even SGML-valid HTML, but which nonetheless are
more-or-less successfully and consistently rendered by some HTML
browsers. Estimates of the percentage of HTML-family web-pages
currently being served which are neither well-formed XML nor
SGML-valid HTML vary widely: a quick sample of reports gives 1.5%,
80%, 82%, 91%, 97.8%, 99% and 99.3% for different sample spaces and
different times!
- ------
Please note that, in so far as it's appropriate to discuss on this
_public_ list the relationship of the TAG's interests in this area to
the ongoing discussion about the W3C's stewardship of HTML (see e.g.
[3]), you should do so only with reference to information which is
likewise public. Having said that, I'd much prefer to see discussion
about the architectural issue itself. . .
ht
[1] http://www.w3.org/2001/tag/issues.html
[2] http://www.w3.org/2001/tag/doc/mime-respect.html
[3] http://lists.w3.org/Archives/Public/www-forms/2006Aug/0153.html
- --
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: [email protected]
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
iD8DBQFFPoLpkjnJixAXWBoRAoctAJkBQT1OPqErOs3dH4EJUl4ll3zFQACcDsDq
0ILyAAsg0HQVXXw/wNM5H2Q=
=L+yI
-----END PGP SIGNATURE-----
Received on Tuesday, 24 October 2006 21:17:39 UTC