This document is also available in these non-normative formats: http://www.w3.org/TR/2005/NOTE-xml11schema10-20050511/11sp.xml.
Copyright © 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, and document use rules apply.
XML Schema 1.0 did not anticipate new versions of XML, and mandated XML 1.0 documents as the starting point for schema-validity assessment. Some users and specifications would like to use XML Schema processors which process XML 1.1 documents, and some implementors of XML Schema processors would like to provide XML 1.1 support.
This Note suggests an implementation strategy for implementors to adopt to enable users and specifications to get such support in a consistent way. All aspects of XML Schema which are liable to re-interpretation as a result of changes in XML 1.1 are discussed.
An implementation of schema-validity assessment employing such a strategy is strictly speaking non-conformant to the current version of the XML Schema specification. The XML Schema WG none-the-less believes that interoperability will best be served by such non-conformant processors being made available to users, until such time as a subsequent version of XML Schema addressing this issue normatively is approved.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Working Group Note prepared by the W3C XML Schema Working Group, as part of the W3C XML Activity, and published on 11 May 2005. It describes methods of supporting XML 1.1 documents with schema processors designed to support XML Schema 1.0.
XML Schema 1.0 parts 1 and 2 refer normatively to XML 1.0 and make no explicit provision for support of later versions of the XML specification; this lack is sometimes advanced as a reason for W3C specifications which depend on XML Schema not to support XML 1.1. But there are strong reasons to encourage the wide adoption of XML 1.1, which is more successfully internationalized than XML 1.0. At the time this Note is published, the question of how best to support XML 1.1 in XML Schema is still open.
This Note offers strategies for supporting XML 1.1, based on the implementation experience of some members of the XML Schema Working Group. It is hoped that the techniques described here will be helpful to other implementors and to users. Equally, the Working Group hopes that this Note will elicit discussion in the larger XML community concerning the best way for the XML Schema Working Group to balance the competing demands of flexibility in references to other specifications, stability, and interoperability. This Note is published with the full consensus of the XML Schema Working Group.
Comments on this document and the issues it raises are welcome; please send comments on this document to [email protected] (archive).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This document may be updated, replaced or obsoleted by other documents at any time. The XML Schema Working Group does not currently expect to produce further versions or revisions of this document, but experience with the subject matter of this Note may lead to changes in the normative text of future versions of the XML Schema specification.
1 Introduction
2 Survey of XML 1.1 challenges for XML Schema 1.0
3 First step towards XML 1.1: the parser
4 Recommended strategy: Move to 1.1-compatible type definitions
5 The details
6 Backward incompatibilities
7 Summary of Recommendations for Interoperability
As published, the XML Schema specification references XML 1.0 and XML Namespaces 1.0 explicitly, and incorporates by reference certain key definitions, in particular those of the Char, Name, QName and S character classes. The contents of these classes have changed in XML 1.1 and XML Namespaces 1.1, so although nothing in the existing XML Schema specification specifically bars the processing of infosets produced by XML 1.1 conformant parsers, such infosets, if they exploit any of the relevant changes in XML 1.1, will not be accepted as valid by conformant XML Schema 1.0 processors.
The XML Schema WG has judged that any changes to the existing specification to support XML 1.1 go beyond what could be considered as errata, and so will have to wait for a new version of the specification. As this may take some time, this Note addresses the question of what should be done in the interim to best serve the XML community.
In the sections which follow, a non-normative strategy is set out suggesting a number of changes which processors implementing the XML Schema specification can make to enable sensible and interoperable support for XML 1.1. Any implementation of XML Schema employing such a strategy is strictly speaking non-conformant to the current version of the XML Schema specification. The XML Schema WG none-the-less believes that interoperability will best be served by the availability of such non-conformant processors until such time as a subsequent version of XML Schema addressing this issue normatively is approved.
Consider the following four cases:
A. C1 vs. C0 in content, e.g. #x83 vs. #x03
B. Old vs. new name chars in element names, e.g. y (25th letter in English alphabet) vs. ij (25th letter in Dutch alphabet)
C. Old vs. new name chars in ID-typed content, e.g. y vs. ij
D. LF vs. NEL in length-specified list-typed content
(ij == U+0133 (#x133) is common in Dutch, e.g. in the word ijs == English ice-cream. It's a good example of something arbitrarily and irritatingly not allowed as a name character in XML 1.0 which is allowed as a name character in 1.1).
In each of the above cases, the first alternative is OK and has the same behaviour with respect to Schema validation in both XML 1.0 and XML 1.1, whereas the second alternative either is not Schema-valid under the strict XML 1.0 interpretation (the first three cases) or might be expected to have different behaviour between XML 1.0 and XML 1.1 (the fourth case).
In other words, if you used a conformant XML Schema validator on the following four instances (Figure 1), using the same schema document (Figure 2) each time, all four would have validity problems.
<?xml version='1.0'?> <root>There's an &#3; here: </root>
<?xml version='1.0'?> <ijs/>
<?xml version='1.0'?> <root id="ij"/>
<?xml version='1.0'?> <!-- There's a NEL character (U+0085) between the 'a' and the 'b' below --> <root list="a b"/>
Note:
<?xml version='1.0'?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:annotation>
      <xs:documentation>String content, id attr of type ID, list attr of type [list of token], length 2</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="id" type="xs:ID"/>
          <xs:attribute name="list">
            <xs:simpleType>
              <xs:restriction>
                <xs:simpleType>
                  <xs:list itemType="xs:token"/>
                </xs:simpleType>
                <xs:length value="2"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="ijs"/>
</xs:schema>
Figure 2: Schema for use with XML documents in Figure 1
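The difference behind case A can be illustrated with a minimal sketch of the Char production in each version of XML. The function names are illustrative only and are not taken from any real processor; the ranges are transcribed from the two Char productions.

```python
# Sketch only: Char membership per the XML 1.0 and XML 1.1 productions.
# Function names are hypothetical, chosen for this illustration.

def is_char_10(cp: int) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_char_11(cp: int) -> bool:
    """XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


# The two code points used in case A: a C1 control and a C0 control.
for cp in (0x83, 0x03):
    print(f"#x{cp:02X}: XML 1.0 Char={is_char_10(cp)}, XML 1.1 Char={is_char_11(cp)}")
```

Under these definitions #x83 is a Char in both versions, while #x03 is a Char only in XML 1.1. The sketch ignores the XML 1.1 rule that restricted characters may appear only as character references, which a real parser would also have to enforce.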
The first obvious step for anyone considering modifying an existing XML Schema processor of any kind to allow XML 1.1 documents is to replace its front end. That front end is presumably currently an XML 1.0 parser, i.e. a parser which accepts only documents with a version='1.0' XML declaration (or none) and enforces XML 1.0 well-formedness. It should be replaced with an XML 1.1 parser, i.e. one which enforces either XML 1.0 or XML 1.1 well-formedness, depending on the version stated in the XML declaration.
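As a rough illustration of that version switch, the following sketch reads the version from the XML declaration and picks the rule set to enforce. Both function names are hypothetical, the regular expression covers only the common form of the declaration, and treating any version other than 1.0 or 1.1 as an error is a simplification of this sketch rather than something this Note prescribes.

```python
import re

# Matches a leading XML declaration and captures its version value.
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(['"])(?P<version>[^'"]+)\1""")


def declared_version(document: str) -> str:
    """Return the declared XML version, defaulting to '1.0' when there is no declaration."""
    match = _XML_DECL.match(document)
    return match.group("version") if match else "1.0"


def well_formedness_rules(document: str) -> str:
    """Name the rule set a version-aware front end would enforce (hypothetical helper)."""
    version = declared_version(document)
    if version == "1.1":
        return "XML 1.1"
    if version == "1.0":
        return "XML 1.0"
    # Assumption of this sketch: anything else is simply refused.
    raise ValueError(f"unsupported XML version: {version}")
```

A front end built this way keeps its existing behaviour for every document that declares version 1.0 or has no declaration at all.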
The resulting behaviour will be as follows:
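| | XML 1.0 Declaration | XML 1.1 Declaration |
| XML 1.0 Content | A, B, C, D: OK | A, B, C, D: OK |
| XML 1.1 Content | A, B: ill-formed; C, D: schema-invalid | A: depends on implementation; B, C: schema-invalid (incorrectly); D: OK |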
Note that by "XML 1.0 Content" is meant documents exemplifying the first member of each of the four pairs of differences introduced above, and by "XML 1.1 Content" is meant documents exemplifying the second member thereof. The top two cells then require no explanation -- these are just the existing XML Schema processor, using an XML 1.1 parser front end, behaving correctly on data it already should be processing correctly.
The bottom two cells are the interesting ones. The bottom-left cell is characterised by what I'll call misaligned XML versions. Let's consider the outcomes here one at a time. Note that these cases cover not only what our putative XML Schema 1.0 processor with an XML 1.1 parser would do, but also what an unmodified 1.0/1.0 processor should do today.
A, B: These cases are (correctly) rejected as ill-formed by the front-end XML parser, because they break the 1.0 rules for CDATA content (A) and element names (B).
C: This case is (correctly) rejected as schema-invalid by the XML Schema processor -- a string with an ij in it is not an NCName per XML 1.0.
D: This case is (correctly) rejected as schema-invalid by the XML Schema processor -- a 'list' with only NEL separators is a single token when considered as XML 1.0 content.
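The NEL behaviour in case D can be sketched as follows. The helper names are hypothetical, and attribute-value normalization is reduced to the whitespace replacement that matters here. Note that the list is split on the four XML whitespace characters only, because Python's argument-less str.split() also treats NEL as whitespace and would hide the difference.

```python
import re


def normalize_line_ends_11(text: str) -> str:
    """XML 1.1 end-of-line handling: CR LF, CR NEL, NEL, LS and lone CR all become LF."""
    return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", text)


def normalize_attribute_whitespace(value: str) -> str:
    """Attribute-value normalization (simplified): each TAB, LF or CR becomes a space."""
    return re.sub("[\t\n\r]", " ", value)


def list_items(value: str) -> list:
    """Split a list-typed value on XML whitespace only (space, TAB, LF, CR)."""
    return [item for item in re.split("[ \t\n\r]+", value) if item]


raw = "a\x85b"  # 'a', NEL (U+0085), 'b'

# XML 1.0 front end: NEL is ordinary character data, so the value stays one token.
as_xml_10 = list_items(normalize_attribute_whitespace(raw))

# XML 1.1 front end: NEL is normalized to LF, then to a space, giving two tokens.
as_xml_11 = list_items(normalize_attribute_whitespace(normalize_line_ends_11(raw)))

print(len(as_xml_10), len(as_xml_11))
```

With a length facet of 2, as in the schema of Figure 2, only the second result satisfies the type.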
Moving on to the final, lower-right, cell, this is of course where things get interesting:
A: The behaviour of this case depends on an implementation choice. Some processors, which take their input only in the form of encoded character streams and always use an XML parser as a front end, depend on that front end to enforce the basic constraint that all xs:strings consist of XML 1.0 Chars. Other XML Schema processors, particularly those which also accept synthetic infosets as input, enforce that constraint explicitly. It follows that a processor of the first kind, simply by changing to use an XML 1.1 front-end, will thereby accept case A documents, but processors of the second kind will not, because they will still be explicitly checking instances of xs:string using its XML Schema 1.0 definition.
D: This case is (correctly) accepted -- a 'list' with a NEL separator will have been normalized to have a space (#x20) separator by the XML 1.1 front-end parser, and so the XML Schema processor will find two tokens.
C: This case is (incorrectly) rejected as schema-invalid by the XML Schema processor, because the ID type is derived from the Name type, which in turn has a pattern facet based on the XML 1.0 definition for Names, which does not allow the ij.
B: This case is actually very similar to the previous one, but with respect to a different document, that is, the schema document. That document is (incorrectly) rejected as schema-invalid by the XML Schema processor, because the relevant element name turns up as the value of the name attribute on the xs:element element, and that attribute's type in the schema for schema documents is NCName, which is derived from the Name type, which in turn has a pattern facet based on the XML 1.0 definition for Names, which does not allow the ij.
What does it mean to say the last two results are incorrect? It means that type definitions which enforce XML-1.0-appropriate constraints are being applied to self-identified XML 1.1 data.
The simplest resolution is to change the XML Schema processor itself so that the relevant built-in type definitions enforce the XML 1.1 constraints. This will make all the entries in the lower-right quadrant 'OK'.
The XML Schema 1.0 type definitions which depend directly on XML 1.0 productions must use the corresponding XML 1.1 productions instead. These are xsd:Name, which depends on XML 1.0 Name; xsd:NMTOKEN, which depends on XML 1.0 Nmtoken; xsd:QName, which depends on XML 1.0 Letter, Digit, CombiningChar and Extender via XML Namespaces QName; and xsd:string, which depends on XML 1.0 Char. The same applies to the type definitions which inherit from them, that is, xsd:NCName, xsd:ID, xsd:IDREF, xsd:IDREFS, xsd:ENTITY, xsd:ENTITIES, xsd:NMTOKENS, xsd:normalizedString, xsd:token and xsd:language.
This change will fix the B and C results by using the XML 1.1 definition of Name. For processors which don't depend on their XML front-end parser to check CDATA, it will also fix the incorrect result they get for the A example by using the XML 1.1 definition of Char.
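A minimal sketch of what the switch means for name-based types follows, assuming a processor that checks names with regular expressions. The 1.1 ranges are transcribed from the XML 1.1 NameStartChar and NameChar productions with the colon removed to give an NCName check. The 1.0 side is only an excerpt of the Latin part of the XML 1.0 BaseChar production, included to show where U+0133 falls outside it; all identifiers are illustrative.

```python
import re

# XML 1.1 NameStartChar ranges without ':' (NCName variant).
_NCNAME_START_11 = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# XML 1.1 NameChar adds '-', '.', digits, #xB7 and two further ranges.
_NCNAME_CHAR_11 = _NCNAME_START_11 + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NCNAME_11 = re.compile(f"[{_NCNAME_START_11}][{_NCNAME_CHAR_11}]*")

# Excerpt only: the Latin part of XML 1.0 BaseChar, which skips U+0132-U+0133.
_LATIN_LETTER_10 = re.compile(
    "[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u0131\u0134-\u013E]"
)


def is_ncname_11(value: str) -> bool:
    """True if value is an NCName under the XML (Namespaces) 1.1 productions."""
    return _NCNAME_11.fullmatch(value) is not None


def is_latin_letter_10(ch: str) -> bool:
    """True if ch falls in the excerpted Latin ranges of XML 1.0 BaseChar."""
    return _LATIN_LETTER_10.fullmatch(ch) is not None


print(is_ncname_11("\u0133s"))       # element name from case B
print(is_latin_letter_10("\u0133"))  # the same first letter under the 1.0 excerpt
print(is_latin_letter_10("y"))
```

Because the derived types inherit the check, replacing the pattern used for Name in one place would be expected to carry through to NCName, ID and the other name-based types.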
The approach selected here isn't perfect. The unconditional switch to 1.1-appropriate type definitions means that version 1.0 XML documents with 1.1-only Name characters in e.g. ID-typed attributes will be valid, where an unmodified Schema 1.0 processor would find them invalid.
The immediate negative consequences of this are presumably small, since anyone already schema-validating their XML 1.0 documents will presumably have corrected any examples of this. But once processors implementing this Note are widespread, it may be that documents with such attribute type definitions and values will be created, identified as version 1.0 and validated by modified processors, only to be (correctly) rejected by unmodified processors. We judge the risk of this having serious negative consequences to be small enough to be discounted, but it is of course open to implementors to detect this case and issue a warning.
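One way an implementor might detect this case is sketched below. The function and its parameters are hypothetical: the two predicates stand for whatever XML 1.0 and XML 1.1 Name checks a given processor already has, and the declared version is assumed to be available from the front-end parser.

```python
import warnings
from typing import Callable


def check_name_with_warning(
    value: str,
    declared_version: str,
    is_name_10: Callable[[str], bool],
    is_name_11: Callable[[str], bool],
) -> bool:
    """Validate against the XML 1.1 definition, warning when a 1.0 document relies on it."""
    if not is_name_11(value):
        return False
    if declared_version == "1.0" and not is_name_10(value):
        # Valid here, but an unmodified XML Schema 1.0 processor would reject it.
        warnings.warn(
            f"{value!r} is a Name only under XML 1.1, but the document declares "
            "version 1.0; unmodified XML Schema 1.0 processors will reject it",
            stacklevel=2,
        )
    return True
```

The value is still reported as valid, in line with the unconditional switch to 1.1-appropriate type definitions; the warning only flags documents that may not interoperate with unmodified processors.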
The other weakness is with respect to cases where no front-end XML parser is involved, that is where schema validity assessment is carried out on what are sometimes called "synthetic infosets".
Since on this proposal enforcement of XML 1.0 conformance for element names and character content is the responsibility of the front-end parser, it follows that for a synthetic infoset to contain for example an element with an XML-1.1-only element name will never be a problem solely because of its name, even if it has a document information item [version] property with value 1.0.
Again we judge the likelihood of this causing a problem to be vanishingly small, particularly as any attempt to serialize such a synthetic infoset should raise an error.
To produce an XML-1.1-friendly version of an XML Schema 1.0 processor:
1. Replace its XML 1.0 front-end parser with an XML 1.1 front-end parser;
2. Change its implementations of the XML Schema types Name, NMTOKEN, QName and string to use the relevant XML (Namespaces) 1.1 productions.