W3C

An XSD datatype for IEEE floating-point decimal

W3C Working Group Note 9 June 2011

This version:
http://www.w3.org/TR/2011/NOTE-xsd-precisionDecimal-20110609/
Latest version:
http://www.w3.org/TR/xsd-precisionDecimal/
Editors:
David Peterson, invited expert (SGMLWorks!)
C. M. Sperberg-McQueen, Black Mesa Technologies LLC

Abstract

This document defines a datatype designed for compatibility with IEEE 754 floating-point decimal data, which can be supported by XSD 1.1 processors as an implementation-defined datatype.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a W3C Working Group Note as described in the World Wide Web Consortium Process Document. It contains a definition of a precisionDecimal datatype designed for compatibility with IEEE 754 floating-point decimal numbers.

In its current state, this document contains all the material specific to the precisionDecimal datatype that has appeared in working drafts of [XSD 1.1 Part 2: Datatypes], including some revisions made since the most recent public working draft. It is substantially complete as a specification of the datatype, though some further changes (listed in To-do list (non-normative) (§D)) may be made in a future revision of this document.

Comments on this document should be sent to the W3C XML Schema comments mailing list. Each email message should contain only one comment.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The authors of this document are the members of the XML Schema Working Group.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Definitions
3 The precisionDecimal datatype
    3.1 Value Space
    3.2 Lexical Mapping
    3.3 Facets
4 Facets for constraining precisionDecimal values
    4.1 totalDigits
totalDigits Validation Rules
    4.2 maxScale
The maxScale Schema Component · XML Representation of maxScale Schema Components · maxScale Validation Rules · Constraints on maxScale Schema Components
    4.3 minScale
The minScale Schema Component · XML Representation of minScale Schema Components · minScale Validation Rules · Constraints on minScale Schema Components
5 Implementation issues
    5.1 Implementation limits
    5.2 Interfacing with XPath
6 Mapping functions

Appendices

A Normative references
B Non-normative references
C Acknowledgements (non-normative)
D To-do list (non-normative)


1 Introduction

This document defines an XSD datatype intended to support the floating-point decimal defined by IEEE 754.

IEEE 754 defines both floating-point binary and floating-point decimal formats. The binary formats have been widely adopted since their initial introduction; if the floating-point decimal formats are also widely adopted for applications, it will be convenient to be able to represent values of that type in XML documents or in other contexts where XSD datatypes are used.

2 Definitions

The following terms are used in this specification with the meanings indicated.

Except as specified below, in this specification terms defined in [XSD 1.1 Part 2: Datatypes] have the meanings given there.

[Definition:]   constraining facet
A schema component whose value may be set or changed during derivation (subject to facet-specific constraints) to control various aspects of a derived datatype.
[Definition:]   fundamental facet
A schema component that provides a limited piece of information about some aspect of a datatype.
[Definition:]  special value
One of the possible values of the ·numericalValue· property of precisionDecimal values, whose only relevant property, for purposes of this document and of [XSD 1.1 Part 2: Datatypes], lies in its being distinct from the other possible values; specifically, positiveInfinity, negativeInfinity, and notANumber.
Informally, any precisionDecimal value whose ·numericalValue· property is such a special value.
Note: The names of these special values are shared with the special values of the float and double datatypes.

3 The precisionDecimal datatype

[Definition:]  The precisionDecimal datatype represents decimal numbers which retain precision; it also includes values for positive and negative infinity and for "not a number", and it differentiates between "positive zero" and "negative zero".  This datatype is introduced to provide a variant of decimal from which may be derived datatypes closely corresponding to the floating-point decimal datatypes described by [IEEE 754-2008].

Note: The precisionDecimal datatype also permits derivation of a datatype closely corresponding to Java BigDecimal, although implementation is permitted to be confined to a much smaller value space.
Note: Users wishing to implement useful operations for this datatype (beyond the equality and order specified herein) are urged to consult [IEEE 754-2008].

The datatype precisionDecimal draws its name from a common usage of the term 'precision' to mean the degree of accuracy with which a quantity is recorded. In this usage, writing a number as '2' or as '2.00' is taken as recording the value with less or more precision.

Note: See the conformance note in [XSD 1.1 Part 2: Datatypes], which applies to this datatype.

3.1 Value Space

Properties of precisionDecimal Values
·numericalValue·: a decimal number, positiveInfinity, negativeInfinity or notANumber
·scale·: an integer or absent; absent if and only if ·numericalValue· is a special value.
·sign·: positive, negative, or absent; must be positive if ·numericalValue· is positive or positiveInfinity, must be negative if ·numericalValue· is negative or negativeInfinity, must be absent if and only if ·numericalValue· is notANumber
Note: The ·sign· property is redundant except when ·numericalValue· is zero; in other cases, the ·sign· value is fully determined by the ·numericalValue· value.
Note: As explained below, 'NaN' is the lexical representation of the precisionDecimal value whose ·numericalValue· property has the special value notANumber.  Accordingly, in English text we use 'NaN' to refer to that value.  Similarly we use 'INF' and '−INF' to refer to the two values whose ·numericalValue· properties have the special values positiveInfinity and negativeInfinity.  These three precisionDecimal values are also informally called "not-a-number", "positive infinity", and "negative infinity". The latter two together are called "the infinities".
Note: The datatype defined here is intended to allow the derivation of less general datatypes corresponding to the decimal formats defined by [IEEE 754-2008]. Those formats can be viewed as representing values other than the infinities and NaN as triples (s, q, m), where s is the sign bit, q is the exponent (an integer), and m is the significand (also an integer). (The "q" form of IEEE's exponent is used when treating the significand as an integer.) The precisionDecimal ·numericalValue· is (−1)^s × 10^q × m and the ·scale· is −q. Conversely, for nonzero finite values the sign bit s is 0 when the ·numericalValue· is positive and 1 when it is negative, the integer significand m is |·numericalValue·| × 10^·scale·, and the exponent q is −·scale·.
The single NaN of precisionDecimal corresponds both to the signaling NaN and to the quiet NaN of [IEEE 754-2008], which permits the use of a single NaN when values are being transmitted from one system to another via lexical representations.
The individual decimal formats defined by [IEEE 754-2008] are characterized by the range of values allowed for the exponent and by the number of decimal digits available for the significand, which IEEE terms the precision of the format. In any given IEEE 754 decimal format there may be multiple representations representing the same numerical value, one with its significand using the maximum number of significant digits available in the format, and the others with fewer significant digits (when that is possible). Note that in some cases these distinct representations will result in distinct results in operations defined by IEEE 754.
For datatypes derived from precisionDecimal, setting the ·totalDigits· facet is equivalent to restricting the number of decimal digits available for the significand, and setting the ·minScale· and ·maxScale· facets amounts to controlling the possible values of the exponent q.
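The correspondence between IEEE triples and precisionDecimal properties can be sketched in Python as follows. This is only an illustration, not part of the datatype definition: the function names are hypothetical, Python's Fraction stands in for an exact decimal number, and the ·scale· is taken to be the negated exponent, as stated in the decimal32 example in maxScale (§4.2).

```python
from fractions import Fraction

def from_ieee_triple(s: int, q: int, m: int):
    """Map (sign bit, exponent, integer significand) to
    (numericalValue, scale, sign). Hypothetical helper."""
    numerical_value = (-1) ** s * Fraction(10) ** q * m
    scale = -q  # assumption: scale is the negated exponent
    sign = "negative" if s == 1 else "positive"
    return numerical_value, scale, sign

def to_ieee_triple(numerical_value: Fraction, scale: int, sign: str):
    """Inverse mapping for finite values; the sign property makes it
    work for positive and negative zero as well."""
    s = 1 if sign == "negative" else 0
    m = abs(numerical_value) * Fraction(10) ** scale
    if m.denominator != 1:
        raise ValueError("scale is too small to give an integer significand")
    return s, -scale, m.numerator
```

For instance, the triple (0, 1, 30) corresponds to the value 300 with ·scale· −1, which is the value denoted by '3.0e2'.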
Note: precisionDecimal also allows the derivation of a datatype corresponding to the Java BigDecimal class; that class corresponds in essentials to a datatype derived from precisionDecimal by limiting the allowed scale, by eliminating the special values, and by ignoring the difference between +0 and –0.
Equality and order for precisionDecimal are defined as follows:
  • Two numerical precisionDecimal values are ordered (or equal) as their ·numericalValue· values are ordered (or equal).  (This means that two zeroes with different ·sign· properties are equal; negative zeroes are not ordered less than positive zeroes.)
  • INF is equal only to itself, and is greater than −INF and all numerical precisionDecimal values.
  • −INF is equal only to itself, and is less than INF and all numerical precisionDecimal values.
  • NaN is incomparable with all values, including itself.
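The four rules above can be sketched as a comparison function. The sketch is illustrative only: the name compare, the tuple layout (numericalValue, scale, sign) and the result strings are assumptions of this example, and Fraction stands in for an exact decimal number.

```python
from fractions import Fraction

def compare(a, b) -> str:
    """Compare two precisionDecimal values given as
    (numericalValue, scale, sign) tuples.
    Returns "lt", "eq", "gt" or "incomparable"."""
    nv_a, nv_b = a[0], b[0]
    if nv_a == "notANumber" or nv_b == "notANumber":
        return "incomparable"  # NaN is incomparable even with itself

    def rank(nv):
        # Infinities sort outside every numerical value.
        if nv == "negativeInfinity":
            return (-1, 0)
        if nv == "positiveInfinity":
            return (1, 0)
        return (0, nv)

    ra, rb = rank(nv_a), rank(nv_b)
    if ra == rb:
        return "eq"  # scale and sign play no part in equality
    return "lt" if ra < rb else "gt"
```

Because only ·numericalValue· is consulted, 3 and 3.00 compare as equal, as do positive and negative zero.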

3.2 Lexical Mapping

The lexical space of precisionDecimal is the set of all decimal numerals with or without a decimal point, numerals in scientific (exponential) notation, and the character strings 'INF', '+INF', '-INF', and 'NaN'.
Note: The four non-terminals referred to on the right-hand side of the pDecimalRep production are defined in [XSD 1.1 Part 2: Datatypes].
The pDecimalRep production is equivalent (after whitespace is removed) to the following regular expression:

(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?
|(\+|-)?INF|NaN
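As an informal check, the regular expression above can be exercised with Python's re module. The two lines of the expression are joined into one pattern; the function name in_lexical_space is hypothetical, and whitespace is assumed to have been removed already.

```python
import re

# The regular expression from the text, joined into a single pattern.
PDECIMAL_REP = re.compile(
    r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?"
    r"|(\+|-)?INF|NaN"
)

def in_lexical_space(literal: str) -> bool:
    """True when the whole literal matches the pattern."""
    return PDECIMAL_REP.fullmatch(literal) is not None
```

Literals such as '3.', '.30e3' and '+INF' are accepted, while 'inf', 'nan' and '1e' are not.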

The lexical mapping for precisionDecimal is ·precisionDecimalLexicalMap·.  The canonical mapping is ·precisionDecimalCanonicalMap·.

For example, each of the lexical representations shown below is followed by its corresponding value triple (·numericalValue·, ·scale·, and ·sign·) and canonical representation:
  • '3'   ( 3 ,  0 ,  positive )   '3'
  • '3.00'   ( 3 ,  2 ,  positive )   '3.00'
  • '03.00'   ( 3 ,  2 ,  positive )   '3.00'
  • '300'   ( 300 ,  0 ,  positive )   '300'
  • '3.00e2'   ( 300 ,  0 ,  positive )   '300'
  • '3.0e2'   ( 300 ,  −1 ,  positive )   '3.0E2'
  • '30e1'   ( 300 ,  −1 ,  positive )   '3.0E2'
  • '.30e3'   ( 300 ,  −1 ,  positive )   '3.0E2'
Note that the last three examples not only show different lexical representations for the same value, but are of particular interest because values with negative ·scale· can only have lexical representations in scientific notation.
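The value triples in the examples above can be reproduced with a short sketch built on Python's decimal module, whose Decimal type happens to preserve the exponent of a literal. This is not the normative ·precisionDecimalLexicalMap·; the function name lexical_map is hypothetical, and the input is assumed to have been checked against pDecimalRep beforehand.

```python
from decimal import Decimal
from fractions import Fraction

SPECIAL = {"INF": "positiveInfinity", "+INF": "positiveInfinity",
           "-INF": "negativeInfinity", "NaN": "notANumber"}

def lexical_map(lex: str):
    """Return (numericalValue, scale, sign) for a literal that is
    assumed to match pDecimalRep already."""
    if lex == "NaN":
        return ("notANumber", None, None)  # sign is absent for NaN
    sign = "negative" if lex.startswith("-") else "positive"
    if lex in SPECIAL:
        return (SPECIAL[lex], None, sign)  # scale is absent for infinities
    d = Decimal(lex)
    # Decimal keeps the literal's exponent; scale is its negation.
    scale = -d.as_tuple().exponent
    return (Fraction(d), scale, sign)
```

For example, lexical_map('3.00e2') yields the value 300 with ·scale· 0, while lexical_map('.30e3') yields 300 with ·scale· −1.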
Note: [IEEE 754-2008] expects lexical representations whose exact value is not in the value space to be mapped to the nearest value that is in the value space. When precisionDecimal is restricted, all lexical representations of values dropped from the value space are dropped from the lexical space. One result is that when precisionDecimal is restricted using the ·totalDigits· or ·maxScale· facets, non-zero digits beyond those required to exactly represent the intended value are not permitted by this specification.
[IEEE 754-2008] permits all case variants of 'INF' and 'NaN', as well as those of 'INFINITY'; in many cases it permits language definitions to prescribe which variants are used. This specification explicitly chooses only 'INF' and 'NaN'. [IEEE 754-2008] also permits language definitions to prescribe whether '+' shall be used with positive values; this specification makes the '+' optional.
Note: The lexical representations with "unnecessary least significant digits" are the only ones lost when precisionDecimal is restricted; shorter and simpler lexical representations will not be eliminated by use of ·totalDigits· or ·maxScale·. (In contrast, if facets for controlling total digits and scale were available for the floating-point binary types, the effect of restriction would often be inconvenient. In the floating-point binary types, a simple decimal numeral will sometimes have no exact value and so the number will be rounded to a binary approximation. Any restriction of the value space which dropped that approximate value would automatically also drop the simple decimal numeral from the lexical space. For example, the number one-tenth is in the value space neither of float nor of double; the string '0.1' maps, in double, to 0.1000000000000000055511151231257827021181583404541015625. If the ·totalDigits·, ·minScale·, and ·maxScale· facets were available for double (they are not) and were used to define the float type (again, they are not), the value just mentioned would be dropped, and the literal '0.1' would be dropped along with it, instead of mapping (as in fact it does) to the value 0.100000001490116119384765625.)
This interaction between datatype restriction and rounding has as a consequence that it will typically be more convenient for users if restricted-precision numeric types are derived from precisionDecimal than it would be if they were derived from float or double.
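The two binary approximations of one-tenth quoted above can be reproduced in Python. The sketch below is illustrative; the function names are hypothetical, and the single-precision value is obtained by rounding a double to 32 bits with the struct module, which for '0.1' gives the same result as rounding the literal directly.

```python
import struct
from decimal import Decimal

def exact_double(literal: str) -> Decimal:
    """Exact decimal expansion of the double nearest to the literal."""
    return Decimal(float(literal))

def exact_float32(literal: str) -> Decimal:
    """Exact decimal expansion after rounding to IEEE binary32.
    Goes through a double first, which is an assumption of this sketch."""
    (rounded,) = struct.unpack("<f", struct.pack("<f", float(literal)))
    return Decimal(rounded)

print(exact_double("0.1"))   # 0.1000000000000000055511151231257827021181583404541015625
print(exact_float32("0.1"))  # 0.100000001490116119384765625
```

Neither expansion is the literal '0.1' itself, which is why a digit-count restriction on a binary type would remove that literal from the lexical space.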

3.3 Facets

The precisionDecimal datatype and all datatypes derived from it by restriction have the following ·constraining facets· with fixed values; these facets must not be changed from the values shown:

Datatypes derived by restriction from precisionDecimal may also specify values for the following ·constraining facets·:

The precisionDecimal datatype has the following values for its ·fundamental facets·:

4 Facets for constraining precisionDecimal values

The assertions, enumeration, maxInclusive, maxExclusive, minExclusive, minInclusive, and pattern facets defined by [XSD 1.1 Part 2: Datatypes] can be used in deriving new types from precisionDecimal by restriction; their meaning and use are as documented in [XSD 1.1 Part 2: Datatypes].

The totalDigits facet defined by [XSD 1.1 Part 2: Datatypes] can also be used. Its meaning, when applied to values of type precisionDecimal, is described in totalDigits (§4.1). Except as otherwise specified in totalDigits (§4.1), all the constraints on the use of the totalDigits facet described in [XSD 1.1 Part 2: Datatypes] continue to apply when the facet is used with precisionDecimal values.

In addition, two facets not defined by [XSD 1.1 Part 2: Datatypes] can be used when restricting precisionDecimal. They are described in maxScale (§4.2) and minScale (§4.3).

4.1 totalDigits

For precisionDecimal values with ·numericalValue· of nV and ·scale· of aP, if the value of totalDigits is t, the effect of the facet is to require that (aP + 1 + log10(| nV |) div 1) ≤ t, for values other than zero, NaN, and the infinities. This means in effect that values are expressible in scientific notation using at most t digits for the coefficient.

4.1.1 totalDigits Validation Rules

Validation Rule: totalDigits Valid
A precisionDecimal value v is facet-valid with respect to a totalDigits facet with a value of t if and only if one of the following is true:
1 v is a precisionDecimal value with ·numericalValue· of positiveInfinity, negativeInfinity, notANumber, or zero.
2 v is a precisionDecimal value with ·numericalValue· of nV and ·scale· of aP, and v is not NaN, INF, -INF, or zero, and (aP + 1 + log10(| nV |) div 1) ≤ t.
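The validation rule can be sketched in Python as follows. The sketch reads "log10(| nV |) div 1" as the greatest integer not exceeding log10(| nV |) and computes it exactly, without floating-point logarithms; the function names are hypothetical, and special values are represented by their names as strings.

```python
from fractions import Fraction

def floor_log10(x: Fraction) -> int:
    """Greatest integer e with 10^e <= x, for x > 0 (exact arithmetic)."""
    e = 0
    while Fraction(10) ** (e + 1) <= x:
        e += 1
    while Fraction(10) ** e > x:
        e -= 1
    return e

def total_digits_valid(numerical_value, scale, t: int) -> bool:
    """Facet-validity with respect to totalDigits = t."""
    # Clause 1: special values (given as strings) and zero always pass.
    if isinstance(numerical_value, str) or numerical_value == 0:
        return True
    # Clause 2: aP + 1 + floor(log10(|nV|)) <= t.
    return scale + 1 + floor_log10(abs(numerical_value)) <= t
```

For example, the value 3 with ·scale· 2 (written '3.00') needs three digits, while 300 with ·scale· −1 (written '3.0E2') needs only two.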

4.2 maxScale

[Definition:]   maxScale places an upper limit on the ·scale· of precisionDecimal values: if the {value} of maxScale = m, then only values with ·scale· ≤ m are retained in the value space. As a consequence, every value in the value space will have ·numericalValue· equal to i / 10^n for some integers i and n, with n ≤ m. The {value} of maxScale must be an integer. If it is negative, the numeric values of the datatype are restricted to multiples of 10 (or 100, or …).

The term 'maxScale' is chosen to reflect the fact that it restricts the value space to those values that can be represented lexically in scientific notation using an integer coefficient and a scale (or negative exponent) no greater than maxScale. (It has nothing to do with the use of the term 'scale' to denote the radix or base of a notation.) Note that maxScale does not restrict the lexical space directly; a lexical representation that adds non-significant leading or trailing zero digits, or that uses a lower exponent with a non-integer coefficient is still permitted.

Example
The following is the definition of a user-defined datatype which could be used to represent a floating-point decimal datatype which allows seven decimal digits for the coefficient and exponents between −95 and 96. Note that the scale is −1 times the exponent.
<simpleType name='decimal32'>
  <restriction base='precisionDecimal'>
    <totalDigits value='7'/>
    <maxScale value='95'/>
    <minScale value='-96'/>
  </restriction>
</simpleType>

4.2.1 The maxScale Schema Component

Schema Component: maxScale
{annotations}
A sequence of Annotation components.
{value}
An xs:integer value. Required.
{fixed}
An xs:boolean value. Required.

If {fixed} is true, then types for which the current type is the {base type definition} must not specify a value for maxScale other than {value}.

4.2.2 XML Representation of maxScale Schema Components

The XML representation for a maxScale schema component is a <maxScale> element information item. The correspondences between the properties of the information item and properties of the component are as follows:

XML Representation Summary: maxScale Element Information Item

maxScale Schema Component
Property: Representation
{value}: The actual value of the value [attribute]
{fixed}: The actual value of the fixed [attribute], if present, otherwise false

4.2.3 maxScale Validation Rules

Validation Rule: maxScale Valid
A precisionDecimal value v is facet-valid with respect to maxScale if and only if one of the following is true:
1 v has ·scale· less than or equal to the {value} of maxScale.
2 The ·scale· of v is absent.

4.2.4 Constraints on maxScale Schema Components

Schema Component Constraint: maxScale valid restriction
It is an error if maxScale is among the members of {facets} of {base type definition} and {value} is greater than the {value} of that maxScale.

4.3 minScale

[Definition:]   minScale places a lower limit on the ·scale· of precisionDecimal values. If the {value} of minScale is m, then the value space is restricted to values with ·scale· ≥ m. As a consequence, every value in the value space will have ·numericalValue· equal to i / 10^n for some integers i and n, with n ≥ m.

The term minScale is chosen to reflect the fact that it restricts the value space to those values that can be represented lexically in exponential form using an integer coefficient and a scale (negative exponent) at least as large as minScale. Note that it does not restrict the lexical space directly; a lexical representation that adds additional leading zero digits, or that uses a larger exponent (and a correspondingly smaller coefficient) is still permitted.

Example
The following is the definition of a user-defined datatype which could be used to represent amounts in a decimal currency; it corresponds to a SQL column definition of DECIMAL(8,2). The effect is to allow values between -999,999.99 and 999,999.99, with a fixed interval of 0.01 between values.
<simpleType name='price'>
  <restriction base='precisionDecimal'>
    <totalDigits value='8'/>
    <minScale value='2'/>
    <maxScale value='2'/>
  </restriction>
</simpleType>
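The combined effect of the three facets in the price type can be sketched in Python. The sketch is illustrative only: the function name is hypothetical, literals are assumed to match pDecimalRep already, and the digit count is taken from the integer coefficient that Python's Decimal keeps for a literal, which for nonzero values equals the quantity bounded by totalDigits.

```python
from decimal import Decimal

TOTAL_DIGITS, MIN_SCALE, MAX_SCALE = 8, 2, 2  # facet values of 'price'

def is_valid_price(lex: str) -> bool:
    """Check a literal (assumed to match pDecimalRep) against the facets."""
    if lex in ("NaN", "INF", "+INF", "-INF"):
        # Special values have no scale and satisfy all three
        # validation rules as they are written.
        return True
    _, digits, exponent = Decimal(lex).as_tuple()
    scale = -exponent
    if not (MIN_SCALE <= scale <= MAX_SCALE):
        return False
    is_zero = not any(digits)
    return is_zero or len(digits) <= TOTAL_DIGITS
```

Under this reading '12.50' and '1250e-2' are accepted, '12.5' is rejected because its ·scale· is 1, and '1000000.00' is rejected because it needs nine digits. Read literally, the validation rules also admit NaN and the infinities; a schema author who wants to exclude them would need a further facet such as pattern or enumeration.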

4.3.1 The minScale Schema Component

Schema Component: minScale
{annotations}
A sequence of Annotation components.
{value}
An xs:integer value. Required.
{fixed}
An xs:boolean value. Required.

If {fixed} is true, then types for which the current type is the {base type definition} must not specify a value for minScale other than {value}.

4.3.2 XML Representation of minScale Schema Components

The XML representation for a minScale schema component is a <minScale> element information item. The correspondences between the properties of the information item and properties of the component are as follows:

XML Representation Summary: minScale Element Information Item

minScale Schema Component
Property: Representation
{value}: The actual value of the value [attribute]
{fixed}: The actual value of the fixed [attribute], if present, otherwise false

4.3.3 minScale Validation Rules

Validation Rule: minScale Valid
A precisionDecimal value v is facet-valid with respect to minScale if and only if one of the following is true:
1 v has ·scale· greater than or equal to the {value} of minScale.
2 The ·scale· of v is absent.

4.3.4 Constraints on minScale Schema Components

Schema Component Constraint: minScale less than or equal to maxScale
It is an error for minScale to be greater than maxScale.

Note that it is not an error for minScale to be greater than totalDigits.

Schema Component Constraint: minScale valid restriction
It is an error if minScale is among the members of {facets} of {base type definition} and {value} is less than the {value} of that minScale.
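The schema component constraints on minScale and maxScale can be sketched as a check on a derivation step. The function and the dictionary layout are hypothetical; the sketch assumes that a facet not specified in the derived type is inherited from the base type, and it reports the names of the constraints that are violated.

```python
def check_scale_facets(derived: dict, base: dict) -> list:
    """Return the names of violated constraints for one restriction step.
    Facets are given as {"minScale": int, "maxScale": int}; keys may be missing."""
    errors = []
    d_min, d_max = derived.get("minScale"), derived.get("maxScale")
    b_min, b_max = base.get("minScale"), base.get("maxScale")

    # A restriction may only tighten the limits of the base type.
    if d_max is not None and b_max is not None and d_max > b_max:
        errors.append("maxScale valid restriction")
    if d_min is not None and b_min is not None and d_min < b_min:
        errors.append("minScale valid restriction")

    # Assumption: unspecified facets are inherited from the base type.
    eff_min = d_min if d_min is not None else b_min
    eff_max = d_max if d_max is not None else b_max
    if eff_min is not None and eff_max is not None and eff_min > eff_max:
        errors.append("minScale less than or equal to maxScale")
    return errors
```

No comparison with totalDigits is made, since a minScale greater than totalDigits is not an error.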

5 Implementation issues

5.1 Implementation limits

All minimally conforming processors must support all precisionDecimal values in the value space of the otherwise unconstrained derived datatype for which totalDigits is set to sixteen, maxScale to 369, and minScale to −398.

Note: The conformance limits given in the text correspond to those of the decimal64 type defined in [IEEE 754-2008], which can be stored in a 64-bit field. The XML Schema Working Group recommends that implementors support limits corresponding to those of the decimal128 type. This entails supporting the values in the value space of the otherwise unconstrained datatype for which totalDigits is set to 34, maxScale to 6111, and minScale to −6176.

5.2 Interfacing with XPath

[XPath 2.0] does not currently require support for the precisionDecimal datatype, but conforming XPath processors are allowed to support additional primitive data types, including precisionDecimal.

For interoperability, it is recommended that XPath processors intending to support precisionDecimal as an additional primitive data type follow the recommendations in [Chamberlin 2006]. If the XPath processor used to evaluate XPath expressions supports precisionDecimal, then any precisionDecimal values in the post-schema-validation infoset should be labeled as xs:precisionDecimal in the data model instance and handled accordingly in XPath.

If the XPath processor does not support precisionDecimal, then any precisionDecimal values in the post-schema-validation infoset should be mapped into decimal, unless the ·numericalValue· is not a decimal number (for example, it is positiveInfinity, negativeInfinity, or notANumber), in which case they should be mapped to float. Whether this is done by altering the type information in the partial post-schema-validation infoset, or by altering the usual rules for mapping from a post-schema-validation infoset to an [XDM] data model instance, or by treating precisionDecimal as an unknown type which is coerced as appropriate into decimal or float by the XPath processor, is implementation-defined and out of scope for this specification.
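The recommended fallback can be summarized in a small sketch. The function name and the use of QName strings as return values are assumptions of this example; how the relabelling is actually carried out remains implementation-defined, as stated above.

```python
from fractions import Fraction

SPECIAL_VALUES = {"positiveInfinity", "negativeInfinity", "notANumber"}

def fallback_xpath_type(numerical_value) -> str:
    """Type to use when the XPath processor lacks precisionDecimal:
    special values go to float, every decimal number goes to decimal."""
    if isinstance(numerical_value, str) and numerical_value in SPECIAL_VALUES:
        return "xs:float"
    return "xs:decimal"
```

Note that mapping into decimal discards the ·scale· and ·sign· properties, so 3.00 and 3, or positive and negative zero, are no longer distinguishable after the mapping.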

As a consequence of the above variability, it is possible that XPath expressions that perform various kinds of type introspections will produce different results when different XPath processors are used. If the schema author wishes to ensure interoperable results, such introspections will need to be avoided.

6 Mapping functions

The functions defined below make frequent reference to functions defined in [XSD 1.1 Part 2: Datatypes].

Auxiliary Functions for Reading Instances of pDecimalRep
·decimalPtPrecision· (LEX) → integer
Maps a decimalPtNumeral onto an integer; used in calculating the ·scale· of a precisionDecimal value.
Arguments:
LEX: matches decimalPtNumeral
Result:
an integer
Algorithm:
LEX necessarily contains a decimal point ('.') and may optionally contain a following fracFrag F consisting of some number n of digits.
Return
  • n   when F is present, and
  • 0   otherwise.
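A sketch of ·decimalPtPrecision· in Python follows. The function name is a transliteration chosen for this example, and the argument is assumed to be a valid decimalPtNumeral, so that it contains exactly one decimal point.

```python
def decimal_pt_precision(lex: str) -> int:
    """Number of digits after the decimal point; 0 when none follow it.
    Assumes lex matches decimalPtNumeral."""
    _, point, frac = lex.partition(".")
    if not point:
        raise ValueError("a decimalPtNumeral must contain a decimal point")
    return len(frac)
```

For example, decimal_pt_precision('3.00') is 2 and decimal_pt_precision('3.') is 0.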
·scientificPrecision· (LEX) → integer
Maps a scientificNotationNumeral onto an integer; used in calculating the ·scale· of a precisionDecimal value.
Arguments:
LEX: matches scientificNotationNumeral
Result:
an integer
Algorithm:
LEX necessarily contains a noDecimalPtNumeral or decimalPtNumeral C preceding an exponent indicator ('E' or 'e'), and a following noDecimalPtNumeral E.
Return
Lexical Mapping
Arguments:
LEX: matches pDecimalRep
Result:
a precisionDecimal value
Algorithm:
Let pD be a complete precisionDecimal value.
  1. Set pD's ·numericalValue· to
  2. Set pD's ·scale· to
  3. Set pD's ·sign· to
    • absent   when LEX is 'NaN'
    • negative   when the first character of LEX is '-', and
    • positive   otherwise.
  4. Return pD.
Canonical Mapping
Arguments:
pD: a precisionDecimal value
Result:
a literal matching pDecimalRep
Algorithm:
  1. Let nV be the ·numericalValue· of pD.
    Let aP be the ·scale· of pD.
  2. If pD is one of NaN, INF, or -INF, then return specialRepCanonicalMap(nV).
  3. Otherwise, if nV is an integer and aP is zero and 1E-6 ≤ nV ≤ 1E6, then return noDecimalPtCanonicalMap(nV).
  4. Otherwise, if aP is greater than zero and 1E-6 ≤ nV ≤ 1E6, then let s be decimalPtCanonicalMap(nV). Let f be the number of fractional digits in s; f will invariably be less than or equal to aP. Return the concatenation of s with aP − f occurrences of the digit '0'.
  5. Otherwise, it will be the case that nV is less than 1E−6 or greater than 1E6, or that aP is less than zero.  Let
    • s be ·scientificCanonicalMap·(nV).
    • m be the part of s which precedes the "E".
    • n be the part of s which follows the "E".
    • p be the integer denoted by n.
    • f be the number of fractional digits in m; note that f will invariably be less than or equal to aP + p.
    • t be a string consisting of aP + p − f occurrences of the digit '0', preceded by a decimal point if and only if m contains no decimal point and aP + p − f is greater than zero.
    Return the concatenation m & t & 'E' & n.
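The steps of the canonical mapping can be followed in a Python sketch. Several points are assumptions of this example rather than statements of the specification: the helper functions stand in for the canonical mapping functions of [XSD 1.1 Part 2: Datatypes], s in step 5 is taken to be a scientific-notation rendering of nV with one digit before the decimal point and at least one after it, and only positive finite values and the special values are handled.

```python
from decimal import Decimal

SPECIAL_REPS = {"notANumber": "NaN", "positiveInfinity": "INF",
                "negativeInfinity": "-INF"}

def _decimal_pt(nV: Decimal) -> str:
    """Fixed notation with at least one fractional digit: 3 -> '3.0'."""
    s = format(nV.normalize(), "f")
    return s if "." in s else s + ".0"

def _scientific(nV: Decimal) -> str:
    """Assumed scientific form: 300 -> '3.0E2'."""
    _, digits, exp = nV.normalize().as_tuple()
    text = "".join(map(str, digits))
    return f"{text[0]}.{text[1:] or '0'}E{len(text) - 1 + exp}"

def canonical_map(nV, aP):
    """Sketch of steps 2 to 5 for special values and positive nV."""
    if isinstance(nV, str):                       # step 2
        return SPECIAL_REPS[nV]
    if nV <= 0:
        raise ValueError("sketch covers positive values only")
    in_range = Decimal("1E-6") <= nV <= Decimal("1E6")
    if nV == nV.to_integral_value() and aP == 0 and in_range:
        return str(int(nV))                       # step 3
    if aP > 0 and in_range:                       # step 4
        s = _decimal_pt(nV)
        f = len(s.split(".")[1])
        return s + "0" * (aP - f)
    s = _scientific(nV)                           # step 5
    m, n = s.split("E")
    p = int(n)
    f = len(m.split(".")[1])
    zeros = aP + p - f
    if zeros < 0:
        # The text states that f never exceeds aP + p; check it here.
        raise ValueError("fractional digits exceed aP + p")
    t = ("." if "." not in m and zeros > 0 else "") + "0" * zeros
    return f"{m}{t}E{n}"
```

With these assumptions the sketch reproduces the canonical representations listed in Lexical Mapping (§3.2), for example '3.00' for the value 3 with ·scale· 2 and '3.0E2' for 300 with ·scale· −1. The explicit check in step 5 fires for a value such as 300 with ·scale· −2, because the assumed scientific form '3.0E2' already carries one fractional digit.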

A Normative references

IEEE 754-2008
IEEE. IEEE Standard for Floating-Point Arithmetic. 29 August 2008. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4610933
XSD 1.1 Part 2: Datatypes
World Wide Web Consortium. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes, ed. David Peterson et al. W3C Working Draft 3 December 2009. Available at: http://www.w3.org/TR/xmlschema11-2/

B Non-normative references

Chamberlin 2006
Chamberlin, Don. Impact of precisionDecimal on XPath and XQuery. Email to the W3C XML Query and W3C XSL Working Groups, 16 May 2006. Available online at http://www.w3.org/XML/2007/dc.pd.xml and http://www.w3.org/XML/2007/dc.pd.html
XDM
World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Data Model (XDM), ed. Mary Fernández et al. W3C Recommendation 23 January 2007. Available at: http://www.w3.org/TR/xpath-datamodel/.
XPath 2.0
World Wide Web Consortium. XML Path Language 2.0, ed. Anders Berglund et al. 23 January 2007. Available at: http://www.w3.org/TR/2007/REC-xpath20-20070123/
XSD 1.1 Part 1: Structures
World Wide Web Consortium. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures, ed. Shudi (Sandy) Gao 高殊镝, C. M. Sperberg-McQueen, and Henry S. Thompson. W3C Working Draft 3 December 2009. Available at: http://www.w3.org/TR/xmlschema11-1/

C Acknowledgements (non-normative)

This document was prepared by the W3C XML Schema Working Group. The members at the time of publication were:

D To-do list (non-normative)

Some changes are expected to be made in future work on this document: