Copyright © 2016 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This specification defines various APIs for programmatic access to HTML and generic XML parsers by web applications for use in parsing and serializing DOM nodes.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This specification is based on the original work of the DOM Parsing and Serialization Living Specification, though it has diverged in terms of supported features, normative requirements, and algorithm specificity. As appropriate, relevant fixes from the living specification are incorporated into this document.
This document was published by the Web Platform Working Group as a Working Draft.
This document is intended to become a W3C Recommendation.
If you wish to make comments regarding this document, please send them to [email protected] with DOM-Parsing at the start of your email's subject. All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
This specification will not advance to Proposed Recommendation before the spec's test suite is completed and two or more independent implementations pass each test, although no single implementation needs to pass every test. We expect to meet this criterion no sooner than 24 October 2014. The group will also create an Implementation Report.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
If an algorithm calls into another algorithm, any exception thrown by the latter (unless it is explicitly caught) must cause the former to terminate, and the exception to be propagated up to its caller.
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
Some of the terms used in this specification are defined in [DOM4], [HTML5], and [XML10].
Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.
If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.
When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.
The term context object means the object on which the method or attribute being discussed was called.
The HTML namespace is http://www.w3.org/1999/xhtml.
The XML namespace is http://www.w3.org/XML/1998/namespace.
The XMLNS namespace is http://www.w3.org/2000/xmlns/.
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element:
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
Let fragment be a new DocumentFragment whose node document is context element's node document.
This ensures the node document for the new nodes is correct.
The following steps form the fragment serializing algorithm, whose arguments are a Node node and a flag require well-formed:
The XML serialization defined in this document conforms to the requirements of the XML fragment serialization algorithm defined in [HTML5].
To produce an HTML serialization of a Node node, the user agent must run the HTML fragment serialization algorithm [HTML5] on node and return the string produced.
To produce an XML serialization of a Node node given a flag require well-formed, run the following steps:
Let context namespace be null. The context namespace is changed when a node serializes a different default namespace definition from its parent. The algorithm assumes no namespace to start.
Let namespace prefix map be a new map for associating namespaceURI and namespace prefix pairs, where namespaceURI values are the map's keys, and prefix values are the map's key values. The namespace prefix map will be populated by previously seen namespaceURIs and their most recent prefix associations for a subtree. Note: the namespace prefix map only associates a single prefix value with a given namespaceURI. During serialization, if different namespace prefixes are found that map to the same namespaceURI, the last one encountered "wins" by replacing the existing key value in the map with the new prefix value.
Add the XML namespace to the namespace prefix map, with "xml" as the key value.
Let prefix index be a generated namespace prefix index with value 1. The generated namespace prefix index is used to generate a new unique prefix value when no suitable existing namespace prefix is available to serialize a node's namespaceURI (or the namespaceURI of one of node's attributes). See the generate a prefix algorithm.
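Return the result of running the XML serialization algorithm on node, passing the context namespace, namespace prefix map, generated namespace prefix index, and the flag require well-formed. If an exception occurs during the execution of the algorithm, catch that exception and throw a DOMException with name "InvalidStateError".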
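An XML serialization differs from an HTML serialization in the following ways:
Elements not in the HTML namespace that contain no children are serialized using the empty-element tag syntax (i.e., according to the EmptyElemTag production of [XML10]).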
Otherwise, the algorithm for producing an XML serialization is designed to produce a serialization that is compatible with the HTML parser. For example, elements in the HTML namespace that contain no child nodes are serialized with an explicit begin and end tag rather than using the empty-element tag syntax [XML10], unless they are void elements such as br, which are serialized as "<br />".
Per [DOM4], Attr objects do not inherit from Node, and thus cannot be serialized by the XML serialization algorithm. An attempt to serialize an Attr object will result in a TypeError exception [WEBIDL].
To run the XML serialization algorithm on a node given a context namespace namespace, a namespace prefix map prefix map, a generated namespace prefix index prefix index, and a flag require well-formed, the user agent must run the appropriate steps, depending on node's interface:
Element
Run the following algorithm:
If the require well-formed flag is set (its value is true), and this node's localName attribute contains the character ":" (U+003A COLON) or does not match the XML Name production [XML10], then throw an exception; the serialization of this node would not be a well-formed element.
Let markup be the string "<" (U+003C LESS-THAN SIGN).
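Let qualified name be an empty string.
Let skip end tag be a boolean flag with value false.
Let ignore namespace definition attribute be a boolean flag with value false.
Let map be a copy of prefix map.
Let element prefixes list be an empty list.
Let duplicate prefix definition be null.
Let local default namespace be the result of recording the namespace information for node given map, element prefixes list, and duplicate prefix definition.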
The above step will update the map with any found namespace prefix definitions, add the found prefix definitions to the element prefixes list, optionally set the duplicate prefix definition value, and return a local default namespace value defined by a default namespace attribute if one exists. Otherwise it returns null.
Let inherited ns be a copy of namespace.
Let ns be the value of node's namespaceURI attribute.
If inherited ns is equal to ns, then:
If local default namespace is not null, then set ignore namespace definition attribute to true.
If ns is the XML namespace, then append to qualified name the concatenation of the string "xml:" and the value of node's localName.
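Otherwise, append to qualified name the value of node's localName. The node's prefix is always dropped.
Append the value of qualified name to markup.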
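Otherwise, inherited ns is not equal to ns (the node's own namespace is different from the context namespace of its parent). Run these sub-steps:
Let prefix be the value of node's prefix attribute.
Let candidate prefix be the value that map associates with the key ns, or null if map has no such key.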
If candidate prefix is not null (a suitable namespace prefix is defined which maps to ns), then:
Append to qualified name the concatenation of candidate prefix, ":" (U+003A COLON), and node's localName.
There exists on this node or the node's ancestry a
namespace prefix definition that defines the node's namespace.
If the local default namespace is not null (there exists a locally-defined default namespace declaration attribute), then let inherited ns get the value of ns.
Append the value of qualified name to markup.
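Otherwise, if prefix is not null and local default namespace is null, then:
Append to qualified name the concatenation of prefix, ":" (U+003A COLON), and node's localName.
Append the value of qualified name to markup.
Append the following to markup, in the order listed: " " (U+0020 SPACE); the string "xmlns:"; the value of prefix; "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK); the result of serializing an attribute value given ns and the require well-formed flag as input; """ (U+0022 QUOTATION MARK).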
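Otherwise, if local default namespace is null, or local default namespace is not null and its value is not equal to ns, then:
Set the ignore namespace definition attribute flag to true.
Append to qualified name the value of node's localName.
Let the value of inherited ns be ns, and append the value of qualified name to markup.
Append the following to markup, in the order listed: " " (U+0020 SPACE); the string "xmlns"; "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK); the result of serializing an attribute value given ns and the require well-formed flag as input; """ (U+0022 QUOTATION MARK).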
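Otherwise, the node has a local default namespace that matches ns. Append to qualified name the value of node's localName, let the value of inherited ns be ns, and append the value of qualified name to markup.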
If ns is the HTML namespace, and the node's list of children is empty, and the node's localName matches any one of the following void elements: "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param", "source", "track", "wbr"; then append the following to markup, in the order listed: " " (U+0020 SPACE); "/" (U+002F SOLIDUS). Then set the skip end tag flag to true.
If ns is not the HTML namespace, and the node's list of children is empty, then append "/" (U+002F SOLIDUS) to markup and set the skip end tag flag to true.
Append ">" (U+003E GREATER-THAN SIGN) to markup.
If the value of skip end tag is true, then return the value of markup and skip the remaining steps. The node is a leaf-node.
If ns is the HTML namespace, and the node's localName matches the string "template", then this is a template element. Append to markup the result of running the XML serialization algorithm on the template element's template contents (a DocumentFragment), providing the value of inherited ns for the context namespace, map for the namespace prefix map, prefix index for the generated namespace prefix index, and the value of the require well-formed flag. This allows template content to round-trip, given the rules for parsing XHTML documents [HTML5].
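Append the following to markup, in the order listed: "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS); the value of qualified name; ">" (U+003E GREATER-THAN SIGN).
Return the value of markup.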
Document
If the require well-formed flag is set (its value is true), and this node has no documentElement (the documentElement attribute's value is null), then throw an exception; the serialization of this node would not be a well-formed document.
Otherwise, run the following steps:
Let serialized document be an empty string.
If node's doctype attribute is not null, append to serialized document the result of producing a DocumentType serialization of node's doctype attribute, provided the require well-formed flag.
Comment
If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10], contains "--" (two adjacent U+002D HYPHEN-MINUS characters), or ends with a "-" (U+002D HYPHEN-MINUS) character, then throw an exception; the serialization of this node's data would not be well-formed.
Otherwise, return the concatenation of "<!--", node's data, and "-->".
Text
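If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10], then throw an exception; the serialization of this node's data would not be well-formed.
Let markup be the value of node's data.
Replace any occurrences of "&" in markup by "&amp;".
Replace any occurrences of "<" in markup by "&lt;".
Replace any occurrences of ">" in markup by "&gt;".
Return the value of markup.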
DocumentFragment
DocumentType
ProcessingInstruction
If the require well-formed flag is set (its value is true), and node's target contains a ":" (U+003A COLON) character or is an ASCII case-insensitive match for the string "xml", then throw an exception; the serialization of this node's target would not be well-formed.
If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10] or contains the string "?>" (U+003F QUESTION MARK, U+003E GREATER-THAN SIGN), then throw an exception; the serialization of this node's data would not be well-formed.
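The two ProcessingInstruction checks can be sketched as below, assuming the target and data are available as strings. checkProcessingInstruction and asciiLowercase are hypothetical helpers, and the XML Char production test on data is omitted for brevity. The target test lowercases only the letters A to Z, mirroring an ASCII case-insensitive match.

```typescript
// Non-normative sketch of the ProcessingInstruction well-formedness checks.
// Names are hypothetical; the XML Char test on data is omitted.
function asciiLowercase(s: string): string {
  // Lowercase only A-Z, as an ASCII case-insensitive comparison requires.
  return s.replace(/[A-Z]/g, (c: string) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

function checkProcessingInstruction(target: string, data: string, requireWellFormed: boolean): void {
  if (!requireWellFormed) return;
  if (target.indexOf(":") !== -1 || asciiLowercase(target) === "xml") {
    throw new Error("processing instruction target would not be well-formed");
  }
  if (data.indexOf("?>") !== -1) {
    throw new Error("processing instruction data would not be well-formed");
  }
}
```

A target such as "xml-stylesheet" passes, while "XmL" and "a:b" are rejected when the flag is set.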
To produce a DocumentType serialization of a Node node, given a require well-formed flag, the user agent must return the result of the following algorithm:
If the require well-formed flag is true and the node's publicId attribute contains characters that are not matched by the XML PubidChar production [XML10], then throw an exception; the serialization of this node would not be a well-formed document type declaration.
If the require well-formed flag is true and the node's systemId attribute contains characters that are not matched by the XML Char production [XML10] or contains both a """ (U+0022 QUOTATION MARK) and a "'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this node would not be a well-formed document type declaration.
Let markup be an empty string.
Append the string "<!DOCTYPE" to markup.
Append " " (U+0020 SPACE) to markup.
Append the value of the node's name attribute to markup. For a node belonging to an HTML document, the value will be all lowercase.
If the node's publicId is not the empty string, then append the following, in the order listed, to markup: " " (U+0020 SPACE); the string "PUBLIC"; " " (U+0020 SPACE); """ (U+0022 QUOTATION MARK); the value of the node's publicId attribute; """ (U+0022 QUOTATION MARK).
If the node's systemId is not the empty string and the node's publicId is set to the empty string, then append the following, in the order listed, to markup: " " (U+0020 SPACE); the string "SYSTEM".
If the node's systemId is not the empty string, then append the following, in the order listed, to markup: " " (U+0020 SPACE); """ (U+0022 QUOTATION MARK); the value of the node's systemId attribute; """ (U+0022 QUOTATION MARK).
Append ">" (U+003E GREATER-THAN SIGN) to markup.
Return the value of markup.
To record the namespace information for an Element element, given a namespace prefix map map, an element prefixes list (initially empty), and a duplicate prefix definition reference, the user agent must run the following steps:
Let default namespace attr value be null.
Main: For each attribute attr in element's attributes, in the order they are specified in the element's attribute list:
The following conditional steps add namespace prefixes into the element prefixes list and add or replace them in the map. Only attributes in the XMLNS namespace are considered (e.g., attributes made to look like namespace declarations via setAttribute("xmlns:pretend-prefix", "pretend-namespace") are not included).
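Let attribute namespace be the value of attr's namespaceURI value.
Let attribute prefix be the value of attr's prefix.
If attribute namespace is the XMLNS namespace and attribute prefix is null, then attr is a default namespace declaration. Set the default namespace attr value to attr's value and stop running these steps, returning to Main to visit the next attribute.
Otherwise, if attribute namespace is the XMLNS namespace, then attribute prefix is not null and attr is a namespace prefix definition. Run the following steps:
Let prefix definition be the value of attr's localName.
Let namespace definition be the value of attr's value.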
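The attribute walk can be sketched as below. The duplicate handling is an assumption based on the surrounding descriptions (the namespace prefix map keeps the most recent prefix per namespaceURI, and a duplicate prefix definition is later skipped when attributes are serialized): a prefix already bound to the same namespace in map is recorded as the duplicate, otherwise the map entry is added or replaced and the prefix is added to the element prefixes list. AttrInfo, NamespaceRecord and recordNamespaceInfo are hypothetical names.

```typescript
// Non-normative sketch of recording the namespace information for an element.
// All names are hypothetical; the duplicate handling is an assumption.
const XMLNS_NS = "http://www.w3.org/2000/xmlns/";

interface AttrInfo {
  namespaceURI: string | null;
  prefix: string | null;
  localName: string;
  value: string;
}

interface NamespaceRecord {
  localDefaultNamespace: string | null; // from an xmlns="..." attribute, if any
  elementPrefixes: string[]; // prefixes declared on this element
  duplicatePrefixDefinition: string | null;
}

function recordNamespaceInfo(attrs: AttrInfo[], map: { [ns: string]: string }): NamespaceRecord {
  const rec: NamespaceRecord = {
    localDefaultNamespace: null,
    elementPrefixes: [],
    duplicatePrefixDefinition: null,
  };
  for (const attr of attrs) {
    // Only attributes in the XMLNS namespace are real namespace declarations.
    if (attr.namespaceURI !== XMLNS_NS) continue;
    if (attr.prefix === null) {
      rec.localDefaultNamespace = attr.value; // default namespace declaration
      continue;
    }
    const prefixDefinition = attr.localName; // "p" for xmlns:p="..."
    const namespaceDefinition = attr.value;
    if (map[namespaceDefinition] === prefixDefinition) {
      rec.duplicatePrefixDefinition = prefixDefinition; // same binding already in scope
    } else {
      map[namespaceDefinition] = prefixDefinition; // add or replace: last one wins
      rec.elementPrefixes.push(prefixDefinition);
    }
  }
  return rec;
}
```

An attribute created with setAttribute("xmlns:pretend-prefix", "pretend-namespace") has a null namespaceURI, so this sketch ignores it, as the note above requires.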
To generate a prefix given a namespace prefix map map, a string new namespace, and a reference to a generated namespace prefix index prefix index, the user agent must run the following steps:
Let generated prefix be the concatenation of the string "ns" and the current numerical value of prefix index.
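A sketch of the whole operation, assuming that the index is incremented after each use (so that each generated prefix is unique, as the generated namespace prefix index is meant to guarantee) and that the new prefix is added to map for new namespace. PrefixIndex and generatePrefix are hypothetical names; a mutable object models the by-reference prefix index.

```typescript
// Non-normative sketch of "generate a prefix". Names are hypothetical; the
// increment and the map update are assumptions about the remaining steps.
interface PrefixIndex {
  value: number; // mutable, so callers observe the increment (by-reference semantics)
}

function generatePrefix(map: { [ns: string]: string }, newNamespace: string, prefixIndex: PrefixIndex): string {
  const generatedPrefix = "ns" + prefixIndex.value; // "ns" + current index, e.g. "ns1"
  prefixIndex.value += 1; // the next call yields a different prefix
  map[newNamespace] = generatedPrefix;
  return generatedPrefix;
}
```

Starting from an index of 1, two calls produce "ns1" and then "ns2". The sketch does not check whether an author has already declared a prefix with the same name.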
The XML serialization of the attributes of an Element element together with a namespace prefix map map, a generated prefix index prefix index reference, a flag ignore namespace definition attribute, a duplicate prefix definition value, and a flag require well-formed, is the result of the following algorithm:
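Let result be the empty string.
Let localname set be a new empty namespace localname set. This localname set will contain tuples of unique attribute namespaceURI and localName pairs, and is populated as each attr is processed.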
This set is used to [optionally] enforce the well-formed constraint that an element cannot have two attributes with the same namespaceURI and localName. This can occur when two otherwise identical attributes on the same element differ only by their prefix values.
Main: For each attribute attr in element's attributes, in the order they are specified in the element's attribute list:
If the require well-formed flag is set (its value is true), and the localname set contains a tuple whose values match those of a new tuple consisting of attr's namespaceURI attribute and localName attribute, then throw an exception; the serialization of this attr would fail to produce a well-formed element serialization.
Create a new tuple consisting of attr's namespaceURI attribute and localName attribute, and add it to the localname set.
Let attribute namespace be the value of attr's namespaceURI value.
Let candidate prefix be null.
If attribute namespace is not null, then run these sub-steps:
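If attribute namespace is the XMLNS namespace, and either the attr's prefix is null and the ignore namespace definition attribute flag is true, or the attr's prefix is not null and the attr's localName matches the value of duplicate prefix definition, then stop running these steps and goto Main to visit the next attribute.
Append the following to result, in the order listed: " " (U+0020 SPACE); the string "xmlns:"; the value of candidate prefix; "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK); the result of serializing an attribute value given attribute namespace and the require well-formed flag as input; """ (U+0022 QUOTATION MARK).
Append " " (U+0020 SPACE) to result.
If candidate prefix is not null, then append to result the concatenation of candidate prefix with ":" (U+003A COLON).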
If the require well-formed flag is set (its value is true), and this attr's localName attribute contains the character ":" (U+003A COLON), does not match the XML Name production [XML10], or equals "xmlns" while attribute namespace is null, then throw an exception; the serialization of this attr would not be a well-formed attribute.
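Append the following strings to result, in the order listed: the value of attr's localName; "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK); the result of serializing an attribute value given attr's value attribute and the require well-formed flag as input; """ (U+0022 QUOTATION MARK).
After all attributes have been processed, return the value of result.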
To serialize an attribute value given an attribute value and require well-formed flag, the user agent must run the following steps:
If the require well-formed flag is set (its value is true), and attribute value contains characters that are not matched by the XML Char production [XML10], then throw an exception; the serialization of this attribute value would fail to produce a well-formed element serialization.
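If attribute value is null, then return the empty string.
Otherwise, attribute value is a string. Return the value of attribute value, first replacing any occurrences of the following, in this order: "&" with "&amp;"; """ with "&quot;"; "<" with "&lt;"; ">" with "&gt;". Replacing "&" first ensures that the entity references introduced by the later replacements are not escaped again.
This matches behavior present in browsers, and goes above and beyond the grammar requirement in the XML specification's AttValue production [XML10] by also replacing ">" characters.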
The DOMParser interface
enum SupportedType {
"text/html",
"text/xml",
"application/xml",
"application/xhtml+xml",
"image/svg+xml"
};
The DOMParser() constructor must return a new DOMParser object.
[Constructor]
interface DOMParser {
[NewObject]
Document parseFromString (DOMString str, SupportedType type);
};
parseFromString
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created document.
The scripting flag must be set to "disabled".
meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.
"text/xml", "application/xml", "application/xhtml+xml", "image/svg+xml"
Parse str with an XML parser.
For all XHTML script elements parsed using the XML parser, the equivalent of the scripting flag must be set to "disabled".
The document created for these types uses the Document interface rather than the XMLDocument interface.
Let root be a new Element, with its local name set to "parsererror" and its namespace set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".
At this point user agents may append nodes to root, for example to describe the nature of the error.
In any case, the returned document's content type must be the type argument. Additionally, the document must have a URL value equal to the URL of the active document, and a location value of null.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
str | DOMString | ✘ | ✘ | |
type | SupportedType | ✘ | ✘ | |
Return type: Document
The XMLSerializer interface
The XMLSerializer() constructor must return a new XMLSerializer object.
[Constructor]
interface XMLSerializer {
DOMString serializeToString (Node root);
};
serializeToString
The serializeToString(root) method must produce an XML serialization of root, passing a value of false for the require well-formed parameter, and return the result.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
root | Node | ✘ | ✘ | |
Return type: DOMString
The Element interface
partial interface Element {
[CEReactions, TreatNullAs=EmptyString]
attribute DOMString innerHTML;
[CEReactions, TreatNullAs=EmptyString]
attribute DOMString outerHTML;
[CEReactions]
void insertAdjacentHTML (DOMString position, DOMString text);
};
innerHTML of type DOMString
The innerHTML IDL attribute represents the markup of the Element's contents.
innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.
Can be set, to replace the contents of the element with nodes parsed from the given string.
In the case of an XML document, this throws a DOMException with name "InvalidStateError" if the Element cannot be serialized to XML, and a DOMException with name "SyntaxError" if the given string is not well-formed.
On getting, return the result of invoking the fragment serializing algorithm on the context object, providing true for the require well-formed flag (this might throw an exception instead of returning a string).
On setting, these steps must be run:
outerHTML of type DOMString
The outerHTML IDL attribute represents the markup of the Element and its contents.
outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.
Can be set, to replace the element with nodes parsed from the given string.
In the case of an XML document, this throws a DOMException with name "InvalidStateError" if the element cannot be serialized to XML, and a DOMException with name "SyntaxError" if the given string is not well-formed.
Throws a DOMException with name "NoModificationAllowedError" if the parent of the element is the Document node.
On getting, return the result of invoking the fragment serializing algorithm on a fictional node whose only child is the context object, providing true for the require well-formed flag (this might throw an exception instead of returning a string).
On setting, the following steps must be run:
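Let parent be the context object's parent.
If parent is a Document, throw a DOMException with name "NoModificationAllowedError".
If parent is a DocumentFragment, let parent be a new Element with body as its local name, the HTML namespace as its namespace, and the context object's node document as its node document.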
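insertAdjacentHTML
insertAdjacentHTML(position, text)
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
"beforebegin": before the element itself.
"afterbegin": just inside the element, before its first child.
"beforeend": just inside the element, after its last child.
"afterend": after the element itself.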
Throws a DOMException with name "SyntaxError" if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).
Throws a DOMException with name "NoModificationAllowedError" if the given position isn't possible (e.g., inserting elements after the root element of a Document).
The insertAdjacentHTML(position, text) method must run these steps:
If position is an ASCII case-insensitive match for the string "beforebegin" or "afterend", let context be the context object's parent. If context is null or a Document, throw a DOMException with name "NoModificationAllowedError".
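If position is not an ASCII case-insensitive match for any of the strings "beforebegin", "afterbegin", "beforeend", and "afterend", throw a DOMException with name "SyntaxError".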
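If context is not an Element or the following are all true: context's node document is an HTML document, context's local name is "html", and context's namespace is the HTML namespace; then let context be a new Element with body as its local name, the HTML namespace as its namespace, and context's node document as its node document.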
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
position | DOMString | ✘ | ✘ | |
text | DOMString | ✘ | ✘ | |
Return type: void
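Validating the position argument can be sketched as below. The four keywords, matched ASCII case-insensitively, are beforebegin, afterbegin, beforeend and afterend, as in the insertAdjacentHTML() definition in [HTML5]. parsePosition, usesParentAsContext, asciiLower and PositionSyntaxError are hypothetical names, and PositionSyntaxError merely stands in for a DOMException whose name is "SyntaxError".

```typescript
// Non-normative sketch of validating insertAdjacentHTML's position argument.
// All names are hypothetical.
type AdjacentPosition = "beforebegin" | "afterbegin" | "beforeend" | "afterend";

const POSITIONS: AdjacentPosition[] = ["beforebegin", "afterbegin", "beforeend", "afterend"];

// Stand-in for a DOMException whose name is "SyntaxError".
class PositionSyntaxError extends Error {}

function asciiLower(s: string): string {
  return s.replace(/[A-Z]/g, (c: string) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

function parsePosition(position: string): AdjacentPosition {
  const lowered = asciiLower(position);
  for (const candidate of POSITIONS) {
    if (candidate === lowered) return candidate;
  }
  throw new PositionSyntaxError("invalid position: " + position);
}

// The new nodes become siblings of the element, so its parent supplies the parsing context.
function usesParentAsContext(p: AdjacentPosition): boolean {
  return p === "beforebegin" || p === "afterend";
}
```

For example, parsePosition("AfterEnd") returns "afterend", and usesParentAsContext reports true for it because the parsed nodes are inserted next to the element rather than inside it.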
The Range interface
partial interface Range {
[CEReactions, NewObject]
DocumentFragment createContextualFragment (DOMString fragment);
};
createContextualFragment
createContextualFragment(markupString)
Returns a DocumentFragment, created from the markup string given.
The createContextualFragment(fragment) method must run these steps:
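Let node be the context object's start node.
Let element be as follows, depending on node's interface:
Document, DocumentFragment: null.
Element: node.
Text, Comment: node's parent element.
DocumentType, ProcessingInstruction: [DOM4] prevents this case.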
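If either element is null or the following are all true: element's node document is an HTML document, element's local name is "html", and element's namespace is the HTML namespace; then let element be a new Element with "body" as its local name, the HTML namespace as its namespace, and the context object's node document as its node document.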
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
fragment | DOMString | ✘ | ✘ | |
Return type: DocumentFragment
The following is an informative summary of the changes since the last publication of this specification. A complete revision history of the Editor's Drafts of this specification can be found here.
Thanks to Ms2ger [Mozilla] for maintaining the initial drafts of this specification and for its continued improvement in the Living Specification.
Thanks to Victor Costan, Aryeh Gregor, Anne van Kesteren, Arkadiusz Michalski, Simon Pieters, Henri Sivonen, Josh Soref and Boris Zbarsky, for their useful comments.
Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes, and the insertAdjacentHTML() method in [HTML5], and for his useful comments.