Sun, 31 Jan 2016 15:42:31 -0800
change //dev.w3.org/csswg/ urls to //drafts.csswg.org/
1 <h1>CSS Syntax Module Level 3</h1>
3 <pre class='metadata'>
4 Shortname: css-syntax
5 Level: 3
6 Status: ED
7 Work Status: Testing
8 Group: csswg
9 ED: https://drafts.csswg.org/css-syntax/
10 TR: http://www.w3.org/TR/css-syntax-3/
11 Previous Version: http://www.w3.org/TR/2014/CR-css-syntax-3-20140220/
12 Previous Version: http://www.w3.org/TR/2013/WD-css-syntax-3-20131105/
13 Previous Version: http://www.w3.org/TR/2013/WD-css-syntax-3-20130919/
14 Editor: Tab Atkins Jr., Google, http://xanthir.com/contact/
15 Editor: Simon Sapin, Mozilla, http://exyr.org/about/
16 Abstract: This module describes, in general terms, the basic structure and syntax of CSS stylesheets. It defines, in detail, the syntax and parsing of CSS - how to turn a stream of bytes into a meaningful stylesheet.
17 Ignored Terms: <keyframes-name>, <keyframe-rule>, <keyframe-selector>, <translation-value>, <media-query-list>, <unicode-range-token>
18 Ignored Vars: +b, -b, foo
19 Link Defaults: css-text-decor-3 (property) text-decoration, css-color-3 (property) color, css-transforms-1 (function) translatex()
20 </pre>
22 <h2 id="intro">
23 Introduction</h2>
25 <em>This section is not normative.</em>
27 This module defines the abstract syntax and parsing of CSS stylesheets
28 and other things which use CSS syntax
29 (such as the HTML <code>style</code> attribute).
31 It defines algorithms for converting a stream of Unicode <a>code points</a>
32 (in other words, text)
33 into a stream of CSS tokens,
34 and then further into CSS objects
35 such as stylesheets, rules, and declarations.
37 <h3 id="placement">
38 Module interactions</h3>
40 This module defines the syntax and parsing of CSS stylesheets.
41 It supersedes the lexical scanner and grammar defined in CSS 2.1.
43 <h2 id='syntax-description'>
44 Description of CSS's Syntax</h2>
46 <em>This section is not normative.</em>
48 A CSS document is a series of <a>style rules</a>--
49 which are <a>qualified rules</a> that apply styles to elements in a document--
50 and <a>at-rules</a>--
51 which define special processing rules or values for the CSS document.
53 A <a>qualified rule</a> starts with a prelude
54 then has a {}-wrapped block containing a sequence of declarations.
55 The meaning of the prelude varies based on the context that the rule appears in--
56 for <a>style rules</a>, it's a selector which specifies what elements the declarations will apply to.
57 Each declaration has a name,
58 followed by a colon and the declaration value.
59 Declarations are separated by semicolons.
61 <div class='example'>
63 A typical rule might look something like this:
65 <pre>
66 p > a {
67 color: blue;
68 text-decoration: underline;
69 }
70 </pre>
72 In the above rule, "<code>p > a</code>" is the selector,
73 which, if the source document is HTML,
74 selects any <code>&lt;a></code> elements that are children of a <code>&lt;p></code> element.
76 "<code>color: blue</code>" is a declaration specifying that,
77 for the elements that match the selector,
78 their 'color' property should have the value ''blue''.
79 Similarly, their 'text-decoration' property should have the value ''underline''.
80 </div>
82 <a>At-rules</a> are all different, but they have a basic structure in common.
83 They start with an "@" <a>code point</a> followed by their name as a CSS keyword.
84 Some <a>at-rules</a> are simple statements,
85 with their name followed by more CSS values to specify their behavior,
86 and finally ended by a semicolon.
87 Others are blocks;
88 they can have CSS values following their name,
89 but they end with a {}-wrapped block,
90 similar to a <a>qualified rule</a>.
91 Even the contents of these blocks are specific to the given <a>at-rule</a>:
92 sometimes they contain a sequence of declarations, like a <a>qualified rule</a>;
93 other times, they may contain additional blocks, or at-rules, or other structures altogether.
95 <div class='example'>
97 Here are several examples of <a>at-rules</a> that illustrate the varied syntax they may contain.
99 <pre>@import "my-styles.css";</pre>
101 The ''@import'' <a>at-rule</a> is a simple statement.
102 After its name, it takes a single string or ''url()'' function to indicate the stylesheet that it should import.
104 <pre>
105 @page :left {
106 margin-left: 4cm;
107 margin-right: 3cm;
108 }
109 </pre>
111 The ''@page'' <a>at-rule</a> consists of an optional page selector (the '':left'' pseudo-class),
112 followed by a block of properties that apply to the page when printed.
113 In this way, it's very similar to a normal style rule,
114 except that its properties don't apply to any "element",
115 but rather the page itself.
117 <pre>
118 @media print {
119 body { font-size: 10pt }
120 }
121 </pre>
123 The ''@media'' <a>at-rule</a> begins with a media type
124 and a list of optional media queries.
125 Its block contains entire rules,
126 which are only applied when the conditions of the ''@media'' rule are fulfilled.
127 </div>
129 Property names and <a>at-rule</a> names are always <a>identifiers</a>,
130 which have to start with a letter or a hyphen followed by a letter,
131 and then can contain letters, numbers, hyphens, or underscores.
132 You can include any <a>code point</a> at all,
133 even ones that CSS uses in its syntax,
134 by <a>escaping</a> it.
136 The syntax of selectors is defined in the <a href="http://www.w3.org/TR/selectors/">Selectors spec</a>.
137 Similarly, the syntax of the wide variety of CSS values is defined in the <a href="http://www.w3.org/TR/css3-values/">Values &amp; Units spec</a>.
138 The special syntaxes of individual <a>at-rules</a> can be found in the specs that define them.
140 <h3 id="escaping">
141 Escaping</h3>
143 <em>This section is not normative.</em>
145 Any Unicode <a>code point</a> can be included in an <a>identifier</a> or quoted string
146 by <dfn id="escape-codepoint">escaping</dfn> it.
147 CSS escape sequences start with a backslash (\), and continue with:
149 <ul>
150 <li>
151 Any Unicode <a>code point</a> that is not a <a>hex digit</a> or a <a>newline</a>.
152 The escape sequence is replaced by that <a>code point</a>.
153 <li>
154 Or one to six <a>hex digits</a>, followed by an optional <a>whitespace</a>.
155 The escape sequence is replaced by the Unicode <a>code point</a>
156 whose value is given by the hexadecimal digits.
157 This optional whitespace allows hexadecimal escape sequences
158 to be followed by "real" hex digits.
160 <p class=example>
161 An <a>identifier</a> with the value "&amp;B"
162 could be written as ''\26 B'' or ''\000026B''.
164 <p class=note>
165 A "real" space after the escape sequence must be doubled.
166 </ul>
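<div class='example'>
The two escape forms above can be sketched in code.
The following is a minimal, non-normative Python illustration;
the function name <code>unescape</code> is hypothetical,
and two behaviors are assumptions that this section does not state:
escapes naming zero, a surrogate, or a value above U+10FFFF are replaced with U+FFFD,
and a backslash followed by a newline or by the end of the text is rejected with an error.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = set(" \t\n")


def unescape(text: str) -> str:
    """Resolve CSS escape sequences in the contents of an identifier or string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # step over the backslash
        if i >= len(text) or text[i] == "\n":
            # Assumption: this sketch simply rejects these cases.
            raise ValueError("backslash does not start a valid escape")
        if text[i] in HEX_DIGITS:
            start = i
            # One to six hex digits.
            while i < len(text) and i - start < 6 and text[i] in HEX_DIGITS:
                i += 1
            value = int(text[start:i], 16)
            # A single optional whitespace code point terminates the escape.
            if i < len(text) and text[i] in WHITESPACE:
                i += 1
            # Assumption: out-of-range values become U+FFFD.
            if value == 0 or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
                out.append("\ufffd")
            else:
                out.append(chr(value))
        else:
            # Any other code point stands for itself.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this sketch, both <code>unescape(r"\26 B")</code> and <code>unescape(r"\000026B")</code> give the two-character text "&amp;B".
</div>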
168 <h3 id="error-handling">
169 Error Handling</h3>
171 <em>This section is not normative.</em>
173 When errors occur in CSS,
174 the parser attempts to recover gracefully,
175 throwing away only the minimum amount of content
176 before returning to parsing as normal.
177 This is because errors aren't always mistakes--
178 new syntax looks like an error to an old parser,
179 and it's useful to be able to add new syntax to the language
180 without worrying about stylesheets that include it being completely broken in older UAs.
182 The precise error-recovery behavior is detailed in the parser itself,
183 but it's simple enough that a short description is fairly accurate.
185 <ul>
186 <li>
187 At the "top level" of a stylesheet,
188 an <<at-keyword-token>> starts an at-rule.
189 Anything else starts a qualified rule,
190 and is included in the rule's prelude.
191 This may produce an invalid selector,
192 but that's not the concern of the CSS parser--
193 at worst, it means the selector will match nothing.
195 <li>
196 Once an at-rule starts,
197 nothing is invalid from the parser's standpoint;
198 it's all part of the at-rule's prelude.
199 Encountering a <<semicolon-token>> ends the at-rule immediately,
200 while encountering an opening curly-brace <a href="#tokendef-open-curly">&lt;{-token></a> starts the at-rule's body.
201 The at-rule seeks forward, matching blocks (content surrounded by (), {}, or [])
202 until it finds a closing curly-brace <a href="#tokendef-close-curly">&lt;}-token></a> that isn't matched by anything else
203 or inside of another block.
204 The contents of the at-rule are then interpreted according to the at-rule's own grammar.
206 <li>
207 Qualified rules work similarly,
208 except that semicolons don't end them;
209 instead, they are just taken in as part of the rule's prelude.
210 When the first {} block is found,
211 the contents are always interpreted as a list of declarations.
213 <li>
214 When interpreting a list of declarations,
215 unknown syntax at any point causes the parser to throw away whatever declaration it's currently building,
216 and seek forward until it finds a semicolon (or the end of the block).
217 It then starts fresh, trying to parse a declaration again.
219 <li>
220 If the stylesheet ends while any rule, declaration, function, string, etc. are still open,
221 everything is automatically closed.
222 This doesn't make them invalid,
223 though they may be incomplete
224 and thus thrown away when they are verified against their grammar.
225 </ul>
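<div class='example'>
The declaration-list recovery described above can be approximated in a few lines.
This Python sketch is a deliberately simplified illustration, not the parser this specification defines:
it works on raw text rather than tokens,
does not handle comments,
accepts only ASCII property names without escapes,
and the names <code>split_top_level</code> and <code>parse_declarations</code> are hypothetical.

```python
import re

# Simplified identifier shape: ASCII only, no escapes (an assumption of this sketch).
IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*\Z")


def split_top_level(text, sep=";"):
    """Split on separators that are not inside a string or a (), [], {} block."""
    parts, depth, quote, start = [], 0, None, 0
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped code point
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts


def parse_declarations(block_text):
    """Keep well-formed 'name: value' chunks and drop everything else."""
    declarations = []
    for chunk in split_top_level(block_text):
        name, colon, value = chunk.partition(":")
        name, value = name.strip(), value.strip()
        if colon and IDENT.match(name) and value:
            declarations.append((name, value))
        # Otherwise the chunk is thrown away and parsing resumes after the semicolon.
    return declarations
```

For the input <code>color: blue; !!oops; top: 1px</code>,
the sketch keeps the first and last declarations and discards the malformed one in between.
</div>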
227 After each construct (declaration, style rule, at-rule) is parsed,
228 the user agent checks it against its expected grammar.
229 If it does not match the grammar,
230 it's <dfn export for=css>invalid</dfn>,
231 and gets <dfn export for=css>ignored</dfn> by the UA,
232 which treats it as if it wasn't there at all.
244 <h2 id="tokenizing-and-parsing">
245 Tokenizing and Parsing CSS</h2>
247 User agents must use the parsing rules described in this specification
248 to generate the CSSOM trees from text/css resources.
249 Together, these rules define what is referred to as the CSS parser.
251 This specification defines the parsing rules for CSS documents,
252 whether they are syntactically correct or not.
253 Certain points in the parsing algorithm are said to be <dfn lt="parse error">parse errors</dfn>.
254 The error handling for parse errors is well-defined:
255 user agents must either act as described below when encountering such problems,
256 or must abort processing at the first error that they encounter for which they do not wish to apply the rules described below.
258 Conformance checkers must report at least one parse error condition to the user
259 if one or more parse error conditions exist in the document
260 and must not report parse error conditions
261 if none exist in the document.
262 Conformance checkers may report more than one parse error condition if more than one parse error condition exists in the document.
263 Conformance checkers are not required to recover from parse errors,
264 but if they do,
265 they must recover in the same way as user agents.
267 <h3 id="parsing-overview">
268 Overview of the Parsing Model</h3>
270 The input to the CSS parsing process consists of a stream of Unicode <a>code points</a>,
271 which is passed through a tokenization stage followed by a tree construction stage.
272 The output is a CSSStyleSheet object.
274 Note: Implementations that do not support scripting do not have to actually create a CSSOM CSSStyleSheet object,
275 but the CSSOM tree in such cases is still used as the model for the rest of the specification.
277 <h3 id="input-byte-stream">
278 The input byte stream</h3>
280 When parsing a stylesheet,
281 the stream of Unicode <a>code points</a> that comprises the input to the tokenization stage
282 might be initially seen by the user agent as a stream of bytes
283 (typically coming over the network or from the local file system).
284 If so, the user agent must decode these bytes into <a>code points</a> according to a particular character encoding.
286 To decode the stream of bytes into a stream of <a>code points</a>,
287 UAs must use the <dfn><a href="http://encoding.spec.whatwg.org/#decode">decode</a></dfn> algorithm
288 defined in [[!ENCODING]],
289 with the fallback encoding determined as follows.
291 Note: The <a>decode</a> algorithm
292 gives precedence to a byte order mark (BOM),
293 and only uses the fallback when none is found.
295 To <dfn>determine the fallback encoding</dfn>:
297 <ol>
298 <li>
299 If HTTP or equivalent protocol defines an encoding (e.g. via the charset parameter of the Content-Type header),
300 <dfn export><a href="http://encoding.spec.whatwg.org/#concept-encoding-get">get an encoding</a></dfn> [[!ENCODING]]
301 for the specified value.
302 If that does not return failure,
303 use the return value as the fallback encoding.
305 <li>
306 Otherwise, check the byte stream.
307 If the first 1024 bytes of the stream begin with the hex sequence
309 <pre>40 63 68 61 72 73 65 74 20 22 XX* 22 3B</pre>
311 where each <code>XX</code> byte is a value between 0<sub>16</sub> and 21<sub>16</sub> inclusive
312 or a value between 23<sub>16</sub> and 7F<sub>16</sub> inclusive,
313 then <a>get an encoding</a>
314 for the sequence of <code>XX</code> bytes,
315 interpreted as <code>ASCII</code>.
317 <details class='why'>
318 <summary>What does that byte sequence mean?</summary>
320 The byte sequence above,
321 when decoded as ASCII,
322 is the string "<code>@charset "…";</code>",
323 where the "…" is the sequence of bytes corresponding to the encoding's label.
324 </details>
326 If the return value was <code>utf-16be</code> or <code>utf-16le</code>,
327 use <code>utf-8</code> as the fallback encoding;
328 if it was anything else except failure,
329 use the return value as the fallback encoding.
331 <details class='why'>
332 <summary>Why use utf-8 when the declaration says utf-16?</summary>
334 The bytes of the encoding declaration spell out “<code>@charset "…";</code>” in ASCII,
335 but UTF-16 is not ASCII-compatible.
336 Either you've typed in complete gibberish (like <code>䁣桡牳整•utf-16be∻</code>) to get the right bytes in the document,
337 which we don't want to encourage,
338 or your document is actually in an ASCII-compatible encoding
339 and your encoding declaration is lying.
341 Either way, defaulting to UTF-8 is a decent answer.
343 As well, this mimics the behavior of HTML's <code>&lt;meta charset></code> attribute.
344 </details>
346 Note: The syntax of an encoding declaration <em>looks like</em> the syntax of an <a>at-rule</a> named ''@charset'',
347 but no such rule actually exists,
348 and the rules for how you can write it are much more restrictive than they would normally be for recognizing such a rule.
349 A number of things you can do in CSS that would produce a valid <a>@charset</a> rule (if one existed),
350 such as using multiple spaces, comments, or single quotes,
351 will cause the encoding declaration to not be recognized.
352 This behavior keeps the encoding declaration as simple as possible,
353 and thus maximizes the likelihood of it being implemented correctly.
355 <li>
356 Otherwise, if an <a>environment encoding</a> is provided by the referring document,
357 use that as the fallback encoding.
359 <li>
360 Otherwise, use <code>utf-8</code> as the fallback encoding.
361 </ol>
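<div class='example'>
The steps above can be sketched as follows.
This is a non-normative Python illustration under stated assumptions:
<code>get_an_encoding</code> here is a stand-in that knows only a handful of labels
(the real algorithm is defined in [[!ENCODING]]),
and the function and parameter names are hypothetical.

```python
import re

# Hypothetical, tiny subset of the label table from the Encoding Standard.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-16be": "utf-16be",
    "utf-16le": "utf-16le",
    "utf-16": "utf-16le",
}

# 40 63 68 61 72 73 65 74 20 22 XX* 22 3B, where XX is any ASCII byte except 0x22.
CHARSET_RE = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def get_an_encoding(label):
    """Stand-in for 'get an encoding'; returns None to signal failure."""
    return LABELS.get(label.strip(" \t\n\f\r").lower())


def determine_fallback_encoding(data, http_charset=None, environment_encoding=None):
    # Step 1: an encoding declared by HTTP or an equivalent protocol.
    if http_charset is not None:
        encoding = get_an_encoding(http_charset)
        if encoding is not None:
            return encoding
    # Step 2: an encoding declaration at the very start of the byte stream.
    match = CHARSET_RE.match(data[:1024])
    if match:
        encoding = get_an_encoding(match.group(1).decode("ascii"))
        if encoding in ("utf-16be", "utf-16le"):
            return "utf-8"
        if encoding is not None:
            return encoding
    # Step 3: the environment encoding, if the referring document provides one.
    if environment_encoding is not None:
        return environment_encoding
    # Step 4: the default.
    return "utf-8"
```

Because the pattern is anchored at the first byte and requires double quotes and a single space,
a declaration written as <code>@charset 'utf-8';</code> is not recognized,
and the sketch falls through to the later steps.
</div>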
363 <div class='note'>
365 Though UTF-8 is the default encoding for the web,
366 and many newer web-based file formats assume or require UTF-8 encoding,
367 CSS was created before it was clear which encoding would win,
368 and thus can't automatically assume the stylesheet is UTF-8.
370 Stylesheet authors <em>should</em> author their stylesheets in UTF-8,
371 and ensure that either an HTTP header (or equivalent method) declares the encoding of the stylesheet to be UTF-8,
372 or that the referring document declares its encoding to be UTF-8.
373 (In HTML, this is done by adding a <code>&lt;meta charset=utf-8></code> element to the head of the document.)
375 If neither of these options is available,
376 authors should begin the stylesheet with a UTF-8 BOM
377 or the exact characters
379 <pre>@charset "utf-8";</pre>
380 </div>
382 Document languages that refer to CSS stylesheets that are decoded from bytes
383 may define an <dfn export>environment encoding</dfn> for each such stylesheet,
384 which is used as a fallback when other encoding hints are not available or cannot be used.
386 The concept of <a>environment encoding</a> only exists for compatibility with legacy content.
387 New formats and new linking mechanisms <b>should not</b> provide an <a>environment encoding</a>,
388 so the stylesheet defaults to UTF-8 instead in the absence of more explicit information.
390 Note: [[HTML]] defines <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-stylesheet">the environment encoding for <code>&lt;link rel=stylesheet></code></a>.
392 Note: [[CSSOM]] defines <a href="https://drafts.csswg.org/cssom/#requirements-on-user-agents-implementing-the-xml-stylesheet-processing-instruction">the environment encoding for <code>&lt;?xml-stylesheet?></code></a>.
394 Note: [[CSS3CASCADE]] defines <a href="https://drafts.csswg.org/css-cascade/#at-ruledef-import">the environment encoding for <code>@import</code></a>.
397 <h3 id="input-preprocessing">
398 Preprocessing the input stream</h3>
400 The input stream consists of the <a>code points</a>
401 pushed into it as the input byte stream is decoded.
403 Before sending the input stream to the tokenizer,
404 implementations must make the following <a>code point</a> substitutions:
406 <ul>
407 <li>
408 Replace any U+000D CARRIAGE RETURN (CR) <a>code points</a>,
409 U+000C FORM FEED (FF) <a>code points</a>,
410 or pairs of U+000D CARRIAGE RETURN (CR) followed by U+000A LINE FEED (LF),
411 by a single U+000A LINE FEED (LF) <a>code point</a>.
413 <li>
414 Replace any U+0000 NULL <a>code point</a> with U+FFFD REPLACEMENT CHARACTER (�).
415 </ul>
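<div class='example'>
As a non-normative illustration, the two substitutions above can be written as a small Python function.
The name <code>preprocess</code> is hypothetical,
and the sketch operates on an already-decoded string rather than a stream.

```python
def preprocess(text: str) -> str:
    """Apply the code point substitutions required before tokenization."""
    # Handle CR LF pairs first so that each pair yields a single LF.
    text = text.replace("\r\n", "\n")
    # Remaining lone CR and FF code points also become LF.
    text = text.replace("\r", "\n").replace("\f", "\n")
    # NULL becomes U+FFFD REPLACEMENT CHARACTER.
    return text.replace("\x00", "\ufffd")
```

Replacing CR LF pairs before lone CR code points matters:
doing it the other way around would turn each pair into two line feeds.
</div>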
418 <h2 id="tokenization">
419 Tokenization</h2>
421 Implementations must act as if they used the following algorithms to tokenize CSS.
422 To transform a stream of <a>code points</a> into a stream of tokens,
423 repeatedly <a>consume a token</a>
424 until an <<EOF-token>> is reached,
425 collecting the returned tokens into a stream.
426 Each call to the <a>consume a token</a> algorithm
427 returns a single token,
428 so it can also be used "on-demand" to tokenize a stream of <a>code points</a> <em>during</em> parsing,
429 if so desired.
431 The output of the tokenization step is a stream of zero or more of the following tokens:
432 <dfn>&lt;ident-token></dfn>,
433 <dfn>&lt;function-token></dfn>,
434 <dfn>&lt;at-keyword-token></dfn>,
435 <dfn>&lt;hash-token></dfn>,
436 <dfn>&lt;string-token></dfn>,
437 <dfn>&lt;bad-string-token></dfn>,
438 <dfn>&lt;url-token></dfn>,
439 <dfn>&lt;bad-url-token></dfn>,
440 <dfn>&lt;delim-token></dfn>,
441 <dfn>&lt;number-token></dfn>,
442 <dfn>&lt;percentage-token></dfn>,
443 <dfn>&lt;dimension-token></dfn>,
444 <dfn>&lt;include-match-token></dfn>,
445 <dfn>&lt;dash-match-token></dfn>,
446 <dfn>&lt;prefix-match-token></dfn>,
447 <dfn>&lt;suffix-match-token></dfn>,
448 <dfn>&lt;substring-match-token></dfn>,
449 <dfn>&lt;column-token></dfn>,
450 <dfn>&lt;whitespace-token></dfn>,
451 <dfn>&lt;CDO-token></dfn>,
452 <dfn>&lt;CDC-token></dfn>,
453 <dfn>&lt;colon-token></dfn>,
454 <dfn>&lt;semicolon-token></dfn>,
455 <dfn>&lt;comma-token></dfn>,
456 <dfn id="tokendef-open-square">&lt;[-token></dfn>,
457 <dfn id="tokendef-close-square">&lt;]-token></dfn>,
458 <dfn id="tokendef-open-paren">&lt;(-token></dfn>,
459 <dfn id="tokendef-close-paren">&lt;)-token></dfn>,
460 <dfn id="tokendef-open-curly">&lt;{-token></dfn>,
461 and <dfn id="tokendef-close-curly">&lt;}-token></dfn>.
463 <ul>
464 <li>
465 <<ident-token>>, <<function-token>>, <<at-keyword-token>>, <<hash-token>>, <<string-token>>, and <<url-token>> have a value composed of zero or more <a>code points</a>.
466 Additionally, hash tokens have a type flag set to either "id" or "unrestricted". The type flag defaults to "unrestricted" if not otherwise set.
468 <li>
469 <<delim-token>> has a value composed of a single <a>code point</a>.
471 <li>
472 <<number-token>>, <<percentage-token>>, and <<dimension-token>> have a representation composed of one or more <a>code points</a>, and a numeric value.
473 <<number-token>> and <<dimension-token>> additionally have a type flag set to either "integer" or "number". The type flag defaults to "integer" if not otherwise set.
474 <<dimension-token>> additionally has a unit composed of one or more <a>code points</a>.
475 </ul>
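<div class='example'>
One possible in-memory shape for the tokens that carry extra data is sketched below.
This is a non-normative Python illustration;
the class and field names are hypothetical,
and only three of the token types are shown.

```python
from dataclasses import dataclass


@dataclass
class HashToken:
    value: str
    # Either "id" or "unrestricted"; defaults to "unrestricted".
    type_flag: str = "unrestricted"


@dataclass
class NumberToken:
    representation: str
    value: float
    # Either "integer" or "number"; defaults to "integer".
    type_flag: str = "integer"


@dataclass
class DimensionToken:
    representation: str
    value: float
    unit: str
    # Either "integer" or "number"; defaults to "integer".
    type_flag: str = "integer"
```

The defaults mirror the text above:
a hash token is "unrestricted" and a numeric token is an "integer" unless the tokenizer sets the flag otherwise.
</div>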
477 Note: The type flag of hash tokens is used in the Selectors syntax [[SELECT]].
478 Only hash tokens with the "id" type are valid <a href="http://www.w3.org/TR/selectors/#id-selectors">ID selectors</a>.
480 Note: As a technical note,
481 the tokenizer defined here requires only three <a>code points</a> of look-ahead.
482 The tokens it produces are designed to allow Selectors to be parsed with one token of look-ahead,
483 and additional tokens may be added in the future to maintain this invariant.
496 <h3 id='token-diagrams'>
497 Token Railroad Diagrams</h3>
499 <em>This section is non-normative.</em>
501 This section presents an informative view of the tokenizer,
502 in the form of railroad diagrams.
503 Railroad diagrams are more compact than an explicit parser,
504 but often easier to read than a regular expression.
506 These diagrams are <em>informative</em> and <em>incomplete</em>;
507 they describe the grammar of "correct" tokens,
508 but do not describe error-handling at all.
509 They are provided solely to make it easier to get an intuitive grasp of the syntax of each token.
511 Diagrams with names such as <em>&lt;foo-token></em> represent tokens.
512 The rest are productions referred to by other diagrams.
514 <dl>
515 <dt id="comment-diagram">comment
516 <dd>
517 <pre class='railroad'>
518 T: /*
519 Star:
520 N: anything but * followed by /
521 T: */
522 </pre>
524 <dt id="newline-diagram">newline
525 <dd>
526 <pre class='railroad'>
527 Choice:
528 T: \n
529 T: \r\n
530 T: \r
531 T: \f
532 </pre>
534 <dt id="whitespace-diagram">whitespace
535 <dd>
536 <pre class='railroad'>
537 Choice:
538 T: space
539 T: \t
540 N: newline
541 </pre>
543 <dt id="hex-digit-diagram">hex digit
544 <dd>
545 <pre class='railroad'>
546 N: 0-9 a-f or A-F
547 </pre>
549 <dt id="escape-diagram">escape
550 <dd>
551 <pre class='railroad'>
552 T: \
553 Choice:
554 N: not newline or hex digit
555 Seq:
556 Plus:
557 N: hex digit
558 C: 1-6 times
559 Opt: skip
560 N: whitespace
561 </pre>
563 <dt id="whitespace-token-diagram"><<whitespace-token>>
564 <dd>
565 <pre class='railroad'>
566 Plus:
567 N: whitespace
568 </pre>
570 <dt id="ws*-diagram">ws*
571 <dd>
572 <pre class='railroad'>
573 Star:
574 N: &lt;whitespace-token>
575 </pre>
577 <dt id="ident-token-diagram"><<ident-token>>
578 <dd>
579 <pre class='railroad'>
580 Or: 1
581 T: --
582 Seq:
583 Opt: skip
584 T: -
585 Or:
586 N: a-z A-Z _ or non-ASCII
587 N: escape
588 Star:
589 Or:
590 N: a-z A-Z 0-9 _ - or non-ASCII
591 N: escape
592 </pre>
594 <dt id="function-token-diagram"><<function-token>>
595 <dd>
596 <pre class='railroad'>
597 N: &lt;ident-token>
598 T: (
599 </pre>
601 <dt id="at-keyword-token-diagram"><<at-keyword-token>>
602 <dd>
603 <pre class='railroad'>
604 T: @
605 N: &lt;ident-token>
606 </pre>
608 <dt id="hash-token-diagram"><<hash-token>>
609 <dd>
610 <pre class='railroad'>
611 T: #
612 Plus:
613 Choice:
614 N: a-z A-Z 0-9 _ - or non-ASCII
615 N: escape
616 </pre>
618 <dt id="string-token-diagram"><<string-token>>
619 <dd>
620 <pre class='railroad'>
621 Choice:
622 Seq:
623 T: "
624 Star:
625 Choice:
626 N: not " \ or newline
627 N: escape
628 Seq:
629 T: \
630 N: newline
631 T: "
632 Seq:
633 T: '
634 Star:
635 Choice:
636 N: not ' \ or newline
637 N: escape
638 Seq:
639 T: \
640 N: newline
641 T: '
642 </pre>
644 <dt id="url-token-diagram"><<url-token>>
645 <dd>
646 <pre class='railroad'>
647 N: &lt;ident-token "url">
648 T: (
649 N: ws*
650 Star:
651 Choice:
652 N: not " ' ( ) \ ws or non-printable
653 N: escape
654 N: ws*
655 T: )
656 </pre>
658 <dt id="number-token-diagram"><<number-token>>
659 <dd>
660 <pre class='railroad'>
661 Choice: 1
662 T: +
663 Skip:
664 T: -
665 Choice:
666 Seq:
667 Plus:
668 N: digit
669 T: .
670 Plus:
671 N: digit
672 Plus:
673 N: digit
674 Seq:
675 T: .
676 Plus:
677 N: digit
678 Opt: skip
679 Seq:
680 Choice:
681 T: e
682 T: E
683 Choice: 1
684 T: +
685 S:
686 T: -
687 Plus:
688 N: digit
689 </pre>
691 <dt id="dimension-token-diagram"><<dimension-token>>
692 <dd>
693 <pre class='railroad'>
694 N: &lt;number-token>
695 N: &lt;ident-token>
696 </pre>
698 <dt id="percentage-token-diagram"><<percentage-token>>
699 <dd>
700 <pre class='railroad'>
701 N: &lt;number-token>
702 T: %
703 </pre>
705 <dt id="include-match-token-diagram"><<include-match-token>>
706 <dd>
707 <pre class='railroad'>
708 T: ~=
709 </pre>
711 <dt id="dash-match-token-diagram"><<dash-match-token>>
712 <dd>
713 <pre class='railroad'>
714 T: |=
715 </pre>
717 <dt id="prefix-match-token-diagram"><<prefix-match-token>>
718 <dd>
719 <pre class='railroad'>
720 T: ^=
721 </pre>
723 <dt id="suffix-match-token-diagram"><<suffix-match-token>>
724 <dd>
725 <pre class='railroad'>
726 T: $=
727 </pre>
729 <dt id="substring-match-token-diagram"><<substring-match-token>>
730 <dd>
731 <pre class='railroad'>
732 T: *=
733 </pre>
735 <dt id="column-token-diagram"><<column-token>>
736 <dd>
737 <pre class='railroad'>
738 T: ||
739 </pre>
741 <dt id="CDO-token-diagram"><<CDO-token>>
742 <dd>
743 <pre class='railroad'>
744 T: &lt;!--
745 </pre>
747 <dt id="CDC-token-diagram"><<CDC-token>>
748 <dd>
749 <pre class='railroad'>
750 T: -->
751 </pre>
752 </dl>
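<div class='example'>
For readers who prefer regular expressions,
the <<number-token>> diagram above corresponds roughly to the pattern in this non-normative Python sketch.
The names <code>NUMBER_TOKEN</code> and <code>looks_like_number</code> are hypothetical.
Like the diagrams, the pattern only describes the shape of a "correct" token;
it says nothing about where the tokenizer stops when more input follows.

```python
import re

NUMBER_TOKEN = re.compile(
    r"[+-]?"                                   # optional sign
    r"(?:[0-9]+\.[0-9]+|[0-9]+|\.[0-9]+)"      # 1.5, 15, or .5
    r"(?:[eE][+-]?[0-9]+)?"                    # optional exponent
)


def looks_like_number(text: str) -> bool:
    """Return True if the whole text has the shape of a number token."""
    return NUMBER_TOKEN.fullmatch(text) is not None
```

Under this pattern, <code>12</code>, <code>+4.5</code>, <code>-.5</code>, and <code>1.5E-10</code> all match,
while <code>1.</code> and a lone <code>+</code> do not.
</div>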
764 <h3 id="tokenizer-definitions">
765 Definitions</h3>
767 This section defines several terms used during the tokenization phase.
769 <dl export>
770 <dt><dfn>code point</dfn>
771 <dd>
772 A <a href="http://unicode.org/glossary/#code_point">Unicode code point</a>. [[!UNICODE]]
773 Any value in the Unicode codespace; that is, the range of integers from 0 to (hexadecimal) 10FFFF.
775 <dt><dfn>next input code point</dfn>
776 <dd>
777 The first <a>code point</a> in the input stream that has not yet been consumed.
779 <dt><dfn>current input code point</dfn>
780 <dd>
781 The last <a>code point</a> to have been consumed.
783 <dt><dfn>reconsume the current input code point</dfn>
784 <dd>
785 Push the <a>current input code point</a> back onto the front of the input stream,
786 so that the next time you are instructed to consume the <a>next input code point</a>,
787 it will instead reconsume the <a>current input code point</a>.
789 <dt><dfn>EOF code point</dfn>
790 <dd>
791 A conceptual <a>code point</a> representing the end of the input stream.
792 Whenever the input stream is empty,
793 the <a>next input code point</a> is always an EOF code point.
795 <dt><dfn export>digit</dfn>
796 <dd>
797 A <a>code point</a> between U+0030 DIGIT ZERO (0) and U+0039 DIGIT NINE (9).
799 <dt><dfn export>hex digit</dfn>
800 <dd>
801 A <a>digit</a>,
802 or a <a>code point</a> between U+0041 LATIN CAPITAL LETTER A (A) and U+0046 LATIN CAPITAL LETTER F (F),
803 or a <a>code point</a> between U+0061 LATIN SMALL LETTER A (a) and U+0066 LATIN SMALL LETTER F (f).
805 <dt><dfn export>uppercase letter</dfn>
806 <dd>
807 A <a>code point</a> between U+0041 LATIN CAPITAL LETTER A (A) and U+005A LATIN CAPITAL LETTER Z (Z).
809 <dt><dfn export>lowercase letter</dfn>
810 <dd>
811 A <a>code point</a> between U+0061 LATIN SMALL LETTER A (a) and U+007A LATIN SMALL LETTER Z (z).
813 <dt><dfn export>letter</dfn>
814 <dd>
815 An <a>uppercase letter</a>
816 or a <a>lowercase letter</a>.
818 <dt><dfn export>non-ASCII code point</dfn>
819 <dd>
820 A <a>code point</a> with a value equal to or greater than U+0080 &lt;control>.
822 <dt><dfn export>name-start code point</dfn>
823 <dd>
824 A <a>letter</a>,
825 a <a>non-ASCII code point</a>,
826 or U+005F LOW LINE (_).
828 <dt><dfn export>name code point</dfn>
829 <dd>
830 A <a>name-start code point</a>,
831 a <a>digit</a>,
832 or U+002D HYPHEN-MINUS (-).
834 <dt><dfn export>non-printable code point</dfn>
835 <dd>
836 A <a>code point</a> between U+0000 NULL and U+0008 BACKSPACE,
837 or U+000B LINE TABULATION,
838 or a <a>code point</a> between U+000E SHIFT OUT and U+001F INFORMATION SEPARATOR ONE,
839 or U+007F DELETE.
841 <dt><dfn export>newline</dfn>
842 <dd>
843 U+000A LINE FEED.
844 <span class='note'>
845 Note that U+000D CARRIAGE RETURN and U+000C FORM FEED are not included in this definition,
846 as they are converted to U+000A LINE FEED during <a href="#input-preprocessing">preprocessing</a>.
847 </span>
849 <dt><dfn export>whitespace</dfn>
850 <dd>A <a>newline</a>, U+0009 CHARACTER TABULATION, or U+0020 SPACE.
852 <dt><dfn export>surrogate code point</dfn>
853 <dd>
854 A <a>code point</a> between U+D800 and U+DFFF inclusive.
856 <dt><dfn export>maximum allowed code point</dfn>
857 <dd>The greatest <a>code point</a> defined by Unicode: U+10FFFF.
859 <dt><dfn export>identifier</dfn>
860 <dd>
861 A portion of the CSS source that has the same syntax as an <<ident-token>>.
862 Also appears in <<at-keyword-token>>,
863 <<function-token>>,
864 <<hash-token>> with the "id" type flag,
865 and the unit of <<dimension-token>>.
867 </dl>
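<div class='example'>
Several of the code point classes defined above translate directly into predicates.
The following non-normative Python sketch assumes each argument is a single-character string;
the function names are hypothetical.

```python
def is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def is_hex_digit(c: str) -> bool:
    return is_digit(c) or "A" <= c <= "F" or "a" <= c <= "f"


def is_letter(c: str) -> bool:
    return "A" <= c <= "Z" or "a" <= c <= "z"


def is_non_ascii(c: str) -> bool:
    return ord(c) >= 0x80


def is_name_start(c: str) -> bool:
    return is_letter(c) or is_non_ascii(c) or c == "_"


def is_name(c: str) -> bool:
    return is_name_start(c) or is_digit(c) or c == "-"


def is_non_printable(c: str) -> bool:
    cp = ord(c)
    return cp <= 0x08 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F


def is_newline(c: str) -> bool:
    # CR and FF never reach the tokenizer; preprocessing converts them to LF.
    return c == "\n"


def is_whitespace(c: str) -> bool:
    return is_newline(c) or c == "\t" or c == " "
```

Note that a hyphen is a name code point but not a name-start code point,
and that a carriage return is not whitespace at this stage.
</div>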
879 <h3 id="tokenizer-algorithms">
880 Tokenizer Algorithms</h3>
882 The algorithms defined in this section transform a stream of <a>code points</a> into a stream of tokens.
884 <h4 id="consume-token">
885 Consume a token</h4>
887 This section describes how to <dfn>consume a token</dfn> from a stream of <a>code points</a>.
888 It will return a single token of any type.
890 <a>Consume comments</a>.
892 Consume the <a>next input code point</a>.
894 <dl>
895 <dt><a>whitespace</a>
896 <dd>
897 Consume as much <a>whitespace</a> as possible.
898 Return a <<whitespace-token>>.
900 <dt>U+0022 QUOTATION MARK (")
901 <dd>
902 <a>Consume a string token</a>
903 and return it.
905 <dt>U+0023 NUMBER SIGN (#)
906 <dd>
907 If the <a>next input code point</a> is a <a>name code point</a>
908 or the <a lt="next input code point">next two input code points</a>
909 <a>are a valid escape</a>,
910 then:
912 <ol>
913 <li>
914 Create a <<hash-token>>.
916 <li>
917 If the <a lt="next input code point">next 3 input code points</a> <a>would start an identifier</a>,
918 set the <<hash-token>>’s type flag to "id".
920 <li>
921 <a>Consume a name</a>,
922 and set the <<hash-token>>’s value to the returned string.
924 <li>
925 Return the <<hash-token>>.
926 </ol>
928 Otherwise,
929 return a <<delim-token>>
930 with its value set to the <a>current input code point</a>.
932 <dt>U+0024 DOLLAR SIGN ($)
933 <dd>
934 If the <a>next input code point</a> is
935 U+003D EQUALS SIGN (=),
936 consume it
937 and return a <<suffix-match-token>>.
939 Otherwise,
940 return a <<delim-token>>
941 with its value set to the <a>current input code point</a>.
943 <dt>U+0027 APOSTROPHE (')
944 <dd>
945 <a>Consume a string token</a>
946 and return it.
948 <dt>U+0028 LEFT PARENTHESIS (()
949 <dd>
950 Return a <a href="#tokendef-open-paren"><(-token></a>.
952 <dt>U+0029 RIGHT PARENTHESIS ())
953 <dd>
954 Return a <a href="#tokendef-close-paren"><)-token></a>.
956 <dt>U+002A ASTERISK (*)
957 <dd>
958 If the <a>next input code point</a> is
959 U+003D EQUALS SIGN (=),
960 consume it
961 and return a <<substring-match-token>>.
963 Otherwise,
964 return a <<delim-token>>
965 with its value set to the <a>current input code point</a>.
967 <dt>U+002B PLUS SIGN (+)
968 <dd>
969 If the input stream <a>starts with a number</a>,
970 <a>reconsume the current input code point</a>,
971 <a>consume a numeric token</a>,
972 and return it.
974 Otherwise,
975 return a <<delim-token>>
976 with its value set to the <a>current input code point</a>.
978 <dt>U+002C COMMA (,)
979 <dd>
980 Return a <<comma-token>>.
982 <dt>U+002D HYPHEN-MINUS (-)
983 <dd>
984 If the input stream <a>starts with a number</a>,
985 <a>reconsume the current input code point</a>,
986 <a>consume a numeric token</a>,
987 and return it.
989 Otherwise,
990 if the <a lt="next input code point">next 2 input code points</a> are
991 U+002D HYPHEN-MINUS
992 U+003E GREATER-THAN SIGN
993 (->),
994 consume them
995 and return a <<CDC-token>>.
997 Otherwise,
998 if the input stream <a>starts with an identifier</a>,
999 <a>reconsume the current input code point</a>,
1000 <a>consume an ident-like token</a>,
1001 and return it.
1003 Otherwise,
1004 return a <<delim-token>>
1005 with its value set to the <a>current input code point</a>.
1007 <dt>U+002E FULL STOP (.)
1008 <dd>
1009 If the input stream <a>starts with a number</a>,
1010 <a>reconsume the current input code point</a>,
1011 <a>consume a numeric token</a>,
1012 and return it.
1014 Otherwise,
1015 return a <<delim-token>>
1016 with its value set to the <a>current input code point</a>.
1018 <dt>U+003A COLON (:)
1019 <dd>
1020 Return a <<colon-token>>.
1022 <dt>U+003B SEMICOLON (;)
1023 <dd>
1024 Return a <<semicolon-token>>.
1026 <dt>U+003C LESS-THAN SIGN (<)
1027 <dd>
1028 If the <a lt="next input code point">next 3 input code points</a> are
1029 U+0021 EXCLAMATION MARK
1030 U+002D HYPHEN-MINUS
1031 U+002D HYPHEN-MINUS
1032 (!--),
1033 consume them
1034 and return a <<CDO-token>>.
1036 Otherwise,
1037 return a <<delim-token>>
1038 with its value set to the <a>current input code point</a>.
1040 <dt>U+0040 COMMERCIAL AT (@)
1041 <dd>
1042 If the <a lt="next input code point">next 3 input code points</a>
1043 <a>would start an identifier</a>,
1044 <a>consume a name</a>,
1045 create an <<at-keyword-token>> with its value set to the returned value,
1046 and return it.
1048 Otherwise,
1049 return a <<delim-token>>
1050 with its value set to the <a>current input code point</a>.
1052 <dt>U+005B LEFT SQUARE BRACKET ([)
1053 <dd>
1054 Return a <a href="#tokendef-open-square"><[-token></a>.
1056 <dt>U+005C REVERSE SOLIDUS (\)
1057 <dd>
1058 If the input stream <a>starts with a valid escape</a>,
1059 <a>reconsume the current input code point</a>,
1060 <a>consume an ident-like token</a>,
1061 and return it.
1063 Otherwise,
1064 this is a <a>parse error</a>.
1065 Return a <<delim-token>>
1066 with its value set to the <a>current input code point</a>.
1068 <dt>U+005D RIGHT SQUARE BRACKET (])
1069 <dd>
1070 Return a <a href="#tokendef-close-square"><]-token></a>.
1072 <dt>U+005E CIRCUMFLEX ACCENT (^)
1073 <dd>
1074 If the <a>next input code point</a> is
1075 U+003D EQUALS SIGN (=),
1076 consume it
1077 and return a <<prefix-match-token>>.
1079 Otherwise,
1080 return a <<delim-token>>
1081 with its value set to the <a>current input code point</a>.
1083 <dt>U+007B LEFT CURLY BRACKET ({)
1084 <dd>
1085 Return a <a href="#tokendef-open-curly"><{-token></a>.
1087 <dt>U+007D RIGHT CURLY BRACKET (})
1088 <dd>
1089 Return a <a href="#tokendef-close-curly"><}-token></a>.
1091 <dt><a>digit</a>
1092 <dd>
1093 <a>Reconsume the current input code point</a>,
1094 <a>consume a numeric token</a>,
1095 and return it.
1097 <dt><a>name-start code point</a>
1098 <dd>
1099 <a>Reconsume the current input code point</a>,
1100 <a>consume an ident-like token</a>,
1101 and return it.
1103 <dt>U+007C VERTICAL LINE (|)
1104 <dd>
1105 If the <a>next input code point</a> is
1106 U+003D EQUALS SIGN (=),
1107 consume it
1108 and return a <<dash-match-token>>.
1110 Otherwise,
1111 if the <a>next input code point</a> is
1112 U+007C VERTICAL LINE (|),
1113 consume it
1114 and return a <<column-token>>.
1116 Otherwise,
1117 return a <<delim-token>>
1118 with its value set to the <a>current input code point</a>.
1120 <dt>U+007E TILDE (~)
1121 <dd>
1122 If the <a>next input code point</a> is
1123 U+003D EQUALS SIGN (=),
1124 consume it
1125 and return an <<include-match-token>>.
1127 Otherwise,
1128 return a <<delim-token>>
1129 with its value set to the <a>current input code point</a>.
1131 <dt>EOF
1132 <dd>
1133 Return an <<EOF-token>>.
1135 <dt>anything else
1136 <dd>
1137 Return a <<delim-token>>
1138 with its value set to the <a>current input code point</a>.
1139 </dl>
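The two-code-point operator entries in the table above share one shape, which this non-normative Python sketch illustrates. It covers only the entries of the form "X followed by =" plus "||", and falls back to a <<delim-token>> otherwise. The function name, the <code>(type, value)</code> tuple used for tokens, and the empty string standing in for EOF are assumptions of this example, not part of this specification.

```python
# Hypothetical helper: handles only the match-operator rows of the table.
MATCH_TOKENS = {
    "$": "suffix-match",
    "*": "substring-match",
    "^": "prefix-match",
    "|": "dash-match",
    "~": "include-match",
}


def consume_match_or_delim(text, pos):
    """Return (token, new_pos) for the code point at text[pos].

    A token is a (type, value) pair; value is None for tokens without one.
    """
    current = text[pos]
    # "" stands in for EOF when there is no next input code point.
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if current in MATCH_TOKENS and nxt == "=":
        return (MATCH_TOKENS[current], None), pos + 2
    if current == "|" and nxt == "|":
        return ("column", None), pos + 2
    # Otherwise: a delim token whose value is the current input code point.
    return ("delim", current), pos + 1
```

A full tokenizer would dispatch every row of the table; this sketch only shows the one-code-point lookahead that these rows rely on.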
1142 <h4 id="consume-comment">
1143 Consume comments</h4>
1145 This section describes how to <dfn>consume comments</dfn> from a stream of <a>code points</a>.
1146 It returns nothing.
1148 If the <a lt="next input code point">next two input code points</a> are
1149 U+002F SOLIDUS (/) followed by a U+002A ASTERISK (*),
1150 consume them
1151 and all following <a>code points</a> up to and including
1152 the first U+002A ASTERISK (*) followed by a U+002F SOLIDUS (/),
1153 or up to an EOF code point.
1154 Return to the start of this step.
1156 If the preceding paragraph ended by consuming an EOF code point,
1157 this is a <a>parse error</a>.
1159 Return nothing.
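As a non-normative illustration, the loop above can be sketched in Python as follows. The function name and the convention of returning the new position together with a parse-error flag are assumptions of this example. The specification's algorithm itself returns nothing.

```python
def consume_comments(text, pos):
    """Skip any run of /* ... */ comments starting at text[pos].

    Returns (new_pos, parse_error); parse_error is True when a comment
    was still open at the end of the input.
    """
    while text.startswith("/*", pos):
        # Search after the opening "/*" so that "/*/" is not seen as closed.
        end = text.find("*/", pos + 2)
        if end == -1:
            return len(text), True   # unterminated comment: consume to EOF
        pos = end + 2                # resume just past the closing "*/"
    return pos, False
```

Adjacent comments are consumed in a single call, which mirrors the "return to the start of this step" instruction.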
1162 <h4 id="consume-numeric-token">
1163 Consume a numeric token</h4>
1165 This section describes how to <dfn>consume a numeric token</dfn> from a stream of <a>code points</a>.
1166 It returns either a <<number-token>>, <<percentage-token>>, or <<dimension-token>>.
1168 <a>Consume a number</a>.
1170 If the <a lt="next input code point">next 3 input code points</a> <a>would start an identifier</a>,
1171 then:
1173 <ol>
1174 <li>Create a <<dimension-token>> with the same representation, value, and type flag as the returned number,
1175 and a unit set initially to the empty string.
1177 <li><a>Consume a name</a>.
1178 Set the <<dimension-token>>’s unit to the returned value.
1180 <li>Return the <<dimension-token>>.
1181 </ol>
1183 Otherwise,
1184 if the <a>next input code point</a> is U+0025 PERCENTAGE SIGN (%),
1185 consume it.
1186 Create a <<percentage-token>> with the same representation and value as the returned number,
1187 and return it.
1189 Otherwise,
1190 create a <<number-token>> with the same representation, value, and type flag as the returned number,
1191 and return it.
1194 <h4 id="consume-ident-like-token">
1195 Consume an ident-like token</h4>
1197 This section describes how to <dfn>consume an ident-like token</dfn> from a stream of <a>code points</a>.
1198 It returns an <<ident-token>>, <<function-token>>, <<url-token>>, or <<bad-url-token>>.
1200 <a>Consume a name</a>.
1202 If the returned string's value is an <a>ASCII case-insensitive</a> match for "url",
1203 and the <a>next input code point</a> is U+0028 LEFT PARENTHESIS ((),
1204 consume it.
1205 While the <a lt="next input code point">next two input code points</a> are <a>whitespace</a>,
1206 consume the <a>next input code point</a>.
1207 If the <a lt="next input code point">next one or two input code points</a> are U+0022 QUOTATION MARK ("),
1208 U+0027 APOSTROPHE ('),
1209 or <a>whitespace</a> followed by U+0022 QUOTATION MARK (") or U+0027 APOSTROPHE ('),
1210 then create a <<function-token>>
1211 with its value set to the returned string
1212 and return it.
1213 Otherwise,
1214 <a>consume a url token</a>,
1215 and return it.
1217 Otherwise,
1218 if the <a>next input code point</a> is U+0028 LEFT PARENTHESIS ((),
1219 consume it.
1220 Create a <<function-token>>
1221 with its value set to the returned string
1222 and return it.
1224 Otherwise,
1225 create an <<ident-token>>
1226 with its value set to the returned string
1227 and return it.
1230 <h4 id="consume-string-token">
1231 Consume a string token</h4>
1233 This section describes how to <dfn>consume a string token</dfn> from a stream of <a>code points</a>.
1234 It returns either a <<string-token>> or <<bad-string-token>>.
1236 This algorithm may be called with an <var>ending code point</var>,
1237 which denotes the <a>code point</a> that ends the string.
1238 If an <var>ending code point</var> is not specified,
1239 the <a>current input code point</a> is used.
1241 Initially create a <<string-token>> with its value set to the empty string.
1243 Repeatedly consume the <a>next input code point</a> from the stream:
1245 <dl>
1246 <dt><var>ending code point</var>
1247 <dt>EOF
1248 <dd>
1249 Return the <<string-token>>.
1251 <dt><a>newline</a>
1252 <dd>
1253 This is a <a>parse error</a>.
1254 <a>Reconsume the current input code point</a>,
1255 create a <<bad-string-token>>, and return it.
1257 <dt>U+005C REVERSE SOLIDUS (\)
1258 <dd>
1259 If the <a>next input code point</a> is EOF,
1260 do nothing.
1262 Otherwise,
1263 if the <a>next input code point</a> is a <a>newline</a>,
1264 consume it.
1266 Otherwise,
1267 <span class=note>(the stream <a>starts with a valid escape</a>)</span>
1268 <a>consume an escaped code point</a>
1269 and append the returned <a>code point</a> to the <<string-token>>’s value.
1271 <dt>anything else
1272 <dd>
1273 Append the <a>current input code point</a> to the <<string-token>>’s value.
1274 </dl>
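The following non-normative Python sketch walks through the same cases. It assumes that the opening quote has already been consumed, that input preprocessing has normalized all newlines to U+000A, and that tokens are represented as <code>(type, value)</code> tuples; these conventions and the function name belong to this example only. Escape handling is deliberately simplified: the code point after a backslash is taken literally, and hex escapes are not decoded.

```python
def consume_string_token(text, pos, ending):
    """Consume a string whose opening quote has already been consumed.

    text[pos] is the first code point after the opening quote and
    `ending` is the code point that closes the string.
    Returns ((type, value), new_pos).
    """
    value = []
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == ending:
            return ("string", "".join(value)), pos
        if c == "\n":
            # Parse error: reconsume the newline and give up on the string.
            return ("bad-string", None), pos - 1
        if c == "\\":
            if pos >= len(text):
                continue              # backslash before EOF: do nothing
            if text[pos] == "\n":
                pos += 1              # escaped newline: consume it, add nothing
                continue
            value.append(text[pos])   # simplified: hex escapes are NOT decoded
            pos += 1
            continue
        value.append(c)
    # EOF also ends the string.
    return ("string", "".join(value)), pos
```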
1277 <h4 id="consume-url-token">
1278 Consume a url token</h4>
1280 This section describes how to <dfn>consume a url token</dfn> from a stream of <a>code points</a>.
1281 It returns either a <<url-token>> or a <<bad-url-token>>.
1283 Note: This algorithm assumes that the initial "url(" has already been consumed.
1285 <ol>
1286 <li>
1287 Initially create a <<url-token>> with its value set to the empty string.
1289 <li>
1290 Consume as much <a>whitespace</a> as possible.
1292 <li>
1293 If the <a>next input code point</a> is EOF,
1294 return the <<url-token>>.
1296 <li>
1297 Repeatedly consume the <a>next input code point</a> from the stream:
1299 <dl>
1300 <dt>U+0029 RIGHT PARENTHESIS ())
1301 <dt>EOF
1302 <dd>
1303 Return the <<url-token>>.
1305 <dt><a>whitespace</a>
1306 <dd>
1307 Consume as much <a>whitespace</a> as possible.
1308 If the <a>next input code point</a> is U+0029 RIGHT PARENTHESIS ()) or EOF,
1309 consume it and return the <<url-token>>;
1310 otherwise,
1311 <a>consume the remnants of a bad url</a>,
1312 create a <<bad-url-token>>,
1313 and return it.
1315 <dt>U+0022 QUOTATION MARK (")
1316 <dt>U+0027 APOSTROPHE (')
1317 <dt>U+0028 LEFT PARENTHESIS (()
1318 <dt><a>non-printable code point</a>
1319 <dd>
1320 This is a <a>parse error</a>.
1321 <a>Consume the remnants of a bad url</a>,
1322 create a <<bad-url-token>>,
1323 and return it.
1325 <dt>U+005C REVERSE SOLIDUS (\)
1326 <dd>
1327 If the stream <a>starts with a valid escape</a>,
1328 <a>consume an escaped code point</a>
1329 and append the returned <a>code point</a> to the <<url-token>>’s value.
1331 Otherwise,
1332 this is a <a>parse error</a>.
1333 <a>Consume the remnants of a bad url</a>,
1334 create a <<bad-url-token>>,
1335 and return it.
1337 <dt>anything else
1338 <dd>
1339 Append the <a>current input code point</a>
1340 to the <<url-token>>’s value.
1341 </dl>
1342 </ol>
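A non-normative Python sketch of these steps follows. It assumes that "url(" has already been consumed, that newlines were normalized to U+000A by preprocessing, and that tokens are <code>(type, value)</code> tuples. The names <code>consume_url_token</code>, <code>skip_bad_url</code>, and <code>is_non_printable</code> are hypothetical. Hex escapes are not decoded in this simplified version.

```python
WHITESPACE = " \t\n"   # assumes preprocessing already normalized newlines
BAD_URL = ("bad-url", None)


def is_non_printable(c):
    # U+0000-U+0008, U+000B, U+000E-U+001F, or U+007F
    o = ord(c)
    return o <= 0x08 or o == 0x0B or 0x0E <= o <= 0x1F or o == 0x7F


def skip_bad_url(text, pos):
    """Simplified stand-in for "consume the remnants of a bad url"."""
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == ")":
            break
        if c == "\\" and text[pos:pos + 1] != "\n":
            pos = min(pos + 1, len(text))   # skip the escaped code point
    return pos


def consume_url_token(text, pos):
    """text[pos] is the first code point after "url(".

    Returns ((type, value), new_pos).
    """
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    value = []
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == ")":
            break
        if c in WHITESPACE:
            while pos < len(text) and text[pos] in WHITESPACE:
                pos += 1
            if pos >= len(text):
                break                        # EOF after trailing whitespace
            if text[pos] == ")":
                pos += 1
                break
            return BAD_URL, skip_bad_url(text, pos)
        if c in "\"'(" or is_non_printable(c):
            return BAD_URL, skip_bad_url(text, pos)       # parse error
        if c == "\\":
            if text[pos:pos + 1] == "\n":
                return BAD_URL, skip_bad_url(text, pos)   # parse error
            if pos < len(text):
                value.append(text[pos])      # simplified: hex escapes not decoded
                pos += 1
            else:
                value.append("\uFFFD")       # backslash directly before EOF
            continue
        value.append(c)
    return ("url", "".join(value)), pos
```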
1345 <h4 id="consume-escaped-code-point">
1346 Consume an escaped code point</h4>
1348 This section describes how to <dfn>consume an escaped code point</dfn>.
1349 It assumes that the U+005C REVERSE SOLIDUS (\) has already been consumed
1350 and that the next input code point has already been verified
1351 to not be a <a>newline</a>.
1352 It will return a <a>code point</a>.
1354 Consume the <a>next input code point</a>.
1356 <dl>
1357 <dt><a>hex digit</a>
1358 <dd>
1359 Consume as many <a>hex digits</a> as possible, but no more than 5.
1360 <span class='note'>Note that this means 1-6 hex digits have been consumed in total.</span>
1361 If the <a>next input code point</a> is
1362 <a>whitespace</a>,
1363 consume it as well.
1364 Interpret the <a>hex digits</a> as a hexadecimal number.
1365 If this number is zero,
1366 or is for a <a>surrogate code point</a>,
1367 or is greater than the <a>maximum allowed code point</a>,
1368 return U+FFFD REPLACEMENT CHARACTER (�).
1369 Otherwise, return the <a>code point</a> with that value.
1371 <dt>EOF code point
1372 <dd>
1373 Return U+FFFD REPLACEMENT CHARACTER (�).
1375 <dt>anything else
1376 <dd>
1377 Return the <a>current input code point</a>.
1378 </dl>
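The cases above can be sketched, non-normatively, in Python. The function name and the convention of returning the code point as a one-character string together with the new position are assumptions of this example. The whitespace set assumes preprocessed input, and the maximum allowed code point is taken to be U+10FFFF.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
MAX_CODE_POINT = 0x10FFFF   # the maximum allowed code point
REPLACEMENT = "\uFFFD"


def consume_escaped_code_point(text, pos):
    """text[pos] is the first code point after the backslash.

    Returns (code_point_as_str, new_pos).
    """
    if pos >= len(text):
        return REPLACEMENT, pos               # EOF
    if text[pos] not in HEX_DIGITS:
        return text[pos], pos + 1             # anything else: returned as-is
    end = pos + 1
    while end < len(text) and end - pos < 6 and text[end] in HEX_DIGITS:
        end += 1                              # at most 6 hex digits in total
    number = int(text[pos:end], 16)
    if end < len(text) and text[end] in " \t\n":
        end += 1                              # one trailing whitespace is consumed
    if number == 0 or 0xD800 <= number <= 0xDFFF or number > MAX_CODE_POINT:
        return REPLACEMENT, end
    return chr(number), end
```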
1381 <h4 id="starts-with-a-valid-escape">
1382 Check if two code points are a valid escape</h4>
1384 This section describes how to <dfn lt="check if two code points are a valid escape|are a valid escape|starts with a valid escape">check if two code points are a valid escape</dfn>.
1385 The algorithm described here can be called explicitly with two <a>code points</a>,
1386 or can be called with the input stream itself.
1387 In the latter case, the two <a>code points</a> in question are
1388 the <a>current input code point</a>
1389 and the <a>next input code point</a>,
1390 in that order.
1392 Note: This algorithm will not consume any additional <a>code points</a>.
1394 If the first <a>code point</a> is not U+005C REVERSE SOLIDUS (\),
1395 return false.
1397 Otherwise,
1398 if the second <a>code point</a> is a <a>newline</a>,
1399 return false.
1401 Otherwise, return true.
1404 <h4 id="would-start-an-identifier">
1405 Check if three code points would start an identifier</h4>
1407 This section describes how to <dfn lt="check if three code points would start an identifier|starts with an identifier|start with an identifier|would start an identifier">check if three code points would start an <a>identifier</a></dfn>.
1408 The algorithm described here can be called explicitly with three <a>code points</a>,
1409 or can be called with the input stream itself.
1410 In the latter case, the three <a>code points</a> in question are
1411 the <a>current input code point</a>
1412 and the <a lt="next input code point">next two input code points</a>,
1413 in that order.
1415 Note: This algorithm will not consume any additional <a>code points</a>.
1417 Look at the first <a>code point</a>:
1419 <dl>
1420 <dt>U+002D HYPHEN-MINUS
1421 <dd>
1422 If the second <a>code point</a> is a <a>name-start code point</a>
1423 or a U+002D HYPHEN-MINUS,
1424 or the second and third <a>code points</a> <a>are a valid escape</a>,
1425 return true.
1426 Otherwise, return false.
1428 <dt><a>name-start code point</a>
1429 <dd>
1430 Return true.
1432 <dt>U+005C REVERSE SOLIDUS (\)
1433 <dd>
1434 If the first and second <a>code points</a> <a>are a valid escape</a>,
1435 return true.
1436 Otherwise, return false.
1438 <dt>anything else
1439 <dd>
1440 Return false.
1441 </dl>
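A non-normative Python sketch of this check, together with the two-code-point escape check it relies on, is given below. The function names are hypothetical, and the empty string is used as an assumed stand-in for a missing code point (EOF).

```python
def is_name_start(c):
    # name-start code point: a letter, a non-ASCII code point, or "_"
    return c != "" and ((c.isascii() and c.isalpha()) or c == "_" or ord(c) >= 0x80)


def are_valid_escape(first, second):
    # A backslash followed by anything other than a newline.
    return first == "\\" and second != "\n"


def would_start_identifier(first, second, third):
    if first == "-":
        return (
            is_name_start(second)
            or second == "-"
            or are_valid_escape(second, third)
        )
    if is_name_start(first):
        return True
    if first == "\\":
        return are_valid_escape(first, second)
    return False
```

Neither function consumes anything; both only inspect the code points they are handed.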
1443 <h4 id="starts-with-a-number">
1444 Check if three code points would start a number</h4>
1446 This section describes how to <dfn lt="check if three code points would start a number|starts with a number|start with a number|would start a number">check if three code points would start a number</dfn>.
1447 The algorithm described here can be called explicitly with three <a>code points</a>,
1448 or can be called with the input stream itself.
1449 In the latter case, the three <a>code points</a> in question are
1450 the <a>current input code point</a>
1451 and the <a lt="next input code point">next two input code points</a>,
1452 in that order.
1454 Note: This algorithm will not consume any additional <a>code points</a>.
1456 Look at the first <a>code point</a>:
1458 <dl>
1459 <dt>U+002B PLUS SIGN (+)
1460 <dt>U+002D HYPHEN-MINUS (-)
1461 <dd>
1462 If the second <a>code point</a>
1463 is a <a>digit</a>,
1464 return true.
1466 Otherwise,
1467 if the second <a>code point</a>
1468 is a U+002E FULL STOP (.)
1469 and the third <a>code point</a>
1470 is a <a>digit</a>,
1471 return true.
1473 Otherwise, return false.
1475 <dt>U+002E FULL STOP (.)
1476 <dd>
1477 If the second <a>code point</a>
1478 is a <a>digit</a>,
1479 return true.
1480 Otherwise, return false.
1482 <dt><a>digit</a>
1483 <dd>
1484 Return true.
1486 <dt>anything else
1487 <dd>
1488 Return false.
1489 </dl>
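The same decision table can be written, non-normatively, as a short Python function. The names are hypothetical, and the empty string is an assumed stand-in for a missing code point (EOF).

```python
def is_digit(c):
    # A code point between U+0030 (0) and U+0039 (9).
    return c != "" and c in "0123456789"


def would_start_number(first, second, third):
    if first in ("+", "-"):
        if is_digit(second):
            return True
        return second == "." and is_digit(third)
    if first == ".":
        return is_digit(second)
    return is_digit(first)
```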
1492 <h4 id="consume-name">
1493 Consume a name</h4>
1495 This section describes how to <dfn>consume a name</dfn> from a stream of <a>code points</a>.
1496 It returns a string containing
1497 the largest name that can be formed from adjacent <a>code points</a> in the stream, starting from the first.
1499 Note: This algorithm does not do the verification of the first few <a>code points</a>
1500 that are necessary to ensure the returned <a>code points</a> would constitute an <<ident-token>>.
1501 If that is the intended use,
1502 ensure that the stream <a>starts with an identifier</a>
1503 before calling this algorithm.
1505 Let <var>result</var> initially be an empty string.
1507 Repeatedly consume the <a>next input code point</a> from the stream:
1509 <dl>
1510 <dt><a>name code point</a>
1511 <dd>
1512 Append the <a>code point</a> to <var>result</var>.
1514 <dt>the stream <a>starts with a valid escape</a>
1515 <dd>
1516 <a>Consume an escaped code point</a>.
1517 Append the returned <a>code point</a> to <var>result</var>.
1519 <dt>anything else
1520 <dd>
1521 <a>Reconsume the current input code point</a>.
1522 Return <var>result</var>.
1523 </dl>
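As a non-normative illustration, the loop above might look like this in Python. The function names are hypothetical, the input is assumed to be preprocessed, and escape handling is simplified: the code point after a backslash is taken literally, and hex escapes are not decoded.

```python
def is_name_code_point(c):
    # name code point: a name-start code point, a digit, or "-"
    return (c.isascii() and c.isalnum()) or c in "_-" or ord(c) >= 0x80


def consume_name(text, pos):
    """Return (name, new_pos) for the longest name starting at text[pos]."""
    result = []
    while pos < len(text):
        c = text[pos]
        if is_name_code_point(c):
            result.append(c)
            pos += 1
        elif c == "\\" and text[pos + 1:pos + 2] != "\n":
            # The stream starts with a valid escape.
            if pos + 1 < len(text):
                result.append(text[pos + 1])   # simplified: no hex decoding
                pos += 2
            else:
                result.append("\uFFFD")        # backslash directly before EOF
                pos += 1
        else:
            break   # leave this code point unconsumed (reconsume it)
    return "".join(result), pos
```

As the note above says, the caller is responsible for checking that the stream starts with an identifier before treating the result as one.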
1526 <h4 id="consume-number">
1527 Consume a number</h4>
1529 This section describes how to <dfn>consume a number</dfn> from a stream of <a>code points</a>.
1530 It returns a 3-tuple of
1531 a string representation,
1532 a numeric value,
1533 and a type flag which is either "integer" or "number".
1535 Note: This algorithm does not do the verification of the first few <a>code points</a>
1536 that are necessary to ensure a number can be obtained from the stream.
1537 Ensure that the stream <a>starts with a number</a>
1538 before calling this algorithm.
1540 Execute the following steps in order:
1542 <ol>
1543 <li>
1544 Initially set <var>repr</var> to the empty string
1545 and <var>type</var> to "integer".
1547 <li>
1548 If the <a>next input code point</a> is U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-),
1549 consume it and append it to <var>repr</var>.
1551 <li>
1552 While the <a>next input code point</a> is a <a>digit</a>,
1553 consume it and append it to <var>repr</var>.
1555 <li>
1556 If the <a lt="next input code point">next 2 input code points</a> are
1557 U+002E FULL STOP (.) followed by a <a>digit</a>,
1558 then:
1560 <ol>
1561 <li>Consume them.
1562 <li>Append them to <var>repr</var>.
1563 <li>Set <var>type</var> to "number".
1564 <li>While the <a>next input code point</a> is a <a>digit</a>, consume it and append it to <var>repr</var>.
1565 </ol>
1567 <li>
1568 If the <a lt="next input code point">next 2 or 3 input code points</a> are
1569 U+0045 LATIN CAPITAL LETTER E (E) or U+0065 LATIN SMALL LETTER E (e),
1570 optionally followed by U+002D HYPHEN-MINUS (-) or U+002B PLUS SIGN (+),
1571 followed by a <a>digit</a>,
1572 then:
1574 <ol>
1575 <li>Consume them.
1576 <li>Append them to <var>repr</var>.
1577 <li>Set <var>type</var> to "number".
1578 <li>While the <a>next input code point</a> is a <a>digit</a>, consume it and append it to <var>repr</var>.
1579 </ol>
1581 <li>
1582 <a lt="convert a string to a number">Convert <var>repr</var> to a number</a>,
1583 and set the <var>value</var> to the returned value.
1585 <li>
1586 Return a 3-tuple of <var>repr</var>, <var>value</var>, and <var>type</var>.
1587 </ol>
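The steps above accept the same strings as a single regular expression, which the following non-normative Python sketch uses. The function name is hypothetical, and Python's built-in <code>int()</code> and <code>float()</code> are swapped in for the "convert a string to a number" algorithm. As in the algorithm itself, the caller is assumed to have checked that the stream starts with a number.

```python
import re

# sign? digits* ("." digits+)? (("e" | "E") sign? digits+)?
NUMBER_RE = re.compile(r"[+-]?[0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def consume_number(text, pos):
    """Return ((repr, value, type), new_pos)."""
    match = NUMBER_RE.match(text, pos)
    repr_ = match.group(0)
    # A fractional part or an exponent switches the type flag to "number".
    is_integer = match.group(1) is None and match.group(2) is None
    type_ = "integer" if is_integer else "number"
    # Assumption: int()/float() stand in for "convert a string to a number".
    value = int(repr_) if is_integer else float(repr_)
    return (repr_, value, type_), match.end()
```

Because each optional group must contain at least one digit, input such as "1e" or "1." yields only "1", leaving the rest for later tokens.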
1590 <h4 id="convert-string-to-number">
1591 Convert a string to a number</h4>
1593 This section describes how to <dfn>convert a string to a number</dfn>.
1594 It returns a number.
1596 Note: This algorithm does not do any verification to ensure that the string contains only a number.
1597 Ensure that the string contains only a valid CSS number
1598 before calling this algorithm.
1600 Divide the string into seven components,
1601 in order from left to right:
1603 <ol>
1604 <li>A <b>sign</b>:
1605 a single U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-),
1606 or the empty string.
1607 Let <var>s</var> be the number -1 if the sign is U+002D HYPHEN-MINUS (-);
1608 otherwise, let <var>s</var> be the number 1.
1610 <li>An <b>integer part</b>:
1611 zero or more <a>digits</a>.
1612 If there is at least one digit,
1613 let <var>i</var> be the number formed by interpreting the digits as a base-10 integer;
1614 otherwise, let <var>i</var> be the number 0.
1616 <li>A <b>decimal point</b>:
1617 a single U+002E FULL STOP (.),
1618 or the empty string.
1620 <li>A <b>fractional part</b>:
1621 zero or more <a>digits</a>.
1622 If there is at least one digit,
1623 let <var>f</var> be the number formed by interpreting the digits as a base-10 integer
1624 and <var>d</var> be the number of digits;
1625 otherwise, let <var>f</var> and <var>d</var> be the number 0.
1627 <li>An <b>exponent indicator</b>:
1628 a single U+0045 LATIN CAPITAL LETTER E (E) or U+0065 LATIN SMALL LETTER E (e),
1629 or the empty string.
1631 <li>An <b>exponent sign</b>:
1632 a single U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-),
1633 or the empty string.
1634 Let <var>t</var> be the number -1 if the sign is U+002D HYPHEN-MINUS (-);
1635 otherwise, let <var>t</var> be the number 1.
1637 <li>An <b>exponent</b>:
1638 zero or more <a>digits</a>.
1639 If there is at least one digit,
1640 let <var>e</var> be the number formed by interpreting the digits as a base-10 integer;
1641 otherwise, let <var>e</var> be the number 0.
1642 </ol>
1644 Return the number <code>s·(i + f·10<sup>-d</sup>)·10<sup>t·e</sup></code>.
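The decomposition and the final formula can be sketched, non-normatively, in Python. The regular expression and the function name are assumptions of this example; the variable names follow the components listed above. Like the algorithm, the sketch assumes its argument is already a valid CSS number. Results are subject to ordinary floating-point rounding.

```python
import re

# One capture group per component, in the order listed above.
PARTS_RE = re.compile(r"([+-]?)([0-9]*)(\.?)([0-9]*)([eE]?)([+-]?)([0-9]*)")


def convert_string_to_number(string):
    sign, integer, _point, fraction, _indicator, exp_sign, exponent = (
        PARTS_RE.fullmatch(string).groups()
    )
    s = -1 if sign == "-" else 1
    i = int(integer) if integer else 0
    f = int(fraction) if fraction else 0
    d = len(fraction)
    t = -1 if exp_sign == "-" else 1
    e = int(exponent) if exponent else 0
    # s * (i + f * 10^-d) * 10^(t * e)
    return s * (i + f * 10 ** -d) * 10 ** (t * e)
```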
1647 <h4 id="consume-remnants-of-bad-url">
1648 Consume the remnants of a bad url</h4>
1650 This section describes how to <dfn>consume the remnants of a bad url</dfn> from a stream of <a>code points</a>,
1651 "cleaning up" after the tokenizer realizes that it's in the middle of a <<bad-url-token>> rather than a <<url-token>>.
1652 It returns nothing;
1653 its sole use is to consume enough of the input stream to reach a recovery point
1654 where normal tokenizing can resume.
1656 Repeatedly consume the <a>next input code point</a> from the stream:
1658 <dl>
1659 <dt>U+0029 RIGHT PARENTHESIS ())
1660 <dt>EOF
1661 <dd>
1662 Return.
1664 <dt>the input stream <a>starts with a valid escape</a>
1665 <dd>
1666 <a>Consume an escaped code point</a>.
1667 <span class='note'>This allows an escaped right parenthesis ("\)") to be encountered without ending the <<bad-url-token>>.
1668 This is otherwise identical to the "anything else" clause.</span>
1670 <dt>anything else
1671 <dd>
1672 Do nothing.
1673 </dl>
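A non-normative Python sketch of this recovery loop follows; the function name is hypothetical. After a valid escape it skips just one code point rather than a whole hex escape. That is enough here: the remaining hex digits and optional whitespace of an escape can never be a right parenthesis, so the "anything else" branch skips them with the same result.

```python
def consume_bad_url_remnants(text, pos):
    """Return the position at which normal tokenizing can resume."""
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == ")":
            return pos
        if c == "\\" and text[pos:pos + 1] != "\n":
            # Valid escape: skip the escaped code point so that "\)"
            # does not end the bad url.
            pos = min(pos + 1, len(text))
        # Anything else: do nothing.
    return pos   # EOF
```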
1686 <h2 id="parsing">
1687 Parsing</h2>
1689 The input to the parsing stage is a stream or list of tokens from the tokenization stage.
1690 The output depends on how the parser is invoked,
1691 as defined by the entry points listed later in this section.
1692 The parser output can consist of at-rules,
1693 qualified rules,
1694 and/or declarations.
1696 The parser's output is constructed according to the fundamental syntax of CSS,
1697 without regard for the validity of any specific item.
1698 Implementations may check the validity of items as they are returned by the various parser algorithms
1699 and treat the algorithm as returning nothing if the item was invalid according to the implementation's own grammar knowledge,
1700 or may construct a full tree as specified
1701 and "clean up" afterwards by removing any invalid items.
1703 The items that can appear in the tree are:
1705 <dl>
1706 <dt><dfn export>at-rule</dfn>
1707 <dd>
1708 An at-rule has a name,
1709 a prelude consisting of a list of component values,
1710 and an optional block consisting of a simple {} block.
1712 Note: This specification places no limits on what an at-rule's block may contain.
1713 Individual at-rules must define whether they accept a block,
1714 and if so,
1715 how to parse it
1716 (preferably using one of the parser algorithms or entry points defined in this specification).
1718 <dt><dfn export>qualified rule</dfn>
1719 <dd>
1720 A qualified rule has
1721 a prelude consisting of a list of component values,
1722 and a block consisting of a simple {} block.
1724 Note: Most qualified rules will be style rules,
1725 where the prelude is a selector [[SELECT]]
1726 and the block a <a lt="parse a list of declarations">list of declarations</a>.
1728 <dt><dfn export>declaration</dfn>
1729 <dd>
1730 A declaration has a name,
1731 a value consisting of a list of component values,
1732 and an <var>important</var> flag which is initially unset.
1734 Declarations are further categorized as "properties" or "descriptors",
1735 with the former typically appearing in <a>qualified rules</a>
1736 and the latter appearing in <a>at-rules</a>.
1737 (This categorization does not occur at the Syntax level;
1738 instead, it is a product of where the declaration appears,
1739 and is defined by the respective specifications defining the given rule.)
1741 <dt><dfn export>component value</dfn>
1742 <dd>
1743 A component value is one of the preserved tokens,
1744 a function,
1745 or a simple block.
1747 <dt><dfn>preserved tokens</dfn>
1748 <dd>
1749 Any token produced by the tokenizer
1750 except for <<function-token>>s,
1751 <a href="#tokendef-open-curly"><{-token></a>s,
1752 <a href="#tokendef-open-paren"><(-token></a>s,
1753 and <a href="#tokendef-open-square"><[-token></a>s.
1755 Note: The non-preserved tokens listed above are always consumed into higher-level objects,
1756 either functions or simple blocks,
1757 and so never appear in any parser output themselves.
1759 Note: The tokens <a href="#tokendef-close-curly"><}-token></a>, <a href="#tokendef-close-paren"><)-token></a>, <a href="#tokendef-close-square"><]-token></a>, <<bad-string-token>>, and <<bad-url-token>> are always parse errors,
1760 but they are preserved in the token stream by this specification to allow other specs,
1761 such as Media Queries,
1762 to define more fine-grained error-handling
1763 than just dropping an entire declaration or block.
1765 <dt><dfn export>function</dfn>
1766 <dd>
1767 A function has a name
1768 and a value consisting of a list of component values.
1770 <dt><dfn export>simple block</dfn>
1771 <dd>
1772 A simple block has an associated token (either a <a href="#tokendef-open-square"><[-token></a>, <a href="#tokendef-open-paren"><(-token></a>, or <a href="#tokendef-open-curly"><{-token></a>)
1773 and a value consisting of a list of component values.
1774 </dl>
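One possible in-memory shape for these items is sketched below in Python, purely as a non-normative illustration. The class and field names are assumptions of this example; this specification does not mandate any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Token:
    """A preserved token, such as an ident or a delim."""
    type: str
    value: Optional[str] = None


@dataclass
class Function:
    name: str
    value: List["ComponentValue"] = field(default_factory=list)


@dataclass
class SimpleBlock:
    associated_token: str   # "[", "(" or "{"
    value: List["ComponentValue"] = field(default_factory=list)


# A component value is a preserved token, a function, or a simple block.
ComponentValue = Union[Token, Function, SimpleBlock]


@dataclass
class AtRule:
    name: str
    prelude: List[ComponentValue] = field(default_factory=list)
    block: Optional[SimpleBlock] = None   # the block is optional


@dataclass
class QualifiedRule:
    block: SimpleBlock                    # always a {} block
    prelude: List[ComponentValue] = field(default_factory=list)


@dataclass
class Declaration:
    name: str
    value: List[ComponentValue] = field(default_factory=list)
    important: bool = False               # initially unset
```

For instance, <code>Declaration("color", [Token("ident", "red")])</code> would represent <code>color: red</code> with the <var>important</var> flag unset.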
1786 <h3 id='parser-diagrams'>
1787 Parser Railroad Diagrams</h3>
1789 <em>This section is non-normative.</em>
1791 This section presents an informative view of the parser,
1792 in the form of railroad diagrams.
1794 These diagrams are <em>informative</em> and <em>incomplete</em>;
1795 they describe the grammar of "correct" stylesheets,
1796 but do not describe error-handling at all.
1797 They are provided solely to make it easier to get an intuitive grasp of the syntax.
1799 <dl>
1800 <dt id="stylesheet-diagram">Stylesheet
1801 <dd>
1802 <pre class='railroad'>
1803 Star:
1804 Choice: 3
1805 N: <CDO-token>
1806 N: <CDC-token>
1807 N: <whitespace-token>
1808 N: Qualified rule
1809 N: At-rule
1810 </pre>
1812 <dt id="rule-list-diagram">Rule list
1813 <dd>
1814 <pre class='railroad'>
1815 Star:
1816 Choice: 1
1817 N: <whitespace-token>
1818 N: Qualified rule
1819 N: At-rule
1820 </pre>
1822 <dt id="at-rule-diagram">At-rule
1823 <dd>
1824 <pre class='railroad'>
1825 N: <at-keyword-token>
1826 Star:
1827 N: Component value
1828 Choice:
1829 N: {} block
1830 T: ;
1831 </pre>
1833 <dt id="qualified-rule-diagram">Qualified rule
1834 <dd>
1835 <pre class='railroad'>
1836 Star:
1837 N: Component value
1838 N: {} block
1839 </pre>
1841 <dt id="declaration-list-diagram">Declaration list
1842 <dd>
1843 <pre class='railroad'>
1844 N: ws*
1845 Choice:
1846 Seq:
1847 Opt:
1848 N: Declaration
1849 Opt:
1850 Seq:
1851 T: ;
1852 N: Declaration list
1853 Seq:
1854 N: At-rule
1855 N: Declaration list
1856 </pre>
1858 <dt id="declaration-diagram">Declaration
1859 <dd>
1860 <pre class='railroad'>
1861 N: <ident-token>
1862 N: ws*
1863 T: :
1864 Star:
1865 N: Component value
1866 Opt: skip
1867 N: !important
1868 </pre>
1870 <dt id="!important-diagram">!important
1871 <dd>
1872 <pre class='railroad'>
1873 T: !
1874 N: ws*
1875 N: <ident-token "important">
1876 N: ws*
1877 </pre>
1879 <dt id="component-value-diagram">Component value
1880 <dd>
1881 <pre class='railroad'>
1882 Choice:
1883 N: Preserved token
1884 N: {} block
1885 N: () block
1886 N: [] block
1887 N: Function block
1888 </pre>
1891 <dt id="{}-block-diagram">{} block
1892 <dd>
1893 <pre class='railroad'>
1894 T: {
1895 Star:
1896 N: Component value
1897 T: }
1898 </pre>
1900 <dt id="()-block-diagram">() block
1901 <dd>
1902 <pre class='railroad'>
1903 T: (
1904 Star:
1905 N: Component value
1906 T: )
1907 </pre>
1909 <dt id="[]-block-diagram">[] block
1910 <dd>
1911 <pre class='railroad'>
1912 T: [
1913 Star:
1914 N: Component value
1915 T: ]
1916 </pre>
1918 <dt id="function-block-diagram">Function block
1919 <dd>
1920 <pre class='railroad'>
1921 N: <function-token>
1922 Star:
1923 N: Component value
1924 T: )
1925 </pre>
1926 </dl>
1938 <h3 id="parser-definitions">
1939 Definitions</h3>
1941 <dl>
1942 <dt><dfn>current input token</dfn>
1943 <dd>
1944 The token or <a>component value</a> currently being operated on, from the list of tokens produced by the tokenizer.
1946 <dt><dfn>next input token</dfn>
1947 <dd>
1948 The token or <a>component value</a> following the <a>current input token</a> in the list of tokens produced by the tokenizer.
1949 If there isn't a token following the <a>current input token</a>,
1950 the <a>next input token</a> is an <<EOF-token>>.
1952 <dt><dfn><<EOF-token>></dfn>
1953 <dd>
1954 A conceptual token representing the end of the list of tokens.
1955 Whenever the list of tokens is empty,
1956 the <a>next input token</a> is always an <<EOF-token>>.
1958 <dt><dfn>consume the next input token</dfn>
1959 <dd>
1960 Let the <a>current input token</a> be the current <a>next input token</a>,
1961 adjusting the <a>next input token</a> accordingly.
1963 <dt><dfn>reconsume the current input token</dfn>
1964 <dd>
1965 The next time an algorithm instructs you to <a>consume the next input token</a>,
1966 instead do nothing
1967 (retain the <a>current input token</a> unchanged).
1968 </dl>
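These definitions amount to a cursor over the token list with a one-step "undo". The following non-normative Python sketch shows one way to model them. The class, its method names, the <code>EOF</code> sentinel, and the choice to report <code>None</code> as the current token before anything has been consumed are all assumptions of this example.

```python
EOF = ("EOF", None)   # conceptual token marking the end of the list


class TokenStream:
    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._index = -1            # nothing consumed yet
        self._reconsume = False

    def _at(self, index):
        if 0 <= index < len(self._tokens):
            return self._tokens[index]
        return EOF

    @property
    def current_input_token(self):
        # Undefined before the first consume; None is this sketch's choice.
        return None if self._index < 0 else self._at(self._index)

    @property
    def next_input_token(self):
        return self._at(self._index + 1)

    def consume_next_input_token(self):
        if self._reconsume:
            self._reconsume = False   # keep the current token unchanged
        else:
            self._index = min(self._index + 1, len(self._tokens))
        return self.current_input_token

    def reconsume_current_input_token(self):
        self._reconsume = True
```

Once the list is exhausted, every further call to <code>consume_next_input_token()</code> keeps returning the <code>EOF</code> sentinel.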
1980 <h3 id="parser-entry-points">
1981 Parser Entry Points</h3>
1983 The algorithms defined in this section produce high-level CSS objects
1984 from lower-level objects.
1985 They assume that they are invoked on a token stream,
1986 but they may also be invoked on a string;
1987 if so,
1988 first perform <a href="#input-preprocessing">input preprocessing</a>
1989 to produce a <a>code point</a> stream,
1990 then perform <a href="#tokenization">tokenization</a>
1991 to produce a token stream.
1993 "<a>Parse a stylesheet</a>" can also be invoked on a byte stream,
1994 in which case <a href="#input-byte-stream">The input byte stream</a>
1995 defines how to decode it into Unicode.
1997 Note: This specification does not define how a byte stream is decoded for other entry points.
1999 Note: Other specs can define additional entry points for their own purposes.
2001 <div class='note'>
2002 The following notes should probably be translated into normative text in the relevant specs,
2003 hooking this spec's terms:
2005 <ul>
2006 <li>
2007 "<a>Parse a stylesheet</a>" is intended to be the normal parser entry point,
2008 for parsing stylesheets.
2010 <li>
2011 "<a>Parse a list of rules</a>" is intended for the content of at-rules such as ''@media''.
2012 It differs from "<a>Parse a stylesheet</a>" in the handling of <<CDO-token>> and <<CDC-token>>.
2014 <li>
2015 "<a>Parse a rule</a>" is intended for use by the <code>CSSStyleSheet#insertRule</code> method,
2016 and similar functions which might exist,
2017 which parse text into a single rule.
2019 <li>
2020 "<a>Parse a declaration</a>" is used in ''@supports'' conditions. [[CSS3-CONDITIONAL]]
2022 <li>
2023 "<a>Parse a list of declarations</a>" is for the contents of a <code>style</code> attribute,
2024 which parses text into the contents of a single style rule.
2026 <li>
2027 "<a>Parse a component value</a>" is for things that need to consume a single value,
2028 like the parsing rules for ''attr()''.
2030 <li>
2031 "<a>Parse a list of component values</a>" is for the contents of presentational attributes,
2032 which parse text into a single declaration's value,
2033 or for parsing a stand-alone selector [[SELECT]] or list of Media Queries [[MEDIAQ]],
2034 as in <a href="http://www.w3.org/TR/selectors-api/">Selectors API</a>
2035 or the <code>media</code> HTML attribute.
2036 </ul>
2037 </div>
2039 All of the algorithms defined in this spec may be called with either a list of tokens or of component values.
2040 Either way produces an identical result.
2042 <h4 id="parse-grammar">
2043 Parse something according to a CSS grammar</h4>
2045 It is often desirable to parse a string or token list
2046 to see if it matches some CSS grammar,
2047 and if it does,
2048 to destructure it according to the grammar.
2049 This section provides a generic hook for this kind of operation.
2050 It should be invoked like "parse <var>foo</var> as a CSS <<color>>", or similar.
2052 Note: As a reminder, this algorithm, along with all the others in this section,
2053 can be called with a string,
2054 a stream of CSS tokens,
2055 or a stream of CSS component values,
2056 whichever is most convenient.
2058 This algorithm must be called with <b>some input to be parsed</b>,
2059 and <b>some CSS grammar specification or term</b>.
2061 This algorithm returns either failure,
2062 if the input does not match the provided grammar,
2063 or the result of parsing the input according to the grammar,
2064 which is an unspecified structure corresponding to the provided grammar specification.
The return value must only be interacted with by specification prose,
where the representation ambiguity is not problematic.
If it is meant to be exposed outside of spec language,
the spec using the result must explicitly translate it into a well-specified representation,
for example, by invoking a CSS serialization algorithm
(like "serialize as a CSS <<string>> value").
2072 To <dfn export>parse something according to a CSS grammar</dfn>:
2074 <ol>
2075 <li>
2076 <a>Parse a list of component values</a> from the input,
2077 and let <var>result</var> be the return value.
2079 <li>
2080 Attempt to match <var>result</var> against the provided grammar.
2081 If this is successful,
2082 return the matched result;
2083 otherwise, return failure.
2084 </ol>
2087 <h4 id="parse-stylesheet">
2088 Parse a stylesheet</h4>
2090 To <dfn export>parse a stylesheet</dfn> from a stream of tokens:
2092 <ol>
2093 <li>
2094 Create a new stylesheet.
2096 <li>
2097 <a>Consume a list of rules</a> from the stream of tokens, with the <var>top-level flag</var> set.
2098 Let the return value be <var>rules</var>.
2100 <li>
2101 If the first rule in <var>rules</var> is an <a>at-rule</a> with a name
2102 that is an <a>ASCII case-insensitive</a> match for "charset",
2103 remove it from <var>rules</var>.
2105 <li>
2106 Assign <var>rules</var> to the stylesheet's value.
2108 <li>
2109 Return the stylesheet.
2110 </ol>
2112 <h4 id="parse-list-of-rules">
2113 Parse a list of rules</h4>
2115 To <dfn export>parse a list of rules</dfn> from a stream of tokens:
2117 <ol>
2118 <li>
2119 <a>Consume a list of rules</a> from the stream of tokens, with the <var>top-level flag</var> unset.
2121 <li>
2122 Return the returned list.
2123 </ol>
2125 <h4 id="parse-rule">
2126 Parse a rule</h4>
2128 To <dfn export>parse a rule</dfn> from a stream of tokens:
2130 <ol>
2131 <li>
2132 While the <a>next input token</a> is a <<whitespace-token>>,
2133 <a>consume the next input token</a>.
2135 <li>
2136 If the <a>next input token</a> is an <<EOF-token>>,
2137 return a syntax error.
2139 Otherwise,
2140 if the <a>next input token</a> is an <<at-keyword-token>>,
2141 <a>consume an at-rule</a>,
2142 and let <var>rule</var> be the return value.
2144 Otherwise,
2145 <a>consume a qualified rule</a>
2146 and let <var>rule</var> be the return value.
2147 If nothing was returned,
2148 return a syntax error.
2150 <li>
2151 While the <a>next input token</a> is a <<whitespace-token>>,
2152 <a>consume the next input token</a>.
2154 <li>
2155 If the <a>next input token</a> is an <<EOF-token>>,
2156 return <var>rule</var>.
2157 Otherwise, return a syntax error.
2158 </ol>
2160 <h4 id="parse-declaration">
2161 Parse a declaration</h4>
2163 Note: Unlike "<a>Parse a list of declarations</a>",
2164 this parses only a declaration and not an at-rule.
2166 To <dfn export>parse a declaration</dfn>:
2168 <ol>
2169 <li>
2170 While the <a>next input token</a> is a <<whitespace-token>>,
2171 <a>consume the next input token</a>.
2173 <li>
2174 If the <a>next input token</a> is not an <<ident-token>>,
2175 return a syntax error.
2177 <li>
2178 <a>Consume a declaration</a>.
2179 If anything was returned, return it.
2180 Otherwise, return a syntax error.
2181 </ol>
2183 <h4 id="parse-list-of-declarations">
2184 Parse a list of declarations</h4>
2186 Note: Despite the name,
2187 this actually parses a mixed list of declarations and at-rules,
2188 as CSS 2.1 does for ''@page''.
2189 Unexpected at-rules (which could be all of them, in a given context)
2190 are invalid and should be ignored by the consumer.
2192 To <dfn export>parse a list of declarations</dfn>:
2194 <ol>
2195 <li>
2196 <a>Consume a list of declarations</a>.
2198 <li>
2199 Return the returned list.
2200 </ol>
2202 <h4 id="parse-component-value">
2203 Parse a component value</h4>
2205 To <dfn export>parse a component value</dfn>:
2207 <ol>
2208 <li>
2209 While the <a>next input token</a> is a <<whitespace-token>>,
2210 <a>consume the next input token</a>.
2212 <li>
2213 If the <a>next input token</a> is an <<EOF-token>>,
2214 return a syntax error.
2216 <li>
2217 <a>Consume a component value</a>
2218 and let <var>value</var> be the return value.
2220 <li>
2221 While the <a>next input token</a> is a <<whitespace-token>>,
2222 <a>consume the next input token</a>.
2224 <li>
2225 If the <a>next input token</a> is an <<EOF-token>>,
2226 return <var>value</var>.
2227 Otherwise,
2228 return a syntax error.
2229 </ol>
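<div class="example">
This example is not normative.
A minimal sketch of the steps above,
assuming the input has already been grouped into a list of component values
and that every entry is a hypothetical <code>(type, value)</code> pair.
Under that assumption the algorithm reduces to trimming whitespace from both ends
and requiring exactly one remaining value.
A <code>ValueError</code> stands in for "return a syntax error".

```python
def is_whitespace(value):
    """True for the hypothetical ("whitespace", ...) pair used in this sketch."""
    return value[0] == "whitespace"


def parse_component_value(values):
    """Return the single component value in `values`, ignoring outer whitespace."""
    values = list(values)
    start, end = 0, len(values)
    # Skip leading whitespace.
    while start < end and is_whitespace(values[start]):
        start += 1
    # Nothing left means the next input token is EOF: a syntax error.
    if start == end:
        raise ValueError("syntax error: no component value")
    # Skip trailing whitespace.
    while end > start and is_whitespace(values[end - 1]):
        end -= 1
    # Anything other than exactly one value is a syntax error.
    if end - start != 1:
        raise ValueError("syntax error: more than one component value")
    return values[start]
```
</div>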
2231 <h4 id="parse-list-of-component-values">
2232 Parse a list of component values</h4>
2234 To <dfn export>parse a list of component values</dfn>:
2236 <ol>
2237 <li>
2238 Repeatedly <a>consume a component value</a> until an <<EOF-token>> is returned,
2239 appending the returned values (except the final <<EOF-token>>) into a list.
2240 Return the list.
2241 </ol>
2243 <h4 id="parse-comma-separated-list-of-component-values">
2244 Parse a comma-separated list of component values</h4>
2246 To <dfn export>parse a comma-separated list of component values</dfn>:
2248 <ol>
2249 <li>
2250 Let <var>list of cvls</var> be an initially empty list of component value lists.
2252 <li>
2253 Repeatedly <a>consume a component value</a> until an <<EOF-token>> or <<comma-token>> is returned,
2254 appending the returned values (except the final <<EOF-token>> or <<comma-token>>) into a list.
2255 Append the list to <var>list of cvls</var>.
2257 If it was a <<comma-token>> that was returned,
2258 repeat this step.
2260 <li>
2261 Return <var>list of cvls</var>.
2262 </ol>
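<div class="example">
This example is not normative.
A minimal sketch of the steps above,
assuming the component values have already been consumed into a flat list
and that a <<comma-token>> is represented by the hypothetical constant <code>COMMA</code>.
Reaching the end of the list plays the role of the final <<EOF-token>>.

```python
COMMA = ","  # hypothetical stand-in for a <comma-token>


def split_on_commas(component_values):
    """Split a list of component values into a list of component value lists."""
    list_of_cvls = []
    current = []
    for value in component_values:
        if value == COMMA:
            # A comma ends the current list; the step is then repeated.
            list_of_cvls.append(current)
            current = []
        else:
            current.append(value)
    # The end of the input ends the last list, even if that list is empty.
    list_of_cvls.append(current)
    return list_of_cvls
```

Note that an empty input produces a list containing one empty list,
and a trailing comma produces a trailing empty list,
because the step appends a list every time it runs.
</div>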
2274 <h3 id="parser-algorithms">
2275 Parser Algorithms</h3>
2277 The following algorithms comprise the parser.
2278 They are called by the parser entry points above.
2280 These algorithms may be called with a list of either tokens or of component values.
(The difference is that some tokens are replaced by <a>functions</a> and <a>simple blocks</a> in a list of component values.)
Just as the input stream returned EOF code points when it was empty during the tokenization stage,
the lists in this stage must return an <<EOF-token>> when the next token is requested but they are empty.
2285 An algorithm may be invoked with a specific list,
2286 in which case it consumes only that list
2287 (and when that list is exhausted,
2288 it begins returning <<EOF-token>>s).
2289 Otherwise,
2290 it is implicitly invoked with the same list as the invoking algorithm.
2293 <h4 id="consume-list-of-rules">
2294 Consume a list of rules</h4>
2296 To <dfn>consume a list of rules</dfn>:
2298 Create an initially empty list of rules.
2300 Repeatedly consume the <a>next input token</a>:
2302 <dl>
2303 <dt><<whitespace-token>>
2304 <dd>
2305 Do nothing.
2307 <dt><<EOF-token>>
2308 <dd>
2309 Return the list of rules.
2311 <dt><<CDO-token>>
2312 <dt><<CDC-token>>
2313 <dd>
2314 If the <dfn><var>top-level flag</var></dfn> is set,
2315 do nothing.
2317 Otherwise,
2318 <a>reconsume the current input token</a>.
2319 <a>Consume a qualified rule</a>.
2320 If anything is returned,
2321 append it to the list of rules.
2323 <dt><<at-keyword-token>>
2324 <dd>
2325 <a>Reconsume the current input token</a>.
2326 <a>Consume an at-rule</a>.
2327 If anything is returned,
2328 append it to the list of rules.
2330 <dt>anything else
2331 <dd>
2332 <a>Reconsume the current input token</a>.
2333 <a>Consume a qualified rule</a>.
2334 If anything is returned,
2335 append it to the list of rules.
2336 </dl>
2339 <h4 id="consume-at-rule">
2340 Consume an at-rule</h4>
2342 To <dfn>consume an at-rule</dfn>:
2344 <a>Consume the next input token</a>.
2345 Create a new at-rule
2346 with its name set to the value of the <a>current input token</a>,
2347 its prelude initially set to an empty list,
and its block initially set to nothing.
2350 Repeatedly consume the <a>next input token</a>:
2352 <dl>
2353 <dt><<semicolon-token>>
2354 <dt><<EOF-token>>
2355 <dd>
2356 Return the at-rule.
2358 <dt><a href="#tokendef-open-curly"><{-token></a>
2359 <dd>
2360 <a>Consume a simple block</a>
2361 and assign it to the at-rule's block.
2362 Return the at-rule.
2364 <dt><a>simple block</a> with an associated token of <a href="#tokendef-open-curly"><{-token></a>
2365 <dd>
2366 Assign the block to the at-rule's block.
2367 Return the at-rule.
2369 <dt>anything else
2370 <dd>
2371 <a>Reconsume the current input token</a>.
2372 <a>Consume a component value</a>.
2373 Append the returned value to the at-rule's prelude.
2374 </dl>
2377 <h4 id="consume-qualified-rule">
2378 Consume a qualified rule</h4>
2380 To <dfn>consume a qualified rule</dfn>:
2382 Create a new qualified rule
2383 with its prelude initially set to an empty list,
and its block initially set to nothing.
2386 Repeatedly consume the <a>next input token</a>:
2388 <dl>
2389 <dt><<EOF-token>>
2390 <dd>
2391 This is a <a>parse error</a>.
2392 Return nothing.
2394 <dt><a href="#tokendef-open-curly"><{-token></a>
2395 <dd>
2396 <a>Consume a simple block</a>
2397 and assign it to the qualified rule's block.
2398 Return the qualified rule.
2400 <dt><a>simple block</a> with an associated token of <a href="#tokendef-open-curly"><{-token></a>
2401 <dd>
2402 Assign the block to the qualified rule's block.
2403 Return the qualified rule.
2405 <dt>anything else
2406 <dd>
2407 <a>Reconsume the current input token</a>.
2408 <a>Consume a component value</a>.
2409 Append the returned value to the qualified rule's prelude.
2410 </dl>
2413 <h4 id="consume-list-of-declarations">
2414 Consume a list of declarations</h4>
2416 To <dfn>consume a list of declarations</dfn>:
2418 Create an initially empty list of declarations.
2420 Repeatedly consume the <a>next input token</a>:
2422 <dl>
2423 <dt><<whitespace-token>>
2424 <dt><<semicolon-token>>
2425 <dd>
2426 Do nothing.
2428 <dt><<EOF-token>>
2429 <dd>
2430 Return the list of declarations.
2432 <dt><<at-keyword-token>>
2433 <dd>
2434 <a>Reconsume the current input token</a>.
2435 <a>Consume an at-rule</a>.
2436 Append the returned rule to the list of declarations.
2438 <dt><<ident-token>>
2439 <dd>
2440 Initialize a temporary list initially filled with the <a>current input token</a>.
2441 As long as the <a>next input token</a> is anything other than a <<semicolon-token>> or <<EOF-token>>,
2442 <a>consume a component value</a> and append it to the temporary list.
2443 <a>Consume a declaration</a> from the temporary list.
2444 If anything was returned,
2445 append it to the list of declarations.
<dt>anything else
2448 <dd>
2449 This is a <a>parse error</a>.
2450 <a>Reconsume the current input token</a>.
2451 As long as the <a>next input token</a> is anything other than a <<semicolon-token>> or <<EOF-token>>,
2452 <a>consume a component value</a>
2453 and throw away the returned value.
2454 </dl>
2457 <h4 id="consume-declaration">
2458 Consume a declaration</h4>
2460 Note: This algorithm assumes that the <a>next input token</a> has already been checked to be an <<ident-token>>.
2462 To <dfn>consume a declaration</dfn>:
2464 <a>Consume the next input token</a>.
2465 Create a new declaration
2466 with its name set to the value of the <a>current input token</a>
2467 and its value initially set to the empty list.
2469 <ol>
2470 <li>
2471 While the <a>next input token</a> is a <<whitespace-token>>,
2472 <a>consume the next input token</a>.
2474 <li>
2475 If the <a>next input token</a> is anything other than a <<colon-token>>,
2476 this is a <a>parse error</a>.
2477 Return nothing.
2479 Otherwise, <a>consume the next input token</a>.
2481 <li>
2482 As long as the <a>next input token</a> is anything other than an <<EOF-token>>,
2483 <a>consume a component value</a>
2484 and append it to the declaration's value.
2486 <li>
2487 If the last two non-<<whitespace-token>>s in the declaration's value are
2488 a <<delim-token>> with the value "!"
2489 followed by an <<ident-token>> with a value that is an <a>ASCII case-insensitive</a> match for "important",
2490 remove them from the declaration's value
2491 and set the declaration's <var>important</var> flag to true.
2493 <li>
2494 Return the declaration.
2495 </ol>
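<div class="example">
This example is not normative.
A minimal sketch of the "!important" step above,
assuming the declaration's value is a list of hypothetical <code>(type, value)</code> pairs.
The function names are illustrative only.

```python
def ascii_lower(text):
    """Lowercase only A-Z, as an ASCII case-insensitive match requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def strip_important(value):
    """Return (value without a trailing "!important", important flag)."""
    # Positions of the entries that are not whitespace.
    idx = [i for i, (kind, _) in enumerate(value) if kind != "whitespace"]
    if len(idx) >= 2:
        bang, word = value[idx[-2]], value[idx[-1]]
        if (bang == ("delim", "!")
                and word[0] == "ident"
                and ascii_lower(word[1]) == "important"):
            # Remove exactly those two entries; surrounding whitespace stays.
            kept = [v for i, v in enumerate(value) if i not in (idx[-2], idx[-1])]
            return kept, True
    return list(value), False
```
</div>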
2498 <h4 id="consume-component-value">
2499 Consume a component value</h4>
2501 To <dfn>consume a component value</dfn>:
2503 <a>Consume the next input token</a>.
2505 If the <a>current input token</a>
2506 is a <a href="#tokendef-open-curly"><{-token></a>, <a href="#tokendef-open-square"><[-token></a>, or <a href="#tokendef-open-paren"><(-token></a>,
2507 <a>consume a simple block</a>
2508 and return it.
2510 Otherwise, if the <a>current input token</a>
2511 is a <<function-token>>,
2512 <a>consume a function</a>
2513 and return it.
2515 Otherwise, return the <a>current input token</a>.
2518 <h4 id="consume-simple-block">
2519 Consume a simple block</h4>
2521 Note: This algorithm assumes that the <a>current input token</a> has already been checked to be an <a href="#tokendef-open-curly"><{-token></a>, <a href="#tokendef-open-square"><[-token></a>, or <a href="#tokendef-open-paren"><(-token></a>.
2523 To <dfn>consume a simple block</dfn>:
2525 The <dfn>ending token</dfn> is the mirror variant of the <a>current input token</a>.
2526 (E.g. if it was called with <a href="#tokendef-open-square"><[-token></a>, the <a>ending token</a> is <a href="#tokendef-close-square"><]-token></a>.)
2528 Create a <a>simple block</a> with its associated token set to the <a>current input token</a>
and with a value which is initially an empty list.
2531 Repeatedly consume the <a>next input token</a> and process it as follows:
2533 <dl>
2534 <dt><<EOF-token>>
2535 <dt><a>ending token</a>
2536 <dd>
2537 Return the block.
2539 <dt>anything else
2540 <dd>
2541 <a>Reconsume the current input token</a>.
2542 <a>Consume a component value</a>
2543 and append it to the value of the block.
2544 </dl>
2547 <h4 id="consume-function">
2548 Consume a function</h4>
2550 Note: This algorithm assumes that the <a>current input token</a> has already been checked to be a <<function-token>>.
2552 To <dfn>consume a function</dfn>:
2554 Create a function with a name equal to the value of the <a>current input token</a>,
2555 and with a value which is initially an empty list.
2557 Repeatedly consume the <a>next input token</a> and process it as follows:
2559 <dl>
2560 <dt><<EOF-token>>
2561 <dt><a href="#tokendef-close-paren"><)-token></a>
2562 <dd>
2563 Return the function.
2565 <dt>anything else
2566 <dd>
2567 <a>Reconsume the current input token</a>.
2568 <a>Consume a component value</a>
2569 and append the returned value
2570 to the function's value.
2571 </dl>
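<div class="example">
This example is not normative.
The three mutually recursive algorithms above
("consume a component value", "consume a simple block", and "consume a function")
can be sketched as follows.
The token representation is an assumption made for illustration:
plain strings for most tokens,
a <code>("function", name)</code> pair for a <<function-token>>,
and dictionaries for the resulting blocks and functions.
Instead of reconsuming, the sketch peeks at the token before consuming it.

```python
EOF = object()  # hypothetical stand-in for the <EOF-token>
MIRROR = {"{": "}", "[": "]", "(": ")"}  # opening token -> ending token


def consume_component_value(tokens, pos):
    """Return (component value, position after it)."""
    if pos >= len(tokens):
        return EOF, pos
    token = tokens[pos]
    pos += 1
    if token in MIRROR:
        return consume_simple_block(tokens, pos, token)
    if isinstance(token, tuple) and token[0] == "function":
        return consume_function(tokens, pos, token[1])
    return token, pos


def consume_simple_block(tokens, pos, opener):
    """Collect values until the mirror of `opener` or the end of input."""
    ending = MIRROR[opener]
    value = []
    while pos < len(tokens) and tokens[pos] != ending:
        item, pos = consume_component_value(tokens, pos)
        value.append(item)
    # Step over the ending token (harmless if the input simply ran out).
    return {"type": "block", "token": opener, "value": value}, pos + 1


def consume_function(tokens, pos, name):
    """Collect values until ")" or the end of input."""
    value = []
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = consume_component_value(tokens, pos)
        value.append(item)
    return {"type": "function", "name": name, "value": value}, pos + 1
```

A closing token that does not match the innermost open block
is returned as an ordinary component value,
and an unclosed block or function is ended by the end of the input.
</div>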
2583 <h2 id="anb-microsyntax">
2584 The <var>An+B</var> microsyntax</h2>
2586 Several things in CSS,
such as the '':nth-child()'' pseudo-class,
2588 need to indicate indexes in a list.
2589 The <var>An+B</var> microsyntax is useful for this,
2590 allowing an author to easily indicate single elements
2591 or all elements at regularly-spaced intervals in a list.
2593 The <dfn export>An+B</dfn> notation defines an integer step (<dfn>A</dfn>) and offset (<dfn>B</dfn>),
2594 and represents the <var>An+B</var>th elements in a list,
2595 for every positive integer or zero value of <var>n</var>,
2596 with the first element in the list having index 1 (not 0).
2598 For values of <var>A</var> and <var>B</var> greater than 0,
2599 this effectively divides the list into groups of <var>A</var> elements
2600 (the last group taking the remainder),
and selects the <var>B</var>th element of each group.
2603 The <var>An+B</var> notation also accepts the ''even'' and ''odd'' keywords,
2604 which have the same meaning as ''2n'' and ''2n+1'', respectively.
2606 <div class="example">
2607 <p>Examples:
2608 <pre><!--
2609 -->2n+0 /* represents all of the even elements in the list */
<!--
2610 -->even /* same */
<!--
2611 -->4n+1 /* represents the 1st, 5th, 9th, 13th, etc. elements in the list */</pre>
2612 </div>
2614 The values of <var>A</var> and <var>B</var> can be negative,
2615 but only the positive results of <var>An+B</var>,
for <var>n</var> ≥ 0,
2617 are used.
2619 <div class="example">
2620 <p>Example:
2621 <pre><!--
2622 -->-1n+6 /* represents the first 6 elements of the list */
<!--
2623 -->-4n+10 /* represents the 2nd, 6th, and 10th elements of the list */
2624 </pre>
2625 </div>
2627 If both <var>A</var> and <var>B</var> are 0,
2628 the pseudo-class represents no element in the list.
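<div class="example">
This example is not normative.
A minimal sketch of the matching rule described above:
a 1-based index is represented if it equals <var>An+B</var> for some integer <var>n</var> ≥ 0.
The function name <code>anb_matches</code> is hypothetical.

```python
def anb_matches(a, b, index):
    """True if the 1-based `index` equals a*n + b for some integer n >= 0."""
    if index < 1:
        return False  # only positive results are used
    if a == 0:
        return index == b
    # Solve index = a*n + b for n; n must be a non-negative integer.
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0
```

For instance, <code>[i for i in range(1, 14) if anb_matches(4, 1, i)]</code>
evaluates to <code>[1, 5, 9, 13]</code>,
and <code>anb_matches(0, 0, i)</code> is false for every index.
</div>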
2630 <h3 id='anb-syntax'>
2631 Informal Syntax Description</h3>
2633 <em>This section is non-normative.</em>
2635 When <var>A</var> is 0, the <var>An</var> part may be omitted
2636 (unless the <var>B</var> part is already omitted).
2637 When <var>An</var> is not included
2638 and <var>B</var> is non-negative,
2639 the ''+'' sign before <var>B</var> (when allowed)
2640 may also be omitted.
2641 In this case the syntax simplifies to just <var>B</var>.
2643 <div class="example">
2644 <p>Examples:
2645 <pre><!--
2646 -->0n+5 /* represents the 5th element in the list */
<!--
2647 -->5 /* same */</pre>
2648 </div>
2650 When <var>A</var> is 1 or -1,
2651 the <code>1</code> may be omitted from the rule.
2653 <div class="example">
2654 <p>Examples:
2655 <p>The following notations are therefore equivalent:
2656 <pre><!--
2657 -->1n+0 /* represents all elements in the list */
<!--
2658 -->n+0 /* same */
<!--
2659 -->n /* same */</pre>
2660 </div>
2662 If <var>B</var> is 0, then every <var>A</var>th element is picked.
2663 In such a case,
2664 the <var>+B</var> (or <var>-B</var>) part may be omitted
2665 unless the <var>A</var> part is already omitted.
2667 <div class="example">
2668 <p>Examples:
2669 <pre><!--
2670 -->2n+0 /* represents every even element in the list */
<!--
2671 -->2n /* same */</pre>
2672 </div>
When <var>B</var> is negative, its minus sign replaces the ''+'' sign.
2676 <div class="example">
2677 <p>Valid example:
2678 <pre>3n-6</pre>
2679 <p>Invalid example:
2680 <pre>3n + -6</pre>
2681 </div>
2683 Whitespace is permitted on either side of the ''+'' or ''-''
2684 that separates the <var>An</var> and <var>B</var> parts when both are present.
2686 <div class="example">
2687 <p>Valid Examples with white space:
2688 <pre><!--
2689 -->3n + 1
<!--
2690 -->+3n - 2
<!--
2691 -->-n+ 6
<!--
2692 -->+6</pre>
2693 <p>Invalid Examples with white space:
2694 <pre><!--
2695 -->3 n
<!--
2696 -->+ 2n
<!--
2697 -->+ 2</pre>
2698 </div>
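<div class="example">
This example is not normative.
The informal rules in this section can be approximated at the string level with two regular expressions.
This is only a sketch under stated assumptions:
it works on text rather than on CSS tokens,
it ignores comments and escapes,
and the name <code>parse_anb</code> is hypothetical.

```python
import re

# An part, optionally followed by a sign and an unsigned B part.
# Whitespace is only allowed around the sign that separates An and B.
_AN_B = re.compile(r"([+-]?[0-9]*)[nN](?:[ \t\n]*([+-])[ \t\n]*([0-9]+))?")
# B part alone, with an optional sign attached directly to the digits.
_B_ONLY = re.compile(r"[+-]?[0-9]+")


def parse_anb(text):
    """Return (A, B), or None if `text` is not in the informal An+B form."""
    text = text.strip(" \t\n")
    keyword = text.lower()
    if keyword == "odd":
        return (2, 1)
    if keyword == "even":
        return (2, 0)
    if _B_ONLY.fullmatch(text):
        return (0, int(text))
    match = _AN_B.fullmatch(text)
    if match is None:
        return None
    a_text, sign, b_text = match.groups()
    if a_text in ("", "+"):
        a = 1           # "n" and "+n" mean A = 1
    elif a_text == "-":
        a = -1          # "-n" means A = -1
    else:
        a = int(a_text)
    b = 0 if b_text is None else int(b_text)
    if sign == "-":
        b = -b
    return (a, b)
```

With this sketch, <code>parse_anb("3n + 1")</code> gives <code>(3, 1)</code>
and <code>parse_anb("-n+ 6")</code> gives <code>(-1, 6)</code>,
while <code>"3 n"</code>, <code>"+ 2n"</code>, <code>"+ 2"</code>, and <code>"3n + -6"</code> are all rejected.
</div>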
2701 <h3 id="the-anb-type">
2702 The <code><an+b></code> type</h3>
2704 The <var>An+B</var> notation was originally defined using a slightly different tokenizer than the rest of CSS,
2705 resulting in a somewhat odd definition when expressed in terms of CSS tokens.
2706 This section describes how to recognize the <var>An+B</var> notation in terms of CSS tokens
2707 (thus defining the <var><an+b></var> type for CSS grammar purposes),
2708 and how to interpret the CSS tokens to obtain values for <var>A</var> and <var>B</var>.
2710 The <var><an+b></var> type is defined
2711 (using the <a href="http://www.w3.org/TR/css3-values/#value-defs">Value Definition Syntax in the Values & Units spec</a>)
2712 as:
2714 <pre class='prod'>
<dfn id="anb-production"><an+b></dfn> =
odd | even |
<var><integer></var> |
<var><n-dimension></var> |
'+'?<sup><a href="#anb-plus">†</a></sup> n |
-n |
<var><ndashdigit-dimension></var> |
'+'?<sup><a href="#anb-plus">†</a></sup> <var><ndashdigit-ident></var> |
<var><dashndashdigit-ident></var> |
<var><n-dimension></var> <var><signed-integer></var> |
'+'?<sup><a href="#anb-plus">†</a></sup> n <var><signed-integer></var> |
-n <var><signed-integer></var> |
<var><ndash-dimension></var> <var><signless-integer></var> |
'+'?<sup><a href="#anb-plus">†</a></sup> n- <var><signless-integer></var> |
-n- <var><signless-integer></var> |
<var><n-dimension></var> ['+' | '-'] <var><signless-integer></var> |
'+'?<sup><a href="#anb-plus">†</a></sup> n ['+' | '-'] <var><signless-integer></var> |
-n ['+' | '-'] <var><signless-integer></var>
2738 </pre>
2740 where:
2742 <ul>
2743 <li><dfn><code><n-dimension></code></dfn> is a <<dimension-token>> with its type flag set to "integer", and a unit that is an <a>ASCII case-insensitive</a> match for "n"
2744 <li><dfn><code><ndash-dimension></code></dfn> is a <<dimension-token>> with its type flag set to "integer", and a unit that is an <a>ASCII case-insensitive</a> match for "n-"
2745 <li><dfn><code><ndashdigit-dimension></code></dfn> is a <<dimension-token>> with its type flag set to "integer", and a unit that is an <a>ASCII case-insensitive</a> match for "n-*", where "*" is a series of one or more <a>digits</a>
2746 <li><dfn><code><ndashdigit-ident></code></dfn> is an <<ident-token>> whose value is an <a>ASCII case-insensitive</a> match for "n-*", where "*" is a series of one or more <a>digits</a>
2747 <li><dfn><code><dashndashdigit-ident></code></dfn> is an <<ident-token>> whose value is an <a>ASCII case-insensitive</a> match for "-n-*", where "*" is a series of one or more <a>digits</a>
2748 <li><dfn><code><integer></code></dfn> is a <<number-token>> with its type flag set to "integer"
2749 <li><dfn><code><signed-integer></code></dfn> is a <<number-token>> with its type flag set to "integer", and whose representation starts with "+" or "-"
<li><dfn><code><signless-integer></code></dfn> is a <<number-token>> with its type flag set to "integer", and whose representation starts with a <a>digit</a>
2751 </ul>
2753 <p id="anb-plus">
<sup>†</sup>: When a plus sign (+) precedes an ident starting with "n", as in the cases marked above,
2755 there must be no whitespace between the two tokens,
2756 or else the tokens do not match the above grammar.
2757 Whitespace is valid (and ignored) between any other two tokens.
2759 The clauses of the production are interpreted as follows:
2761 <dl>
2762 <dt>''odd''
2763 <dd>
2764 <var>A</var> is 2, <var>B</var> is 1.
2766 <dt>''even''
2767 <dd>
2768 <var>A</var> is 2, <var>B</var> is 0.
2770 <dt><code><var><integer></var></code>
2771 <dd>
<var>A</var> is 0, <var>B</var> is the integer's value.
2774 <dt><code><var><n-dimension></var></code>
2775 <dt><code>'+'? n</code>
2776 <dt><code>-n</code>
2777 <dd>
2778 <var>A</var> is the dimension's value, 1, or -1, respectively.
2779 <var>B</var> is 0.
2781 <dt><code><var><ndashdigit-dimension></var></code>
2782 <dt><code>'+'? <var><ndashdigit-ident></var></code>
2783 <dd>
2784 <var>A</var> is the dimension's value or 1, respectively.
2785 <var>B</var> is the dimension's unit or ident's value, respectively,
2786 with the first <a>code point</a> removed and the remainder interpreted as a base-10 number.
2787 <span class=note>B is negative.</span>
2789 <dt><code><var><dashndashdigit-ident></var></code>
2790 <dd>
2791 <var>A</var> is -1.
2792 <var>B</var> is the ident's value, with the first two <a>code points</a> removed and the remainder interpreted as a base-10 number.
2793 <span class=note>B is negative.</span>
2795 <dt><code><var><n-dimension></var> <var><signed-integer></var></code>
2796 <dt><code>'+'? n <var><signed-integer></var></code>
2797 <dt><code>-n <var><signed-integer></var></code>
2798 <dd>
2799 <var>A</var> is the dimension's value, 1, or -1, respectively.
<var>B</var> is the integer's value.
2802 <dt><code><var><ndash-dimension></var> <var><signless-integer></var></code>
2803 <dt><code>'+'? n- <var><signless-integer></var></code>
2804 <dt><code>-n- <var><signless-integer></var></code>
2805 <dd>
2806 <var>A</var> is the dimension's value, 1, or -1, respectively.
<var>B</var> is the negation of the integer's value.
2809 <dt><code><var><n-dimension></var> ['+' | '-'] <var><signless-integer></var></code>
2810 <dt><code>'+'? n ['+' | '-'] <var><signless-integer></var></code>
2811 <dt><code>-n ['+' | '-'] <var><signless-integer></var></code>
2812 <dd>
2813 <var>A</var> is the dimension's value, 1, or -1, respectively.
<var>B</var> is the integer's value.
If a <code>'-'</code> was provided between the two, <var>B</var> is instead the negation of the integer's value.
2816 </dl>
2828 <h2 id="urange">
2829 The Unicode-Range microsyntax</h2>
2831 Some constructs,
2832 such as the 'unicode-range' descriptor for the ''@font-face'' rule,
need a way to describe one or more Unicode code points.
The <dfn><urange></dfn> production represents a range of one or more Unicode code points.
2836 Informally, the <<urange>> production has three forms:
2838 <dl>
2839 <dt>U+0001
2840 <dd>
2841 Defines a range consisting of a single code point,
2842 in this case the code point "1".
2844 <dt>U+0001-00ff
2845 <dd>
Defines a range of code points between the first and the second value,
2847 in this case the range between "1" and "ff" (255 in decimal) inclusive.
2849 <dt>U+00??
2850 <dd>
Defines a range of code points where the "?" characters range over all <a>hex digits</a>,
2852 in this case defining the same as the value ''U+0000-00ff''.
2853 </dl>
2855 In each form, a maximum of 6 digits is allowed for each hexadecimal number
2856 (if you treat "?" as a hexadecimal digit).
2858 <h3 id="urange-syntax">
2859 The <<urange>> type</h3>
2861 The <<urange>> notation was originally defined as a primitive token in CSS,
2862 but it is used very rarely,
2863 and collides with legitimate <<ident-token>>s in confusing ways.
2864 This section describes how to recognize the <<urange>> notation
2865 in terms of existing CSS tokens,
and how to interpret it as a range of Unicode code points.
2868 Note: The syntax described here is intentionally very low-level,
2869 and geared toward implementors.
2870 Authors should instead read the informal syntax description in the previous section,
2871 as it contains all information necessary to use <<urange>>,
2872 and is actually readable.
2874 The <<urange>> type is defined
2875 (using the <a href="http://www.w3.org/TR/css3-values/#value-defs">Value Definition Syntax in the Values & Units spec</a>) as:
2877 <pre class="prod">
2878 <<urange>> =
2879 u '+' <<ident-token>> '?'* |
2880 u <<dimension-token>> '?'* |
2881 u <<number-token>> '?'* |
2882 u <<number-token>> <<dimension-token>> |
2883 u <<number-token>> <<number-token>> |
2884 u '+' '?'+
2885 </pre>
2887 In this production,
2888 no whitespace can occur between any of the tokens.
The <<urange>> production represents a range of one or more contiguous Unicode code points
2891 as a <var>start value</var> and an <var>end value</var>,
2892 which are non-negative integers.
2893 To interpret the production above into a range,
2894 execute the following steps in order:
2896 1. Skipping the first ''u'' token,
2897 concatenate the representations of all the tokens in the production together
2898 (or, in the case of <<dimension-token>>s,
2899 the representation followed by the unit).
2900 Let this be <var>text</var>.
2902 2. If the first character of <var>text</var> is U+002B PLUS SIGN,
2903 consume it.
2904 Otherwise,
2905 this is an invalid <<urange>>,
2906 and this algorithm must exit.
3. Consume as many <a>hex digits</a> from <var>text</var> as possible,
then consume as many U+003F QUESTION MARK (?) <a>code points</a> as possible.
2910 If zero <a>code points</a> were consumed,
2911 or more than six <a>code points</a> were consumed,
2912 this is an invalid <<urange>>,
2913 and this algorithm must exit.
2915 If any U+003F QUESTION MARK (?) <a>code points</a> were consumed, then:
2917 1. If there are any <a>code points</a> left in <var>text</var>,
2918 this is an invalid <<urange>>,
2919 and this algorithm must exit.
2921 2. Interpret the consumed <a>code points</a> as a hexadecimal number,
2922 with the U+003F QUESTION MARK (?) <a>code points</a>
2923 replaced by U+0030 DIGIT ZERO (0) <a>code points</a>.
2924 This is the <var>start value</var>.
2926 3. Interpret the consumed <a>code points</a> as a hexadecimal number again,
2927 with the U+003F QUESTION MARK (?) <a>code points</a>
2928 replaced by U+0046 LATIN CAPITAL LETTER F (F) <a>code points</a>.
2929 This is the <var>end value</var>.
2931 4. Exit this algorithm.
2933 Otherwise, interpret the consumed <a>code points</a> as a hexadecimal number.
2934 This is the <var>start value</var>.
2936 4. If there are no <a>code points</a> left in <var>text</var>,
the <var>end value</var> is the same as the <var>start value</var>.
2938 Exit this algorithm.
2940 5. If the next <a>code point</a> in <var>text</var> is U+002D HYPHEN-MINUS (-),
2941 consume it.
2942 Otherwise,
2943 this is an invalid <<urange>>,
2944 and this algorithm must exit.
2946 6. Consume as many <a>hex digits</a> as possible from <var>text</var>.
2948 If zero <a>hex digits</a> were consumed,
or more than six <a>hex digits</a> were consumed,
2950 this is an invalid <<urange>>,
2951 and this algorithm must exit.
2952 If there are any <a>code points</a> left in <var>text</var>,
2953 this is an invalid <<urange>>,
2954 and this algorithm must exit.
2956 7. Interpret the consumed <a>code points</a> as a hexadecimal number.
2957 This is the <var>end value</var>.
To determine what code points the <<urange>> represents:
2961 1. If <var>end value</var> is greater than the <a>maximum allowed code point</a>,
2962 the <<urange>> is invalid and a syntax error.
2964 2. If <var>start value</var> is greater than <var>end value</var>,
2965 the <<urange>> is invalid and a syntax error.
3. Otherwise, the <<urange>> represents a contiguous range of code points from <var>start value</var> to <var>end value</var>, inclusive.
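<div class='example'>
The two algorithms above can be sketched in Python as follows.
This is a non-normative illustration, not a reference implementation:
the function name <code>interpret_urange</code> and its helpers are hypothetical,
the input is assumed to be the already-concatenated <var>text</var> from step 1
(everything after the leading ''u''),
and the <a>maximum allowed code point</a> is assumed to be U+10FFFF.
An invalid <<urange>> is reported by returning <code>None</code>.

```python
MAX_CODE_POINT = 0x10FFFF  # assumed value of the "maximum allowed code point"
HEX_DIGITS = set("0123456789abcdefABCDEF")


def _take(text, pos, allowed):
    """Advance pos while text[pos] is in allowed; return the new position."""
    while pos < len(text) and text[pos] in allowed:
        pos += 1
    return pos


def _check_range(start, end):
    """Apply the final validity checks; return (start, end) or None."""
    if end > MAX_CODE_POINT or start > end:
        return None
    return (start, end)


def interpret_urange(text):
    """Interpret the concatenated token text that follows the 'u' token."""
    # Step 2: the text must begin with a plus sign.
    if not text.startswith("+"):
        return None
    # Step 3: hex digits, then question marks, one to six code points in total.
    digits_end = _take(text, 1, HEX_DIGITS)
    marks_end = _take(text, digits_end, {"?"})
    consumed = text[1:marks_end]
    if not 1 <= len(consumed) <= 6:
        return None
    if marks_end > digits_end:
        # Question marks were consumed: nothing may follow them.
        if marks_end != len(text):
            return None
        return _check_range(int(consumed.replace("?", "0"), 16),
                            int(consumed.replace("?", "F"), 16))
    start = int(consumed, 16)
    # Step 4: a single code point.
    if marks_end == len(text):
        return _check_range(start, start)
    # Step 5: otherwise a hyphen-minus must follow.
    if text[marks_end] != "-":
        return None
    # Steps 6 and 7: one to six hex digits, and nothing after them.
    tail = text[marks_end + 1:]
    tail_end = _take(tail, 0, HEX_DIGITS)
    if not 1 <= tail_end <= 6 or tail_end != len(tail):
        return None
    return _check_range(start, int(tail, 16))
```

Under these assumptions, <code>interpret_urange("+4??")</code> yields the range 0x400 to 0x4FF,
while <code>interpret_urange("+??????")</code> is rejected
because its <var>end value</var> exceeds the <a>maximum allowed code point</a>.
</div>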
2969 Note: The syntax of <<urange>> is intentionally fairly wide;
2970 its patterns capture every possible token sequence
2971 that the informal syntax can generate.
2972 However, it requires no whitespace between its constituent tokens,
2973 which renders it fairly safe to use in practice.
2974 Even grammars which have a <<urange>> followed by a <<number>> or <<dimension>>
2975 (which might appear to be ambiguous
2976 if an author specifies the <<urange>> with the ''u <<number>>'' clause)
2977 are actually quite safe,
2978 as an author would have to intentionally separate the <<urange>> and the <<number>>/<<dimension>>
2979 with a comment rather than whitespace
2980 for it to be ambiguous.
2981 Thus, while it's <em>possible</em> for authors to write things that are parsed in confusing ways,
2982 the actual code they'd have to write to cause the confusion is, itself, confusing and rare.
2995 <h2 id='rule-defs'>
2996 Defining Grammars for Rules and Other Values</h2>
2998 The <a href="http://www.w3.org/TR/css3-values/">Values</a> spec defines how to specify a grammar for properties.
2999 This section does the same, but for rules.
3001 Just like in property grammars,
the notation <code>&lt;foo&gt;</code> refers to the "foo" grammar term,
assumed to be defined elsewhere.
Substituting the <code>&lt;foo&gt;</code> for its definition results in a semantically identical grammar.
3006 Several types of tokens are written literally, without quotes:
3008 <ul>
<li><<ident-token>>s (such as <code>auto</code>, <code>disc</code>, etc.), which are simply written as their value.
3010 <li><<at-keyword-token>>s, which are written as an @ character followed by the token's value, like <code>@media</code>.
3011 <li><<function-token>>s, which are written as the function name followed by a ( character, like <code>translate(</code>.
<li>The <<colon-token>> (written as <code>:</code>), <<comma-token>> (written as <code>,</code>), <<semicolon-token>> (written as <code>;</code>), <a href="#tokendef-open-paren">&lt;(-token&gt;</a>, <a href="#tokendef-close-paren">&lt;)-token&gt;</a>, <a href="#tokendef-open-curly">&lt;{-token&gt;</a>, and <a href="#tokendef-close-curly">&lt;}-token&gt;</a>s.
3013 </ul>
3015 Tokens match if their value is an <a>ASCII case-insensitive</a> match
3016 for the value defined in the grammar.
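<div class='example'>
As a non-normative sketch, an <a>ASCII case-insensitive</a> comparison can be written in Python as below.
The helper names <code>ascii_lower</code> and <code>token_matches</code> are hypothetical.
The sketch deliberately avoids <code>str.lower()</code>,
which applies Unicode case mappings
and would therefore treat some non-ASCII code points
(such as U+212A KELVIN SIGN) as equal to ASCII letters.

```python
def ascii_lower(s):
    """Lowercase only the ASCII letters A-Z; leave every other code point alone."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def token_matches(token_value, grammar_value):
    """True if the token's value is an ASCII case-insensitive match for the grammar's value."""
    return ascii_lower(token_value) == ascii_lower(grammar_value)
```

With this definition, a token with the value <code>AUTO</code> matches the grammar term <code>auto</code>,
but a token whose value is U+212A KELVIN SIGN does not match <code>k</code>.
</div>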
3018 <p class=note>
3019 Although it is possible, with <a>escaping</a>,
3020 to construct an <<ident-token>> whose value ends with <code>(</code> or starts with <code>@</code>,
such a token is not a <<function-token>> or an <<at-keyword-token>>
3022 and does not match corresponding grammar definitions.
3024 <<delim-token>>s are written with their value enclosed in single quotes.
3025 For example, a <<delim-token>> containing the "+" <a>code point</a> is written as <code>'+'</code>.
Similarly, the <a href="#tokendef-open-square">&lt;[-token&gt;</a> and <a href="#tokendef-close-square">&lt;]-token&gt;</a>s must be written in single quotes,
3027 as they're used by the syntax of the grammar itself to group clauses.
3028 <<whitespace-token>> is never indicated in the grammar;
3029 <<whitespace-token>>s are allowed before, after, and between any two tokens,
3030 unless explicitly specified otherwise in prose definitions.
3031 (For example, if the prelude of a rule is a selector,
3032 whitespace is significant.)
3034 When defining a function or a block,
3035 the ending token must be specified in the grammar,
3036 but if it's not present in the eventual token stream,
3037 it still matches.
3039 <div class='example'>
3040 For example, the syntax of the ''translateX()'' function is:
3042 <pre>translateX( <<translation-value>> )</pre>
3044 However, the stylesheet may end with the function unclosed, like:
<pre>.foo { transform: translateX(50px</pre>
The CSS parser parses this as a style rule containing one declaration,
whose value is a function named "translateX".
3050 This matches the above grammar,
3051 even though the ending token didn't appear in the token stream,
3052 because by the time the parser is finished,
3053 the presence of the ending token is no longer possible to determine;
3054 all you have is the fact that there's a block and a function.
3055 </div>
3057 <h3 id='declaration-rule-list'>
3058 Defining Block Contents: the <<declaration-list>>, <<rule-list>>, and <<stylesheet>> productions</h3>
3060 The CSS parser is agnostic as to the contents of blocks,
3061 such as those that come at the end of some at-rules.
3062 Defining the generic grammar of the blocks in terms of tokens is non-trivial,
3063 but there are dedicated and unambiguous algorithms defined for parsing this.
The <dfn>&lt;declaration-list&gt;</dfn> production represents a list of declarations.
3066 It may only be used in grammars as the sole value in a block,
3067 and represents that the contents of the block must be parsed using the <a>consume a list of declarations</a> algorithm.
Similarly, the <dfn>&lt;rule-list&gt;</dfn> production represents a list of rules,
3070 and may only be used in grammars as the sole value in a block.
3071 It represents that the contents of the block must be parsed using the <a>consume a list of rules</a> algorithm.
Finally, the <dfn>&lt;stylesheet&gt;</dfn> production represents a list of rules.
3074 It is identical to <<rule-list>>,
3075 except that blocks using it default to accepting all rules
3076 that aren't otherwise limited to a particular context.
3078 <div class='example'>
3079 For example, the ''@font-face'' rule is defined to have an empty prelude,
3080 and to contain a list of declarations.
3081 This is expressed with the following grammar:
3083 <pre>@font-face { <<declaration-list>> }</pre>
3085 This is a complete and sufficient definition of the rule's grammar.
3087 For another example,
3088 ''@keyframes'' rules are more complex,
interpreting their prelude as a name and containing keyframes rules in their block.
3090 Their grammar is:
3092 <pre>@keyframes <<keyframes-name>> { <<rule-list>> }</pre>
3093 </div>
3095 For rules that use <<declaration-list>>,
3096 the spec for the rule must define which properties, descriptors, and/or at-rules are valid inside the rule;
3097 this may be as simple as saying "The @foo rule accepts the properties/descriptors defined in this specification/section.",
3098 and extension specs may simply say "The @foo rule additionally accepts the following properties/descriptors.".
3099 Any declarations or at-rules found inside the block that are not defined as valid
3100 must be removed from the rule's value.
3102 Within a <<declaration-list>>,
3103 <code>!important</code> is automatically invalid on any descriptors.
3104 If the rule accepts properties,
3105 the spec for the rule must define whether the properties interact with the cascade,
3106 and with what specificity.
3107 If they don't interact with the cascade,
3108 properties containing <code>!important</code> are automatically invalid;
3109 otherwise using <code>!important</code> is valid and has its usual effect on the cascade origin of the property.
3111 <div class='example'>
3112 For example, the grammar for ''@font-face'' in the previous example must,
3113 in addition to what is written there,
3114 define that the allowed declarations are the descriptors defined in the Fonts spec.
3115 </div>
3117 For rules that use <<rule-list>>,
3118 the spec for the rule must define what types of rules are valid inside the rule,
3119 same as <<declaration-list>>,
3120 and unrecognized rules must similarly be removed from the rule's value.
3122 <div class='example'>
3123 For example, the grammar for ''@keyframes'' in the previous example must,
3124 in addition to what is written there,
3125 define that the only allowed rules are <<keyframe-rule>>s,
3126 which are defined as:
3128 <pre><<keyframe-rule>> = <<keyframe-selector>> { <<declaration-list>> }</pre>
3130 Keyframe rules, then,
3131 must further define that they accept as declarations all animatable CSS properties,
3132 plus the 'animation-timing-function' property,
3133 but that they do not interact with the cascade.
3134 </div>
3136 For rules that use <<stylesheet>>,
3137 all rules are allowed by default,
3138 but the spec for the rule may define what types of rules are <em>invalid</em> inside the rule.
3140 <div class='example'>
3141 For example, the ''@media'' rule accepts anything that can be placed in a stylesheet,
3142 except more ''@media'' rules.
3143 As such, its grammar is:
3145 <pre>@media <<media-query-list>> { <<stylesheet>> }</pre>
It additionally defines a restriction that the <<stylesheet>> cannot contain ''@media'' rules,
3148 which causes them to be dropped from the outer rule's value if they appear.
3149 </div>
3151 <h3 id="any-value">
3152 Defining Arbitrary Contents: the <<declaration-value>> and <<any-value>> productions</h3>
3154 In some grammars,
3155 it is useful to accept any reasonable input in the grammar,
3156 and do more specific error-handling on the contents manually
3157 (rather than simply invalidating the construct,
3158 as grammar mismatches tend to do).
3160 For example, <a>custom properties</a> allow any reasonable value,
3161 as they can contain arbitrary pieces of other CSS properties,
3162 or be used for things that aren't part of existing CSS at all.
3163 For another example, the <<general-enclosed>> production in Media Queries
3164 defines the bounds of what future syntax MQs will allow,
3165 and uses special logic to deal with "unknown" values.
3167 To aid in this, two additional productions are defined:
The <dfn>&lt;declaration-value&gt;</dfn> production matches <em>any</em> sequence of one or more tokens,
3170 so long as the sequence does not contain
3171 <<bad-string-token>>,
3172 <<bad-url-token>>,
3173 unmatched <<)-token>>, <<]-token>>, or <<}-token>>,
3174 or top-level <<semicolon-token>> tokens or <<delim-token>> tokens with a value of "!".
3175 It represents the entirety of what a valid declaration can have as its value.
The <dfn>&lt;any-value&gt;</dfn> production is identical to <<declaration-value>>,
3178 but also allows top-level <<semicolon-token>> tokens
3179 and <<delim-token>> tokens with a value of "!".
3180 It represents the entirety of what valid CSS can be in any context.
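<div class='example'>
The two productions above can be checked with a single pass over the tokens and a stack of open blocks.
The following Python sketch is non-normative and rests on several assumptions:
tokens are modelled as hypothetical <code>(type, value)</code> pairs,
the type names (<code>"bad-string"</code>, <code>"function"</code>, <code>"semicolon"</code>, and so on) are invented for this illustration,
and a closing token that does not match the innermost open block is treated as unmatched.
A block left open at the end of the sequence is accepted.

```python
# Hypothetical token model: (type, value) pairs.
_CLOSER_FOR = {"(": ")", "[": "]", "{": "}", "function": ")"}
_CLOSERS = {")", "]", "}"}


def _matches(tokens, allow_top_level_delimiters):
    if not tokens:
        return False  # both productions require one or more tokens
    expected = []  # stack of closing token types for the currently open blocks
    for kind, value in tokens:
        if kind in ("bad-string", "bad-url"):
            return False
        if kind in _CLOSER_FOR:
            expected.append(_CLOSER_FOR[kind])
        elif kind in _CLOSERS:
            if not expected or expected[-1] != kind:
                return False  # unmatched closing token
            expected.pop()
        elif not expected and not allow_top_level_delimiters:
            # Top level: ";" and "!" are only allowed by <any-value>.
            if kind == "semicolon" or (kind == "delim" and value == "!"):
                return False
    return True


def matches_declaration_value(tokens):
    return _matches(tokens, allow_top_level_delimiters=False)


def matches_any_value(tokens):
    return _matches(tokens, allow_top_level_delimiters=True)
```

For instance, a semicolon nested inside parentheses is accepted by both checks,
while a top-level semicolon is accepted only by <code>matches_any_value</code>.
</div>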
3192 <h2 id="css-stylesheets">
3193 CSS stylesheets</h2>
3195 To <dfn>parse a CSS stylesheet</dfn>,
3196 first <i>parse a stylesheet</i>.
3197 Interpret all of the resulting top-level <i>qualified rules</i> as <i>style rules</i>, defined below.
3199 If any style rule is <a>invalid</a>,
3200 or any at-rule is not recognized or is invalid according to its grammar or context,
3201 it's a <i>parse error</i>.
3202 Discard that rule.
3204 <h3 id="style-rules">
3205 Style rules</h3>
3207 A <dfn>style rule</dfn> is a <i>qualified rule</i>
3208 that associates a <a href="https://drafts.csswg.org/selectors4/#selector-list">selector list</a> [[!SELECT]]
3209 with a list of property declarations.
3210 They are also called
3211 <a href="http://www.w3.org/TR/CSS21/syndata.html#rule-sets">rule sets</a> in [[!CSS21]].
3212 CSS Cascading and Inheritance [[!CSS3CASCADE]] defines how the declarations inside of style rules participate in the cascade.
3214 The prelude of the qualified rule is parsed as a
3215 <a href="https://drafts.csswg.org/selectors4/#selector-list">selector list</a>.
3216 If this results in an <a href="https://drafts.csswg.org/selectors4/#invalid">invalid selector list</a>,
3217 the entire style rule is <a>invalid</a>.
The content of the qualified rule's block is parsed as a
3220 <a lt="parse a list of declarations">list of declarations</a>.
3221 Unless defined otherwise by another specification or a future level of this specification,
3222 at-rules in that list are <a>invalid</a>
3223 and must be ignored.
Declarations for an unknown CSS property,
or whose value does not match the syntax defined by the property, are <a>invalid</a>
and must be ignored.
The validity of the style rule's contents has no effect on the validity of the style rule itself.
3228 Unless otherwise specified, property names are <a>ASCII case-insensitive</a>.
3230 Note: The names of Custom Properties [[CSS-VARIABLES]] are case-sensitive.
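<div class='example'>
The filtering of a style rule's contents described above might look like the following non-normative Python sketch.
Everything here is an assumption made for illustration:
the <code>("declaration", name, value)</code> and <code>("at-rule", name)</code> item shapes,
the tiny <code>SUPPORTED</code> table standing in for a real property database,
and the simplification that a value is valid only if it is one of a few listed keywords.
Custom properties are assumed to accept any value.

```python
def ascii_lower(s):
    """Lowercase only the ASCII letters A-Z."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


# Toy stand-in for a property database: property name -> accepted keywords.
SUPPORTED = {
    "color": {"red", "green", "blue"},
    "display": {"block", "inline", "none"},
}


def filter_declarations(items):
    """Keep only the valid declarations of a style rule's block."""
    kept = []
    for item in items:
        if item[0] != "declaration":
            continue  # at-rules are invalid here and are ignored
        _, name, value = item
        if name.startswith("--"):
            kept.append((name, value))  # custom property: name is case-sensitive
            continue
        name = ascii_lower(name)  # other property names are ASCII case-insensitive
        if ascii_lower(value) in SUPPORTED.get(name, ()):
            kept.append((name, value))
        # Unknown properties and non-matching values are silently ignored.
    return kept
```

Note that the function never rejects the rule as a whole:
invalid contents are dropped one by one,
and the style rule itself remains valid.
</div>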
3232 <i>Qualified rules</i> at the top-level of a CSS stylesheet are style rules.
3233 Qualified rules in other contexts may or may not be style rules,
3234 as defined by the context.
3236 <p class='example'>
3237 For example, qualified rules inside ''@media'' rules [[CSS3-CONDITIONAL]] are style rules,
3238 but qualified rules inside ''@keyframes'' rules are not [[CSS3-ANIMATIONS]].
3240 <h3 id='charset-rule'>
3241 The ''@charset'' Rule</h3>
3243 The algorithm used to <a>determine the fallback encoding</a> for a stylesheet
3244 looks for a specific byte sequence as the very first few bytes in the file,
3245 which has the syntactic form of an <a>at-rule</a> named "@charset".
3247 However, there is no actual <a>at-rule</a> named <dfn>@charset</dfn>.
3248 When a stylesheet is actually parsed,
3249 any occurrences of an ''@charset'' rule must be treated as an unrecognized rule,
3250 and thus dropped as invalid when the stylesheet is grammar-checked.
3252 Note: The algorithm to <a>parse a stylesheet</a> explicitly drops the first ''@charset'' rule from the document,
3253 before the stylesheet is grammar-checked,
3254 so valid rules that must appear first in the stylesheet,
3255 such as ''@import'',
3256 can still be preceded by an (invalid) ''@charset'' rule
3257 without making themselves invalid.
3259 Note: In CSS 2.1, ''@charset'' was a valid rule.
3260 Some legacy specs may still refer to a ''@charset'' rule,
3261 and explicitly talk about its presence in the stylesheet.
3273 <h2 id="serialization">
3274 Serialization</h2>
3276 The tokenizer described in this specification does not produce tokens for comments,
3277 or otherwise preserve them in any way.
3278 Implementations may preserve the contents of comments and their location in the token stream.
3279 If they do, this preserved information must have no effect on the parsing step.
3281 This specification does not define how to serialize CSS in general,
3282 leaving that task to the CSSOM and individual feature specifications.
3283 In particular, the serialization of comments and whitespace is not defined.
3285 The only requirement for serialization is that it must "round-trip" with parsing,
3286 that is, parsing the stylesheet must produce the same data structures as
3287 parsing, serializing, and parsing again,
3288 except for consecutive <<whitespace-token>>s,
3289 which may be collapsed into a single token.
3291 Note: This exception can exist because
3292 CSS grammars always interpret any amount of whitespace as identical to a single space.
3294 <div class=note id='serialization-tables'>
3295 To satisfy this requirement:
3297 <ul>
3298 <li>
3299 A <<delim-token>> containing U+005C REVERSE SOLIDUS (\)
3300 must be serialized as U+005C REVERSE SOLIDUS
3301 followed by a <a>newline</a>.
3302 (The tokenizer only ever emits such a token followed by a <<whitespace-token>>
3303 that starts with a newline.)
3305 <li>
3306 A <<hash-token>> with the "unrestricted" type flag may not need
3307 as much escaping as the same token with the "id" type flag.
3309 <li>
3310 The unit of a <<dimension-token>> may need escaping
3311 to disambiguate with scientific notation.
3313 <li>
3314 For any consecutive pair of tokens,
3315 if the first token shows up in the row headings of either of the following two tables,
3316 and the second token shows up in the column headings,
and there's a ✗ in the cell denoted by the intersection of the chosen row and column,
3318 the pair of tokens must be serialized with a comment between them.
3320 If the tokenizer preserves comments,
3321 the preserved comment should be used;
3322 otherwise, an empty comment (<code>/**/</code>) must be inserted.
3323 (Preserved comments may be reinserted even if the following tables don't require a comment between two tokens.)
3325 Single characters in the row and column headings represent a <<delim-token>> with that value,
3326 except for "<code>(</code>",
which represents a <a href=#tokendef-open-paren>&lt;(-token&gt;</a>.
3328 </ul>
3330 <style>
3331 #serialization-tables th { font-size: 80%; line-height: normal }
3332 </style>
3334 <table class='data'>
3335 <tr>
3336 <td>
3337 <th>ident
3338 <th>function
3339 <th>url
3340 <th>bad url
3341 <th>-
3342 <th>number
3343 <th>percentage
3344 <th>dimension
3345 <th>CDC
3346 <th>(
3347 <th>?
3348 <tr>
3349 <th>ident
3350 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â <td>
3351 <tr>
3352 <th>at-keyword
3353 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td> <td>
3354 <tr>
3355 <th>hash
3356 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td> <td>
3357 <tr>
3358 <th>dimension
3359 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td> <td>
3360 <tr>
3361 <th>#
3362 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td> <td> <td>
3363 <tr>
3364 <th>-
3365 <td>â<td>â<td>â<td>â<td>â<td>â<td>â<td>â<td> <td> <td>
3366 <tr>
3367 <th>number
3368 <td>â<td>â<td>â<td>â<td> <td>â<td>â<td>â<td> <td> <td>
3369 <tr>
3370 <th>@
3371 <td>â<td>â<td>â<td>â<td>â<td> <td> <td> <td> <td> <td>
3372 <tr>
3373 <th>.
3374 <td> <td> <td> <td> <td> <td>â<td>â<td>â<td> <td> <td>
3375 <tr>
3376 <th>+
3377 <td> <td> <td> <td> <td> <td>â<td>â<td>â<td> <td> <td>
3378 </table>
3380 <table class='data'>
3381 <tr>
3382 <td>
3383 <th>=<th>|<th>*
3384 <tr>
3385 <th>$
<td>✗<td> <td>
<tr>
<th>*
<td>✗<td> <td>
<tr>
<th>^
<td>✗<td> <td>
<tr>
<th>~
<td>✗<td> <td>
<tr>
<th>|
<td>✗<td>✗<td>
<tr>
<th>/
<td> <td> <td>✗
3402 </table>
3403 </div>
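<div class='example'>
The two tables in the note above can be encoded as a lookup from the first token to the set of following tokens that need a separating comment.
The Python sketch below is non-normative and assumes a hypothetical token model:
tokens are <code>(type, value)</code> pairs,
<<delim-token>>s have the type <code>"delim"</code> and are looked up by their value,
and every other token is looked up by its type
(with the (-token using the type <code>"("</code>).

```python
_COLUMNS = ["ident", "function", "url", "bad-url", "-", "number",
            "percentage", "dimension", "CDC", "(", "?"]

# First token -> set of second tokens that need a comment in between.
NEEDS_COMMENT = {
    "ident": set(_COLUMNS[:10]),
    "at-keyword": set(_COLUMNS[:9]),
    "hash": set(_COLUMNS[:9]),
    "dimension": set(_COLUMNS[:9]),
    "#": set(_COLUMNS[:8]),
    "-": set(_COLUMNS[:8]),
    "number": set(_COLUMNS[:8]) - {"-"},
    "@": set(_COLUMNS[:5]),
    ".": {"number", "percentage", "dimension"},
    "+": {"number", "percentage", "dimension"},
    # Second table.
    "$": {"="},
    "*": {"="},
    "^": {"="},
    "~": {"="},
    "|": {"=", "|"},
    "/": {"*"},
}


def _key(token):
    """Table heading for a token: the value for delims, the type otherwise."""
    kind, value = token
    return value if kind == "delim" else kind


def needs_comment(first, second):
    """True if the pair must be serialized with a comment between the tokens."""
    return _key(second) in NEEDS_COMMENT.get(_key(first), set())
```

A serializer built on this would emit <code>/**/</code>
(or a preserved comment, if the implementation keeps them)
between two tokens whenever <code>needs_comment()</code> returns true.
</div>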
3405 <h3 id='serializing-anb'>
Serializing <var>&lt;an+b&gt;</var></h3>
To <dfn export>serialize an <var>&lt;an+b&gt;</var> value</dfn>,
3409 let <var>s</var> initially be the empty string:
3411 <dl>
3412 <dt><var>A</var> and <var>B</var> are both zero
3413 <dd>
3414 Append "0" to <var>s</var>.
3416 <dt><var>A</var> is zero, <var>B</var> is non-zero
3417 <dd>
3418 Serialize <var>B</var> and append it to <var>s</var>.
3420 <dt><var>A</var> is non-zero, <var>B</var> is zero
3421 <dd>
3422 Serialize <var>A</var> and append it to <var>s</var>.
3423 Append "n" to <var>s</var>.
3425 <dt><var>A</var> and <var>B</var> are both non-zero
3426 <dd>
3427 Serialize <var>A</var> and append it to <var>s</var>.
3428 Append "n" to <var>s</var>.
3429 If <var>B</var> is positive,
append "+" to <var>s</var>.
3431 Serialize <var>B</var> and append it to <var>s</var>.
3432 </dl>
3434 Return <var>s</var>.
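<div class='example'>
The four cases above translate directly into code.
This non-normative Python sketch uses a hypothetical function name, <code>serialize_anb</code>,
and assumes that <var>A</var> and <var>B</var> are integers
and that "serialize" means writing the integer in decimal with a leading minus sign when negative.

```python
def serialize_anb(a, b):
    """Serialize an An+B value given integer A and B."""
    s = ""
    if a == 0 and b == 0:
        s += "0"
    elif a == 0:
        s += str(b)
    elif b == 0:
        s += str(a) + "n"
    else:
        s += str(a) + "n"
        if b > 0:
            s += "+"  # a negative B already carries its own sign
        s += str(b)
    return s
```

Under these assumptions, <code>serialize_anb(2, 1)</code> gives <code>2n+1</code>
and <code>serialize_anb(-1, -3)</code> gives <code>-1n-3</code>.
</div>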
3436 <h2 id="priv-sec">
3437 Privacy and Security Considerations</h2>
3439 This specification introduces no new privacy concerns.
3441 This specification improves security, in that CSS parsing is now unambiguously defined for all inputs.
3443 Insofar as old parsers, such as whitelists/filters, parse differently from this specification,
3444 they are somewhat insecure,
3445 but the previous parsing specification left a lot of ambiguous corner cases which browsers interpreted differently,
3446 so those filters were potentially insecure already,
3447 and this specification does not worsen the situation.
3459 <h2 id="changes">
3460 Changes</h2>
3462 <em>This section is non-normative.</em>
3464 <h3 id="changes-CR-20140220">
3465 Changes from the 20 February 2014 Candidate Recommendation</h3>
3467 The following substantive changes were made:
3469 * Fixed a bug in the "Consume a URL token" algorithm,
3470 where it didn't consume the quote character starting a string before attempting to consume the string.
3472 * Fixed a bug in several of the parser algorithms
3473 related to the current/next input token and things getting consumed early/late.
* Fixed several bugs in the tokenization and parsing algorithms.
* Changed the definition of ident-like tokens to allow "--" to start an ident.
3479 The following editorial changes were made:
3481 * The "Consume a string token" algorithm was changed to allow calling it without specifying an explicit ending token,
3482 so that it uses the current input token instead.
3483 The three call-sites of the algorithm were changed to use that form.
3485 * Minor editorial restructuring of algorithms.
3487 <h3 id="changes-WD-20131105">
3488 Changes from the 5 November 2013 Last Call Working Draft</h3>
3490 <ul>
3491 <li>
3492 The <a href="#serialization">Serialization</a> section has been rewritten
3493 to make only the "round-trip" requirement normative,
3494 and move the details of how to achieve it into a note.
3495 Some corner cases in these details have been fixed.
3496 <li>
3497 [[ENCODING]] has been added to the list of normative references.
3498 It was already referenced in normative text before,
3499 just not listed as such.
3500 <li>
3501 In the algorithm to <a>determine the fallback encoding</a> of a stylesheet,
3502 limit the <code>@charset</code> byte sequence to 1024 bytes.
This aligns with what HTML does for <code>&lt;meta charset&gt;</code>
3504 and makes sure the size of the sequence is bounded.
3505 This only makes a difference with leading or trailing whitespace
3506 in the encoding label:
3508 <pre>@charset " <em>(lots of whitespace)</em> utf-8";</pre>
3509 </ul>
3511 <h3 id="changes-WD-20130919">
3512 Changes from the 19 September 2013 Working Draft</h3>
3514 <ul>
3515 <li>
3516 The concept of <a>environment encoding</a> was added.
3517 The behavior does not change,
3518 but some of the definitions should be moved to the relevant specs.
3519 </ul>
3521 <h3 id="changes-css21">
3522 Changes from CSS 2.1 and Selectors Level 3</h3>
3524 Note: The point of this spec is to match reality;
3525 changes from CSS2.1 are nearly always because CSS 2.1 specified something that doesn't match actual browser behavior,
3526 or left something unspecified.
3527 If some detail doesn't match browsers,
3528 please let me know
3529 as it's almost certainly unintentional.
3531 Changes in decoding from a byte stream:
3533 <ul>
3534 <li>
3535 Only detect ''@charset'' rules in ASCII-compatible byte patterns.
3537 <li>
3538 Ignore ''@charset'' rules that specify an ASCII-incompatible encoding,
3539 as that would cause the rule itself to not decode properly.
3541 <li>
3542 Refer to [[!ENCODING]]
rather than the IANA registry for character encodings.
3545 </ul>
3547 Tokenization changes:
3549 <ul>
3550 <li>
3551 Any U+0000 NULL <a>code point</a> in the CSS source is replaced with U+FFFD REPLACEMENT CHARACTER.
3553 <li>
3554 Any hexadecimal escape sequence such as ''\0'' that evaluates to zero
produces U+FFFD REPLACEMENT CHARACTER rather than U+0000 NULL.
3556 <!--
3557 This covers a security issue:
3558 https://bugzilla.mozilla.org/show_bug.cgi?id=228856
3559 -->
3561 <li>
3562 The definition of <a>non-ASCII code point</a> was changed
3563 to be consistent with every definition of ASCII.
3564 This affects <a>code points</a> U+0080 to U+009F,
3565 which are now <a>name code points</a> rather than <<delim-token>>s,
3566 like the rest of <a>non-ASCII code points</a>.
3568 <li>
3569 Tokenization does not emit COMMENT or BAD_COMMENT tokens anymore.
3570 BAD_COMMENT is now considered the same as a normal token (not an error).
3571 <a href="#serialization">Serialization</a> is responsible
3572 for inserting comments as necessary between tokens that need to be separated,
3573 e.g. two consecutive <<ident-token>>s.
3575 <li>
3576 The <<unicode-range-token>> was removed,
3577 as it was low value and occasionally actively harmful.
3578 (''u+a { font-weight: bold; }'' was an invalid selector, for example...)
3580 Instead, a <<urange>> production was added,
3581 based on token patterns.
3582 It is technically looser than what 2.1 allowed
3583 (any number of digits and ? characters),
3584 but not in any way that should impact its use in practice.
3586 <li>
3587 Apply the <a href="http://www.w3.org/TR/CSS21/syndata.html#unexpected-eof">EOF error handling rule</a> in the tokenizer
3588 and emit normal <<string-token>> and <<url-token>> rather than BAD_STRING or BAD_URI
3589 on EOF.
3591 <li>
3592 <<prefix-match-token>>, <<suffix-match-token>>, and <<substring-match-token>> have been imported from Selectors 3.
3594 <li>
3595 The BAD_URI token (now <<bad-url-token>>) is "self-contained".
3596 In other words, once the tokenizer realizes it's in a <<bad-url-token>> rather than a <<url-token>>,
3597 it just seeks forward to look for the closing ),
3598 ignoring everything else.
3599 This behavior is simpler than treating it like a <<function-token>>
3600 and paying attention to opened blocks and such.
3601 Only WebKit exhibits this behavior,
3602 but it doesn't appear that we've gotten any compat bugs from it.
3604 <li>
3605 The <<comma-token>> has been added.
3607 <li>
3608 <<number-token>>, <<percentage-token>>, and <<dimension-token>> have been changed
3609 to include the preceding +/- sign as part of their value
3610 (rather than as a separate <<delim-token>> that needs to be manually handled every time the token is mentioned in other specs).
3611 The only consequence of this is that comments can no longer be inserted between the sign and the number.
3613 <li>
3614 Scientific notation is supported for numbers/percentages/dimensions to match SVG,
3615 per WG resolution.
3617 <li>
3618 <<column-token>> has been added,
3619 to keep Selectors parsing in single-token lookahead.
3621 <li>
Hexadecimal escapes for <a>surrogate code points</a> now emit a replacement character rather than the surrogate.
3623 This allows implementations to safely use UTF-16 internally.
3625 </ul>
3627 Parsing changes:
3629 <ul>
3630 <li>
3631 Any list of declarations now also accepts at-rules, like ''@page'',
3632 per WG resolution.
3633 This makes a difference in error handling
3634 even if no such at-rules are defined yet:
3635 an at-rule, valid or not, ends at a {} block without a <<semicolon-token>>
3636 and lets the next declaration begin.
3638 <li>
The handling of some miscellaneous "special" tokens
(like an unmatched <a href="#tokendef-close-curly">&lt;}-token&gt;</a>)
3641 showing up in various places in the grammar
3642 has been specified with some reasonable behavior shown by at least one browser.
3643 Previously, stylesheets with those tokens in those places just didn't match the stylesheet grammar at all,
3644 so their handling was totally undefined.
3645 Specifically:
3647 <ul>
3648 <li>
3649 [] blocks, () blocks and functions can now contain {} blocks, <<at-keyword-token>>s or <<semicolon-token>>s
3651 <li>
3652 Qualified rule preludes can now contain semicolons
3654 <li>
3655 Qualified rule and at-rule preludes can now contain <<at-keyword-token>>s
3656 </ul>
3658 </ul>
3660 <var>An+B</var> changes from Selectors Level 3 [[SELECT]]:
3662 <ul>
3663 <li>
3664 The <var>An+B</var> microsyntax has now been formally defined in terms of CSS tokens,
3665 rather than with a separate tokenizer.
3666 This has resulted in minor differences:
3668 <ul>
3669 <li>
3670 In some cases, minus signs or digits can be escaped
3671 (when they appear as part of the unit of a <<dimension-token>> or <<ident-token>>).
3672 </ul>
3673 </ul>
3675 <h2 class=no-num id="acknowledgments">
3676 Acknowledgments</h2>
3678 Thanks for feedback and contributions from
3679 Anne van Kesteren,
3680 David Baron,
3681 Henri Sivonen,
3682 Johannes Koch,
呂康豪 (Kang-Hao Lu),
3684 Marc O'Morain,
3685 Raffaello Giulietti,
3686 Simon Pieter,
3687 Tyler Karaszewski,
3688 and Zack Weinberg.