spec.txt   spec.txt 
--- ---
title: CommonMark Spec title: CommonMark Spec
author: author:
- John MacFarlane - John MacFarlane
version: 0.13 version: 0.14
date: 2014-12-10 date: 2014-12-10
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
... ...
# Introduction # Introduction
## What is Markdown? ## What is Markdown?
Markdown is a plain text format for writing structured documents, Markdown is a plain text format for writing structured documents,
based on conventions used for indicating formatting in email and based on conventions used for indicating formatting in email and
usenet posts. It was developed in 2004 by John Gruber, who wrote usenet posts. It was developed in 2004 by John Gruber, who wrote
the first Markdown-to-HTML converter in perl, and it soon became the first Markdown-to-HTML converter in perl, and it soon became
skipping to change at line 131 skipping to change at line 132
``` ```
10. What are the precedence rules between block-level and inline-level 10. What are the precedence rules between block-level and inline-level
structure? For example, how should the following be parsed? structure? For example, how should the following be parsed?
``` markdown ``` markdown
- `a long code span can contain a hyphen like this - `a long code span can contain a hyphen like this
- and it can screw things up` - and it can screw things up`
``` ```
11. Can list items include headers? (`Markdown.pl` does not allow this, 11. Can list items include section headers? (`Markdown.pl` does not
but headers can occur in blockquotes.) allow this, but does allow blockquotes to include headers.)
``` markdown ``` markdown
- # Heading - # Heading
``` ```
12. Can link references be defined inside block quotes or list items? 12. Can list items be empty?
``` markdown
* a
*
* b
```
13. Can link references be defined inside block quotes or list items?
``` markdown ``` markdown
> Blockquote [foo]. > Blockquote [foo].
> >
> [foo]: /url > [foo]: /url
``` ```
13. If there are multiple definitions for the same reference, which takes 14. If there are multiple definitions for the same reference, which takes
precedence? precedence?
``` markdown ``` markdown
[foo]: /url1 [foo]: /url1
[foo]: /url2 [foo]: /url2
[foo][] [foo][]
``` ```
In the absence of a spec, early implementers consulted `Markdown.pl` In the absence of a spec, early implementers consulted `Markdown.pl`
skipping to change at line 192 skipping to change at line 201
choice of HTML for the tests makes it possible to run the tests against choice of HTML for the tests makes it possible to run the tests against
an implementation without writing an abstract syntax tree renderer. an implementation without writing an abstract syntax tree renderer.
This document is generated from a text file, `spec.txt`, written This document is generated from a text file, `spec.txt`, written
in Markdown with a small extension for the side-by-side tests. in Markdown with a small extension for the side-by-side tests.
The script `spec2md.pl` can be used to turn `spec.txt` into pandoc The script `spec2md.pl` can be used to turn `spec.txt` into pandoc
Markdown, which can then be converted into other formats. Markdown, which can then be converted into other formats.
In the examples, the `→` character is used to represent tabs. In the examples, the `→` character is used to represent tabs.
# Preprocessing # Preliminaries
## Characters and lines
The input is a sequence of zero or more [lines](#line).
A [line](@line) A [line](@line)
is a sequence of zero or more [characters](#character) followed by a is a sequence of zero or more [characters](#character) followed by a
line ending (CR, LF, or CRLF) or by the end of file. [line ending](#line-ending) or by the end of file.
A [character](@character) is a unicode code point. A [character](@character) is a unicode code point.
This spec does not specify an encoding; it thinks of lines as composed This spec does not specify an encoding; it thinks of lines as composed
of characters rather than bytes. A conforming parser may be limited of characters rather than bytes. A conforming parser may be limited
to a certain encoding. to a certain encoding.
A [line ending](@line-ending) is, depending on the platform, a
newline (`U+000A`), carriage return (`U+000D`), or
carriage return + newline.
For security reasons, a conforming parser must strip or replace the
Unicode character `U+0000`.
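The definitions of lines, line endings, and `U+0000` handling above can be sketched in a few lines of Python. This is only an illustration, not part of the spec: the function name `split_lines` and the choice of `U+FFFD` as the replacement character are assumptions (the text only says "strip or replace").

```python
import re

# A line ending is CRLF, CR, or LF. CRLF is listed first so that it is
# consumed as a single line ending rather than as CR followed by LF.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text, nul_replacement="\ufffd"):
    """Split input into lines (hypothetical helper, not defined by the spec).

    U+0000 is replaced before splitting; the replacement character is an
    assumption, since the spec permits either stripping or replacing.
    """
    text = text.replace("\x00", nul_replacement)
    lines = _LINE_ENDING.split(text)
    # The last line may be terminated by end of file instead of a line
    # ending, so an empty trailing piece does not count as a line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Under these assumptions, empty input yields zero lines, and mixed line endings within one input are all recognized.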
A line containing no characters, or a line containing only spaces
(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
The following definitions of character classes will be used in this spec:
A [whitespace character](@whitespace-character) is a space
(`U+0020`), tab (`U+0009`), carriage return (`U+000D`), or
newline (`U+000A`).
[Whitespace](@whitespace) is a sequence of one or more [whitespace
characters](#whitespace-character).
A [unicode whitespace character](@unicode-whitespace-character) is
any code point in the unicode `Zs` class, or a tab (`U+0009`),
carriage return (`U+000D`), newline (`U+000A`), or form feed
(`U+000C`).
[Unicode whitespace](@unicode-whitespace) is a sequence of one
or more [unicode whitespace characters](#unicode-whitespace-character).
A [non-space character](@non-space-character) is anything but `U+0020`.
An [ASCII punctuation character](@ascii-punctuation-character)
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`.
A [punctuation character](@punctuation-character) is an [ASCII
punctuation character](#ascii-punctuation-character) or anything in
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
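As a rough illustration, the character classes defined above might be expressed as predicates like the following. The function names are hypothetical, and the sketch relies on Python's standard `unicodedata` module to look up general categories such as `Zs` and `Pd`.

```python
import unicodedata

# The 32 ASCII punctuation characters listed above.
ASCII_PUNCTUATION = frozenset("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
UNICODE_PUNCTUATION_CLASSES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})
WHITESPACE_CHARS = frozenset({" ", "\t", "\r", "\n"})


def is_whitespace_char(ch):
    """Space, tab, carriage return, or newline."""
    return ch in WHITESPACE_CHARS


def is_unicode_whitespace_char(ch):
    """Any code point in class Zs, or tab, CR, newline, or form feed."""
    return unicodedata.category(ch) == "Zs" or ch in {"\t", "\r", "\n", "\f"}


def is_non_space_char(ch):
    """Anything but U+0020."""
    return ch != " "


def is_punctuation_char(ch):
    """ASCII punctuation, or anything in the listed unicode P* classes."""
    return (ch in ASCII_PUNCTUATION
            or unicodedata.category(ch) in UNICODE_PUNCTUATION_CLASSES)


def is_blank_line(line):
    """A line with no characters, or only spaces and tabs."""
    return all(ch in {" ", "\t"} for ch in line)
```

Note that a character such as `$` counts as punctuation only because it is in the ASCII list; its unicode class (`Sc`) is not among those named above.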
## Tab expansion
Tabs in lines are expanded to spaces, with a tab stop of 4 characters: Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
. .
→foo→baz→→bim →foo→baz→→bim
. .
<pre><code>foo baz bim <pre><code>foo baz bim
</code></pre> </code></pre>
. .
. .
a→a a→a
ὐ→a ὐ→a
. .
<pre><code>a a <pre><code>a a
ὐ a ὐ a
</code></pre> </code></pre>
. .
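The tab expansion shown in the two examples above can be sketched as follows. The function name `expand_tabs` is hypothetical. Columns are counted in code points, which matches the second example, where a multi-byte character occupies a single column.

```python
def expand_tabs(line, tab_stop=4):
    """Expand each tab to spaces up to the next multiple of tab_stop."""
    out = []
    col = 0  # current column, counted in characters (code points)
    for ch in line:
        if ch == "\t":
            n = tab_stop - (col % tab_stop)  # spaces needed to reach the stop
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

For a single line without embedded line endings, Python's built-in `str.expandtabs(4)` gives the same result; the explicit loop is shown only to make the column arithmetic visible.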
Line endings are replaced by newline characters (LF).
A line containing no characters, or a line containing only spaces (after
tab expansion), is called a [blank line](@blank-line).
For security reasons, a conforming parser must strip or replace the
Unicode character `U+0000`.
# Blocks and inlines # Blocks and inlines
We can think of a document as a sequence of We can think of a document as a sequence of
[blocks](@block)---structural [blocks](@block)---structural
elements like paragraphs, block quotations, elements like paragraphs, block quotations,
lists, headers, rules, and code blocks. Blocks can contain other lists, headers, rules, and code blocks. Blocks can contain other
blocks, or they can contain [inline](@inline) content: blocks, or they can contain [inline](@inline) content:
words, spaces, links, emphasized text, images, and inline code. words, spaces, links, emphasized text, images, and inline code.
## Precedence ## Precedence
skipping to change at line 397 skipping to change at line 442
a------ a------
---a--- ---a---
. .
<p>_ _ _ _ a</p> <p>_ _ _ _ a</p>
<p>a------</p> <p>a------</p>
<p>---a---</p> <p>---a---</p>
. .
It is required that all of the non-space characters be the same. It is required that all of the
[non-space characters](#non-space-character) be the same.
So, this is not a horizontal rule: So, this is not a horizontal rule:
. .
*-* *-*
. .
<p><em>-</em></p> <p><em>-</em></p>
. .
Horizontal rules do not need blank lines before or after: Horizontal rules do not need blank lines before or after:
skipping to change at line 686 skipping to change at line 732
. .
## Setext headers ## Setext headers
A [setext header](@setext-header) A [setext header](@setext-header)
consists of a line of text, containing at least one nonspace character, consists of a line of text, containing at least one nonspace character,
with no more than 3 spaces indentation, followed by a [setext header with no more than 3 spaces indentation, followed by a [setext header
underline](#setext-header-underline). The line of text must be underline](#setext-header-underline). The line of text must be
one that, were it not followed by the setext header underline, one that, were it not followed by the setext header underline,
would be interpreted as part of a paragraph: it cannot be a code would be interpreted as part of a paragraph: it cannot be a code
block, header, blockquote, horizontal rule, or list. A [setext header block, header, blockquote, horizontal rule, or list.
underline](@setext-header-underline)
is a sequence of `=` characters or a sequence of `-` characters, with no A [setext header underline](@setext-header-underline) is a sequence of
more than 3 spaces indentation and any number of trailing `=` characters or a sequence of `-` characters, with no more than 3
spaces. The header is a level 1 header if `=` characters are used, and spaces indentation and any number of trailing spaces. If a line
a level 2 header if `-` characters are used. The contents of the header containing a single `-` can be interpreted as an
are the result of parsing the first line as Markdown inline content. empty [list item](#list-items), it should be interpreted this way
and not as a [setext header underline](#setext-header-underline).
The header is a level 1 header if `=` characters are used in the
[setext header underline](#setext-header-underline), and a level 2
header if `-` characters are used. The contents of the header are the
result of parsing the first line as Markdown inline content.
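A minimal sketch of recognizing a setext header underline and its level, assuming the line has already had its line ending removed. The name `setext_level` is hypothetical. The sketch looks at one line in isolation, so it does not implement the rule that a single `-` should be read as an empty list item where that interpretation is possible; that decision needs the surrounding block context.

```python
import re

# Up to 3 spaces of indentation, a run of '=' or a run of '-',
# then any number of trailing spaces.
_SETEXT_UNDERLINE = re.compile(r" {0,3}(=+|-+) *")


def setext_level(line):
    """Return 1 for an '=' underline, 2 for a '-' underline, else None."""
    m = _SETEXT_UNDERLINE.fullmatch(line)
    if m is None:
        return None
    return 1 if m.group(1)[0] == "=" else 2
```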
In general, a setext header need not be preceded or followed by a In general, a setext header need not be preceded or followed by a
blank line. However, it cannot interrupt a paragraph, so when a blank line. However, it cannot interrupt a paragraph, so when a
setext header comes after a paragraph, a blank line is needed between setext header comes after a paragraph, a blank line is needed between
them. them.
Simple examples: Simple examples:
. .
Foo *bar* Foo *bar*
skipping to change at line 951 skipping to change at line 1003
. .
\> foo \> foo
------ ------
. .
<h2>&gt; foo</h2> <h2>&gt; foo</h2>
. .
## Indented code blocks ## Indented code blocks
An [indented code block](@indented-code-block) An [indented code block](@indented-code-block) is composed of one or more
is composed of one or more
[indented chunks](#indented-chunk) separated by blank lines. [indented chunks](#indented-chunk) separated by blank lines.
An [indented chunk](@indented-chunk) An [indented chunk](@indented-chunk) is a sequence of non-blank lines,
is a sequence of non-blank lines, each indented four or more each indented four or more spaces. The contents of the code block are
spaces. An indented code block cannot interrupt a paragraph, so the literal contents of the lines, including trailing
if it occurs before or after a paragraph, there must be an [line endings](#line-ending), minus four spaces of indentation.
intervening blank line. The contents of the code block are An indented code block has no attributes.
the literal contents of the lines, including trailing newlines,
minus four spaces of indentation. An indented code block has no An indented code block cannot interrupt a paragraph, so there must be
attributes. a blank line between a paragraph and a following indented code block.
(A blank line is not needed, however, between a code block and a following
paragraph.)
. .
a simple a simple
indented code block indented code block
. .
<pre><code>a simple <pre><code>a simple
indented code block indented code block
</code></pre> </code></pre>
. .
skipping to change at line 1742 skipping to change at line 1795
Moreover, blank lines are usually not necessary and can be Moreover, blank lines are usually not necessary and can be
deleted. The exception is inside `<pre>` tags; here, one can deleted. The exception is inside `<pre>` tags; here, one can
replace the blank lines with `&#10;` entities. replace the blank lines with `&#10;` entities.
So there is no important loss of expressive power with the new rule. So there is no important loss of expressive power with the new rule.
## Link reference definitions ## Link reference definitions
A [link reference definition](@link-reference-definition) A [link reference definition](@link-reference-definition)
consists of a [link consists of a [link label](#link-label), indented up to three spaces, followed
label](#link-label), indented up to three spaces, followed by a colon (`:`), optional [whitespace](#whitespace) (including up to one
by a colon (`:`), optional blank space (including up to one [line ending](#line-ending)), a [link destination](#link-destination),
newline), a [link destination](#link-destination), optional optional [whitespace](#whitespace) (including up to one
blank space (including up to one newline), and an optional [link [line ending](#line-ending)), and an optional [link
title](#link-title), which if it is present must be separated title](#link-title), which if it is present must be separated
from the [link destination](#link-destination) by whitespace. from the [link destination](#link-destination) by [whitespace](#whitespace).
No further non-space characters may occur on the line. No further [non-space characters](#non-space-character) may occur on the line.
A [link reference definition](#link-reference-definition) A [link reference definition](#link-reference-definition)
does not correspond to a structural element of a document. Instead, it does not correspond to a structural element of a document. Instead, it
defines a label which can be used in [reference links](#reference-link) defines a label which can be used in [reference links](#reference-link)
and reference-style [images](#image) elsewhere in the document. [Link and reference-style [images](#images) elsewhere in the document. [Link
reference definitions] can come either before or after the links that use reference definitions] can come either before or after the links that use
them. them.
. .
[foo]: /url "title" [foo]: /url "title"
[foo] [foo]
. .
<p><a href="/url" title="title">foo</a></p> <p><a href="/url" title="title">foo</a></p>
. .
skipping to change at line 1866 skipping to change at line 1919
Here is a link reference definition with no corresponding link. Here is a link reference definition with no corresponding link.
It contributes nothing to the document. It contributes nothing to the document.
. .
[foo]: /url [foo]: /url
. .
. .
This is not a link reference definition, because there are This is not a link reference definition, because there are
non-space characters after the title: [non-space characters](#non-space-character) after the title:
. .
[foo]: /url "title" ok [foo]: /url "title" ok
. .
<p>[foo]: /url &quot;title&quot; ok</p> <p>[foo]: /url &quot;title&quot; ok</p>
. .
This is not a link reference definition, because it is indented This is not a link reference definition, because it is indented
four spaces: four spaces:
skipping to change at line 1930 skipping to change at line 1983
# [Foo] # [Foo]
[foo]: /url [foo]: /url
> bar > bar
. .
<h1><a href="/url">Foo</a></h1> <h1><a href="/url">Foo</a></h1>
<blockquote> <blockquote>
<p>bar</p> <p>bar</p>
</blockquote> </blockquote>
. .
Several [link references](#link-reference) can occur one after another, Several [link reference definitions](#link-reference-definition)
without intervening blank lines. can occur one after another, without intervening blank lines.
. .
[foo]: /foo-url "foo" [foo]: /foo-url "foo"
[bar]: /bar-url [bar]: /bar-url
"bar" "bar"
[baz]: /baz-url [baz]: /baz-url
[foo], [foo],
[bar], [bar],
[baz] [baz]
skipping to change at line 2119 skipping to change at line 2172
The following rules define [block quotes](@block-quote): The following rules define [block quotes](@block-quote):
1. **Basic case.** If a string of lines *Ls* constitute a sequence 1. **Basic case.** If a string of lines *Ls* constitute a sequence
of blocks *Bs*, then the result of prepending a [block quote of blocks *Bs*, then the result of prepending a [block quote
marker](#block-quote-marker) to the beginning of each line in *Ls* marker](#block-quote-marker) to the beginning of each line in *Ls*
is a [block quote](#block-quote) containing *Bs*. is a [block quote](#block-quote) containing *Bs*.
2. **Laziness.** If a string of lines *Ls* constitute a [block 2. **Laziness.** If a string of lines *Ls* constitute a [block
quote](#block-quote) with contents *Bs*, then the result of deleting quote](#block-quote) with contents *Bs*, then the result of deleting
the initial [block quote marker](#block-quote-marker) from one or the initial [block quote marker](#block-quote-marker) from one or
more lines in which the next non-space character after the [block more lines in which the next
[non-space character](#non-space-character) after the [block
quote marker](#block-quote-marker) is [paragraph continuation quote marker](#block-quote-marker) is [paragraph continuation
text](#paragraph-continuation-text) is a block quote with *Bs* as text](#paragraph-continuation-text) is a block quote with *Bs* as
its content. its content.
[Paragraph continuation text](@paragraph-continuation-text) is text [Paragraph continuation text](@paragraph-continuation-text) is text
that will be parsed as part of the content of a paragraph, but does that will be parsed as part of the content of a paragraph, but does
not occur at the beginning of the paragraph. not occur at the beginning of the paragraph.
3. **Consecutiveness.** A document cannot contain two [block 3. **Consecutiveness.** A document cannot contain two [block
quotes](#block-quote) in a row unless there is a [blank quotes](#block-quote) in a row unless there is a [blank
line](#blank-line) between them. line](#blank-line) between them.
skipping to change at line 2479 skipping to change at line 2533
A [bullet list marker](@bullet-list-marker) A [bullet list marker](@bullet-list-marker)
is a `-`, `+`, or `*` character. is a `-`, `+`, or `*` character.
An [ordered list marker](@ordered-list-marker) An [ordered list marker](@ordered-list-marker)
is a sequence of one or more digits (`0-9`), followed by either a is a sequence of one or more digits (`0-9`), followed by either a
`.` character or a `)` character. `.` character or a `)` character.
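The two marker definitions above can be illustrated with a small classifier. The function name `classify_list_marker` and the tuple it returns are assumptions made for this sketch. The number returned for ordered markers is the value from which a start number would be derived.

```python
import re

_BULLET = re.compile(r"[-+*]")
# [0-9] is used instead of \d, which in Python also matches non-ASCII digits.
_ORDERED = re.compile(r"([0-9]+)[.)]")


def classify_list_marker(marker):
    """Return ("bullet", None), ("ordered", number), or None."""
    if _BULLET.fullmatch(marker):
        return ("bullet", None)
    m = _ORDERED.fullmatch(marker)
    if m:
        return ("ordered", int(m.group(1)))
    return None
```

The sketch only classifies a candidate marker string; deciding whether a line actually begins a list item also depends on the indentation and spacing rules that follow.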
The following rules define [list items](@list-item): The following rules define [list items](@list-item):
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
blocks *Bs* starting with a non-space character and not separated blocks *Bs* starting with a [non-space character](#non-space-character)
and not separated
from each other by more than one blank line, and *M* is a list from each other by more than one blank line, and *M* is a list
marker of width *W* followed by 0 < *N* < 5 spaces, then the result marker of width *W* followed by 0 < *N* < 5 spaces, then the result
of prepending *M* and the following spaces to the first line of of prepending *M* and the following spaces to the first line of
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item list item with *Bs* as its contents. The type of the list item
(bullet or ordered) is determined by the type of its list marker. (bullet or ordered) is determined by the type of its list marker.
If the list item is ordered, then it is also assigned a start If the list item is ordered, then it is also assigned a start
number, based on the ordered list marker. number, based on the ordered list marker.
For example, let *Ls* be the lines For example, let *Ls* be the lines
skipping to change at line 2660 skipping to change at line 2715
- foo - foo
bar bar
- ``` - ```
foo foo
bar bar
``` ```
- baz
+ ```
foo
bar
```
. .
<ul> <ul>
<li> <li>
<p>foo</p> <p>foo</p>
<p>bar</p> <p>bar</p>
</li> </li>
<li> <li>
<p>foo</p> <p>foo</p>
</li> </li>
</ul> </ul>
<p>bar</p> <p>bar</p>
<ul> <ul>
<li> <li>
<pre><code>foo <pre><code>foo
bar bar
</code></pre> </code></pre>
</li> </li>
<li>
<p>baz</p>
<ul>
<li>
<pre><code>foo
bar
</code></pre>
</li>
</ul>
</li>
</ul> </ul>
. .
A list item may contain any kind of block: A list item may contain any kind of block:
. .
1. foo 1. foo
``` ```
bar bar
skipping to change at line 2855 skipping to change at line 2929
bar bar
. .
<ul> <ul>
<li> <li>
<p>foo</p> <p>foo</p>
<p>bar</p> <p>bar</p>
</li> </li>
</ul> </ul>
. .
3. **Indentation.** If a sequence of lines *Ls* constitutes a list item 3. **Empty list item.** A [list marker](#list-marker) followed by a
according to rule #1 or #2, then the result of indenting each line line containing only [whitespace](#whitespace) is a list item with
no contents.
Here is an empty bullet list item:
.
- foo
-
- bar
.
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>
.
It does not matter whether there are spaces following the
[list marker](#list-marker):
.
- foo
-
- bar
.
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>
.
Here is an empty ordered list item:
.
1. foo
2.
3. bar
.
<ol>
<li>foo</li>
<li></li>
<li>bar</li>
</ol>
.
A list may start or end with an empty list item:
.
*
.
<ul>
<li></li>
</ul>
.
4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
according to rule #1, #2, or #3, then the result of indenting each line
of *Ls* by 1-3 spaces (the same for each line) also constitutes a of *Ls* by 1-3 spaces (the same for each line) also constitutes a
list item with the same contents and attributes. If a line is list item with the same contents and attributes. If a line is
empty, then it need not be indented. empty, then it need not be indented.
Indented one space: Indented one space:
. .
1. A paragraph 1. A paragraph
with two lines. with two lines.
skipping to change at line 2949 skipping to change at line 3080
. .
<pre><code>1. A paragraph <pre><code>1. A paragraph
with two lines. with two lines.
indented code indented code
&gt; A block quote. &gt; A block quote.
</code></pre> </code></pre>
. .
4. **Laziness.** If a string of lines *Ls* constitute a [list 5. **Laziness.** If a string of lines *Ls* constitute a [list
item](#list-item) with contents *Bs*, then the result of deleting item](#list-item) with contents *Bs*, then the result of deleting
some or all of the indentation from one or more lines in which the some or all of the indentation from one or more lines in which the
next non-space character after the indentation is next [non-space character](#non-space-character) after the indentation is
[paragraph continuation text](#paragraph-continuation-text) is a [paragraph continuation text](#paragraph-continuation-text) is a
list item with the same contents and attributes. The unindented list item with the same contents and attributes. The unindented
lines are called lines are called
[lazy continuation lines](@lazy-continuation-line). [lazy continuation lines](@lazy-continuation-line).
Here is an example with [lazy continuation Here is an example with [lazy continuation
lines](#lazy-continuation-line): lines](#lazy-continuation-line):
. .
1. A paragraph 1. A paragraph
skipping to change at line 3028 skipping to change at line 3159
<li> <li>
<blockquote> <blockquote>
<p>Blockquote <p>Blockquote
continued here.</p> continued here.</p>
</blockquote> </blockquote>
</li> </li>
</ol> </ol>
</blockquote> </blockquote>
. .
5. **That's all.** Nothing that is not counted as a list item by rules 6. **That's all.** Nothing that is not counted as a list item by rules
#1--4 counts as a [list item](#list-item). #1--5 counts as a [list item](#list-item).
The rules for sublists follow from the general rules above. A sublist The rules for sublists follow from the general rules above. A sublist
must be indented the same number of spaces a paragraph would need to be must be indented the same number of spaces a paragraph would need to be
in order to be included in the list item. in order to be included in the list item.
So, in this case we need two spaces indent: So, in this case we need two spaces indent:
. .
- foo - foo
- bar - bar
skipping to change at line 3128 skipping to change at line 3259
<li> <li>
<ol start="2"> <ol start="2">
<li>foo</li> <li>foo</li>
</ol> </ol>
</li> </li>
</ul> </ul>
</li> </li>
</ol> </ol>
. .
A list item may be empty:
.
- foo
-
- bar
.
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>
.
.
-
.
<ul>
<li></li>
</ul>
.
A list item can contain a header: A list item can contain a header:
. .
- # Foo - # Foo
- Bar - Bar
--- ---
baz baz
. .
<ul> <ul>
<li> <li>
skipping to change at line 3228 skipping to change at line 3337
to break any existing documents. However, the spec given here should to break any existing documents. However, the spec given here should
correctly handle lists formatted with either the four-space rule or correctly handle lists formatted with either the four-space rule or
the more forgiving `Markdown.pl` behavior, provided they are laid out the more forgiving `Markdown.pl` behavior, provided they are laid out
in a way that is natural for a human to read. in a way that is natural for a human to read.
The strategy here is to let the width and indentation of the list marker The strategy here is to let the width and indentation of the list marker
determine the indentation necessary for blocks to fall under the list determine the indentation necessary for blocks to fall under the list
item, rather than having a fixed and arbitrary number. The writer can item, rather than having a fixed and arbitrary number. The writer can
think of the body of the list item as a unit which gets indented to the think of the body of the list item as a unit which gets indented to the
right enough to fit the list marker (and any indentation on the list right enough to fit the list marker (and any indentation on the list
marker). (The laziness rule, #4, then allows continuation lines to be marker). (The laziness rule, #5, then allows continuation lines to be
unindented if needed.) unindented if needed.)
This rule is superior, we claim, to any rule requiring a fixed level of This rule is superior, we claim, to any rule requiring a fixed level of
indentation from the margin. The four-space rule is clear but indentation from the margin. The four-space rule is clear but
unnatural. It is quite unintuitive that unnatural. It is quite unintuitive that
``` markdown ``` markdown
- foo - foo
bar bar
skipping to change at line 3461 skipping to change at line 3570
blank lines: blank lines:
I need to buy I need to buy
- new shoes - new shoes
- a coat - a coat
- a plane ticket - a plane ticket
Second, we are attracted to a Second, we are attracted to a
> [principle of uniformity](@principle-of-uniformity): > [principle of uniformity](@principle-of-uniformity):
> if a span of text has a certain > if a chunk of text has a certain
> meaning, it will continue to have the same meaning when put into a list > meaning, it will continue to have the same meaning when put into a
> item. > container block (such as a list item or blockquote).
(Indeed, the spec for [list items](#list-item) presupposes this.) (Indeed, the spec for [list items](#list-item) and
[blockquotes](#block-quotes) presupposes this principle.)
This principle implies that if This principle implies that if
* I need to buy * I need to buy
- new shoes - new shoes
- a coat - a coat
- a plane ticket - a plane ticket
is a list item containing a paragraph followed by a nested sublist, is a list item containing a paragraph followed by a nested sublist,
as all Markdown implementations agree it is (though the paragraph as all Markdown implementations agree it is (though the paragraph
may be rendered without `<p>` tags, since the list is "tight"), may be rendered without `<p>` tags, since the list is "tight"),
skipping to change at line 4027 skipping to change at line 4137
valid HTML entities in any context are recognized as such and valid HTML entities in any context are recognized as such and
converted into unicode characters before they are stored in the AST. converted into unicode characters before they are stored in the AST.
This allows implementations that target HTML output to trivially escape This allows implementations that target HTML output to trivially escape
the entities when generating HTML, and simplifies the job of the entities when generating HTML, and simplifies the job of
implementations targeting other languages, as these will only need to implementations targeting other languages, as these will only need to
handle the unicode chars and need not be HTML-entity aware. handle the unicode chars and need not be HTML-entity aware.
[Named entities](@name-entities) consist of `&` [Named entities](@name-entities) consist of `&`
+ any of the valid HTML5 entity names + `;`. The + any of the valid HTML5 entity names + `;`. The
[following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json) [following document](https://html.spec.whatwg.org/multipage/entities.json)
is used as an authoritative source of the valid entity names and their is used as an authoritative source of the valid entity names and their
corresponding codepoints. corresponding codepoints.
Conforming implementations that target HTML don't need to generate Conforming implementations that target HTML don't need to generate
entities for all the valid named entities that exist, with the exception entities for all the valid named entities that exist, with the exception
of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), which of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), which
always need to be written as entities for security reasons. always need to be written as entities for security reasons.
. .
&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
skipping to change at line 4145 skipping to change at line 4255
<pre><code>f&amp;ouml;f&amp;ouml; <pre><code>f&amp;ouml;f&amp;ouml;
</code></pre> </code></pre>
. .
## Code span ## Code span
A [backtick string](@backtick-string) A [backtick string](@backtick-string)
is a string of one or more backtick characters (`` ` ``) that is neither is a string of one or more backtick characters (`` ` ``) that is neither
preceded nor followed by a backtick. preceded nor followed by a backtick.
A [code span](@code-span) begins with a backtick string and ends with a backtick A [code span](@code-span) begins with a backtick string and ends with
string of equal length. The contents of the code span are the a backtick string of equal length. The contents of the code span are
characters between the two backtick strings, with leading and trailing the characters between the two backtick strings, with leading and
spaces and newlines removed, and consecutive spaces and newlines trailing spaces and [line endings](#line-ending) removed, and
collapsed to single spaces. [whitespace](#whitespace) collapsed to single spaces.
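The normalization of code span contents described above might look like this in practice. The name `normalize_code_span` is hypothetical, and the input is assumed to be the raw text between the opening and closing backtick strings.

```python
import re


def normalize_code_span(content):
    """Trim leading/trailing spaces and line endings, collapse whitespace."""
    # Remove leading and trailing spaces and line endings.
    stripped = content.strip(" \r\n")
    # Collapse each run of whitespace (space, tab, CR, LF) to one space.
    return re.sub(r"[ \t\r\n]+", " ", stripped)
```

Doing this in the parser rather than leaving it to the renderer keeps the result independent of HTML-specific whitespace handling.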
This is a simple code span: This is a simple code span:
. .
`foo` `foo`
. .
<p><code>foo</code></p> <p><code>foo</code></p>
. .
Here two backticks are used, because the code contains a backtick. Here two backticks are used, because the code contains a backtick.
skipping to change at line 4177 skipping to change at line 4287
This example shows the motivation for stripping leading and trailing This example shows the motivation for stripping leading and trailing
spaces: spaces:
. .
` `` ` ` `` `
. .
<p><code>``</code></p> <p><code>``</code></p>
. .
Newlines are treated like spaces: [Line endings](#line-ending) are treated like spaces:
. .
`` ``
foo foo
`` ``
. .
<p><code>foo</code></p> <p><code>foo</code></p>
. .
Interior spaces and newlines are collapsed into single spaces, just Interior spaces and [line endings](#line-ending) are collapsed into
as they would be by a browser: single spaces, just as they would be by a browser:
. .
`foo bar `foo bar
baz` baz`
. .
<p><code>foo bar baz</code></p> <p><code>foo bar baz</code></p>
. .
Q: Why not just leave the spaces, since browsers will collapse them Q: Why not just leave the spaces, since browsers will collapse them
anyway? A: Because we might be targeting a non-HTML format, and we anyway? A: Because we might be targeting a non-HTML format, and we
shouldn't rely on HTML-specific rendering assumptions. shouldn't rely on HTML-specific rendering assumptions.
(Existing implementations differ in their treatment of internal (Existing implementations differ in their treatment of internal
spaces and newlines. Some, including `Markdown.pl` and spaces and [line endings](#line-ending). Some, including `Markdown.pl` and
`showdown`, convert an internal newline into a `<br />` tag. `showdown`, convert an internal [line ending](#line-ending) into a
But this makes things difficult for those who like to hard-wrap `<br />` tag. But this makes things difficult for those who like to
their paragraphs, since a line break in the midst of a code hard-wrap their paragraphs, since a line break in the midst of a code
span will cause an unintended line break in the output. Others span will cause an unintended line break in the output. Others just
just leave internal spaces as they are, which is fine if only leave internal spaces as they are, which is fine if only HTML is being
HTML is being targeted.) targeted.)
. .
`foo `` bar` `foo `` bar`
. .
<p><code>foo `` bar</code></p> <p><code>foo `` bar</code></p>
. .
Note that backslash escapes do not work in code spans. All backslashes Note that backslash escapes do not work in code spans. All backslashes
are treated literally: are treated literally:
skipping to change at line 4322 skipping to change at line 4432
Many implementations have also restricted intraword emphasis to Many implementations have also restricted intraword emphasis to
the `*` forms, to avoid unwanted emphasis in words containing the `*` forms, to avoid unwanted emphasis in words containing
internal underscores. (It is best practice to put these in code internal underscores. (It is best practice to put these in code
spans, but users often do not.) spans, but users often do not.)
``` markdown ``` markdown
internal emphasis: foo*bar*baz internal emphasis: foo*bar*baz
no emphasis: foo_bar_baz no emphasis: foo_bar_baz
``` ```
The following rules capture all of these patterns, while allowing The rules given below capture all of these patterns, while allowing
for efficient parsing strategies that do not backtrack: for efficient parsing strategies that do not backtrack.
First, some definitions. A [delimiter run](@delimiter-run) is either
a sequence of one or more `*` characters that is not preceded or
followed by a `*` character, or a sequence of one or more `_`
characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@right-facing-delimiter-run) is
a [delimiter run](#delimiter-run) that is (a) not followed by [unicode
whitespace](#unicode-whitespace), and (b) either not followed by a
[punctuation character](#punctuation-character), or
preceded by [unicode whitespace](#unicode-whitespace) or
a [punctuation character](#punctuation-character).
A [right-flanking delimiter run](@left-facing-delimiter-run) is
a [delimiter run](#delimiter-run) that is (a) not preceded by [unicode
whitespace](#unicode-whitespace), and (b) either not preceded by a
[punctuation character](#punctuation-character), or
followed by [unicode whitespace](#unicode-whitespace) or
a [punctuation character](#punctuation-character).
Here are some examples of delimiter runs.
- left-flanking but not right-flanking:
```
***abc
_abc
**"abc"
_"abc"
```
- right-flanking but not left-flanking:
```
abc***
abc_
"abc"**
"abc"_
```
- Both left and right-flanking:
```
abc***def
"abc"_"def"
```
- Neither left nor right-flanking:
```
abc *** def
a _ b
```
(The idea of distinguishing left-flanking and right-flanking
delimiter runs based on the character before and the character
after comes from Roopesh Chander's
[vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags).
vfmd uses the terminology "emphasis indicator string" instead of "delimiter
run," and its rules for distinguishing left- and right-flanking runs
are a bit more complex than the ones given here.)
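The two definitions of left-flanking and right-flanking runs can be checked mechanically. The following Python sketch is illustrative only: the helper names are hypothetical, the character classes are approximations (Unicode category `Zs` plus tab, line feed, form feed and carriage return for whitespace; ASCII punctuation plus the Unicode `P*` categories for punctuation), and treating the start and end of the text as whitespace is an assumption.

```python
import string
import unicodedata
from typing import Tuple


def is_unicode_whitespace(ch: str) -> bool:
    # Assumption: Unicode class Zs plus tab, line feed, form feed, carriage return.
    return ch in "\t\n\x0c\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    # Assumption: ASCII punctuation or any Unicode P* general category.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def flanking(text: str, start: int, end: int) -> Tuple[bool, bool]:
    """Classify the delimiter run text[start:end] as (left_flanking, right_flanking)."""
    # Assumption: the start and end of the text behave like whitespace.
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    left = not is_unicode_whitespace(after) and (
        not is_punctuation(after)
        or is_unicode_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_unicode_whitespace(before) and (
        not is_punctuation(before)
        or is_unicode_whitespace(after)
        or is_punctuation(after)
    )
    return left, right
```

With these assumptions, `flanking("abc***def", 3, 6)` reports the run as both left- and right-flanking, while `flanking("abc *** def", 4, 7)` reports it as neither.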
The following rules define emphasis and strong emphasis:
1. A single `*` character [can open emphasis](@can-open-emphasis) 1. A single `*` character [can open emphasis](@can-open-emphasis)
iff it is not followed by iff it is part of a
whitespace. [left-flanking delimiter run](#right-facing-delimiter-run).
2. A single `_` character [can open emphasis](#can-open-emphasis) iff 2. A single `_` character [can open emphasis](#can-open-emphasis) iff
it is not followed by whitespace and it is not preceded by an it is part of a
ASCII alphanumeric character. [left-flanking delimiter run](#right-facing-delimiter-run)
and is not preceded by an ASCII alphanumeric character.
3. A single `*` character [can close emphasis](@can-close-emphasis) 3. A single `*` character [can close emphasis](@can-close-emphasis)
iff it is not preceded by whitespace. iff it is part of a
[right-flanking delimiter run](#left-facing-delimiter-run).
4. A single `_` character [can close emphasis](#can-close-emphasis) iff 4. A single `_` character [can close emphasis](#can-close-emphasis)
it is not preceded by whitespace and it is not followed by an iff it is part of a
ASCII alphanumeric character. [right-flanking delimiter run](#left-facing-delimiter-run)
and it is not followed by an ASCII alphanumeric character.
5. A double `**` [can open strong emphasis](@can-open-strong-emphasis) 5. A double `**` [can open strong emphasis](@can-open-strong-emphasis)
iff it is not followed by iff it is part of a
whitespace. [left-flanking delimiter run](#right-facing-delimiter-run).
6. A double `__` [can open strong emphasis](#can-open-strong-emphasis) 6. A double `__` [can open strong emphasis](#can-open-strong-emphasis)
iff it is not followed by whitespace and it is not preceded by an iff it is part of a
ASCII alphanumeric character. [left-flanking delimiter run](#right-facing-delimiter-run)
and is not preceded by an ASCII alphanumeric character.
7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) 7. A double `**` [can close strong emphasis](@can-close-strong-emphasis)
iff it is not preceded by iff it is part of a
whitespace. [right-flanking delimiter run](#left-facing-delimiter-run).
8. A double `__` [can close strong emphasis](#can-close-strong-emphasis) 8. A double `__` [can close strong emphasis](#can-close-strong-emphasis)
iff it is not preceded by whitespace and it is not followed by an iff it is part of a
ASCII alphanumeric character. [right-flanking delimiter run](#left-facing-delimiter-run)
and is not followed by an ASCII alphanumeric character.
9. Emphasis begins with a delimiter that [can open 9. Emphasis begins with a delimiter that [can open
emphasis](#can-open-emphasis) and ends with a delimiter that [can close emphasis](#can-open-emphasis) and ends with a delimiter that [can close
emphasis](#can-close-emphasis), and that uses the same emphasis](#can-close-emphasis), and that uses the same
character (`_` or `*`) as the opening delimiter. There must character (`_` or `*`) as the opening delimiter. There must
be a nonempty sequence of inlines between the open delimiter be a nonempty sequence of inlines between the open delimiter
and the closing delimiter; these form the contents of the emphasis and the closing delimiter; these form the contents of the emphasis
inline. inline.
10. Strong emphasis begins with a delimiter that [can open strong 10. Strong emphasis begins with a delimiter that [can open strong
skipping to change at line 4422 skipping to change at line 4600
Rule 1: Rule 1:
. .
*foo bar* *foo bar*
. .
<p><em>foo bar</em></p> <p><em>foo bar</em></p>
. .
This is not emphasis, because the opening `*` is followed by This is not emphasis, because the opening `*` is followed by
whitespace: whitespace, and hence not part of a [left-flanking delimiter
run](#right-facing-delimiter-run):
. .
a * foo bar* a * foo bar*
. .
<p>a * foo bar*</p> <p>a * foo bar*</p>
. .
This is not emphasis, because the opening `*` is preceded
by an alphanumeric and followed by punctuation, and hence
not part of a [left-flanking delimiter run](#right-facing-delimiter-run):
.
a*"foo"*
.
<p>a*&quot;foo&quot;*</p>
.
Unicode nonbreaking spaces count as whitespace, too:
.
* a *
.
<p>* a *</p>
.
Intraword emphasis with `*` is permitted: Intraword emphasis with `*` is permitted:
. .
foo*bar* foo*bar*
. .
<p>foo<em>bar</em></p> <p>foo<em>bar</em></p>
. .
. .
5*6*78 5*6*78
skipping to change at line 4461 skipping to change at line 4658
This is not emphasis, because the opening `_` is followed by This is not emphasis, because the opening `_` is followed by
whitespace: whitespace:
. .
_ foo bar_ _ foo bar_
. .
<p>_ foo bar_</p> <p>_ foo bar_</p>
. .
This is not emphasis, because the opening `_` is preceded
by an alphanumeric and followed by punctuation:
.
a_"foo"_
.
<p>a_&quot;foo&quot;_</p>
.
Emphasis with `_` is not allowed inside ASCII words: Emphasis with `_` is not allowed inside ASCII words:
. .
foo_bar_ foo_bar_
. .
<p>foo_bar_</p> <p>foo_bar_</p>
. .
. .
5_6_78 5_6_78
skipping to change at line 4485 skipping to change at line 4691
But it is permitted inside non-ASCII words: But it is permitted inside non-ASCII words:
. .
пристаням_стремятся_ пристаням_стремятся_
. .
<p>пристаням<em>стремятся</em></p> <p>пристаням<em>стремятся</em></p>
. .
Rule 3: Rule 3:
This is not emphasis, because the closing delimiter does
not match the opening delimiter:
.
_foo*
.
<p>_foo*</p>
.
This is not emphasis, because the closing `*` is preceded by This is not emphasis, because the closing `*` is preceded by
whitespace: whitespace:
. .
*foo bar * *foo bar *
. .
<p>*foo bar *</p> <p>*foo bar *</p>
. .
This is not emphasis, because the second `*` is
preceded by punctuation and followed by an alphanumeric
(hence it is not part of a [right-flanking delimiter
run](#left-facing-delimiter-run)):
.
*(*foo)
.
<p>*(*foo)</p>
.
The point of this restriction is more easily appreciated
with this example:
.
*(*foo*)*
.
<p><em>(<em>foo</em>)</em></p>
.
Intraword emphasis with `*` is allowed: Intraword emphasis with `*` is allowed:
. .
*foo*bar *foo*bar
. .
<p><em>foo</em>bar</p> <p><em>foo</em>bar</p>
. .
Rule 4: Rule 4:
This is not emphasis, because the closing `_` is preceded by This is not emphasis, because the closing `_` is preceded by
whitespace: whitespace:
. .
_foo bar _ _foo bar _
. .
<p>_foo bar _</p> <p>_foo bar _</p>
. .
Intraword emphasis: This is not emphasis, because the second `_` is
preceded by punctuation and followed by an alphanumeric:
.
_(_foo)
.
<p>_(_foo)</p>
.
This is emphasis within emphasis:
.
_(_foo_)_
.
<p><em>(<em>foo</em>)</em></p>
.
Intraword emphasis is disallowed for `_`:
. .
_foo_bar _foo_bar
. .
<p>_foo_bar</p> <p>_foo_bar</p>
. .
. .
_пристаням_стремятся _пристаням_стремятся
. .
skipping to change at line 4550 skipping to change at line 4802
This is not strong emphasis, because the opening delimiter is This is not strong emphasis, because the opening delimiter is
followed by whitespace: followed by whitespace:
. .
** foo bar** ** foo bar**
. .
<p>** foo bar**</p> <p>** foo bar**</p>
. .
This is not strong emphasis, because the opening `**` is preceded
by an alphanumeric and followed by punctuation, and hence
not part of a [left-flanking delimiter run](#right-facing-delimiter-run):
.
a**"foo"**
.
<p>a**&quot;foo&quot;**</p>
.
Intraword strong emphasis with `**` is permitted: Intraword strong emphasis with `**` is permitted:
. .
foo**bar** foo**bar**
. .
<p>foo<strong>bar</strong></p> <p>foo<strong>bar</strong></p>
. .
Rule 6: Rule 6:
skipping to change at line 4575 skipping to change at line 4837
This is not strong emphasis, because the opening delimiter is This is not strong emphasis, because the opening delimiter is
followed by whitespace: followed by whitespace:
. .
__ foo bar__ __ foo bar__
. .
<p>__ foo bar__</p> <p>__ foo bar__</p>
. .
Intraword emphasis examples: This is not strong emphasis, because the opening `__` is preceded
by an alphanumeric and followed by punctuation:
.
a__"foo"__
.
<p>a__&quot;foo&quot;__</p>
.
Intraword strong emphasis is forbidden with `__`:
. .
foo__bar__ foo__bar__
. .
<p>foo__bar__</p> <p>foo__bar__</p>
. .
. .
5__6__78 5__6__78
. .
skipping to change at line 4615 skipping to change at line 4886
. .
**foo bar ** **foo bar **
. .
<p>**foo bar **</p> <p>**foo bar **</p>
. .
(Nor can it be interpreted as an emphasized `*foo bar *`, because of (Nor can it be interpreted as an emphasized `*foo bar *`, because of
Rule 11.) Rule 11.)
This is not strong emphasis, because the second `**` is
preceded by punctuation and followed by an alphanumeric:
.
**(**foo)
.
<p>**(**foo)</p>
.
The point of this restriction is more easily appreciated
with these examples:
.
*(**foo**)*
.
<p><em>(<strong>foo</strong>)</em></p>
.
.
**Gomphocarpus (*Gomphocarpus physocarpus*, syn.
*Asclepias physocarpa*)**
.
<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn.
<em>Asclepias physocarpa</em>)</strong></p>
.
.
**foo "*bar*" foo**
.
<p><strong>foo &quot;<em>bar</em>&quot; foo</strong></p>
.
Intraword emphasis: Intraword emphasis:
. .
**foo**bar **foo**bar
. .
<p><strong>foo</strong>bar</p> <p><strong>foo</strong>bar</p>
. .
Rule 8: Rule 8:
This is not strong emphasis, because the closing delimiter is This is not strong emphasis, because the closing delimiter is
preceded by whitespace: preceded by whitespace:
. .
__foo bar __ __foo bar __
. .
<p>__foo bar __</p> <p>__foo bar __</p>
. .
Intraword strong emphasis examples: This is not strong emphasis, because the second `__` is
preceded by punctuation and followed by an alphanumeric:
.
__(__foo)
.
<p>__(__foo)</p>
.
The point of this restriction is more easily appreciated
with this example:
.
_(__foo__)_
.
<p><em>(<strong>foo</strong>)</em></p>
.
Intraword strong emphasis is forbidden with `__`:
. .
__foo__bar __foo__bar
. .
<p>__foo__bar</p> <p>__foo__bar</p>
. .
. .
__пристаням__стремятся __пристаням__стремятся
. .
skipping to change at line 5182 skipping to change at line 5503
. .
__a<http://foo.bar?q=__> __a<http://foo.bar?q=__>
. .
<p>__a<a href="http://foo.bar?q=__">http://foo.bar?q=__</a></p> <p>__a<a href="http://foo.bar?q=__">http://foo.bar?q=__</a></p>
. .
## Links ## Links
A link contains [link text](#link-text) (the visible text), A link contains [link text](#link-text) (the visible text),
a [destination](#destination) (the URI that is the link destination), a [link destination](#link-destination) (the URI that is the link destination),
and optionally a [link title](#link-title). There are two basic kinds and optionally a [link title](#link-title). There are two basic kinds
of links in Markdown. In [inline links](#inline-links) the destination of links in Markdown. In [inline links](#inline-link) the destination
and title are given immediately after the link text. In [reference and title are given immediately after the link text. In [reference
links](#reference-links) the destination and title are defined elsewhere links](#reference-link) the destination and title are defined elsewhere
in the document. in the document.
A [link text](@link-text) consists of a sequence of zero or more A [link text](@link-text) consists of a sequence of zero or more
inline elements enclosed by square brackets (`[` and `]`). The inline elements enclosed by square brackets (`[` and `]`). The
following rules apply: following rules apply:
- Links may not contain other links, at any level of nesting. - Links may not contain other links, at any level of nesting.
- Brackets are allowed in the [link text](#link-text) only if (a) they - Brackets are allowed in the [link text](#link-text) only if (a) they
are backslash-escaped or (b) they appear as a matched pair of brackets, are backslash-escaped or (b) they appear as a matched pair of brackets,
skipping to change at line 5237 skipping to change at line 5558
- a sequence of zero or more characters between straight single-quote - a sequence of zero or more characters between straight single-quote
characters (`'`), including a `'` character only if it is characters (`'`), including a `'` character only if it is
backslash-escaped, or backslash-escaped, or
- a sequence of zero or more characters between matching parentheses - a sequence of zero or more characters between matching parentheses
(`(...)`), including a `)` character only if it is backslash-escaped. (`(...)`), including a `)` character only if it is backslash-escaped.
An [inline link](@inline-link) An [inline link](@inline-link)
consists of a [link text](#link-text) followed immediately consists of a [link text](#link-text) followed immediately
by a left parenthesis `(`, optional whitespace, by a left parenthesis `(`, optional [whitespace](#whitespace),
an optional [link destination](#link-destination), an optional [link destination](#link-destination),
an optional [link title](#link-title) separated from the link an optional [link title](#link-title) separated from the link
destination by whitespace, optional whitespace, and a right destination by [whitespace](#whitespace), optional
parenthesis `)`. The link's text consists of the inlines contained [whitespace](#whitespace), and a right parenthesis `)`.
The link's text consists of the inlines contained
in the [link text](#link-text) (excluding the enclosing square brackets). in the [link text](#link-text) (excluding the enclosing square brackets).
The link's URI consists of the link destination, excluding enclosing The link's URI consists of the link destination, excluding enclosing
`<...>` if present, with backslash-escapes in effect as described `<...>` if present, with backslash-escapes in effect as described
above. The link's title consists of the link title, excluding its above. The link's title consists of the link title, excluding its
enclosing delimiters, with backslash-escapes in effect as described enclosing delimiters, with backslash-escapes in effect as described
above. above.
Here is a simple inline link: Here is a simple inline link:
. .
skipping to change at line 5413 skipping to change at line 5735
entities, or using a different quote type for the enclosing title---to entities, or using a different quote type for the enclosing title---to
write titles containing double quotes. `Markdown.pl`'s handling of write titles containing double quotes. `Markdown.pl`'s handling of
titles has a number of other strange features. For example, it allows titles has a number of other strange features. For example, it allows
single-quoted titles in inline links, but not reference links. And, in single-quoted titles in inline links, but not reference links. And, in
reference links but not inline links, it allows a title to begin with reference links but not inline links, it allows a title to begin with
`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing `"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing
quotation mark, though 1.0.2b8 does not. It seems preferable to adopt quotation mark, though 1.0.2b8 does not. It seems preferable to adopt
a simple, rational rule that works the same way in inline links and a simple, rational rule that works the same way in inline links and
link reference definitions.) link reference definitions.)
Whitespace is allowed around the destination and title: [Whitespace](#whitespace) is allowed around the destination and title:
. .
[link]( /uri [link]( /uri
"title" ) "title" )
. .
<p><a href="/uri" title="title">link</a></p> <p><a href="/uri" title="title">link</a></p>
. .
But it is not allowed between the link text and the But it is not allowed between the link text and the
following parenthesis: following parenthesis:
skipping to change at line 5486 skipping to change at line 5808
. .
<p>[foo <a href="/uri">bar</a>](/uri)</p> <p>[foo <a href="/uri">bar</a>](/uri)</p>
. .
. .
[foo *[bar [baz](/uri)](/uri)*](/uri) [foo *[bar [baz](/uri)](/uri)*](/uri)
. .
<p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p> <p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p>
. .
.
![[[foo](uri1)](uri2)](uri3)
.
<p><img src="uri3" alt="[foo](uri2)" /></p>
.
These cases illustrate the precedence of link text grouping over These cases illustrate the precedence of link text grouping over
emphasis grouping: emphasis grouping:
. .
*[foo*](/uri) *[foo*](/uri)
. .
<p>*<a href="/uri">foo*</a></p> <p>*<a href="/uri">foo*</a></p>
. .
. .
skipping to change at line 5527 skipping to change at line 5855
[foo<http://example.com?search=](uri)> [foo<http://example.com?search=](uri)>
. .
<p>[foo<a href="http://example.com?search=%5D(uri)">http://example.com?search=](uri)</a></p> <p>[foo<a href="http://example.com?search=%5D(uri)">http://example.com?search=](uri)</a></p>
. .
There are three kinds of [reference links](@reference-link): There are three kinds of [reference links](@reference-link):
[full](#full-reference-link), [collapsed](#collapsed-reference-link), [full](#full-reference-link), [collapsed](#collapsed-reference-link),
and [shortcut](#shortcut-reference-link). and [shortcut](#shortcut-reference-link).
A [full reference link](@full-reference-link) A [full reference link](@full-reference-link)
consists of a [link text](#link-text), optional whitespace, and consists of a [link text](#link-text),
optional [whitespace](#whitespace), and
a [link label](#link-label) that [matches](#matches) a a [link label](#link-label) that [matches](#matches) a
[link reference definition](#link-reference-definition) elsewhere in the [link reference definition](#link-reference-definition) elsewhere in the
document. document.
A [link label](@link-label) begins with a left bracket (`[`) and ends A [link label](@link-label) begins with a left bracket (`[`) and ends
with the first right bracket (`]`) that is not backslash-escaped. with the first right bracket (`]`) that is not backslash-escaped.
Unescaped square bracket characters are not allowed in Unescaped square bracket characters are not allowed in
[link labels](#link-label). A link label can have at most 999 [link labels](#link-label). A link label can have at most 999
characters inside the square brackets. characters inside the square brackets.
One label [matches](@matches) One label [matches](@matches)
another just in case their normalized forms are equal. To normalize a another just in case their normalized forms are equal. To normalize a
label, perform the *unicode case fold* and collapse consecutive internal label, perform the *unicode case fold* and collapse consecutive internal
whitespace to a single space. If there are multiple matching reference [whitespace](#whitespace) to a single space. If there are multiple
link definitions, the one that comes first in the document is used. (It matching reference link definitions, the one that comes first in the
is desirable in such cases to emit a warning.) document is used. (It is desirable in such cases to emit a warning.)
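As a rough illustration of label matching, the sketch below normalizes labels and keeps the first definition when several match. The names `normalize_label`, `add_definition` and `lookup` are hypothetical, Python's `str.casefold` is used as a stand-in for the Unicode case fold, and the whitespace character set is an assumption.

```python
import re
from typing import Dict, Optional, Tuple

Definition = Tuple[str, Optional[str]]  # (destination, title)


def normalize_label(label: str) -> str:
    """Case-fold the label and collapse whitespace runs to one space."""
    folded = label.casefold()
    return re.sub(r"[ \t\n\x0b\x0c\r]+", " ", folded)


def add_definition(defs: Dict[str, Definition], label: str,
                   destination: str, title: Optional[str] = None) -> None:
    # The first matching definition in the document wins; later ones are ignored.
    defs.setdefault(normalize_label(label), (destination, title))


def lookup(defs: Dict[str, Definition], label: str) -> Optional[Definition]:
    return defs.get(normalize_label(label))
```

An implementation that wants to emit the warning mentioned above could check whether the normalized key is already present before calling `setdefault`.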
The contents of the first link label are parsed as inlines, which are The contents of the first link label are parsed as inlines, which are
used as the link's text. The link's URI and title are provided by the used as the link's text. The link's URI and title are provided by the
matching [link reference definition](#link-reference-definition). matching [link reference definition](#link-reference-definition).
Here is a simple example: Here is a simple example:
. .
[foo][bar] [foo][bar]
skipping to change at line 5687 skipping to change at line 6016
Unicode case fold is used: Unicode case fold is used:
. .
[Толпой][Толпой] is a Russian word. [Толпой][Толпой] is a Russian word.
[ТОЛПОЙ]: /url [ТОЛПОЙ]: /url
. .
<p><a href="/url">Толпой</a> is a Russian word.</p> <p><a href="/url">Толпой</a> is a Russian word.</p>
. .
Consecutive internal whitespace is treated as one space for Consecutive internal [whitespace](#whitespace) is treated as one space for
purposes of determining matching: purposes of determining matching:
. .
[Foo [Foo
bar]: /url bar]: /url
[Baz][Foo bar] [Baz][Foo bar]
. .
<p><a href="/url">Baz</a></p> <p><a href="/url">Baz</a></p>
. .
There can be whitespace between the [link text](#link-text) and the There can be [whitespace](#whitespace) between the
[link label](#link-label): [link text](#link-text) and the [link label](#link-label):
. .
[foo] [bar] [foo] [bar]
[bar]: /url "title" [bar]: /url "title"
. .
<p><a href="/url" title="title">foo</a></p> <p><a href="/url" title="title">foo</a></p>
. .
. .
skipping to change at line 5786 skipping to change at line 6115
[ref\[]: /uri [ref\[]: /uri
. .
<p><a href="/uri">foo</a></p> <p><a href="/uri">foo</a></p>
. .
A [collapsed reference link](@collapsed-reference-link) A [collapsed reference link](@collapsed-reference-link)
consists of a [link consists of a [link
label](#link-label) that [matches](#matches) a [link reference label](#link-label) that [matches](#matches) a [link reference
definition](#link-reference-definition) elsewhere in the definition](#link-reference-definition) elsewhere in the
document, optional whitespace, and the string `[]`. The contents of the document, optional [whitespace](#whitespace), and the string `[]`.
first link label are parsed as inlines, which are used as the link's The contents of the first link label are parsed as inlines,
text. The link's URI and title are provided by the matching reference which are used as the link's text. The link's URI and title are
link definition. Thus, `[foo][]` is equivalent to `[foo][foo]`. provided by the matching reference link definition. Thus,
`[foo][]` is equivalent to `[foo][foo]`.
. .
[foo][] [foo][]
[foo]: /url "title" [foo]: /url "title"
. .
<p><a href="/url" title="title">foo</a></p> <p><a href="/url" title="title">foo</a></p>
. .
. .
skipping to change at line 5817 skipping to change at line 6147
The link labels are case-insensitive: The link labels are case-insensitive:
. .
[Foo][] [Foo][]
[foo]: /url "title" [foo]: /url "title"
. .
<p><a href="/url" title="title">Foo</a></p> <p><a href="/url" title="title">Foo</a></p>
. .
As with full reference links, whitespace is allowed As with full reference links, [whitespace](#whitespace) is allowed
between the two sets of brackets: between the two sets of brackets:
. .
[foo] [foo]
[] []
[foo]: /url "title" [foo]: /url "title"
. .
<p><a href="/url" title="title">foo</a></p> <p><a href="/url" title="title">foo</a></p>
. .
skipping to change at line 6092 skipping to change at line 6422
The labels are case-insensitive: The labels are case-insensitive:
. .
![Foo][] ![Foo][]
[foo]: /url "title" [foo]: /url "title"
. .
<p><img src="/url" alt="Foo" title="title" /></p> <p><img src="/url" alt="Foo" title="title" /></p>
. .
As with full reference links, whitespace is allowed As with full reference links, [whitespace](#whitespace) is allowed
between the two sets of brackets: between the two sets of brackets:
. .
![foo] ![foo]
[] []
[foo]: /url "title" [foo]: /url "title"
. .
<p><img src="/url" alt="foo" title="title" /></p> <p><img src="/url" alt="foo" title="title" /></p>
. .
skipping to change at line 6178 skipping to change at line 6508
They are parsed as links, with the URL or email address as the link They are parsed as links, with the URL or email address as the link
label. label.
A [URI autolink](@uri-autolink) A [URI autolink](@uri-autolink)
consists of `<`, followed by an [absolute consists of `<`, followed by an [absolute
URI](#absolute-uri) not containing `<`, followed by `>`. It is parsed URI](#absolute-uri) not containing `<`, followed by `>`. It is parsed
as a link to the URI, with the URI as the link's label. as a link to the URI, with the URI as the link's label.
An [absolute URI](@absolute-uri), An [absolute URI](@absolute-uri),
for these purposes, consists of a [scheme](#scheme) followed by a colon (`:`) for these purposes, consists of a [scheme](#scheme) followed by a colon (`:`)
followed by zero or more characters other than ASCII whitespace and followed by zero or more characters other than ASCII
control characters, `<`, and `>`. If the URI includes these characters, [whitespace](#whitespace) and control characters, `<`, and `>`. If
you must use percent-encoding (e.g. `%20` for a space). the URI includes these characters, you must use percent-encoding
(e.g. `%20` for a space).
The following [schemes](@scheme) The following [schemes](@scheme)
are recognized (case-insensitive): are recognized (case-insensitive):
`coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`, `coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`,
`cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`, `cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`,
`gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`, `gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`,
`ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`, `ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`,
`mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`, `mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`,
`ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`, `ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`,
`service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`,` `service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`,`
skipping to change at line 6252 skipping to change at line 6583
. .
An [email autolink](@email-autolink) An [email autolink](@email-autolink)
consists of `<`, followed by an [email address](#email-address), consists of `<`, followed by an [email address](#email-address),
followed by `>`. The link's label is the email address, followed by `>`. The link's label is the email address,
and the URL is `mailto:` followed by the email address. and the URL is `mailto:` followed by the email address.
An [email address](@email-address), An [email address](@email-address),
for these purposes, is anything that matches for these purposes, is anything that matches
the [non-normative regex from the HTML5 the [non-normative regex from the HTML5
spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#e-mail-state-%28type=email%29): spec](https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)):
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
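The regex above carries over to Python's `re` module essentially unchanged. The sketch below is illustrative only: the function name `email_autolink` and the address used in the usage note are hypothetical, and it returns the link's URL and label as a pair rather than rendering HTML.

```python
import re
from typing import Optional, Tuple

# The HTML5 non-normative email regex, split over two string literals.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def email_autolink(text: str) -> Optional[Tuple[str, str]]:
    """Return (url, label) if text is `<` + email address + `>`, else None."""
    if len(text) < 3 or not (text.startswith("<") and text.endswith(">")):
        return None
    address = text[1:-1]
    # fullmatch avoids `$` accepting a trailing newline.
    if EMAIL_RE.fullmatch(address) is None:
        return None
    return "mailto:" + address, address
```

Under this sketch, `email_autolink("<user@host.example>")` yields the `mailto:` URL and the address itself as the label, while input containing a backslash in the address yields `None`, since backslash escapes do not apply inside autolinks.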
Examples of email autolinks: Examples of email autolinks:
. .
<[email protected]> <[email protected]>
. .
<p><a href="mailto:[email protected]">[email protected]</a></p> <p><a href="mailto:[email protected]">[email protected]</a></p>
skipping to change at line 6327 skipping to change at line 6658
Text between `<` and `>` that looks like an HTML tag is parsed as a Text between `<` and `>` that looks like an HTML tag is parsed as a
raw HTML tag and will be rendered in HTML without escaping. raw HTML tag and will be rendered in HTML without escaping.
Tag and attribute names are not limited to current HTML tags, Tag and attribute names are not limited to current HTML tags,
so custom tags (and even, say, DocBook tags) may be used. so custom tags (and even, say, DocBook tags) may be used.
Here is the grammar for tags: Here is the grammar for tags:
A [tag name](@tag-name) consists of an ASCII letter A [tag name](@tag-name) consists of an ASCII letter
followed by zero or more ASCII letters or digits. followed by zero or more ASCII letters or digits.
An [attribute](@attribute) consists of whitespace, An [attribute](@attribute) consists of [whitespace](#whitespace),
an [attribute name](#attribute-name), and an optional an [attribute name](#attribute-name), and an optional
[attribute value specification](#attribute-value-specification). [attribute value specification](#attribute-value-specification).
An [attribute name](@attribute-name) An [attribute name](@attribute-name)
consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII
letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML
specification restricted to ASCII. HTML5 is laxer.) specification restricted to ASCII. HTML5 is laxer.)
An [attribute value specification](@attribute-value-specification) An [attribute value specification](@attribute-value-specification)
consists of optional whitespace, consists of optional [whitespace](#whitespace),
a `=` character, optional whitespace, and an [attribute a `=` character, optional [whitespace](#whitespace), and an [attribute
value](#attribute-value). value](#attribute-value).
An [attribute value](@attribute-value) An [attribute value](@attribute-value)
consists of an [unquoted attribute value](#unquoted-attribute-value), consists of an [unquoted attribute value](#unquoted-attribute-value),
a [single-quoted attribute value](#single-quoted-attribute-value), a [single-quoted attribute value](#single-quoted-attribute-value),
or a [double-quoted attribute value](#double-quoted-attribute-value). or a [double-quoted attribute value](#double-quoted-attribute-value).
An [unquoted attribute value](@unquoted-attribute-value) An [unquoted attribute value](@unquoted-attribute-value)
is a nonempty string of characters not is a nonempty string of characters not
including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``. including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``.
skipping to change at line 6360 skipping to change at line 6691
A [single-quoted attribute value](@single-quoted-attribute-value) A [single-quoted attribute value](@single-quoted-attribute-value)
consists of `'`, zero or more consists of `'`, zero or more
characters not including `'`, and a final `'`. characters not including `'`, and a final `'`.
A [double-quoted attribute value](@double-quoted-attribute-value) A [double-quoted attribute value](@double-quoted-attribute-value)
consists of `"`, zero or more consists of `"`, zero or more
characters not including `"`, and a final `"`. characters not including `"`, and a final `"`.
An [open tag](@open-tag) consists of a `<` character, An [open tag](@open-tag) consists of a `<` character,
a [tag name](#tag-name), zero or more [attributes](#attribute), a [tag name](#tag-name), zero or more [attributes](#attribute),
optional whitespace, an optional `/` character, and a `>` character. optional [whitespace](#whitespace), an optional `/` character, and a
`>` character.
A [closing tag](@closing-tag) consists of the A [closing tag](@closing-tag) consists of the
string `</`, a [tag name](#tag-name), optional whitespace, and the string `</`, a [tag name](#tag-name), optional
character `>`. [whitespace](#whitespace), and the character `>`.
An [HTML comment](@html-comment) consists of the An [HTML comment](@html-comment) consists of the
string `<!--`, a string of characters not including the string `--`, and string `<!--`, a string of characters not including the string `--`, and
the string `-->`. the string `-->`.
A [processing instruction](@processing-instruction) A [processing instruction](@processing-instruction)
consists of the string `<?`, a string consists of the string `<?`, a string
of characters not including the string `?>`, and the string of characters not including the string `?>`, and the string
`?>`. `?>`.
A [declaration](@declaration) consists of the A [declaration](@declaration) consists of the
string `<!`, a name consisting of one or more uppercase ASCII letters, string `<!`, a name consisting of one or more uppercase ASCII letters,
whitespace, a string of characters not including the character `>`, and [whitespace](#whitespace), a string of characters not including the
the character `>`. character `>`, and the character `>`.
A [CDATA section](@cdata-section) consists of A [CDATA section](@cdata-section) consists of
the string `<![CDATA[`, a string of characters not including the string the string `<![CDATA[`, a string of characters not including the string
`]]>`, and the string `]]>`. `]]>`, and the string `]]>`.
An [HTML tag](@html-tag) consists of an [open An [HTML tag](@html-tag) consists of an [open
tag](#open-tag), a [closing tag](#closing-tag), an [HTML tag](#open-tag), a [closing tag](#closing-tag), an [HTML
comment](#html-comment), a [processing comment](#html-comment), a [processing instruction](#processing-instruction),
instruction](#processing-instruction), an [element type a [declaration](#declaration), or a [CDATA section](#cdata-section).
declaration](#element-type-declaration), or a [CDATA
section](#cdata-section).
Here are some simple open tags: Here are some simple open tags:
. .
<a><bab><c2c> <a><bab><c2c>
. .
<p><a><bab><c2c></p> <p><a><bab><c2c></p>
. .
Empty elements: Empty elements:
. .
<a/><b2/> <a/><b2/>
. .
<p><a/><b2/></p> <p><a/><b2/></p>
. .
Whitespace is allowed: [Whitespace](#whitespace) is allowed:
. .
<a /><b2 <a /><b2
data="foo" > data="foo" >
. .
<p><a /><b2 <p><a /><b2
data="foo" ></p> data="foo" ></p>
. .
With attributes: With attributes:
skipping to change at line 6451 skipping to change at line 6781
. .
Illegal attribute values: Illegal attribute values:
. .
<a href="hi'> <a href=hi'> <a href="hi'> <a href=hi'>
. .
<p>&lt;a href=&quot;hi'&gt; &lt;a href=hi'&gt;</p> <p>&lt;a href=&quot;hi'&gt; &lt;a href=hi'&gt;</p>
. .
Illegal whitespace: Illegal [whitespace](#whitespace):
. .
< a>< < a><
foo><bar/ > foo><bar/ >
. .
<p>&lt; a&gt;&lt; <p>&lt; a&gt;&lt;
foo&gt;&lt;bar/ &gt;</p> foo&gt;&lt;bar/ &gt;</p>
. .
Missing whitespace: Missing [whitespace](#whitespace):
. .
<a href='bar'title=title> <a href='bar'title=title>
. .
<p>&lt;a href='bar'title=title&gt;</p> <p>&lt;a href='bar'title=title&gt;</p>
. .
Closing tags: Closing tags:
. .
skipping to change at line 6564 skipping to change at line 6894
in HTML as a `<br />` tag): in HTML as a `<br />` tag):
. .
foo foo
baz baz
. .
<p>foo<br /> <p>foo<br />
baz</p> baz</p>
. .
For a more visible alternative, a backslash before the newline may be For a more visible alternative, a backslash before the
used instead of two spaces: [line ending](#line-ending) may be used instead of two spaces:
. .
foo\ foo\
baz baz
. .
<p>foo<br /> <p>foo<br />
baz</p> baz</p>
. .
More than two spaces can be used: More than two spaces can be used:
skipping to change at line 6688 skipping to change at line 7018
. .
### foo ### foo
. .
<h3>foo</h3> <h3>foo</h3>
. .
## Soft line breaks ## Soft line breaks
A regular line break (not in a code span or HTML tag) that is not A regular line break (not in a code span or HTML tag) that is not
preceded by two or more spaces is parsed as a softbreak. (A preceded by two or more spaces is parsed as a softbreak. (A
softbreak may be rendered in HTML either as a newline or as a space. softbreak may be rendered in HTML either as a
The result will be the same in browsers. In the examples here, a [line ending](#line-ending) or as a space. The result will be the same
newline will be used.) in browsers. In the examples here, a [line ending](#line-ending) will
be used.)
. .
foo foo
baz baz
. .
<p>foo <p>foo
baz</p> baz</p>
. .
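
To make the difference between hard and soft breaks concrete, here is a minimal Python sketch that renders the lines of a single paragraph. It assumes the lines contain plain text only: code spans, HTML tags, backslash escapes, and HTML escaping are all ignored, and the function name `render_paragraph` is hypothetical. A softbreak is rendered as a line ending, as in the examples here.

```python
def render_paragraph(lines):
    """Render plain-text paragraph lines to HTML (simplified sketch)."""
    parts = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        text = line.lstrip(" ")  # leading spaces on a line are dropped
        if is_last:
            # No break at the end of the paragraph; trailing spaces vanish.
            parts.append(text.rstrip(" "))
        elif text.endswith("  "):
            # Two or more trailing spaces: hard line break.
            parts.append(text.rstrip(" ") + "<br />\n")
        elif text.endswith("\\"):
            # Backslash before the line ending: hard line break.
            # (Escaped backslashes are not handled in this sketch.)
            parts.append(text[:-1] + "<br />\n")
        else:
            # Anything else: softbreak, rendered here as a line ending.
            parts.append(text.rstrip(" ") + "\n")
    return "<p>" + "".join(parts) + "</p>"


print(render_paragraph(["foo  ", "baz"]))
print(render_paragraph(["foo\\", "baz"]))
print(render_paragraph(["foo", "baz"]))
```

The first two calls produce a `<br />` between the lines, while the third keeps only the line ending.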
Spaces at the end of the line and beginning of the next line are Spaces at the end of the line and beginning of the next line are
skipping to change at line 6925 skipping to change at line 7256
list_item list_item
paragraph paragraph
str "Qui " str "Qui "
emph emph
str "quodsi iracundia" str "quodsi iracundia"
list_item list_item
paragraph paragraph
str "aliquando id" str "aliquando id"
``` ```
Notice how the newline in the first paragraph has been parsed as Notice how the [line ending](#line-ending) in the first paragraph has
a `softbreak`, and the asterisks in the first list item have become been parsed as a `softbreak`, and the asterisks in the first list item
an `emph`. have become an `emph`.
The document can be rendered as HTML, or in any other format, given The document can be rendered as HTML, or in any other format, given
an appropriate renderer. an appropriate renderer.
 End of changes. 82 change blocks. 
164 lines changed or deleted 496 lines changed or added
