CSS Text Module Level 3

W3C Working Draft 13 November 2012

This version:: http://www.w3.org/TR/2012/WD-css3-text-20121113/
Latest version:: http://www.w3.org/TR/css3-text/
Latest editor's draft:: http://dev.w3.org/csswg/css3-text/
Previous version:: http://www.w3.org/TR/2012/WD-css3-text-20120814/
Issues List:: http://www.w3.org/Style/CSS/Tracker/products/10
Discussion:: www-style@w3.org with subject line “[css3-text] … message topic …”
Editors:: Elika J. Etemad (Mozilla); Koji Ishii (Rakuten, Inc.)

Abstract

This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This CSS module has been produced as a combined effort of the W3C Internationalization Activity and the Style Activity, and is maintained by the CSS Working Group. It also includes contributions made by participants in the XSL Working Group (members only).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Feedback on this draft should be posted to the (archived) public mailing list www-style@w3.org (see instructions) with [css3-text] in the subject line. You are strongly encouraged to complain if you see something stupid in this draft. The editors will do their best to respond to all feedback.

The following features are at risk and may be cut from the spec during its CR period if there are no (correct) implementations:

the ‘full-width’ value of ‘text-transform’
the <length> values of the ‘tab-size’ property
the ‘start end’ and ‘<string>’ values of ‘text-align’
the ‘text-justify’ property, particularly its ‘kashida’ value
the percentage values of ‘word-spacing’
minimum and maximum limits of ‘word-spacing’ and ‘letter-spacing’
the ‘hanging-punctuation’ property

1. Introduction

[document here]

This draft describes features that are specific to certain scripts. There is an ongoing discussion about where these features belong: in existing CSS properties, in new CSS properties, or perhaps in other specifications.

Text decoration has moved to CSS Text Decoration Module Level 3 [CSS3-TEXT-DECOR].

1.1. Module Interactions

This module replaces and extends the text-level features defined in [CSS21] chapter 16.

1.2. Values

This specification follows the CSS property definition conventions from [CSS21]. Value types not defined in this specification are defined in CSS Level 2 Revision 1 [CSS21]. Other CSS modules may expand the definitions of these value types: for example [CSS3COLOR], when combined with this module, expands the definition of the <color> value type as used in this specification.

In addition to the property-specific values listed in their definitions, all properties defined in this specification also accept the inherit keyword as their property value. For readability it has not been repeated explicitly.

1.3. Terminology

A grapheme cluster is what a language user considers to be a character or a basic unit of the script. The term is described in detail in the Unicode Technical Report: Text Boundaries [UAX29]. This specification uses the extended grapheme cluster definition in [UAX29] (not the legacy grapheme cluster definition). The UA may further tailor the definition as allowed by Unicode. Within this specification, the ambiguous term character is used as a friendlier synonym for grapheme cluster. See Characters and Properties for how to determine the Unicode properties of a character.

A letter for the purpose of this specification is a character belonging to one of the Letter or Number general categories in Unicode. [UAX44]

The rendering characteristics of a character divided by an element boundary are undefined: it may be rendered as belonging to either side of the boundary, or as some approximation of belonging to both. Authors are forewarned that dividing grapheme clusters by element boundaries may give inconsistent or undesired results.

The content language of an element is the (human) language the element is declared to be in, according to the rules of the document language. For example, the rules for determining the content language of an HTML element use the lang attribute and are defined in [HTML5], and the rules for determining the content language of an XML element use the xml:lang attribute and are defined in [XML10]. Note that it is possible for the content language of an element to be unknown.

Other terminology and concepts used in this specification are defined in [CSS21] and [CSS3-WRITING-MODES].

2. Transforming Text

2.1. Transforming Text: the ‘`text-transform`’ property

Name:	text-transform
Value:	none | capitalize | uppercase | lowercase | full-width
Initial:	none
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	as specified

This property transforms text for styling purposes. (It has no effect on the underlying content.) Values have the following meanings:

‘none’: No effects.
‘capitalize’: Puts the first letter of each word in titlecase; other characters are unaffected.
‘uppercase’: Puts all letters in uppercase.
‘lowercase’: Puts all letters in lowercase.
‘full-width’: Puts all characters in fullwidth form. If the character does not have a corresponding fullwidth form, it is left as is. This value is typically used to typeset Latin characters and digits like ideographic characters.

The case mapping rules for the character repertoire specified by the Unicode Standard can be found on the Unicode Consortium Web site [UNICODE]. The UA must use the full case mappings for Unicode characters, including any conditional casing rules, as defined in the Default Case Algorithm section. If (and only if) the content language of the element is, according to the rules of the document language, known, then any appropriate language-specific rules must be applied as well. These minimally include, but are not limited to, the language-specific rules in Unicode's SpecialCasing.txt.

For example, in Turkish there are two “i”s, one with a dot—“İ” and “i”— and one without—“I” and “ı”. Thus the usual case mappings between “I” and “i” are replaced with a different set of mappings to their respective undotted/dotted counterparts, which do not exist in English. This mapping must only take effect if the content language is Turkish (or another Turkic language that uses Turkish casing rules); in other languages, the usual mapping of “I” and “i” is required. This rule is thus conditionally defined in Unicode's SpecialCasing.txt file.

The definition of "word" used for ‘capitalize’ is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Authors should not expect ‘capitalize’ to follow language-specific titlecasing conventions (such as skipping articles in English).
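The following sketch illustrates the case transforms described above. The class name is hypothetical, and the mappings noted in the comments are only the ones stated in this section; actual results depend on the UA's Unicode data and on the content language declared by the document.

```css
/* Uppercase headings for display; the underlying content is unchanged. */
h1 { text-transform: uppercase; }
/* If the content language is Turkish (e.g. the document language marks
   the element as "tr"), "i" is expected to map to "İ" and "ı" to "I".
   In other languages, "i" maps to "I" as usual. No language-specific
   CSS rule is needed: the UA selects the mapping from the content language. */

/* Put the first letter of each word in titlecase. Word boundaries are
   UA-dependent, and articles such as "the" are not skipped. */
.byline { text-transform: capitalize; }   /* .byline is a hypothetical class */
```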

The definition of fullwidth and halfwidth forms can be found on the Unicode consortium web site at [UAX11]. The mapping to fullwidth form is defined by taking code points with the <wide> or the <narrow> tag in their Decomposition_Mapping in [UAX44]. For the <narrow> tag, the mapping is from the code point to the decomposition (minus <narrow> tag), and for the <wide> tag, the mapping is from the decomposition (minus the <wide> tag) back to the original code point.

Text transformation happens after white space processing, which means that ‘full-width’ transforms only preserved U+0020 spaces to U+3000.

The following example converts the ASCII characters in abbreviations in Japanese to their fullwidth variants so that they lay out and line break like ideographs:

abbr:lang(ja) { text-transform: full-width; }

CSS may introduce the ability to create custom mapping tables for less common text transforms, such as by an ‘@text-transform’ rule similar to ‘@counter-style’ from [CSS3LIST].

3. White Space and Wrapping: the ‘`white-space`’ property

This property specifies two things:

whether and how white space inside the element is collapsed
whether lines may wrap at unforced soft wrap opportunities

Name:	white-space
Value:	normal | pre | nowrap | pre-wrap | pre-line
Initial:	normal
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	as specified

Values have the following meanings, which must be interpreted according to the White Space Processing and Line Breaking rules:

‘normal’: This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character). Lines may wrap at allowed soft wrap opportunities, as determined by the line-breaking rules in effect, in order to minimize overflow.
‘pre’: This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds and carriage returns are preserved as forced line breaks. Lines only break at forced line breaks; content that does not fit within the block container overflows it.
‘nowrap’: Like ‘normal’, this value collapses white space; but like ‘pre’, it does not allow wrapping.
‘pre-wrap’: Like ‘pre’, this value preserves white space; but like ‘normal’, it allows wrapping.
‘pre-line’: Like ‘normal’, this value collapses consecutive spaces and allows wrapping, but preserves segment breaks in the source as forced line breaks.

There have been requests for the ability to "discard" white space; the current definition has no facility for this.

The following informative table summarizes the behavior of various ‘white-space’ values:

	New Lines	Spaces and Tabs	Text Wrapping
‘`normal`’	Collapse	Collapse	Wrap
‘`pre`’	Preserve	Preserve	No wrap
‘`nowrap`’	Collapse	Collapse	No wrap
‘`pre-wrap`’	Preserve	Preserve	Wrap
‘`pre-line`’	Preserve	Collapse	Wrap
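As an informal illustration of the table above, the following rules pick a ‘white-space’ value for a few common cases. All class names are hypothetical; only the property values come from this section.

```css
/* Program listing: preserve spaces, tabs, and new lines; never wrap. */
pre.listing  { white-space: pre; }

/* Console output: preserve all white space, but wrap to avoid overflow. */
.log-output  { white-space: pre-wrap; }

/* Verse or postal address: collapse runs of spaces and tabs, but keep
   each new line in the source as a forced line break. */
.poem        { white-space: pre-line; }

/* Short label that should stay on one line: collapse white space
   and do not wrap. */
.nav-label   { white-space: nowrap; }

/* Ordinary running text: collapse white space and wrap as needed. */
p            { white-space: normal; }
```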

See White Space Processing Rules for details on how white space collapses. An informative summary of collapsing (‘normal’ and ‘nowrap’) is presented below:

A sequence of segment breaks and other white space between two Chinese, Japanese, or Yi characters collapses into nothing.
A zero width space before or after a white space sequence containing a segment break causes the entire sequence of white space to collapse into a zero width space.
Otherwise, consecutive white space collapses into a single space.

See Line Breaking for details on wrapping behavior.

4. White Space Processing Details

The source text of a document often contains formatting that is not relevant to the final rendering: for example, breaking the source into segments (lines) for ease of editing or adding white space characters such as tabs and spaces to indent the source code. CSS white space processing allows the author to control interpretation of such formatting: to preserve or collapse it away when rendering the document. White space processing in CSS interprets white space characters only for rendering: it has no effect on the underlying document data.

White space processing in CSS is controlled with the ‘white-space’ property.

CSS does not define document segmentation rules. Segments could be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens. For CSS processing, each document language–defined segment break, CRLF sequence (U+000D U+000A), carriage return (U+000D), and line feed (U+000A) in the text is treated as a segment break, which is then interpreted for rendering as specified by the ‘white-space’ property.

Note that the document parser may have not only normalized any segment breaks, but also collapsed other space characters or otherwise processed white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

Note that anonymous inlines consisting entirely of collapsible white space are removed from the rendering tree. See [CSS21] section 9.2.2.1.

Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering.

4.1. The White Space Processing Rules

White space processing affects only spaces (U+0020), tabs (U+0009), and segment breaks.

For each inline (including anonymous inlines) within an inline formatting context, white space characters are handled as follows, ignoring bidi formatting characters as if they were not there:

If ‘white-space’ is set to ‘normal’, ‘nowrap’, or ‘pre-line’, white space characters are considered collapsible and are processed by performing the following steps:
1. All spaces and tabs immediately preceding or following a segment break are removed.
2. Segment breaks are transformed for rendering according to the line break transformation rules.
3. Every tab is converted to a space (U+0020).
4. Any space immediately following another collapsible space —even one outside the boundary of the inline containing the space, provided they are within the same inline formatting context—is collapsed to have zero advance width. (It is invisible, but retains its soft wrap opportunity, if any.)
If ‘white-space’ is set to ‘pre-wrap’, any sequence of spaces is treated as a sequence of non-breaking spaces. However, a soft wrap opportunity exists at the end of the sequence.

Then, the entire block is rendered. Inlines are laid out, taking bidi reordering into account, and wrapping as specified by the ‘white-space’ property.

As each line is laid out,

A sequence of collapsible spaces at the beginning of a line is removed.
Each tab is rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. Tab stops occur at points that are multiples of the tab size from the block's starting content edge. The tab size is given by the ‘tab-size’ property.
A sequence of collapsible spaces at the end of a line is removed.
If spaces or tabs at the end of a line are non-collapsible but ‘white-space’ is set to ‘pre-wrap’, the UA may visually collapse their character advance widths.

White space that was not removed or collapsed during the white space processing steps is called preserved white space.
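The collapsing steps above can be traced on a small input. This is a worked sketch rather than normative text: the input string is invented for illustration, with · standing for a space (U+0020), → for a tab (U+0009), and ¶ for a segment break, and it assumes Latin text so that the segment break becomes a space.

```css
/* Source text of the paragraph:      A··B·¶→C

   With 'white-space: normal' the text is collapsible:
     step 1  remove spaces and tabs next to the break   A··B¶C
     step 2  transform the segment break into a space   A··B·C
     step 3  convert tabs to spaces (none remain)       A··B·C
     step 4  the second of two adjacent spaces gets
             zero advance width, so the line renders as A·B·C  */
p { white-space: normal; }

/* With 'white-space: pre' nothing is collapsible: both spaces, the
   trailing space, the forced break, and the tab are all preserved. */
pre { white-space: pre; }
```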

4.1.1. Example of bidirectionality with white space collapsing

Consider the following markup fragment, taking special note of spaces (with varied backgrounds and borders for emphasis and identification):

<ltr>A <rtl> B </rtl> C</ltr>

where the <ltr> element represents a left-to-right embedding and the <rtl> element represents a right-to-left embedding. If the ‘white-space’ property is set to ‘normal’, the above processing model would result in the following:

The space before the B ( ) would collapse with the space after the A ( ).
The space before the C ( ) would collapse with the space after the B ( ).

This would leave two spaces, one after the A in the left-to-right embedding level, and one after the B in the right-to-left embedding level. This is then ordered according to the Unicode bidirectional algorithm, with the end result being:

A  BC

Note that there are two spaces between A and B, and none between B and C. This is best avoided by putting spaces outside the element instead of just inside the opening and closing tags and, where practical, by relying on implicit bidirectionality instead of explicit embedding levels.

4.1.2. Line Break Transformation Rules

When ‘white-space’ is ‘pre’, ‘pre-wrap’, or ‘pre-line’, segment breaks are not collapsible and are instead transformed into a preserved line feed (U+000A).

For other values of ‘white-space’, segment breaks are collapsible, and are either transformed into a space (U+0020) or removed depending on the context before and after the break:

If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space.
Otherwise, if the East Asian Width property [UAX11] of both the character before and the character after the segment break is F, W, or H (not A), and neither side is Hangul, then the segment break is removed.
Otherwise, the segment break is converted to a space (U+0020).

Note that the white space processing rules have already removed any tabs and spaces after the segment break before these checks take place.

Comments on how well this would work in practice would be very much appreciated, particularly from people who work with Thai and similar scripts. Note that browser implementations do not currently follow these rules (although IE does in some cases transform the break).

4.2. Tab Character Size: the ‘`tab-size`’ property

Name:	tab-size
Value:	<integer> | <length>
Initial:	8
Applies to:	block containers
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	the specified integer or length made absolute

This property determines the tab size used to render preserved tab characters (U+0009). Integers represent the measure as multiples of the space character's advance width (U+0020). Negative values are not allowed.
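For illustration, the rules below set tab stops on preformatted blocks. The class name is hypothetical. Note that ‘tab-size’ only affects preserved tabs, so it is paired here with a ‘white-space’ value that preserves them, and that the <length> form is listed as at risk in this draft.

```css
/* Tab stops every 4 space advance widths instead of the initial 8. */
pre {
  white-space: pre;   /* tabs must be preserved for tab-size to matter */
  tab-size: 4;
}

/* Length form (at risk): tab stops at multiples of 2em from the
   block's starting content edge. */
pre.wide-tabs {
  white-space: pre;
  tab-size: 2em;
}
```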

5. Line Breaking and Word Boundaries

When inline-level content is laid out into lines, it is broken across line boxes. Such a break is called a line break. When a line is broken due to explicit line-breaking controls, or due to the start or end of a block, it is a forced line break. When a line is broken due to content wrapping (i.e. when the UA creates unforced line breaks in order to fit the content within the measure), it is a soft wrap break. The process of breaking inline-level content into lines is called line breaking.

Wrapping is only performed at an allowed break point, called a soft wrap opportunity.

In most writing systems, in the absence of hyphenation a soft wrap opportunity occurs only at word boundaries. Many such systems use spaces or punctuation to explicitly separate words, and soft wrap opportunities can be identified by these characters. Scripts such as Thai, Lao, and Khmer, however, do not use spaces or punctuation to separate words. Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts, this practice is not common. As a result, a lexical resource is needed to correctly identify soft wrap opportunities in such texts.

In several other writing systems (including Chinese, Japanese, Yi, and sometimes also Korean), a soft wrap opportunity is based on syllable boundaries, not word boundaries. In these systems a line can break anywhere except between certain character combinations. Additionally, the level of strictness in these restrictions can vary with the typesetting style.

CSS does not fully define where soft wrap opportunities occur; however, some controls are provided to distinguish common variations.

Further information on line breaking conventions can be found in [JLREQ] and [JIS4051] for Japanese, [ZHMARK] for Chinese, and in [UAX14] for all scripts in Unicode.

Any guidance for appropriate references here would be much appreciated.

5.1. Line Breaking Details

When determining line breaks:

Regardless of the ‘white-space’ value, lines always break at each preserved forced break character: for all values, line-breaking behavior defined for the BK, CR, LF, CM, NL, and SG line breaking classes in [UAX14] must be honored.
When ‘white-space’ allows wrapping, line breaking behavior defined for the WJ, ZW, and GL line-breaking classes in [UAX14] must be honored.
UAs that allow wrapping at punctuation other than spaces should prioritize breakpoints. For example, if breaks after slashes are given a lower priority than spaces, the sequence "check /etc" will never break between the ‘/’ and the ‘e’. The UA may use the width of the containing block, the text's language, and other factors in assigning priorities. As long as care is taken to avoid such awkward breaks, allowing breaks at appropriate punctuation other than spaces is recommended, as it results in more even-looking margins, particularly in narrow measures.
Out-of-flow elements do not introduce a forced line break or soft wrap opportunity in the flow.
The line breaking behavior of a replaced element or other atomic inline is equivalent to that of the Object Replacement Character (U+FFFC).
For soft wrap opportunities created by characters that disappear at the line break (e.g. U+0020 SPACE), properties on the element containing that character control the line breaking at that opportunity. For soft wrap opportunities defined by the boundary between two characters, the properties on the element containing the boundary control breaking.
For soft wrap opportunities before the first or after the last character of a box, the break occurs immediately before/after the box (at its margin edge) rather than breaking the box between its content edge and the content.
For line breaking in/around ruby, the base text is considered part of the same inline formatting context as its surrounding content, but the ruby text is not: i.e. line breaking opportunities between the ruby element and its surrounding content are determined as if the ruby base were inline and the ruby text were not there.

5.2. Breaking Rules for Punctuation: the ‘`line-break`’ property

Name:	line-break
Value:	auto | loose | normal | strict
Initial:	auto
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property specifies the strictness of line-breaking rules applied within an element: particularly how wrapping interacts with punctuation and symbols. Values have the following meanings:

‘auto’: The UA determines the set of line-breaking restrictions to use, and it may vary the restrictions based on the length of the line; e.g., use a less restrictive set of line-break rules for short lines.
‘loose’: Breaks text using the least restrictive set of line-breaking rules. Typically used for short lines, such as in newspapers.
‘normal’: Breaks text using the most common set of line-breaking rules.
‘strict’: Breaks text using the most stringent set of line-breaking rules.

CSS distinguishes between three levels of strictness in the rules for text wrapping. The precise set of rules in effect for each level is up to the UA and should follow language conventions. However, this specification does recommend that:

Following breaks be forbidden in ‘strict’ line breaking and allowed in ‘normal’ and ‘loose’:
- breaks before Japanese small kana
- breaks before the Katakana-Hiragana prolonged sound mark: ー U+30FC, ｰ U+FF70
If the content language is Chinese, Japanese, or Korean, then additionally:
- breaks before hyphens:
  ‐ U+2010, – U+2013, 〜 U+301C, ゠ U+30A0
Following breaks be forbidden in ‘normal’ and ‘strict’ line breaking and allowed in ‘loose’:
- breaks before iteration marks:
  々 U+3005, 〻 U+303B, ゝ U+309D, ゞ U+309E, ヽ U+30FD, ヾ U+30FE
- breaks between some inseparable characters:
  ‥ U+2025, … U+2026
If the content language is Chinese, Japanese, or Korean, then additionally:
- breaks before certain centered punctuation marks:
  : U+003A, ; U+003B, ・ U+30FB, ： U+FF1A, ； U+FF1B, ･ U+FF65, ! U+0021, ? U+003F, ‼ U+203C, ⁇ U+2047, ⁈ U+2048, ⁉ U+2049, ！ U+FF01, ？ U+FF1F
- breaks before postfixes:
  % U+0025, ¢ U+00A2, ° U+00B0, ‰ U+2030, ′ U+2032, ″ U+2033, ℃ U+2103, ％ U+FF05, ￠ U+FFE0
- breaks after prefixes:
  $ U+0024, £ U+00A3, ¥ U+00A5, € U+20AC, № U+2116, ＄ U+FF04, ￡ U+FFE1, ￥ U+FFE5

In the recommended list above, no distinction is made among the levels of strictness in non-CJK text: only CJK codepoints are affected, unless the text is marked as Chinese or Japanese, in which case some additional common codepoints are affected. However a future level of CSS may add behaviors affecting non-CJK text.

Support for this property is optional. It is recommended for UAs that wish to support CJK typography and strongly recommended for UAs in the Japanese market.
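A possible use of the strictness levels is sketched below. The selectors and class name are assumptions made for the example; the exact rule set behind each level remains up to the UA, and a UA may not support the property at all.

```css
/* Body text in Japanese: apply the most stringent rules, which under
   the recommendation above forbid breaks before small kana and before
   the prolonged sound mark (U+30FC). */
:lang(ja) { line-break: strict; }

/* Narrow newspaper-style columns: use the least restrictive rules,
   which under the recommendation also allow breaks before iteration
   marks such as U+3005. */
.news-column:lang(ja) { line-break: loose; }
```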

The CSSWG recognizes that in a future edition of the specification finer control over line breaking may be necessary to satisfy high-end publishing requirements.

5.3. Breaking Rules for Letters: the ‘`word-break`’ property

Name:	word-break
Value:	normal | keep-all | break-all
Initial:	normal
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property specifies soft wrap opportunities between letters. Values have the following meanings:

‘normal’: Words break according to their usual rules.
‘break-all’: In addition to ‘normal’ soft wrap opportunities, lines may break between any two letters (except where forbidden by the ‘line-break’ property). Hyphenation is not applied. This option is used mostly in a context where the text is predominantly using CJK characters with few non-CJK excerpts and it is desired that the text be better distributed on each line.
‘keep-all’: Implicit soft wrap opportunities between letters are suppressed, i.e. breaks are prohibited between pairs of letters (except where opportunities exist due to dictionary-based breaking). Otherwise this option is equivalent to ‘normal’. In this style, sequences of CJK characters do not break.
This is sometimes seen in Korean (which uses spaces between words), and is also useful for mixed-script text where CJK snippets are mixed into another language that uses spaces for separation.

Symbols that line-break the same way as letters of a particular category are affected the same way as those letters.

Here's a mixed-script sample text:

这是一些汉字, and some Latin, و کمی نوشتنن عربی, และตัวอย่างการเขียนภาษาไทย.

The break-points are determined as follows (indicated by ‘·’):

‘word-break: normal’

这·是·一·些·汉·字,·and·some·Latin,·و·کمی·نوشتنن·عربی,·และ·ตัวอย่าง·การเขียน·ภาษาไทย.

‘word-break: break-all’

这·是·一·些·汉·字,·a·n·d·s·o·m·e·L·a·t·i·n,·و·ﮐ·ﻤ·ﻰ·ﻧ·ﻮ·ﺷ·ﺘ·ﻦ·ﻋ·ﺮ·ﺑ·ﻰ,·แ·ล·ะ·ตั·ว·อ·ย่·า·ง·ก·า·ร·เ·ขี·ย·น·ภ·า·ษ·า·ไ·ท·ย.

‘word-break: keep-all’

这是一些汉字,·and·some·Latin,·و·کمی·نوشتنن·عربی,·และตัวอย่างการเขียนภาษาไทย.

When shaping scripts such as Arabic are allowed to break within words due to ‘break-all’, the characters must still be shaped as if the word were not broken.
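The following sketch shows where an author might choose each non-default value. The class name is hypothetical, and the choice of Korean for ‘keep-all’ follows the usage note in the value definitions above.

```css
/* Korean text that separates words with spaces: suppress the implicit
   break opportunities between letters so that words stay whole. */
:lang(ko) { word-break: keep-all; }

/* Predominantly CJK text with a few Latin excerpts in a narrow box:
   allow breaks between any two letters so lines fill more evenly.
   No hyphen is inserted at these breaks. */
.narrow-cell { word-break: break-all; }
```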

6. Hyphenation

Hyphenation allows the controlled splitting of words to improve the layout of paragraphs, typically splitting words at syllabic or morphemic boundaries and visually indicating the split (usually with a hyphen). Hyphenation occurs when the line breaks at a valid hyphenation opportunity, which creates a soft wrap opportunity within the word.

Hyphenation in CSS is controlled with the ‘hyphens’ property. CSS Text Level 3 does not define the exact rules for hyphenation; however, UAs are strongly encouraged to optimize their line-breaking implementation to choose good break points and appropriate hyphenation points.

Hyphenation opportunities are considered when calculating ‘min-content’ intrinsic sizes.

6.1. Hyphenation Control: the ‘`hyphens`’ property

Name:	hyphens
Value:	none | manual | auto
Initial:	manual
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property controls whether hyphenation is allowed to create more soft wrap opportunities within a line of text. Values have the following meanings:

‘none’

Words are not hyphenated, even if characters inside the word explicitly define hyphenation opportunities.

‘manual’

Words are only hyphenated where there are characters inside the word that explicitly suggest hyphenation opportunities.

In Unicode, U+00AD is a conditional "soft hyphen" and U+2010 is an unconditional hyphen. Unicode Standard Annex #14 describes the role of soft hyphens in Unicode line breaking. [UAX14] In HTML, &shy; represents the soft hyphen character, which suggests a hyphenation opportunity.

ex&shy;ample

‘auto’

Words may be broken at appropriate hyphenation points either as determined by hyphenation characters inside the word or as determined automatically by a language-appropriate hyphenation resource. Conditional hyphenation characters inside a word, if present, take priority over automatic resources when determining hyphenation opportunities within the word.

Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The UA is therefore only required to automatically hyphenate text for which the author has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an appropriate hyphenation resource.

When shaping scripts such as Arabic are allowed to break within words due to hyphenation, the characters must still be shaped as if the word were not broken.

For example, if the word “نوشتنن” were hyphenated, it would appear as “ﻧﻮﺷ-ﺘﻦ” not as “ﻧﻮﺵ-ﺗﻦ”.
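The three values might be combined as in the sketch below. The selectors are illustrative assumptions. Automatic hyphenation is only required where the author has declared a language and the UA has a matching hyphenation resource, so the ‘auto’ rule is scoped to content declared as English.

```css
/* Running text declared as English: let the UA hyphenate automatically.
   Soft hyphens (U+00AD) in the text still take priority. */
article:lang(en) p { hyphens: auto; }

/* Headings: break only at explicit soft hyphens (the initial value,
   restated here because 'hyphens' is inherited). */
article:lang(en) h1 { hyphens: manual; }

/* Identifiers and code: never hyphenate, even at soft hyphens. */
code { hyphens: none; }
```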

6.2. Overflow Wrapping: the ‘`word-wrap`’/‘`overflow-wrap`’ property

Name:	overflow-wrap/word-wrap
Value:	normal | break-word
Initial:	normal
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property specifies whether the UA may arbitrarily break within a word to prevent overflow when an otherwise-unbreakable string is too long to fit within the line box. It only has an effect when ‘white-space’ allows wrapping. Possible values:

‘normal’: Lines may break only at allowed break points. However, the restrictions introduced by ‘word-break: keep-all’ may be relaxed to match ‘word-break: normal’ if there are no otherwise-acceptable break points in the line.
‘break-word’: An unbreakable "word" may be broken at an arbitrary point if there are no otherwise-acceptable break points in the line. Shaping characters are still shaped as if the word were not broken, and grapheme clusters must stay together as one unit. No hyphenation character is inserted at the break point.

Soft wrap opportunities not part of ‘overflow-wrap: normal’ line breaking are not considered when calculating ‘min-content’ intrinsic sizes.

For legacy reasons, UAs must treat ‘word-wrap’ as an alternate name for the ‘overflow-wrap’ property, as if it were a shorthand of ‘overflow-wrap’.
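A typical use is sketched below for a container that may hold long unbreakable strings such as URLs. The class name is hypothetical. Both property names are given because, as stated above, ‘word-wrap’ is the legacy name for ‘overflow-wrap’.

```css
.user-comment {
  white-space: normal;        /* wrapping must be allowed for this to apply */
  word-wrap: break-word;      /* legacy name */
  overflow-wrap: break-word;  /* break inside a word only when no other
                                 acceptable break point exists on the line */
}
```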

7. Alignment and Justification

7.1. Text Alignment: the ‘`text-align`’ property

Name:	text-align
Value:	[ [ start | end | left | right | center ] || <string> ] | justify | match-parent | start end
Initial:	start
Applies to:	block containers
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value, except for ‘`match-parent`’ (see prose)

This property describes how inline contents of a block are aligned along the inline axis if the contents do not completely fill the line box. Values have the following meanings:

‘start’: The inline contents are aligned to the start edge of the line box.
‘end’: The inline contents are aligned to the end edge of the line box.
‘left’: The inline contents are aligned to the line left edge of the line box. (Note that in vertical writing modes, this will be either the physical top or bottom.) [CSS3-WRITING-MODES]
‘right’: The inline contents are aligned to the line right edge of the line box. (Note that in vertical writing modes, this will be either the physical top or bottom.) [CSS3-WRITING-MODES]
‘center’: The inline contents are centered within the line box.
‘justify’: The text is justified according to the method specified by the ‘text-justify’ property.
‘<string>’: The string must be a single character; otherwise the declaration must be ignored. When applied to a table cell, specifies the alignment character around which the cell's contents will align. See below for further details and how this value combines with keywords.
‘match-parent’: This value behaves the same as ‘inherit’ except that an inherited ‘start’ or ‘end’ keyword is calculated against its parent's ‘direction’ value and results in a computed value of either ‘left’ or ‘right’.
‘start end’: Specifies ‘start’ alignment of the first line and any line immediately after a forced line break; and ‘end’ alignment of any remaining lines not affected by ‘text-align-last’.
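
The two less familiar values above might be used as in the following sketch. The class names are hypothetical, and the comments restate the definitions given in this section rather than describing any particular implementation.

```css
/* First line, and any line right after a forced break, is start-aligned;
   the remaining lines (those not governed by text-align-last) are end-aligned. */
.signature { text-align: start end; }

/* Behaves like 'inherit', except that an inherited 'start' or 'end' is
   resolved against the parent's 'direction' and computes to 'left' or 'right'. */
.embedded { text-align: match-parent; }
```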

A block of text is a stack of line boxes. In the case of ‘start’, ‘end’, ‘left’, ‘right’ and ‘center’, this property specifies how the inline-level boxes within each line box align with respect to the start and end sides of the line box: alignment is not with respect to the viewport or containing block.

In the case of ‘justify’, the UA may stretch or shrink any inline boxes by adjusting their text in addition to shifting their positions. (See also ‘text-justify’, ‘letter-spacing’, and ‘word-spacing’.) If an element's white space is not collapsible, then the UA is not required to adjust its text for the purpose of justification and may instead treat the text as having no expansion opportunities. If the UA chooses to adjust the text, then it must ensure that tab stops continue to line up as required by the white space processing rules.

7.1.1. Bidirectionality and Line Boxes

The start and end edges of a line box are determined by the inline base direction of the line box. In most cases, this is given by its containing block's computed ‘direction’. However, if its containing block has ‘unicode-bidi: plaintext’ [CSS3-WRITING-MODES], the inline base direction of the line box must be determined by the base direction of the bidi paragraph to which it belongs: that is, the bidi paragraph for which the line box holds content. An empty line box (i.e. one that contains no atomic inlines or characters other than the line-breaking character, if any) takes its inline base direction from the preceding line box (if any), or, if this is the first line box in the containing block, then from the ‘direction’ property of the containing block.

In the following example, assuming the <block> is a preformatted block (‘display: block; white-space: pre’) inheriting ‘text-align: start’, every other line is right-aligned:

<block style="unicode-bidi: plaintext">
  Latin
  و·کمی
  Latin
  و·کمی
  Latin
  و·کمی
</block>

Note that the inline base direction determined here applies to the line box itself, and not to its contents. It affects ‘text-align’, ‘text-align-last’, ‘text-indent’, and ‘hanging-punctuation’, i.e. the position and alignment of its contents with respect to its edges. It does not affect the formatting or ordering of its content.

In the following example:

<para style="display: block; direction: rtl; unicode-bidi:plaintext">
"<quote style="unicode-bidi:plaintext">שלום!</quote>", he said.
</para>

The result should be a left-aligned line looking like this:

"!שלום", he said.

The line is left-aligned (despite the containing block having ‘direction: rtl’) because the containing block (the <para>) has ‘unicode-bidi:plaintext’, and the line box belongs to a bidi paragraph that is LTR. This is because that paragraph's first character with a strong direction is the LTR "h" from "he". The RTL "שלום!" does precede the "he", but it sits in its own bidi-isolated paragraph that is not immediately contained by the <para>, and is thus irrelevant to the line box's alignment. From the standpoint of the bidi paragraph immediately contained by the <para> containing block, the <quote>’s bidi-isolated paragraph inside it is, by definition, just a neutral U+FFFC character, so the immediately-contained paragraph becomes LTR by virtue of the "he" following it.

<fieldset style="direction: rtl">
<textarea style="unicode-bidi:plaintext">

Hello!

</textarea>
</fieldset>

As expected, the "Hello!" should be displayed LTR (i.e. with the exclamation mark on the right end, despite the <textarea>’s ‘direction: rtl’) and left-aligned. This makes the empty line following it left-aligned as well, which means that the caret on that line should appear at its left edge. The first empty line, on the other hand, should be right-aligned, due to the RTL direction of its containing paragraph, the <textarea>.

7.1.2. Character-based Alignment in a Table Column

When multiple cells in a column have an alignment character specified, the alignment character of each such cell in the column is centered along a single column-parallel axis and the rest of the text in the column shifted accordingly. (Note that the strings do not have to be the same for each cell, although they usually are.)

The following style sheet:

TD { text-align: "." center }

will cause the column of dollar figures in the following HTML table:

<TABLE>
<COL width="40">
<TR> <TH>Long distance calls
<TR> <TD> $1.30
<TR> <TD> $2.50
<TR> <TD> $10.80
<TR> <TD> $111.01
<TR> <TD> $85.
<TR> <TD> N/A
<TR> <TD> $.05
<TR> <TD> $.06
</TABLE>

to align along the decimal point. The table might be rendered as follows:

+---------------------+
| Long distance calls |
+---------------------+
|         $1.30       |
|         $2.50       |
|        $10.80       |
|       $111.01       |
|        $85.         |
|        N/A          |
|          $.05       |
|          $.06       |
+---------------------+

A keyword value may be specified in conjunction with the <string> value; if it is not given, it defaults to ‘right’. This value is used:

when character-based alignment is applied to boxes that are not table cells.
when the text wraps to multiple lines (at unforced break points).
when a character-aligned cell spans more than one column. In this case the keyword alignment value is used to determine which column’s axis to align with: the leftmost column for ‘left’, the rightmost column for ‘right’ and ‘center’, the startmost column for ‘start’, the endmost column for ‘end’.
when the column is wide enough that the character alignment alone does not determine the positions of its character-aligned contents. In this case the keyword alignment of the first cell in the column with a specified alignment character is used to slide the position of the character-aligned contents to match the keyword alignment insofar as possible without changing the width of the column. For ‘center’, the UA may center the aligned contents using its extremes, center the alignment axis itself (insofar as possible), or optically center the aligned contents some other way (such as by taking a weighted average of the extent of the cells' contents to either side of the axis).

Right alignment is used by default for character-based alignment because numbering systems are almost all left-to-right even in right-to-left writing systems, and the primary use case of character-based alignment is for numerical alignment.

If the alignment character appears more than once in the text, the first instance is used for alignment. If the alignment character does not appear in a cell at all, the string is aligned as if the alignment character had been inserted at the end of its contents.
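
The combination of an alignment character with a keyword could be written as in the sketch below. The class names are hypothetical; the comments only restate the rules given above.

```css
/* No keyword given: the keyword part defaults to 'right'. */
td.price { text-align: "."; }

/* Explicit keyword: used when the text wraps, when the box is not a table cell,
   when the cell spans columns, or when the column is wider than the aligned content needs. */
td.price-start { text-align: "." start; }

/* The string may differ per cell, e.g. a decimal comma instead of a decimal point. */
td.betrag { text-align: ","; }
```

A cell such as "N/A" that contains no alignment character is positioned as if the character followed its contents.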

Character-based alignment occurs before table cell width computation so that auto width computations can leave enough space for alignment. Whether column-spanning cells participate in the alignment prior to or after width computation is undefined. If width constraints on the cell contents prevent full alignment throughout the column, the resulting alignment is undefined.

7.2. Last Line Alignment: the ‘text-align-last’ property

Name:	text-align-last
Value:	auto | start | end | left | right | center | justify
Initial:	auto
Applies to:	block containers
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property describes how the last line of a block or a line right before a forced line break is aligned. If a line is also the first line of the block or the first line after a forced line break, then, unless ‘text-align’ assigns an explicit first line alignment (via ‘start end’), ‘text-align-last’ takes precedence over ‘text-align’.

If ‘auto’ is specified, content on the affected line is aligned per ‘text-align’ unless ‘text-align’ is set to ‘justify’, in which case content is justified if ‘text-justify’ is ‘distribute’ and start-aligned otherwise. All other values have the same meanings as in ‘text-align’.
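
The interaction between ‘auto’ and ‘text-align: justify’ can be sketched as follows. The class names are hypothetical, and the comments describe the behavior this section specifies, not what any given UA currently does.

```css
/* Justified paragraph; text-align-last is 'auto' (the initial value),
   so the last line is start-aligned. */
p.body { text-align: justify; }

/* With 'distribute', 'auto' causes the last line to be justified as well. */
p.spread { text-align: justify; text-justify: distribute; }

/* An explicit value overrides the 'auto' behavior for the last line
   and for lines right before a forced break. */
p.colophon { text-align: justify; text-align-last: center; }
```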

7.3. Justification Method: the ‘text-justify’ property

Name:	text-justify
Value:	auto | none | inter-word | inter-ideograph | inter-cluster | distribute | kashida
Initial:	auto
Applies to:	block containers and, optionally, inline elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	specified value

This property selects the justification method used when a line's alignment is set to ‘justify’ (see ‘text-align’), primarily by controlling which scripts' characters are adjusted together or separately. The property applies to block containers, but the UA may (but is not required to) also support it on inline elements. It takes the following values:

[Figure: Values of ‘text-justify’: ‘inter-word’, ‘inter-cluster’, ‘inter-ideograph’, and ‘distribute’]

[Figure: One possible example of rendering for ‘text-justify’: ‘kashida’]

‘auto’: The UA determines the justification algorithm to follow, based on a balance between performance and adequate presentation quality.
One possible algorithm is to determine the behavior based on the language of the paragraph: the UA can then choose an appropriate value for the language, like ‘inter-ideograph’ for CJK, or ‘inter-word’ for English. Another possibility is to use a justification method that is a universal compromise for all scripts, e.g. the ‘inter-cluster’ method with block scripts raised to first priority.
‘none’: Justification is disabled. This value is intended for use in user stylesheets to improve readability or for accessibility purposes.
‘inter-word’: Justification primarily changes spacing at word separators. This value is typically used for languages that separate words using spaces, like English or Korean.
‘inter-ideograph’: Justification primarily changes spacing at word separators and between characters in block scripts. This value is typically used for CJK languages.
‘inter-cluster’: Justification primarily changes spacing at word separators and between characters in clustered scripts. This value is typically used for Southeast Asian scripts such as Thai.
‘distribute’: Justification primarily changes spacing both at word separators and between characters in all scripts equally (except those in the connected and cursive categories). This value is sometimes used in e.g. Japanese.
‘kashida’: Justification primarily stretches cursive scripts through the use of kashida or other calligraphic elongation. This value is optional for conformance to CSS3 Text. (UAs that do not support cursive elongation must treat the value as invalid.)
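
An author who prefers not to rely on ‘auto’ might select a method per language, following the typical uses named above. This is a sketch only: the language-to-method pairing comes from the value descriptions in this section, and the selectors are illustrative.

```css
p { text-align: justify; }   /* text-justify only matters on justified lines */

:lang(en), :lang(ko) { text-justify: inter-word; }       /* space-separated words */
:lang(ja), :lang(zh) { text-justify: inter-ideograph; }  /* CJK */
:lang(th)            { text-justify: inter-cluster; }    /* Southeast Asian clustered scripts */
```

A user style sheet could instead set ‘text-justify: none’ to disable justification for readability.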

When justifying text, the user agent takes the remaining space between the ends of a line's contents and the edges of its line box, and distributes that space throughout its contents so that the contents exactly fill the line box. If the ‘letter-spacing’ and ‘word-spacing’ property values allow it, the user agent may also distribute negative space, putting more content on the line than would otherwise fit under normal spacing conditions. The exact justification algorithm is UA-dependent; however, CSS provides some general guidelines which should be followed when any justification method other than ‘auto’ is specified.

The guidelines in this level of CSS do not describe a complete justification algorithm. They are merely a minimum set of requirements that a complete algorithm should meet. Limiting the set of requirements gives UAs some latitude in choosing a justification algorithm that meets their needs.

For instance, a basic but fast ‘inter-word’ justification algorithm might use a simple greedy method for determining line breaks, then distribute leftover space using the spacing limits provided. This algorithm could follow the guidelines by expanding word spaces first, expanding between letters only if ‘word-spacing’ hit a limit.

A more sophisticated but slower ‘inter-word’ justification algorithm might use a Knuth/Plass method where expansion opportunities and limits were assigned weights and assessed with other line breaking considerations. This algorithm could follow the guidelines by giving more weight to word spaces than letter spacing.

CSS defines expansion opportunities as points where the justification algorithm may alter spacing within the text. These expansion opportunities fall into priority levels as defined by the justification method. Within a line, expansion and compression should primarily target the first-priority expansion opportunities; lower priority expansion opportunities are adjusted at a lower priority as needed.

Expansion and compression limits are given by the letter-spacing and word-spacing properties. How any remaining space is distributed once all expansion opportunities reach their limits is up to the UA. If the inline contents of a line cannot be stretched to the full width of the line box, then they must be aligned as specified by the ‘text-align-last’ property. (If ‘text-align-last’ is ‘justify’, then they must be aligned as for ‘center’ if ‘text-justify’ is ‘distribute’ and as ‘start’ otherwise.)
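
The following sketch shows how spacing limits might be combined with a justification method. The specific limit values are assumptions chosen for illustration, and the exact distribution of space remains UA-dependent.

```css
p {
  text-align: justify;
  text-justify: inter-word;
  /* optimum, minimum, maximum */
  word-spacing: 0% -20% 100%;      /* spaces may shrink by 20% or grow by up to 100% */
  letter-spacing: 0em 0em 0.05em;  /* letters may grow by up to 0.05em, never shrink */
}
```

Under ‘inter-word’, word separators are the first-priority expansion opportunities, so a UA following the guidelines would be expected to adjust word spaces first and turn to letter spacing only as needed.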

The expansion opportunity priorities for values of ‘text-justify’ are given in the table below. Since justification behavior varies by writing system, expansion opportunities are organized by script categories. An expansion opportunity exists between two letters at a priority level when at least one of them belongs to a script category at that level and the other does not belong to a higher priority level. All scripts in the same priority level must be treated exactly the same. Word separators (spaces) and other symbols and punctuation are treated specially; see below.

Prioritization of Expansion Points
	‘inter-word’	‘inter-ideograph’	‘distribute’	‘inter-cluster’	‘kashida’	‘auto’
block	2	1	1	3	3	2*
clustered	2	2	1	2	3	2*
cursive	2	2	2	3	1	3*
discrete	2	2	1	3	3	3*
connected	never	never	never	never	never	never
spaces	1	1	1	1	2	1*
symbols	2	1	1	2	3	*

* The ‘auto’ column defined above is informative; it suggests a prioritization that presents a universal compromise among justification methods.

The spaces category represents expansion opportunities at word separators. (See ‘word-spacing’.) Except when ‘text-justify’ is ‘distribute’, the UA may treat spaces differently than other expansion opportunities in the same priority, but must not change their priority with respect to expansion opportunities in other priority levels. For example, in Japanese ‘inter-ideograph’ justification (which treats CJK characters at a higher priority than Latin characters), word spaces traditionally have a higher priority than inter-CJK spacing, and the UA may split the 1st-priority level to implement that. However the UA is not allowed to drop either spaces or CJK characters to the same priority as Latin characters.

The symbols category represents the expansion opportunity existing at or between any pair of characters from the Unicode Symbols (S*) and Punctuation (P*) classes. The default justification priority of these expansion opportunities is given above. However, there may be additional rules controlling their justification behavior due to typographic tradition. Therefore, the UA may reassign specific characters or introduce additional levels of prioritization to handle expansion opportunities involving symbols and punctuation. For example, there are traditionally no expansion opportunities between consecutive EM DASH U+2014, HORIZONTAL BAR U+2015, HORIZONTAL ELLIPSIS U+2026, or TWO DOT LEADER U+2025 characters [JLREQ]; thus a UA might assign these characters to the "never" prioritization level. As another example, certain fullwidth punctuation characters are considered to contain an expansion opportunity (see ‘text-spacing’). The UA might therefore assign these characters to a higher prioritization level than the opportunities between ideographic characters.

For justification of cursive scripts, words may be expanded through kashida elongation or other cursive expansion processes. Kashida may be applied in discrete units or continuously, and the prioritization of kashida opportunities is UA-dependent: for example, the UA may apply more at the end of the line. The UA should not apply kashida to fonts for which it is inappropriate. It may instead rely on other justification methods that lengthen or shorten Arabic segments (e.g. by substituting in swash forms or optional ligatures). Because elongation rules depend on the typeface style, the UA should rely on the font whenever possible rather than inserting kashida based on a font-independent ruleset. The UA should limit elongation so that, e.g. in multi-script lines a short stretch of Arabic will not be forced to soak up too much of the extra space by itself. If the UA does not support cursive elongation, then, as with connected scripts, no expansion opportunities exist between characters of these scripts.

The UA may enable or break optional ligatures or use other font features such as alternate glyphs or glyph compression to help justify the text under any method. This behavior is not controlled by this level of CSS.

3.8 Line Adjustment in [JLREQ] gives an example of a set of rules for how a text formatter can justify Japanese text. It describes rules for cases where the ‘text-justify’ property is ‘inter-ideograph’ and the ‘text-spacing’ property does not specify ‘no-compress’.

It produces an effect similar to cases where the computed value of ‘text-spacing’ property does not specify ‘trim-end’ or ‘space-end’. If the UA wants to prohibit this behavior, rule b. of 3.8.3 should be omitted.

Note that the rules described in the document specifically target Japanese. Therefore they may produce non-optimal results when used to justify other languages such as English. To make the rules more applicable to other scripts, the UA could, for instance, omit the rule to compress half-width spaces (rule a. of 3.8.3).

8. Spacing

CSS offers control over text spacing via the ‘word-spacing’ and ‘letter-spacing’ properties. While in CSS1 and CSS2 these could only be ‘normal’ (justifiable) or a fixed length, CSS3 can indicate range constraints to control flexibility in justification. In addition the ‘word-spacing’ property can now be specified in percentages, making it possible to, for example, double or eliminate word spacing.

In the following example, word spacing is halved, but may expand up to its full amount if needed for text justification.

p { word-spacing: -50% 0%; }

The <spacing-limits> value type, which represents optimum, minimum, and maximum spacing in ‘word-spacing’ and ‘letter-spacing’, is defined as

<spacing-limits> = [ normal | <length> | <percentage> ]{1,3}

If three values are specified, they represent the optimum, minimum, and maximum in that order. If only two values are specified, then the first represents both the optimum and the minimum, and the second represents the maximum. If just one value is specified, then it represents the optimum, minimum, and maximum. The values are interpreted as defined below:

‘normal’: Specifies normal spacing as defined by the current font and/or the user agent; see below. A ‘normal’ optimum spacing value computes to zero.
‘<length>’: Specifies extra spacing in addition to the intrinsic inter-character/inter-word spacing defined by the font. Values may be negative, but there may be implementation-dependent limits.
‘<percentage>’: Specifies the additional spacing as a percentage of the affected character's advance measure. Only valid on ‘word-spacing’.
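
The one-, two-, and three-value forms can be read as in the sketch below. The class names and the particular lengths are hypothetical; the comments follow the ordering rule stated above.

```css
/* One value: optimum = minimum = maximum. */
.fixed   { letter-spacing: 0.05em; }

/* Two values: the first is both optimum and minimum, the second is the maximum. */
.stretch { word-spacing: 0% 50%; }

/* Three values: optimum, minimum, maximum, in that order. */
.flex    { word-spacing: 0% -25% 50%; }

/* 'normal' may be mixed with lengths; percentages are valid only on word-spacing. */
.mixed   { letter-spacing: normal 0em 0.1em; }
```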

In the absence of justification the optimum spacing is used. The text justification process may alter the spacing from its optimum (see the ‘text-justify’ property, above) but must not violate the minimum spacing limit and should also avoid exceeding the maximum. The UA may also use the difference between the minimum/maximum limits and the optimum as input into a weighting algorithm for justification.

The minimum is treated as a hard constraint: if the maximum is less than the minimum, then the used maximum is set to the minimum. Likewise, if the optimum is less than the minimum, then the used optimum is set to the minimum. Similarly, if the maximum is less than the optimum, then the used optimum is set to the used maximum.
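
Two worked cases may help show how inconsistent limits resolve. They assume the reading that an optimum below the minimum is raised to the minimum, and that a maximum below the optimum pulls the optimum down; the class names and values are hypothetical.

```css
/* Specified: optimum 0.1em, minimum 0.2em, maximum 0.05em.
   The maximum is below the minimum, so the used maximum becomes 0.2em.
   The optimum is below the minimum, so the used optimum becomes 0.2em.
   Used values: optimum 0.2em, minimum 0.2em, maximum 0.2em. */
.a { letter-spacing: 0.1em 0.2em 0.05em; }

/* Specified: optimum 0.5em, minimum 0em, maximum 0.25em.
   Both optimum and maximum satisfy the minimum.
   The maximum is below the optimum, so the used optimum becomes 0.25em.
   Used values: optimum 0.25em, minimum 0em, maximum 0.25em. */
.b { word-spacing: 0.5em 0em 0.25em; }
```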

Normal spacing: Although ‘normal’ minimum and maximum spacing limits are UA-defined, they must be defined relative to the optimum so that the limits increase and decrease with changes to the optimum spacing. Normal limits may also vary according to the value of the ‘text-justify’ property, the element's language, some measure of the amount of text on a line (e.g. block width divided by font size), and/or other factors.

8.1. Word Spacing: the ‘word-spacing’ property

Name:	word-spacing
Value:	<spacing-limits>
Initial:	normal
Applies to:	all elements
Inherited:	yes
Percentages:	refers to width of the affected glyph
Media:	visual
Computed value:	an optimum, minimum, and maximum value, each consisting of either an absolute length, a percentage, or the keyword ‘normal’

This property specifies the minimum, maximum, and optimal spacing between “words”.

Additional spacing is applied to each word-separator character left in the text after the white space processing rules have been applied, and should be applied half on each side of the character unless otherwise dictated by typographic tradition.

The following example will make all the spaces between words in Arabic be rendered as zero-width, and double the width of each space in English:

:lang(ar) { word-spacing: -100%; }
:lang(en) { word-spacing: 100%; }

The following example will add half the width of the “0” glyph to each word-separator character [CSS3VAL]:

p { word-spacing: 0.5ch; }

Word-separator characters include the space (U+0020), the no-break space (U+00A0), the Ethiopic word space (U+1361), the Aegean word separators (U+10100,U+10101), the Ugaritic word divider (U+1039F), the Phoenician Word Separator (U+1091F), and the Tibetan tsek (U+0F0B, U+0F0C). If there are no word-separator characters, or if the word-separating character has a zero advance width (such as the zero width space U+200B) then the user agent must not create additional spacing between words. General punctuation and fixed-width spaces (such as U+3000 and U+2000 through U+200A) are not considered word-separator characters.

8.2. Tracking: the ‘letter-spacing’ property

Name:	letter-spacing
Value:	<spacing-limits>
Initial:	normal
Applies to:	all elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	an optimum, minimum, and maximum value, each consisting of either an absolute length or the keyword ‘normal’

This property specifies the minimum, maximum, and optimal spacing between characters. Letter-spacing is applied in addition to any word-spacing.

Letter-spacing must not be applied at the beginning or at the end of a line. At element boundaries, the total letter spacing between two characters is given by and rendered within the innermost element that contains the boundary.

For the purpose of letter-spacing, each consecutive run of atomic inlines (such as image and/or inline blocks) is treated as a single character.

For example, given the markup

<P>a<LS>b<Z>cd</Z><Y>ef</Y></LS>g</P>

and the style sheet

LS { letter-spacing: 1em; }
Z { letter-spacing: 0.3em; }
Y { letter-spacing: 0.4em; }

the spacing would be

a[0]b[1em]c[0.3em]d[1em]e[0.4em]f[0]g

UAs may apply letter-spacing to cursive scripts. In this case, UAs should extend the space between disjoint characters as specified above and extend the visible connection between cursively connected characters by the same amount (rather than leaving a gap). The UA may use glyph substitution or other font capabilities to spread out the letters. If the UA cannot expand a cursive script without breaking the cursive connections, it should not apply letter-spacing between characters of that script at all.

Letter-spacing ignores zero-width characters (such as those from the Unicode Cf category). For example, ‘letter-spacing’ applied to A&zwsp;B is identical to AB.

When the effective letter-spacing between two characters is not zero (due to either justification or a non-zero specified optimum), user agents should not apply optional ligatures.

9. Edge Effects

Edge effects control the indentation of lines with respect to other lines in the block (‘text-indent’) and how content is aligned to the start and end edges of a line (‘hanging-punctuation’).

9.1. First Line Indentation: the ‘text-indent’ property

Name:	text-indent
Value:	[ <length> | <percentage> ] && [ hanging || each-line ]?
Initial:	0
Applies to:	block containers
Inherited:	yes
Percentages:	refers to width of containing block
Media:	visual
Computed value:	the percentage as specified or the absolute length, plus any keywords as specified

This property specifies the indentation applied to lines of inline content in a block. The indent is treated as a margin applied to the start edge of the line box. Unless otherwise specified via the ‘each-line’ and/or ‘hanging’ keywords, only lines that are the first formatted line of an element are affected. For example, the first line of an anonymous block box is only affected if it is the first child of its parent element.

Values have the following meanings:

‘<length>’: Gives the amount of the indent as an absolute length.
‘<percentage>’: Gives the amount of the indent as a percentage of the containing block's logical width.
‘each-line’: Indentation affects the first line of the block container as well as each line after a forced line break, but does not affect lines after a soft wrap break.
‘hanging’: Inverts which lines are affected.
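
The keyword combinations can be sketched as follows. The class names are hypothetical, and the last case assumes a literal reading of "inverts", applied to the set of lines that ‘each-line’ would otherwise affect.

```css
/* Default: only the first formatted line is indented. */
p.plain  { text-indent: 2em; }

/* First line plus every line that follows a forced line break. */
p.verse  { text-indent: 2em each-line; }

/* Inverted: every line except the first formatted line is indented. */
p.hang   { text-indent: 2em hanging; }

/* Inverted 'each-line': only lines that follow a soft wrap break are indented. */
p.entry  { text-indent: 2em hanging each-line; }
```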

If ‘text-align’ is ‘start’ and ‘text-indent’ is ‘5em’ in left-to-right text with no floats present, then the first line of text will start 5em into the block:

     Since CSS1 it has been possible
to indent the first line of a block
element using the 'text-indent'
property.

Note that since the ‘text-indent’ property inherits, when specified on a block element, it will affect descendant inline-block elements. For this reason, it is often wise to specify ‘text-indent: 0’ on elements that are specified ‘display: inline-block’.
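
A minimal sketch of that reset, using a hypothetical ‘.badge’ class for the inline-block descendant:

```css
p { text-indent: 2em; }

/* Without this reset, the inherited 2em indent would also apply
   to the first line inside the inline-block. */
p .badge {
  display: inline-block;
  text-indent: 0;
}
```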

9.2. Hanging Punctuation: the ‘hanging-punctuation’ property

Name:	hanging-punctuation
Value:	none | [ first || [ force-end | allow-end ] || last ]
Initial:	none
Applies to:	inline elements
Inherited:	yes
Percentages:	N/A
Media:	visual
Computed value:	as specified

This property determines whether a punctuation mark, if one is present, hangs and may be placed outside the line box (or in the indent) at the start or at the end of a line of text.

Note that if there is not sufficient padding on the block container, ‘hanging-punctuation’ can trigger overflow.

When a punctuation mark hangs, it is not considered when measuring the line's contents for fit, alignment, or justification. Depending on the line's alignment, this may (or may not) result in the mark being placed outside the line box.

Values have the following meanings:

‘none’: No character hangs.
‘first’: An opening bracket or quote at the start of the first formatted line of an element hangs. This applies to all characters in the Unicode categories Ps, Pf, Pi.
‘last’: A closing bracket or quote at the end of the last formatted line of an element hangs. This applies to all characters in the Unicode categories Pe, Pf, Pi.
‘force-end’: A stop or comma at the end of a line hangs.
‘allow-end’: A stop or comma at the end of a line hangs if it does not otherwise fit prior to justification.
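
Per the value grammar, ‘first’ and ‘last’ may be combined with one of the two end keywords. The sketch below assumes a hypothetical ‘.article’ class and an arbitrary amount of padding to leave room for hanging marks.

```css
.article p {
  /* Room for marks that hang outside the line box; without it,
     hanging punctuation can trigger overflow. */
  padding: 0 1em;
  text-align: justify;
  hanging-punctuation: first allow-end last;
}
```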

Non-zero start and end borders/padding between a hangeable mark and the edge of the line prevent the mark from hanging. For example, a period at the end of an inline box with end padding does not hang at the end edge of a line. At most one punctuation character may hang at each edge of the line.

A hanging punctuation mark is still enclosed inside its inline box and participates in text justification: its character advance width is just not measured when determining how much content fits on the line, how much the line's contents need to be expanded or compressed for justification, or how to position the content within the line box for text alignment.

Stops and commas allowed to hang include:

U+002C	,	COMMA
U+002E	.	FULL STOP
U+060C	،	ARABIC COMMA
U+06D4	۔	ARABIC FULL STOP
U+3001	、	IDEOGRAPHIC COMMA
U+3002	。	IDEOGRAPHIC FULL STOP
U+FF0C	，	FULLWIDTH COMMA
U+FF0E	．	FULLWIDTH FULL STOP
U+FE50	﹐	SMALL COMMA
U+FE51	﹑	SMALL IDEOGRAPHIC COMMA
U+FE52	﹒	SMALL FULL STOP
U+FF61	｡	HALFWIDTH IDEOGRAPHIC FULL STOP
U+FF64	､	HALFWIDTH IDEOGRAPHIC COMMA

The UA may include other characters as appropriate.

The CSS Working Group would appreciate it if UAs that include other characters would inform the working group of such additions.

Support for this property is optional. It is recommended for UAs that wish to support CJK typography, particularly those in the Japanese market.

The ‘allow-end’ and ‘force-end’ values are two variations of hanging punctuation used in East Asia.

p {
   text-align: justify;
   hanging-punctuation: allow-end;
}

p {
   text-align: justify;
   hanging-punctuation: force-end;
}

The punctuation at the end of the first line for ‘allow-end’ does not hang, because it fits without hanging. However, if ‘force-end’ is used, it is forced to hang. The justification measures the line without the hanging punctuation. Therefore when the line is expanded, the punctuation is pushed outside the line.

10. Conformance

10.1. Document Conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

10.2. Conformance Classes

Conformance to CSS Text Level 3 is defined for three conformance classes:

style sheet: A CSS style sheet.
renderer: A UA that interprets the semantics of a style sheet and renders documents that use them.
authoring tool: A UA that writes a style sheet.

A style sheet is conformant to CSS Text Level 3 if all of its declarations that use properties defined in this module have values that are valid according to the generic CSS grammar and the individual grammars of each property as given in this module.

A renderer is conformant to CSS Text Level 3 if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the features defined by CSS Text Level 3 by parsing them correctly and rendering the document accordingly. However, the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.)

An authoring tool is conformant to CSS Text Level 3 if it writes style sheets that are syntactically correct according to the generic CSS grammar and the individual grammars of each feature in this module, and meet all other conformance requirements of style sheets as described in this module.

10.3. Partial Implementations

So that authors can exploit the forward-compatible parsing rules to assign fallback values, CSS renderers must treat as invalid (and ignore as appropriate) any at-rules, properties, property values, keywords, and other syntactic constructs for which they have no usable level of support. In particular, user agents must not selectively ignore unsupported component values and honor supported values in a single multi-value property declaration: if any value is considered invalid (as unsupported values must be), CSS requires that the entire declaration be ignored.
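The fallback pattern this rule enables can be sketched as follows. The selectors are arbitrary, and the example assumes a renderer that supports ‘text-align: left’ but may or may not support the values defined in this module.

```css
p {
  text-align: left;   /* fallback, understood by older renderers */
  text-align: start;  /* a renderer without support for 'start' treats
                         this declaration as invalid and ignores it,
                         so 'left' above still applies */
}

blockquote {
  /* A renderer that supports 'first' but not 'allow-end' must ignore
     the whole declaration; it must not apply 'first' on its own. */
  hanging-punctuation: first allow-end;
}
```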

10.4. Experimental Implementations

To avoid clashes with future CSS features, the CSS2.1 specification reserves a prefixed syntax for proprietary and experimental extensions to CSS.

Prior to a specification reaching the Candidate Recommendation stage in the W3C process, all implementations of a CSS feature are considered experimental. The CSS Working Group recommends that implementations use a vendor-prefixed syntax for such features, including those in W3C Working Drafts. This avoids incompatibilities with future changes in the draft.
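For illustration, a style sheet targeting an experimental implementation might look like the sketch below. The prefix ‘-vendor-’ is a placeholder and does not name any real implementation; an actual UA would use its own prefix.

```css
p {
  -vendor-hyphens: auto;  /* placeholder prefix for an experimental
                             implementation of a Working Draft feature */
  hyphens: auto;          /* unprefixed form, used once a non-experimental
                             implementation is available */
}
```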

10.5. Non-Experimental Implementations

Once a specification reaches the Candidate Recommendation stage, non-experimental implementations are possible, and implementors should release an unprefixed implementation of any CR-level feature they can demonstrate to be correctly implemented according to spec.

To establish and maintain the interoperability of CSS across implementations, the CSS Working Group requests that non-experimental CSS renderers submit an implementation report (and, if necessary, the testcases used for that implementation report) to the W3C before releasing an unprefixed implementation of any CSS features. Testcases submitted to W3C are subject to review and correction by the CSS Working Group.

Further information on submitting testcases and implementation reports can be found on the CSS Working Group's website at http://www.w3.org/Style/CSS/Test/. Questions should be directed to the [email protected] mailing list.

10.6. CR Exit Criteria

For this specification to be advanced to Proposed Recommendation, there must be at least two independent, interoperable implementations of each feature. Each feature may be implemented by a different set of products; there is no requirement that all features be implemented by a single product. For the purposes of this criterion, we define the following terms:

independent

each implementation must be developed by a different party and cannot share, reuse, or derive from code used by another qualifying implementation. Sections of code that have no bearing on the implementation of this specification are exempt from this requirement.

interoperable

passing the respective test case(s) in the official CSS test suite, or, if the implementation is not a Web browser, an equivalent test. Every relevant test in the test suite should have an equivalent test created if such a user agent (UA) is to be used to claim interoperability. In addition, if such a UA is to be used to claim interoperability, then there must be one or more additional UAs which can also pass those equivalent tests in the same way for the purpose of interoperability. The equivalent tests must be made publicly available for the purposes of peer review.

implementation

a user agent which:

implements the specification.
is available to the general public. The implementation may be a shipping product or other publicly available version (e.g., beta version, preview release, or “nightly build”). Non-shipping product releases must have implemented the feature(s) for a period of at least one month in order to demonstrate stability.
is not experimental (i.e., is not a version specifically designed to pass the test suite and not intended for normal usage going forward).

The specification will remain Candidate Recommendation for at least six months.

Appendix A: Acknowledgements

This specification would not have been possible without help from: Ayman Aldahleh, Bert Bos, Tantek Çelik, Stephen Deach, John Daggett, Martin Dürst, Laurie Anna Edlund, Ben Errez, Yaniv Feinberg, Arye Gittelman, Ian Hickson, Martin Heijdra, Richard Ishida, Masayasu Ishikawa, Michael Jochimsen, Eric LeVine, Ambrose Li, Håkon Wium Lie, Chris Lilley, Ken Lunde, Nat McCully, Shinyu Murakami, Paul Nelson, Chris Pratley, Marcin Sawicki, Arnold Schrijver, Rahul Sonnad, Michel Suignard, Takao Suzuki, Frank Tang, Chris Thrasher, Etan Wexler, Chris Wilson, Masafumi Yabe and Steve Zilles.

Appendix B: References

Normative references


[CSS21]: Bert Bos; et al. Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. 7 June 2011. W3C Recommendation. URL: http://www.w3.org/TR/2011/REC-CSS2-20110607
[CSS3-FONTS]: John Daggett. CSS Fonts Module Level 3. 23 August 2012. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2012/WD-css3-fonts-20120823/
[CSS3-WRITING-MODES]: Elika J. Etemad; Koji Ishii. CSS Writing Modes Module Level 3. 1 May 2012. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2012/WD-css3-writing-modes-20120501/
[RFC2119]: S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[UAX11]: Asmus Freytag. East Asian Width. 17 January 2012. Unicode Standard Annex #11. URL: http://www.unicode.org/reports/tr11/
[UAX14]: Asmus Freytag. Line Breaking Properties. 17 January 2012. Unicode Standard Annex #14. URL: http://www.unicode.org/unicode/reports/tr14/
[UAX29]: Mark Davis. Unicode Text Segmentation. 24 January 2012. Unicode Standard Annex #29. URL: http://www.unicode.org/reports/tr29/
[UAX44]: Mark Davis; Ken Whistler. Unicode Character Database. 23 January 2012. Unicode Standard Annex #44. URL: http://www.unicode.org/reports/tr44/
[UNICODE]: The Unicode Consortium. The Unicode Standard. 2003. Defined by: The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, ISBN 0-321-18578-1), as updated from time to time by the publication of new versions. URL: http://www.unicode.org/standard/versions/enumeratedversions.html

Informative references


[CSS3-TEXT-DECOR]: Elika J. Etemad; Koji Ishii. CSS Text Decoration Module Level 3. 13 November 2012. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2012/WD-css-text-decor-3-20121113/
[CSS3COLOR]: Tantek Çelik; Chris Lilley; L. David Baron. CSS Color Module Level 3. 7 June 2011. W3C Recommendation. URL: http://www.w3.org/TR/2011/REC-css3-color-20110607
[CSS3LIST]: Tab Atkins Jr. CSS Lists and Counters Module Level 3. 24 May 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-css3-lists-20110524
[CSS3VAL]: Håkon Wium Lie; Tab Atkins; Elika J. Etemad. CSS Values and Units Module Level 3. 28 August 2012. W3C Candidate Recommendation. (Work in progress.) URL: http://www.w3.org/TR/2012/CR-css3-values-20120828/
[HTML5]: Ian Hickson. HTML5. 25 May 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-html5-20110525/
[JIS4051]: Formatting rules for Japanese documents (『日本語文書の組版方法』). Japanese Standards Association. 2004. JIS X 4051:2004. In Japanese
[JLREQ]: Yasuhiro Anan; et al. Requirements for Japanese Text Layout. 3 April 2012. W3C Working Group Note. URL: http://www.w3.org/TR/2012/NOTE-jlreq-20120403/
[XML10]: C. M. Sperberg-McQueen; et al. Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 November 2008. W3C Recommendation. URL: http://www.w3.org/TR/2008/REC-xml-20081126/
[ZHMARK]: 标点符号用法 (Punctuation Mark Usage). 1995. 中华人民共和国国家标准

Appendix C: Changes

Changes from the August 2012 CSS3 Text WD

Major changes include:

Shifted text decoration chapter to a separate Text Decoration module [CSS3-TEXT-DECOR]

Significant details updated:

Shifted spaces higher in priority than clustered scripts for ‘inter-cluster’ value of ‘text-justify’.
Defined line breaking behavior for ruby and atomic inlines.
Added Korean to Chinese and Japanese in ‘line-break’ special rules.
Added missing halfwidth codepoint to ‘line-break’ rules.

Appendix D: Default UA Stylesheet

This appendix is informative. It is intended to help UA developers implement a default stylesheet, but UA developers are free to ignore or change it.

/* make list items and option elements align together */
li, option { text-align: match-parent; }

If you have any issues to report, additions to recommend, or corrections, please send the information to [email protected] with [css3-text] in the subject line.

Appendix E: Scripts and Spacing

This appendix is informative (non-normative).

Typographic behavior varies somewhat by language, but varies drastically by writing system. This appendix categorizes some common scripts in Unicode 6.0 according to their justification and spacing behavior. Category descriptions are descriptive, not prescriptive; the determining factor is the prioritization of expansion opportunities.

block scripts: CJK and by extension all Wide characters. (See [UAX11]) The following scripts are included: Bopomofo, Han, Hangul, Hiragana, Katakana, Yi
clustered scripts: Scripts that have discrete units but do not use spaces between words, such as many Southeast Asian systems. The following scripts are included: Javanese, Khmer, Lao, Myanmar, Thai. (Issue: This list is likely incomplete. What else fits here?)
connected scripts: Devanagari, Ogham, and other scripts that use spaces between words and baseline connectors within words. By extension this category also includes any other Indic scripts whose typographic behavior is similar to Devanagari. The following scripts are included: Bengali, Brahmi, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya?, Ogham, Tamil?, Telugu
cursive scripts: Arabic and similar inherently cursive scripts. The following scripts are included: Arabic, Mongolian, N'Ko, Phags Pa, Syriac
discrete scripts: Scripts that use spaces or visible word-separating punctuation between words and have discrete, unconnected (in print) units within words. The following scripts are included: Armenian, Bamum?, Braille, Canadian Aboriginal, Cherokee, Coptic, Cyrillic, Deseret, Ethiopic, Greek, Hebrew, Kharoshthi, Latin, Lisu, Osmanya, Shavian, Tifinagh, Vai?

UAs should treat unrecognized scripts as discrete.

This listing should ideally be exhaustive with respect to Unicode. Please send suggestions and corrections to the CSS Working Group.

Guidelines for classification consider letter-spacing and justification:

If the script is cursive and may expand cursively but must not space between letters, it is cursive.
If the script primarily flexes word separators, it is either discrete or connected. Discrete scripts can space between letters. Connected scripts must not space between letters (typically because that would break the connections or otherwise look bad).
If the script primarily expands equally between its "letters" in native typesettings, it is either block or clustered. The exact classification depends on whether it always spaces when mixed with CJK and sometimes stays together when mixed with Thai and related scripts (block) or sometimes spaces when mixed with CJK and always spaces with Thai (clustered).
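As a rough illustration only, an author who knows the script of the content could select a justification method explicitly instead of relying on ‘auto’. The ‘text-justify’ values below are taken from the property index of this module; the pairing of each value with a language is an assumption made for this sketch, not a mapping defined by this appendix.

```css
/* Assumed pairings, for illustration only. */
:lang(ja) { text-align: justify; text-justify: inter-ideograph; } /* block script */
:lang(th) { text-align: justify; text-justify: inter-cluster; }   /* clustered script */
:lang(ar) { text-align: justify; text-justify: kashida; }         /* cursive script */
:lang(en) { text-align: justify; text-justify: inter-word; }      /* discrete script */
```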

Appendix F: Small Kana

Small Kana
A	I	U	E	O
ぁ U+3041	ぃ U+3043	ぅ U+3045	ぇ U+3047	ぉ U+3049
ゕ U+3095			ゖ U+3096
		っ U+3063
ゃ U+3083		ゅ U+3085		ょ U+3087
ゎ U+308E
ァ U+30A1	ィ U+30A3	ゥ U+30A5	ェ U+30A7	ォ U+30A9
ヵ U+30F5		ㇰ U+31F0	ヶ U+30F6
	ㇱ U+31F1	ㇲ U+31F2
		ッ U+30C3		ㇳ U+31F3
		ㇴ U+31F4
ㇵ U+31F5	ㇶ U+31F6	ㇷ U+31F7	ㇸ U+31F8	ㇹ U+31F9
		ㇺ U+31FA
ャ U+30E3		ュ U+30E5		ョ U+30E7
ㇻ U+31FB	ㇼ U+31FC	ㇽ U+31FD	ㇾ U+31FE	ㇿ U+31FF
ヮ U+30EE
ｧ U+FF67	ｨ U+FF68	ｩ U+FF69	ｪ U+FF6A	ｫ U+FF6B
		ｯ U+FF6F
ｬ U+FF6C		ｭ U+FF6D		ｮ U+FF6E

Appendix G: Text Processing Order of Operations

The following list defines the order of text operations. (Implementations are not bound to this order as long as the resulting layout is the same.)

text combination [CSS3-WRITING-MODES]
white space processing part I (pre-wrapping)
text transformation
default spacing
text wrapping while applying per line:
justification (which may affect glyph selection and/or text wrapping, looping back into that step)
text alignment
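The sketch below shows one rule whose properties take part in several of the steps listed above. The class name ‘sample’ is hypothetical, and the comments only indicate which step each property is assumed to feed into.

```css
p.sample {
  white-space: pre-line;      /* white space processing */
  text-transform: uppercase;  /* text transformation */
  hyphens: auto;              /* text wrapping */
  text-align: justify;        /* justification */
  text-justify: inter-word;   /* justification method */
  text-align-last: start;     /* text alignment of the last line */
}
```

Because text transformation is listed before text wrapping, line breaks in this sketch would be computed on the uppercased text rather than on the original text.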

Appendix H: Full Property Index

Property	Values	Initial	Applies to	Inh.	Percentages	Media
hanging-punctuation	none | [ first || [ force-end | allow-end ] || last ]	none	inline elements	yes	N/A	visual
hyphens	none | manual | auto	manual	all elements	yes	N/A	visual
letter-spacing	<spacing-limits>	normal	all elements	yes	N/A	visual
line-break	auto | loose | normal | strict	auto	all elements	yes	N/A	visual
overflow-wrap/word-wrap	normal | break-word	normal	all elements	yes	N/A	visual
tab-size	<integer> | <length>	8	block containers	yes	N/A	visual
text-align-last	auto | start | end | left | right | center | justify	auto	block containers	yes	N/A	visual
text-align	[ [ start | end | left | right | center ] || <string> ] | justify | match-parent | start end	start	block containers	yes	N/A	visual
text-indent	[ <length> | <percentage> ] && [ hanging || each-line ]?	0	block containers	yes	refers to width of containing block	visual
text-justify	auto | none | inter-word | inter-ideograph | inter-cluster | distribute | kashida	auto	block containers and, optionally, inline elements	yes	N/A	visual
text-transform	none | capitalize | uppercase | lowercase | full-width	none	all elements	yes	N/A	visual
white-space	normal | pre | nowrap | pre-wrap | pre-line	not defined for shorthand properties	all elements	yes	N/A	visual
word-break	normal | keep-all | break-all	normal	all elements	yes	N/A	visual
word-spacing	<spacing-limits>	normal	all elements	yes	refers to width of the affected glyph	visual