Lookup in HTML, JavaScript, CSS, SVG, XPath:
Lookup in HTML, JavaScript, CSS, SVG, XPath:
Perceivable
Operable
Understandable
Robust
See WCAG Overview.
The character encoding you choose determines how bytes are mapped to characters in your text.
Normally character encodings limit you to a particular script or set of languages. Unicode allows you to deal simply with almost all scripts and languages in use around the world. In this way Unicode simplifies the handling of content in multiple languages, whether within a single page or across one or more sites. Unicode is particularly useful when used in forms, scripts and databases, where you often need to support multiple languages. Unicode also makes it very straightforward to add new languages to your content.
Unless you appropriately declare which character encoding you are using your users may be unable to read your content. This is because incorrect assumptions may be made by the application interpreting your text about how the bytes map to characters.
Escapes such as Numeric Character References (NCRs), and entities are ways of representing any Unicode character in markup using only ASCII characters. For example, you can represent the character á in X/HTML as á or á or á.
Such escapes are useful for clearly representing ambiguous or invisible characters, and to prevent problems with syntax characters such as ampersands and angle brackets. They may also be useful on occasion to represent characters not supported by your character encoding or unavailable from your keyboard. Otherwise you should always use characters rather than escapes.
Information about the (human) language of content is already important for accessibility, styling, searching, editing, and other reasons. As more and more content is tagged and tagged correctly, applications that can detect language information will become more and more useful and pervasive.
When declaring language, you may need to express information about a specific range of content in a different way from metadata about the document as a whole. It is important to understand this distinction.
It is an important principle of Web design to keep the way content is styled or presented separate from the actual text itself. This makes it simple to apply alternative styling for the same text, for example in order to display the same content on both a conventional browser and a small hand-held device.
This principle is particularly useful for localization, since different scripts have different typographic needs. For example, due to the complexity of Japanese characters, it may be preferable to show emphasis in Japanese X/HTML pages in other ways than bolding or italicisation. It is much easier to apply such changes if the presentation is described using CSS, and markup is much cleaner and more manageable if text is correctly and unambiguously labelled as 'emphasised' rather than just 'bold'.
It can save considerable time and effort during localization to work with CSS files rather than have to change the markup, because any needed changes can be made in a single location for all pages, and the translator can focus on the content rather than the presentation.
If you want your content to really communicate with people, you need to speak their language, not only through the text, but also through local imagery, color, objects and preoccupations. It is easy to overlook the culture-specific nature of symbolism, behaviour, concepts, body language, humor, etc. You should get feedback on the suitability and relevance of your images, video-clips, and examples from in-country users.
You should also take care when incorporating text in graphics when content is translated. Text on complex backgrounds or in restricted spaces can cause considerable trouble for the translator. You should provide graphics to the localization group that have text on a separate layer, and you should bear in mind that text in languages such as English and Chinese will almost certainly expand in translation.
The encoding used for an HTML page that contains a form should support all the characters needed to enter data into that form. This is particularly important if users are likely to enter information in multiple languages.
Databases and scripts that receive data from forms on pages in multiple languages must also be able to support the characters for all those languages simultaneously.
The simplest way to enable this is to use Unicode for both pages containing forms and all back-end processing and storage. In such a scenario the user can fill in data in whatever language and script they need to.
You should also try to avoid making assumptions that things such as the user's name and address will follow the same formatting rules as your own. Ask yourself how much detail you really need to break out into separate fields for things such as addresses. Bear in mind that in some cultures there are no street names, in others the house number follows the street name, some people need more than one line for the part of the address that precedes the town or city name, etc. In fact in some places an address runs top down from the general to the specific, which implies a very different layout strategy. Be very careful about building into validation routines incorrect assumptions about area codes or telephone number lengths. Recognize that careful labelling is required for how to enter numeric dates, since there are different conventions for ordering of day, month and year.
If you are gathering information from people in more than one country, it is important to develop a strategy for addressing the different formats people will expect to be able to use. Not only is this important for the design of the forms you create, but it also has an impact on how you will store such information in databases.
dir="rtl"
to the html
tag for right-to-left text.
Only re-use it to change the base direction.
Text in languages such as Arabic, Hebrew, Persian and Urdu is read from right to left. This reading order typically leads to right-aligned text and mirror-imaging of things like page and table layout. You can set the default alignment and ordering of page content to right to left by simply including dir="rtl" in the html tag.
The direction set in the html tag sets a base direction for the document which cascades down through all the elements on the page. It is not necessary to repeat the attribute on lower level elements unless you want to explicitly change the directional flow.
Embedded text in, for example, Latin script still runs left to right within the overall right to left flow. So do numbers. If you are working with right to left languages, you should become familiar with the basics of the Unicode bidirectional algorithm. This algorithm takes care of much of this bidirectional text without the need for intervention from the author. There are some circumstances, however, where markup or Unicode control characters are needed to ensure the correct effect.
http://www.w3.org/International/
Use the proper English characters instead of their misused equivalents.
“
) opening quote (instead of ")”
) closing quote (instead of ")’
) apostrophe (instead of ')–
or –
) en dash, used for ranges, e.g. “13–15 November” (instead of -)—
or —
) em dash, used for change of thought, e.g. “Star Wars is—as everyone knows—amazing.” (instead of -, or --)…
or …
) horizontal ellipsis, used to indicate an omission or a pause (instead of ...)The W3C Cheat Sheet is an open source tool freely available for Web developers. It provides quick access to useful information from a variety of specifications
published by W3C, the leading international Web standards community. It also incorporates data from the WHATWG DOM specification. Please send bugs reports and enhancements suggestions to <[email protected]>
.
Please consider making a donation to support the Cheat Sheet and other free tools W3C makes available to the community.
Your donation will be used exclusively for the maintenance, hosting and development of our free validation tools and similar open source projects. Thank you!