Showing posts with label locales. Show all posts

Thursday, April 23, 2020

Unicode Locale Data v37 released!

The final version of Unicode CLDR version 37 is now available. It focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.


Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Expanded locale preferences for units of measurement. The new unit preference and conversion data allow formatting functions to pick the right measurement units for the locale and usage, and to accurately convert input measurements into those units.
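As a rough illustration of how "preference plus conversion" data might be consumed, here is a minimal Python sketch. The preference table, the usage name "road", and the function names are assumptions made for this example and are not copied from the CLDR data files; only the conversion factors are standard definitions.

```python
from typing import Tuple

# Standard conversion factors to meters.
METERS_PER_UNIT = {"meter": 1.0, "kilometer": 1000.0, "mile": 1609.344}

# Hypothetical preference table: (usage, region) -> preferred unit.
# "001" stands for the world default. This is NOT actual CLDR data.
UNIT_PREFERENCES = {
    ("road", "US"): "mile",
    ("road", "GB"): "mile",
    ("road", "001"): "kilometer",
}


def preferred_unit(usage: str, region: str) -> str:
    """Return the preferred unit for a usage in a region, else the world default."""
    return UNIT_PREFERENCES.get((usage, region), UNIT_PREFERENCES[(usage, "001")])


def convert_for_region(value: float, unit: str, usage: str, region: str) -> Tuple[float, str]:
    """Convert an input measurement into the unit preferred for the region and usage."""
    target = preferred_unit(usage, region)
    meters = value * METERS_PER_UNIT[unit]
    return meters / METERS_PER_UNIT[target], target
```

With this toy table, `convert_for_region(5, "kilometer", "road", "US")` yields a value in miles, while a region without its own entry falls back to kilometers.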

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for Unicode 13.0 and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

New locales. New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese. New languages at Modern coverage: Nigerian Pidgin. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step toward allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

Updates to code sets. In particular, the EU is updated (removing GB).

For more details, access to the data and charts, and important notes for smoothly migrating implementations, see Unicode CLDR Version 37.

Wednesday, October 5, 2016

CLDR Version 30 Released

Unicode CLDR 30 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

  • Unicode support is updated to 9.0, including updated Unihan readings for the pinyin collation and Han-Latin transforms, and support for new script codes and number systems.
  • The set of language codes for translation has been updated, with a significant increase in the total number of translated language names.
  • Substantial new data has been added for likely subtags (e.g., to get the main script for each language).
  • New data items have been added to support relative times such as “3 Fridays ago” or “this hour”.
  • New draft format and preference structure has been added to support week designations such as “the week of August 10” or “week 3 of March”.
  • New <characterlabels> data can be used to generate labels for groups of related characters in character pickers.
  • The structure for emoji annotations has been revised, and the data has been significantly updated. The emoji collation has been updated, and data is added for improved segmentation behavior. Added a specification for synthesizing ZWJ sequence names.
  • The CLDR 30 Survey Tool data collection resulted in a net increase in data items of about 9.2%, with an additional 5.9% of items changed.
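To make the likely-subtags item above more concrete, here is a simplified Python sketch of filling in the missing script and region of a locale identifier. The four table entries are a tiny hand-picked subset used for illustration, the function names are hypothetical, and the lookup order is a simplification of the full algorithm in the LDML specification.

```python
# Tiny illustrative subset of likely-subtags data (not the full CLDR table).
LIKELY_SUBTAGS = {
    "en": "en_Latn_US",
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "sr": "sr_Cyrl_RS",
}


def split_locale(locale_id):
    """Split 'lang[_Script][_REGION]' into (lang, script, region); '' when absent."""
    parts = locale_id.replace("-", "_").split("_")
    lang, script, region = parts[0].lower(), "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return lang, script, region


def add_likely_subtags(locale_id):
    """Fill in missing script/region from the table; return the input if unknown."""
    lang, script, region = split_locale(locale_id)
    # Try the most specific key first, then progressively less specific ones.
    candidates = [
        "_".join(p for p in (lang, script, region) if p),
        "_".join(p for p in (lang, region) if p),
        "_".join(p for p in (lang, script) if p),
        lang,
    ]
    for key in candidates:
        if key in LIKELY_SUBTAGS:
            _, likely_script, likely_region = split_locale(LIKELY_SUBTAGS[key])
            return "_".join((lang, script or likely_script, region or likely_region))
    return locale_id


def main_script(locale_id):
    """Return the script subtag after adding likely subtags ('' if unknown)."""
    return split_locale(add_likely_subtags(locale_id))[1]
```

For example, `main_script("sr")` looks up the "sr" entry and returns its script field, and an explicit region such as the one in "en_GB" is kept while the script is filled in.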
For further details and links to documentation, see the CLDR Release Notes.

Wednesday, March 16, 2016

CLDR Version 29 Released

Unicode CLDR 29 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

New BCP47 extension keys have been added for specifying transliteration and emoji presentation, and for customizing locales with region-specific settings. Many new transforms are provided, the rule format has been simplified, and BCP47 IDs have been added for all transforms. Region data now includes appropriate preferences for day periods such as “6:00 in the morning” and “7:00 in the evening”, and there is new structure for choosing appropriate units based on region and usage. A Cantonese locale has been added. The emoji ordering has been improved, and annotations are provided for more emoji and in more locales. The JSON-format data has been extended to include number spellout (RBNF) and script metadata.

The specification and charts have also been updated.

For further details and links to documentation, see the CLDR Release Notes.

Thursday, September 17, 2015

CLDR Version 28 Released

Unicode CLDR 28 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.
  • General locale data. Overall, about 5% of the data items in this release are new (see Growth), while about 8% have corrections. Notable changes include a major review of and improvement to Spanish locales for Latin America; the addition of two new “modern-coverage” locales (Belarusian and Irish); and moving certain data from en_GB to en_001 for improved quality and reduced data size in locales that use en_GB conventions.
  • Formatting. There are a number of new units and types of formats, with a major revision to the day period rules—preferred for many languages instead of AM/PM (“10:30 at night”)—with localizations; the addition of compact formatting for currencies (“€10M”, “€10 million”), and the addition of more unit measures, including 7 new general units (duration-century), 21 new per-unit types, 4 new units for measuring personal age (needed for some languages), and new coordinate units for formatting latitude and longitude across languages (“10°N”).
  • Identifiers. The new features extend the ability to specify subregions of countries, validate identifiers, and customize locales, including the addition of subdivisions of countries, such as Scotland and California (localized names are not yet present, except for English); the addition of validity data for currency codes, measurement units, and locale identifier elements (allowing validation of Unicode language and locale identifiers without requiring BCP47 data); the addition of seven -u- extension keys and corresponding types to allow customization of locales (“cf” for specifying standard vs accounting currency formats), and the clarification of the specification of identifiers, especially for validity testing.
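As an illustration of the -u- extension keys mentioned above, the following Python sketch pulls key/type pairs out of a BCP47 tag such as "en-US-u-cf-account". The function name is hypothetical, and the parser is deliberately simplified: it does not validate keys or types against the CLDR validity data, it ignores attributes, and it does not handle private-use ("x") sequences that appear before the extension.

```python
def parse_unicode_extension(tag):
    """Return the -u- extension keywords of a BCP47 tag as a {key: type} dict."""
    subtags = tag.lower().split("-")
    try:
        start = subtags.index("u") + 1
    except ValueError:
        return {}  # no -u- extension present

    keywords = {}
    key = None
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            key = subtag  # two-character subtags are keys
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)  # longer subtags belong to the current key
        # Subtags before the first key are attributes; this sketch ignores them.

    # A key with no type is treated as "true".
    return {k: "-".join(v) or "true" for k, v in keywords.items()}
```

For the tag "en-US-u-cf-account" this returns `{"cf": "account"}`, which a formatter could use to choose the accounting currency format over the standard one.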
The specification and charts have also been updated.

Thursday, March 19, 2015

CLDR Version 27 Released

Unicode CLDR 27 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

There was no Survey Tool data collection phase for CLDR 27. Instead, the release focused primarily on stability—cleaning up data inheritance and making specific fixes—as well as improvements to the JSON format of the data. Changes include the following:
  • Cleanup of region locales: A major cleanup effort was undertaken to resolve gratuitous differences between region-specific locales and the parent from which they inherit. In regional locales, it was determined where the parent value was an acceptable replacement for a child-specific value which could then be removed, providing greater consistency in behavior in the various region locales. A special effort was made to clean up country names in certain locales.
  • Changes to English inheritance: As an outcome of the cleanup effort above, the inheritance model for English locales is now simplified, making all en_XX locales inherit from either “en” directly (for current or former U.S. territories), or from British-influenced “en_001 - World English”. This is also reflected in some changes for measurement systems.
  • Emoji: Data for emoji annotations and an emoji collation were added, to accompany Unicode Technical Report #51, Unicode Emoji.
  • Collation: There are new sort orders for emoji (as noted above), and an Austrian phonebook sort order. Scripts can be reordered individually, rather than only in specific groups. Fractional tertiary weights are now used that are lower than common, to allow shorter sort-keys with normal Hiragana letters.
  • Specification: The LDML specification has descriptions of new or modified structure, plus a number of fixes and clarifications. See Modifications for a list of changes.
    • Improved documentation of locale inheritance and matching, bundle versus item lookup, and parent locale information.
    • Extensive clarifications to the intended use of the language matching data.
    • Explicit new definitions of Unicode identifiers, such as Unicode Calendar Identifier, for use in citations.
  • Charts: The navigation within charts has been improved, and new charts have been added.
  • JSON on GitHub: The JSON form of the data is now available on GitHub, rather than being found through the Data link.
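The simplified English inheritance model described in the list above can be sketched as a parent lookup followed by a walk up to the root locale. The Python below is illustrative only: the explicit-parent table lists a few example locales rather than the actual CLDR parent-locale data, and the function names are hypothetical.

```python
# Hypothetical subset of explicit parent overrides (not the full CLDR list).
EXPLICIT_PARENTS = {
    "en_GB": "en_001",
    "en_AU": "en_001",
    "en_IN": "en_001",
}


def parent(locale_id):
    """Return the parent locale, or None for the root locale."""
    if locale_id == "root":
        return None
    if locale_id in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale_id]
    if "_" in locale_id:
        return locale_id.rsplit("_", 1)[0]  # default: drop the last subtag
    return "root"


def inheritance_chain(locale_id):
    """List the locales consulted for a lookup, from most to least specific."""
    chain = []
    current = locale_id
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain
```

Under these assumptions, a locale with an explicit override such as en_GB resolves through en_001 before reaching "en", while a locale without one, such as en_PR, falls back to "en" directly.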
Details are provided in http://cldr.unicode.org/index/downloads/cldr-27, along with a detailed Migration section.

Wednesday, March 19, 2014

CLDR Version 25 Released

Unicode CLDR 25 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Unicode CLDR 25 focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:
  • New rules for plural ranges (1-2 liters) for 72 locales, plurals for 2 locales, and ordinals for 18 locales.
  • Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
  • Two new locales: West Frisian (fy) and Uyghur (ug).
  • Two new metazones: Mexico_Pacific and Mexico_Northwest.
  • Updated zh pinyin & zhuyin collations and transliterators for Unicode 6.3 kMandarin data.
  • Updated keyboard layout data for OS X, Windows, and others.
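To show what a plural-range rule does, here is a small Python sketch for an English-like language. A range rule maps the plural categories of the two endpoints to the category used for the whole range, which then selects the unit pattern. The tables and function names below are assumptions written for this example, not data read from the CLDR files.

```python
def plural_category(n):
    """English-like cardinal rule for integers: 'one' for 1, otherwise 'other'."""
    return "one" if n == 1 else "other"


# Illustrative range table: (start category, end category) -> range category.
PLURAL_RANGES = {
    ("one", "other"): "other",
    ("other", "one"): "other",
    ("other", "other"): "other",
}

# Illustrative unit patterns keyed by plural category.
UNIT_PATTERNS = {"one": "{0} liter", "other": "{0} liters"}


def format_range(start, end):
    """Format 'start-end' with the unit pattern chosen by the range rule."""
    key = (plural_category(start), plural_category(end))
    category = PLURAL_RANGES.get(key, "other")
    return UNIT_PATTERNS[category].format(f"{start}-{end}")
```

Here `format_range(1, 2)` combines "one" and "other" into "other" and so picks the plural pattern. For a language with different rules, only the two tables and the category function would change.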
This version contains data for 238 languages and 259 territories—740 locales in all.

Details are provided in http://cldr.unicode.org/index/downloads/cldr-25, along with a detailed Migration section.