Tuesday, December 18, 2012
Unicode 6.2 Paperback Available
Responding to requests, the editorial committee has created a modestly priced print-on-demand volume that contains the complete text of the core specification of Version 6.2 of the Unicode Standard. This 692-page volume may be purchased from Lulu.com for $17.24, plus shipping.
Note that this volume does not include the Version 6.2 code charts, nor does it include the Version 6.2 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website, http://www.unicode.org/versions/Unicode6.2.0/ .
Purchase The Unicode Standard, Version 6.2 - Core Specification.
Friday, December 14, 2012
Unicode Stability Policies Updated
Recent changes to these policies include new guarantees:
- Property aliases will not be reused later for different properties.
- Property value aliases will not be reused later for different property values.
- Characters with the General_Category of Number are guaranteed to have a corresponding Numeric_Type value.
- No new General_Category property values will ever be added.
- New Bidi_Class property values can only be added for a tightly constrained class of new character additions.
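The guarantee tying General_Category to Numeric_Type can be spot-checked from a program. The following is a minimal sketch, assuming Python's standard `unicodedata` module, which exposes a numeric value rather than Numeric_Type itself; a character that has a numeric value necessarily has a Numeric_Type other than None. The helper name and the sample string are illustrative only, and the result reflects whichever Unicode version the Python build ships with.

```python
import unicodedata

def numbers_missing_numeric_value(chars):
    """Return the characters whose General_Category is Nd, Nl, or No
    but for which unicodedata reports no numeric value."""
    missing = []
    for ch in chars:
        if unicodedata.category(ch).startswith("N"):
            if unicodedata.numeric(ch, None) is None:
                missing.append(ch)
    return missing

# DIGIT FIVE (Nd), ROMAN NUMERAL TWELVE (Nl),
# VULGAR FRACTION ONE HALF (No), and a plain letter for contrast.
sample = "5\u216B\u00BDx"
print(numbers_missing_numeric_value(sample))
```

An empty list for a given input is consistent with the guarantee; it is a spot check on that input, not a proof over the whole repertoire.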
Wednesday, December 12, 2012
Feedback requested for Unicode 6.3
Unicode 6.3 is slated to be released in 2013Q3. Now is your opportunity to comment on the contents of this release.
The text of the Unicode Standard Annexes (segmentation, normalization, identifiers, etc.) is open for comments and feedback, with proposed update versions posted at UAX Proposed Updates. Initially, the contents of these documents are unchanged: the one exception is UAX #9 (BIDI), which has major revisions in PRI232. Changes to the text will be rolled in over the next few months, with more significant changes being announced. Feedback is especially useful on the changes in the proposed updates, and should be submitted by mid-January for consideration at the Unicode Technical Committee meeting at the end of January.
A later announcement will be sent when the beta versions of the Unicode character properties for 6.3 are available for comment. The only characters planned for this release are a small number of bidi control characters connected with the changes to UAX #9.
Monday, December 10, 2012
Unicode Collation Proposed Update
These and other changes are in the new proposed update: see PRI 235. For the exact list of modifications, see Modifications.
Friday, November 16, 2012
Unicode 6.2 core specification now available
For more details, see http://www.unicode.org/versions/Unicode6.2.0.
Friday, October 26, 2012
CLDR Version 22.1 Released
Unicode CLDR 22.1 contains data for 215 languages and 227 territories—654 locales in all. Version 22.1 is an update release, with several important fixes to CLDR 22.0, such as addition of the new Turkish currency symbol, and simpler patterns for fallback timezone formatting (“Los Angeles Time” instead of “United States Time (Los Angeles)”). For details, see CLDR-22.1.
CLDR is by far the largest and most extensive standard repository of locale data, used by a wide spectrum of companies for their software internationalization and localization. It is widely deployed via International Components for Unicode (ICU), and also accessed directly by companies such as Apple, Google, IBM, Twitter, and many others. CLDR is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML)—an XML format used for general interchange of locale data, such as in Microsoft's .NET.
See the Charts pages for views of the CLDR data, organized in various ways. For more information about the Unicode CLDR project see cldr.unicode.org.
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.
Tuesday, October 16, 2012
Two New Public Review Issues, UAX #9 and UTR #20
http://www.unicode.org/review/
Review periods for the new items close on January 21, 2013.
Please see the page for links to discussion and relevant documents.
Briefly, the new issues are:
PRI #232, Proposed Update UAX #9, Unicode Bidirectional Algorithm
UAX #9 will be updated for Unicode 6.2.1. This proposed update involves a substantial extension of the Unicode Bidirectional Algorithm to allow for the implementation of isolate runs. It also introduces a new X_Bidi_Class property in support of that extension. See the modifications section of the proposed update for information on specific changes to sections in the document.
http://www.unicode.org/review/pri232/
PRI #233, Proposed Update UTR #20, Unicode in XML and other Markup Languages
This Unicode Technical Report will have its references corrected and various other small editorial changes made to bring it up-to-date with Unicode 6.2.
http://www.unicode.org/review/pri233/
To supply feedback on these issues, see http://www.unicode.org/review/#feedback .
Wednesday, September 26, 2012
Announcing The Unicode Standard, Version 6.2
The Unicode Collation Algorithm has been greatly enhanced for Version 6.2, with a major overhaul of its documentation. There have also been significant changes to the collation weight tables, including improved handling of tertiary weights for characters with decompositions, and changed weights for some pictographic symbols.
The newly encoded Turkish Lira sign, like other currency symbols, is expected to be heavily used in its target environment. The Unicode Consortium accelerated the release of Unicode 6.2, to accommodate the urgent need for this character.
For more details of this release, see http://www.unicode.org/versions/Unicode6.2.0/.
Monday, September 10, 2012
CLDR Version 22 Released
Mountain View, CA, Sept. 10, 2012 - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 22.0), providing key building blocks for software to support the world's languages.
Monday, July 23, 2012
Unicode Security Mechanisms, Version 3 Released
Version 3.0 is a major revision. Significant changes include:
- Mixed Script Detection has extensive revisions to its specification.
- Restriction Level now has an explicitly defined process.
- Mixed Number Detection now has an explicitly defined process.
- Conformance requirements have been extended to include Restriction Level and Mixed Number Detection.
http://www.unicode.org/reports/tr39/
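As an illustration of what a mixed number check can look like, here is a minimal sketch in Python. It assumes the common approach of mapping every decimal digit to the zero of its digit system and flagging strings that draw on more than one system; the function names are hypothetical, and the normative process is the one defined in UTS #39 itself.

```python
import unicodedata

def decimal_zeros(text):
    """Collect the code point of the zero digit for each decimal digit system used."""
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # Decimal digits are encoded contiguously, so subtracting the
            # digit's value from its code point gives the system's zero.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros

def has_mixed_numbers(text):
    """True if the text mixes digits from more than one decimal system."""
    return len(decimal_zeros(text)) > 1

print(has_mixed_numbers("room 123"))      # ASCII digits only
print(has_mixed_numbers("room 12\u0663")) # ASCII digits plus ARABIC-INDIC DIGIT THREE
```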
Thursday, July 19, 2012
Version 15 is a major revision. Changes include:
- Conformance clauses dealing with non 1:1 equivalences were either retracted or modified.
- A Level 2 conformance clause for full properties was added.
- New properties, including Name_Alias matching and Script_Extensions, were added.
- A recommended compact form of Unicode escapes was added: \u{...}.
- There were many clarifications of the text. See http://www.unicode.org/reports/tr18/tr18-15.html
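To make the compact escape form concrete, the sketch below expands `\u{...}` escapes in a pattern string. It is an illustration only: Python's own `re` module does not accept this syntax, the helper name is hypothetical, and the sketch handles a single hexadecimal code point per escape, which may be narrower than what UTS #18 allows.

```python
import re

# One to six hex digits between the braces; the input uses a literal backslash.
_COMPACT_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")

def expand_compact_escapes(text):
    # Replace every compact escape with the character it denotes.
    def replace(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise ValueError("code point out of range: " + match.group(1))
        return chr(code_point)
    return _COMPACT_ESCAPE.sub(replace, text)

print(expand_compact_escapes(r"price: \u{20BA}100") == "price: \u20BA100")
```

The same escape shape works for supplementary characters, since the braces remove the need for a fixed number of digits.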
Monday, July 2, 2012
Dr. Vinton G. Cerf to Keynote IUC 36!
Tuesday, June 26, 2012
Proposed updates for Unicode Collation and IDNA
The data has been updated for the Unicode 6.2 beta review, and the associated CollationAuxiliary.txt file in CollationAuxiliary.zip now includes a description of the implicit fractional weight generation and the context syntax. For more details, see Modifications.
There is also a proposed update of UTS #46 Unicode IDNA Compatibility Processing. The data has been updated for the Unicode 6.2 beta review, with minor changes to the text. See PRI #224
Monday, June 25, 2012
Using the Unicode Glossary
http://unicode.org/glossary/#grapheme_cluster or
http://unicode.org/glossary/#code_point.
Wednesday, June 13, 2012
Tutorials Announced for IUC 36
Santa Clara, Calif., USA; October 22-24, 2012
The Internationalization and Unicode Conference (IUC) covers the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.
Tutorial Sessions Include:
- “An Introduction to Writing Systems & Unicode,” by Richard Ishida, Internationalization Activity Lead, W3C
- “Unicode – A Grand Tour,” by Michael McKenna, International Product Engineer, Zynga, Inc., and Craig Cummings, Globalization Center of Excellence, Rearden Commerce and UTC Vice Chair, Unicode Consortium
- “Internationalizing Domain Names in Applications (IDNA),” by Amit Gupta, Member Technical Staff, Adobe Systems
- “Internationalization, An Introduction (Part I: Character Encoding) (Part II: Enabling),” by Addison Phillips, Globalization Architect, Lab 126
- “Developing an OpenType Font for Complex Scripts Using Fontforge,” by Pravin Dinkar Satpute, Senior Software Engineer, Red Hat
- “I18N in Javascript with iLib,” by Edwin Hoogerbeets, Independent Globalization Consultant
- “Keyboard Design for Tavultesoft Keyman and Unicode,” by Marc Durdin, CEO, Tavultesoft Pty Ltd
- “Web Internationalization – Standards and Best Practices,” by Tex Texin, Chief Globalization Architect, Rearden Commerce, Inc.
- “Using ICU Workshop,” by Steven R. Loomis, Software Engineer, IBM
- “Internationalization and Localization in Ruby and Ruby on Rails,” by Martin J. Dürst, Professor, Aoyama Gakuin University
- “The Road to World-Class Starts with World-Ready,” by Michael Kuperstein, Localization Engineer and Loïc Dufresne de Virel, Localization Strategist, Intel Corporation
- “Building Multilingual Websites in Drupal 7 and Joomla 2.5,” by Jim DeLaHunt, Principal, Jim DeLaHunt & Associates
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, Rearden Commerce, SAP, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.
About the Event Producer
OMG® is the Event Producer for the Internationalization & Unicode Conferences. OMG is an open membership, not-for-profit consortium that produces and maintains computer industry specifications for interoperable enterprise applications. Our specifications include MDA®, UML®, CORBA®, MOF™, XMI® and CWM™. OMG’s specifications are all available for download by everyone without charge.
For more information about OMG, visit us online at http://www.omg.org.
Thursday, June 7, 2012
CLDR 21.0.2: New T Extensions for language/locale identifiers
- i0 - Input Method Transformations: bcp47/transform_ime.xml
- k0 - Keyboard Transformations: bcp47/transform_keyboard.xml
- t0 - Machine Translations: bcp47/transform_mt.xml
- x0 - Private Use fields: bcp47/transform_private_use.xml
- "zh-t-i0-pinyin", to indicate Chinese text generated with a pinyin input method
- "en-t-k0-dvorak", to identify a Dvorak keyboard for English
- "it-t-k0-osx-extended", to request an extended Mac keyboard for Italian
- "ru-t-en-x0-mobile", to indicate a translation from English to Russian for use on a mobile device, or
- "ja-t-de-t0-und-x0-medical", to identify a machine translation from German to Japanese with a specialized dictionary for medical terms.
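The structure of these tags can be pulled apart with a few lines of code. The following is a simplified sketch, assuming that the T extension starts at the first `t` singleton, that an optional source language comes next, and that each field begins with a two-character key made of a letter and a digit. It does not validate subtags or handle other extensions that might follow; the function name is hypothetical.

```python
import re

_FIELD_KEY = re.compile(r"^[a-z][0-9]$")  # e.g. i0, k0, t0, x0

def parse_t_extension(tag):
    """Split a tag into target language, source language, and T fields."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return {"language": "-".join(subtags), "source": None, "fields": {}}
    t_index = subtags.index("t")
    source, fields, current_key = [], {}, None
    for subtag in subtags[t_index + 1:]:
        if _FIELD_KEY.match(subtag):
            current_key = subtag          # start of a new field
            fields[current_key] = []
        elif current_key is None:
            source.append(subtag)         # still inside the source language tag
        else:
            fields[current_key].append(subtag)
    return {
        "language": "-".join(subtags[:t_index]),
        "source": "-".join(source) or None,
        "fields": {key: "-".join(value) for key, value in fields.items()},
    }

print(parse_t_extension("ja-t-de-t0-und-x0-medical"))
print(parse_t_extension("it-t-k0-osx-extended"))
```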
Wednesday, June 6, 2012
PRI #231: Bidi Parenthesis Algorithm
http://www.unicode.org/review/pri231/
PRI #229: Linebreaking Changes for Pictographic Symbols
Please see: http://www.unicode.org/review/pri229/
Tuesday, June 5, 2012
New Public Review Issues: Changes to Character Properties
For details of the proposals, see the PRI pages:
http://www.unicode.org/review/pri227/
http://www.unicode.org/review/pri228/
Monday, June 4, 2012
New Public Review Issues: Changes to the Unihan database
For details of these see the PRI pages:
http://www.unicode.org/review/pri225/
http://www.unicode.org/review/pri226/
Friday, June 1, 2012
Unicode 6.2 Beta Review
Unicode 6.2 is a minor release of the Unicode Standard. The main feature of this release is the inclusion of the newly encoded Turkish lira symbol. However, there are other important changes to Unicode properties and annexes, affecting segmentation and more.
Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML,...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.2.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.2.0 in September 2012.
• See http://www.unicode.org/versions/beta-6.2.0.html, for information about testing the 6.2.0 beta.
• See http://www.unicode.org/versions/Unicode6.2.0/ for the current draft summary of Unicode 6.2.0.
Thursday, May 31, 2012
Unicode sessions at Localization World Paris
In the morning, Richard Ishida will present “An Introduction to Writing Systems and Unicode”, a tutorial that will introduce the basic functioning of Unicode in dealing with non-Latin writing systems. It is an excellent orientation for people new to these concepts, but it also offers content for people at intermediate and advanced levels due to the breadth of scripts discussed.
In the afternoon, Addison Phillips will present "Internationalization: An Introduction", a two-part tutorial covering:
• What is internationalization?
• What is Unicode? Implementing and using the standard.
• How do you prepare software for localization and translation?
Finally, Richard and Addison will present "Towards the Promised Land: Globalization Developments in Web Standards", which surveys current developments at the W3C.
You may register for any or all of these sessions via http://localizationworld.com/lwparis2012/registration.php where you will see the sessions in the preconference day.
This is an opportunity to get a taste of the Unicode conference to be held in California on October 22-24, and to see how the people on your staff can benefit from a deeper knowledge of Unicode and internationalization.
Friday, May 25, 2012
Unicode 6.1 Paperback Available
Responding to requests, the editorial committee has created a modestly priced print-on-demand volume that contains the complete text of the core specification of Version 6.1 of the Unicode Standard. This 692-page volume may be purchased from Lulu.com for $15.96, plus shipping.
Note that this volume does not include the Version 6.1 code charts, nor does it include the Version 6.1 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website, http://www.unicode.org/versions/Unicode6.1.0/ .
Purchase The Unicode Standard, Version 6.1 - Core Specification.
Tuesday, May 15, 2012
Unicode 6.2 to Support the Turkish Lira Sign
Recognizing the urgent need to support the new currency symbol in information systems, the Unicode Consortium has scheduled its next release, Unicode 6.2, for the third quarter of 2012. That release will include the new character, U+20BA TURKISH LIRA SIGN.
Additional information regarding the new Turkish lira sign is available from the Central Bank of Turkey: http://www.tcmb.gov.tr/yeni/iletisimgm/TurkishLira.php
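For implementers who want to confirm that their environment knows about the new character, a quick check is sketched below. It assumes Python's standard `unicodedata` module; the name and category come from whatever Unicode version the Python build carries, so an older build may not recognize the character. The `describe` helper is hypothetical.

```python
import unicodedata

def describe(ch):
    """Summarize a single character using the local Unicode data."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unknown to this Python's Unicode data>"),
        "category": unicodedata.category(ch),
        "utf8": ch.encode("utf-8"),
    }

lira = "\u20BA"
for key, value in describe(lira).items():
    print(key, value)
```

The UTF-8 form is the three bytes E2 82 BA regardless of whether the local data files already list the character.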
Monday, April 23, 2012
Unicode Version 6.1 - Complete Text of Core Specification Published
Version 6.1 of the Unicode Standard continues the Unicode Consortium's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world.
This version of the Standard brings technical improvements to support implementers, particularly with improvements to property values and their aliases that enable easier programmatic use. Other improvements include line-breaking behavior of Hebrew and Japanese text and segmentation behavior of Thai, Lao, and other similar languages.
In January 2012, the other portions of Unicode 6.1 were released: the Unicode Standard Annexes, code charts, and the Unicode Character Database, to allow vendors to update their implementations of Unicode 6.1 as quickly as possible. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.1.
For more information on all of The Unicode Standard, Version 6.1, see http://www.unicode.org/versions/Unicode6.1.0/ .
Wednesday, April 4, 2012
Unicode CLDR Survey Tool now open for data submissions
CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. That repository is used in a wide variety of products, including most smart phones.
The survey tool (http://cldr.org/index/survey-tool) is used to submit translations to this repository, and to vote on others’ translations. For Version 22, the survey tool has undergone substantial revision, with dramatic improvements in performance and usability.
The data submission phase is scheduled to run from now until May 30, 2012, after which the vetting stage will begin. During the vetting stage, users can vote on translations, and correct new translations, but cannot otherwise enter translations.
If you have used the survey tool in a previous release of CLDR, your login ID and password are still active. Otherwise you will need to set up a new account; please see the account instructions (http://cldr.org/index/survey-tool/accounts).
Friday, March 30, 2012
Call for Participation! - The 36th Internationalization & Unicode Conference - October 22-24, 2012
The Program Committee is soliciting proposals for presentations that describe case studies, best practices, effective software design, innovative technology, or important standards. Tutorial presentations are also welcome. Suitable topics include, but are not limited to:
Application Areas
- Designing software platforms, operating systems, software as a service (SAAS), or programming environments
- Social networks
- Search engines, SEO, discovery and navigation best practices
- Websites and web services
- Libraries and education
- Mobile applications including iPhone, Android, iPad, Kindle, Windows Mobile, tablets, etc.
- Publishing and broadcasting for a global audience
- Internationalized Domain Names and other identifiers
- Security concerns and practices
- Voice to text, text to voice
- Machine translation
- Unicode, encodings, scripts, character properties, and algorithms
General Techniques
- Advances in technologies, algorithms or methodologies
- Using internationalization libraries and programming environments
- Handling bidirectional or other complex scripts
- HTML5 and HTML5-based applications
- Dealing with data formats: XML, JSON, HTML5, DITA, and upcoming standards
- Project management and methodologies for global development teams e.g. Agile
- Best practices in localization process and technology
- Best practices in world-ready development, test, and deployment
- Improving globalization capabilities within organizations
- Approaches for migrating legacy applications to global markets
- Font development and Typography
- Endangered Languages
- Unencoded Languages
- Case studies and research on cross-culture communication
- Digital Divide
- ISO language tag issues
- Languages of Africa, Asia, and the Middle East
- Locales and the Unicode Common Locale Data Repository (CLDR)
- Emoji support
Tutorial presenters receive complimentary conference registration and two nights' lodging. Session presenters receive a fifty percent conference discount and two nights' lodging.
To be considered as a presenter for the conference, please submit a brief abstract by the deadline of Friday, May 18th.
The Program Committee will notify authors by Friday, June 1st. Final presentation materials will be required from selected presenters by Friday, August 3.
Wednesday, March 21, 2012
Unicode Releases Common Locale Data Repository, Version 21.0.1
The next major release is CLDR 22, scheduled for late August. The CLDR 22 release does involve general data submission, which will begin soon. For the latest schedule, see http://cldr.unicode.org .
Unicode CLDR Survey Tool Beta
The survey tool has undergone substantial revision, with dramatic improvements in performance and usability. We would appreciate people trying out the tool so that we can identify any remaining problems before we start data submission (currently scheduled for April 4). For more information, see http://goo.gl/7M1IG.
Access
- Production Survey Tool. If you have an existing survey tool account, you can go to the production tool at Production Survey Tool.
- Smoke-Test Survey Tool. If you don’t have an account, you can still try out the survey tool using the Smoke Test version. It will create a test account automatically.
So that you can try out the tool as you wish, none of the data you enter during beta is saved.
The Smoke Test tool may be restarted at any time, because it is used for development. If you get disconnected when this happens, then refresh your browser: all of your changes should be saved.
Guide
If you haven’t used the survey tool before, you may want to take a quick look at:
- http://cldr.unicode.org/index/survey-tool/guide
- http://cldr.unicode.org/index/survey-tool/walkthrough
This documentation should be updated in the next few days for some of the changes in UI.
How you can help
Visit and randomly vote and enter changes.
- Pick your favorite locale
- Visit different Sections of the locale (Code Lists, etc), and different pages in the Section.
- Vote for different choices (including Proposed, Others, Abstain)
- Change a value (Change column)
- Try zooming on different columns (clicking on the following cells)
- St (status, eg error/alert)
- Draft (the approval status)
- Voted (the voting status)
- Proposed / Others (particular values)
- (Clicking on Code shows some internals, not really a user-focused item)
- Reset your Coverage Level (at the top). (This changes how many items show).
- Verify that data was accepted, or is rejected (appropriately) because of an error.
- Periodically, refresh the entire page you are on and verify that items previously added remain visible.
- Report any new issues at http://unicode.org/cldr/trac/newticket. (Skip those below). Please include the URL to the page where you found the error.
Known issues
Please read these over so you know what to skip:
- Use Firefox/Safari/Chrome, not IE 8 or other browsers.
- Some locales, such as English (en), are read-only.
- Do not post comments in the ‘forums’ or try “Show Coverage”
- Some generated examples use English instead of the local language, or the wrong currency.
- The information in the “Code” column will be simplified during the beta process; some rows will move to different sections or pages.
- There will be a bookmark on each row, for reference.
- Other items may be added to this list during the beta period
What changed?
- Page access is 10-30 times faster, depending on the operation.
- Items are submitted individually (with Return/Enter), instead of having to submit a whole page.
- Pages are not broken up into multiple subpages, simplifying navigation.
- Errors and Warnings appear when you submit an item.
- There is no “zoom” window; instead, zooming is in-place, with separate versions depending on what part of a row you click on.
Wednesday, March 7, 2012
Updates include:
- Mongolian and Egyptian Hieroglyphs changed to U.
- Implementation of recent UTC decisions
- Removal of the East Asian Class property
- East Asian Orientation renamed East Asian Vertical Orientation
- New property, Default Vertical Orientation
Tuesday, March 6, 2012
PRI #182: Unicode Regular Expressions: new proposed update
There are significant additions and changes in the new proposed update of this specification, with the addition of Name_Alias matching, matching rules from UAX #44, use of the new Script_Extensions property, new recommended properties, a compact form of \u{...}, alignment of rule RL1.4 with Appendix C, and the incorporation of text for PRI #179.
There are several review notes requesting feedback on particular issues. Please submit feedback on those and the rest of this document by May 1 for consideration at the UTC meeting starting on May 7. For details, see:
http://www.unicode.org/review/pri182/
PRI #208, #209: Unicode Security: new proposed updates
There are significant additions and changes in the new proposed updates of these specifications. The definition of Restriction Levels has moved from UTR #36 to UTS #39, which also adds two new conformance clauses and specifications for Restriction Levels and mixed number detection, an amended specification for mixed script detection, and updates for Unicode 6.1.
There are several review notes requesting feedback on particular issues. Please submit feedback on those and the rest of these documents by May 1 for consideration at the UTC meeting on May 7. For details, see:
http://www.unicode.org/review/pri208/
http://www.unicode.org/review/pri209/
Saturday, March 3, 2012
New version of Unicode Ideographic Variation Database released
Thursday, March 1, 2012
IUC 36: October 22-24, 2012, Santa Clara, CA, USA
Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.
This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps. This year's conference will also highlight new features in Unicode Version 6.1 and other relevant standards published this year. Reasons to Attend Include:
- tutorials and sessions for beginners, to train you and your staff on basic practices and implementation techniques for creating international software
- learn recommended solutions to difficult problems or sophisticated requirements from industry leaders and experts in attendance
- find help from tool and product vendors to get you to market quickly and cost-effectively
Friday, February 17, 2012
Localization World Unicode workshop, June 2012, Paris
The Unicode Consortium’s goal is to enable people around the world to use computers in any language. The Consortium is involved in core internationalization specifications at the heart of all modern software, such as the Unicode Standard for character encoding. The Consortium’s involvement in localization is a key extension of this work. The Unicode Consortium maintains and extends the Common Locale Data Repository (CLDR), and in 2011 established the Unicode Localization Interoperability Technical Committee to improve the interoperability of localization data interchange.
For more information, including the program of the June LocalizationWorld Conference, please see http://www.localizationworld.com/lwparis2012/program.php .
Helena Chapman, chair, Unicode Localization Interoperability Technical Committee
Ulrich Henes, Donna Parrish and Daniel Goldschmidt, chair, vice-chairs, Localization World Conference Program Committee
Friday, February 10, 2012
Unicode Releases Common Locale Data Repository, Version 21.0
Unicode CLDR 21.0 contains data for 193 languages and 170 territories: 528 locales in all. This release did not include a public data submission phase, and focused on improvements to the LDML structure and tools, and consistency of data.
Thursday, February 2, 2012
UTS #10, Unicode Collation Algorithm, Version 6.1 Released
- The collation ordering for the 732 new Unicode characters.
- A major revision to the ordering of "variable" characters into groups, separating punctuation and symbols. This change may present migration issues for some implementations.
- Options added for ignoring spaces and punctuation (but not symbols), and for reordering groupings of characters, such as putting Latin characters before Greek (for Greek users), or digits after letters.
- A new section on asymmetric search (where a query of the base character 'e' matches é, è,…, but a query of the more specific é doesn't match other accented versions or the base character).
- Important restructuring and clarifications of other sections.
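The asymmetric search behavior can be approximated at the level of single characters. The sketch below is not the Unicode Collation Algorithm and uses no collation weights; it assumes that canonical decomposition (NFD) is a reasonable stand-in for "base character plus accents", which holds for the Latin examples in the text but not in general. The function names are hypothetical.

```python
import unicodedata

def strip_marks(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def asymmetric_match(query, candidate):
    """An unaccented query matches any accented variant of the same base;
    an accented query matches only the same accented form."""
    q = unicodedata.normalize("NFD", query)
    c = unicodedata.normalize("NFD", candidate)
    if q == strip_marks(q):
        return q == strip_marks(c)   # query is a bare base: ignore accents
    return q == c                    # query is specific: require an exact match

print(asymmetric_match("e", "\u00E9"))       # base query against e-acute
print(asymmetric_match("\u00E9", "\u00E8"))  # e-acute query against e-grave
```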
Wednesday, February 1, 2012
UTS #46, Unicode IDNA Compatibility Processing, Version 6.1 Released
The specification provides two main features for use with the internationalized domain names specification released in August 2010 (IDNA2008):
- A comprehensive mapping to reflect user expectations for casing and other variants of domain names. This mapping is allowed by IDNA2008, and follows the same principles as in the previous version of that specification (IDNA2003). It thus provides users consistency between old and new versions.
- A compatibility mechanism that supports internationalized domain names valid under the IDNA2003 specification and the IDNA2008 specification. This second feature allows browsers, search engines, and other clients to handle both old and new domain names during the transitional period until registries update their rules to follow IDNA2008.
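The kind of mapping described above can be seen with Python's built-in `idna` codec. Note the assumption: that codec implements IDNA2003, not UTS #46, so this sketch only shows the older behavior that the compatibility processing is designed to stay consistent with. The domain names and the helper name are illustrative.

```python
def to_ascii_idna2003(domain):
    """Convert a domain name to its ASCII form using the stdlib IDNA2003 codec."""
    return domain.encode("idna").decode("ascii")

# Case mapping: the capital B is folded before Punycode encoding.
print(to_ascii_idna2003("B\u00FCcher.example"))   # xn--bcher-kva.example

# Variant mapping: IDNA2003 maps sharp s to "ss", so no Punycode is needed.
print(to_ascii_idna2003("fa\u00DF.example"))      # fass.example
```

The second case is one of the places where names valid under both specifications can resolve differently, which is why a transitional mechanism is useful to clients.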
Tuesday, January 31, 2012
Announcing the Unicode Standard, Version 6.1
This version of the Standard also brings technical improvements to support implementers. Improvements to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and are easier to validate.
Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:
26FA FE0E | TENT text style
26FA FE0F | TENT emoji style
26FD FE0E | FUEL PUMP text style
26FD FE0F | FUEL PUMP emoji style
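In code, a variation sequence is simply the base character followed by the variation selector. The following minimal sketch builds the sequences listed above; the helper names are hypothetical, and whether a given font or renderer honors the requested style is outside the program's control.

```python
TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16

def with_style(base, style):
    """Append the variation selector for the requested display style."""
    selectors = {"text": TEXT_STYLE, "emoji": EMOJI_STYLE}
    return base + selectors[style]

def code_points(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

tent = "\u26FA"
print(code_points(with_style(tent, "text")))   # 26FA FE0E
print(code_points(with_style(tent, "emoji")))  # 26FA FE0F
```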
Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
- UTS #10, Unicode Collation Algorithm
- UTS #46, Unicode IDNA Compatibility Processing
Friday, January 6, 2012
Release candidate for Unicode 6.1 character data
- Unicode
- http://unicode.org/Public/6.1.0/ucd/ (data, semicolon-delimited)
- http://unicode.org/Public/6.1.0/ucdxml/ (data, xml)
- http://www.unicode.org/reports/tr44/proposed.html (documentation)
- UCA
- http://unicode.org/Public/UCA/6.1.0/ (data)
- http://www.unicode.org/reports/tr10/proposed.html (documentation)
- IDNA compatibility
- http://unicode.org/Public/idna/6.1.0/ (data)
- http://www.unicode.org/reports/tr46/proposed.html (documentation)
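For readers new to the semicolon-delimited data files, the sketch below parses one record in the style of UnicodeData.txt. The 15-field layout follows the documentation in UAX #44; the sample line is written out by hand for illustration and should be checked against the actual file, and the record type keeps only a few of the fields.

```python
from typing import NamedTuple

class UnicodeDataRecord(NamedTuple):
    code_point: int
    name: str
    general_category: str
    bidi_class: str

def parse_unicode_data_line(line):
    """Parse one semicolon-delimited record into a small named tuple."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return UnicodeDataRecord(
        code_point=int(fields[0], 16),  # field 0: code point in hex
        name=fields[1],                 # field 1: character name
        general_category=fields[2],     # field 2: General_Category
        bidi_class=fields[4],           # field 4: Bidi_Class
    )

# Illustrative record (assumed shape, not copied from the release files).
sample = "20BA;TURKISH LIRA SIGN;Sc;0;ET;;;;;N;;;;;"
print(parse_unicode_data_line(sample))
```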
- a problem is found in carrying out the actions directed by the Unicode Technical Committee for the release, or
- an editorial problem is found in the data comments or documentation.