Showing posts with label security. Show all posts

Wednesday, March 2, 2022

Avoiding Source Code Spoofing

Unicode has convened a group of experts in programming languages, tooling, and security to provide guidance and recommendations on how to better handle international text in source code, and to supply code that helps implementations.

Recent reports have highlighted problems in the review of source code containing non-ASCII Unicode characters (the so-called “Trojan Source exploit”). A person reviewing a submission of source code could be fooled into thinking that the code was okay, when it was actually malicious. The basic problem occurs when the actual text is different from what the reader perceives it to be, based on what is displayed. This can result either from the presence of characters used in right-to-left scripts (such as Arabic or Hebrew) that can change the visual ordering of text, or from the presence of characters that look like others (also known as “confusables”).
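As a rough illustration of the first case, the sketch below scans source text for the explicit bidirectional formatting characters that can reorder displayed text. The function name `find_bidi_controls` and the choice to report every occurrence are assumptions made for this example; it is not a recommendation from the expert group, and a real tool would also need to handle legitimate uses of these characters in comments and string literals.

```python
import unicodedata

# Explicit bidirectional formatting characters, written as escapes so that
# this file itself contains none of them.
BIDI_CONTROLS = {
    "\u202A",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C",  # POP DIRECTIONAL FORMATTING
    "\u202D",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source: str):
    """Return (line, column, character name) for each bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # Hypothetical snippet with a hidden override inside a string literal.
    sample = 'x = 1\nif role != "admin\u202E":\n    pass\n'
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A reviewer-facing tool could run a check like this on each submitted file and display the flagged characters visibly rather than rejecting the file outright.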

The problems here are not solely a security issue: text with different writing directions or confusable characters can be hard to work with. Finding a solution is important from both security and usability points of view. Developers of source code editors or compilers should not be required to have a deep knowledge of Unicode to provide a good user experience and robust security mitigations.

Unicode’s mission is to allow everyone to use their own languages on computers and mobile devices. The above issues are part and parcel of a character set that covers all the writing systems of the world – and have been documented in the Unicode Standard since its very first version in 1991. Unicode’s past efforts have focused on misleading URLs and identifiers, and correct visual ordering of plain text. And while much of this material is relevant to source code, this group of experts will now collect, curate, and supplement that early documentation with concrete recommendations to support source code editors and compilers.

While it may seem that it is easiest to simply go back to limiting source code to only ASCII characters, ASCII-only environments make it much harder to write and maintain software that can be used all over the world – a fundamental requirement for modern software. Moreover, this approach disadvantages software developers who use languages other than English.

More details on the source code spoofing issue, the proposed plan, and formation of this group are found in document L2/22-007R2.



Thursday, February 12, 2015

Unicode Security Mechanisms (UTS #39) Proposed Update Available


UTS #39, Unicode Security Mechanisms, has a proposed update available, with modifications for alignment with the future Unicode 8.0 and some revisions to data and format. The most notable changes are the addition of the new Unicode 8.0 characters, the removal of the SL, SA, and ML data, changes to the values of the General Security Profile Identifier Types, and changes to the format for both the Status and Types.

Feedback is welcome through April 27, 2015. For further information and details about how to provide feedback, please see Public Review Issue #292.

Wednesday, May 28, 2014

Unicode Security Data: Beta Review

The documents and data for two Public Review Issues have recently been revised. The issues are PRI #272, Proposed Update UTR #36, Unicode Security Considerations, and PRI #273, Proposed Update UTS #39, Unicode Security Mechanisms.

We have revised the draft data for UTS #39: Unicode Security Mechanisms (confusables and identifier restrictions) for review. There are also some small changes to the text of UTR #36: Unicode Security Considerations, mostly cleanup in preparation for publication. The most important data files for review are:
These files can be downloaded from http://unicode.org/Public/security/7.0.0 and diffed against the corresponding files in http://unicode.org/Public/security/6.3.0
 
The comment period ends July 28, 2014.

Monday, March 31, 2014

Proposed Updates for Unicode Security-Related Publications

Proposed updates are now available for UTR #36, Unicode Security Considerations, and UTS #39, Unicode Security Mechanisms. These are both being updated to correspond with Unicode 7.0.

PRI #272, Proposed Update UTR #36, Unicode Security Considerations:
This UTR is being updated. In this draft, a description has been added about the downside of displaying URLs as Punycode. A note has also been added on the use of Catalan in identifiers.

PRI #273, Proposed Update UTS #39, Unicode Security Mechanisms:
This UTS is being updated to correspond with Unicode 7.0. Text has been added about the use of NFC, and on the use of Catalan in identifiers. A note has been added on the collection of confusable data outside of Status=allowed, such as for non-NFKC characters.
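To make the NFC and NFKC distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `to_nfc` and `is_nfkc` are made up for this example, and the sample strings are illustrative only; they are not taken from the UTS #39 data files.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose canonically equivalent sequences (Normalization Form C)."""
    return unicodedata.normalize("NFC", text)


def is_nfkc(text: str) -> bool:
    """True if the text is unchanged by compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text) == text


if __name__ == "__main__":
    decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
    ligature = "\ufb01le"    # LATIN SMALL LIGATURE FI followed by "le"

    # NFC composes the accent sequence into a single code point.
    print(len(decomposed), len(to_nfc(decomposed)))

    # NFC leaves the ligature alone, but the string is not in NFKC:
    # compatibility normalization would rewrite it using plain letters.
    print(to_nfc(ligature) == ligature, is_nfkc(ligature))
```

Characters such as the ligature above are the kind of "non-NFKC" case the note refers to: they survive NFC, so an identifier check based on NFC alone would still see them.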

Review notes solicit feedback on whether to (a) add multi-character sequences to the data file, (b) change some of the Type values, (c) base the data more on CLDR exemplars, and/or (d) change the format of the data files.

The closing date for both of these issues is April 28, 2014. For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions on the PRI pages.

The Public Review Issues page is: http://www.unicode.org/review/

Monday, July 23, 2012

Unicode Security Mechanisms, Version 3 Released

Version 3.0 of UTS #39, Unicode Security Mechanisms, has been released by the Unicode Consortium, together with a new version of the associated UTR #36, Unicode Security Considerations. Because the Unicode Standard contains such a large number of characters for the writing systems of the world, caution is necessary to avoid exposing programs and systems to possible security attacks. These revised documents describe security considerations for Unicode and specify improved mechanisms for reducing the risk of problems.

Version 3.0 is a major revision. Significant changes include:
  • Mixed Script Detection has extensive revisions to its specification.
  • Restriction Level now has an explicitly defined process.
  • Mixed Number Detection now has an explicitly defined process.
  • Conformance requirements have been extended to include Restriction Level and Mixed Number Detection.
http://www.unicode.org/reports/tr36/
http://www.unicode.org/reports/tr39/
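
As a rough sketch of the idea behind Mixed Number Detection listed above, the code below maps each decimal digit in a string to the zero digit of its number system and reports whether more than one system is present. It assumes that decimal digits (General_Category Nd) are encoded in contiguous runs of ten, so the zero can be found by subtracting the digit's value from its code point. The function names are invented for this example, and the authoritative procedure is the one defined in UTS #39 itself.

```python
import unicodedata


def decimal_zero_digits(text: str) -> set:
    """Return the set of zero digits for the decimal number systems in text.

    Each character with General_Category Nd is mapped to the zero of its
    own digit run, e.g. ARABIC-INDIC DIGIT THREE maps to ARABIC-INDIC
    DIGIT ZERO.
    """
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            zeros.add(chr(ord(ch) - unicodedata.decimal(ch)))
    return zeros


def has_mixed_numbers(text: str) -> bool:
    """True if digits from more than one decimal number system appear."""
    return len(decimal_zero_digits(text)) > 1


if __name__ == "__main__":
    print(has_mixed_numbers("route66"))    # ASCII digits only
    print(has_mixed_numbers("1\u0663"))    # ASCII digit plus Arabic-Indic digit
```

Keeping the set of zero digits, rather than only a boolean, lets a caller report which number systems were mixed.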