Unicode 16 introduces changes to UTS46 that require refactoring our UTS46 implementation. Specifically:
Reissued for Unicode 16.0.0.
The handling of UseSTD3ASCIIRules has been simplified. Conditional data involving disallowed_STD3_* Status values has been replaced with simple checking for a subset of ASCII characters in the Validity Criteria. This simplifies the data format and data lookup, makes standard UseSTD3ASCIIRules=true handling consistent with custom UseSTD3ASCIIRules, and avoids unnecessarily disallowing certain labels that contain disallowed_STD3_mapped characters but which do not contain non-LDH ASCII characters when the mappings are applied.
Behavior for UseSTD3ASCIIRules=false is unchanged.
Examples for UseSTD3ASCIIRules=true behavior changes:
Example for a label which continues to fail the Validity Criteria despite the change in Processing: In Unicode 15.1, input label "⑷" was unchanged in Processing and failed the Validity Criteria. (U+2477 disallowed_STD3_mapped was resolved to disallowed, and its mapping was not applied.) In Unicode 16.0, "⑷" is Mapped to "(4)", which still fails the Validity Criteria, except if a custom set of valid ASCII characters is used that includes the parentheses.
Example for a label which newly passes the Validity Criteria due to the change in Processing: In Unicode 15.1, input label "\uFF1D\u0338" (fullwidth equals + combining solidus overlay) was unchanged in Processing and failed the Validity Criteria. (U+FF1D disallowed_STD3_mapped was resolved to disallowed, and its mapping was not applied.) In Unicode 16.0, "\uFF1D\u0338" is Mapped to "\u003D\u0338" and Normalized to "\u2260" (not equal to), which is valid.
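The two examples above can be reproduced with a small sketch. It is illustrative only: `TOY_MAPPING` is a hypothetical two-entry stand-in for the real mapping table, holding just the mappings quoted above, and `passes_std3_ascii_check` covers only the ASCII part of the Validity Criteria, assuming the default valid set is lowercase letters, digits, and hyphen-minus. The helper names are made up for this illustration and are not part of any existing implementation.

```python
import unicodedata

# Hypothetical toy table: only the two mappings quoted in the examples above.
# A real implementation would load the full UTS46 mapping data instead.
TOY_MAPPING = {
    "\u2477": "(4)",     # U+2477 PARENTHESIZED DIGIT FOUR
    "\uFF1D": "\u003D",  # U+FF1D FULLWIDTH EQUALS SIGN
}

# Assumed default set of valid ASCII characters (LDH, lowercase after mapping).
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def map_and_normalize(label):
    """Apply the toy mapping per code point, then normalize to NFC."""
    mapped = "".join(TOY_MAPPING.get(ch, ch) for ch in label)
    return unicodedata.normalize("NFC", mapped)


def passes_std3_ascii_check(label, valid_ascii=LDH):
    """Return True if every ASCII character in the label is in valid_ascii.

    Non-ASCII characters are ignored here; the rest of the Validity
    Criteria is out of scope for this sketch.
    """
    return all(ch in valid_ascii for ch in label if ord(ch) < 0x80)


if __name__ == "__main__":
    for raw in ("\u2477", "\uFF1D\u0338"):
        processed = map_and_normalize(raw)
        print(ascii(raw), "->", ascii(processed),
              "ASCII check passes:", passes_std3_ascii_check(processed))
```

With a custom valid set that includes the parentheses, for example `LDH | set("()")`, the mapped label "(4)" passes this ASCII check, which matches the exception described in the first example.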
In Section 4, Processing, if the label starts with “xn--”, and the conversion from Punycode yields either an empty label or an all-ASCII label, then an error is now recorded, consistent with IDNA2008.
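A minimal sketch of the new xn-- check, using Python's built-in `punycode` codec. The function name `check_xn_label` and the error strings are hypothetical, and the sketch assumes the label has already been mapped to lowercase; it does not run the remaining Validity Criteria on the decoded label.

```python
def check_xn_label(label):
    """Return a list of error strings for the xn-- specific checks."""
    errors = []
    if not label.startswith("xn--"):
        return errors  # not a Punycode label; nothing to check here

    try:
        # Strip the "xn--" prefix and decode the rest from Punycode.
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        errors.append("punycode conversion failed")
        return errors

    # New in Unicode 16: an empty or all-ASCII result is an error.
    if decoded == "":
        errors.append("xn-- label decodes to an empty label")
    elif decoded.isascii():
        errors.append("xn-- label decodes to an all-ASCII label")
    return errors


if __name__ == "__main__":
    for label in ("xn--mnchen-3ya", "xn--", "xn--abc-", "example"):
        print(label, check_xn_label(label))
```

Here "xn--mnchen-3ya" decodes to "münchen" and records no error, while "xn--" (empty result) and "xn--abc-" (decodes to "abc") each record one.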
Changed Section 6 Mapping Table Derivation, Table 3. Base Valid Set, replacing \p{Block=Ideographic_Description_Characters} and \u31EF with the equivalent [\p{IDS_Unary_Operator}\p{IDS_Binary_Operator}\p{IDS_Trinary_Operator}].
Changed Section 6 Mapping Table Derivation, Step 3: Specify the base exclusion set, to define a small, fixed base exclusion set. Previously, the base exclusion set had been derived from differences between IDNA2003 data and UTS46 principles.
Changes in the Unicode 15.1 version led to unexpected edge cases in processing. At the same time, transitional processing was deprecated.
The UTC concluded that it was no longer necessary to disallow characters on the basis of differences from IDNA2003, and decided to simplify the definition of the base exclusion set.
As a result, a number of characters that were disallowed before are now ignored, mapped, or (in the one case of U+1806 MONGOLIAN TODO SOFT HYPHEN) valid. In xn-- Punycode labels, characters with Status ignored and mapped are still not valid. The recent edge cases and processing complications are no longer present.
For details see the proposal in document L2/24-064 item 6.2.
Removed the content of Section 7, IDNA Comparison, which is no longer applicable.
Noted in Section 8.3, Migration, new syntax in the test file: "" means an empty string. There are also other test data corrections and improvements.