A327894
Unicode codes for digit characters in the Basic Multilingual Plane (BMP).
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1776, 1777, 1778, 1779, 1780, 1781, 1782, 1783, 1784, 1785, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 2406, 2407, 2408, 2409, 2410, 2411, 2412, 2413, 2414, 2415, 2534
OFFSET
1,1
COMMENTS
A digit is a character in a given Unicode block that can be combined with other digits from the same block to form zero-padded nonnegative integers in a positional number base. Equal-length strings of such digits then sort in arithmetic order when sorted by their Unicode values.
This is not the case for numerals, such as Roman Numeral Ten (U+2169), or for ASCII letters used as Roman numerals. For example, XIV, XVI, XIX would be sorted as XIV, XIX, XVI (14, 19, 16, rather than 14, 16, 19).
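A minimal Scala sketch of the two comments above, assuming Devanagari digits (zero at U+0966, as in the example below). The object name and the `devanagari` helper are hypothetical, written only for illustration. Zero-padded strings of Devanagari digits sort the same way as the integers they denote, while ASCII Roman numerals do not, and Roman Numeral Ten is not classified as a digit.

```scala
object DigitsVersusNumerals {
  // Devanagari Digit Zero (U+0966); digits one to nine follow it consecutively.
  val DevanagariZero: Int = 0x0966

  // Hypothetical helper: write n >= 0 in base 10 with Devanagari digits,
  // left-padded with Devanagari zeros to the given width.
  def devanagari(n: Int, width: Int): String = {
    val ascii = n.toString
    val padded = "0" * math.max(0, width - ascii.length) + ascii
    padded.map(c => (DevanagariZero + (c - '0')).toChar)
  }

  val numbers: List[Int] = List(19, 7, 142, 16, 14)

  // String ordering compares UTF-16 code units; for equal-length, zero-padded
  // digit strings this agrees with numeric ordering.
  val digitOrderMatches: Boolean =
    numbers.map(devanagari(_, 3)).sorted == numbers.sorted.map(devanagari(_, 3))

  // Roman numerals written with ASCII letters do not have that property.
  val romans: List[(String, Int)] = List("XIV" -> 14, "XVI" -> 16, "XIX" -> 19)
  val romanStringOrder: List[Int] = romans.sortBy(_._1).map(_._2)

  // Roman Numeral Ten (U+2169) is a numeral, not a decimal digit.
  val romanTenIsDigit: Boolean = Character.isDigit(0x2169)

  def main(args: Array[String]): Unit = {
    println(digitOrderMatches) // true
    println(romanStringOrder)  // List(14, 19, 16)
    println(romanTenIsDigit)   // false
  }
}
```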
This listing does not include digit characters in the BMP Private Use Area, for example Klingon Digit Zero (U+F8F0, corresponding to 63728) as assigned by ConScript.
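The exclusion happens automatically with Java's classification: Unicode gives every Private Use Area code point the general category Co, so isDigit() never accepts one, whatever a private agreement such as ConScript assigns there. A small Scala sketch (the object name is hypothetical):

```scala
object PrivateUseCheck {
  // Klingon Digit Zero in ConScript's assignment; Unicode itself only
  // marks U+F8F0 as a private-use code point.
  val KlingonZero: Int = 0xF8F0

  val isPrivateUse: Boolean = Character.getType(KlingonZero) == Character.PRIVATE_USE.toInt
  val isDigit: Boolean = Character.isDigit(KlingonZero)

  // The PROG filter therefore finds no digits in the BMP Private Use Area
  // (U+E000 through U+F8FF).
  val puaDigitCount: Int = (0xE000 to 0xF8FF).count(_.toChar.isDigit)

  def main(args: Array[String]): Unit =
    println(s"private use: $isPrivateUse, isDigit: $isDigit, PUA digits: $puaDigitCount")
}
```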
Unicode consistently assigns digit characters in the BMP so that the code point modulo 16 equals the numerical value of the digit, so that digit zero is U+xxx0 and digit nine is U+xxx9.
The exceptions are the digits of Limbu and of the Indic scripts listed in the example below (Devanagari through Malayalam), which are assigned so that the code point modulo 16 equals the numerical value of the digit plus 6 (thus digit zero is U+xxx6 and digit nine is U+xxxF).
ConScript mostly follows this principle as well, for example with Klingon and Ferengi. Perhaps the only exception is U+E033, Tengwar Letter Stemless Vilya, which also serves as Tengwar Digit One. Note, however, Tengwar Duodecimal Digit Ten (U+E06A) and Tengwar Duodecimal Digit Eleven (U+E06B), which extend the pattern to base 12: their code points modulo 16 are 10 and 11.
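The modulo-16 pattern can be checked mechanically. The Scala sketch below (object and value names are hypothetical) groups the digit-zero code points from the example section by residue modulo 16, confirms that digit d of each set sits at zero + d with numeric value d, and applies the same arithmetic to the ConScript code points named above. Java knows nothing about the ConScript assignments, so for those only the arithmetic is tested. It assumes a JVM whose Unicode tables include all 35 listed digit sets.

```scala
object Mod16Check {
  // Digit-zero code points from the EXAMPLE section of this entry.
  val zeros: List[Int] = List(0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966, 0x09E6,
    0x0A66, 0x0AE6, 0x0B66, 0x0BE6, 0x0C66, 0x0CE6, 0x0D66, 0x0E50, 0x0ED0,
    0x0F20, 0x1040, 0x1090, 0x17E0, 0x1810, 0x1946, 0x19D0, 0x1A80, 0x1A90,
    0x1B50, 0x1BB0, 0x1C40, 0x1C50, 0xA620, 0xA8D0, 0xA900, 0xA9D0, 0xAA50,
    0xABF0, 0xFF10)

  // Group the zeros by code point modulo 16; 0 and 6 are the expected residues.
  val byResidue: Map[Int, List[Int]] = zeros.groupBy(_ % 16)

  // Digit d of each set should sit at zero + d and have numeric value d.
  val offsetsMatchValues: Boolean =
    zeros.forall(z => (0 to 9).forall(d => Character.getNumericValue(z + d) == d))

  // ConScript (Private Use Area) code points named above, paired with the
  // digit value each one stands for.
  val conScript: List[(String, Int, Int)] = List(
    ("Klingon Digit Zero", 0xF8F0, 0),
    ("Tengwar Digit One (Stemless Vilya)", 0xE033, 1),
    ("Tengwar Duodecimal Digit Ten", 0xE06A, 10),
    ("Tengwar Duodecimal Digit Eleven", 0xE06B, 11))

  // Does the code point modulo 16 equal the value the character stands for?
  val conScriptFollowsRule: List[Boolean] =
    conScript.map { case (_, cp, value) => cp % 16 == value }

  def main(args: Array[String]): Unit = {
    for ((r, cps) <- byResidue.toList.sortBy(_._1))
      println(s"residue $r: " + cps.map(cp => f"U+$cp%04X").mkString(", "))
    println(s"offsets match numeric values: $offsetsMatchValues")
    conScript.zip(conScriptFollowsRule).foreach { case ((name, cp, _), ok) =>
      println(f"$name (U+$cp%04X): $ok")
    }
  }
}
```

Only Stemless Vilya (U+E033, residue 3 for value 1) fails the rule.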
Since the BMP ends at U+FFFF, the largest number that could theoretically be in this sequence is 65535. There are also digits in the supplementary planes, e.g., Osmanya Digit Zero (U+104A0); a version of this sequence covering all of Unicode would have a theoretical maximum of 1114111 (U+10FFFF).
However, since characters in the supplementary planes will continue to be assigned for many years to come, including them in this entry might require occasional insertions. Computing that larger sequence would pose only one minor, easily solved problem with Java's isDigit() function: the char version cannot handle supplementary characters, so the isDigit(int) version must be used instead.
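A sketch of that larger computation in Scala, assuming the isDigit(int) overload of java.lang.Character (the object name is hypothetical). It also shows why the char version is unsuitable: converting a supplementary code point to a 16-bit Char silently drops its high bits. The exact counts depend on the Unicode version of the JVM, so none are given here.

```scala
object SupplementaryDigits {
  // isDigit(Int) accepts any code point up to U+10FFFF (1114111).
  val allDigits: IndexedSeq[Int] =
    (0 to Character.MAX_CODE_POINT).filter(cp => Character.isDigit(cp))

  // Split at the end of the BMP, U+FFFF (65535).
  val (bmpDigits, supplementaryDigits) = allDigits.partition(_ <= 0xFFFF)

  // The pitfall of the Char version: 0x104A0 (Osmanya Digit Zero) wraps
  // to U+04A0, a Cyrillic letter, when converted to a 16-bit Char.
  val osmanyaZero: Int = 0x104A0
  val truncated: Char = osmanyaZero.toChar

  def main(args: Array[String]): Unit = {
    println(s"BMP digits: ${bmpDigits.size}, supplementary digits: ${supplementaryDigits.size}")
    println(s"isDigit(0x104A0) = ${Character.isDigit(osmanyaZero)}")
    println(f"0x104A0.toChar = U+${truncated.toInt}%04X, isDigit = ${truncated.isDigit}")
  }
}
```

Restricted to the BMP, this yields the same terms as the PROG line, since every code point up to 65535 fits in a Char.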
On the other hand, within the BMP, the assignment of Arabic Extended-B (U+0870 through U+089F) had not been finalized as of this writing (2019), though that proposal contains neither digits nor numerals.
Also, no proposal or suggestion has been made for U+2FE0 through U+2FEF, so there is a small chance that digits could be assigned there.
LINKS
Oracle Corporation, Javadoc for Character.isDigit(). isDigit() "determines if the specified character is a digit. ... Note: This method cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the isDigit(int) method" instead.
EXAMPLE
The following digit zeros are encoded in the BMP (excluding the Private Use Area).
U+0030 (48) [ISO-LATIN-1] Digit Zero
U+0660 (1632) Arabic-Indic Digit Zero
U+06F0 (1776) Extended Arabic-Indic Digit Zero
U+07C0 (1984) Nko Digit Zero
U+0966 (2406) Devanagari Digit Zero
U+09E6 (2534) Bengali Digit Zero
U+0A66 (2662) Gurmukhi Digit Zero
U+0AE6 (2790) Gujarati Digit Zero
U+0B66 (2918) Oriya Digit Zero
U+0BE6 (3046) Tamil Digit Zero
U+0C66 (3174) Telugu Digit Zero
U+0CE6 (3302) Kannada Digit Zero
U+0D66 (3430) Malayalam Digit Zero
U+0E50 (3664) Thai Digit Zero
U+0ED0 (3792) Lao Digit Zero
U+0F20 (3872) Tibetan Digit Zero
U+1040 (4160) Myanmar Digit Zero
U+1090 (4240) Myanmar Shan Digit Zero
U+17E0 (6112) Khmer Digit Zero
U+1810 (6160) Mongolian Digit Zero
U+1946 (6470) Limbu Digit Zero
U+19D0 (6608) New Tai Lue Digit Zero
U+1A80 (6784) Tai Tham Hora Digit Zero
U+1A90 (6800) Tai Tham Tham Digit Zero
U+1B50 (6992) Balinese Digit Zero
U+1BB0 (7088) Sundanese Digit Zero
U+1C40 (7232) Lepcha Digit Zero
U+1C50 (7248) Ol Chiki Digit Zero
U+A620 (42528) Vai Digit Zero
U+A8D0 (43216) Saurashtra Digit Zero
U+A900 (43264) Kayah Li Digit Zero
U+A9D0 (43472) Javanese Digit Zero
U+AA50 (43600) Cham Digit Zero
U+ABF0 (44016) Meetei Mayek Digit Zero
U+FF10 (65296) Fullwidth Digit Zero
Although U+3007, Ideographic Number Zero, from the CJK Symbols and Punctuation block, does have an associated numeric value of 0, isDigit() returns false for it, since its general category is Nl (letter number) rather than Nd (decimal digit). Thus 12295 is not included in this sequence.
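The list of zeros above can be regenerated from the PROG filter by keeping only the digits whose numeric value is 0. This Scala sketch (the object name is hypothetical) assumes java.lang.Character's getNumericValue and getName; a JVM with newer Unicode tables may list more zeros than the example shows.

```scala
object DigitZeros {
  // Same filter as the PROG line: BMP code points Java classifies as digits.
  val bmpDigits: IndexedSeq[Int] = (0 to 65535).filter(_.toChar.isDigit)

  // Keeping only those with numeric value 0 leaves the zero of each digit set.
  val zeros: IndexedSeq[Int] = bmpDigits.filter(cp => Character.getNumericValue(cp) == 0)

  // U+3007 has general category Nl, so isDigit rejects it.
  val ideographicZeroIsDigit: Boolean = Character.isDigit(0x3007)

  def main(args: Array[String]): Unit = {
    zeros.foreach(cp => println(f"U+$cp%04X ($cp) ${Character.getName(cp)}"))
    println(s"U+3007 isDigit: $ideographicZeroIsDigit")
  }
}
```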
PROG
(Scala) (0 to 65535).filter(_.toChar.isDigit)
CROSSREFS
KEYWORD
nonn,fini
AUTHOR
Alonso del Arte, Sep 29 2019
STATUS
approved