This post is the day-3 entry in the 2024 TSG Advent Calendar. Yesterday's entry was @__dAi00's "Building a Discord Bot with AivisSpeech (1): Deploying AivisSpeech to Google Cloud Run". I'm looking forward to the sequel scheduled for 12/5.
This post is a by-product of an article I published the other day.
    import unicodedata

    # Up to Python 3.12: 1000000000000.0
    # From Python 3.13:  1000000.0
    print(unicodedata.numeric("兆"))
Disaster: starting with Python 3.13, "5000兆円" (5 quadrillion yen) shrinks to 50億円 (5 billion yen)!!!
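The arithmetic behind that complaint can be checked by hand. The sketch below hard-codes both values of 兆 instead of calling unicodedata, so it behaves the same on any Python version; the constant names are made up for illustration.

```python
# Hypothetical constants: the two values reported for "兆" (U+5146).
CHO_UNTIL_PY312 = 10**12  # unicodedata.numeric("兆") up to Python 3.12
CHO_FROM_PY313 = 10**6    # unicodedata.numeric("兆") from Python 3.13
OKU = 10**8               # 億, one hundred million

before = 5000 * CHO_UNTIL_PY312
after = 5000 * CHO_FROM_PY313

print(f"{before:,}")  # 5,000,000,000,000,000
print(after // OKU)   # 50
```

In other words, the same string drops by a factor of one million, from 5000兆 to 50億.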
The unicodedata.numeric function and Unicode
As usual, Unicode is involved. As mentioned in the first part, Unicode records the properties of each character in the Unicode Character Database (UCD) [1]. The database stores attributes for every character, such as whether it is an emoji (Emoji) or whether it is uppercase (Uppercase).
The key piece of information in this article is the Numeric_Value property [2], which indicates what number a character represents. Python's unicodedata.numeric function returns that value when the character database has a Numeric_Value registered for the character [3].
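As a rough illustration of that behavior, the sketch below calls unicodedata.numeric on a few characters whose values do not differ between the Python versions discussed here. The choice of sample characters is mine, not the original article's.

```python
import unicodedata

# Characters with a Numeric_Value return it as a float.
print(unicodedata.numeric("5"))   # 5.0
print(unicodedata.numeric("½"))   # 0.5
print(unicodedata.numeric("五"))  # 5.0

# Characters without one raise ValueError unless a default is supplied.
try:
    unicodedata.numeric("a")
except ValueError as exc:
    print("no numeric value:", exc)

print(unicodedata.numeric("a", None))  # None
```

Passing a default as the second argument is handy when scanning arbitrary text, since most characters have no Numeric_Value at all.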
For Han characters, Numeric_Value is determined by the kPrimaryNumeric property of the Unihan database, which manages the properties of Unicode's CJK Unified Ideographs. When a single character has more than one kPrimaryNumeric value registered, the first value is used for Numeric_Value [4].
kPrimaryNumeric
... If an ideograph has more than one numeric value, the first one is to be considered the most common one, and that first value is used for the Numeric_Value property of the ideograph.
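The "first value wins" rule quoted above can be sketched in a few lines. This is only an illustration: it assumes the tab-separated layout of Unihan data lines (code point, field name, space-separated values), the two sample lines are typed in by hand using the values for 兆 discussed in the next section, and `numeric_value_from_line` is a hypothetical helper, not part of any library.

```python
# Sample lines modeled on the Unihan tab-separated layout
# (code point, field name, space-separated values). They are typed in
# by hand for illustration, not read from a real Unihan file.
SAMPLE_15_0 = "U+5146\tkPrimaryNumeric\t1000000000000"
SAMPLE_15_1 = "U+5146\tkPrimaryNumeric\t1000000 1000000000000"


def numeric_value_from_line(line):
    """Hypothetical helper: return (character, first kPrimaryNumeric value)."""
    code_point, field, values = line.split("\t")
    if field != "kPrimaryNumeric":
        raise ValueError(f"unexpected field: {field}")
    # Drop the leading "U+" and read the rest as a hexadecimal code point.
    char = chr(int(code_point[2:], 16))
    # Per the rule quoted above, the first listed value is the one used.
    first = int(values.split()[0])
    return char, first


print(numeric_value_from_line(SAMPLE_15_0))  # ('兆', 1000000000000)
print(numeric_value_from_line(SAMPLE_15_1))  # ('兆', 1000000)
```

Adding a new value in front of an existing one is therefore enough to change Numeric_Value, even though the old value is still listed.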
Why did it change?
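The result of unicodedata.numeric("兆") changed in Python 3.13 because Unicode 15.1 changed the kPrimaryNumeric of 兆 (U+5146). Python 3.12 bundles Unicode 15.0 data, while Python 3.13 bundles Unicode 15.1. For reference, here is how 兆 is treated in the Unihan database in Unicode 15.0 [5] and Unicode 15.1 [6].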
| Version | kPrimaryNumeric | Numeric_Value |
|---|---|---|
| 15.0 | 1000000000000 | 1000000000000 |
| 15.1 | 1000000 1000000000000 | 1000000 |
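To see which side of this change a given interpreter is on, one option is to look at unicodedata.unidata_version, which reports the version of the Unicode data bundled with Python. The sketch below assumes that Unicode versions after 15.1 keep the 15.1 value for 兆; that assumption comes from me, not from the sources cited here.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
print(unicodedata.unidata_version)

major, minor = (int(part) for part in unicodedata.unidata_version.split(".")[:2])

# Based on the table above: 10**6 from Unicode 15.1, 10**12 before it.
# Assumption: versions newer than 15.1 keep the 15.1 value.
expected = 1e6 if (major, minor) >= (15, 1) else 1e12

actual = unicodedata.numeric("兆")
print(actual, actual == expected)
```

Comparing against the Unicode data version rather than sys.version_info ties the check to the actual cause of the change.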
The new value 10⁶ was added based on the convention in China, Vietnam, and elsewhere of treating 兆 as 10⁶ [7]. In China, the character apparently is still used today for the SI prefix mega [8].
This conflicts with the Japanese usage of 兆 for 10¹², but in such cases Chinese usage in mainland China apparently tends to take priority [9]. Most likely, Chinese usage was prioritized, and as a result the Numeric_Value of 兆 was overwritten with 10⁶.
- [1] Unicode® Standard Annex #44 - Unicode Character Database. Accessed: 2024-11-17
- [2] Unicode Character Database - Numeric_Value. Accessed: 2024-12-02
- [3] unicodedata --- Unicode Database. Accessed: 2024-11-23
- [4] Unicode® Standard Annex #38 - Unicode Han Database (Unihan). Accessed: 2024-11-23
- [5] DerivedNumericValues-15.0.0.txt. Accessed: 2024-11-24
- [6] DerivedNumericValues-15.1.0.txt. Accessed: 2024-11-24
- [7] L2/22-223: Proposed Updates and Expansions of Unihan Numeric Fields. Accessed: 2024-11-17
- [8] 国务院关于在我国统一实行法定计量单位的命令 (State Council order on the unified implementation of legal units of measurement in China). Accessed: 2024-12-02
- [9] Twitter. Accessed: 2024-11-24