Character encodings are fun!
When you use the utf8mb4_unicode_ci collation in MySQL, you hit a known problem: "🍣" = "🍺" and "ハハ" = "パパ" compare as equal.
I wondered what utf8mb4_unicode_ci actually is, so I looked at the manual, which says:
"MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt."
https://dev.mysql.com/doc/refman/5.6/ja/charset-unicode-sets.html
Unicode defines a standard called the Unicode Collation Algorithm (UCA), and MySQL's utf8mb4_unicode_ci uses UCA version 4.0.0.
I have not read the UCA documentation closely, so treat the explanation below as a rough sketch.
The table that defines the comparison levels of each character is called the Default Unicode Collation Element Table (DUCET), and one is published for each UCA version.
The contents of the DUCET for UCA 4.0.0 look like this.
0061 ; [.0E33.0020.0002.0061] # LATIN SMALL LETTER A
FF41 ; [.0E33.0020.0003.FF41] # FULLWIDTH LATIN SMALL LETTER A; QQK
0363 ; [.0E33.0020.0004.0363] # COMBINING LATIN SMALL LETTER A; QQK
249C ; [*0288.0020.0004.249C][.0E33.0020.0004.249C][*0289.0020.001F.249C] # PARENTHESIZED LATIN SMALL LETTER A; QQKN
1D41A ; [.0E33.0020.0005.1D41A] # MATHEMATICAL BOLD SMALL A; QQK
The hexadecimal number at the far left is the Unicode code point, and the four hexadecimal numbers enclosed in the [ ] that follow are the character's comparison levels.
From left to right, the levels are:
L1 | Base characters |
L2 | Accents |
L3 | Case/Variants |
L4 | Punctuation (?) |
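To make the line format concrete, here is a minimal Python sketch that splits one DUCET line into its code points and collation elements. The function name `parse_ducet_line` and the regular expression are my own illustration, not part of any UCA tooling, and the sketch assumes each line follows the layout shown in the excerpt above (a leading `.` or `*` on each element, four hexadecimal weights, and a `#` comment).

```python
import re

# One collation element looks like [.0E33.0020.0002.0061] or [*0288.0020.0004.249C].
# The leading "." or "*" is captured as a marker ("*" flags a variable element).
ELEMENT_RE = re.compile(
    r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]"
)


def parse_ducet_line(line):
    """Split one DUCET line into (code points, elements, comment).

    Each element is (is_variable, (L1, L2, L3, L4)) with integer weights.
    """
    body, _, comment = line.partition("#")
    codepoints_part, _, elements_part = body.partition(";")
    codepoints = [int(cp, 16) for cp in codepoints_part.split()]
    elements = [
        (marker == "*", tuple(int(w, 16) for w in weights))
        for marker, *weights in ELEMENT_RE.findall(elements_part)
    ]
    return codepoints, elements, comment.strip()


if __name__ == "__main__":
    sample = "0061 ; [.0E33.0020.0002.0061] # LATIN SMALL LETTER A"
    cps, elems, name = parse_ducet_line(sample)
    for is_variable, (l1, l2, l3, l4) in elems:
        print(f"U+{cps[0]:04X} L1={l1:04X} L2={l2:04X} L3={l3:04X} L4={l4:04X} {name}")
```

Keeping the four weights as a tuple makes it easy to pick out a single level later, which is all the comparison discussion below needs.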
Here are a few excerpts. I have added the character itself on the left.
a | 0061 ; [.0E33.0020.0002.0061] # LATIN SMALL LETTER A |
ａ | FF41 ; [.0E33.0020.0003.FF41] # FULLWIDTH LATIN SMALL LETTER A; QQK |
ⓐ | 24D0 ; [.0E33.0020.0006.24D0] # CIRCLED LATIN SMALL LETTER A; QQK |
A | 0041 ; [.0E33.0020.0008.0041] # LATIN CAPITAL LETTER A |
Ａ | FF21 ; [.0E33.0020.0009.FF21] # FULLWIDTH LATIN CAPITAL LETTER A; QQK |
å | 00E5 ; [.0E33.0020.0002.0061][.0000.0043.0002.030A] # LATIN SMALL LETTER A WITH RING ABOVE; QQCM |
Å | 00C5 ; [.0E33.0020.0008.0041][.0000.0043.0002.030A] # LATIN CAPITAL LETTER A WITH RING ABOVE; QQCM |
b | 0062 ; [.0E4A.0020.0002.0062] # LATIN SMALL LETTER B |
ｂ | FF42 ; [.0E4A.0020.0003.FF42] # FULLWIDTH LATIN SMALL LETTER B; QQK |
B | 0042 ; [.0E4A.0020.0008.0042] # LATIN CAPITAL LETTER B |
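"a"-like characters have L1=0E33, and "b"-like characters have L1=0E4A.
"Å" has two sets of levels: the first is exactly the same as that of "A", and the second belongs to the combining character "˚" (the ring above, 030A).
The levels appear to be given for the NFD-normalized form (?).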
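When comparing on L1 alone or on L1+L2, "a", "ａ", "A", and "Ａ" are treated as the same character.
When comparing on L1+L2+L3, they are treated as different characters.
How many levels to use is up to the application, and MySQL's utf8mb4_unicode_ci uses only L1. As a result, Latin letters are compared without regard to upper/lower case or fullwidth/halfwidth form.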
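The effect of choosing a comparison depth can be sketched in a few lines of Python. This is a simplified illustration, not MySQL's implementation: the `ELEMENTS` table is hand-copied from the DUCET excerpt above, `sort_key` is a hypothetical helper, and the key is built the way I understand the UCA to do it (all non-zero weights of one level, then a zero separator, then the next level), ignoring variable weighting and other refinements.

```python
# (L1, L2, L3) weights per collation element, copied from the UCA 4.0.0 excerpt.
ELEMENTS = {
    "a": [(0x0E33, 0x0020, 0x0002)],
    "\uFF41": [(0x0E33, 0x0020, 0x0003)],  # FULLWIDTH LATIN SMALL LETTER A
    "A": [(0x0E33, 0x0020, 0x0008)],
    "\uFF21": [(0x0E33, 0x0020, 0x0009)],  # FULLWIDTH LATIN CAPITAL LETTER A
    "\u00E5": [(0x0E33, 0x0020, 0x0002), (0x0000, 0x0043, 0x0002)],  # a with ring
    "b": [(0x0E4A, 0x0020, 0x0002)],
}


def sort_key(text, max_level):
    """Build a simplified sort key using levels 1..max_level (1 to 3)."""
    elements = [ce for ch in text for ce in ELEMENTS[ch]]
    key = []
    for level in range(max_level):
        if level:
            key.append(0)  # separator between levels
        # Zero weights mean "ignorable at this level" and are skipped.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


if __name__ == "__main__":
    for depth in (1, 2, 3):
        same = sort_key("a", depth) == sort_key("A", depth)
        print(f"L1..L{depth}: 'a' == 'A' is {same}")
```

With this table, "a" and "A" produce the same key at depth 1 and 2 and different keys at depth 3, while "a" and "å" already differ at depth 2 because of the accent weight 0043.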
は=ぱ=ば=ハ=パ=バ
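Now for the characters in question: "は", "ぱ", "ば", "ハ", "パ", and "バ". Their entries are shown below. Characters with a dakuten or handakuten (the voiced and semi-voiced sound marks) are normalized and represented as a combination of two sets of levels: the unvoiced kana plus the mark.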
㯠| 306F ; [.1E6B.0020.000E.306F] # HIRAGANA LETTER HA |
ã± | 3071 ; [.1E6B.0020.000E.306F][.0000.0141.0002.309A] # HIRAGANA LETTER PA; QQCM |
ã° | 3070 ; [.1E6B.0020.000E.306F][.0000.0140.0002.3099] # HIRAGANA LETTER BA; QQCM |
ã | 30CF ; [.1E6B.0020.0011.30CF] # KATAKANA LETTER HA |
ã | 30D1 ; [.1E6B.0020.0011.30CF][.0000.0141.0002.309A] # KATAKANA LETTER PA; QQCM |
ã | 30D0 ; [.1E6B.0020.0011.30CF][.0000.0140.0002.3099] # KATAKANA LETTER BA; QQCM |
These characters all share the same L1 value, so MySQL's utf8mb4_unicode_ci, which uses only L1, does not distinguish them.
The same happens beyond "は", "ぱ", and "ば": other kana that share an L1 value are not distinguished either.
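The split of a voiced or semi-voiced kana into "unvoiced kana plus mark" matches Unicode's canonical decomposition, which can be checked with Python's standard `unicodedata` module. This is only a cross-check of the decomposition itself, under the assumption that the final field of each DUCET element in the table above (306F, 309A, 3099, and so on) names the decomposed code point; it says nothing about how MySQL computes weights internally. The helper name `decompose` is my own.

```python
import unicodedata


def decompose(ch):
    """Return the NFD form of ch as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    # ha, pa, ba in hiragana, then ha, pa, ba in katakana.
    for ch in "\u306F\u3071\u3070\u30CF\u30D1\u30D0":
        print(ch, decompose(ch))
```

For example, ぱ (U+3071) decomposes into U+306F followed by U+309A, the same pair of code points that appears in its two collation elements.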
In Japanese it is natural to distinguish unvoiced, voiced, and semi-voiced sounds, but a case-insensitive comparison that follows the standard Unicode rules cannot tell them apart.
I am hoping a utf8mb4_japanese_ci collation will appear.
🍣=🍺
Comparing emoji is a different story again. Emoji are not defined in the DUCET. In fact, kanji are not defined there either.
The UCA also specifies how to handle characters that are not defined in the DUCET (section 7.1.3).
AAAA = BASE + (CP >> 15);
BBBB = (CP & 0x7FFF) | 0x8000;
CP => [.AAAA.0020.0002.][.BBBB.0000.0000.]
BASE:
FB40 CJK Ideograph
FB80 CJK Ideograph Extension A/B
FBC0 Any other code point
The code point (CP) of "漢" is U+6F22, so it becomes [.FB40.0020.0002.][.EF22.0000.0000.].
These two sets of levels are used in combination.
mysql> SELECT HEX(WEIGHT_STRING('漢'));
+---------------------------+
| HEX(WEIGHT_STRING('漢')) |
+---------------------------+
| FB40EF22                  |
+---------------------------+
Computing the values for "🍣" and "🍺" the same way, "🍣" (U+1F363) gives FBC3F363 and "🍺" (U+1F37A) gives FBC3F37A, so the two should be distinguishable.
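The arithmetic above is easy to reproduce. The following Python sketch applies the AAAA/BBBB formula to the three code points discussed here. The function names are hypothetical, and the range checks in `implicit_base` are a simplification that I am assuming for illustration (plain block boundaries for CJK Unified Ideographs and Extensions A/B), not the exact property tests the UCA specifies.

```python
def implicit_base(cp):
    """Pick BASE for a code point that has no DUCET entry (simplified ranges)."""
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs block (assumed range)
        return 0xFB40
    if 0x3400 <= cp <= 0x4DBF or 0x20000 <= cp <= 0x2A6DF:  # Extension A / B
        return 0xFB80
    return 0xFBC0  # any other code point


def implicit_weights(cp):
    """Return (AAAA, BBBB) as defined by the formula quoted above."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


def primary_hex(cp):
    """Format the two primary weights as one hex string."""
    return "%04X%04X" % implicit_weights(cp)


if __name__ == "__main__":
    for cp in (0x6F22, 0x1F363, 0x1F37A):
        print(f"U+{cp:04X} -> {primary_hex(cp)}")
```

The hex strings this produces for U+6F22, U+1F363, and U+1F37A are FB40EF22, FBC3F363, and FBC3F37A, matching the values worked out by hand in the text.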
However, MySQL's utf8mb4_unicode_ci does not follow this rule for emoji and uses FFFD instead.
"For supplementary characters in general collations, the weight is the weight for 0xfffd REPLACEMENT CHARACTER. For supplementary characters in UCA 4.0.0 collations, their collating weight is 0xfffd."
https://dev.mysql.com/doc/refman/5.6/ja/charset-unicode-sets.html
mysql> SELECT HEX(WEIGHT_STRING('🍣'));
+-------------------------+
| HEX(WEIGHT_STRING('?')) |
+-------------------------+
| FFFD                    |
+-------------------------+
In other words, 🍣=🍺 under utf8mb4_unicode_ci is not Unicode's fault; it is a MySQL issue.
Note that utf8mb4_unicode_520_ci does use the properly computed values.
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_520_ci;
mysql> SELECT HEX(WEIGHT_STRING('🍣'));
+-------------------------+
| HEX(WEIGHT_STRING('?')) |
+-------------------------+
| FBC3F363                |
+-------------------------+
mysql> SELECT HEX(WEIGHT_STRING('🍺'));
+-------------------------+
| HEX(WEIGHT_STRING('?')) |
+-------------------------+
| FBC3F37A                |
+-------------------------+