Sorting out look-alike Unicode characters
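Whenever I process XML, CSV, or similar data in Java and write it back out, I inevitably get stuck on mojibake with characters such as the wave dash.
I got tired of googling and applying an ad hoc fix every time garbled text turned up, so I listed the problem characters together with the characters that look like them, and then went through which character each one becomes when Java converts it to various encodings.
While I was at it, I also wrote a function that replaces them with characters that can actually be output, so that they do not get garbled.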
Java conversion table
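- In the table, U, S, W, E, and J are the characters produced when outputting in UTF-8, Shift_JIS, Windows-31J, EUC-JP, and ISO-2022-JP, respectively.
- The characters look so alike that they cannot be told apart by eye, so each cell carries its code point in the title attribute; hover the mouse cursor over a cell to check it in a tooltip.
- For readability the cells are color-coded: blue means no garbling, black means a similar-looking character at a different code point, and red means the character cannot be output*1.
- The comments describe each character's usage as far as I could understand it from googling.
- Characters may not display as expected in some environments, so I also took a screen capture of the table as seen in Chrome. Please use it for reference.
- Verified with jre1.6.0_22.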
| Category | Code | Width | U | S | W | E | J | Name | Comment |
|---|---|---|---|---|---|---|---|---|---|
| Horizontal line: hyphen | U+002D | 1 | - | - | - | - | - | HYPHEN-MINUS | Hyphen or minus sign |
| | U+00AD | 0 or 1 | | ? | ? | ? | ? | SOFT HYPHEN | Hyphen for a permitted break point within a word. Displayed only when the line breaks at that position, although displaying it is also allowed. |
| | U+2011 | 1 | ‑ | ? | ? | ? | ? | NON-BREAKING HYPHEN | Hyphen that is not wrapped even at the right edge |
| | U+2012 | 1 | ‒ | ? | ? | ? | ? | FIGURE DASH | Dash with the same width as a digit |
| | U+2013 | 1 | – | ? | ? | ? | ? | EN DASH | Numeric ranges, e.g. 1973–1984 |
| | U+2043 | 1 | ⁃ | ? | ? | ? | ? | HYPHEN BULLET | |
| | U+FE63 | 0.5 | ﹣ | ? | ? | ? | ? | SMALL HYPHEN-MINUS | Quarter-width hyphen? In Chrome it looks half as wide as a half-width character; in IE and Firefox it looks full-width. |
| Minus | U+2212 | 1 | − | - | ? | - | - | MINUS SIGN | Negative sign, minus |
| | U+207B | 1 | ⁻ | ? | ? | ? | ? | SUPERSCRIPT MINUS | Superscript minus |
| | U+208B | 1 | ₋ | ? | ? | ? | ? | SUBSCRIPT MINUS | Subscript minus |
| | U+FF0D | 2 | - | ? | - | ? | ? | FULLWIDTH HYPHEN-MINUS | Full-width minus |
| Box drawing | U+2500 | 2 | ─ | ─ | ─ | ─ | ─ | BOX DRAWINGS LIGHT HORIZONTAL | Light horizontal rule |
| | U+2501 | 2 | ━ | ━ | ━ | ━ | ━ | BOX DRAWINGS HEAVY HORIZONTAL | Heavy horizontal rule |
| Underline | U+005F | 1 | _ | _ | _ | _ | _ | LOW LINE | Half-width underscore |
| | U+FF3F | 2 | _ | _ | _ | _ | _ | FULLWIDTH LOW LINE | Full-width underscore |
| Overline | U+00AF | 1 | ¯ | ? |  ̄ | ¯ | ? | MACRON | Long-vowel mark (macron) |
| | U+203E | 1 | ‾ | ~ | ~ | ~ | ‾ | OVERLINE | Overline |
| | U+FFE3 | 2 |  ̄ |  ̄ |  ̄ |  ̄ |  ̄ | FULLWIDTH MACRON | Long-vowel mark (macron) |
| Emphasis / quotation | U+2014 | 1 | — | ― | ? | ― | ― | EM DASH | Dash the width of the letter M. Used for quotations, subtitles, and explanations. ―usage example―*2 |
| | U+2015 | 2 | ― | ? | ― | ? | ? | HORIZONTAL BAR | ―used for quotations and the like (quotation dash)― ← something like this? |
| Prolonged sound | U+30FC | 2 | ー | ー | ー | ー | ー | KATAKANA-HIRAGANA PROLONGED SOUND MARK | Long-vowel mark (onbiki) |
| Wavy line | U+007E | 1 | ~ | ~ | ~ | ~ | ~ | TILDE | Half-width tilde |
| | U+223C | 1 | ∼ | ? | ? | ? | ? | TILDE OPERATOR | |
| | U+223E | 1 | ∾ | ? | ? | ? | ? | INVERTED LAZY S | |
| | U+301C | 2 | 〜 | ~ | ? | ~ | ~ | WAVE DASH | Wave dash |
| | U+3030 | 2 | 〰 | ? | ? | ? | ? | WAVY DASH | |
| | U+FF5E | 2 | ~ | ? | ~ | ~ | ? | FULLWIDTH TILDE | Full-width tilde |
| Three dots | U+2026 | 2 | … | … | … | … | … | HORIZONTAL ELLIPSIS | Three-dot leader |
| | U+22EF | 1 | ⋯ | ? | ? | ? | ? | MIDLINE HORIZONTAL ELLIPSIS | |
| Middle dot | U+00B7 | 1 | · | ? | ・ | ? | ? | MIDDLE DOT | |
| | U+2022 | 1 | • | ? | ? | ? | ? | BULLET | |
| | U+2219 | 1 | ∙ | ? | ? | ? | ? | BULLET OPERATOR | |
| | U+22C5 | 1 | ⋅ | ? | ? | ? | ? | DOT OPERATOR | |
| | U+30FB | 2 | ・ | ・ | ・ | ・ | ・ | KATAKANA MIDDLE DOT | Middle dot (nakaguro) |
| | U+FF65 | 1 | ・ | ・ | ・ | ・ | ・ | HALFWIDTH KATAKANA MIDDLE DOT | Middle dot (half-width katakana) |
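
The test code used to build the table above is not shown in the post. As a rough sketch of how such a check could be done, the following encodes a character with each charset and decodes it back, so that anything unmappable comes back as the encoder's replacement character. The class and method names (`EncodingRoundTrip`, `roundTrip`) and the sample list are assumptions made for illustration, and results may differ between Java versions.

```java
import java.nio.charset.Charset;

class EncodingRoundTrip {
    // The five encodings that correspond to the U, S, W, E, J columns.
    static final String[] ENCODINGS = {
        "UTF-8", "Shift_JIS", "windows-31j", "EUC-JP", "ISO-2022-JP"
    };

    /**
     * Encodes text with the named charset and decodes the bytes again.
     * String.getBytes(Charset) substitutes the charset's replacement
     * bytes for anything it cannot map.
     */
    static String roundTrip(String text, String encoding) {
        Charset cs = Charset.forName(encoding);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // A few code points from the table; extend the list as needed.
        String[] samples = {"\u002D", "\u2011", "\u2212", "\u301C", "\uFF5E"};
        for (String s : samples) {
            StringBuilder row = new StringBuilder(String.format("U+%04X", (int) s.charAt(0)));
            for (String enc : ENCODINGS) {
                row.append(" | ").append(roundTrip(s, enc));
            }
            System.out.println(row);
        }
    }
}
```

Each printed row has the same column order as the table, so the output can be compared against it line by line.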
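The replacement table I would like to have
Having characters turned into ? is a problem for display, so the following is the replacement table I would like to have in order to avoid that.
- Basically, each character that would become ? is mapped based on how it is actually used and on its character width.
- The superscript and subscript minus signs (U+207B, U+208B) are deliberately left unreplaced and allowed to garble, because carelessly replacing them with a half-width hyphen or the like is likely to break the meaning they carry in context.
- Where the original mapping already displays something, as with U+00AF in EUC-JP or U+203E in ISO-2022-JP among the long-vowel marks, I respected that mapping and left it as is.
- U+00B7 (middle dot) in Windows-31J is originally mapped to the full-width middle dot, but I changed it to the half-width middle dot, giving priority to character width.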
| Category | Code | Width | U | S | W | E | J | Name | Comment |
|---|---|---|---|---|---|---|---|---|---|
| Horizontal line: hyphen | U+002D | 1 | - | - | - | - | - | HYPHEN-MINUS | Hyphen or minus sign |
| | U+00AD | 0 or 1 | | | | | | SOFT HYPHEN | Hyphen for a permitted break point within a word. Displayed only when the line breaks at that position, although displaying it is also allowed. |
| | U+2011 | 1 | ‑ | - | - | - | - | NON-BREAKING HYPHEN | Hyphen that is not wrapped even at the right edge |
| | U+2012 | 1 | ‒ | - | - | - | - | FIGURE DASH | Dash with the same width as a digit |
| | U+2013 | 1 | – | - | - | - | - | EN DASH | Numeric ranges, e.g. 1973-1984 |
| | U+2043 | 1 | ⁃ | - | - | - | - | HYPHEN BULLET | |
| | U+FE63 | 0.5 | ﹣ | - | - | - | - | SMALL HYPHEN-MINUS | Quarter-width hyphen? In Chrome it looks half as wide as a half-width character; in IE and Firefox it looks full-width. |
| Minus | U+2212 | 1 | − | - | - | - | - | MINUS SIGN | Negative sign, minus |
| | U+207B | 1 | ⁻ | ? | ? | ? | ? | SUPERSCRIPT MINUS | Superscript minus |
| | U+208B | 1 | ₋ | ? | ? | ? | ? | SUBSCRIPT MINUS | Subscript minus |
| | U+FF0D | 2 | - | - | - | - | - | FULLWIDTH HYPHEN-MINUS | Full-width minus |
| Box drawing | U+2500 | 2 | ─ | ─ | ─ | ─ | ─ | BOX DRAWINGS LIGHT HORIZONTAL | Light horizontal rule |
| | U+2501 | 2 | ━ | ━ | ━ | ━ | ━ | BOX DRAWINGS HEAVY HORIZONTAL | Heavy horizontal rule |
| Underline | U+005F | 1 | _ | _ | _ | _ | _ | LOW LINE | Half-width underscore |
| | U+FF3F | 2 | _ | _ | _ | _ | _ | FULLWIDTH LOW LINE | Full-width underscore |
| Overline | U+00AF | 1 | ¯ |  ̄ |  ̄ | ¯ |  ̄ | MACRON | Long-vowel mark (macron) |
| | U+203E | 1 | ‾ | ~ | ~ | ~ | ‾ | OVERLINE | Overline |
| | U+FFE3 | 2 |  ̄ |  ̄ |  ̄ |  ̄ |  ̄ | FULLWIDTH MACRON | Long-vowel mark (macron) |
| Emphasis / quotation | U+2014 | 1 | — | ― | ― | ― | ― | EM DASH | Dash the width of the letter M. Used for quotations, subtitles, and explanations. ―usage example―*3 |
| | U+2015 | 2 | ― | ― | ― | ― | ― | HORIZONTAL BAR | ―used for quotations and the like (quotation dash)― ← something like this? |
| Prolonged sound | U+30FC | 2 | ー | ー | ー | ー | ー | KATAKANA-HIRAGANA PROLONGED SOUND MARK | Long-vowel mark (onbiki) |
| Wavy line | U+007E | 1 | ~ | ~ | ~ | ~ | ~ | TILDE | Half-width tilde |
| | U+223C | 1 | ∼ | ~ | ~ | ~ | ~ | TILDE OPERATOR | |
| | U+223E | 1 | ∾ | ~ | ~ | ~ | ~ | INVERTED LAZY S | |
| | U+301C | 2 | 〜 | ~ | ~ | ~ | ~ | WAVE DASH | Wave dash |
| | U+3030 | 2 | 〰 | ~ | ~ | ~ | ~ | WAVY DASH | |
| | U+FF5E | 2 | ~ | ~ | ~ | ~ | ~ | FULLWIDTH TILDE | Full-width tilde |
| Three dots | U+2026 | 2 | … | … | … | … | … | HORIZONTAL ELLIPSIS | Three-dot leader |
| | U+22EF | 1 | ⋯ | … | … | … | … | MIDLINE HORIZONTAL ELLIPSIS | |
| Middle dot | U+00B7 | 1 | · | ・ | ・ | ・ | ・ | MIDDLE DOT | |
| | U+2022 | 1 | • | ・ | ・ | ・ | ・ | BULLET | |
| | U+2219 | 1 | ∙ | ・ | ・ | ・ | ・ | BULLET OPERATOR | |
| | U+22C5 | 1 | ⋅ | ・ | ・ | ・ | ・ | DOT OPERATOR | |
| | U+30FB | 2 | ・ | ・ | ・ | ・ | ・ | KATAKANA MIDDLE DOT | Middle dot (nakaguro) |
| | U+FF65 | 1 | ・ | ・ | ・ | ・ | ・ | HALFWIDTH KATAKANA MIDDLE DOT | Middle dot (half-width katakana) |
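
Because the superscript and subscript minus signs are deliberately left unreplaced, some characters will still come out as ? after replacement. One way to spot them before output might be to ask the charset's encoder directly, as in the sketch below. `UnmappableFinder` and `findUnmappable` are hypothetical names that do not appear in the post, and the check only looks at individual `char` values, which is enough for the BMP characters discussed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class UnmappableFinder {
    /**
     * Returns the code points, formatted as "U+XXXX", of every char in
     * text that the named charset cannot encode.
     */
    static List<String> findUnmappable(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // canEncode(char) reports false for anything the charset lacks.
            if (!encoder.canEncode(c)) {
                result.add(String.format("U+%04X", (int) c));
            }
        }
        return result;
    }
}
```

For example, `findUnmappable("a\u207Bb", "Shift_JIS")` reports only U+207B, so the caller can log it or handle it case by case instead of replacing it blindly.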
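Mojibake-avoidance function
The function I wrote to produce output like the table above is shown below. In Java, if you pass a string through this function with the same encoding specified before writing it to a Writer or similar, you can keep characters from being mapped to "?".
Note that this conversion table is merely what I judged would be satisfactory in most cases if each character were displayed this way. It prioritizes avoiding display problems, so please keep that firmly in mind when using it as a reference.
Note: ArrayUtils and StringUtils in the code are classes from commons-lang.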
```java
import java.util.Locale;

// commons-lang 2.x packages; with commons-lang3 use org.apache.commons.lang3 instead.
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.lang.StringUtils;

// The class name is illustrative; the original post shows only the members.
public final class SimilarCharacterNormalizer {

    /**
     * Replaces characters that cause mojibake with characters that do not.
     *
     * @param str      the string to normalize
     * @param encoding the character encoding planned for external output
     *                 (the replacement table is chosen by this value)
     * @return the normalized string; str is returned unchanged when either
     *         argument is null or the encoding has no table
     */
    public static String normalizeSimilarCharacter(String str, String encoding) {
        if (str == null || encoding == null) {
            return str;
        }
        // Locale.ROOT keeps the comparison independent of the default locale.
        encoding = encoding.toLowerCase(Locale.ROOT);
        if ("windows-31j".equals(encoding)) {
            return StringUtils.replaceEach(str, SIMILAR_CHARS_W31J_FROM, SIMILAR_CHARS_W31J_TO);
        } else if ("shift_jis".equals(encoding)) {
            return StringUtils.replaceEach(str, SIMILAR_CHARS_SJIS_FROM, SIMILAR_CHARS_SJIS_TO);
        } else if ("euc-jp".equals(encoding)) {
            return StringUtils.replaceEach(str, SIMILAR_CHARS_EUCJP_FROM, SIMILAR_CHARS_EUCJP_TO);
        } else if ("iso-2022-jp".equals(encoding)) {
            return StringUtils.replaceEach(str, SIMILAR_CHARS_ISO2022JP_FROM, SIMILAR_CHARS_ISO2022JP_TO);
        }
        return str;
    }

    // Common replacement table
    private static final String[] SIMILAR_CHARS_COMMON_FROM = new String[]{
        "\u00AD", "\u2011", "\u2012", "\u2013", "\u2043", "\uFE63", // half-width hyphen
        "\u223C", "\u223E",                                         // half-width wavy line -> half-width tilde
        "\u22EF",                                                   // three dots
        "\u00B7", "\u2022", "\u2219", "\u22C5"                      // half-width middle dot
    };
    private static final String[] SIMILAR_CHARS_COMMON_TO = new String[]{
        "\u002D", "\u002D", "\u002D", "\u002D", "\u002D", "\u002D", // half-width hyphen
        "\u007E", "\u007E",                                         // half-width wavy line -> half-width tilde
        "\u2026",                                                   // three dots
        "\uFF65", "\uFF65", "\uFF65", "\uFF65"                      // half-width middle dot
    };

    // Per-encoding replacement tables
    private static final String[] SIMILAR_CHARS_SJIS_FROM;
    private static final String[] SIMILAR_CHARS_SJIS_TO;
    private static final String[] SIMILAR_CHARS_W31J_FROM;
    private static final String[] SIMILAR_CHARS_W31J_TO;
    private static final String[] SIMILAR_CHARS_EUCJP_FROM;
    private static final String[] SIMILAR_CHARS_EUCJP_TO;
    private static final String[] SIMILAR_CHARS_ISO2022JP_FROM;
    private static final String[] SIMILAR_CHARS_ISO2022JP_TO;

    static {
        SIMILAR_CHARS_SJIS_FROM = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_FROM, new String[]{
            "\uFF0D" /* full-width minus */, "\u00AF" /* long-vowel mark */, "\u2015" /* emphasis, quotation */,
            "\u3030", "\uFF5E" /* wavy line */
        });
        SIMILAR_CHARS_SJIS_TO = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_TO, new String[]{
            "\u2212" /* full-width minus */, "\uFFE3" /* long-vowel mark */, "\u2014" /* emphasis, quotation */,
            "\u301C", "\u301C" /* wavy line */
        });
        SIMILAR_CHARS_W31J_FROM = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_FROM, new String[]{
            "\u2212" /* full-width minus */, "\u2014" /* emphasis, quotation */,
            "\u3030", "\u301C" /* wavy line */
        });
        SIMILAR_CHARS_W31J_TO = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_TO, new String[]{
            "\uFF0D" /* full-width minus */, "\u2015" /* emphasis, quotation */,
            "\uFF5E", "\uFF5E" /* wavy line */
        });
        SIMILAR_CHARS_EUCJP_FROM = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_FROM, new String[]{
            "\uFF0D" /* full-width minus */, "\u2015" /* emphasis, quotation */,
            "\u3030" /* wavy line */
        });
        SIMILAR_CHARS_EUCJP_TO = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_TO, new String[]{
            "\u2212" /* full-width minus */, "\u2014" /* emphasis, quotation */,
            "\uFF5E" /* wavy line */
        });
        SIMILAR_CHARS_ISO2022JP_FROM = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_FROM, new String[]{
            "\uFF0D" /* full-width minus */, "\u00AF" /* long-vowel mark */, "\u2015" /* emphasis, quotation */,
            "\u3030", "\uFF5E" /* wavy line */
        });
        SIMILAR_CHARS_ISO2022JP_TO = (String[]) ArrayUtils.addAll(SIMILAR_CHARS_COMMON_TO, new String[]{
            "\u2212" /* full-width minus */, "\uFFE3" /* long-vowel mark */, "\u2014" /* emphasis, quotation */,
            "\u301C", "\u301C" /* wavy line */
        });
    }
}
```