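About "Strings"
Introduction
There has been some discussion about "the simplification of treating a string as a sequence of characters". I think the underlying premises have been lost along the way, so I decided to write them down.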
To begin with, the context of this discussion is text processing (wikipedia:en:Text_processing). "Text processing" here means searching and transforming plain text (wikipedia:プレーンテキスト), and I have the lineage of UNIX Text Processing particularly in mind. That is, instead of rich text with complex decoration, the target of processing is abstracted into ASCII characters plus a few control characters, and this is the world that made processing with powerful tools such as regular expressions possible. Since this is a UNIX story, the concrete unit of processing is char, and the whole is char[]. As stated above, the content of a char is an ASCII character, so its value is 0 to 127. (In fact, early UNIX tools were not 8-bit clean.)
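As a rough illustration of what "not 8-bit clean" can mean in practice, the sketch below checks whether a byte sequence stays within 0 to 127 and mimics a tool that clears the high bit of each byte. Both helper names (`seven_bit?`, `strip_high_bit`) are hypothetical, and the high-bit stripping is only an assumed example of such behavior, not a description of any particular UNIX tool.

```ruby
# True when every byte fits in the 7-bit ASCII range 0..127.
def seven_bit?(str)
  str.each_byte.all? { |b| b < 128 }
end

# Hypothetical stand-in for a tool that is not 8-bit clean:
# it clears the high bit of every byte (assumed behavior, for illustration).
def strip_high_bit(bytes)
  bytes.map { |b| b & 0x7F }
end

ascii  = "grep"          # pure ASCII
latin1 = "caf\xE9".b     # "cafe" with an accented e in ISO-8859-1, as raw bytes

p seven_bit?(ascii)      # true
p seven_bit?(latin1)     # false
# The 0xE9 byte loses its high bit and becomes 0x69 ("i").
p strip_high_bit(latin1.bytes).pack("C*")
```

Data that passes through such a tool survives only if it is pure ASCII, which is why the 0 to 127 assumption mattered.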
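Now, in countries other than the United States, 128 characters (control characters included) are obviously not enough, so an extension is needed. There are two approaches to extending it:
- Wide characters: make char larger
- Multibyte characters: combine several chars into "one character"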
Wide characters
European languages use accented letters, which are not included in ASCII. The measure taken was to expand from 7 bits to 8 bits. That said, on UNIX a char was 8 bits in most cases to begin with, so nothing new had to be introduced.
Later, with the introduction of wchar_t, TRON Code, Unicode and others, larger character types of 16 or 32 bits were proposed.
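The advantage of wide characters is that the programming model stays the same: by switching only the types and functions, you can handle more characters. This lets even people who do not know much about character encodings develop internationalized programs.
The disadvantages are that changing the string type changes a program's interface, and that wide characters cannot be used as-is in byte streams such as network connections.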
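One way to see why a wide character does not drop straight into a byte stream is to serialize a single code point as a 32-bit unit. The sketch below is my own reading of the problem (byte order and embedded zero bytes), not something the text above spells out; `wide_bytes` is a hypothetical helper.

```ruby
# Serialize one code point as a 32-bit wide character in both byte orders.
def wide_bytes(codepoint)
  {
    big_endian:    [codepoint].pack("N").bytes,  # 32-bit, network byte order
    little_endian: [codepoint].pack("V").bytes   # 32-bit, little-endian
  }
end

# U+3042 (HIRAGANA LETTER A)
p wide_bytes(0x3042)
# big_endian is [0, 0, 48, 66]; little_endian is [66, 48, 0, 0]
```

The same character yields two different byte sequences, and both contain zero bytes that char[]-based code would treat as a terminator, so sender and receiver have to agree on a serialization first.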
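Multibyte characters
"Multibyte characters" represent a single character by combining several chars.
With this approach the programming model becomes more complex, but the advantage was that when retrofitting an existing program, its interface could be kept as-is and Japanese support could be added with minimal changes, such as changing how characters are counted.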
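A minimal sketch of the "only change how characters are counted" idea follows. It assumes UTF-8 as the multibyte encoding (the retrofits described above would more likely have involved encodings such as EUC-JP or Shift_JIS), and `utf8_char_count` is a hypothetical helper: the data is still a plain byte sequence, and only the counting routine knows about character boundaries.

```ruby
# Count characters in a UTF-8 byte sequence by skipping continuation
# bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
def utf8_char_count(bytes)
  bytes.count { |b| (b & 0xC0) != 0x80 }
end

text = "\u65E5\u672C\u8A9E"     # three kanji, 3 bytes each in UTF-8

p text.bytesize                 # 9 (what a byte-oriented strlen would see)
p utf8_char_count(text.bytes)   # 3
p text.length                   # 3 (Ruby's own character count agrees)
```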
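The world's choice
At first, people believed that wide characters were the right choice in the long run, because "1 wchar_t = 1 character" is clearly an easy abstraction to understand. It was also expected that adopting a single character set, Unicode, would make internationalization dramatically easier. However, it later became clear that this idea was wrong on several levels.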
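First, Unicode did not fit in 16 bits. As a result, UCS-2, which had been a simple 16-bit fixed-width encoding, had to be reworked into UTF-16 with surrogate pairs. It degenerated into a scheme that represents one character with multiple wide characters (code units), combining the disadvantages of both approaches.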
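The surrogate-pair situation can be observed directly. In this sketch, `utf16_units` is a hypothetical helper that returns the 16-bit code units of a string after encoding it as UTF-16BE.

```ruby
# Return the UTF-16 code units (16-bit values) that encode a string.
def utf16_units(str)
  str.encode("UTF-16BE").unpack("n*")
end

bmp    = "\u3042"      # U+3042, inside the Basic Multilingual Plane
astral = "\u{20BB7}"   # U+20BB7, outside the 16-bit range

p utf16_units(bmp).map    { |u| format("%04X", u) }  # ["3042"]
p utf16_units(astral).map { |u| format("%04X", u) }  # ["D842", "DFB7"]
```

A single character outside the 16-bit range needs two code units, so code that assumes "one 16-bit unit = one character" miscounts it.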
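Second, as Unicode began to handle character combination in earnest, starting with IVS, it became increasingly necessary to handle "combining character sequences", in which several Unicode characters (Unicode scalar values) are combined to represent one character. In the end, the programming model had to change after all. (The issues around emoji belong to a layer far above character encoding, so I think they have nothing to do with this discussion.)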
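To make the point concrete, the sketch below counts code points in two sequences that a reader perceives as one character each: a letter followed by a combining accent, and an ideograph followed by VARIATION SELECTOR-17 (U+E0100). The choice of U+845B as the base character is only an illustrative assumption, and `codepoint_labels` is a hypothetical helper.

```ruby
# List the code points of a string in U+XXXX notation.
def codepoint_labels(str)
  str.each_codepoint.map { |c| format("U+%04X", c) }
end

accented = "e\u0301"           # "e" + COMBINING ACUTE ACCENT
variant  = "\u845B\u{E0100}"   # ideograph + VARIATION SELECTOR-17

p codepoint_labels(accented)   # ["U+0065", "U+0301"]
p codepoint_labels(variant)    # ["U+845B", "U+E0100"]
```

Even with 32-bit units, each of these "single characters" occupies two units, so the one-unit-per-character model does not hold.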
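In addition, many modern programs deal with network communication, and with HTTP in particular. HTTP is a chaotic world that is based on ASCII yet has binary data mixed in. In other words, it is data that is a byte sequence but that you want to process as a string, which is awkward to handle as wide characters. (Many readers have probably seen code littered with conversions back and forth between byte[] and String.)
In the end, I think we can conclude that the wide character experiment was a failure, and that by now it is a pointless complication.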
Ruby's choice
Ruby adopts the multibyte string approach. (It did not "give up on the Unicode route"; it gave up on the wide character route.) Thanks to this, methods such as String#each_byte, String#each_char, and String#each_codepoint let you freely work with the various abstraction layers built on top of a byte sequence. Also, although the implementation has been put off for now because there are no requests or use cases at all, an IVS-aware variant of each could easily be added.
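The layers mentioned above can be viewed side by side. In this sketch `views` is a hypothetical helper. The last lines use String#each_grapheme_cluster, which was added in Ruby 2.5, after this text was written; whether it covers the exact IVS use case the author had in mind is an assumption on my part, so it is shown only for a plain combining accent.

```ruby
# Collect the byte, character, and code point views of one string.
def views(str)
  {
    bytes:      str.each_byte.to_a,
    chars:      str.each_char.to_a,
    codepoints: str.each_codepoint.to_a
  }
end

v = views("a\u3042")   # "a" followed by U+3042
p v[:bytes]            # [97, 227, 129, 130]
p v[:chars].size       # 2
p v[:codepoints]       # [97, 12354]

# Ruby 2.5+ only: a further layer that groups combining sequences.
if "".respond_to?(:each_grapheme_cluster)
  p "e\u0301".each_grapheme_cluster.to_a.size   # 1
end
```

All of these are iterators over the same underlying bytes, so adding another layer does not change the string type or the program's interface.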
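In addition (although ordinary users probably do not handle multiple character encodings at the same time), you can treat byte sequences and strings seamlessly when doing network programming. In other words, when working with HTTP at a lower level, there is no need to convert between byte[] and String.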
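As a small sketch of this seamlessness, the code below splits a raw HTTP response that carries a binary body, using ordinary String methods throughout. `split_response` is a hypothetical helper and the response is made up for illustration; it is not a full HTTP parser.

```ruby
# Split a raw HTTP response into status line, headers, and body.
# Everything stays a String; the body simply remains binary.
def split_response(raw)
  head, body = raw.b.split("\r\n\r\n", 2)
  status_line, *header_lines = head.split("\r\n")
  headers = header_lines.map { |line|
    name, value = line.split(": ", 2)
    [name.downcase, value]
  }.to_h
  { status: status_line, headers: headers, body: body }
end

# Made-up response: ASCII headers followed by four binary bytes.
raw = "HTTP/1.1 200 OK\r\nContent-Type: image/png\r\n" \
      "Content-Length: 4\r\n\r\n".b + "\x89PNG".b

res = split_response(raw)
p res[:status]                    # "HTTP/1.1 200 OK"
p res[:headers]["content-type"]   # "image/png"
p res[:body].bytes                # [137, 80, 78, 71]
```

The header section is processed with split and downcase as text, while the body keeps its raw bytes, all without leaving the String type.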
Summary
Wide characters are obsolete.