A regular expression that matches all kanji
http://www.unixuser.org/~euske/doc/python/sample.py.html
import re
# Regular expression for extracting Japanese tokens.
JP_TOKEN = re.compile(u"[一-龠]+|[ぁ-ん]+|[ァ-ヴ]+|[a-zA-Z0-9]+")
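A short usage sketch of JP_TOKEN, assuming Python 3 (the wrapper `tokenize` and the sample strings are hypothetical, added only for illustration). `findall` splits mixed text into runs of kanji, hiragana, katakana, and ASCII alphanumerics; anything outside those classes is skipped, including punctuation and the prolonged sound mark ー (U+30FC), which lies just past ヴ (U+30F4).

```python
import re

# Same pattern as above: kanji run | hiragana run | katakana run | ASCII alnum run.
JP_TOKEN = re.compile(u"[一-龠]+|[ぁ-ん]+|[ァ-ヴ]+|[a-zA-Z0-9]+")

def tokenize(text):
    """Hypothetical helper: return the runs matched by JP_TOKEN, in order."""
    return JP_TOKEN.findall(text)

if __name__ == "__main__":
    print(tokenize("日本語のテキストabc123"))  # ['日本語', 'の', 'テキスト', 'abc123']
    # ー (U+30FC) falls outside [ァ-ヴ], so it is dropped.
    print(tokenize("カレー"))  # ['カレ']
```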
http://www.ascii.co.jp/pb/ascii/archive/aftercare/1999.html
[亜-熙] is the regular expression to use when detecting JIS kanji.
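The [亜-熙] range makes sense in Shift_JIS code order, where 亜 (0x889F) is the first kanji of JIS X 0208 level 1 and 熙 (0xEAA4) the last of level 2. Applied to Unicode strings, the same endpoints would bound the unrelated range U+4E9C..U+7199. A minimal Python sketch of the Shift_JIS reading, assuming the tool compares two-byte Shift_JIS codes (`is_jis_kanji` is a hypothetical name):

```python
# Shift_JIS codes of the range endpoints: 亜 = 0x889F, 熙 = 0xEAA4.
FIRST = "亜".encode("shift_jis")
LAST = "熙".encode("shift_jis")

def is_jis_kanji(ch):
    """Hypothetical helper: True if ch's Shift_JIS code lies in [亜-熙]."""
    try:
        code = ch.encode("shift_jis")
    except UnicodeEncodeError:
        # Not representable in Shift_JIS at all (e.g. many Unicode-only kanji).
        return False
    # Kanji are two-byte codes; equal-length bytes compare by code value.
    return len(code) == 2 and FIRST <= code <= LAST
```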
"The regular expressions in 一太郎 (Ichitaro) Lite2, mentioned in the main text, follow the Unicode specification, so to detect all kanji, use [一-龠] (龠 is read "yaku" in on'yomi and "fue" in kun'yomi, and corresponds to 9FA0 in Unicode)."
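In Unicode terms, [一-龠] is the code point range U+4E00..U+9FA0. A quick Python check (the helper `in_range` and the sample characters are chosen for illustration) shows what the range leaves out: 龥 (U+9FA5), the last ideograph of the original CJK Unified Ideographs block, CJK Extension A (starting at U+3400), Extension B characters such as U+20BB7, and the iteration mark 々 (U+3005).

```python
import re

# [一-龠] is the same class as [\u4e00-\u9fa0].
KANJI_RANGE = re.compile("[一-龠]")

def in_range(ch):
    """Hypothetical helper: True if ch falls inside the [一-龠] range."""
    return KANJI_RANGE.fullmatch(ch) is not None

if __name__ == "__main__":
    for ch in ["\u4e00", "\u9fa0", "\u9fa5", "\u3400", "\U00020bb7", "\u3005"]:
        # Prints e.g. "U+4E00 一 True"; only the first two are inside the range.
        print(f"U+{ord(ch):04X} {ch} {in_range(ch)}")
```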
Addendum
- Note: I have added an article about "a regular expression that matches all kanji" at id:toton:20051105.
- [一-龠] is apparently wrong as a regular expression for extracting kanji; the correct answer seems to be the Unicode script property \p{Han} (Perl). http://tama-san.com/?p=196
- For Unicode blocks, \p{InCJKUnifiedIdeographs} (Java) and \p{IsCJKUnifiedIdeographs} (.NET) can apparently be used. http://module.jp/blog/regex_unicode_prop.html
http://java.sun.com/j2se/1.5.0/ja/docs/ja/api/java/util/regex/Pattern.html#cg
http://msdn.microsoft.com/ja-jp/library/20bw873z(VS.80).aspx
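Python's standard `re` module does not accept \p{...} properties (the third-party `regex` package does support \p{Han}). The block named by Java's \p{InCJKUnifiedIdeographs} and .NET's \p{IsCJKUnifiedIdeographs} is CJK Unified Ideographs, U+4E00..U+9FFF, so a rough stdlib equivalent is an explicit range. A minimal sketch under that assumption (`extract_kanji` is a hypothetical name); a block test is not the same as the Han script test, since Han also covers the extension blocks and marks such as 々 (U+3005), which this pattern misses.

```python
import re

# CJK Unified Ideographs block, U+4E00..U+9FFF: the block behind
# \p{InCJKUnifiedIdeographs} (Java) and \p{IsCJKUnifiedIdeographs} (.NET).
CJK_BLOCK = re.compile("[\u4e00-\u9fff]+")

def extract_kanji(text):
    """Hypothetical helper: return runs of characters from the block."""
    return CJK_BLOCK.findall(text)

if __name__ == "__main__":
    # 龥 (U+9FA5) is inside the block but outside [一-龠].
    print(extract_kanji("龠と龥"))  # ['龠', '龥']
    # 々 (U+3005) is Han script but not in this block, so it is not matched.
    print(extract_kanji("人々"))  # ['人']
```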