This presentation explores common mistakes made by programmers when dealing with Unicode support and character encodings on the Web. For each mistake, I…
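I read プログラマのための文字コード技術入門 ("Introduction to Character Encoding Technology for Programmers") and am roughly summarizing the points as I understood them. I am not aiming for strict accuracy, so there is a good chance that some of this is wrong. If you spot a mistake, I would appreciate a comment. How do the various character codes differ? Japanese character codes fall broadly into two groups: those based on the JIS X 0208 character set, and those based on the Unicode character set. Character codes based on the JIS X 0208 character set include EUC-JP, Shift_JIS, and ISO-2022-JP. Character codes based on the Unicode character set include UTF-8 and UTF-16. Strictly speaking, "character code" as used above means "encoding" (character encoding scheme). Character encoding schemes. What is a character set? Exactly what the name says: "a collection of kinds of characters". "Chara…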
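The split between character set and encoding described above can be illustrated with a small sketch. It assumes Python's bundled codecs (`euc_jp`, `shift_jis`, `iso2022_jp`, `utf-8`, `utf-16-be`) as stand-ins for the encodings named in the excerpt. The `ENCODINGS` table and the `encode_all` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical grouping for illustration; the codec names are Python's own.
ENCODINGS = {
    "JIS X 0208 based": ["euc_jp", "shift_jis", "iso2022_jp"],
    "Unicode based": ["utf-8", "utf-16-be"],
}


def encode_all(text):
    """Return {codec name: encoded bytes} for every codec listed above."""
    result = {}
    for codecs in ENCODINGS.values():
        for name in codecs:
            result[name] = text.encode(name)
    return result


if __name__ == "__main__":
    # The same character yields a different byte sequence per encoding.
    for name, data in encode_all("あ").items():
        print(f"{name:12} {data.hex(' ')}")
```

Every byte sequence decodes back to the same character, which is the sense in which the character set is shared and only the encoding differs.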
In Greasemonkey, I used encodeURI to work around garbled Japanese strings stored with GM_setValue, but encodeURI is a function for encoding URIs, so this is not a commendable way to use it. URL encoding of the kind encodeURI performs first converts the string to UTF-8 and then writes each byte that needs encoding in %xx form, so a single Japanese character usually takes nine ASCII characters, which is inefficient. In a case like this one, where the goal is simply to escape non-ASCII characters, Unicode escapes are the better choice. This is the format used by Java properties files and native2ascii. A Unicode escape is written in the form \uxxxx, so most Japanese characters become six ASCII characters, and compared with URL encoding
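The nine-versus-six character count can be checked with a short sketch. It uses Python's `urllib.parse.quote` and the `unicode_escape` codec as stand-ins for encodeURI and for Java-style escapes. Both are assumptions made for illustration, and the helper names `url_encode` and `unicode_escape` are hypothetical.

```python
from urllib.parse import quote


def url_encode(text):
    """Percent-encode every UTF-8 byte, as URL encoding does."""
    return quote(text, safe="")


def unicode_escape(text):
    """Escape non-ASCII characters in \\uXXXX form.

    Caveat: Python's codec writes lowercase hex, uses \\xNN for
    U+0080..U+00FF and \\UXXXXXXXX above U+FFFF, so it is only an
    approximation of Java-style escapes.
    """
    return text.encode("unicode_escape").decode("ascii")


if __name__ == "__main__":
    for encode in (url_encode, unicode_escape):
        out = encode("日")
        print(encode.__name__, out, len(out))
```

For a typical BMP character such as 日 (three UTF-8 bytes), the percent-encoded form is nine characters long and the escaped form is six.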
id:tomi-ru wrote a very practical introduction to [http://search.cpan.org/perldoc?Encode:title=Encode] at [http://e8y.net/mag/015-encode/:title], which made me want to write one of my own from a different angle. Some basics (feel free to skip). Character set, charset, character repertoire: see "Character set" on Wikipedia. Encoding, encoding scheme, character encoding scheme: see "Character encoding scheme" on Wikipedia. These two are different things. You can read the text below without knowing the difference, but understanding it will help. Readers who want the details should study it on their own. Examples of character sets: Unicode; JIS X 0208 (hiragana, katakana, kanji and so on); ASCII characters. Examples of encodings: UTF-8, ISO-202
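Restoring hexadecimal Unicode entity references with regular expressions and the like 2008-05-10-3 [Programming] Normally these are handled properly and cause no trouble, but every so often, in some unexpected situation, I run into strings of the form "&#xHHHH;". These are hexadecimal entity references. Here is a note on how to decode them in Perl. Using pack and Encode::decode seems to work well.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use utf8;
    binmode STDOUT, ":utf8";
    # Entity-reference form of 情報時代
    my $a = "&#x60C5;&#x5831;&#x6642;&#x4EE3;";
    # pack turns the four hex digits into two bytes; decode reads them as UCS-2
    $a =~ s/&#x([0-9A-F]{4});/decode('UCS2', pack('H*', $1))/ge;
    print "$a\n";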
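For comparison, the same regular-expression substitution can be sketched in Python. This is an illustrative translation, not the author's code. The helper name `decode_hex_refs` is hypothetical, and the sample string is the entity-reference form of 情報時代.

```python
import html
import re

# Matches hexadecimal numeric character references such as &#x60C5;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_refs(text):
    """Replace each &#xHHHH; with the character at that code point."""
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    sample = "&#x60C5;&#x5831;&#x6642;&#x4EE3;"
    print(decode_hex_refs(sample))
    # The standard library also decodes decimal and named references.
    print(html.unescape(sample))
```

Converting the hex digits directly to a code point avoids the round trip through a byte string, and `html.unescape` is the ready-made option when named and decimal references must be handled too.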
2008-05-11 21:00 Category: Lightweight Languages, Tips perl - (en|de)coding character references The right answer has already been written, but still. [を] Restoring hexadecimal Unicode entity references with regular expressions and the like: "Using pack and Encode::decode seems to work well." Hatena Bookmark - miyagawa's bookmarks / 2008-05-11: "Hmm, HTML::Entities::decode, or if you go with a regexp, chr(hex($1)) would be better, I think." It is worth repeating. Using HTML::Entities First, there is the option of using decode_entities() from HTML::Entities. This is probably the best practice.

    #!/usr/local/bin/perl
    use strict;
2008-05-08 04:00 Category: Lightweight Languages perl - Encode, intermediate level My earlier post, 404 Blog Not Found: perl - Encode 入門 (an introduction to Encode), was very well received. However: "Character encodings used on the web: Unicode overtakes ASCII -- Google reveals: Marketing - CNET Japan. Unicode has overtaken ASCII to become the most widely used character encoding on the World Wide Web, Mark Davis, senior international software architect at Google, wrote on his blog." Fully keeping up with these times takes a little more than introductory knowledge. For example, livedoor blog, which hosts this blog, uses EUC-JP as its character encoding. Even if "the era of Unicode" is here, circumstances like this still
2008-05-02 04:00 Category: Lightweight Languages Unicode - Beware of look-alike characters The latter is a minus sign, not a hyphen, after all. [を] I was stumped because a full-width hyphen in UTF-8 would not match a Perl regular expression: "When I looked at the full-width hyphens in the original text file with od -t x1, I found two kinds mixed together: ef bc 8d and e2 88 92. The former matches \p{Hyphen}, but the latter does not. Once I knew the cause, I solved it by doing a binary replacement in preprocessing." Grepping the character names for HYPHEN and MINUS SIGN to find the confusable ones gives something like this. egrep '(HYPHEN|MINUS SIGN)' /usr/local/lib/perl5/5.10.0/unicore/Name.pl -002DHYPHEN-MI
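The two byte sequences quoted above can be identified with Python's `unicodedata` module, sketched here as an alternative to grepping Perl's Name.pl. The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(raw):
    """Decode one UTF-8 character; return (code point, name, category)."""
    ch = raw.decode("utf-8")
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    # The byte sequences reported by od -t x1 in the quoted post.
    for raw in (b"\xef\xbc\x8d", b"\xe2\x88\x92"):
        print(raw.hex(" "), *describe(raw))
```

The first sequence is a dash punctuation character (category Pd) and the second a math symbol (category Sm), which matches the observation that only the former is treated as a hyphen.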
2008-02-18 10:00 Category: Lightweight Languages perl - utf8::is_utf8("\x{ff}") == 0 This is a good opportunity to go over how the utf8 flag gets set in Perl 5.8 and later. unknownplace.org - 2008/02/17 - utf8::is_utf8 I think the point there was that even a string written in Data::Dumper-style notation, such as "\x{6751}\x{702c}\x{5927}\x{8f14}", may or may not end up with the utf-8 flag set. \x{UUUUUU} and the utf8 flag Let's start with a quiz. Say what the following prints.

    sub pfrag{ print utf8::is_utf8($_[0]) ? 1 : 0, "\n" }
    pfrag "Hell\xC3, worl
After reading "Python の unicodedata モジュール" (Python's unicodedata module) on bkブログ, I wondered what the Perl equivalents are, so I put together a list of the correspondences. I will not explain the utf8 flag or the Encode module here. Getting the name of a character The viacode function of the charnames module (bundled since Perl 5.6) returns the name of a character. Note that its argument is a character code, not a string.

    use utf8;
    use charnames qw( :full );
    print charnames::viacode(ord 'A'), "\n"; # 'LATIN CAPITAL LETTER A'
    print charnames::viacode(ord 'あ'), "\n"; # 'HIRAGANA LETTER A'

As a string literal, the standard
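For reference, the Python side of this correspondence can be sketched with the standard `unicodedata` module. Unlike charnames::viacode, `unicodedata.name` takes a one-character string rather than a code point. The helper names below are hypothetical.

```python
import unicodedata


def char_name(ch):
    """Return the Unicode name of a single character."""
    return unicodedata.name(ch)


def char_from_name(name):
    """Inverse lookup: return the character that has the given name."""
    return unicodedata.lookup(name)


if __name__ == "__main__":
    for ch in ("A", "あ"):
        name = char_name(ch)
        print(f"U+{ord(ch):04X}", name, char_from_name(name) == ch)
```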
2007-03-10 17:30 Category: Lightweight Languages javascript - encodeURIUnicode() and the %uXXXX problem After reading this, I thought it would be nice to have (de|en)codeURIUnicode, so I wrote them. sawatの日記 - Unicode escapes: "In a case like this one, where the goal is simply to escape non-ASCII characters, Unicode escapes are the better choice. This is the format used by Java properties files and native2ascii." Decoded: Dan 弾 𪚲 Encoded: Dan%20%u5F3E%20%uD869%uDEB2 In short, it is compatible with encodeURIComponent up to U+00ff and with escape() above that. Strings encoded this way can be used almost as-is with CGI.pm and the like.
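The scheme described above can be sketched in Python as follows. This is a reading of the description, not a port of the original JavaScript. The function name `encode_uri_unicode` and the choice of unreserved characters are assumptions made for illustration.

```python
from urllib.parse import quote

# Extra characters encodeURIComponent leaves unescaped; quote() already
# keeps letters, digits and "_.-~" (assumed to match).
UNRESERVED_EXTRA = "!*'()"


def encode_uri_unicode(text):
    """U+0000..U+00FF: percent-encoded UTF-8; above that: %uXXXX units."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(quote(ch, safe=UNRESERVED_EXTRA))
        else:
            # One %uXXXX per UTF-16 code unit, so characters outside the
            # BMP become a surrogate pair, as with escape().
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("%u{:04X}".format(int.from_bytes(units[i:i + 2], "big")))
    return "".join(out)


if __name__ == "__main__":
    # "\U0002A6B2" is the supplementary-plane character from the example.
    print(encode_uri_unicode("Dan 弾 \U0002A6B2"))
```

Run on the sample string from the excerpt, this sketch produces the same encoded form quoted there.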
I came up with a way to block the execution of exe files whose extension has been disguised with Unicode's U+202E (RIGHT-TO-LEFT OVERRIDE; RLO), the trick described in 「それ Unicode で」 and elsewhere.
1. Open Notepad and type "**" (without the surrounding quotation marks).
2. Move the caret (cursor) between the two "*" characters.
3. Right-click, choose "Insert Unicode control character", then select "RLO Start of right-to-left override".
4. Press Ctrl-A to select everything and Ctrl-C to copy it to the clipboard.
5. Open Local Security Policy.
6. Right-click "Additional Rules" on the left side of the window.
7. Select "New Path Rule".
8. Press Ctrl-V in the "Path" field to paste the Notepad contents.
9. The security level