Is it invalid to write a colon (:) in an HTTP query parameter?

On the unexpected behavior of PHP's $_SERVER['REQUEST_URI'] and parse_url() - こせきの技術日記

This is a follow-up to the post above.

PHP's parse_url() parses

"/abc?a=x&time=09:00&x=y" without trouble,
but fails on "/abc?a=x&time=09:00".

Apparently this is because parse_url() is specified as "not working" with relative URIs. That aside, I wondered whether percent-encoding the colon is actually mandatory, so I looked into it.
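For comparison, here is a minimal sketch in Ruby (the language of the script later in this post) that feeds the same two strings to the standard library's URI.parse. It only shows that one RFC 3986 based parser accepts both relative references; it says nothing about how PHP works internally.

```ruby
require "uri"

# The two relative references from the text above.
samples = ["/abc?a=x&time=09:00&x=y", "/abc?a=x&time=09:00"]

parsed = samples.map { |s| URI.parse(s) }
parsed.each do |uri|
  # The reference is split at the first "?"; the colon stays inside the query.
  puts "path=#{uri.path} query=#{uri.query}"
end
```

Both strings come back with the path /abc and the query untouched, colon included.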

The URI specification: RFC 3986

First, there is RFC 3986, the URI specification that everything else builds on.

RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
Uniform Resource Identifier (URI): Generic Syntax (Japanese translation)
- RFC 1738 - Uniform Resource Locators (URL): the old URL specification (updated by RFC 3986)
- RFC 1738 (Japanese translation)
- RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax: the old URI specification (obsoleted by RFC 3986)
- RFC2396J: RFC 2396 (Japanese translation)
Information-technology Promotion Agency (IPA): Information Security: Survey and Research Reports: Information Security Technology Trend Survey (second half of 2009), "URI escaping"
- An IPA article. Very detailed and helpful.

The ABNF in RFC 3986 that defines the characters usable in a query is shown below. The query is defined as running from the ? up to the # or the end of the URI.

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

Quite a lot is allowed. That does not mean, however, that these characters can be used freely.
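To make the grammar concrete, the query rule can be written as a single regular expression. This is a rough sketch of my own: the names QUERY_RE and valid_query? are made up for illustration, and the check is purely syntactic.

```ruby
# query = *( pchar / "/" / "?" ), collapsed into one character class
# plus the pct-encoded alternative. \h matches one hexadecimal digit.
QUERY_RE = %r{\A(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%\h\h)*\z}

# Returns true when the text fits the query grammar (hypothetical helper).
def valid_query?(text)
  QUERY_RE.match?(text)
end

puts valid_query?("a=x&time=09:00") # the colon is a pchar
puts valid_query?("a=x y")          # a raw space is not in the grammar
puts valid_query?("a=%zz")          # "%" must be followed by two hex digits
```

So as far as the syntax goes, a raw colon in the query is fine; the open question is what the surrounding rules about reserved characters say.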

Separately from this, section 2.2 defines a set of reserved characters.

reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="

query is lenient, whereas reserved is strict.
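One way to see how the two rules relate is to compare them as sets. The sketch below is only my own illustration (the constant names are not from the RFC text): it lists which reserved characters the query grammar admits literally and which it never does.

```ruby
# Character sets copied from the two ABNF excerpts above.
GEN_DELIMS  = ':/?#[]@'.chars
SUB_DELIMS  = %q{!$&'()*+,;=}.chars
RESERVED    = GEN_DELIMS + SUB_DELIMS

# Punctuation the query rule allows literally (pct-encoded is a separate case).
QUERY_PUNCT = ('-._~' + %q{!$&'()*+,;=} + ':@' + '/?').chars

allowed_in_query    = RESERVED & QUERY_PUNCT
excluded_from_query = RESERVED - QUERY_PUNCT

puts "reserved and allowed in query: #{allowed_in_query.join}"
puts "reserved and never raw in query: #{excluded_from_query.join}"
```

Only #, [ and ] are shut out of the query entirely; every other reserved character is syntactically allowed there.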

Below I quote the passage that explains reserved and add my own reading of each part (these are comments, not translations). A "component" is one of the parts that make up a URI, such as the scheme, the path, or the query.

A component's ABNF syntax rule will not use the reserved or gen-delims rule names directly;
The reserved rule is not used directly in a component's ABNF syntax.

each syntax rule lists the characters allowed within that component (i.e., not delimiting it),
Each syntax rule lists the characters allowed in that component.

and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component.
Characters that are allowed in the component and also belong to reserved are set aside for use as subcomponent delimiters.

Only the most common subcomponents are defined by this specification;
This specification defines only the most common subcomponents.

other subcomponents may be defined by a URI scheme's specification, or
Other subcomponents may be defined by the URI scheme's specification, or

by the implementation-specific syntax of a URI's dereferencing algorithm,
by the syntax of whatever implements the URI's dereferencing algorithm.

provided that such subcomponents are delimited by characters in the reserved set allowed within that component.
A component can be split into subcomponents by delimiting it with reserved characters.
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

In short:

Reserved characters are used to split a component into subcomponents.
What the subcomponents look like is decided by each URI scheme's specification or by the application.
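As a small sketch of that idea: the familiar name=value pairs are exactly such an application-level subcomponent structure. The convention below ("&" between pairs, "=" between name and value) and the helper name split_query are assumptions for illustration; RFC 3986 itself does not define them.

```ruby
# Hypothetical application convention: "&" separates pairs and the first "="
# separates a name from its value. Any other character is treated as data.
def split_query(query)
  query.split("&").map { |pair| pair.split("=", 2) }
end

p split_query("a=x&time=09:00&x=y")
```

Under this convention the colon has no delimiting role, so it simply ends up inside the value 09:00.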

There is a further explanation, this time about applications that produce URIs.

URI producing applications should percent-encode data octets that correspond to characters in the reserved set
Applications that produce URIs should percent-encode characters in reserved.

unless these characters are specifically allowed by the URI scheme to represent data in that component.
But they may be used as-is if the URI scheme specifically allows it.
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

So as a rule, reserved characters should be percent-encoded. However, if the http scheme specification says that colons and slashes are fine in the query, they can be used raw.
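Here is a rough sketch of what that choice looks like for a URI producer. The helper encode_query_value and its keep: parameter are hypothetical; the default of ":/?@" is my own assumption about what a scheme might allow as data, not something I found stated for http.

```ruby
require "uri"

# Percent-encode every octet that is not unreserved and not listed in `keep`.
# `keep` stands in for "characters the scheme specifically allows as data".
def encode_query_value(value, keep: ":/?@")
  needs_encoding = /[^A-Za-z0-9\-._~#{Regexp.escape(keep)}]/
  value.b.gsub(needs_encoding) { |octet| format("%%%02X", octet.ord) }
end

puts encode_query_value("09:00")             # colon kept raw
puts encode_query_value("09:00", keep: "")   # colon percent-encoded
puts URI.encode_www_form_component("09:00")  # the standard library encodes it
```

With keep: "" the helper falls back to encoding everything outside unreserved, which is the conservative reading of the "should".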

And this is about applications that parse URIs.

If a reserved character is found in a URI component and no delimiting role is known for that character,
If a reserved character with no known delimiting role turns up in a component,

then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII.
it must be interpreted as the corresponding ASCII character.
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

Given this, I think PHP's parse_url() has no business failing to parse because of a colon (:) (not that anyone claims it should).
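A sketch of what that rule means on the consuming side, using Ruby's URI.decode_www_form as a stand-in for "an application that parses the query". It is a form-data decoder rather than anything RFC 3986 prescribes, so take it as an illustration only.

```ruby
require "uri"

# The same data, once with a raw colon and once with it percent-encoded.
raw     = URI.decode_www_form("a=x&time=09:00")
encoded = URI.decode_www_form("a=x&time=09%3A00")

p raw
p encoded
puts raw == encoded
```

The colon has no delimiting role for this decoder, so both spellings yield the value 09:00.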

So the next thing to read should be the specification of the http URI scheme, I thought, but I could not find any such thing. Is there no standalone specification of the http scheme?

HTML 4.01

I am building a web service at the moment, so HTML has nothing to do with what I am doing.

It has nothing to do with it, but since I cannot find any other specification that seems to apply, I will look at the parts that might serve as a reference.

If the method is "get" and the action is an HTTP URI,
If the method is GET and the action points at an HTTP URI,

the user agent takes the value of action,
take the action URI,

appends a `?' to it,
stick a ? on it,

then appends the form data set, encoded using the "application/x-www-form-urlencoded" content type.
then append the form data encoded as application/x-www-form-urlencoded.

The user agent then traverses the link to this URI.
And then go and access that URI.

In this scenario, form data are restricted to ASCII codes.
Only ASCII data can be handled in this scenario, though. Ouch.
Forms in HTML documents

Leaving the last line aside: for GET requests, HTML 4.01 says to use x-www-form-urlencoded for the query of the URI. Given the name "urlencoded" that seems almost obvious, but does the same hold for the http scheme under RFC 3986?
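The GET procedure quoted above can be sketched in a few lines of Ruby. The action URI and the field names are made up for illustration, and URI.encode_www_form is Ruby's own form encoder rather than a literal implementation of the HTML 4.01 text.

```ruby
require "uri"

# Hypothetical form: example.com and the field names are placeholders.
action    = "http://example.com/search"
form_data = [["a", "x"], ["time", "09:00"]]

# Take the action, append "?", then append the urlencoded form data set.
request_uri = action + "?" + URI.encode_www_form(form_data)
puts request_uri
```

Note that this encoder turns the colon into %3A, so a URI built from a form never contains the raw colon that tripped up parse_url().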

The HTML 4.01 definition of x-www-form-urlencoded is as follows. It refers to RFC 1738, the URL specification from 1994.

Control names and values are escaped.
Names and values are escaped.

Space characters are replaced by `+',
Spaces become +.

and then reserved characters are escaped as described in [RFC1738], section 2.2:
Reserved characters are escaped according to RFC 1738 section 2.2. That is,

Non-alphanumeric characters are replaced by `%HH',
everything other than alphanumerics is replaced by %HH. (All of it?)

a percent sign and two hexadecimal digits representing the ASCII code of the character.
A % followed by the ASCII code in hexadecimal, and so on.

Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
Line breaks are CRLF, %0D%0A.

The control names/values are listed in the order they appear in the document.
Names and values are listed in document order.

The name is separated from the value by `=' and
A name is separated from its value by =, and

name/value pairs are separated from each other by `&'.
the pairs are separated from each other by &.
Forms in HTML documents
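Read literally, those rules are simple enough to sketch. The version below follows my own reading of the quoted text (escape every non-alphanumeric octet, write spaces as +, normalize line breaks to CR LF); the helper names are hypothetical, and real browsers are more lenient about which characters they leave alone.

```ruby
# Escape one control name or value according to the rules quoted above.
def html401_escape(text)
  # Line breaks are represented as CR LF pairs before escaping.
  normalized = text.gsub(/\r\n|\r|\n/, "\r\n")
  # Every octet that is not alphanumeric (or a space) becomes %HH.
  escaped = normalized.b.gsub(/[^A-Za-z0-9 ]/) { |octet| format("%%%02X", octet.ord) }
  # Spaces are written as "+".
  escaped.tr(" ", "+")
end

# Join name/value pairs with "=" and "&", in the order given.
def html401_form_encode(pairs)
  pairs.map { |name, value| "#{html401_escape(name)}=#{html401_escape(value)}" }.join("&")
end

puts html401_form_encode([["time", "09:00"], ["note", "a b"]])
```

Under this reading the colon is always sent as %3A, which answers my "(All of it?)" with a yes.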

The well-known business of separating values with semicolons comes up in a different place.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
Performance, Implementation, and Design Notes

But if you separate with semicolons, doesn't the result stop conforming to the x-www-form-urlencoded specification?
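A sketch of a server-side parser that follows the recommendation and accepts both separators. The helper parse_pairs is hypothetical; the point is only to show where the two readings collide.

```ruby
require "uri"

# Split on either "&" or ";" and decode each name and value.
def parse_pairs(query)
  query.split(/[&;]/).map do |pair|
    name, value = pair.split("=", 2)
    [URI.decode_www_form_component(name.to_s),
     URI.decode_www_form_component(value.to_s)]
  end
end

p parse_pairs("a=1;b=2")
p parse_pairs("a=1&b=2")
p parse_pairs("note=x;y") # a raw ";" meant as data is now read as a separator
```

The first two calls agree, but the third shows the catch: once ";" is a separator, it can no longer appear unescaped inside a value.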

RFC 1738: the URL specification (1994, old)

So what does section 2.2 of RFC 1738, the one HTML 4.01 refers to, actually say?

Octets must be encoded
Octets must be encoded in the following cases.

if they have no corresponding graphic character within the US-ASCII coded character set,
When they are not printable ASCII characters.

if the use of the corresponding character is unsafe, or
When the character is unsafe.

if the corresponding character is reserved for some other interpretation within the particular URL scheme.
When the character is reserved within the particular URL scheme.
RFC 1738 - Uniform Resource Locators (URL)

The characters listed as unsafe are the space and

<>"#%{}|\^~[]`

and

All unsafe characters must always be encoded within a URL.
Unsafe characters must always be encoded.
RFC 1738 - Uniform Resource Locators (URL)

The characters listed as reserved are

;/?:@=&

and

Thus, only alphanumerics,
alphanumerics,

the special characters "$-_.+!*'(),",
the unreserved characters ($-_.+!*'(),),

and reserved characters used for their reserved purposes
and reserved characters used for their reserved purposes are the only ones that

may be used unencoded within a URL.
may be used without encoding.
RFC 1738 - Uniform Resource Locators (URL)

Given that, I think encoding is mandatory whenever the data contains reserved characters.

The idea in RFC 3986 that "a scheme can allow the use of reserved characters" does not appear in RFC 1738.

application/x-www-form-urlencoded

Specifications of application/x-www-form-urlencoded other than HTML 4.01.

No specification that defines application/x-www-form-urlencoded on its own exists yet.
application/x-www-form-urlencoded

application/x-www-form-urlencoded ‐ 通信用語の基礎知識

It has been in continuous use from RFC 1866 (HTML 2.0) through to the HTML5 drafts.
TrackBack pings also use this Content-Type name.
However, the x- prefix is a problem. To fix it, registering application/www-form-urlencoded with IANA had been proposed before, and the draft was revived for HTML5 (I-D [hoehrmann-urlencoded-01]).
The draft specification of application/www-form-urlencoded is 8-bit, with the encoding fixed to UTF-8. For that reason it treats the charset parameter as invalid.
http://www.wdic.org/w/WDIC/application/x-www-form-urlencoded

draft-hoehrmann-urlencoded-01 - The application/www-form-urlencoded format

It does not look usable in the query component of a URI. The escaping is nowhere near sufficient.

Summary of what I have found so far

Reserved characters (: and / and so on) are basically things that should be encoded.

RFC 1738 always encodes reserved characters.
RFC 3986 lets raw reserved characters represent data if the scheme specifically allows it.

HTML 4.01 refers to RFC 1738, so it always encodes.

But still...

Thinking about the readability of URIs, it seems to me that at least the http scheme could be more relaxed. RFC 3986 makes that possible, after all.

Incidentally, Google appeared to have special handling that leaves the colon (:) unencoded. Search Google for a:b and the browser's URL bar shows q=a:b. With image search it becomes a%3Ab.

List of characters to escape

A script that takes all the ASCII symbols and shows only the ones that get escaped.

#! /usr/bin/env ruby
# -*- coding: utf-8 -*-

ascii = []

# Every printable ASCII character. Space (32) is left out.
(33..126).each do |i|
  ascii << i.chr
end

puts "* Characters subject to encoding"
puts

# Keep only the symbols.
ascii.reject!{|c| c =~ /[a-zA-Z0-9]/ }
puts "           all: " +  ascii.join

# RFC 3986 unreserved characters
unreserved = %q{-._~}

# RFC 2396 unreserved characters
unrsvd2396 = unreserved + %q{!*'()}

# RFC 1738 unreserved characters
unrsvd1738 = %q{-._!*'()$,+}

# RFC 3986 query characters. % is left out because it may only appear in percent-encoded form.
query    = %q{/?:@-._~!$&'()*+,;=}

# ECMAScript encodeURI()
encodeuri = %q{-._~:/?#@!$&'()*+,;=}

# Characters that are not unreserved
puts "       RFC3986: " + ascii.map {|c| unreserved.index(c).nil? ? c : ' ' }.join

# Characters that are not unreserved in RFC 2396. ECMAScript encodeURIComponent() escapes these.
puts "       RFC2396: " + ascii.map {|c| unrsvd2396.index(c).nil? ? c : ' ' }.join

# Characters that are not unreserved in RFC 1738.
puts "       RFC1738: " + ascii.map {|c| unrsvd1738.index(c).nil? ? c : ' ' }.join

# ECMA encodeURI()
puts "ECMA encodeURI: " + ascii.map {|c| encodeuri.index(c).nil? ? c : ' ' }.join

# Characters that cannot be used in a query
puts "     not query: " + ascii.map {|c| query.index(c).nil? ? c : ' ' }.join

# Taken from Ruby URI::UNSAFE /[^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]/n
rubysafe = %q{-_.!~*'();/?:@&=+$,[]}
puts "rubyURI.escape: " + ascii.map {|c| rubysafe.index(c).nil? ? c : ' ' }.join

The result:

* Characters subject to encoding

           all: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
       RFC3986: !"#$%&'()*+,  /:;<=>?@[\]^ `{|}
       RFC2396:  "#$%&    +,  /:;<=>?@[\]^ `{|}
       RFC1738:  "# %&        /:;<=>?@[\]^ `{|}~
ECMA encodeURI:  "  %            < >  [\]^ `{|}
     not query:  "# %            < >  [\]^ `{|}
rubyURI.escape:  "# %            < >   \ ^ `{|}

From top to bottom:

All ASCII symbols.
Characters that RFC 3986 says should be encoded unless the scheme allows them, that is, everything other than the unreserved characters. PHP's rawurlencode() implements this.
RFC 2396 is obsolete; listed for reference only. ECMAScript's encodeURIComponent() implements this.
Characters that should be encoded under RFC 1738.
Characters that ECMAScript's encodeURI() encodes.
Characters that cannot be used in an RFC 3986 query. These must be encoded in a query.

I think Ruby's URI.escape(), like ECMAScript's encodeURI(), is meant for encoding a whole URI, and yet it does not encode [ and ]. Where did that rule come from? Being able to set the character set freely is nice, but the default does not seem to have any real use.
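For anyone trying this on a current Ruby: URI.escape itself was removed in Ruby 3.0, but as far as I can tell the same default character set is still reachable through URI::RFC2396_Parser. The sketch below assumes that class and its escape method behave like the old URI.escape; the sample string is made up.

```ruby
require "uri"

# RFC2396_Parser#escape uses the UNSAFE pattern quoted in the script above
# when no second argument is given.
parser  = URI::RFC2396_Parser.new
escaped = parser.escape("/a b?q=[1]<x>")
puts escaped
```

The space and the angle brackets are percent-encoded while [ and ] pass through untouched, matching the last row of the table.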
