This is a follow-up to "Is it invalid to put a colon (:) in an HTTP query parameter?".
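PHP's parse_url()
- parses "/abc?a=x&time=09:00&x=y", but
- fails on "/abc?a=x&time=09:00".
The answer I got was that parse_url() does not work with relative URIs and that this is by design. Leaving that aside, I wanted to know whether percent-encoding the colon is really mandatory, so I looked into it.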
The URI specification: RFC 3986
First, there is RFC 3986, the URI specification that everything else builds on.
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
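- Uniform Resource Identifier (URI): Generic Syntax (Japanese translation)
- RFC 1738 - A Gopher URL Format, the old URL specification (updated by RFC 3986)
- RFC1738 (Japanese translation)
- RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax, the old URI specification (obsoleted by RFC 3986)
- RFC2396J, RFC 2396 (Japanese translation)
- Information-technology Promotion Agency (IPA): Information Security: Survey and Research Reports: Information Security Technology Trends Survey (second half of 2009), "URI escaping"
- The IPA article is very detailed and was a great help.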
The ABNF in RFC 3986 that defines which characters may appear in a query is shown below. The query is defined as running from ? up to # or the end of the URI.
    query       = *( pchar / "/" / "?" )
    pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded = "%" HEXDIG HEXDIG
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
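As a rough illustration, the query rule can be transcribed into a Ruby regular expression. The names QUERY_RE and valid_query? are made up for this sketch. It only checks syntax, so it says nothing about whether a given reserved character may be used as data.

```ruby
# Sketch: the RFC 3986 query rule transcribed into a regular expression.
# QUERY_RE and valid_query? are hypothetical names used only for this example.
QUERY_RE = %r{
  \A
  (?:
    [A-Za-z0-9\-._~]      # unreserved
    | %[0-9A-Fa-f]{2}     # pct-encoded
    | [!$&'()*+,;=]       # sub-delims
    | [:@/?]              # the rest of pchar, plus "/" and "?"
  )*
  \z
}x

def valid_query?(query)
  QUERY_RE.match?(query)
end

p valid_query?('a=x&time=09:00')  # prints true (a raw colon is syntactically fine)
p valid_query?('a=x#frag')        # prints false ("#" ends the query)
p valid_query?('q=a b')           # prints false (a raw space is not allowed)
```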
Quite a few characters appear to be usable, but that does not mean they can all be used freely.
Separately, section 2.2 defines the reserved characters.
    reserved   = gen-delims / sub-delims
    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
So these characters are fine according to query, yet reserved says no?
Below I quote the passage that explains reserved and add my own understanding as comments (these are not translations). A "component" is one of the parts that make up a URI, such as the scheme, host, or query.
A component's ABNF syntax rule will not use the reserved or gen-delims rule names directly;
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
The reserved rule is not used directly in any component's ABNF.
each syntax rule lists the characters allowed within that component (i.e., not delimiting it),
Each syntax rule lists the characters that are allowed inside that component.
and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component.
Characters that are allowed in a component and also appear in reserved are set aside for use as subcomponent delimiters.
Only the most common subcomponents are defined by this specification;
This specification defines only the most common subcomponents;
other subcomponents may be defined by a URI scheme's specification, or
the others are defined by the URI scheme's specification, or
by the implementation-specific syntax of a URI's dereferencing algorithm,
by the syntax of whatever implementation dereferences the URI.
provided that such subcomponents are delimited by characters in the reserved set allowed within that component.
A component can be split into subcomponents using the reserved characters allowed in it.
In short:
- Reserved characters are used to split a component into subcomponents.
- The syntax of the subcomponents is decided by each URI scheme's specification or by the application.
There is a further explanation, this time about applications that build URIs.
URI producing applications should percent-encode data octets that correspond to characters in the reserved set
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
Applications that build URIs should percent-encode the characters in reserved,
unless these characters are specifically allowed by the URI scheme to represent data in that component.
but they may be used as they are if the URI scheme specifically allows it.
As a rule, reserved characters must be encoded. However, if the specification of the http scheme says that colons and slashes may be used in the query, they can be used raw.
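A minimal sketch of that default for an application that builds URIs: percent-encode every octet that is not an unreserved character. The helper name pct_encode is made up for this example, and the sketch assumes the scheme grants no exceptions for reserved characters.

```ruby
# Sketch: percent-encode every byte except RFC 3986 unreserved characters.
# pct_encode is a hypothetical helper, not a standard library method.
def pct_encode(str)
  # Work on raw bytes so that a multi-byte character becomes one %HH per octet.
  encoded = str.b.gsub(/[^A-Za-z0-9\-._~]/n) { |byte| format('%%%02X', byte.ord) }
  encoded.force_encoding(Encoding::US_ASCII)
end

puts pct_encode('09:00')  # prints 09%3A00
puts pct_encode('a/b?c')  # prints a%2Fb%3Fc
```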
Next, about applications that parse URIs.
If a reserved character is found in a URI component and no delimiting role is known for that character,
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
If a reserved character with no known delimiting role is found in a component,
then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII.
it must be interpreted as the corresponding ASCII character.
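The parsing rule quoted above can be sketched like this: split the query only on the delimiters the application knows about (& and = here, which is my assumption for a form-style query) and treat every other reserved character, including the colon, as plain data. split_query is a hypothetical helper written for this example.

```ruby
# Sketch: split a query on known delimiters only; anything else is data.
# split_query is a hypothetical helper, not a standard library method.
def split_query(query)
  query.split('&').map do |pair|
    # Split on the first "=" only, so a later "=" stays part of the value.
    name, value = pair.split('=', 2)
    [name.to_s, value.to_s]
  end
end

p split_query('a=x&time=09:00')
# prints [["a", "x"], ["time", "09:00"]]
```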
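Given this rule, I think it is wrong for PHP's parse_url() to fail because of a colon (:), although nobody seems to be arguing that.
So the next thing to read should be the specification of the http URI scheme, I thought, but I could not find one. Does no standalone specification of the http scheme exist?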
HTML 4.01
I am currently building a web service, so HTML has nothing to do with what I am doing.
Even so, I could not find any other specification that seemed to apply, so I looked at the parts that might serve as a reference.
If the method is "get" and the action is an HTTP URI,
Forms in HTML documents
If the method is GET and the action target is an HTTP URI,
the user agent takes the value of action,
the user agent takes the action URI,
appends a `?' to it,
appends a ? to it,
then appends the form data set, encoded using the "application/x-www-form-urlencoded" content type.
and appends the form data encoded as application/x-www-form-urlencoded.
The user agent then traverses the link to this URI.
It then accesses that URI.
In this scenario, form data are restricted to ASCII codes.
Only ASCII data can be handled in this scenario. Ouch.
Setting the last line aside, HTML 4.01 says that a GET request uses x-www-form-urlencoded for the query of the URI. Given the name "urlencoded", that feels obvious. But does the same hold for the http scheme under RFC 3986?
HTML 4.01 defines x-www-form-urlencoded as follows. It refers to RFC 1738, the URL specification from 1994.
Control names and values are escaped.
Forms in HTML documents
Names and values are escaped.
Space characters are replaced by `+',
Spaces become +,
and then reserved characters are escaped as described in [RFC1738], section 2.2:
and reserved characters are escaped according to RFC 1738 section 2.2. That is,
Non-alphanumeric characters are replaced by `%HH',
everything other than alphanumerics is replaced by %HH (all of them?),
a percent sign and two hexadecimal digits representing the ASCII code of the character.
meaning % followed by the ASCII code in hexadecimal.
Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
Line breaks are CRLF, %0D%0A.
The control names/values are listed in the order they appear in the document.
Names and values are listed in document order.
The name is separated from the value by `=' and
The name and the value are separated by =.
name/value pairs are separated from each other by `&'.
The pairs are separated by &.
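For comparison, Ruby's standard library has URI.encode_www_form, which produces output of this kind. As far as I can tell it follows the later HTML5 form-encoding algorithm rather than the HTML 4.01 text word for word (for example, it leaves * unencoded), so treat it as an approximation of the rules quoted above.

```ruby
require 'uri'

# Names and values are escaped, spaces become "+", name and value are joined
# with "=", and the pairs are joined with "&".
puts URI.encode_www_form([['a', 'x'], ['time', '09:00'], ['note', 'a b&c']])
# prints a=x&time=09%3A00&note=a+b%26c
```

Note that the colon in 09:00 comes out as %3A, which matches the "always encode" reading of HTML 4.01.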
The familiar suggestion to separate values with semicolons appears in a different place.
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
Performance, Implementation, and Design Notes
But if you separate with semicolons, doesn't the result stop conforming to the x-www-form-urlencoded specification?
RFC 1738 URL specification (1994, old)
So what does section 2.2 of RFC 1738, which HTML 4.01 refers to, actually say?
Octets must be encoded
RFC 1738 - A Gopher URL Format
Octets must be encoded in the following cases:
if they have no corresponding graphic character within the US-ASCII coded character set,
when they are not printable ASCII characters,
if the use of the corresponding character is unsafe, or
when the character is unsafe, or
if the corresponding character is reserved for some other interpretation within the particular URL scheme.
when the character is reserved in the particular URL scheme.
The characters listed as unsafe are the space and
<>"#%{}|\^~[]`
and for these,
All unsafe characters must always be encoded within a URL.
RFC 1738 - A Gopher URL Format
Unsafe characters must always be encoded.
The characters listed as reserved are
;/?:@=&
and the conclusion is:
Thus, only alphanumerics,
RFC 1738 - A Gopher URL Format
Only alphanumerics,
the special characters "$-_.+!*'(),",
the unreserved special characters ($-_.+!*'(),),
and reserved characters used for their reserved purposes
and reserved characters used for their reserved purposes
may be used unencoded within a URL.
may be used without encoding.
So if the data contains reserved characters, I think encoding them is mandatory.
RFC 1738 has nothing like RFC 3986's provision that a scheme may permit the use of reserved characters.
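The RFC 1738 rules above can be sketched as a small classifier. The character lists are the ones quoted in this section; the constant names, the function name rfc1738_class, and the result symbols are made up for this example.

```ruby
# Sketch: classify a single character according to RFC 1738 section 2.2.
# All names here are hypothetical and exist only for this example.
RFC1738_UNSAFE   = %q( <>"#%{}|\\^~[]`).freeze
RFC1738_RESERVED = ';/?:@=&'.freeze
RFC1738_SPECIAL  = %q{$-_.+!*'(),}.freeze

def rfc1738_class(ch)
  return :unsafe   if RFC1738_UNSAFE.include?(ch)    # always encode
  return :reserved if RFC1738_RESERVED.include?(ch)  # raw only for its reserved purpose
  return :plain    if ch.match?(/\A[A-Za-z0-9]\z/) || RFC1738_SPECIAL.include?(ch)
  :no_graphic                                        # control or non-ASCII: always encode
end

p rfc1738_class(':')  # prints :reserved
p rfc1738_class('~')  # prints :unsafe
p rfc1738_class('$')  # prints :plain
```

Under this reading a colon that is data, not a delimiter, falls into the "encode" bucket, which is the conclusion drawn above.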
application/x-www-form-urlencoded
Specifications of application/x-www-form-urlencoded other than HTML 4.01.
No specification that defines application/x-www-form-urlencoded on its own exists yet.
application/x-www-form-urlencoded
It has been in use from RFC 1866 (HTML 2.0) through the HTML5 drafts.
http://www.wdic.org/w/WDIC/application/x-www-form-urlencoded
This Content-Type name is also used for TrackBack pings.
However, the "x-" prefix is a problem. To fix it, proposals to register application/www-form-urlencoded with IANA had been made before, and the draft was revived for HTML5 (I-D [hoehrmann-urlencoded-01]).
The application/www-form-urlencoded draft specifies 8bit data and, presumably because the encoding is fixed to UTF-8, declares the charset parameter invalid.
I looked at the 2011/03 draft, but
it does not look usable for the query component of a URI. It does not escape nearly enough.
A summary of what I have found so far.
Reserved characters (: and / and so on) should basically be encoded.
- RFC 1738: reserved characters are always encoded.
- RFC 3986: raw reserved characters may be used to represent data if the scheme specifically allows it.
HTML 4.01 refers to RFC 1738, so it always encodes.
If you go by RFC 3986, it is unclear how the http scheme defines the query.
But still...
I cannot see what harm a raw colon or slash in an http query would do.
For the sake of URI readability, I think the rules could be relaxed, at least for the http scheme. RFC 3986 makes that possible, after all.
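To make the idea concrete, here is a sketch of a more relaxed encoder for query values. It keeps the unreserved characters plus : / ? @ raw and encodes everything else, including & and =, which usually act as delimiters in form-style queries. The choice of which characters to keep is my own assumption, and encode_query_value is a hypothetical name. Whether the http scheme actually permits this is exactly the open question.

```ruby
# Sketch: encode a query value but leave ":", "/", "?" and "@" readable.
# encode_query_value is a hypothetical helper; the kept set is an assumption.
def encode_query_value(value)
  encoded = value.b.gsub(%r{[^A-Za-z0-9\-._~:/?@]}n) { |byte| format('%%%02X', byte.ord) }
  encoded.force_encoding(Encoding::US_ASCII)
end

puts encode_query_value('09:00')    # prints 09:00
puts encode_query_value('a&b=c d')  # prints a%26b%3Dc%20d
```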
Incidentally, Google seems to deliberately leave the colon (:) unencoded. Searching Google for a:b shows q=a:b in the browser's URL bar. With image search it becomes a%3Ab.
Characters to escape
A script that prints, out of all the ASCII symbols, only the characters that get escaped.
    #! /usr/bin/env ruby
    # -*- coding: utf-8 -*-

    ascii = []
    # All printable ASCII characters. Space (32) is excluded.
    (33..126).each do |i|
      ascii << i.chr
    end

    puts "* Characters that must be encoded"
    puts

    # Keep only the symbols.
    ascii.reject!{|c| c =~ /[a-zA-Z0-9]/ }
    puts "           all: " + ascii.join

    # RFC3986 unreserved characters
    unreserved = %q{-._~}
    # RFC2396 unreserved characters
    unrsvd2396 = unreserved + %q{!*'()} #'
    # RFC1738 unreserved characters
    unrsvd1738 = %q{-._!*'()$,+} #'
    # RFC3986 query characters. % is left out because it can only appear as part of a percent-encoding.
    query = %q{/?:@-._~!$&'()*+,;=} #'
    # ECMAScript encodeURI()
    encodeuri = %q{-._~:/?#@!$&'()*+,;=} #'

    # Everything that is not unreserved
    puts "       RFC3986: " + ascii.map {|c| unreserved.index(c).nil? ? c : ' ' }.join
    # Everything that is not unreserved in RFC2396. ECMAScript encodeURIComponent() escapes these.
    puts "       RFC2396: " + ascii.map {|c| unrsvd2396.index(c).nil? ? c : ' ' }.join
    # Everything that is not unreserved in RFC1738.
    puts "       RFC1738: " + ascii.map {|c| unrsvd1738.index(c).nil? ? c : ' ' }.join
    # ECMA encodeURI()
    puts "ECMA encodeURI: " + ascii.map {|c| encodeuri.index(c).nil? ? c : ' ' }.join
    # Characters that cannot be used in a query
    puts "     not query: " + ascii.map {|c| query.index(c).nil? ? c : ' ' }.join
    # Taken from Ruby URI::UNSAFE /[^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]/n
    rubysafe = %q{-_.!~*'();/?:@&=+$,[]} #'
    puts "rubyURI.escape: " + ascii.map {|c| rubysafe.index(c).nil? ? c : ' ' }.join
The result:
    * Characters that must be encoded

               all: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
           RFC3986: !"#$%&'()*+,  /:;<=>?@[\]^ `{|}
           RFC2396:  "#$%&    +,  /:;<=>?@[\]^ `{|}
           RFC1738:  "# %&        /:;<=>?@[\]^ `{|}~
    ECMA encodeURI:  "  %            < >  [\]^ `{|}
         not query:  "# %            < >  [\]^ `{|}
    rubyURI.escape:  "# %            < >   \ ^ `{|}
From top to bottom:
- All ASCII symbols.
- RFC3986: characters that should be encoded unless the scheme allows them, that is, everything that is not unreserved. This is what PHP rawurlencode() implements.
- RFC2396: obsolete, listed for reference. This is what ECMAScript encodeURIComponent() implements.
- RFC1738: characters that should be encoded.
- Characters that ECMAScript encodeURI() encodes.
- Characters that cannot be used in an RFC3986 query. These must be encoded in a query.
- Characters that Ruby's URI.escape() encodes by default (URI::UNSAFE).
Like ECMAScript's encodeURI(), Ruby's URI.escape() is presumably meant for encoding a whole URI, yet it does not encode []. Where does that rule come from? Being able to choose the character set freely is nice, but the default does not seem to have much use.