Downloading files
I wrote a script that spits out sequentially numbered URLs, so I figured I might as well write a script that downloads the files at those URLs. It's not going well, though.
The urllib reference says:
urlretrieve( url[, filename[, reporthook[, data]]])
Copy a network object denoted by a URL to a local file, if necessary.
So I tried it, but this happily "downloads" 404 error pages too,
which means I never notice when the URL I specified is wrong. Not good.
For now, I dug through my references to see whether there's a way to tell if the file at a given URL actually exists.
```
#!/usr/bin/env python
# -*- coding: shift_jis -*-
import urllib

# Progress callback: urlretrieve passes (block count, block size, total size).
def feedback(count, size, total):
    print "count :%d" % count
    print "size  :%d" % size
    print "total :%d" % total

(file, header) = urllib.urlretrieve("http://www.google.co.jp/", "test.html", feedback)
print file
print header
```
feedback: download progress report (block count, block size, and total size)
file: the filename I specified
header: the HTTP response headers
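For anyone on Python 3, the same function lives at urllib.request.urlretrieve and returns the same (filename, headers) pair. Here's a minimal sketch, assuming Python 3, that shows when the progress hook fires. It uses a local file:// URL instead of a web address so it runs offline; the temp-file names are just for illustration.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

calls = []  # records every (count, block_size, total_size) the hook receives


def feedback(count, block_size, total_size):
    # urlretrieve calls this once before reading (count == 0) and once
    # after each block; total_size is -1 when there is no Content-Length.
    calls.append((count, block_size, total_size))
    print(f"count: {count}  size: {block_size}  total: {total_size}")


# A local file stands in for the web page so the sketch runs offline.
workdir = tempfile.mkdtemp()
source = Path(workdir, "source.html")
source.write_bytes(b"<html>hello</html>")
dest = os.path.join(workdir, "test.html")

filename, headers = urlretrieve(source.as_uri(), dest, feedback)
print(filename)                   # the path passed as the second argument
print(headers["Content-Length"])  # header lookup ignores name case
```

Note that the return value only tells you where the bytes went; nothing in it says whether the server was happy with the request.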
Maybe I can decide based on the response headers?
So I googled around and found something like this in someone's source code:
```
(tmp, headers) = urllib.urlretrieve("http://www.google.co.jp/", "test.html")
if str(headers).count("Content-Length") == 0:
    print "ERROR: File not found (404 error)"
```
It seems to count how many times Content-Length appears in the headers and treat a count of 0 as a 404 error.
But that means that if the response headers do include Content-Length, a nonexistent file would still get "downloaded".
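Whether a 404 reply carries Content-Length is up to the server, so here's a quick check of that worry. This is a sketch assuming Python 3: it spins up a throwaway local server (NotFoundHandler is a made-up name) that answers everything with http.server's stock error page, then applies the same "count Content-Length" test to the reply.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical demo handler: answers every GET with a 404 page."""

    def do_GET(self):
        # send_error() writes an HTML error page plus a Content-Length header.
        self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/aa/bb/cc.gif")
resp = conn.getresponse()
body = resp.read()
status = resp.status
content_length = resp.getheader("Content-Length")
# The heuristic from the post: "Content-Length present" means "file exists".
heuristic_says_ok = str(resp.headers).count("Content-Length") != 0
conn.close()
server.shutdown()
server.server_close()

print(status)             # 404
print(heuristic_says_ok)  # True: the heuristic is fooled
```

The status code is the thing to look at; header presence says nothing about whether the file exists.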
Well, let's try it anyway.
```
#!/usr/bin/env python
# -*- coding: shift_jis -*-
import urllib

# Request a file that does not exist on the server.
(tmp, headers) = urllib.urlretrieve("http://www.google.co.jp/aa/bb/cc.gif", "test.gif")
print str(headers).count("Content-Length")
print headers
if str(headers).count("Content-Length") == 0:
    print "ERROR: File not found (404 error)"
else:
    print "OK"
```
Sure enough, it printed "ERROR: File not found (404 error)".
But looking closely, Google's response headers contain "Content-length: 1223".
In other words, counting "Length" doesn't match "length", so the 404 message only appeared by accident. No good after all.
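HTTP header names are case-insensitive, so a plain substring count on the header text is the wrong tool. Here's a small sketch, assuming Python 3, that rebuilds a header block with the "Content-length: 1223" spelling seen above (the Content-Type line is made up) and compares the post's count with a case-insensitive lookup via email.message.Message.

```python
from email.message import Message

# Header block spelled the way Google's reply was: lowercase "length".
headers = Message()
headers["Content-Type"] = "text/html"
headers["Content-length"] = "1223"

raw = str(headers)

# The heuristic from the post: a case-sensitive substring count.
print(raw.count("Content-Length"))  # 0, so it reports a "404"

# Message lookups ignore header-name case, as HTTP intends.
print("Content-Length" in headers)  # True
print(headers["CONTENT-LENGTH"])    # 1223
```

Fixing the case only makes the heuristic fail the other way, though, since it still can't tell an error page from a real file.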
Need to study more.
Addendum: the urllib2 module apparently detects 404s properly.
```
>>> import urllib
>>> urllib.urlopen("http://www.google.co.jp/aa/bb/cc.gif")
<addinfourl ...>
>>> import urllib2
>>> urllib2.urlopen("http://www.google.co.jp/aa/bb/cc.gif")
Traceback (most recent call last):
  File "<...>", line 1, in -toplevel-
    urllib2.urlopen("http://www.google.co.jp/aa/bb/cc.gif")
  File "c:\Python23\lib\urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "c:\Python23\lib\urllib2.py", line 326, in open
    '_open', req)
  File "c:\Python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "c:\Python23\lib\urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "c:\Python23\lib\urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "c:\Python23\lib\urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "c:\Python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "c:\Python23\lib\urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
>>>
```
It raises an exception, as it should. Looks like urllib2 is the better choice.
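In Python 3, urllib2's urlopen became urllib.request.urlopen, and it raises urllib.error.HTTPError for a 404 just like the session above (in current CPython, urllib.request.urlretrieve is built on urlopen, so it raises too). Below is a sketch, assuming Python 3, of a download helper that relies on that exception and only writes a file on success. The server, its paths, and the download() name are all hypothetical, set up locally so the example runs offline.

```python
import os
import shutil
import tempfile
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /ok.gif exists, every other path is a 404."""

    def do_GET(self):
        if self.path == "/ok.gif":
            body = b"GIF89a"
            self.send_response(200)
            self.send_header("Content-Type", "image/gif")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Bypass any proxy settings so requests reach the local server directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def download(url, filename):
    """Save url to filename; return False and write nothing on an HTTP error."""
    try:
        # opener.open() raises HTTPError for 4xx/5xx before the file is opened.
        with opener.open(url) as resp, open(filename, "wb") as out:
            shutil.copyfileobj(resp, out)
    except urllib.error.HTTPError as err:
        print(f"ERROR: {url}: HTTP {err.code} {err.reason}")
        return False
    return True


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
workdir = tempfile.mkdtemp()
ok_path = os.path.join(workdir, "ok.gif")
missing_path = os.path.join(workdir, "cc.gif")

ok = download(base + "/ok.gif", ok_path)
missing = download(base + "/aa/bb/cc.gif", missing_path)

server.shutdown()
server.server_close()
```

Because the exception fires before open() runs, a bad URL in a sequential-download loop leaves no junk file behind and shows up as a printed error instead.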
Reference:
urllib.urlopen() fails to raise exception
http://mail.python.org/pipermail/python-bugs-list/2004-July/023990.html