requests, the HTTP module for Python, is almost too convenient
(Addendum, 2015/04/19) The code here is no longer compatible with current versions of requests, so please read it alongside the following post.
Keeping up with updates to Python's requests library
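I tried the requests module, which you could call a heavily improved and strengthened "urllib2", and it was so convenient that I was surprised, so here is a report.
Here is how much simpler the tasks that were tedious with urllib2 become.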
Cookie handling
To make HTTP requests while keeping a session alive, urllib2 requires the following steps:
- create a cookielib.CookieJar instance,
- pass it to the urllib2.HTTPCookieProcessor constructor,
- create an OpenerDirector instance with urllib2.build_opener,
- call that instance's add_handler() with the urllib2.HTTPCookieProcessor instance created in step 2. Everything up to here is just preparation.
- create a urllib2.Request instance,
- URL-encode the HTTP parameters yourself with urllib.quote or similar,
- pass the Request object to OpenerDirector.open.
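For comparison, the steps above can be sketched with the Python 3 standard library, where cookielib and urllib2 were renamed to http.cookiejar and urllib.request. This is a minimal sketch, not the code from the earlier post: the URL and form fields are placeholders, the handler is passed straight to build_opener instead of going through add_handler(), and nothing is actually sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Preparation: a cookie jar wired into an opener, so cookies set by one
# response are sent back on later requests made through the same opener.
jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The parameters have to be URL-encoded by hand.
# Placeholder values, not real credentials.
form = {'name': 'your user id', 'password': 'your password'}
body = urllib.parse.urlencode(form).encode('ascii')

# Passing data makes this a POST request. The URL is a placeholder.
req = urllib.request.Request('https://example.com/login', data=body)

# Sending it would be: response = opener.open(req)
print(req.get_method(), body)
```

Even in this shortened form, the cookie jar, the opener and the encoding each need their own line of setup, which is the overhead a requests session hides.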
I previously wrote an entry about logging in to Hatena with urllib2; the same code written with requests looks like this.
    import requests

    # Create a requests.Session instance
    s = requests.session()

    # Pass a dict representing the HTTP parameters
    params = {
        'name': 'your user id',
        'password': 'your password',
    }
    r = s.post('https://www.hatena.ne.jp/login', params=params)
    print r.text
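Easy, isn't it? Or rather, urllib2 is the one that makes no sense. The string returned by .text is a unicode instance whose character encoding has been determined from the HTTP headers or by the chardet module, so none of that has to be handled by hand.
Furthermore, when the response is a JSON string, accessing .json returns the response deserialized into a dict instance. Convenient.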
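To show what that saves, here is a rough sketch of doing the same work by hand with the Python 3 standard library: read the charset from the Content-Type header, decode the bytes, then parse the JSON. The helper name decode_body, the sample body, and the UTF-8 fallback are assumptions made for illustration; requests itself falls back to chardet detection, which this sketch does not reproduce.

```python
import json
from email.message import Message


def decode_body(body, content_type, fallback='utf-8'):
    """Decode raw response bytes using the charset named in Content-Type."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    charset = msg.get_content_charset() or fallback
    return body.decode(charset)


# A hypothetical response: a JSON body encoded as Shift_JIS.
raw = '{"text": "こんにちは"}'.encode('shift_jis')
text = decode_body(raw, 'application/json; charset=Shift_JIS')
data = json.loads(text)
print(data['text'])
```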
OAuth support
requests supports OAuth out of the box. The official documentation includes code for integrating with GitHub and Twitter, so the library seems to have been designed with easy use of other services' APIs in mind.
    import requests
    from requests.auth import OAuth1

    url = u'https://api.twitter.com/1/statuses/home_timeline.json'

    client_key = u'consumer key'
    client_secret = u'consumer secret'
    resource_owner_key = u'access token'
    resource_owner_secret = u'access secret'

    queryoauth = OAuth1(client_key, client_secret,
                        resource_owner_key, resource_owner_secret,
                        signature_type='query')

    # Print the text of each status in the home timeline
    for s in requests.get(url, auth=queryoauth).json:
        print s['text']
Because .iter_lines() lets you read the response one line at a time, it is also easy to write code that stays connected to the server and processes data as it arrives, as with the Streaming API.
    import json

    url = 'https://userstream.twitter.com/2/user.json'
    r = requests.get(url, auth=queryoauth, prefetch=False)
    for line in r.iter_lines():
        # Skip empty lines
        if line == '':
            continue
        print json.loads(line)
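The line-by-line pattern can be tried without a network connection. The following Python 3 sketch assumes each non-empty line holds one JSON object; the helper name iter_events and the sample lines are made up for illustration, and the list stands in for the iterator returned by r.iter_lines().

```python
import json


def iter_events(lines):
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        # Blank lines (for example keep-alives) carry no data; skip them.
        if not line.strip():
            continue
        yield json.loads(line)


# Hypothetical sample standing in for r.iter_lines().
sample = [b'{"text": "first"}', b'', b'{"text": "second"}', b'\r\n']
events = list(iter_events(sample))
for event in events:
    print(event['text'])
```

Because iter_events is a generator, each object is handed to the caller as soon as its line arrives, which is what makes the pattern suitable for a connection that never closes.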
Asynchronous requests
In addition, by using gevent, a non-blocking I/O library, you can also make asynchronous requests. This feature has been split out into a separate module called GRequests.
    # coding: utf-8
    import grequests
    import urllib
    import time

    # URLs to check
    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com',
    ]

    # Synchronous URL check
    class O(urllib.FancyURLopener, object):
        version = 'alternative user-agent'

    o = O()

    def sync():
        for u in urls:
            conn = o.open(u)
            assert conn.getcode() == 200

    # Asynchronous URL check
    def async():
        rs = (grequests.get(u) for u in urls)
        g = grequests.imap(rs)
        for r in g:
            assert r.status_code == 200

    # Timing function
    def measure(fn):
        s = time.time()
        fn()
        e = time.time()
        return e-s

    # Measure the speedup over 10 runs (the first 3 iterations are ignored)
    print 'sync(sec)\tasync(sec)\tsync/async'
    for n in range(13):
        if n >= 3:
            s = measure(sync)
            a = measure(async)
            print '%.2f\t%.2f\t%.2f' % (s, a, s/a)

    """
    $ python g.py
    sync(sec)  async(sec)  sync/async
    2.47       1.86        1.32
    2.53       1.90        1.33
    2.55       1.96        1.30
    2.52       1.85        1.36
    2.52       2.00        1.26
    2.57       1.83        1.40
    2.62       1.99        1.32
    2.70       1.98        1.36
    2.50       1.80        1.39
    2.56       1.85        1.38
    """
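The shape of that benchmark can be reproduced offline. Note that this Python 3 sketch swaps in a different technique: it uses threads from the standard library's concurrent.futures instead of gevent and GRequests, and replaces each HTTP request with a fixed sleep. The URLs, the 0.05 second delay, and the function names are all hypothetical, so the timings say nothing about real servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical targets; nothing is actually fetched.
urls = ['http://example.com/%d' % i for i in range(5)]


def fake_fetch(url, delay=0.05):
    """Stand-in for an HTTP request: wait, then report a 200 status."""
    time.sleep(delay)
    return 200


def check_sequential():
    # Each wait starts only after the previous one has finished.
    return [fake_fetch(u) for u in urls]


def check_concurrent():
    # One worker per URL, so all the waits overlap.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fake_fetch, urls))


def measure(fn):
    """Return (elapsed seconds, result) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


seq_time, seq_result = measure(check_sequential)
con_time, con_result = measure(check_concurrent)
print('sequential %.2fs, concurrent %.2fs' % (seq_time, con_time))
```

The gain comes from overlapping the time spent waiting on I/O, which is the same reason the GRequests version above finishes sooner than the synchronous loop.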
It also quietly takes care of things such as:
- following redirects automatically
- decompressing gzip-compressed responses automatically
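As a rough idea of what the automatic gzip handling replaces, the sketch below decompresses a body by hand based on its Content-Encoding value, using only the Python 3 standard library. The helper name decode_content and the sample body are assumptions for illustration, not part of requests.

```python
import gzip


def decode_content(body, content_encoding):
    """Return the body, decompressed if it was sent gzip-compressed."""
    if content_encoding.lower() == 'gzip':
        return gzip.decompress(body)
    return body


# Hypothetical compressed response body.
original = b'hello, requests' * 10
compressed = gzip.compress(original)
print(len(compressed), len(decode_content(compressed, 'gzip')))
```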
With this much available, it looks like you could build something like a crawler framework just by combining it with pyquery, BeautifulSoup, or mecab.