This is day 6 of the クローラー/スクレイピング (Crawler/Scraping) Advent Calendar 2014. My 全部俺 (All Me) Advent Calendar is also under way.
By request, this post covers scraping with Mechanize. Mechanize is a scraping helper that excels at interactive processing. "Interactive" here means, for example, a site that you log in to with an ID and password: you fill in each field and then move on to the next page.
Mechanize sample source
The sample below retrieves sales figures from the Amazon Associates site.
    require 'mechanize'

    uri = URI.parse('https://affiliate.amazon.co.jp/')
    agent = Mechanize.new
    agent.user_agent_alias = 'Mac Safari'

    # Open the login page.
    page = agent.get(uri)

    # Find the sign-in form by name, fill in the credentials, and submit it.
    next_page = page.form_with(:name => 'sign_in') do |form|
      form.username = 'your_username'
      form.password = 'your_password'
    end.submit

    # Extract the sales figure from the page that follows the login.
    puts next_page.search('//*[@id="mini-report"]/div[5]/div[2]').text
First, create a Mechanize object and set attributes such as the user agent. Next, specify the target URI and open the page. Once the page is open, look for the form to fill in. There are several ways to find a form; this sample looks it up by form name. It then fills in the user name and password fields of that form and submits it.
On the next page, the target information is specified with an XPath expression and the data is extracted. Mechanize itself uses Nokogiri internally to scrape the data.
What Mechanize is good at and what it is not
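Mechanize is fundamentally about interactive processing, so you have to describe the required steps for each page. That makes it extremely effective for pages that require authentication or a POST submission. On the other hand, it is poorly suited to crawling an entire site and collecting every link. That is not impossible, but the amount of code you have to write grows and the approach becomes inefficient.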
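To give a sense of how much extra code a site-wide crawl needs, here is a minimal sketch of a breadth-first walk over same-host links. The names crawl_site and mechanize_link_fetcher are hypothetical helpers written for this example, not part of Mechanize; the traversal takes the link fetcher as an argument so that the queueing logic can be tried without network access.

```ruby
require 'set'
require 'uri'

# Breadth-first walk over links on the same host as start_url.
# fetch_links is any callable that takes a URL string and returns the
# href values found on that page.
def crawl_site(start_url, fetch_links, max_pages: 100)
  host    = URI(start_url).host
  visited = Set.new
  queue   = [start_url]

  until queue.empty? || visited.size >= max_pages
    url = queue.shift
    next if visited.include?(url)
    visited << url

    fetch_links.call(url).each do |href|
      next if href.nil?
      begin
        absolute = URI.join(url, href)
      rescue URI::Error
        next # skip malformed hrefs
      end
      next unless absolute.host == host # stay on the same site
      absolute.fragment = nil           # page#a and page#b are one page
      queue << absolute.to_s unless visited.include?(absolute.to_s)
    end
  end

  visited.to_a
end

# Hypothetical Mechanize-backed fetcher. The gem is loaded lazily so the
# traversal above can be exercised without it.
def mechanize_link_fetcher
  require 'mechanize'
  agent = Mechanize.new
  lambda do |url|
    begin
      page = agent.get(url)
      page.respond_to?(:links) ? page.links.map(&:href) : []
    rescue Mechanize::Error
      [] # treat failed requests as pages without links
    end
  end
end

# Usage against a real site (performs network requests):
#   crawl_site('https://example.com/', mechanize_link_fetcher, max_pages: 20)
```

Even this small version has to handle visited-page bookkeeping, relative URLs, off-site links, and failed requests by hand, and it still ignores politeness delays and robots.txt.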
Closing thoughts
Mechanize also gets brief coverage in Rubyによるクローラー開発技法. Because the book does not go into much detail, I get requests to write more about it, and I am regularly reminded how enduringly popular it is. This post only scratches the surface, so I plan to dig deeper at some point.

Rubyによるクローラー開発技法 巡回・解析機能の実装と21の運用例
- Authors: るびきち, 佐々木拓郎
- Publisher: SB Creative (SBクリエイティブ)
- Release date: 2014/08/25
- Format: large-format book