HTML scraping in node.js: comparing three libraries that use jQuery syntax
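I wanted to scrape some HTML for code I'm writing for a certain event, and I wanted to write it in node. Last year I used jsdom, so I wondered what things look like now. When I looked into it, there turned out to be quite a few options. Here is a rundown.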
- jsdom
  - Probably the best known and the most full-featured. But it's slow.
- node-jquery
  - Has a "if you want jQuery, this is the one!" feel to it, but it isn't updated much. Its usage is the simplest of the lot, though.
  - Its dependencies aren't declared properly: it wouldn't run until I installed xmlhttprequest by hand as well.
- cheerio
  - A gutsy library in the spirit of "Too slow! I'll reimplement the jQuery syntax myself!"
- zombie
  - A library that simulates browser behavior, probably something like Mechanize. For my current purpose I need to handle the HTTP layer separately, so it doesn't fit this time.
- sqraper (added later)
  - GETs a page and scrapes it with jQuery. It has no mode for parsing a content body you pass in (you always have to give it a page URL), so it doesn't fit this time.
I wondered which one to pick, but since speed was what mattered most for now, I ran a quick benchmark. It parses an arbitrary HTML fragment*1 and pulls out a specific item by id using jQuery syntax, repeated 100 times.
The jQuery file passed to jsdom was prepared locally and loaded from there.
https://gist.github.com/3897132
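The gist above is the actual benchmark. As a rough sketch of the shape of such a measurement (not the gist's code), the loop below times 100 runs of an "extract the item with this id" function. So that it runs without installing anything, it uses a hypothetical regex-based stand-in, `extractTextById`, rather than a real parser; `bench` and `sampleHtml` are likewise names made up for illustration. In a real run you would pass in a function built on jsdom, node-jquery, or cheerio instead.

```javascript
// Hypothetical stand-in for a real HTML parser: finds the first element
// with the given id and returns its text. It is regex-based, assumes the
// id contains no regex metacharacters, and is only here to keep the
// sketch dependency-free.
function extractTextById(html, id) {
  const pattern = new RegExp('id="' + id + '"[^>]*>([^<]*)<');
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

// Runs `extract` `times` times against the same HTML and reports the
// total elapsed wall-clock time, in the same style as the output format
// used in this post.
function bench(label, extract, html, id, times) {
  let hits = 0;
  const start = Date.now();
  for (let i = 0; i < times; i++) {
    if (extract(html, id) !== null) {
      hits++;
    }
  }
  const elapsedMs = Date.now() - start;
  console.log(label + ':' + elapsedMs + ' in ' + times + ' times.');
  return { label: label, elapsedMs: elapsedMs, hits: hits };
}

// A tiny made-up fragment; the HTML used in the real benchmark is not shown.
const sampleHtml = '<div><p id="title">hello</p><p id="body">world</p></div>';
const result = bench('stand-in', extractTextById, sampleHtml, 'title', 100);
```

Keeping the parse inside the timed function matters here, because the cost being compared is parsing plus selection, not selection alone.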
Running it flat out on a MacBook Air gave something like this.
$ node parse_bench.js
jsdom:6734 in 100 times.
jQuery:4403 in 100 times.
cheerio:594 in 100 times.
{ err: null, results: [ 100, 100, 100 ] }
In short, cheerio is roughly 10 times faster!
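To put the "roughly 10 times" in numbers, here is a small calculation over the figures printed above. It assumes that the `jQuery` line in the output is the node-jquery run and that the figures are total elapsed times for 100 iterations. The unit is not stated in the output, but the ratios do not depend on it.

```javascript
// Totals copied from the benchmark output above (100 iterations each).
// Assumption: the "jQuery" label in the output corresponds to node-jquery.
const elapsed = { jsdom: 6734, 'node-jquery': 4403, cheerio: 594 };
const iterations = 100;

// How many times faster `candidate` is than `baseline`.
function speedup(baseline, candidate) {
  return elapsed[baseline] / elapsed[candidate];
}

for (const name of Object.keys(elapsed)) {
  const perParse = (elapsed[name] / iterations).toFixed(2);
  console.log(name + ': ' + perParse + ' per parse');
}

console.log('cheerio vs jsdom: ' + speedup('jsdom', 'cheerio').toFixed(1) + 'x');
console.log('cheerio vs node-jquery: ' + speedup('node-jquery', 'cheerio').toFixed(1) + 'x');
```

That works out to about 11.3x against jsdom and about 7.4x against node-jquery for this one small fragment.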
Summary
cheerio is a reimplementation of the jQuery syntax, so the feature you want (a selector, for example) may not be implemented. Read the reference carefully before you use it.
Put the other way around, as long as you stick to the supported syntax, cheerio is roughly 10 times faster, so it feels like the one to use. The code is also concise, and clean compared with the cluttered feel of jsdom.
node-jquery is poorly maintained and yet not especially fast, so it may be best avoided. If you want full-spec jQuery, with a gap of this size I think you can just use jsdom. If you do use jsdom, keep the jQuery file itself local and read it into a string beforehand.
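As a minimal sketch of the "read it into a string beforehand" advice, the helper below reads a local file once and serves later calls from memory, so the file read stays out of any per-page loop. The helper name `readSourceOnce` and the path in the usage comment are hypothetical.

```javascript
const fs = require('fs');

// Cache of file path -> file contents, so each file is read from disk
// at most once per process.
const sourceCache = new Map();

// Returns the contents of `filePath` as a UTF-8 string, reading the file
// only on the first call for that path.
function readSourceOnce(filePath) {
  if (!sourceCache.has(filePath)) {
    sourceCache.set(filePath, fs.readFileSync(filePath, 'utf8'));
  }
  return sourceCache.get(filePath);
}

// Usage (hypothetical path):
//   const jquerySource = readSourceOnce('./vendor/jquery.js');
// With the jsdom of this era the string could then be handed over with
// something like jsdom.env({ html: html, src: [jquerySource], done: callback });
// treat that call shape as an assumption and check the jsdom
// documentation for the version you use.
```

Because the cache holds on to the first read, edits made to the file while the process is running are not picked up, which is fine for a vendored library file.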
That said, parser speed probably varies a lot with the structure and size of the target HTML and with the jQuery selectors you use, so make sure to check against your own use case.
*1: For various reasons I can't show it here! It's on the small side.