Web scraping means implementing a crawler that automatically collects information from the Web. To do that you need an HTTP client, an HTML parser, and a selector that searches the parsed tree and extracts the information you need. Common Lisp has several libraries for each of these; here I use Dexador as the HTTP client, Plump as the HTML/XML parser, and CLSS for CSS selectors. All of these libraries can be installed from Quicklisp.

(ql:quickload :dexador)
(ql:quickload :plump)
(ql:quickload :clss)

As an example, let's analyze this Reuters article: 堅調地合い、1万8000円へ戻りを試す展開に=来週の東京株式市場.
HTTP client: Dexador
First, fetch the HTML with the HTTP client. Dexador's get function does this.
(defparameter article-html
  (dex:get "http://jp.reuters.com/article/idJPL3N0U325520141219"))
dex:get returns multiple values: the fetched HTML string, the status code, a hash table of metadata (the response headers), the URI, and the stream.

"<!doctype html><html><head> <title> 堅調地合い、1万8000円へ戻りを試す展開に＝来週の東京株式市場 |ロイター</title>| ... (omitted) ... </html> "
200
#<HASH-TABLE :TEST EQUAL :COUNT 14 {1003F285C3}>
#<QURI.URI.URI-HTTP http://jp.reuters.com/article/idJPL3N0U325520141219>
#<SB-SYS:FD-STREAM for "socket 192.168.11.12:43208, peer: 52.222.193.218:80" {1003DD4B13}>
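Because dex:get returns multiple values, multiple-value-bind can capture the status and headers along with the body. The following is only a minimal sketch and not part of the original walkthrough: fetch-html is a hypothetical helper name, and it assumes Dexador's documented behavior of signaling dex:http-request-failed for 4xx/5xx responses and of keying the header hash table by lowercase strings.

```lisp
(ql:quickload :dexador)

;; FETCH-HTML is a hypothetical helper, not something the article defines.
(defun fetch-html (url)
  "Return the body, status code and Content-Type header for URL.
If the server answers with an error status, return NIL and that status."
  (handler-case
      (multiple-value-bind (body status headers) (dex:get url)
        ;; Assumption: header names are stored as lowercase strings.
        (values body status (gethash "content-type" headers)))
    (dex:http-request-failed (e)
      (values nil (dex:response-status e)))))
```

It could be called as (fetch-html "http://jp.reuters.com/article/idJPL3N0U325520141219"); the extra return values are simply ignored if only the body is needed.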
HTML parser: Plump
Next, parse the HTML string with Plump's parse function. It returns a CLOS object corresponding to the root of the tree.
(defparameter parse-tree (plump:parse article-html))
;; => #<PLUMP-DOM:ROOT {1006E77F53}>

Looking at this object's children:

(plump:children parse-tree)
;; #(#<PLUMP-DOM:COMMENT {1005D8C563}> #<PLUMP-DOM:TEXT-NODE {1005D8C853}>
;;   #<PLUMP-DOM:COMMENT {1005D8CF53}> #<PLUMP-DOM:TEXT-NODE {1005D8D253}>
;;   #<PLUMP-DOM:COMMENT {1005D8DB73}> #<PLUMP-DOM:TEXT-NODE {1005D8DE93}>
;;   #<PLUMP-DOM:COMMENT {1005D8E4A3}> #<PLUMP-DOM:TEXT-NODE {1005D8E773}>
;;   #<PLUMP-DOM:COMMENT {1005D8ECF3}> #<PLUMP-DOM:TEXT-NODE {1005D8F053}>
;;   #<PLUMP-DOM:DOCTYPE html> #<PLUMP-DOM:ELEMENT html {1005D8FDC3}>
;;   #<PLUMP-DOM:TEXT-NODE {1006274133}>)

Of these, the text-node objects are the ones that hold strings. A function that walks the tree and concatenates only the strings held by text-nodes looks like this:
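(defun node-text (node)
  ;; Collect the string of every text-node below NODE, in document order.
  (let ((text-list nil))
    (plump:traverse node
                    (lambda (node) (push (plump:text node) text-list))
                    :test #'plump:text-node-p)
    ;; PUSH builds the list in reverse, so restore the order before joining.
    (apply #'concatenate 'string (nreverse text-list))))

Writing it with plain recursion would probably take about the same number of lines, but since a traverse function is provided I used it.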
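For comparison, the plain recursive version mentioned above might look roughly like the sketch below. The name node-text-recursive is hypothetical; it relies only on Plump's predicates plump:text-node-p and plump:nesting-node-p and on plump:children, and it skips comments and other node types.

```lisp
(ql:quickload :plump)

;; NODE-TEXT-RECURSIVE is a hypothetical name for the recursive variant.
(defun node-text-recursive (node)
  "Concatenate the strings of all text-nodes at or below NODE."
  (cond
    ;; A text-node carries its string directly.
    ((plump:text-node-p node) (plump:text node))
    ;; Roots and elements have children: recurse into each of them.
    ((plump:nesting-node-p node)
     (with-output-to-string (out)
       (loop for child across (plump:children node)
             do (write-string (node-text-recursive child) out))))
    ;; Comments, doctypes and so on contribute nothing.
    (t "")))
```

Writing into a string stream also avoids passing one argument per text node to concatenate, which may matter on very large pages.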
CSS selector: CLSS
As with jQuery, you can pull subtrees out of the tree by specifying CSS selectors. For example, to take the first node with the ID articleText from the tree parsed by Plump:
(defparameter sub-tree
  (aref (clss:select "#articleText" parse-tree) 0))

Applying the node-text function defined earlier to this subtree gives the article body.
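One caveat with indexing the result by position: when a selector matches nothing, element 0 of the result is not a usable node (depending on how the result vector is allocated, aref may signal an error or return a meaningless value). A small guard, sketched here under the hypothetical name select-first, checks the length first and returns NIL instead.

```lisp
(ql:quickload '(:plump :clss))

;; SELECT-FIRST is a hypothetical helper, not part of CLSS.
(defun select-first (selector root)
  "Return the first node under ROOT matching SELECTOR, or NIL if none match."
  (let ((matches (clss:select selector root)))
    ;; LENGTH respects the fill pointer, so it reflects the real match count.
    (when (plusp (length matches))
      (aref matches 0))))
```

With this, (select-first "#articleText" parse-tree) yields either the node or NIL, which is easier to test for when a page layout changes.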
(node-text sub-tree)
;; "
;; ［東京　１９日　ロイター］ - 来週の東京株式市場は堅調な地合いが続く見通しだ。 (rest omitted)
;; "

The article title, genre, and so on can be fetched the same way.
(node-text (aref (clss:select ".article-headline" parse-tree) 0))
; => "堅調地合い、1万8000円へ戻りを試す展開に＝来週の東京株式市場"
(node-text (aref (clss:select ".article-section" parse-tree) 0))
; => "Markets"
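The three lookups can also be driven from a table of selectors. This is a rough sketch rather than the article's own code: *article-selectors*, extract-article and the field keywords are hypothetical names, and it uses plump:text on the matched element, which as far as I understand concatenates the text nodes beneath it much like node-text does.

```lisp
(ql:quickload '(:plump :clss))

;; The selectors are the ones used above; the field names are made up here.
(defparameter *article-selectors*
  '((:headline . ".article-headline")
    (:section  . ".article-section")
    (:body     . "#articleText")))

(defun extract-article (html)
  "Parse HTML and return a plist of field name and extracted text.
A field is NIL when its selector matches nothing."
  (let ((root (plump:parse html)))
    (loop for (field . selector) in *article-selectors*
          for matches = (clss:select selector root)
          collect field
          collect (when (plusp (length matches))
                    (plump:text (aref matches 0))))))
```

Calling (getf (extract-article article-html) :headline) would then return the headline string, and adding a field only means adding one entry to the table.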
Summary and notes
Looking at the actual page source, the article body is a jumble of div and span elements, so simple string pattern matching looks painful. Parsing the HTML into a tree makes it much easier to handle.
All it takes is looking up the class or ID in the browser's inspector and passing it to clss:select, so it is easy.
Reuters also provides sitemap XML files, so you can analyze them the same way as above and extract a list of URLs.
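As a sketch of the sitemap idea: assuming the file follows the standard sitemap format, where each page URL sits in a loc element, Plump can parse the XML and CLSS can select those elements by tag name. sitemap-urls is a hypothetical helper, and the sitemap's own URL is not given here, so it takes the XML as a string.

```lisp
(ql:quickload '(:plump :clss))

;; SITEMAP-URLS is a hypothetical helper; it assumes standard <loc> elements.
(defun sitemap-urls (xml)
  "Return a list with the text of every <loc> element in the string XML."
  (let ((root (plump:parse xml)))
    (map 'list
         (lambda (node)
           ;; Strip whitespace that may surround the URL inside the element.
           (string-trim '(#\Space #\Tab #\Newline #\Return)
                        (plump:text node)))
         (clss:select "loc" root))))
```

The string passed in would typically be the first value returned by dex:get for the sitemap file.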