Saving RSS to MySQL with PHP
Fetch RSS over HTTP, parse each item, and write it to the DB
Run the above periodically with cron
Programs to create
A class that fetches and parses RSS
A class that writes to the DB
A script to run from cron
Fetching RSS
Pass the RSS URL to PHP's file_get_contents to fetch the feed
Continue processing only when the status code is 200
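The status check described above can be sketched as follows. This is only an illustration under stated assumptions: `lastHttpStatus` and `fetchFeed` are hypothetical helper names, not part of the classes in this article. When redirects are followed, `$http_response_header` holds the headers of every response, so the sketch looks at the last status line rather than the first.

```php
<?php
// Hypothetical helper (not part of the original classes): returns the final
// HTTP status code found in the header lines PHP collects for an HTTP request.
function lastHttpStatus(array $headers)
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK"; redirects add more than one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int)$m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetches a URL and returns the body,
// or null when the request fails or the final status is not 200.
function fetchFeed($url)
{
    $body = @file_get_contents($url);
    // $http_response_header is populated in the local scope by the HTTP wrapper.
    $headers = isset($http_response_header) ? $http_response_header : array();
    if ($body === false || lastHttpStatus($headers) !== 200) {
        return null;
    }
    return $body;
}
```

Keeping the header parsing in its own function means it can be checked without any network access.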
RSS comes in several flavors, such as RSS1.0, RSS2.0, and Atom, and the element names differ from format to format, so each format needs its own Parser
Tag | RSS1.0 | RSS2.0 | Atom |
---|---|---|---|
Entry element | item | item (under channel) | entry |
Title | title | title | title |
Link | link | link | link (href attribute) |
Description | description | description | content |
Date | dc:date | pubDate | issued (Atom 0.3), published (Atom 1.0) |
Each format looks like the following
RSS1.0(RDF)
<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:hatena="http://www.hatena.ne.jp/info/xmlns#"
  xmlns:media="http://search.yahoo.com/mrss"
  xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
  xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
  <channel rdf:about="http://b.hatena.ne.jp/hotentry">
    <title>はてなブックマーク - 人気エントリー</title>
    <link>http://b.hatena.ne.jp/hotentry</link>
    <atom10:link rel="self" type="application/rdf+xml" href="http://feeds.feedburner.com/hatena/b/hotentry" xmlns:atom10="http://www.w3.org/2005/Atom" />
    <atom10:link rel="hub" href="http://pubsubhubbub.appspot.com/" xmlns:atom10="http://www.w3.org/2005/Atom" />
    <feedburner:info uri="hatena/b/hotentry" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" />
    <description>最近の人気エントリー</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://anond.hatelabo.jp/20140516005154" />
        <rdf:li rdf:resource="http://www.cinematoday.jp/movie/T0019009" />
        ・・・
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://anond.hatelabo.jp/20140516005154">
    <title>文系学部って必要なの？</title>
    <link>http://anond.hatelabo.jp/20140516005154</link>
    <content:encoded>・・・</content:encoded>
    <dc:date>2014-05-17T12:27:53+09:00</dc:date>
    <dc:subject>学び</dc:subject>
    <hatena:bookmarkcount>68</hatena:bookmarkcount>
    <description>もうちょっと詳しく言うと、...</description>
  </item>
  <item rdf:about="http://www.cinematoday.jp/movie/T0019009">
    <title>・・・ - シネマトゥデイ</title>
    ・・・
RSS2.0
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:hatena="http://www.hatena.ne.jp/info/xmlns#"
  xmlns:media="http://search.yahoo.com/mrss"
  xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
  xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
  <channel>
    <title>はてなブックマーク - 人気エントリー</title>
    <link>http://b.hatena.ne.jp/hotentry</link>
    <description>最近の人気エントリー</description>
    <item>
      <title>文系学部って必要なの？</title>
      <link>http://anond.hatelabo.jp/20140516005154</link>
      <category>学び</category>
      <content:encoded>・・・</content:encoded>
      <guid isPermaLink="true">http://anond.hatelabo.jp/20140516005154</guid>
      <hatena:bookmarkcount>63</hatena:bookmarkcount>
      <pubDate>Sat, 17 May 2014 12:27:53 +0900</pubDate>
      <description>もうちょっと詳しく言うと、...</description>
    </item>
    <item>
      <title>・・・ - シネマトゥデイ</title>
      ・・・
Atom
<?xml version="1.0" encoding="UTF-8" ?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:hatena="http://www.hatena.ne.jp/info/xmlns#"
  xmlns:media="http://search.yahoo.com/mrss"
  xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
  xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
  <title type="text/plain">はてなブックマーク - 人気エントリー</title>
  <link rel="alternate" type="text/html" href="http://b.hatena.ne.jp/hotentry" />
  <author>
    <name></name>
  </author>
  <tagline type="text/html" mode="escaped">最近の人気エントリー</tagline>
  <entry>
    <title type="text/plain">文系学部って必要なの？</title>
    <link rel="alternate" type="text/html" href="http://anond.hatelabo.jp/20140516005154" />
    <content type="text/html" mode="escaped">もうちょっと詳しく言うと、...</content>
    <content:encoded>・・・</content:encoded>
    <hatena:bookmarkcount>68</hatena:bookmarkcount>
    <id>http://anond.hatelabo.jp/20140516005154</id>
    <issued>2014-05-17T12:27:53+09:00</issued>
    <modified>2014-05-17T12:27:53+09:00</modified>
  </entry>
  <entry>
    <title type="text/plain">・・・ - シネマトゥデイ</title>
    ・・・
Parsing RSS
Convert the XML into a form that is easy to work with using simplexml_load_string
Have RSSParser hand each feed to the parser that matches its format
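As a small standalone illustration of that format check, the sketch below loads a feed with `simplexml_load_string` and maps the root element name to a format label. `detectFeedType` and the label strings are hypothetical names chosen for this example. `getName()` returns the local name, so the `rdf:RDF` root of an RSS1.0 feed is reported as `RDF`.

```php
<?php
// Hypothetical helper: maps the root element of a feed to a format label,
// or returns null when the input is not well-formed XML or not a known feed.
function detectFeedType($xmlString)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return null;
    }
    switch ($xml->getName()) {
        case 'RDF':
            return 'rss1.0';
        case 'rss':
            return 'rss2.0';
        case 'feed':
            return 'atom';
        default:
            return null;
    }
}
```

For example, `detectFeedType('<rss version="2.0"><channel/></rss>')` returns `'rss2.0'`.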
RSSParser.php
<?php
require_once(Common::ROOT_DIR . '/parser/RSS10Parser.php');
require_once(Common::ROOT_DIR . '/parser/RSS20Parser.php');
require_once(Common::ROOT_DIR . '/parser/AtomParser.php');

class RSSParser
{
    private $rss10Parser;
    private $rss20Parser;
    private $atomParser;
    private $items;

    public function __construct()
    {
        $this->rss10Parser = new RSS10Parser();
        $this->rss20Parser = new RSS20Parser();
        $this->atomParser = new AtomParser();
        $this->items = array();
    }

    public function parse($url)
    {
        // Start from an empty list so items of a previously parsed feed are not returned again
        $this->items = array();

        $content = file_get_contents($url);

        // Do not process anything unless the status code is 200
        // (checked before parsing, so an error page is never fed to the XML parser)
        $statusCode = isset($http_response_header[0]) ? $http_response_header[0] : '';
        if ($content === false || strpos($statusCode, '200') === false) {
            fputs(STDERR, "Unexpected status code: $statusCode\n");
            return $this->items;
        }

        $rss = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA);
        if ($rss === false) {
            fputs(STDERR, "Failed to parse XML: $url\n");
            return $this->items;
        }

        // The root element name identifies the format
        $type = $rss->getName();
        if ($type == 'RDF') {
            foreach ($rss->item as $item) {
                array_push($this->items, $this->rss10Parser->parse($item));
            }
        } else if ($type == 'rss') {
            foreach ($rss->channel->item as $item) {
                array_push($this->items, $this->rss20Parser->parse($item));
            }
        } else if ($type == 'feed') {
            foreach ($rss->entry as $item) {
                array_push($this->items, $this->atomParser->parse($item));
            }
        } else {
            fputs(STDERR, "Unexpected type: $type\n");
        }
        return $this->items;
    }
}
RSS10Parser.php
<?php
require_once(Common::ROOT_DIR . '/parser/AbstractParser.php');

class RSS10Parser extends AbstractParser
{
    // Namespace URI of the Dublin Core elements (dc:date and so on)
    const DC_NAMESPACE = 'http://purl.org/dc/elements/1.1/';

    public function getTitle($item)
    {
        return $item->title;
    }

    public function getLink($item)
    {
        return $item->link;
    }

    public function getDescription($item)
    {
        if (isset($item->description)) {
            return $item->description;
        } else {
            return null;
        }
    }

    public function getDate($item)
    {
        // dc:date lives in a separate namespace, so it is read through children()
        if (isset($item->children(self::DC_NAMESPACE)->date)) {
            return $item->children(self::DC_NAMESPACE)->date;
        } else {
            return null;
        }
    }
}
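The reason `getDate` goes through `children()` can be seen in a small standalone sketch. The feed below is made-up sample data with an `example.com` URL. Plain property access only sees elements without a namespace prefix, so `dc:date` has to be requested by its namespace URI.

```php
<?php
// Made-up RSS1.0 fragment used only for this illustration.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.com/1">
    <title>Example</title>
    <link>http://example.com/1</link>
    <dc:date>2014-05-17T12:27:53+09:00</dc:date>
  </item>
</rdf:RDF>
XML;

$rdf = simplexml_load_string($xml);
$item = $rdf->item[0];

// Plain property access does not see the prefixed dc:date element.
$plain = (string)$item->date;

// children($namespaceUri) exposes the elements that belong to that namespace.
$dcDate = (string)$item->children('http://purl.org/dc/elements/1.1/')->date;
```

Here `$plain` is an empty string, while `$dcDate` holds `2014-05-17T12:27:53+09:00`.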
RSS20Parser.php
<?php
require_once(Common::ROOT_DIR . '/parser/AbstractParser.php');

class RSS20Parser extends AbstractParser
{
    public function getTitle($item)
    {
        return $item->title;
    }

    public function getLink($item)
    {
        return $item->link;
    }

    public function getDescription($item)
    {
        if (isset($item->description)) {
            return $item->description;
        } else {
            return null;
        }
    }

    public function getDate($item)
    {
        if (isset($item->pubDate)) {
            return $item->pubDate;
        } else {
            return null;
        }
    }
}
AtomParser.php
<?php
require_once(Common::ROOT_DIR . '/parser/AbstractParser.php');

class AtomParser extends AbstractParser
{
    public function getTitle($item)
    {
        return $item->title;
    }

    public function getLink($item)
    {
        // Atom keeps the URL in the href attribute; the method is attributes(), not attribute()
        return $item->link->attributes()->href;
    }

    public function getDescription($item)
    {
        if (isset($item->content)) {
            return $item->content;
        } else {
            return null;
        }
    }

    public function getDate($item)
    {
        // Atom 1.0 uses <published>; Atom 0.3 (as in the sample above) uses <issued>
        if (isset($item->published)) {
            return $item->published;
        } else if (isset($item->issued)) {
            return $item->issued;
        } else {
            return null;
        }
    }
}
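Unlike RSS, Atom puts the URL in an attribute of `link`, not in its text. The sketch below, using made-up sample data, shows that the element text is empty and that the URL can be read either with `attributes()` or with array-style access.

```php
<?php
// Made-up Atom 0.3 fragment used only for this illustration.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <entry>
    <title type="text/plain">Example</title>
    <link rel="alternate" type="text/html" href="http://example.com/1" />
    <issued>2014-05-17T12:27:53+09:00</issued>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$entry = $feed->entry[0];

// The element itself has no text content.
$text = (string)$entry->link;

// The URL is stored in the href attribute.
$viaAttributes = (string)$entry->link->attributes()->href;

// Array-style access is an equivalent shorthand.
$viaArrayAccess = (string)$entry->link['href'];
```

Both `$viaAttributes` and `$viaArrayAccess` hold `http://example.com/1`, and `$text` is an empty string.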
Abstract class for the parsers: AbstractParser.php
<?php
abstract class AbstractParser
{
    abstract protected function getTitle($item);
    abstract protected function getLink($item);
    abstract protected function getDescription($item);
    abstract protected function getDate($item);

    // Builds one flat array per feed item; missing values become empty strings
    public function parse($item)
    {
        $items = array();
        $items['title'] = (string)$this->getTitle($item);
        $items['link'] = (string)$this->getLink($item);
        $items['description'] = (string)$this->getDescription($item);
        $items['date'] = (string)$this->getDate($item);
        return $items;
    }
}
Writing to the DB
Write each parsed item to the DB
<?php
class DBManager
{
    private $pdo;

    public function __construct($dbname, $host, $user, $pass)
    {
        try {
            $this->pdo = new PDO(
                "mysql:dbname=$dbname;host=$host",
                $user,
                $pass,
                array(
                    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
                    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
                    PDO::ATTR_EMULATE_PREPARES => false,
                    PDO::MYSQL_ATTR_READ_DEFAULT_FILE => '/etc/my.cnf',
                )
            );
            $this->pdo->query("SET NAMES utf8");
        } catch (PDOException $e) {
            die($e->getMessage());
        }
    }

    public function registerItems($item, $rssUrl)
    {
        if ($this->itemExists($item['link'])) {
            // Update the row if it already exists
            $this->updateItems($item, $rssUrl);
        } else {
            $this->insertItems($item, $rssUrl);
        }
    }

    public function itemExists($link)
    {
        $stmt = $this->pdo->prepare(implode(' ', array(
            'SELECT link',
            'FROM items',
            'WHERE link = ?',
            'LIMIT 1',
        )));
        $stmt->execute(array($link));
        return (bool)$stmt->fetch();
    }

    public function insertItems($item, $rssUrl)
    {
        $this->pdo->beginTransaction();
        try {
            $stmt = $this->pdo->prepare(implode(' ', array(
                'INSERT',
                'INTO items(link, title, description, date, rss_url, created, modified)',
                'VALUES (?, ?, ?, ?, ?, NOW(), NOW())',
            )));
            $stmt->execute(array(
                $item['link'],
                $item['title'],
                $item['description'],
                $item['date'],
                $rssUrl));
            $id = $this->pdo->lastInsertId();
            $this->pdo->commit();
            return $id;
        } catch (Exception $e) {
            $this->pdo->rollBack();
            throw $e;
        }
    }

    public function updateItems($item, $rssUrl)
    {
        $this->pdo->beginTransaction();
        try {
            $stmt = $this->pdo->prepare(implode(' ', array(
                'UPDATE',
                'items SET title=?, description=?, date=?, rss_url=?, modified=NOW()',
                'WHERE link = ?'
            )));
            $stmt->execute(array(
                $item['title'],
                $item['description'],
                $item['date'],
                $rssUrl,
                $item['link']));
            // lastInsertId() is meaningless after an UPDATE, so return the affected row count
            $count = $stmt->rowCount();
            $this->pdo->commit();
            return $count;
        } catch (Exception $e) {
            $this->pdo->rollBack();
            throw $e;
        }
    }
}
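The parsers pass the date through as the feed wrote it, for example `Sat, 17 May 2014 12:27:53 +0900` for RSS2.0 or `2014-05-17T12:27:53+09:00` for RSS1.0 and Atom. The schema of the `items` table is not shown here; if the `date` column is a MySQL `DATETIME` (an assumption), the value may need to be normalized before it is bound. A minimal sketch, where `normalizeFeedDate` is a hypothetical helper and UTC is an arbitrary choice of storage timezone:

```php
<?php
// Hypothetical helper: converts a feed date string to "Y-m-d H:i:s" in UTC,
// or returns null when the value is empty or cannot be parsed.
function normalizeFeedDate($value)
{
    $value = trim((string)$value);
    if ($value === '') {
        // new DateTime('') would silently mean "now", so handle it explicitly.
        return null;
    }
    try {
        $date = new DateTime($value);
    } catch (Exception $e) {
        return null;
    }
    $date->setTimezone(new DateTimeZone('UTC'));
    return $date->format('Y-m-d H:i:s');
}
```

With this helper, both sample date strings above normalize to `2014-05-17 03:27:53`.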
cron execution script
Create a script to run periodically: rss2db.php
<?php
// Note: the Common class (ROOT_DIR and the DB_* constants) must already be loaded at this point;
// its definition is not shown in this article.
require_once(Common::ROOT_DIR . '/DBManager.php');
require_once(Common::ROOT_DIR . '/RSSParser.php');

$lines = file_get_contents(Common::ROOT_DIR . '/rssList.txt');
// Split the list into one URL per line
$lines = explode("\n", $lines);

$rssParser = new RSSParser();
$dbManager = new DBManager(Common::DB_NAME, Common::DB_HOST, Common::DB_USER, Common::DB_PASS);

$items = array();
foreach ($lines as $rssUrl) {
    // Strip surrounding whitespace, including a trailing "\r" from CRLF files
    $rssUrl = trim($rssUrl);

    // Skip empty URLs
    if (empty($rssUrl)) {
        continue;
    }

    // Parse RSS
    $items = $rssParser->parse($rssUrl);

    // To DB
    foreach ($items as $item) {
        $dbManager->registerItems($item, $rssUrl);
    }
}

// Disconnect DB
$dbManager = null;
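Reading the URL list can also be isolated in a small function, which makes the blank-line and whitespace handling easy to check on its own. The sketch below is one possible way to do it; `loadFeedUrls` is a hypothetical helper name, and the file format (one URL per line) is taken from the script above.

```php
<?php
// Hypothetical helper: returns the non-empty, trimmed lines of a URL list file.
function loadFeedUrls($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return array();
    }
    $urls = array();
    foreach ($lines as $line) {
        // trim() also removes a trailing "\r" left over from CRLF files.
        $line = trim($line);
        if ($line !== '') {
            $urls[] = $line;
        }
    }
    return $urls;
}
```

The cron script could then loop over `loadFeedUrls(Common::ROOT_DIR . '/rssList.txt')` directly.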
Register this in cron; the entry below runs the script once an hour
0 * * * * /usr/bin/php /var/www/html/rss2db.php >/tmp/crontab.log 2>&1
With this in place, RSS feeds are fetched periodically and saved to the DB