Notes from a quick look into full-text search
A rough survey.
Getting an overview
Read
Notes
- Morphological analysis:
  - Search misses caused by dictionary quality
- N-Gram:
  - Noise: a query for 京都 (Kyoto) also matches 東京都庁 (Tokyo Metropolitan Government Office)
  - Index size bloat
- Evaluation metrics
  - recall: how few relevant results are missed
  - precision: how little noise is in the results
  - recall and precision are a trade-off
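
As a small illustration of the two metrics above, the sketch below computes recall and precision for a single query from a set of returned document IDs and a set of relevant ones. The function name and the document IDs are made up for illustration.

```python
def recall_precision(returned, relevant):
    """Return (recall, precision) for one query.

    returned: set of document IDs the search engine returned
    relevant: set of document IDs that are actually relevant
    """
    hits = returned & relevant
    # recall: share of relevant documents that were found (few misses -> high)
    recall = len(hits) / len(relevant) if relevant else 1.0
    # precision: share of returned documents that are relevant (little noise -> high)
    precision = len(hits) / len(returned) if returned else 1.0
    return recall, precision


# Hypothetical example: 4 relevant documents, the engine returns 5.
relevant = {1, 2, 3, 4}
returned = {2, 3, 4, 8, 9}
print(recall_precision(returned, relevant))  # (0.75, 0.6)
```

Returning more documents tends to raise recall and lower precision, which is the trade-off noted above.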
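Morphological analysis when the target is Japanese?
- The advantage of N-gram is that it can be applied to any language
- However, in principle its precision cannot match morphological analysis
- Still worth considering when there is a clear reason, such as wanting to avoid missed results
- Japanese tends to need special handling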
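
To make the N-gram points concrete, here is a minimal bigram index sketch in plain Python. It needs no dictionary, so it works on any language, and it reproduces the noise case from the notes (a query for 京都 also hits 東京都庁). The function names and the two sample documents are assumptions for illustration, and the search does no position check, which a real engine would normally add.

```python
from collections import defaultdict


def bigrams(text):
    """Split text into overlapping 2-character grams (no dictionary needed)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(docs):
    """Map each bigram to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in bigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every bigram of the query."""
    grams = bigrams(query)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical documents.
docs = {1: "京都タワー", 2: "東京都庁"}
index = build_index(docs)
print(sorted(search(index, "京都")))  # [1, 2]: document 2 is noise
```

Every adjacent character pair becomes an index key, which is also why the index grows large.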
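Drawbacks of morphological analysis
- Processing time
- Dictionary segmentation units and missed results
  - e.g. the dictionary holds a longer word as a single entry, and a query for a shorter word contained in it gets no hits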
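
The segmentation problem can be sketched with a toy greedy longest-match tokenizer standing in for a real morphological analyzer. The dictionary, the document text, and the query below are hypothetical example words chosen for illustration, and real analyzers are considerably more sophisticated than this.

```python
def tokenize(text, dictionary):
    """Greedy longest-match segmentation against a word list.

    Characters not covered by the dictionary become single-character tokens.
    """
    tokens = []
    i = 0
    max_len = max(map(len, dictionary), default=1)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in dictionary:
                tokens.append(word)
                i += length
                break
    return tokens


# Hypothetical dictionary that only knows the long compound.
dictionary = {"全文検索", "エンジン"}

doc_tokens = tokenize("全文検索エンジン", dictionary)
query_tokens = tokenize("検索", dictionary)

print(doc_tokens)    # ['全文検索', 'エンジン']
print(query_tokens)  # ['検', '索']
# The query tokens never appear among the indexed tokens, so there is no hit.
print(set(query_tokens) <= set(doc_tokens))  # False
```

The document clearly contains 検索, but because the index only stores the tokens the dictionary produced, the shorter query cannot match.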
Solutions
Solr
- Production use cases at EC Navi (ECナビ) and Mapion: http://d.hatena.ne.jp/Kishi/20090722/1248281864
groonga
Sphinx
(excerpts, reordered)
http://www.sphinxsearch.com/about.html
..Sphinx was specially designed to integrate well with SQL databases .. built-in data sources support fetching data either via direct connection to MySQL or PostgreSQL, or using XML pipe
..supports MySQL natively (MyISAM and InnoDB tables are both supported)
..distributed under GPL version 2. Commercial license is also available for embedded use.
Questions
- MyISAM only?
- Can it do conditional searches combined with InnoDB tables?
  - Put everything into Sphinx and use filters
- Can partial updates be applied to the index?
An article from 2008
Sphinx is easy to install and maintain, and it is also very capable. Recent Sphinx releases also provide a native MySQL engine, so there is no need to run the Sphinx daemon separately.
http://www.ibm.com/developerworks/jp/opensource/library/os-php-apachesolr
Sphinx has kept improving over time and has become ideal for shopping sites, blogs, and many other applications. According to the Sphinx site, a single application can now index 700 million documents, or about 1.2 terabytes of data. I recommend Sphinx without hesitation.
However, Sphinx does not yet support several features that become desirable, or necessary to offer, as an application or site grows in popularity and usage.
In particular, Sphinx has no feature yet for automatically replicating or distributing indexes, so the Sphinx daemon is a single point of failure.
(As a workaround, several machines can be set up to index the same database, and those systems can then be clustered.)
Sphinx does not highlight search results (the way Google highlights terms when showing a cached page), does not retain or cache recent search results, and does not support regular expressions (regex) or date-based operations.
High Performance MySQL, 2nd Edition (Japanese edition), Appendix C
C.1 An overview: typical Sphinx search
C.2 Why use Sphinx
C.2.1 Efficient and scalable full-text search
C.2.2 Applying WHERE clauses efficiently
C.2.3 Finding the top results in order
C.2.4 Optimizing GROUP BY queries
C.2.5 Generating result sets in parallel
C.2.6 Scaling
C.2.7 Aggregating sharded data
C.3 Architecture overview
C.3.1 Installation overview
C.3.2 Typical use of partitions
C.4 Special features
C.4.1 Phrase proximity ranking
C.4.2 Support for attributes
C.4.3 Filtering
C.4.4 The SphinxSE storage engine
C.4.5 Advanced performance control
C.5 Practical implementation examples
C.5.1 Full-text search at Mininova.org
C.5.2 Full-text search at BoardReader.com
C.5.3 Optimizing selects at Sahibinden.com
C.5.4 Optimizing GROUP BY at BoardReader.com
C.5.5 Optimizing sharded JOIN queries at Grouply.com
C.6 Summary
About the MySQL side of things
MySQL 5.0
http://www.mysql.gr.jp/frame/modules/bwiki/index.php?plugin=attach&refer=matsunobu&openfile=MySQL_perf.pdf
– Tritonn/Senna
• Support is provided by Sumisho Computer Systems (住商情報システム) and Mirai Kensaku Brazil (未来検索ブラジル)
• Has the strongest track record
MySQL 5.1 and later
– Sphinx
• http://www.sphinxsearch.com/
• Runs as a MySQL storage engine; with UTF-8, Japanese full-text search is also possible using the bi-gram method
• A distributed search engine. Very fast
• Extensive track record on large overseas sites such as Craigslist
• Usage is somewhat unusual
– FulltextParserPlugin
• http://mysqlftppc.wiki.sourceforge.net/Home-j
– mroonga/groonga
• Successor to Senna. Currently under development
A quick look at MongoDB search strategies
- http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
- http://www.businessinsider.com/how-we-use-mongodb-2009-11
- Business Insider runs a LAMoP stack
- SimpleMongoPhp
- memcached as the caching layer
- Simple things and individual articles are not cached
- mongo itself is an effective caching layer
- we do still do some caching on more complex queries
- most popular
- For full-text search, some combine it with Sphinx or Lucene
- Others build their own inverted index
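
For reference, a home-made inverted index can be as small as the sketch below: a mapping from each term to the IDs of the documents containing it. This is a plain in-memory illustration under assumed field names (`_id`, `title`, `body`) and made-up articles. It does not show how any particular site stores its index in MongoDB, and the tokenizer only handles ASCII words.

```python
import re
from collections import defaultdict


def terms(text):
    """Lowercase and split on non-alphanumeric characters (ASCII only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_inverted_index(articles):
    """Map each term to the sorted list of article IDs that contain it."""
    postings = defaultdict(set)
    for article in articles:
        for term in terms(article["title"] + " " + article["body"]):
            postings[term].add(article["_id"])
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical documents shaped like records pulled from a document store.
articles = [
    {"_id": 1, "title": "How we use MongoDB", "body": "caching with memcached"},
    {"_id": 2, "title": "Full text search", "body": "Sphinx or Lucene with MongoDB"},
]
index = build_inverted_index(articles)
print(index["mongodb"])         # [1, 2]
print(index.get("sphinx", []))  # [2]
```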