[B! IR][BM25] InoHiro's Bookmarks

InoHiro id:InoHiro

InoHiro's bookmarks on IR and BM25 (6)

http://sleepyheads.jp/docs/prob_ir.pdf
InoHiro 2013/12/16
IR

BM25
Link
Notes on Probabilistic Information Retrieval: From the Probability Ranking Principle to BM25 - シリコンの谷のゾンビ
The Notes on Probabilistic Information Retrieval, one of the items on my to-do list for Golden Week (GW), are finished, so I am publishing them. Notes on Probabilistic Information Retrieval: From the Probability Ranking Principle to BM25. Probabilistic information retrieval is a subfield of information retrieval that models the probability of relevance, taking the Probability Ranking Principle (see the notes for an explanation) as its starting point. It includes the Binary Independence Model, BM25, and others (although BM25 incorporates a variety of heuristics). BM25 is [tex:\sum_{t \in q} q_t \cdot \frac{f_{t,d} (k_1 + 1)}{k_1*1 + f_{t,d}} \cdot w_t] (see the notes for an explanation), a formula that looks incomprehensible at first glance, but ...
InoHiro 2013/05/30
ir

algorithm

BM25
Link
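
The expression quoted in the snippet of the probabilistic IR notes entry above can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions: it follows the formula exactly as it appears in the snippet (the factor multiplying `k_1` is the literal `1`, so no document-length normalization is applied), it treats `q_t` as a query-side term weight and `w_t` as a precomputed term weight such as an IDF value, and the function name, argument names, and the default `k1=1.2` are illustrative choices, not taken from the notes.

```python
def bm25_snippet_score(query_weights, term_freqs, term_weights, k1=1.2):
    """Score one document with the formula as quoted in the snippet.

    query_weights: {term: q_t}   query-side weight of each term (assumed)
    term_freqs:    {term: f_td}  frequency of each term in the document
    term_weights:  {term: w_t}   precomputed term weight, e.g. IDF (assumed)
    """
    score = 0.0
    for term, q_t in query_weights.items():
        f_td = term_freqs.get(term, 0)
        if f_td == 0:
            continue  # a term absent from the document contributes nothing
        # Saturating term-frequency component: grows with f_td but is
        # bounded above by (k1 + 1).
        saturation = f_td * (k1 + 1) / (k1 * 1 + f_td)
        score += q_t * saturation * term_weights.get(term, 0.0)
    return score


# Hypothetical toy data, for illustration only.
query = {"okapi": 1.0, "bm25": 1.0}
doc_tf = {"bm25": 3, "lucene": 1}
weights = {"okapi": 2.0, "bm25": 1.5}
print(bm25_snippet_score(query, doc_tf, weights))
```

With `k1=0` the term-frequency component collapses to 1 for every matching term, so the score reduces to a sum of `q_t * w_t`; larger `k1` values let repeated occurrences count for more before saturating.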
Integrating BM25 & BM25F into Lucene
Integrating BM25 & BM25F into Lucene Joaquín Pérez-Iglesias Introduction This document describes the BM25 and BM25F implementation using the Lucene Java Framework. The implementation described here can be downloaded from http://nlp.uned.es/~jperezi/Lucene-BM25/jar/models.jar. Both models have stood out at TREC by their performance and are considered as state-of-the-art in the IR community. BM25 i
InoHiro 2013/05/26
IR

similarity

algorithm

BM25
Link
Part 8: Search Processing in Inverted Indexes | gihyo.jp
Representative relevance measures include cosine similarity and Okapi BM25. The specific formulas and details are omitted here, but relevance is computed by combining the values above [3]. Cosine similarity maps documents and queries into a vector space whose dimensions are terms, and derives the relevance (similarity) between a document and a query from the angle between the document vector and the query vector (the smaller the angle, the higher the relevance). Okapi BM25, in turn, computes the relevance between a document and a query based on the statistical principle that whether a document is relevant to a query is determined probabilistically. To compute these at search time, the statistics above must be computed and stored when the index is built. Many implementations are conceivable; for example, fd,t can be embedded in the postings list [4], while ft and Ft can be stored together with the dictionary ...
InoHiro 2013/05/20
IR

BM25
Link
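
The angle-based relevance described in the gihyo.jp entry above can be illustrated with a short Python sketch. It assumes raw term counts as vector components (practical systems usually use weighted values such as TF-IDF), and the function name and toy inputs are hypothetical, not taken from the article.

```python
import math
from collections import Counter


def cosine_similarity(doc_terms, query_terms):
    """Cosine of the angle between term-count vectors of a document and a query."""
    doc_vec = Counter(doc_terms)
    query_vec = Counter(query_terms)
    # Dot product over the query's terms; a Counter returns 0 for missing terms.
    dot = sum(doc_vec[t] * query_vec[t] for t in query_vec)
    doc_norm = math.sqrt(sum(v * v for v in doc_vec.values()))
    query_norm = math.sqrt(sum(v * v for v in query_vec.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0  # an empty vector has no direction; treat it as unrelated
    return dot / (doc_norm * query_norm)


# Hypothetical toy data, for illustration only.
doc = ["inverted", "index", "search", "index"]
print(cosine_similarity(doc, ["index", "search"]))
print(cosine_similarity(doc, ["ranking"]))
```

A value close to 1 corresponds to a small angle between the two vectors (high relevance), while 0 means the document and the query share no terms.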
Okapi BM25 - Wikipedia
In information retrieval, Okapi BM25 (BM is an abbreviation of best matching) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others. The name of the actual ranking function is BM25. The fuller name, Okap
InoHiro 2013/05/20
wikipedia

algorithm

IR

BM25
Link
Okapi-BM25
Okapi-BM25 is used in document retrieval; it is a function that ranks the relevance of a document to a query. The relevance is computed with the following formula.
InoHiro 2013/05/20
IR

BM25
Link