Have you ever struggled to build a dataset for validating accuracy while working on RAG?
Isn't creating a dataset for accuracy evaluation a huge hassle all by itself? So in this post I will try building a dataset that can be used for a quick, lightweight accuracy evaluation.
- A few notes on RAG accuracy evaluation
- Thinking about dataset creation
- Trying it out: building a dataset and evaluating
- Closing thoughts
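A few notes on RAG accuracy evaluation
I won't go into much detail, but RAG has two stages, the Retrieval phase and the Generation phase, and in most cases you want to evaluate the accuracy of both. (I assume most of you feel the same way; if not, this post probably won't be of use to you, so feel free to ignore it.)
In most cases, evaluation relies on metrics like these, doesn't it?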
- Generation
  - Faithfulness
  - Answer relevancy
- Retrieval
  - Context precision
  - Context recall
  - Context relevancy
  - Context entity recall
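What we want is a dataset that lets us compute metrics like these without much fuss.
Thinking about dataset creation
So let's think about how an evaluation dataset should be created.
Premise
Tools such as RAGAS can also generate datasets automatically.
I am not very familiar with that area, but there are times when I want to create test data tailored more closely to the situation. Frankly, the quality of the test data itself is open to debate, but here the goal is to create a test dataset in the shape I want without using RAGAS or similar tools.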
Retrieval
For Retrieval, metrics such as precision and recall are used in most cases. To measure them, we need the "evidence text" that is required to answer each "question".
Chunk ids might also work, but if you change the chunking method you have to reassign chunk ids every time you evaluate. It therefore seems better to prepare the "evidence text" and label chunks dynamically by checking whether each retrieved chunk contains it.
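The dynamic labeling idea above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the function names (`normalize`, `contains_evidence`, `label_chunks`) are hypothetical and not from the original notebook, and the default judge is a naive normalized substring check that serves as a cheap baseline. The `judge` argument can be replaced by any other function with the same signature.

```python
import re
import unicodedata
from typing import Callable, List


def normalize(text: str) -> str:
    """NFKC-normalize and drop all whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", "", unicodedata.normalize("NFKC", text))


def contains_evidence(chunk: str, reason_text: str) -> bool:
    """Naive baseline judge: normalized evidence must be a substring of the chunk."""
    return normalize(reason_text) in normalize(chunk)


def label_chunks(
    chunks: List[str],
    reason_text: str,
    judge: Callable[[str, str], bool] = contains_evidence,
) -> List[bool]:
    """Return one relevance flag per retrieved chunk, keeping the ranked order."""
    return [judge(chunk, reason_text) for chunk in chunks]


if __name__ == "__main__":
    retrieved = [
        "RAG has two phases:\nretrieval and generation.",
        "Unrelated text.",
    ]
    print(label_chunks(retrieved, "two phases: retrieval and generation"))
```

Because the labels are recomputed from the text at evaluation time, changing the chunking strategy does not require relabeling the dataset.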
Generation
With RAGAS and similar tools, evaluation usually looks at whether the answer is relevant to the retrieved text, whether it hallucinates, and so on. In practice, though, aren't we more interested in how closely the final answer matches the expected answer?
To evaluate whether the system ultimately answers correctly, we need a "model answer" for each "question".
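In short
To summarize, evaluation should be possible as long as the test dataset provides the following items.
- Question
- Information needed for Retrieval evaluation
  - The passage from the source text that serves as the evidence for the model answer
- Information needed for Generation evaluation
  - Example answer
If we have test data like this, derived from the source text used by the RAG system, we should be able to check accuracy. Creating it by hand is one option, but being lazy, I would rather have generative AI do it.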
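As a rough illustration of the items listed above, one test case could be represented as a small record. This is a sketch only: the class name `EvalCase` is hypothetical, while the field names follow the `question` / `reason_text` / `answer` keys used later in this post.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalCase:
    """One test case for evaluating a RAG pipeline (hypothetical container)."""

    question: str      # the question posed to the RAG system
    reason_text: str   # evidence passage quoted from the source text (Retrieval evaluation)
    answer: str        # model answer (Generation evaluation)


if __name__ == "__main__":
    case = EvalCase(
        question="How many phases does RAG have?",
        reason_text="RAG has two stages, the Retrieval phase and the Generation phase",
        answer="Two: Retrieval and Generation.",
    )
    print(asdict(case))
```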
Approach
So I figured it could be automated along the following lines.
The steps I have in mind are:
- Decide which document to test against
- Generate the test case data automatically from that document through an LLM
- Implement RAG as usual on top of the same document
- Evaluate the RAG system using the test cases
This should let us create simple test cases with little effort. If you already know what kind of evaluation you want, you can reflect it in the prompt to customize the test cases, so the approach also looks quite flexible.
In some cases it may be worth mixing in some human-written data. Test cases written by an LLM tend to be questions that are easy to answer, so humans can add test cases that reflect real usage.
Trying it out: building a dataset and evaluating
Honestly, the theory doesn't matter much, so let's just build a dataset and try it.
Dataset creation
I created the dataset as follows. In short, the LLM is asked to produce the test dataset.
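from openai import OpenAI

# Assumption: the standard OpenAI client; the original does not show how `client` is created
client = OpenAI()

# Schema for the function-call output: a list of question / evidence / answer triples
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "question_and_answers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "reason_text": {"type": "string"},
                    "answer": {"type": "string"},
                },
                # Every item must carry all three fields
                "required": ["question", "reason_text", "answer"],
            },
        }
    },
    # The original listed "question" here, which is not a top-level property
    "required": ["question_and_answers"],
}

system_message = """\
Act as an excellent Japanese language teacher.
I will now give you a document.
For the given document, create 10 sets of a question (question), a model answer (answer), and the passage that must be consulted to derive the model answer (reason_text).

## Conditions
- The question (question) must not be answerable without the passage in the given document that must be consulted to derive the model answer (reason_text), and must not allow multiple model answers
- The model answer (answer) must be based on the passage in the given document that must be consulted to derive the model answer (reason_text)
- The passage that must be consulted to derive the model answer (reason_text) must be extracted from the given document exactly as written
"""

# `text` holds the body of the document under test
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # model to use
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": [{"type": "text", "text": text}]},
    ],
    functions=[{"name": "question_and_answers", "parameters": json_schema}],
    function_call={"name": "question_and_answers"},  # force this function to be called
)

# The function-call arguments are returned as a JSON string
results = completion.choices[0].message.function_call.arguments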
With something like this, we can at least create the test cases we wanted.
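Since the function-call arguments come back as a JSON string, they need to be parsed before use, and the model may occasionally omit a field. The following is a minimal sketch of that step; the helper name `parse_test_cases` is hypothetical and not part of the original notebook.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = ("question", "reason_text", "answer")


def parse_test_cases(arguments_json: str) -> List[Dict[str, str]]:
    """Parse the function-call arguments and keep only complete test cases."""
    payload = json.loads(arguments_json)
    cases = []
    for item in payload.get("question_and_answers", []):
        if not isinstance(item, dict):
            continue
        # Keep the item only if every required field is a non-empty string
        if all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED_KEYS):
            cases.append({k: item[k] for k in REQUIRED_KEYS})
    return cases


if __name__ == "__main__":
    sample = json.dumps(
        {
            "question_and_answers": [
                {"question": "Q1", "reason_text": "R1", "answer": "A1"},
                {"question": "Q2", "answer": "A2"},  # dropped: no reason_text
            ]
        }
    )
    print(parse_test_cases(sample))
```

In the flow above it would be called as `parse_test_cases(results)`.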
Evaluation
Next, let's see how to run the evaluation with these test cases.
- Generation
  - Answer Correctness: whether the answer from RAG has the same content as the model answer prepared in advance
- Retrieval
  - Precision
  - Recall
  - NDCG
I will use this set of metrics, customized for this experiment.
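Given one relevance flag per retrieved chunk (in ranked order), the Retrieval metrics listed above could be computed roughly like this. It is a sketch under assumptions rather than the notebook's implementation: recall is treated as a hit rate because each test case has a single evidence passage, and the ideal ranking for NDCG is derived from the retrieved list itself. The function names are hypothetical.

```python
import math
from typing import List


def precision_at_k(relevance: List[bool], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    if k <= 0:
        return 0.0
    return sum(relevance[:k]) / k


def recall_at_k(relevance: List[bool], k: int) -> float:
    """With one evidence passage per question, recall reduces to a hit flag."""
    return 1.0 if any(relevance[:k]) else 0.0


def dcg_at_k(relevance: List[bool], k: int) -> float:
    """Discounted cumulative gain with binary gains; rank i (0-based) is discounted by log2(i + 2)."""
    return sum(float(rel) / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))


def ndcg_at_k(relevance: List[bool], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same flags."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    flags = [False, True]  # the evidence was found in the second-ranked chunk
    print(precision_at_k(flags, 2), recall_at_k(flags, 2), ndcg_at_k(flags, 2))
```

Averaging these per-question values over all test cases gives dataset-level scores.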
Retrieval evaluation
It would be ideal if the evidence text for the model answer, produced when the model answers were generated, were an exact extract of the source text, but that is not always the case. (Punctuation may be replaced with ",", line breaks may be inserted or dropped, and other small details can change.)
In addition, the chunks hit at Retrieval time are not necessarily split so that they line up exactly with the evidence text for the model answer, so identifying the correct chunk by exact match or simple containment looks a bit difficult.
So I will use an LLM to judge whether the evidence text for the model answer is contained in each retrieved chunk.
#@title Judge whether a chunk is related to the intended reference text
import json


def eval_chunk_relevance(reference, true_reason_text):
    """Return True if the content of true_reason_text is contained in the chunk `reference`."""
    json_schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "relevance": {"type": "boolean"},
        },
        "required": ["relevance"],
    }

    system_message = """\
I will now give you documents.
If the content described in text_1 is contained in text_2, return True; otherwise return False.
"""

    message = f"""\
## text_1
{true_reason_text}

## text_2
{reference}
"""

    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # model to use
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": [{"type": "text", "text": message}]},
        ],
        functions=[{"name": "eval_relevance", "parameters": json_schema}],
        function_call={"name": "eval_relevance"},
    )

    # The arguments are a JSON string such as {"relevance": true}
    results = completion.choices[0].message.function_call.arguments
    return json.loads(results)["relevance"]
Generation evaluation
The model answer and the answer from RAG will almost never match word for word, so I will have an LLM judge whether the two say the same thing.
import json


def eval_answer_correctness(output, answer):
    """Return True if the RAG output and the model answer have the same content."""
    json_schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "correct": {"type": "boolean"},
        },
        "required": ["correct"],
    }

    system_message = """\
I will now give you documents.
If the content described in text_1 and the content described in text_2 are the same, return True; if they differ, return False.
"""

    message = f"""\
## text_1
{output}

## text_2
{answer}
"""

    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # model to use
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": [{"type": "text", "text": message}]},
        ],
        functions=[{"name": "eval_correctness", "parameters": json_schema}],
        function_call={"name": "eval_correctness"},
    )

    # The arguments are a JSON string such as {"correct": true}
    results = completion.choices[0].message.function_call.arguments
    return json.loads(results)["correct"]
Monitoring with Arize Phoenix while we're at it
Finally, let's look at the evaluation results in the browser. This is purely a bonus, so it is perfectly fine to skip it.
To add the metrics we computed in our own (???) way to Arize Phoenix, something like the following does the job.
import phoenix as px
from phoenix.trace import DocumentEvaluations, SpanEvaluations

# Log the Retrieval metrics; the dataframes are the ones computed earlier in the notebook
px.Client().log_evaluations(
    SpanEvaluations(dataframe=ndcg_at_2, eval_name="ndcg@2"),
    SpanEvaluations(dataframe=precision_at_2, eval_name="precision@2"),
    DocumentEvaluations(dataframe=retrieved_documents_df, eval_name="relevance"),
)

# Log the Generation metric (answer correctness per query)
px.Client().log_evaluations(
    SpanEvaluations(eval_name="QA Correctness", dataframe=queries_df),
)
Here is how it looks on screen.
Being able to check individual cases like this can come in handy.
From here, you can improve Retrieval or Generation and watch whether these metrics improve, which should make the improvement work efficient.
Notebook used
The notebook I used for this post is here.
Closing thoughts
That's it. Creating an evaluation dataset felt like far too much work, so this post was my attempt at asking "why not just let AI do it?"
If you are thinking "there's a better way to do this", please let me know in the comments.