You can evaluate an AgentKit app at multiple levels:
- Routing layer: Evaluate how accurately the meta agent chooses the right action plan for a given user query
- Tool layer: Evaluate individual tools
- Output layer: Evaluate the final output quality
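
As an illustration of the routing layer, the sketch below scores a router against a small set of labeled queries. It is a minimal example under stated assumptions, not AgentKit's own evaluation code: `RoutingCase`, `routing_accuracy`, `keyword_router`, and the plan labels (`pdf_tool`, `sql_tool`, `chat`) are all hypothetical names. In a real app, `keyword_router` would be replaced by a call to the meta agent that returns the action plan it chose.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingCase:
    """One labeled example: a user query and the plan we expect to be chosen."""
    query: str
    expected_plan: str


def routing_accuracy(route: Callable[[str], str], cases: list[RoutingCase]) -> float:
    """Return the fraction of cases where route(query) equals the expected plan."""
    if not cases:
        raise ValueError("cases must not be empty")
    correct = sum(1 for case in cases if route(case.query) == case.expected_plan)
    return correct / len(cases)


def keyword_router(query: str) -> str:
    """Hypothetical stand-in for the meta agent (assumption, not AgentKit code)."""
    q = query.lower()
    if "pdf" in q or "document" in q:
        return "pdf_tool"
    if "sql" in q or "table" in q:
        return "sql_tool"
    return "chat"


# Hypothetical labeled queries; the last one is deliberately hard for a keyword router.
cases = [
    RoutingCase("Summarize the attached PDF", "pdf_tool"),
    RoutingCase("Which table has the most rows?", "sql_tool"),
    RoutingCase("Hello, what can you do?", "chat"),
    RoutingCase("Find revenue figures in the report", "pdf_tool"),
]

accuracy = routing_accuracy(keyword_router, cases)
print(f"routing accuracy: {accuracy:.2f}")  # 3 of 4 cases match -> 0.75
```

The same pattern carries over to the tool layer: replace the router with a single tool and compare its output against an expected value for each test input.
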
AgentKit natively integrates with LangSmith (https://docs.smith.langchain.com/), a tool for tracing your app and tracking its performance.
See Optional Features for instructions on setting up LangSmith.