In our blog, we report MT Bench results comparing our routers to Martian and Unify AI, two commercial routing offerings. We run these benchmarks using the official FastChat repository, replacing its model calls with calls to either Martian or Unify AI through an OpenAI-compatible server.
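As a minimal sketch of this substitution, and assuming FastChat's API-based answer generation honors the standard `OPENAI_API_BASE` and `OPENAI_API_KEY` environment variables, pointing it at a routing provider might look like:

```python
import os

# Redirect the OpenAI client used for MT Bench answer generation to a routing
# provider's OpenAI-compatible endpoint. The environment-variable route is an
# assumption; the same effect can be achieved by editing the client's base URL.
os.environ["OPENAI_API_BASE"] = "https://api.unify.ai/v0/"
os.environ["OPENAI_API_KEY"] = "UNIFY_API_KEY"  # placeholder key
```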
For Unify AI, we pick the best-performing router on the user dashboard, `router@q:1|c:1.71e-03|t:1.10e-05|i:1.09e-03`, and use it for benchmarking, allowing calls to either GPT-4 Turbo or Mixtral 8x7B. Using this router, we obtain an MT Bench score of 8.757862, with 45.625% of calls routed to GPT-4. In comparison, our best-performing router achieves the same performance with 25.40% GPT-4 calls. The client is configured as follows:
```python
import openai

# Unify AI exposes an OpenAI-compatible endpoint, so the standard client works directly.
client = openai.OpenAI(
    base_url="https://api.unify.ai/v0/",
    api_key="UNIFY_API_KEY",  # replace with your Unify API key
)
response = client.chat.completions.create(
    model="router@q:1|c:1.71e-03|t:1.10e-05|i:1.09e-03|models:gpt-4-turbo,mixtral-8x7b-instruct-v0.1",
    ...
)
```
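Unlike Martian below, Unify identifies which underlying model served each request, so the GPT-4 percentage reported above can be tallied directly. A minimal sketch, assuming the routed model is reported in each response's `model` field (hypothetical helper, not part of the benchmark harness):

```python
# Estimate the fraction of requests the router sent to GPT-4, assuming each
# chat-completion response's `model` field names the underlying model.
def gpt4_fraction(responses):
    gpt4_calls = sum(1 for r in responses if "gpt-4" in r.model)
    return gpt4_calls / len(responses)
```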
For Martian, we allow calls to either GPT-4 Turbo or Llama 2 70B Chat, based on its list of supported models. Because the API does not return which model each request is routed to, we use the `max_cost_per_million_tokens` parameter to estimate the percentage of GPT-4 calls. Specifically, we set `max_cost_per_million_tokens` to $10.45, a value approximated from public inference costs for `llama-2-70b-chat` and `gpt-4-turbo-2024-04-09` on Together.AI and OpenAI. Given a cost of $0.90 per million tokens for Llama 2 70B Chat and $20 per million tokens for GPT-4 Turbo (assuming a 1:1 input:output token ratio), we take the midpoint ($20 + $0.90) / 2 = $10.45 so that approximately 50% of calls are routed to GPT-4. Using this setting, we obtain an MT Bench score of 8.3125. In comparison, our best-performing router achieves the same performance with 29.66% GPT-4 calls.
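The threshold arithmetic can be reproduced directly; the $10-per-million-input / $30-per-million-output split for GPT-4 Turbo is the assumed breakdown behind the $20 blended figure above:

```python
# Blended per-million-token costs at a 1:1 input:output ratio.
gpt4_turbo = (10.00 + 30.00) / 2   # OpenAI: $10/M input, $30/M output -> $20/M blended
llama_2_70b = 0.90                 # Together.AI flat price per million tokens

# Midpoint threshold, so roughly half of the calls clear the GPT-4 price bar.
max_cost_per_million_tokens = (gpt4_turbo + llama_2_70b) / 2
print(max_cost_per_million_tokens)  # 10.45
```

The Martian client itself is configured as follows: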
```python
import openai

# Martian also exposes an OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://withmartian.com/api/openai/v1",
    api_key="MARTIAN_API_KEY",  # replace with your Martian API key
)
response = client.chat.completions.create(
    model="router",
    extra_body={
        # Restrict routing to the two models being benchmarked.
        "models": ["gpt-4-turbo-128k", "llama-2-70b-chat"],
        # Midpoint cost threshold so that ~50% of calls go to GPT-4.
        "max_cost_per_million_tokens": 10.45,
    },
    ...
)
```
The full MT Bench results and judgements are available in the `mt-bench/` directory.