An evaluation framework for systematically assessing LLM routing systems, together with a dataset of over 405k inference outcomes from representative LLMs. It provides a theoretical framework for LLM routing and a comparative analysis of routing approaches, setting a standard for evaluating multi-LLM deployments.
from benchthing import Bench

# Create a benchmark client for RouterBench
bench = Bench("router-bench")

# Run the benchmark task against the candidate routing systems
bench.run(
    benchmark="router-bench",
    task_id="1",
    models=["model-router-1", "model-router-2"],
)

# Retrieve the outcome for the submitted task
result = bench.get_result("1")