AgentBench is an evaluation framework designed to assess the capabilities of Large Language Models (LLMs) acting as autonomous agents. It presents a variety of complex tasks and scenarios that test a model's ability to understand instructions, plan, and execute actions across diverse environments.
import { Bench } from 'benchthing';

// Create a benchmark client scoped to AgentBench.
const bench = new Bench('agentbench');

// Run a single AgentBench task (here, task '1') against your agents.
await bench.run({
  benchmark: 'agentbench',
  taskId: '1',
  agents: yourAgents,
});

// Retrieve the result for the completed task.
const result = await bench.getResult('1');
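To evaluate agents on more than one AgentBench task, the same `run`/`getResult` pair can be looped over a list of task IDs. The sketch below builds on the `bench` and `yourAgents` values from the snippet above; the task IDs are placeholders, and the shape of the returned result object is an assumption, so adapt both to what your setup actually exposes.

// Hypothetical list of AgentBench task IDs to evaluate.
const taskIds = ['1', '2', '3'];

const results = [];
for (const taskId of taskIds) {
  // Run each task sequentially with the same agents.
  await bench.run({
    benchmark: 'agentbench',
    taskId,
    agents: yourAgents,
  });
  // Collect the per-task result for later inspection.
  results.push(await bench.getResult(taskId));
}

console.log(results);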