LegalBench is an ongoing open science effort to collaboratively curate tasks for evaluating legal reasoning in English large language models (LLMs). The benchmark currently consists of 162 tasks gathered from 40 contributors, covering a wide range of textual types, task structures, legal domains, and difficulty levels.
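The snippet below shows how a single task might be run through the benchthing client; `yourLegalModels` stands in for whatever model handles you want to evaluate.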
```ts
import { Bench } from 'benchthing';

// Create a client scoped to the LegalBench benchmark.
const bench = new Bench('legalbench');

// Run a single task against your models.
// 'yourLegalModels' is a placeholder for the model handles you want to evaluate.
await bench.run({
  benchmark: 'legalbench',
  taskId: '1',
  models: yourLegalModels,
});

// Retrieve the stored result for the task that was just run.
const result = await bench.getResult('1');
```
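As a sketch of how the same calls might compose across several tasks, the loop below reuses the `run` and `getResult` calls shown above. The task IDs and the `yourLegalModels` array are placeholders for illustration, not real LegalBench identifiers or model handles.

```ts
import { Bench } from 'benchthing';

// Placeholder model handles; substitute the models you want to evaluate.
const yourLegalModels: unknown[] = [];

// Hypothetical task IDs for illustration; real LegalBench task IDs may differ.
const taskIds = ['1', '2', '3'];

const bench = new Bench('legalbench');

// Run each task and collect its result, keyed by task ID.
const results: Record<string, unknown> = {};
for (const taskId of taskIds) {
  await bench.run({ benchmark: 'legalbench', taskId, models: yourLegalModels });
  results[taskId] = await bench.getResult(taskId);
}

console.log(results);
```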