BIG-bench

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of more than 200 tasks designed to probe large language models and extrapolate their future capabilities. The tasks cover reasoning, common sense, creativity, and other cognitive abilities.
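To make "probing a model with a task" concrete, here is a minimal sketch of scoring a model on a list of input/target examples by exact match. Everything in it is an assumption for illustration: the `toy_task` dictionary, its examples, the `exact_match_score` helper, and the `constant_model` stand-in are hypothetical and are not taken from BIG-bench or from any hosted API.

```python
from typing import Callable

# Hypothetical toy task: a name plus a list of input/target examples.
# The examples are invented for illustration, not taken from BIG-bench.
toy_task = {
    "name": "toy_arithmetic",
    "examples": [
        {"input": "What is 2 plus 3?", "target": "5"},
        {"input": "What is 7 minus 4?", "target": "3"},
    ],
}


def exact_match_score(task: dict, model: Callable[[str], str]) -> float:
    """Return the fraction of examples whose model output equals the target."""
    examples = task["examples"]
    correct = sum(
        model(ex["input"]).strip() == ex["target"] for ex in examples
    )
    return correct / len(examples)


def constant_model(prompt: str) -> str:
    """Stand-in 'model' that ignores the prompt and always answers "5"."""
    return "5"


score = exact_match_score(toy_task, constant_model)
print(score)
```

Any callable that maps a prompt string to an answer string can be passed in place of `constant_model`, so the same scoring loop works for a real model wrapper.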

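from benchthing import Bench

# Create a client for the BIG-bench benchmark.
bench = Bench("big-bench")

# Start a run for one task.
# yourLanguageModels is a placeholder: replace it with the models you want to evaluate.
bench.run(
    benchmark="big-bench",
    task_id="1",
    models=yourLanguageModels,
)

# Fetch the result using the same task ID passed to run().
result = bench.get_result("1")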
