The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of more than 200 tasks designed to probe large language models and extrapolate their future capabilities. The tasks cover reasoning, common sense, creativity, and other cognitive abilities.
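from benchthing import Bench

# Set up a Bench object for the BIG-bench benchmark.
bench = Bench("big-bench")

# Run task "1" against your models.
# yourLanguageModels is a placeholder: define it before calling run().
bench.run(
    benchmark="big-bench",
    task_id="1",
    models=yourLanguageModels,
)

# Retrieve the result for task "1".
result = bench.get_result("1")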
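
To make the idea of a benchmark task more concrete, here is a minimal sketch of how a single task could be scored by exact string match. It does not use benchthing and does not reproduce BIG-bench's own evaluation code. The task layout (a list of examples with `input` and `target` fields) is a simplified assumption loosely modeled on BIG-bench's JSON tasks, and `exact_match_score`, `toy_task`, and `toy_model` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List


def exact_match_score(
    examples: List[Dict[str, str]],
    model: Callable[[str], str],
) -> float:
    """Return the fraction of examples whose model output equals the target."""
    if not examples:
        return 0.0
    correct = sum(
        1
        for ex in examples
        if model(ex["input"]).strip() == ex["target"].strip()
    )
    return correct / len(examples)


# Hypothetical task in a simplified input/target layout (an assumption,
# not an actual BIG-bench task).
toy_task = {
    "name": "toy_arithmetic",
    "examples": [
        {"input": "What is 2 + 2?", "target": "4"},
        {"input": "What is 3 + 5?", "target": "8"},
        {"input": "What is 7 - 4?", "target": "3"},
        {"input": "What is 6 * 2?", "target": "12"},
    ],
}


def toy_model(prompt: str) -> str:
    """Stand-in for a language model: a lookup table with one wrong answer."""
    answers = {
        "What is 2 + 2?": "4",
        "What is 3 + 5?": "8",
        "What is 7 - 4?": "3",
        "What is 6 * 2?": "14",  # deliberately wrong
    }
    return answers.get(prompt, "")


score = exact_match_score(toy_task["examples"], toy_model)
print(f"{toy_task['name']}: exact-match accuracy = {score:.2f}")
```

Here the stand-in model answers three of the four prompts correctly, so the score is 0.75. A real run would replace `toy_model` with a call to an actual language model, and many tasks use other metrics, such as multiple-choice grading, rather than exact match.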