Overview

MEGA-Bench

A large-scale evaluation suite of 505 realistic tasks comprising over 8,000 samples, curated by 16 expert annotators. Unlike benchmarks built around standard multiple-choice questions, MEGA-Bench supports diverse output formats, including numbers, phrases, code, LaTeX, coordinates, JSON, and free-form text, evaluated with over 40 different metrics.
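To make the format diversity concrete, a single task's expected answer might be a number, a coordinate pair, or a JSON object rather than a letter choice. Below is a hypothetical illustration; the field names and task names are assumptions for clarity, not the benchmark's actual schema:

# Hypothetical MEGA-Bench-style samples; field names are illustrative only.
samples = [
    {"task": "chart_reading", "output_format": "number", "answer": "42.7"},
    {"task": "ui_grounding", "output_format": "coordinates", "answer": "[0.31, 0.58]"},
    {"task": "table_extraction", "output_format": "JSON",
     "answer": '{"rows": 3, "columns": 4}'},
]

# Each output format is scored by a matching metric (exact match,
# numeric tolerance, coordinate overlap, etc.) rather than a single
# multiple-choice accuracy check.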

from benchthing import Bench

# Connect to the MEGA-Bench benchmark service
bench = Bench("mega-bench")

# Launch a run of task 1 against two models
bench.run(
    benchmark="mega-bench",
    task_id="1",
    models=["multimodal-model-1", "multimodal-model-2"],
)

# Fetch the result for task 1
result = bench.get_result("1")
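Because run and get_result are separate calls, a result may not be available the moment the run is launched. The sketch below shows one way to poll for completion; the status and scores attributes are assumptions about the response shape, not a documented part of the API:

import time

# Continues from the snippet above. Poll until the run finishes;
# `status` and `scores` are hypothetical attribute names.
while True:
    result = bench.get_result("1")
    if result.status in ("completed", "failed"):
        break
    time.sleep(30)  # back off between polls

if result.status == "completed":
    # Hypothetical: per-model, per-metric scores
    for model, metric_scores in result.scores.items():
        print(model, metric_scores)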

Sign up to get access to the MEGA-Bench benchmark API.