MLE-bench

MLE-bench is a benchmark of 75 ML engineering competitions drawn from Kaggle. Together they form a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. The best-performing setup, OpenAI's o1-preview with AIDE scaffolding, achieves at least the level of a Kaggle bronze medal in 16.9% of competitions.
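To make the headline metric concrete, here is a minimal sketch of how a "bronze or better" rate could be computed from per-competition outcomes. The `CompetitionResult` type, the `medal_rate` helper, and the sample outcomes are all hypothetical illustrations; they are not part of benchthing or of the benchmark's own grading code, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompetitionResult:
    """Hypothetical record of one agent attempt at one competition."""
    competition_id: str
    medal: Optional[str]  # "bronze", "silver", "gold", or None for no medal


def medal_rate(results: List[CompetitionResult]) -> float:
    """Percentage of competitions in which the agent earned any medal."""
    if not results:
        return 0.0
    medalled = sum(1 for r in results if r.medal is not None)
    return 100.0 * medalled / len(results)


# Made-up outcomes for illustration only: 75 competitions, 13 with a medal.
results = [
    CompetitionResult(f"comp-{i}", "bronze" if i < 13 else None)
    for i in range(75)
]

print(f"{medal_rate(results):.1f}% of competitions at bronze or better")
```

Note that 16.9% of 75 is about 12.7, which is not a whole number of competitions, so the reported figure is presumably an average over several runs rather than the outcome of a single one.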

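from benchthing import Bench

# Create a client for the MLE-bench benchmark.
bench = Bench("mle-bench")

# yourMLEngineeringAgent is a placeholder: define your own agent
# before running this snippet.
bench.run(
    benchmark="mle-bench",
    task_id="1",
    agents=yourMLEngineeringAgent,
)

# Fetch the result for the same task ID once the run has finished.
result = bench.get_result("1")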

Sign up to get access to the MLE-bench benchmark API