MLE-bench is a benchmark of 75 ML engineering competitions from Kaggle. Together they form a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. The best-performing setup, OpenAI's o1-preview with AIDE scaffolding, achieves at least the level of a Kaggle bronze medal in 16.9% of competitions.
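from benchthing import Bench

# Create a Bench handle for mle-bench.
bench = Bench("mle-bench")

# Run task "1"; yourMLEngineeringAgent is a placeholder for your own agent.
bench.run(
    benchmark="mle-bench",
    task_id="1",
    agents=yourMLEngineeringAgent,
)

# Retrieve the result for the same task.
result = bench.get_result("1")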
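
The headline figure in the description is a medal rate: the share of competitions in which a setup reaches at least a bronze medal. A minimal sketch of that calculation follows, assuming each competition's outcome is recorded as a medal label or `None`. The `medal_rate` helper, the `MEDALS` labels, and the sample data are hypothetical illustrations, not part of benchthing or the benchmark's actual result format.

```python
from typing import Mapping, Optional

# Hypothetical medal labels; the real result format may differ.
MEDALS = {"bronze", "silver", "gold"}


def medal_rate(outcomes: Mapping[str, Optional[str]]) -> float:
    """Return the fraction of competitions that earned any medal."""
    if not outcomes:
        raise ValueError("no competition outcomes supplied")
    medalled = sum(1 for medal in outcomes.values() if medal in MEDALS)
    return medalled / len(outcomes)


# Made-up outcomes for four hypothetical competitions (not real results).
example = {"comp-a": "bronze", "comp-b": None, "comp-c": "gold", "comp-d": None}
print(f"{medal_rate(example):.1%}")  # prints 50.0%
```

Counting silver and gold alongside bronze matches the "at least bronze" wording; a per-medal breakdown would need separate counters.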