Overview

SWE-bench

SWE-bench is a benchmark for evaluating AI models on real-world software engineering tasks. Each task is drawn from a real GitHub issue in an open-source Python repository: the model is given the codebase and the issue text, and must produce a patch that makes the repository's tests pass.
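To make the scoring idea concrete, here is a minimal sketch of how a "resolved" rate could be computed from per-task test outcomes. It is illustrative only and not part of the `benchthing` API: the `TaskOutcome` type, the `isResolved` and `resolvedRate` helpers, and the sample instance IDs are all hypothetical. The `failToPass` and `passToPass` fields are modeled on the `FAIL_TO_PASS` and `PASS_TO_PASS` test lists in the public SWE-bench dataset.

```typescript
// Hypothetical record for one evaluated task (not a benchthing type).
interface TaskOutcome {
  instanceId: string;
  failToPass: string[];     // tests that fail before the fix and must pass after it
  passToPass: string[];     // tests that pass before the fix and must keep passing
  passedTests: Set<string>; // tests that passed after applying the model's patch
}

// A task counts as resolved only if every required test passed.
function isResolved(outcome: TaskOutcome): boolean {
  const required = [...outcome.failToPass, ...outcome.passToPass];
  return required.every((test) => outcome.passedTests.has(test));
}

// Fraction of tasks resolved; returns 0 for an empty list.
function resolvedRate(outcomes: TaskOutcome[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter(isResolved).length / outcomes.length;
}

// Made-up sample data for illustration.
const sampleOutcomes: TaskOutcome[] = [
  {
    instanceId: 'example__repo-1',
    failToPass: ['test_bugfix'],
    passToPass: ['test_existing'],
    passedTests: new Set(['test_bugfix', 'test_existing']),
  },
  {
    instanceId: 'example__repo-2',
    failToPass: ['test_bugfix'],
    passToPass: ['test_existing'],
    passedTests: new Set(['test_existing']), // the fix did not make test_bugfix pass
  },
];

console.log(resolvedRate(sampleOutcomes));
```

Requiring both lists to pass means a patch gets no credit if it fixes the issue but breaks behavior that previously worked.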

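import { Bench } from 'benchthing';

// Create a client for the SWE-bench benchmark.
const bench = new Bench('swe-bench');

// Start a run of task '1' against the listed models.
await bench.run({
  benchmark: 'swe-bench',
  taskId: '1',
  models: ['yourCodeModel', 'yourCodeModel2'],
});

// Fetch the result for task '1' once the run has been submitted.
const result = await bench.getResult('1');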

Sign up to get access to the SWE-bench benchmark API