SWE-bench Multimodal

SWE-bench Multimodal is an extension of SWE-bench focused on visual, user-facing JavaScript software. It comprises 617 task instances drawn from 17 JavaScript libraries used for web interface design, diagramming, data visualization, syntax highlighting, and interactive mapping. Each task includes at least one image in its problem statement or unit tests.

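from benchthing import Bench

# Create a client bound to the SWE-bench Multimodal benchmark.
bench = Bench("swe-bench-multimodal")

# Start an evaluation of one task for each listed model.
# The model names are placeholders; substitute your own.
bench.run(
    benchmark="swe-bench-multimodal",
    task_id="1",
    models=["yourCodeModel", "yourCodeModel2"],
)

# Fetch the result for the same task ID.
result = bench.get_result("1")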
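
The snippet above fetches a result right after starting a run. The page does not say whether results are available immediately, so if they are not, a small polling helper may be useful. The following is a minimal sketch under that assumption: `wait_for_result`, its parameters, and the "done" check are hypothetical and not part of the benchthing API.

```python
import time
from typing import Any, Callable, Optional


def wait_for_result(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fetch() until is_done(result) is true or the timeout elapses.

    Hypothetical helper: it knows nothing about the benchmark API and only
    wraps whatever callable is passed in as `fetch`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up once the deadline has passed; the caller decides what to do.
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

With the client from the snippet above, this could be called as `wait_for_result(lambda: bench.get_result("1"), lambda r: r is not None)`. The `r is not None` check is an assumption about what an unfinished result looks like and may need to be adapted.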
