An extension of SWE-bench focused on visual, user-facing JavaScript software. It comprises 617 task instances drawn from 17 JavaScript libraries used for web interface design, diagramming, data visualization, syntax highlighting, and interactive mapping. Every task instance includes at least one image in its problem statement or unit tests.
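from benchthing import Bench

# Create a client for the SWE-bench Multimodal benchmark.
bench = Bench("swe-bench-multimodal")

# Run one task against your models (the names below are placeholders).
bench.run(
    benchmark="swe-bench-multimodal",
    task_id="1",
    models=["yourCodeModel", "yourCodeModel2"],
)

# Fetch the result for the same task ID.
result = bench.get_result("1")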
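
Because every task carries at least one image in its problem statement or unit tests, it can be useful to check locally which problem statements reference an image directly. The sketch below is only an illustration and is not part of benchthing or SWE-bench: the helper names `has_image` and `tasks_with_images` are hypothetical, and it assumes each task is a plain dict with a `problem_statement` string that references images as Markdown images, HTML `<img>` tags, or bare image URLs.

```python
import re

# Hypothetical helper pattern, not part of benchthing or SWE-bench.
# Assumption: images appear in one of the three forms matched below.
IMAGE_PATTERN = re.compile(
    r"!\[[^\]]*\]\([^)]+\)"  # Markdown image: ![alt](url)
    r"|<img\b[^>]*>"  # HTML <img> tag
    r"|https?://\S+\.(?:png|jpe?g|gif|svg|webp)\b",  # bare image URL
    re.IGNORECASE,
)


def has_image(text: str) -> bool:
    """Return True if the text contains something that looks like an image reference."""
    return IMAGE_PATTERN.search(text) is not None


def tasks_with_images(tasks):
    """Keep tasks whose (assumed) 'problem_statement' field references an image."""
    return [t for t in tasks if has_image(t.get("problem_statement", ""))]


# Made-up example tasks, for illustration only.
example_tasks = [
    {"instance_id": "a", "problem_statement": "Legend overlaps ![shot](https://example.com/a.png)"},
    {"instance_id": "b", "problem_statement": "Tooltip text is misaligned."},
    {"instance_id": "c", "problem_statement": "See <img src='x.jpg'> for the bug."},
]
selected = tasks_with_images(example_tasks)
```

A task that this check rejects may still have its image in the unit tests rather than the problem statement, so the filter is a quick heuristic rather than a validity check.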