A comprehensive benchmark that evaluates Large Multimodal Models (LMMs) from an image generation perspective. It features MMGenBench-Test, which covers 13 distinct image patterns, and MMGenBench-Domain for domain-specific evaluation. The automated pipeline has each LMM write a generative prompt from an input image; a text-to-image model then recreates the image from that prompt, and the recreation is compared with the original to score how well the LMM understood it.
from benchthing import Bench

# Connect to the MMGenBench benchmark
bench = Bench("mmgenbench")

# Submit a run evaluating the listed multimodal models
bench.run(
    benchmark="mmgenbench",
    task_id="1",
    models=["multimodal-model-1", "multimodal-model-2"],
)

# Retrieve the results for task "1"
result = bench.get_result("1")
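
For orientation, the image-to-prompt-to-image loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MMGenBench's actual implementation: `describe_image`, `generate_image`, `similarity`, and `PipelineResult` are hypothetical stand-ins for the LMM, the text-to-image model, and the image-comparison scorer.

from dataclasses import dataclass
from typing import Callable

@dataclass
class PipelineResult:
    prompt: str   # prompt the LMM produced for the input image
    score: float  # similarity between the recreated image and the original


def evaluate_one(
    image_path: str,
    describe_image: Callable[[str], str],      # hypothetical: LMM turns an image into a prompt
    generate_image: Callable[[str], bytes],    # hypothetical: text-to-image model renders the prompt
    similarity: Callable[[str, bytes], float], # hypothetical: scores recreation vs. original
) -> PipelineResult:
    """Run a single image through the LMM -> text-to-image -> comparison loop."""
    prompt = describe_image(image_path)         # LMM writes a generative prompt
    recreation = generate_image(prompt)         # text-to-image model recreates the image
    score = similarity(image_path, recreation)  # higher means closer to the original
    return PipelineResult(prompt=prompt, score=score)

In this sketch the three callables are injected as parameters, so any LMM, text-to-image model, or image-representation metric can be plugged in without changing the loop itself.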