A unified evaluation platform for assessing Speculative Decoding methods on the same device and under the same testing environment, ensuring fair comparisons. It supports evaluation of multiple open-source Speculative Decoding methods, including EAGLE-2, Hydra, Medusa, and others. Accepted at ACL 2024 Findings.

Run Spec-Bench through the benchthing client:
from benchthing import Bench

bench = Bench("spec-bench")

# Launch a Spec-Bench run over your models under a single task id
bench.run(
    benchmark="spec-bench",
    task_id="1",
    models=your_language_models,  # list of the language models you want to evaluate
)

# Fetch the results for that task once the run has completed
result = bench.get_result("1")
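
For readers new to the area, the procedure all of these methods share is a draft-then-verify loop: a cheap draft model proposes several tokens, and the expensive target model checks them so the final output still follows the target distribution. The sketch below illustrates that loop in plain Python over a toy vocabulary; the function names, toy distributions, and the gamma parameter are illustrative assumptions and are not part of the benchthing or Spec-Bench APIs.

import random

# Toy "models": each returns a probability distribution over a tiny vocabulary.
# Real methods pair a small draft model (or extra decoding heads) with a large
# target LM; here both ignore the context to keep the sketch self-contained.
VOCAB = ["a", "b", "c"]

def draft_probs(context):
    # Hypothetical cheap draft model, slightly skewed towards "a".
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def target_probs(context):
    # Hypothetical expensive target model: the distribution the output must follow.
    return {"a": 0.4, "b": 0.4, "c": 0.2}

def sample(probs):
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

def speculative_step(context, gamma=4):
    # Draft phase: propose gamma tokens cheaply with the draft model.
    ctx = list(context)
    drafted = []
    for _ in range(gamma):
        tok = sample(draft_probs(tuple(ctx)))
        drafted.append(tok)
        ctx.append(tok)

    # Verify phase: accept each drafted token with probability min(1, p/q),
    # which keeps the accepted output faithful to the target distribution.
    ctx = list(context)
    accepted = []
    for tok in drafted:
        p = target_probs(tuple(ctx))[tok]
        q = draft_probs(tuple(ctx))[tok]
        if random.random() < min(1.0, p / q):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalised,
            # and stop; real implementations also sample a bonus token from the
            # target model when every drafted token is accepted.
            residual = {t: max(0.0, target_probs(tuple(ctx))[t] - draft_probs(tuple(ctx))[t])
                        for t in VOCAB}
            total = sum(residual.values())
            if total > 0:
                accepted.append(sample({t: v / total for t, v in residual.items()}))
            break
    return accepted

print(speculative_step(("a",)))

The methods benchmarked here differ mainly in how drafts are produced (a separate small model, extra decoding heads as in Medusa and Hydra, or tree-structured drafts as in EAGLE-2); the verify step is what keeps the output faithful to the target model, which is why wall-clock speedup can be compared fairly across them.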