A question answering dataset of 113K crowd-sourced questions based on Wikipedia articles, where answering each question requires reasoning across multiple paragraphs. Each question comes with its gold paragraphs and the sentence-level supporting facts identified by crowdworkers. The questions cover diverse reasoning strategies, including inferring a missing (bridge) entity, intersection questions, and comparison questions.
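To make "gold paragraphs" and "supporting facts" concrete, here is a minimal sketch that recovers both from a single record. It assumes the layout of the original HotpotQA JSON release, where `context` is a list of `[title, sentences]` pairs and `supporting_facts` is a list of `[title, sentence_index]` pairs. The record itself is a hypothetical toy example, not real dataset content, and the helper names `supporting_sentences` and `gold_titles` are illustrative only.

```python
from typing import Any

# Hypothetical toy record in an assumed HotpotQA-style layout (not real data).
record: dict[str, Any] = {
    "_id": "example-1",
    "question": "In which city was the author of Book X born?",
    "answer": "City Y",
    "type": "bridge",
    "supporting_facts": [["Book X", 0], ["Author Z", 1]],
    "context": [
        ["Book X", ["Book X is a novel by Author Z.", "It was published in 1990."]],
        ["Author Z", ["Author Z is a writer.", "Author Z was born in City Y."]],
        ["Distractor", ["This paragraph is unrelated to the question."]],
    ],
}


def supporting_sentences(rec: dict[str, Any]) -> list[str]:
    """Return the sentences marked as supporting facts, in the listed order."""
    # Map each paragraph title to its list of sentences.
    paragraphs = {title: sentences for title, sentences in rec["context"]}
    found = []
    for title, sent_idx in rec["supporting_facts"]:
        sentences = paragraphs.get(title, [])
        # Skip facts whose index falls outside the paragraph.
        if 0 <= sent_idx < len(sentences):
            found.append(sentences[sent_idx])
    return found


def gold_titles(rec: dict[str, Any]) -> set[str]:
    """Gold paragraphs are the ones that contain at least one supporting fact."""
    return {title for title, _ in rec["supporting_facts"]}


print(supporting_sentences(record))
print(gold_titles(record))
```

Answering the toy question needs one sentence from each gold paragraph: the first identifies the bridge entity (Author Z), and the second gives the answer.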
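from benchthing import Bench

bench = Bench("hotpot-qa")

# yourQuestionAnsweringModel is a placeholder: replace it with your own model.
bench.run(
    benchmark="hotpot-qa",
    task_id="1",
    models=yourQuestionAnsweringModel,
)

# Fetch the result of the run above using the same task_id.
result = bench.get_result("1")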
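HotpotQA answers are commonly scored with SQuAD-style exact match (EM) and token-level F1 after normalizing case, punctuation, articles, and whitespace. The sketch below shows that scoring for a single prediction. It is not necessarily how `bench.get_result` computes its numbers. The yes/no/noanswer rule is assumed to mirror the official HotpotQA evaluation script, and the full benchmark also scores supporting facts, which this sketch omits.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)


def f1_score(prediction: str, gold: str) -> float:
    norm_pred, norm_gold = normalize_answer(prediction), normalize_answer(gold)
    # Assumed to mirror the HotpotQA script: no partial credit when either side
    # is a yes/no/noanswer label and the two differ.
    special = {"yes", "no", "noanswer"}
    if (norm_pred in special or norm_gold in special) and norm_pred != norm_gold:
        return 0.0
    pred_tokens, gold_tokens = norm_pred.split(), norm_gold.split()
    # Multiset intersection counts tokens shared by prediction and gold.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))
print(f1_score("Paris, France", "Paris"))
```

Here the second prediction gets partial credit (precision 1/2, recall 1, so F1 is 2/3), while EM would give it zero.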