BIRD (Big Bench for Large-scale Database Grounded Text-to-SQL Evaluation) contains over 12,751 unique question-SQL pairs across 95 large databases with a total size of 33.4 GB. It covers more than 37 professional domains and is designed to evaluate the performance of text-to-SQL models on large-scale, real-world databases.
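import { Bench } from 'benchthing';

// Create a handle for the BIRD-SQL benchmark.
const bench = new Bench('bird-sql');

// Start a run for task '1' against the listed models.
await bench.run({
  benchmark: 'bird-sql',
  taskId: '1',
  models: ['text-davinci-003', 'gpt-3.5-turbo'],
});

// Retrieve the result for the same task ID.
const result = await bench.getResult('1');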
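
Text-to-SQL benchmarks such as BIRD are commonly scored by execution accuracy: the predicted SQL and the reference SQL are both executed, and the example counts as correct when the returned rows match. The following is a minimal sketch of that comparison, assuming the rows have already been fetched from the database. It is not taken from BIRD's evaluation scripts or from `benchthing`; the names `Row`, `rowKey`, `sameResultSet`, and `executionAccuracy` are hypothetical, and ignoring row order and duplicate rows is an assumption of this sketch.

```typescript
// A result row is an ordered list of SQL values, as a database driver might return it.
// (Hypothetical type, introduced for illustration only.)
type Row = ReadonlyArray<string | number | null>;

// Serialize a row so that rows can be compared by value inside a Set.
function rowKey(row: Row): string {
  return JSON.stringify(row);
}

// Return true when both queries produced the same set of rows.
// Assumption: row order and duplicate rows are ignored.
function sameResultSet(predicted: Row[], gold: Row[]): boolean {
  const predictedKeys = new Set(predicted.map(rowKey));
  const goldKeys = new Set(gold.map(rowKey));
  if (predictedKeys.size !== goldKeys.size) return false;
  return predicted.map(rowKey).every((key) => goldKeys.has(key));
}

// Fraction of examples whose predicted rows match the gold rows.
function executionAccuracy(
  pairs: Array<{ predicted: Row[]; gold: Row[] }>
): number {
  if (pairs.length === 0) return 0;
  const correct = pairs.filter((p) => sameResultSet(p.predicted, p.gold)).length;
  return correct / pairs.length;
}
```

Comparing rows as sets keeps the check independent of row ordering, which suits queries without an `ORDER BY` clause. A stricter variant could compare ordered lists when the question requires a specific order.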