We built RapidFire AI, an open-source framework that lets you compare dozens (or hundreds) of RAG and context engineering configurations in parallel, without needing a GPU cluster.
Tuning a RAG pipeline means experimenting with chunk sizes, embedding models, retrieval strategies, reranking thresholds, prompt schemes, generator models, and more. With traditional tools, you run these sequentially, wait for each to finish on the full dataset, and then compare. That's painfully slow and wastes tokens/compute on configs you'd have killed after seeing the first 10% of results.
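To see why sequential runs get painful, here's a minimal sketch of how fast a knob grid explodes. All names and values below are illustrative, not RapidFire AI's actual API:

```python
from itertools import product

# Hypothetical knob grid for a RAG pipeline; knob names and values are
# made up for illustration, not taken from RapidFire AI.
grid = {
    "chunk_size": [256, 512, 1024],
    "embedding_model": ["bge-small", "bge-large"],
    "top_k": [5, 10],
    "rerank_threshold": [0.3, 0.5],
    "prompt_scheme": ["concise", "cot"],
}

# Every combination of knob values is one config to evaluate.
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(configs))  # 3 * 2 * 2 * 2 * 2 = 48 configs from just five knobs
```

Run those one at a time over a full eval set and you're waiting on dozens of end-to-end passes before you can compare anything.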
RapidFire AI shards your eval dataset and schedules all configs one shard at a time, cycling through them with efficient swapping. You get running metric estimates with confidence intervals in real time, based on online aggregation from the database systems literature. Spot a bad config early? Stop it. See a promising one? Clone it and tweak knobs on the fly, no restart needed.
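The early-stopping idea above can be sketched in a few lines: keep a running mean and confidence interval per config as shards complete, and prune a config once its interval falls clearly below a competitor's. This is a simplified illustration of the online-aggregation idea (Welford's algorithm plus a normal-approximation CI), not RapidFire AI's actual implementation:

```python
import math

class RunningMetric:
    """Online mean and 95% CI (normal approximation), updated per example."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def ci95(self):
        if self.n < 2:
            return (float("-inf"), float("inf"))
        se = math.sqrt(self.m2 / (self.n - 1) / self.n)  # standard error of the mean
        return (self.mean - 1.96 * se, self.mean + 1.96 * se)

# Two hypothetical configs scored as their shards finish (scores are made up):
good, bad = RunningMetric(), RunningMetric()
for score in [0.82, 0.79, 0.85, 0.81]:
    good.update(score)
for score in [0.41, 0.38, 0.44, 0.40]:
    bad.update(score)

# Early-stop rule: kill a config once its CI upper bound falls below
# another config's CI lower bound.
if bad.ci95()[1] < good.ci95()[0]:
    print("prune the losing config early")
```

The point is that a confident kill decision often arrives after a small fraction of the eval set, which is where the token/compute savings come from.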
On a beefy machine you can comfortably run 100+ configs in a single experiment. Want to see it in action without installing anything? We have a Google Colab tutorial that runs 4 RAG retrieval configs in parallel on a free Colab GPU, with zero local setup and under two minutes to get started. It builds a financial Q&A pipeline on the FiQA dataset, grid-searches over chunk sizes and reranker settings, and shows live metrics with confidence intervals as the configs run. If you're only calling OpenAI or other closed APIs, you don't need a GPU at all.
Colab: https://colab.research.google.com/github/RapidFireAI/rapidfi...
We'd love feedback on what knobs/integrations matter most to you. Happy to answer questions here.
kbigdelysh•1h ago