I was a bit overwhelmed by the different ways you can process documents to create embeddings for RAG, so I built a tool to experiment with different OCR models, ways of refining the OCR results, chunking methods, and embedding models.
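To give a concrete idea of what one of those knobs looks like: one of the simplest chunking baselines is fixed-size chunks with overlap. A minimal sketch (illustrative only, not the actual implementation):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, with each chunk
    overlapping the previous one so context isn't cut mid-thought."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Even at this level there are trade-offs to experiment with (chunk size, overlap, character vs. token counting, sentence-aware splitting), which is exactly the kind of comparison the tool is for.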
You can:

- search processed documents in the playground
- evaluate retrieval results using an LLM-as-judge (not perfect, but it can be a useful signal)
- compare different datasets (using aggregate metrics or side-by-side comparison in the playground)
You can also manually inspect the results of each query and of each intermediate document-processing step.
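For anyone unfamiliar with LLM-as-judge: the idea is to hand the query and a retrieved chunk to a grading model with a rubric and use its score as a (noisy) relevance signal. A simplified sketch of the prompt side (the rubric and names here are my own illustration, not necessarily what the tool uses):

```python
def build_judge_prompt(query: str, retrieved_chunk: str) -> str:
    """Build a grading prompt asking an LLM to rate chunk relevance."""
    return (
        "You are grading a retrieval result for a RAG system.\n"
        f"Query: {query}\n"
        f"Retrieved chunk: {retrieved_chunk}\n"
        "On a scale of 1-5, how relevant is this chunk to the query? "
        "Answer with a single digit."
    )
```

The returned string would be sent to whatever chat model you use as the judge; constraining the answer to a single digit makes the score easy to parse, at the cost of losing the judge's reasoning.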
To get a better idea, check out one of the use cases: https://ragbandit.com/use-cases/optimizing-insurance-documen...
To be completely fair, I haven't added that many options for the different stages of the document processing pipeline yet! There are tons of features I'd like to add, but I've already spent quite a bit of time on this, so I'd really appreciate hearing whether this could be useful or interesting to you. Would you use something like this?
Tech stack: Postgres (with pgvector), FastAPI, [ragbandit-core](https://github.com/MartimChaves/ragbandit-core) (the document processing core is open source), TypeScript with React, and Celery for background tasks (with Redis as the broker).
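Since pgvector does the heavy lifting for retrieval, here's roughly what a top-k similarity query looks like. This is a sketch assuming a hypothetical `chunks` table with an `embedding` vector column; `<=>` is pgvector's cosine-distance operator:

```python
# Hypothetical schema: chunks(id, content, embedding vector(1536)).
# The query embedding and k are passed as parameters at execution time.
TOP_K_QUERY = """
SELECT id, content, embedding <=> %(query_embedding)s AS distance
FROM chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s;
"""
# With a driver like psycopg, this would run as something like:
#   cur.execute(TOP_K_QUERY, {"query_embedding": vec, "k": 5})
```

Keeping chunks and embeddings in Postgres means the playground queries, the evaluation runs, and the dataset comparisons can all hit the same store.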
It's currently a credits-based subscription with optional top-ups. You can get 1000 credits to try it out (I ask for card info as a spam filter).
Thanks, Martim