I built Sieves, an open-source Python library that makes it easy to build structured document AI pipelines without locking yourself into any specific LLM framework.
You can mix and match model frameworks (Outlines, LangChain, DSPy, GLiNER2, Transformers) across tasks while keeping one declarative pipeline definition, e.g. fast local models for classification and frontier LLMs for more challenging tasks.
It includes:
- unified task and schema abstractions
- pipeline for task chaining
- execution across multiple backends
- built-in evaluation, optimization, and conditional task execution
- support for distillation to smaller models
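To give a rough feel for the pattern behind task chaining, swappable backends, and conditional execution, here is a stand-alone toy sketch. It is not the Sieves API: `Doc`, `Task`, `Pipeline`, `cheap_classifier`, and `expensive_extractor` are hypothetical names made up for illustration, and the two "models" are plain functions standing in for a small local model and a frontier LLM. See the docs for the real interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Illustrative only: none of these names come from Sieves.


@dataclass
class Doc:
    text: str
    results: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    backend: Callable[[str], Any]  # any callable wrapping a model
    condition: Optional[Callable[[Doc], bool]] = None  # run only if this returns True

    def __call__(self, doc: Doc) -> Doc:
        if self.condition is None or self.condition(doc):
            doc.results[self.name] = self.backend(doc.text)
        return doc


@dataclass
class Pipeline:
    tasks: list

    def __call__(self, docs):
        processed = []
        for doc in docs:
            # Tasks run in order; later tasks can read earlier results.
            for task in self.tasks:
                doc = task(doc)
            processed.append(doc)
        return processed


def cheap_classifier(text: str) -> str:
    # Stand-in for a fast local classification model.
    return "invoice" if "invoice" in text.lower() else "other"


def expensive_extractor(text: str) -> dict:
    # Stand-in for a frontier LLM doing structured extraction.
    return {"n_words": len(text.split())}


pipeline = Pipeline([
    Task("label", cheap_classifier),
    # Only pay for the expensive step on documents labelled as invoices.
    Task("fields", expensive_extractor,
         condition=lambda d: d.results.get("label") == "invoice"),
])

docs = pipeline([Doc("Invoice #42 for consulting"), Doc("Meeting notes")])
for d in docs:
    print(d.results)
```

The point of the sketch is that the pipeline definition stays the same when a backend callable is swapped out, and the condition keeps the costly step off documents that don't need it.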
Full motivation, design rationale, and examples are in the linked blog post.
tl;dr: I was doing a lot of consulting/prototyping for document AI projects and kept running into the same lock-in and boilerplate issues, so I decided to write a library that addresses them.
If you're working a lot with document AI in greenfield projects, this may be interesting to you.
Happy to answer questions and hear feedback!
Repo: https://github.com/MantisAI/sieves / docs: https://sieves.ai