Built this after hitting real latency issues in production. LangChain added 250ms+ overhead before the LLM was even called — routing, validation, and middleware stacked up fast.
The core realization: most "orchestration" is just routing + retry + caching. Those three things don't require 47 packages or 15MB of dependencies.
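To make that concrete, here's a minimal sketch of the pattern. It's illustrative, not the library's actual API; call_llm, route, and the client.complete interface are stand-in names:

    # Minimal sketch of routing + retry + caching. Illustrative only:
    # call_llm, route, and client.complete are stand-in names.
    import hashlib
    import time

    _cache = {}

    def call_llm(client, prompt, retries=3, backoff=0.5):
        # Cache on a prompt hash so identical requests skip the network.
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in _cache:
            return _cache[key]
        for attempt in range(retries):
            try:
                result = client.complete(prompt)  # assumed client interface
                _cache[key] = result
                return result
            except Exception:
                if attempt == retries - 1:
                    raise  # out of retries, surface the error
                time.sleep(backoff * 2 ** attempt)  # exponential backoff

    def route(prompt, clients):
        # Routing is just picking a client with a predicate on the input.
        name = "code" if "def " in prompt else "chat"
        return call_llm(clients[name], prompt)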
What it actually is: ~15KB of Python with 2 runtime deps (httpx + pydantic). Benchmarked over 1,000 requests: 65ms average per request vs LangChain's 420ms, and 3MB of memory per request vs 12MB.
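If you want to sanity-check numbers like that yourself, a timing loop of roughly this shape is enough. This is an illustrative sketch reusing route from above, not the exact harness behind those figures; StubClient skips the network so you measure only framework overhead:

    # Illustrative timing loop, not the exact harness used for the
    # numbers above. StubClient stands in for a real LLM client.
    import time

    class StubClient:
        def complete(self, prompt):
            return "ok"  # no network, so only framework overhead is timed

    def bench(n=1000):
        clients = {"chat": StubClient(), "code": StubClient()}
        start = time.perf_counter()
        for i in range(n):
            route(f"request {i}", clients)  # distinct prompts avoid the cache
        return (time.perf_counter() - start) / n * 1000  # avg ms per request

    print(f"{bench():.2f} ms avg")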
The part I'm most proud of: the testing layer. MockLLMClient lets you inject deterministic responses so you can test agent routing and retry logic without hitting any APIs. Hard to do that cleanly with LangChain's abstractions.
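The pattern looks like this. MockLLMClient's exact signature may read a bit differently; ScriptedClient here is a stand-in showing the core idea: queue up responses and exceptions, then assert on routing and retry behavior deterministically:

    # Sketch of the scripted-response testing idea. MockLLMClient's real
    # signature may differ; ScriptedClient is a stand-in for illustration.
    class ScriptedClient:
        def __init__(self, responses):
            self.responses = list(responses)
            self.call_count = 0

        def complete(self, prompt):
            self.call_count += 1
            item = self.responses.pop(0)
            if isinstance(item, Exception):
                raise item  # simulate a transient API failure
            return item

    def test_retry_recovers_from_one_timeout():
        mock = ScriptedClient([TimeoutError("boom"), "final answer"])
        assert call_llm(mock, "hi", retries=2, backoff=0) == "final answer"
        assert mock.call_count == 2  # one failure, then a successful retry

In the sketch, a client is just anything with a complete() method, so the mock drops in without monkeypatching.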
214 tests, 91% coverage. Live demo at ct-agentforge.streamlit.app if you want to try it without cloning.
Happy to go deep on any part of the architecture.