- llm-stream — streaming from OpenAI + Anthropic, callback-based
- llm-cache — file-backed semantic cache, LRU eviction
- llm-cost — offline token counting + cost estimation
- llm-retry — exponential backoff + circuit breaker + provider failover
- llm-format — structured output enforcer with hand-rolled JSON parser
Drop in a single .hpp, link libcurl, done. No nlohmann/json, no Boost, no Python.
https://github.com/Mattbusel/llm-stream
https://github.com/Mattbusel/llm-cache
https://github.com/Mattbusel/llm-cost
https://github.com/Mattbusel/llm-retry
https://github.com/Mattbusel/llm-format