Hi HN,
Built llm-use: a lightweight Python toolkit for cost-efficient agent workflows that mix multiple LLMs.
Core pattern: a strong model (Claude/GPT-4o/big local) handles planning + synthesis; cheap/local workers handle parallel subtasks (research, scraping, summarization, extraction, …).
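In plain Python terms, the pattern boils down to something like this (my generic sketch of the idea, not llm-use's actual API; orchestrator and worker stand in for any LLM callables):

from concurrent.futures import ThreadPoolExecutor

def run(task, orchestrator, worker, max_workers=4):
    # Strong model plans, cheap workers fan out in parallel,
    # then the strong model synthesizes the results.
    subtasks = orchestrator(f"Split into subtasks: {task}").splitlines()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, subtasks))
    return orchestrator(f"Synthesize an answer to {task!r} from: {results}")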
Features:
• Mix Anthropic, OpenAI, Ollama, llama.cpp
• Smart router: cheap/local first, escalate only if needed (learned + heuristic; rough sketch after this list)
• Parallel workers (--max-workers)
• Real scraping + cache (BS4 or Playwright)
• Offline-first (full Ollama support)
• Cost tracking ($ for cloud, $0 for local)
• TUI chat + MCP server mode
• Local session logs
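The escalation idea behind the router bullet is roughly this (a toy sketch of the heuristic half; the names and thresholds are made up, not the shipped code):

def looks_sufficient(answer: str) -> bool:
    # Cheap quality proxies: non-trivial length, no hedging markers.
    hedges = ("i don't know", "i can't", "as an ai")
    return len(answer) > 50 and not any(h in answer.lower() for h in hedges)

def route(task: str, cheap_llm, strong_llm) -> str:
    draft = cheap_llm(task)
    if looks_sufficient(draft):
        return draft
    return strong_llm(task)  # escalate only when the cheap draft looks weak

The "learned" half would presumably swap looks_sufficient for a scorer trained on past escalations.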
Quick example (hybrid):
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker ollama:llama3.1:8b \
--enable-scrape \
--task "Summarize 6 recent sources on post-quantum crypto"
Or the routed version:
python3 cli.py exec \
--router ollama:llama3.1:8b \
--orchestrator openai:o1 \
--worker openai:gpt-4o-mini \
--task "Explain recent macOS security updates"
MIT licensed, minimal deps, embeddable.
Repo: https://github.com/llm-use/llm-use
Feedback welcome on:
• Routing heuristics you’d find useful
• Pain points with agent costs / local vs cloud
• Integrations you're missing
Thanks!