1. Estimates tokens/costs for prompts across OpenAI/Gemini/Anthropic-style models
2. Runs load tests against real LLM endpoints with progress bars, retries, and (optional) dry runs
It started as a glorified "how many tokens will this cost?" script. For the latest release I added provider selection (--provider {openai|openrouter|anthropic|generic}) plus a real Anthropic client and a "bring-your-own-endpoint" client, so I can stress-test gateways before sending them real traffic.
Try it

  # install
  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/inst... | bash

  # dry-run Anthropic
  echo "Hello!" | tokuin load-test \
    --model claude-3-sonnet \
    --provider anthropic \
    --runs 5 \
    --concurrency 2 \
    --dry-run --estimate-cost

  # generic endpoint smoke test
  echo "Ping" | tokuin load-test \
    --model lambda-1 \
    --provider generic \
    --endpoint https://example.com/infer \
    --runs 10 --concurrency 2
Repo (MIT/Apache-2.0): https://github.com/nooscraft/tokuin
What’s different
1. Provider-aware CLI: it auto-detects the provider from the model name, but you can force it when needed (a rough sketch of what that detection could look like follows this list).
2. Built-in Anthropic client and a generic REST adapter (point it at an endpoint and go).
3. Optional --dry-run yields the same metrics without burning API credits.
4. Auth stays in env vars/flags; no config files or dashboards.
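To make item 1 concrete, detection is just a cheap heuristic on the model name string. A minimal sketch of that kind of check; the prefixes here are illustrative assumptions, not the actual table in the repo, and --provider always overrides the guess:

  // Sketch only: prefix-based provider detection. The real mapping in
  // tokuin may differ.
  #[derive(Debug, PartialEq)]
  enum Provider { OpenAi, OpenRouter, Anthropic, Generic }

  fn detect_provider(model: &str) -> Provider {
      if model.starts_with("claude-") {
          Provider::Anthropic
      } else if model.starts_with("gpt-") || model.starts_with("o1") {
          Provider::OpenAi
      } else if model.contains('/') {
          // e.g. "meta-llama/llama-3-70b" routed via OpenRouter
          Provider::OpenRouter
      } else {
          Provider::Generic
      }
  }

  fn main() {
      assert_eq!(detect_provider("claude-3-sonnet"), Provider::Anthropic);
      assert_eq!(detect_provider("gpt-4o-mini"), Provider::OpenAi);
      assert_eq!(detect_provider("lambda-1"), Provider::Generic);
  }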
Implementation notes
1. Rust 2021 + tokio, reqwest, indicatif.
2. The load simulator schedules requests and tracks latencies, histograms, and costs (a minimal sketch of that loop is below).
3. Token estimation uses tiktoken-rs and a simple pricing registry.
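The core of the simulator is just tokio + reqwest behind an indicatif progress bar. A minimal sketch of that shape, not the actual tokuin internals; the endpoint and payload are placeholders, and it assumes tokio, reqwest (json feature), serde_json, and indicatif as deps:

  use std::sync::Arc;
  use std::time::{Duration, Instant};
  use indicatif::ProgressBar;
  use tokio::{sync::Semaphore, task::JoinSet};

  #[tokio::main]
  async fn main() -> Result<(), Box<dyn std::error::Error>> {
      let (runs, concurrency) = (10usize, 2usize);
      let endpoint = "https://example.com/infer"; // placeholder

      let client = reqwest::Client::builder()
          .timeout(Duration::from_secs(30))
          .build()?;
      let pb = Arc::new(ProgressBar::new(runs as u64));
      let sem = Arc::new(Semaphore::new(concurrency));

      let mut tasks = JoinSet::new();
      for _ in 0..runs {
          let (client, pb, sem) = (client.clone(), pb.clone(), sem.clone());
          tasks.spawn(async move {
              // Cap in-flight requests at `concurrency`.
              let _permit = sem.acquire_owned().await.unwrap();
              let started = Instant::now();
              // Placeholder body; the real payload depends on the provider.
              let status = client
                  .post(endpoint)
                  .json(&serde_json::json!({ "prompt": "Ping" }))
                  .send()
                  .await
                  .map(|r| r.status().as_u16());
              pb.inc(1);
              (started.elapsed(), status)
          });
      }

      // Per-request latencies; the real tool feeds its histograms and the
      // cost estimator from data like this.
      let mut results = Vec::new();
      while let Some(joined) = tasks.join_next().await {
          results.push(joined?);
      }
      pb.finish();
      println!("{results:?}");
      Ok(())
  }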
Feedback I’m looking for
1. Are the CLI defaults (timeouts, retry curve) sensible for real traffic?
2. Should generic mode accept response extraction hooks so it works with more JSON shapes? (One possible shape for such a hook is sketched below.)
3. Any load-test metrics you’d want before trusting this in CI?
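On question 2, here's the kind of hook I have in mind, sketched with serde_json's JSON Pointer support. Nothing like this exists in tokuin today, and the --extract-path flag name is made up:

  // Hypothetical extraction hook for generic mode: the user supplies a
  // JSON Pointer (RFC 6901) and we pull the text out of whatever JSON the
  // endpoint returns. Illustrative only.
  use serde_json::Value;

  fn extract_text(body: &Value, pointer: &str) -> Option<String> {
      body.pointer(pointer).and_then(|v| v.as_str()).map(str::to_owned)
  }

  fn main() {
      let openai_like: Value =
          serde_json::from_str(r#"{"choices":[{"message":{"content":"pong"}}]}"#).unwrap();
      let custom: Value =
          serde_json::from_str(r#"{"output":{"text":"pong"}}"#).unwrap();

      // e.g. --extract-path "/choices/0/message/content" (hypothetical flag)
      assert_eq!(
          extract_text(&openai_like, "/choices/0/message/content").as_deref(),
          Some("pong")
      );
      assert_eq!(extract_text(&custom, "/output/text").as_deref(), Some("pong"));
  }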
Thanks for trying it out—happy to answer questions or take feature requests.