1. Estimates tokens/costs for prompts across OpenAI/Gemini/Anthropic-style models
2. Runs load tests against real LLM endpoints with progress bars, retries, and (optional) dry runs
It started as a glorified "how many tokens will this cost?" script. For the latest release I added provider selection (--provider {openai|openrouter|anthropic|generic}) plus a real Anthropic client and a "bring-your-own-endpoint" client, so I can stress-test gateways before sending them real traffic.
Try it

  # install
  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/inst... | bash

  # dry-run Anthropic
  echo "Hello!" | tokuin load-test \
    --model claude-3-sonnet \
    --provider anthropic \
    --runs 5 \
    --concurrency 2 \
    --dry-run --estimate-cost

  # generic endpoint smoke test
  echo "Ping" | tokuin load-test \
    --model lambda-1 \
    --provider generic \
    --endpoint https://example.com/infer \
    --runs 10 --concurrency 2
Repo (MIT/Apache-2.0): https://github.com/nooscraft/tokuin
What’s different
1. Provider-aware CLI: it auto-detects the provider from the model name, but you can force it when needed (a rough sketch of what that detection could look like follows this list).
2. Built-in Anthropic client and a generic REST adapter (point it at an endpoint and go).
3. Optional --dry-run yields the same metrics without burning API credits.
4. Auth stays in env vars/flags; no config files or dashboards.
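To make item 1 concrete, detection is just a cheap heuristic on the model name string. A minimal sketch of that kind of check; the prefixes here are illustrative assumptions, not the actual table in the repo, and --provider always overrides the guess:

  // Sketch only: prefix-based provider detection. The real mapping in
  // tokuin may differ.
  #[derive(Debug, PartialEq)]
  enum Provider { OpenAi, OpenRouter, Anthropic, Generic }

  fn detect_provider(model: &str) -> Provider {
      if model.starts_with("claude-") {
          Provider::Anthropic
      } else if model.starts_with("gpt-") || model.starts_with("o1") {
          Provider::OpenAi
      } else if model.contains('/') {
          // e.g. "meta-llama/llama-3-70b" routed via OpenRouter
          Provider::OpenRouter
      } else {
          Provider::Generic
      }
  }

  fn main() {
      assert_eq!(detect_provider("claude-3-sonnet"), Provider::Anthropic);
      assert_eq!(detect_provider("gpt-4o-mini"), Provider::OpenAi);
      assert_eq!(detect_provider("lambda-1"), Provider::Generic);
  }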
Implementation notes
1. Rust 2021 + tokio, reqwest, indicatif.
2. The load simulator schedules requests and tracks latencies, histograms, and costs (a minimal sketch of that loop is below).
3. Token estimation uses tiktoken-rs and a simple pricing registry.
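The core of the simulator is just tokio + reqwest behind an indicatif progress bar. A minimal sketch of that shape, not the actual tokuin internals; the endpoint and payload are placeholders, and it assumes tokio, reqwest (json feature), serde_json, and indicatif as deps:

  use std::sync::Arc;
  use std::time::{Duration, Instant};
  use indicatif::ProgressBar;
  use tokio::{sync::Semaphore, task::JoinSet};

  #[tokio::main]
  async fn main() -> Result<(), Box<dyn std::error::Error>> {
      let (runs, concurrency) = (10usize, 2usize);
      let endpoint = "https://example.com/infer"; // placeholder

      let client = reqwest::Client::builder()
          .timeout(Duration::from_secs(30))
          .build()?;
      let pb = Arc::new(ProgressBar::new(runs as u64));
      let sem = Arc::new(Semaphore::new(concurrency));

      let mut tasks = JoinSet::new();
      for _ in 0..runs {
          let (client, pb, sem) = (client.clone(), pb.clone(), sem.clone());
          tasks.spawn(async move {
              // Cap in-flight requests at `concurrency`.
              let _permit = sem.acquire_owned().await.unwrap();
              let started = Instant::now();
              // Placeholder body; the real payload depends on the provider.
              let status = client
                  .post(endpoint)
                  .json(&serde_json::json!({ "prompt": "Ping" }))
                  .send()
                  .await
                  .map(|r| r.status().as_u16());
              pb.inc(1);
              (started.elapsed(), status)
          });
      }

      // Per-request latencies; the real tool feeds its histograms and the
      // cost estimator from data like this.
      let mut results = Vec::new();
      while let Some(joined) = tasks.join_next().await {
          results.push(joined?);
      }
      pb.finish();
      println!("{results:?}");
      Ok(())
  }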
Feedback I’m looking for
1. Are the CLI defaults (timeouts, retry curve) sensible for real traffic?
2. Should generic mode accept response extraction hooks so it works with more JSON shapes? (One possible shape for such a hook is sketched below.)
3. Any load-test metrics you’d want before trusting this in CI?
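On question 2, here's the kind of hook I have in mind, sketched with serde_json's JSON Pointer support. Nothing like this exists in tokuin today, and the --extract-path flag name is made up:

  // Hypothetical extraction hook for generic mode: the user supplies a
  // JSON Pointer (RFC 6901) and we pull the text out of whatever JSON the
  // endpoint returns. Illustrative only.
  use serde_json::Value;

  fn extract_text(body: &Value, pointer: &str) -> Option<String> {
      body.pointer(pointer).and_then(|v| v.as_str()).map(str::to_owned)
  }

  fn main() {
      let openai_like: Value =
          serde_json::from_str(r#"{"choices":[{"message":{"content":"pong"}}]}"#).unwrap();
      let custom: Value =
          serde_json::from_str(r#"{"output":{"text":"pong"}}"#).unwrap();

      // e.g. --extract-path "/choices/0/message/content" (hypothetical flag)
      assert_eq!(
          extract_text(&openai_like, "/choices/0/message/content").as_deref(),
          Some("pong")
      );
      assert_eq!(extract_text(&custom, "/output/text").as_deref(), Some("pong"));
  }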
Thanks for trying it out—happy to answer questions or take feature requests.