Instead of reading benchmark numbers, you can feel how fast or slow different configurations are by adjusting TTFT, token generation rate, and output length. It streams tokens exactly as an LLM would, but without generating any real content.
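The core idea is simple enough to sketch: wait for the TTFT, then emit placeholder tokens at a fixed rate. This is a minimal illustration in Python, not the project's actual code; the function name and parameters are my own.

```python
import time

def stream_fake_tokens(ttft_s=0.5, tokens_per_s=20.0, n_tokens=10):
    """Yield placeholder tokens with LLM-like timing: one TTFT delay,
    then a fixed gap between tokens."""
    time.sleep(ttft_s)               # time to first token
    interval = 1.0 / tokens_per_s    # gap between subsequent tokens
    for i in range(n_tokens):
        yield f"tok{i} "
        time.sleep(interval)

if __name__ == "__main__":
    start = time.monotonic()
    text = "".join(stream_fake_tokens(ttft_s=0.1, tokens_per_s=50, n_tokens=5))
    print(text, f"({time.monotonic() - start:.2f}s)")
```

Slowing `tokens_per_s` down to single digits makes the difference between a comfortable and a painful configuration immediately tangible.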
I was wondering which Apple machine I should buy, so I built this over the weekend to get a better feel for what it means to run a model locally.
The project/toy is also public on GitHub: https://github.com/htxsrl/localllmsimulation
Thanks to the cited sources for the real benchmarks, which let me fit a small ML model that extrapolates even to futuristic hardware (like an imaginary M9 with 2048 GB of RAM and 3000 GB/s of bandwidth).
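A useful first-order rule behind extrapolations like this: decoding is memory-bandwidth bound, since every generated token has to read (roughly) every weight once, so tokens/s scales with bandwidth divided by model size. This sketch is my own back-of-envelope version, with an illustrative efficiency factor; it is not the model fitted in the repo.

```python
def est_tokens_per_s(params_b, bits_per_weight, bandwidth_gb_s, efficiency=0.7):
    """Rough decode-speed estimate: bandwidth / model size, scaled by an
    assumed hardware-efficiency factor (0.7 is a guess, not a benchmark)."""
    model_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return efficiency * bandwidth_gb_s / model_gb

# e.g. a 70B model at 4-bit on the imaginary 3000 GB/s machine:
print(round(est_tokens_per_s(70, 4, 3000), 1))  # → 60.0
```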
ndgold•13m ago
Also, I’m seeing check marks next to all quants, which confused me a bit when trying to select one.
hertzdog•11m ago
So the check mark simply indicates that the model can actually run under those constraints (fits in memory), not that it’s selected.
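A "fits in memory" check like that can be sketched as: estimated model bytes (params × bytes per weight) plus some headroom for KV cache and the OS must be at most the machine's RAM. The function and the headroom value below are assumptions for illustration, not the repo's actual logic.

```python
def fits_in_memory(params_b, bits_per_weight, ram_gb, headroom_gb=4.0):
    """True if the quantized model plus an assumed headroom fits in RAM."""
    model_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return model_gb + headroom_gb <= ram_gb

print(fits_in_memory(7, 4, 16))   # 7B @ 4-bit ≈ 3.5 GB → True on 16 GB
print(fits_in_memory(70, 8, 64))  # 70B @ 8-bit ≈ 70 GB → False on 64 GB
```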