frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

The Fall of the Nerds

https://www.noahpinion.blog/p/the-fall-of-the-nerds
1•otoolep•33s ago•0 comments

I'm 15 and built a free tool for reading Greek/Latin texts. Would love feedback

https://the-lexicon-project.netlify.app/
1•breadwithjam•3m ago•1 comments

How close is AI to taking my job?

https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job
1•cjbarber•3m ago•0 comments

You are the reason I am not reviewing this PR

https://github.com/NixOS/nixpkgs/pull/479442
2•midzer•5m ago•1 comments

Show HN: FamilyMemories.video – Turn static old photos into 5s AI videos

https://familymemories.video
1•tareq_•7m ago•0 comments

How Meta Made Linux a Planet-Scale Load Balancer

https://softwarefrontier.substack.com/p/how-meta-turned-the-linux-kernel
1•CortexFlow•7m ago•0 comments

A Turing Test for AI Coding

https://t-cadet.github.io/programming-wisdom/#2026-02-06-a-turing-test-for-ai-coding
2•phi-system•7m ago•0 comments

How to Identify and Eliminate Unused AWS Resources

https://medium.com/@vkelk/how-to-identify-and-eliminate-unused-aws-resources-b0e2040b4de8
2•vkelk•8m ago•0 comments

A2CDVI – HDMI output from from the Apple IIc's digital video output connector

https://github.com/MrTechGadget/A2C_DVI_SMD
2•mmoogle•8m ago•0 comments

CLI for Common Playwright Actions

https://github.com/microsoft/playwright-cli
3•saikatsg•9m ago•0 comments

Would you use an e-commerce platform that shares transaction fees with users?

https://moondala.one/
1•HamoodBahzar•11m ago•1 comments

Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers

https://github.com/ykdojo/safeclaw
2•ykdojo•14m ago•0 comments

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

https://huggingface.co/blog/huggingface/one-year-since-the-deepseek-moment-blog-3
3•gmays•14m ago•0 comments

The Evolution of the Interface

https://www.asktog.com/columns/038MacUITrends.html
2•dhruv3006•16m ago•1 comments

Azure: Virtual network routing appliance overview

https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-routing-appliance-overview
2•mariuz•16m ago•0 comments

Seedance2 – multi-shot AI video generation

https://www.genstory.app/story-template/seedance2-ai-story-generator
2•RyanMu•20m ago•1 comments

Πfs – The Data-Free Filesystem

https://github.com/philipl/pifs
2•ravenical•23m ago•0 comments

Go-busybox: A sandboxable port of busybox for AI agents

https://github.com/rcarmo/go-busybox
3•rcarmo•24m ago•0 comments

Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]

https://research.nvidia.com/labs/nemotron/files/NVFP4-QAD-Report.pdf
2•gmays•25m ago•0 comments

xAI Merger Poses Bigger Threat to OpenAI, Anthropic

https://www.bloomberg.com/news/newsletters/2026-02-03/musk-s-xai-merger-poses-bigger-threat-to-op...
2•andsoitis•25m ago•0 comments

Atlas Airborne (Boston Dynamics and RAI Institute) [video]

https://www.youtube.com/watch?v=UNorxwlZlFk
2•lysace•26m ago•0 comments

Zen Tools

http://postmake.io/zen-list
2•Malfunction92•28m ago•0 comments

Is the Detachment in the Room? – Agents, Cruelty, and Empathy

https://hailey.at/posts/3mear2n7v3k2r
2•carnevalem•29m ago•1 comments

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•31m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
2•rcarmo•32m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•32m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•32m ago•0 comments

Every major galaxy is speeding away from the Milky Way, except one

https://www.livescience.com/space/cosmology/every-major-galaxy-is-speeding-away-from-the-milky-wa...
3•Brajeshwar•33m ago•0 comments

Extreme Inequality Presages the Revolt Against It

https://www.noemamag.com/extreme-inequality-presages-the-revolt-against-it/
2•Brajeshwar•33m ago•0 comments

There's no such thing as "tech" (Ten years later)

1•dtjb•34m ago•0 comments
Open in hackernews

Ask HN: How the same LLM "instance" serve multiple clients?

1•BiraIgnacio•9mo ago
I've been playing with running LLMs locally and only then realized I have no idea how to scale it (I don't really know how LLMs work internally).

I'm assuming context is everything but if the same LLM process can serve multiple clients, aren't there risks of mixing contexts? Does anyone have any ideas?

Comments

sherdil2022•9mo ago
Let me ChatGPT for you:

Good question. Let’s break it down carefully.

When you hear about a single LLM instance serving multiple clients at the same time, it usually works like this: • The LLM instance is stateless: Each client sends a request (prompt + settings), the model processes that one request independently, and returns the response. The LLM doesn’t “remember” between requests unless you explicitly include conversation history in the prompt. • Concurrency is handled by infrastructure: Even though the LLM is “one model,” it can handle many incoming requests because the backend (server) wraps the model with techniques like: • Asynchronous request handling (e.g., using async/await patterns) • Batching: multiple prompts are packed together into a single forward pass through the model (very common in high-traffic servers) • Parallelism: the server could have multiple workers/replicas of the model (copies or shared GPUs) running side-by-side. • Queueing: if too many clients at once, requests are queued and processed in order. • Memory isolation: Each request is kept separate in memory. No client’s data leaks into another client’s conversation unless you (the app developer) accidentally introduce a bug.

So:

It’s not that one model is “locked” into serving only one person at a time. It’s more like the model is a very fast function being called many times in parallel.

⸻