Ask HN: How are you managing LLM inference at the edge?

7•gray_amps•2d ago
I’m building a system to run small LLMs on-device (mobile, IoT, on-prem servers) and would love to hear how others have tackled the challenges.

Context:

Use cases: offline chatbots, smart cameras, local data privacy

Models: 7–13B parameter quantized models (e.g. Llama 2, Vicuna)

Constraints: limited RAM/flash, CPU-only or tiny GPU, intermittent connectivity

Questions:

What runtimes or frameworks are you using (ONNX Runtime, TVM, custom C++)?

How do you handle model loading, eviction, and batching under tight memory?

Any clever tricks for quantization, pruning, or kernel fusion that boost perf?

How do you monitor and update models securely in the field?

Looking forward to your benchmarks, war stories, and code pointers!

Comments

byte-bolter•2d ago
I’m using ONNX Runtime with 4-bit quantization on a Raspberry Pi 4. I preload the quantized model into shared memory so multiple processes can reuse it. When I hit a 1 GB RAM cap, I evict the least recently used sessions. For batching, I accumulate inputs over a 50 ms window to boost throughput without hurting latency much. So far I get ~15 RPS on a 7B Llama 2 model.
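
For anyone who wants to try the LRU eviction and 50 ms batching window described above, here is a rough sketch in plain Python. It is not byte-bolter's actual code and it does not touch ONNX Runtime at all: `LRUSessionCache`, `collect_batch`, the `max_batch` limit, and the per-session `size_bytes` figure are all hypothetical names and parameters chosen for illustration, and the caller is assumed to know roughly how much memory each session holds.

```python
import queue
import time
from collections import OrderedDict


class LRUSessionCache:
    """Keeps sessions under a byte budget, evicting the least recently used.

    Hypothetical helper: sizes are supplied by the caller, not measured.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (session, size_bytes)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, session, size_bytes):
        if key in self._items:
            self.used_bytes -= self._items.pop(key)[1]
        # Evict the oldest entries until the new one fits in the budget.
        # A single entry larger than the budget is still stored on its own.
        while self._items and self.used_bytes + size_bytes > self.max_bytes:
            _, (_, evicted_size) = self._items.popitem(last=False)
            self.used_bytes -= evicted_size
        self._items[key] = (session, size_bytes)
        self.used_bytes += size_bytes

    def keys(self):
        return list(self._items)  # oldest first


def collect_batch(requests, window_s=0.050, max_batch=8):
    """Blocks for the first request, then gathers more until the window closes."""
    batch = [requests.get()]
    deadline = time.monotonic() + window_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

With a 1 GB cap you would construct the cache as `LRUSessionCache(max_bytes=1 << 30)`, and a worker thread would loop on `collect_batch(request_queue)` and run one inference call per returned batch. The window starts when the first request arrives, so a lone request waits at most about 50 ms before being served.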
