Ask HN: How are you managing LLM inference at the edge?

4•gray_amps•2h ago
I’m building a system to run small LLMs on-device (mobile, IoT, on-prem servers) and would love to hear how others have tackled the challenges.

Context:

Use cases: offline chatbots, smart cameras, local data privacy

Models: 7–13B parameter quantized models (e.g. Llama 2, Vicuna)

Constraints: limited RAM/flash, CPU-only or tiny GPU, intermittent connectivity

Questions:

What runtimes or frameworks are you using (ONNX Runtime, TVM, custom C++)?

How do you handle model loading, eviction, and batching under tight memory?

Any clever tricks for quantization, pruning, or kernel fusions that boost perf?

How do you monitor and update models securely in the field?

Looking forward to your benchmarks, war stories, and code pointers!

Comments

byte-bolter•2h ago
I’m using ONNX Runtime with 4-bit quantization on a Raspberry Pi 4. I preload the quantized model into shared memory so multiple processes can reuse it, and I evict old sessions by LRU when I hit a 1 GB RAM cap. For batching, I accumulate inputs over a 50 ms window to boost throughput without hurting latency. So far I get ~15 RPS on a 7B Llama 2 model.
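
For anyone wanting to try the two tricks above, here is a rough Python sketch of LRU eviction under a byte budget and a 50 ms batching window. It is not byte-bolter's code and it does not call ONNX Runtime: sessions are treated as opaque objects, and `LRUSessionCache`, `collect_batch`, the caller-supplied `size_bytes`, and the `max_batch=8` limit are all hypothetical names and values chosen for illustration.

```python
import queue
import time
from collections import OrderedDict


class LRUSessionCache:
    """Holds sessions until their combined size exceeds a byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> (session, size_bytes)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, session, size_bytes):
        # size_bytes is supplied by the caller (assumption: you can estimate it).
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (session, size_bytes)
        self.used_bytes += size_bytes
        # Evict least recently used entries, but never the one just added.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, (_, freed) = self._entries.popitem(last=False)
            self.used_bytes -= freed

    def keys(self):
        return list(self._entries)


def collect_batch(requests, window_s=0.05, max_batch=8):
    """Block for the first request, then gather more until the window closes."""
    batch = [requests.get()]
    deadline = time.monotonic() + window_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

With a 1 GB cap you would construct the cache as `LRUSessionCache(max_bytes=1 << 30)`, and a worker thread would loop on `collect_batch(request_queue)` and run one inference call per returned batch. Starting the window at the first request bounds the added wait to roughly 50 ms per request, at the cost of smaller batches under light load.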
