frontpage.

Praise for Price Gouging

https://www.grumpy-economist.com/p/praise-for-price-gouging
1•mhb•43s ago•0 comments

Open source infra orchestrator agent clanker CLI

https://github.com/bgdnvk/clanker
1•tekbog•2m ago•0 comments

Lance table format explained simply, stupid (Animated)

https://tontinton.com/posts/lance/
1•tontinton•3m ago•0 comments

Solving Soma

https://anekstein.com/posts/2026-02-01-blocker
1•davidanekstein•3m ago•0 comments

We built a cloud platform for agentic software (our virtualization, etc.)

https://agentuity.com/
1•rblalock•3m ago•2 comments

Show HN: WLM-SLP – A 0D-27D Structural Language for Multi-Agent Alignment

https://github.com/gavingu2255-ai/WLM-Open-Source/blob/main/README.md
1•WujieGuGavin•4m ago•0 comments

Former Tumblr Head Jeff D'Onofrio Steps in as Acting CEO at the Washington Post

https://www.theverge.com/tech/875433/tumblr-jeff-donofrio-ceo-washington-post-layoffs
1•bookofjoe•7m ago•0 comments

Bounded Flexible Arrays in C

https://people.kernel.org/kees/bounded-flexible-arrays-in-c
1•fanf2•7m ago•0 comments

The Invisible Labor Force Powering AI

https://cacm.acm.org/news/the-invisible-labor-force-powering-ai/
1•pseudolus•9m ago•0 comments

Reading Recursion via Pascal

https://journal.paoloamoroso.com/reading-recursion-via-pascal
1•AlexeyBrin•9m ago•0 comments

Show HN: I made a website that finds patterns on your spreadsheet

https://analyzetable.com
1•kouhxp•10m ago•0 comments

Jokes on You AI: Turning the Tables – LLMs for Learning

https://www.dev-log.me/jokes_on_you_ai_llms_for_learning/
1•wazHFsRy•11m ago•0 comments

You don't need RAG in 2026

https://ryanlineng.substack.com/p/you-dont-need-rag-in-2026
1•kareninoverseas•12m ago•0 comments

WatchLLM – Cost kill switch for AI agents (with loop detection)

https://www.watchllm.dev/
1•Kaadz•15m ago•2 comments

I turned myself into an AI-generated deathbot – here's what I found

https://www.bbc.com/news/articles/c93wjywz5p5o
1•cmsefton•26m ago•0 comments

Management style doesn't predict survival

https://orchidfiles.com/management-style-doesnt-predict-survival/
1•theorchid•26m ago•0 comments

One Generation Runs the Country. The Next Cashed in on Crypto

https://www.wsj.com/finance/currencies/trump-sons-crypto-billions-1e7f1414
1•impish9208•28m ago•1 comments

"I Was Wrong": Why the Civil War Is Running Late [video][2h21m]

https://www.youtube.com/watch?v=RDmkKZ7vAkI
1•Bender•29m ago•0 comments

Show HN: A sandboxed execution environment for AI agents via WASM

https://github.com/Parassharmaa/agent-sandbox
1•paraaz•32m ago•0 comments

Wine-Staging 11.2 Brings More Patches to Help Adobe Photoshop on Linux

https://www.phoronix.com/news/Wine-Staging-11.2
2•doener•32m ago•0 comments

The Nature of the Beast

https://cinemasojourns.com/2026/02/07/the-nature-of-the-beast/
1•jjgreen•32m ago•0 comments

From Prediction to Compilation: A Manifesto for Intrinsically Reliable AI

1•JanusPater•32m ago•0 comments

Show HN: Curated list of 1000 open source alternatives to proprietary software

https://opensrc.me
1•ZenithSoftware•34m ago•0 comments

AI's Real Problem Is Illegitimacy, Not Hallucination

1•JanusPater•35m ago•1 comments

'I fell into it': ex-criminal hackers urge UK pupils to use web skills for good

https://www.theguardian.com/technology/2026/feb/08/i-fell-into-it-ex-criminal-hackers-urge-manche...
1•robaato•36m ago•0 comments

Why 175-Year-Old Glassmaker Corning Is Suddenly an AI Superstar

https://www.wsj.com/tech/corning-fiber-optics-ai-e045ba3b
1•bookofjoe•37m ago•1 comments

Keeping WSL Alive

https://shift1w.com/blog/keeping-wsl-alive/
1•jakesocks•38m ago•0 comments

Unlocking core memories with GoldSrc engine and CS 1.6 (2025)

https://www.danielbrendel.com/blog/43-unlocking-core-memories-with-goldsrc-engine
3•foxiel•38m ago•0 comments

Gtrace, an advanced network path analysis tool

https://github.com/hervehildenbrand/gtrace
2•jimaek•39m ago•0 comments

America does not trust Putin or Trump

https://re-russia.net/en/review/809/
1•mnky9800n•42m ago•0 comments

Ask HN: How are you managing LLM inference at the edge?

7•gray_amps•9mo ago
I’m building a system to run small LLMs on-device (mobile, IoT, on-prem servers) and would love to hear how others have tackled the challenges.

Context:

Use cases: offline chatbots, smart cameras, local data privacy

Models: 7–13B parameter quantized models (e.g. Llama 2, Vicuna)

Constraints: limited RAM/flash, CPU-only or tiny GPU, intermittent connectivity

Questions:

What runtimes or frameworks are you using (ONNX Runtime, TVM, custom C++)?

How do you handle model loading, eviction, and batching under tight memory constraints?

Any clever tricks for quantization, pruning, or kernel fusion that boost performance?

How do you monitor and update models securely in the field?

Looking forward to your benchmarks, war stories, and code pointers!
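
To make these questions concrete, here is roughly the CPU-only load path I am starting from. This is a minimal sketch assuming ONNX Runtime; the model file, thread count, and input/output names are placeholders for whatever your export actually looks like, not a recommendation.

    # Minimal sketch: load a pre-quantized ONNX LLM on a CPU-only edge box.
    # The model path, thread count, and tensor names are placeholders.
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 4  # match the device's core count
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    session = ort.InferenceSession(
        "llama2-7b-int4.onnx",               # hypothetical quantized export
        sess_options=opts,
        providers=["CPUExecutionProvider"],  # no GPU assumed
    )

    # One forward pass: feed token ids, read logits back. The actual
    # input/output names depend on how the model was exported.
    logits = session.run(
        None, {"input_ids": np.array([[1, 15043]], dtype=np.int64)}
    )[0]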

Comments

byte-bolter•9mo ago
I’m using ONNX Runtime with 4-bit quantization on a Raspberry Pi 4. I preload the quantized model into shared memory so multiple processes can reuse it, and I evict old sessions via LRU when I hit a 1 GB RAM cap. For batching, I accumulate inputs over 50 ms to boost throughput without hurting latency. So far I get ~15 RPS on a 7B Llama 2 model.
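
In rough code, the eviction and batching logic looks like the sketch below. Heavily simplified and illustrative only: load_fn and size_fn are stand-ins for your runtime's loader and a resident-memory estimate, and the shared-memory preload is not shown.

    # Sketch of the two tricks above: an LRU cap on loaded sessions
    # and a ~50 ms micro-batching window.
    import queue
    import threading
    import time
    from collections import OrderedDict

    RAM_CAP_BYTES = 1 * 1024**3   # ~1 GB budget for loaded models
    BATCH_WINDOW_S = 0.050        # accumulate requests for 50 ms

    class SessionCache:
        """Keeps loaded model sessions, evicting the least recently used."""

        def __init__(self, load_fn, size_fn):
            self._load_fn = load_fn          # model path -> session object
            self._size_fn = size_fn          # session -> approx. bytes resident
            self._sessions = OrderedDict()
            self._lock = threading.Lock()

        def get(self, model_path):
            with self._lock:
                if model_path in self._sessions:
                    self._sessions.move_to_end(model_path)  # mark as recently used
                    return self._sessions[model_path]
                session = self._load_fn(model_path)
                self._sessions[model_path] = session
                self._evict_over_budget()
                return session

        def _evict_over_budget(self):
            total = sum(self._size_fn(s) for s in self._sessions.values())
            while total > RAM_CAP_BYTES and len(self._sessions) > 1:
                _, evicted = self._sessions.popitem(last=False)  # drop the LRU entry
                total -= self._size_fn(evicted)

    def micro_batch_loop(requests, run_batch):
        """Collect requests for up to BATCH_WINDOW_S, then run them as one batch."""
        while True:
            batch = [requests.get()]                   # block until the first request
            deadline = time.monotonic() + BATCH_WINDOW_S
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(requests.get(timeout=remaining))
                except queue.Empty:
                    break
            run_batch(batch)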
tynskid2025•9mo ago
Do you have a repo or an outline of how you did this? I would be so grateful.