How to Give Your RTX 4090 Nearly Infinite Memory for LLM Inference

https://medium.com/data-science-collective/how-to-give-your-rtx-gpu-nearly-infinite-memory-for-llm-inference-de2c57af1e82
1•dikobraz•2h ago

Comments

dikobraz•2h ago
We explored a network-attached KV-cache for consumer GPUs to offset their limited VRAM. It doesn't make RTX cards run giant models efficiently. Still, for workloads that repeatedly reuse lengthy prefixes (chatbots, coding assistants, multi-turn threads), it delivers a 2–4× improvement in requests per second (RPS) and time-to-first-token (TTFT) on 7B and 70B models.

How it works: On return visits, instead of re-running the prompt through the model, we fetch previously computed KV blocks from network storage and skip re-computing those tokens (i.e., we avoid re-running prefill on repeated prefixes). This helps when VRAM can't hold the KV cache for every session and users pause between messages, which is almost always the case.
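To make the lookup concrete, here is a minimal sketch of block-level prefix matching. It is not our production code: the block size, the chained-hash key scheme, and the names `RemoteKVStore`, `block_keys`, `plan_prefill`, and `save_blocks` are all assumptions for illustration, and an in-memory dict stands in for network storage.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # tokens per KV block; tiny for readability (assumed value)


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """One key per full block; each key depends on every token up to that block."""
    keys = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        running.update(",".join(map(str, block)).encode() + b";")
        keys.append(running.copy().hexdigest())
    return keys


class RemoteKVStore:
    """Hypothetical stand-in for network storage: block key -> KV payload."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)

    def put(self, key: str, payload: bytes) -> None:
        self._blocks[key] = payload


def plan_prefill(tokens: List[int], store: RemoteKVStore) -> Tuple[int, List[bytes]]:
    """Return (tokens covered by cached blocks, fetched payloads)."""
    fetched = []
    for key in block_keys(tokens):
        payload = store.get(key)
        if payload is None:
            break  # first miss ends the reusable prefix
        fetched.append(payload)
    return len(fetched) * BLOCK_SIZE, fetched


def save_blocks(tokens: List[int], store: RemoteKVStore) -> None:
    """Store a placeholder payload for every full block of this prompt."""
    for i, key in enumerate(block_keys(tokens)):
        store.put(key, f"kv-block-{i}".encode())


store = RemoteKVStore()
turn1 = list(range(10))            # first visit: 10 tokens, 2 full blocks
save_blocks(turn1, store)
turn2 = turn1 + [100, 101, 102]    # return visit: same prefix plus 3 new tokens
cached, _ = plan_prefill(turn2, store)
print(cached, len(turn2) - cached)  # 8 5
```

Because each key is chained over the whole prefix, a block only matches when everything before it matches too. In this toy run, 8 of the 13 tokens on the return visit come from storage and only 5 need prefill.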

Why RTX benefits: Prefill is the computationally intensive part (quadratic attention, numerous reductions, and inter-GPU traffic). Without NVLink, PCIe becomes the choke point in multi-GPU setups. KV-caching cuts repeated prefill, leaving mostly the lighter decoding step—something PCIe-only RTX nodes handle well.
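A back-of-the-envelope way to see the quadratic part: under causal attention, the token at position i attends to i keys, so prefilling n tokens touches n(n+1)/2 query-key pairs. If the first p tokens come from cache, only positions p+1..n need computing. The sketch below counts attention pairs only; it ignores the projection and MLP work (which scales with the number of new tokens) and is an illustration, not a reproduction of our benchmark numbers. The function name and the 4096/3840 split are made up for the example.

```python
def attention_pairs(total_tokens: int, cached_tokens: int = 0) -> int:
    """Query-key pairs computed during prefill when a prefix is already cached."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    full = total_tokens * (total_tokens + 1) // 2
    skipped = cached_tokens * (cached_tokens + 1) // 2
    return full - skipped


# Hypothetical multi-turn prompt: 4096 tokens, of which 3840 were seen before.
print(attention_pairs(4096))        # 8390656
print(attention_pairs(4096, 3840))  # 1015936
```

In this example the cached prefix removes roughly seven eighths of the attention pairs, which is why long shared prefixes matter more than short ones.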

Results & endpoint:
- 2–4× speedup on multi-turn benchmarks (RPS & TTFT) with RTX 4090.
- We’ve opened one free public endpoint for demos, not production grade (https://console.cloudrift.ai/inference?modelId=meta-llama%2F...). Ping us at hello@cloudrift.ai if you need a reliable setup.

Technical Notes:
- Works with consumer and data-center GPUs. In theory, you can even split roles: NVLink boxes do prefill, while cheaper RTX pods serve as decoders using stored KV.
- We use special hardware to reduce fetch overhead and offload the CPU, but you can reproduce this at home with a regular NAS (with lower peak performance).
- For a more in-depth walkthrough of the math and architecture of a KV-cache solution, please watch this video from the KV-cache solution vendor (https://www.youtube.com/watch?si=T69vxku8xPr6p7I0&v=CV4FYMTF...)
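If you want to size a home setup, a rough estimate of how much KV data a session carries and how long a fetch takes may help. Everything in this sketch is an assumption: the model shape (32 layers, 32 KV heads, head dimension 128, 16-bit values) is a hypothetical 7B-class configuration, and the 10 Gbit/s link is just an example. It ignores protocol overhead, storage latency, and any compression.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV-cache bytes stored per token."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def fetch_seconds(tokens: int, bytes_per_token: int, link_gbit_per_s: float) -> float:
    """Ideal transfer time for a cached prefix over a link of the given speed."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return tokens * bytes_per_token / bytes_per_second


# Assumed 7B-class shape; swap in your model's real config values.
per_token = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
print(per_token)         # 524288 bytes, i.e. 0.5 MiB per token
print(4096 * per_token)  # 2147483648 bytes, i.e. 2 GiB for a 4096-token prefix
print(round(fetch_seconds(4096, per_token, 10.0), 2))  # 1.72
```

Models that use grouped-query attention have fewer KV heads and so a smaller per-token footprint. Whether a fetch beats recomputing prefill depends on your GPU, model, and link, so treat these figures as a starting point for your own measurements.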