

Sharing a base model in GPU VRAM across multiple inference stack processes [video]

https://www.youtube.com/watch?v=OC1yyJo9zpg
7•medicis123•5mo ago

Comments

medicis123•5mo ago
We have just published a short demo of the WoolyAI GPU Hypervisor, showcasing VRAM sharing/deduplication. Load a single base model once, then run multiple isolated LoRA stacks or vLLM stacks on the same GPU.

Why this matters

Higher capacity: Share the base model in VRAM; add more adapters or vertical inference stacks per GPU without loading another copy of the base model's weights.

Isolation & control: Each stack is its own process with independent batching and SLA-aware scheduling.
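To make the capacity point concrete, here is a rough back-of-the-envelope sketch in Python. All sizes are hypothetical placeholders chosen for illustration (an 80 GiB GPU, a 14 GiB base model, and 2 GiB of private memory per stack for the adapter, KV cache, and runtime buffers); they are not measurements from the demo, and the model ignores per-process CUDA context overhead and fragmentation.

```python
# Rough VRAM accounting for N inference stacks on one GPU.
# All sizes are in MiB and are hypothetical, not taken from the demo.

def vram_without_sharing(n_stacks: int, base_mib: int, private_mib: int) -> int:
    """Each process loads its own copy of the base model."""
    return n_stacks * (base_mib + private_mib)


def vram_with_sharing(n_stacks: int, base_mib: int, private_mib: int) -> int:
    """The base model is resident once; each process adds only private memory."""
    return base_mib + n_stacks * private_mib


def max_stacks(gpu_mib: int, base_mib: int, private_mib: int, shared: bool) -> int:
    """Largest number of stacks that fits in gpu_mib under this simple model."""
    if shared:
        if gpu_mib < base_mib:
            return 0
        return (gpu_mib - base_mib) // private_mib
    return gpu_mib // (base_mib + private_mib)


GPU_MIB = 80 * 1024      # assumed 80 GiB GPU
BASE_MIB = 14 * 1024     # assumed 14 GiB base model
PRIVATE_MIB = 2 * 1024   # assumed adapter + KV cache + buffers per stack

if __name__ == "__main__":
    for n in (1, 4):
        print(n,
              vram_without_sharing(n, BASE_MIB, PRIVATE_MIB),
              vram_with_sharing(n, BASE_MIB, PRIVATE_MIB))
    print(max_stacks(GPU_MIB, BASE_MIB, PRIVATE_MIB, shared=False))  # 5
    print(max_stacks(GPU_MIB, BASE_MIB, PRIVATE_MIB, shared=True))   # 33
```

Under these assumed numbers, per-stack memory still grows with every stack added, but the base model is paid for only once, so the number of stacks that fit is bounded by the private memory per stack rather than by the base model size.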

While vLLM supports multiple adapters in a single vLLM process, many teams need predictable per-adapter SLAs. This is where running independent stacks with a shared base model in VRAM helps: each adapter gets its own process, scheduled independently, all on the same GPU.

The demo runs LoRA inference with PyTorch, but the same applies when using vLLM. If you’re scaling LoRA inference across business units or model variants and need predictable latency without overprovisioning GPUs, I’d love your feedback. Comment or DM to chat.