Qwen3.5-397B at 4.74 tok/s using 5.9GB RAM

https://xcancel.com/danveloper/status/2033940538563445236

14•m-hodges•1h ago

Comments

cogman10•1h ago

Interesting, but what exactly did it do and what does it mean? Like, did it simply convert a 397B model into a 20B model? Or is this still a 397B model that now only uses around 6GB of RAM while running?

simonw•1h ago

Yeah, the details on this look pretty thin. The best I could find was this snippet from the screenshot:

> Key technique: selective expert streaming via direct I/O. Only ~10 of 512 experts per layer are loaded from SSD per token (~1.8GB I/O per token at 1.4 GB/s effective bandwidth). Non-expert weights (~5GB) are pinned in DRAM. LRU expert cache provides 44%+ hit rate.
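Here is a rough Python sketch of the LRU expert cache described in that snippet. This is a hypothetical illustration, not the author's code. The `ExpertLRUCache` name, the `load_from_ssd` callback, keying experts by `(layer, expert_id)`, and the fixed capacity are all assumptions. The sketch only covers the bookkeeping (hit, miss, evict the least recently used expert) and shows how a hit rate like the quoted 44% would be measured.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ExpertLRUCache:
    """Hypothetical LRU cache for expert weights held in DRAM.

    Keys are assumed to be (layer, expert_id) tuples. Values are whatever
    the (hypothetical) loader returns for that expert's weights.
    """

    def __init__(self, capacity: int, load_from_ssd: Callable[[Hashable], bytes]):
        self.capacity = capacity              # max number of experts kept in RAM
        self.load_from_ssd = load_from_ssd    # assumed loader used on a cold read
        self._cache = OrderedDict()           # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        weights = self.load_from_ssd(key)     # cold read from SSD
        self._cache[key] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict the least recently used expert
        return weights

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Usage with a stand-in loader and a made-up access pattern.
cache = ExpertLRUCache(capacity=2, load_from_ssd=lambda key: b"\x00" * 16)
for key in [(0, 1), (0, 2), (0, 1), (0, 3), (0, 2)]:
    cache.get(key)
print(cache.hits, cache.misses, cache.hit_rate)  # 1 4 0.2
```

With `capacity=2` and this access pattern, only the second access to `(0, 1)` is a hit. Every other access costs an SSD read, which is why the hit rate translates so directly into I/O per token.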

It's apparently using ideas from: https://arxiv.org/abs/2312.11514

> This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to DRAM. Our method involves constructing an inference cost model that takes into account the characteristics of flash memory, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks.
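The paper's second point is that fewer, larger, more contiguous reads are cheaper. A toy cost model in Python shows why. This is not the paper's actual inference cost model. The formula (per-read latency plus payload over bandwidth), the serial-read assumption, and the device numbers are all placeholders chosen for illustration.

```python
def read_time_s(total_bytes: int, chunk_bytes: int,
                latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated time to read total_bytes from flash in chunks of chunk_bytes.

    Deliberately simplified model (an assumption, not the paper's): every read
    pays a fixed per-request latency, reads are issued one after another, and
    the payload itself streams at a fixed bandwidth.
    """
    n_reads = -(-total_bytes // chunk_bytes)  # ceiling division
    return n_reads * latency_s + total_bytes / bandwidth_bytes_per_s


# Placeholder device numbers, chosen only for illustration.
LATENCY_S = 100e-6        # 100 microseconds per read request
BANDWIDTH = 2e9           # 2 GB/s sequential throughput
TOTAL = 2**30             # 1 GiB of weights to fetch

for chunk in (4 * 1024, 1024 * 1024):
    print(f"{chunk:>8} B chunks: {read_time_s(TOTAL, chunk, LATENCY_S, BANDWIDTH):.2f} s")
# 4096-byte chunks: about 26.75 s; 1 MiB chunks: about 0.64 s
```

Both cases move the same number of bytes. Under this toy model, the entire difference is per-request overhead, which is the cost that larger contiguous chunks amortize.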

0x457•56m ago

Interesting. Reminds me of how Gemma 3n's PLE (Per-Layer Embeddings) caching works.

quietbuilder•47m ago

44% cache hit rate is low. Over half the expert loads are cold reads off SSD, so at 1.4 GB/s effective bandwidth and ~1.8GB I/O per token, 4.74 tok/s checks out — but it'll drop with longer context or heavier reasoning.
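Here is a quick Python check of the bandwidth arithmetic, taking the screenshot's figures at face value. The model is an assumption: SSD reads are serial and never overlap with compute, so bandwidth divided by SSD bytes per token is a hard ceiling on tokens per second. Under that assumption, neither reading of the ~1.8GB figure reaches 4.74 tok/s. That suggests the quoted numbers are measured or counted differently than they appear, and the screenshot doesn't give enough detail to say how.

```python
GB = 1e9  # decimal gigabytes, as SSD bandwidth is usually quoted


def max_tokens_per_second(ssd_bytes_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Throughput ceiling if each token must wait for its SSD reads.

    Assumes reads are serial and not overlapped with compute.
    """
    return bandwidth_bytes_per_s / ssd_bytes_per_token


io_per_token = 1.8 * GB   # "~1.8GB I/O per token" from the screenshot
bandwidth = 1.4 * GB      # "1.4 GB/s effective bandwidth"
hit_rate = 0.44           # "44%+ hit rate"

# Reading 1: 1.8 GB is the SSD traffic actually incurred per token.
print(f"{max_tokens_per_second(io_per_token, bandwidth):.2f} tok/s")  # 0.78 tok/s

# Reading 2: 1.8 GB is the total expert bytes touched; only cache misses go to SSD.
print(f"{max_tokens_per_second(io_per_token * (1 - hit_rate), bandwidth):.2f} tok/s")  # 1.39 tok/s

# SSD bytes per token that 4.74 tok/s would allow at 1.4 GB/s.
print(f"{bandwidth / 4.74 / GB:.3f} GB/token")  # 0.295 GB/token
```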

Running 397B on consumer hardware is genuinely impressive for a proof of concept. A year ago this wasn't a thing. But I keep wondering whether a well-quantized 70B that fits entirely in RAM would just be faster in practice. No I/O bottleneck, consistent throughput, smaller model but actually usable.

Overseas 'content farms' creating political deepfakes uncovered

https://www.bbc.com/news/articles/c07jj7d72yzo

1•robtherobber•14s ago•0 comments

Show HN: StackStats – Analytics tool for Substack writers, runs 100% locally

1•rishikeshs•49s ago•0 comments

Et tu, S&P 500? The SpaceX IPO gamesmanship is going to be epic

https://www.ft.com/content/59adbe42-ca30-47f3-9cda-5415945e9368

1•petethomas•2m ago•0 comments

You're Not Thinking About Your Network the Way You Should

https://packetpushers.net/podcasts/heavy-strategy/hs127-youre-not-thinking-about-your-network-the...

1•oavioklein•2m ago•0 comments

Did Cinema Get Narrower?

https://www.kopanko.com/notes/did-cinema-get-narrower

1•pcktm•2m ago•0 comments

Turning raw logs into feature vectors without manual labeling

https://www.securesql.info/2025/04/05/etl-playbooks/

1•projectnexus•2m ago•1 comment

Show HN: Starting Five – NBA Lineup Building Challenges

https://draftdawg.app

1•perhapsAnLLM•3m ago•0 comments

SecOps without manual schemas: Using EBMs and automated ETL for detection

https://www.securesql.info/2025/04/04/loop-architecture/

1•projectnexus•4m ago•1 comment

Ban Bots Not Human Directed Tool Use

1•morpheos137•4m ago•1 comment

Show HN: Horizon – GPU-accelerated infinite-canvas terminal in Rust

https://github.com/peters/horizon

1•petersunde•5m ago•0 comments

Fair Source Software in the AI Age

https://blog.sentry.io/fair-source-software-in-the-ai-age/

1•ezekg•5m ago•0 comments

AI Agents and the New SaaS

https://www.gouthamve.dev/on-ai-agents-and-the-new-saas/

2•gouthamve•6m ago•0 comments

YouTube is experimenting with ads visible even after users skip

https://searchengineland.com/youtube-tests-sticky-banner-after-ad-skip-471902

2•speckx•7m ago•0 comments

Stop training your security ML on labeled attack data

https://www.securesql.info/2025/04/03/energy-based-models-anomaly-detection/

1•projectnexus•7m ago•1 comment

Why does it feel uncomfortable to think about how much you use your phone?

https://dogdogfish.com/blog/2026/03/17/psychological-discomfort/

1•matthewsharpe3•8m ago•0 comments

Stripe.com/6oU7sL9Pwg6Xa9kBest AI Agent Certi1iK1gs0s

1•OpenClawAura•8m ago•0 comments

Spectra – detect API contract drift from real runtime traffic

https://github.com/rmalik1-hash/spectra_windows_public

1•Spectra73•8m ago•1 comment

What was DOGE? How Elon Musk tried to gamify government

https://www.theguardian.com/news/ng-interactive/2026/mar/17/elon-musk-gamify-government

4•billybuckwheat•9m ago•0 comments

Why Claude Code Can't Find Your Tools

https://layer5.io/blog/engineering/why-claude-code-cant-find-your-tools/

2•lcalcote•10m ago•0 comments

India's outsourcing industry is worth $300B. Can it survive AI?

https://www.bbc.com/news/articles/c5yrq1090p8o

3•devonnull•10m ago•0 comments

Can You Train a Computer?

https://dimitrisp.substack.com/p/can-you-train-a-computer

2•marojejian•10m ago•0 comments

Zero ZGC4: A Better Graphing Calculator for School and Beyond

https://www.zerocalculators.com/features

1•uticus•12m ago•0 comments

Kexec handover and the live update orchestrator

https://lwn.net/Articles/1033364/

1•tosti•12m ago•0 comments

Researchers disclose vulnerabilities in IP KVMs from four manufacturers

https://arstechnica.com/security/2026/03/researchers-disclose-vulnerabilities-in-ip-kvms-from-4-m...

2•joozio•13m ago•0 comments

Illinois Introducing Operating System Account Age Bill

https://www.ilga.gov/Legislation/BillStatus?DocTypeID=HB&DocNum=5511

15•terminalbraid•13m ago•1 comment

Putting Thought into Things (2014)

https://ia.net/topics/putting-thought-into-things

1•levmiseri•13m ago•1 comment

Operating Systems: Three Easy Pieces

https://pages.cs.wisc.edu/~remzi/OSTEP/

1•vinhnx•14m ago•0 comments

How to bake your docs into your CLI

https://coasts.dev/blog/a-better-pattern-than-mcp-for-agent-friendly-clis

1•jsunderland323•15m ago•0 comments

Why Your Best Opportunities Are in Your Network

https://willem720055.substack.com/p/why-your-best-opportunities-are-already

1•anqer•16m ago•0 comments

Brat, a parallel TAP testing harness for the POSIX shell

https://codeberg.org/sstephenson/brat

1•PaulHoule•16m ago•0 comments