Deploying Gemma 4 26B on an RTX 5090

https://datapnt.com/blog/deploying-gemma-4-26b-a4b-on-rtx-5090
5•sudo_ls_ads•1h ago

Comments

sudo_ls_ads•1h ago
Author here. Quick context on what made this worth writing up: Gemma 4 26B A4B is an MoE — 26B total params, 4B active per token — which fundamentally changes what’s viable on a single consumer GPU. During decode you pay the memory bandwidth cost of a 4B model but get the quality of a 26B. That’s what makes a 5090 a real option for it; a dense 26B wouldn’t be.
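To put rough numbers on the bandwidth argument above, here is a minimal back-of-envelope sketch. It assumes decode is purely memory-bandwidth-bound, that weights are stored at 4 bits, and that the RTX 5090's peak memory bandwidth is about 1792 GB/s. That bandwidth figure and the function name are assumptions for illustration, not numbers from the post.

```python
# Back-of-envelope decode ceiling:
#   tokens/s <= memory bandwidth / bytes of weights read per token.
# Assumptions (not from the post): ~1792 GB/s peak bandwidth for the RTX 5090,
# 4-bit weights (0.5 bytes per parameter), and no KV-cache, activation,
# or kernel-launch overhead. Real throughput will be lower.

def decode_ceiling_tok_s(active_params: float,
                         bytes_per_param: float,
                         bandwidth_bytes_s: float) -> float:
    """Upper bound on decode speed if each active weight is read once per token."""
    return bandwidth_bytes_s / (active_params * bytes_per_param)

BANDWIDTH = 1.792e12  # bytes/s, assumed spec-sheet figure
BYTES_4BIT = 0.5      # 4 bits per weight

moe_ceiling = decode_ceiling_tok_s(4e9, BYTES_4BIT, BANDWIDTH)     # 4B active per token
dense_ceiling = decode_ceiling_tok_s(26e9, BYTES_4BIT, BANDWIDTH)  # hypothetical dense 26B

print(f"MoE, 4B active: {moe_ceiling:.0f} tok/s ceiling")   # 896
print(f"Dense 26B:      {dense_ceiling:.0f} tok/s ceiling") # about 138
```

Under these assumptions the dense ceiling is below the ~196 tok/s reported for the MoE, which is consistent with the point that a dense 26B would not be a comparable option on this card. The MoE ceiling is only an upper bound; attention, KV-cache reads, and routing all eat into it.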

The interesting part was the quant format choice. NVFP4 is Blackwell’s native 4-bit format and theoretically the fastest path, but MoE support for Gemma 4 specifically was blocked on an unmerged vLLM PR (#39045): linear layers loaded, expert weights didn’t. Falling back to nightly didn’t help, because that day’s nightly was broken: someone had landed an unconditional pandas import in the AITER code path without updating the image’s deps. I ended up on AWQ + Marlin kernels, which have been stable in vLLM for over a year. For single-user, memory-bandwidth-bound decode, the gap to NVFP4 is smaller than you’d expect. Both give the same 4x weight compression relative to FP16; the difference is that AWQ dequantizes to FP16 in-register rather than using FP4 tensor cores. I’m getting ~196 tok/s; I’d estimate NVFP4 would be 220-240 if it had worked.
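The 4x compression figure also decides whether the model fits on the card at all. The sketch below is a rough estimate only: it counts weights alone, ignores the per-group scales and zero points that 4-bit formats carry, and ignores KV cache and activations. The helper name is hypothetical. Note that all 26B parameters have to be resident even though only 4B are active per token.

```python
# Rough weight-footprint estimate for a 26B-parameter model.
# Assumptions: weights only (no KV cache or activations), no quantization
# metadata such as per-group scales, and 32 GB of VRAM on the RTX 5090.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 26e9  # every expert must be loaded, not just the active 4B
VRAM_GB = 32

for label, bits in [("FP16", 16), ("4-bit (AWQ or NVFP4)", 4)]:
    size = weight_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{label}: {size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

At FP16 the weights come to 52 GB, well over the card's memory; at 4 bits they drop to about 13 GB, leaving room for KV cache and runtime overhead.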

Happy to dig into the vLLM config, the RunPod Serverless side, or the NVFP4 vs AWQ tradeoff in more depth.

Show HN: Deadline.email – a daily reminder that you'll die

https://deadline.email/
1•onesandofgrain•6s ago•0 comments

Token Maxer, Eventually

https://brunokiafuka.substack.com/p/token-maxer-eventually
1•brunokiafuka•48s ago•0 comments

Programming the Univac-1219 [video]

https://www.youtube.com/watch?v=rU8sCbwB8XU
1•caminanteblanco•48s ago•1 comments

NY Times: That Meeting You Hate May Keep A.I. From Stealing Your Job

https://www.nytimes.com/2026/04/15/business/ai-jobs-human-work.html
1•nicolapede•1m ago•0 comments

C# that looks like Go

https://makarchie.com/posts/csharp-that-looks-like-go-file-based-apps/
1•azhenley•1m ago•0 comments

SF is obsessed with the safest drivers – and ignoring the ones killing people

https://www.sfchronicle.com/opinion/openforum/article/pedestrian-death-driver-accident-22210904.php
1•standardUser•4m ago•1 comments

There are only four skills: design, technical, management and physical

https://www.lesswrong.com/posts/KRLGxCaqdgrotyB8z/there-are-only-four-skills-design-technical-man...
2•samuel246•5m ago•0 comments

Fake Pro-Trump Avatars Emerge on Social Media

https://www.nytimes.com/2026/04/17/business/media/artificial-intelligence-trump-social-media.html
3•doener•8m ago•1 comments

Gender reassignment significantly increases psychiatric morbidity

https://onlinelibrary.wiley.com/doi/10.1111/apa.70533
1•hereme888•9m ago•0 comments

Reconstructing a Dead USB Protocol: A Handheld's Secrets Unlocked by a Hot Knife

https://github.com/coremaze/ME2-Writeup
3•Bawoosette•10m ago•0 comments

Atlantic's circulation collapse would lead to substantial oceanic carbon release

https://www.nature.com/articles/s43247-026-03427-w
3•doener•11m ago•0 comments

I time travelled to Ancient Rome [video]

https://www.youtube.com/watch?v=aaua5ghidk0
1•lisper•11m ago•0 comments

Palantir posts mini-manifesto denouncing inclusivity and 'regressive' cultures

https://techcrunch.com/2026/04/19/palantir-posts-mini-manifesto-denouncing-regressive-and-harmful...
5•benwerd•14m ago•0 comments

Critical flaw in Protobuf library enables JavaScript code execution

https://www.bleepingcomputer.com/news/security/critical-flaw-in-protobuf-library-enables-javascri...
3•Brajeshwar•17m ago•1 comments

Is the 'Tailored Resume' advice feasible without automation anymore?

https://applygenius.ai
1•mikkaai•17m ago•0 comments

Show HN: Google Gemini Is Scanning Your Photos – and the EU Said No

4•anju-kushwaha•18m ago•0 comments

Accepted proposal: UUID in the Go standard library

https://rednafi.com/shards/2026/04/go-uuid/
2•ingve•18m ago•0 comments

Amazon DCV – A Better Alternative to VNC

https://aws.amazon.com/hpc/dcv/
1•alhazrod•21m ago•0 comments

Self-healing GitHub CI that won't let AI touch your application code

https://github.com/mosidze/aiheal
3•mosidze•28m ago•0 comments

Show HN: AgentGrade – agent-readiness guide for your site

https://agentgrade.com/
2•usiegj00•28m ago•0 comments

AI Is Killing Open Source SaaS Too

https://nmn.gl/blog/open-source-killed-ai
1•namanyayg•28m ago•1 comments

543 Hours: What happens when AI runs while you sleep

https://michael.roth.rocks/research/543-hours/
2•pramodbiligiri•32m ago•0 comments

PM Carney declares U.S. ties now a 'weakness' in address to Canadians

https://www.ctvnews.ca/politics/article/pm-carney-declares-us-ties-now-a-weakness-in-address-to-c...
39•Teever•32m ago•10 comments

"Ukraine cut out the bloated red tape of military bureaucracy"

https://www.youtube.com/watch?v=1s39U0j2jPA
1•lifeisstillgood•33m ago•1 comments

Rensei – let agents code 3D models and screenshot them. then 3D print

https://github.com/remorses/rensei
3•xmorse•34m ago•0 comments

The State of LLM Bug Bounties in 2026

https://wraith.sh/learn/state-of-llm-bug-bounties-2026
1•WizardX_0x•34m ago•0 comments

CNNs + VLM outperforms pure VLMs on OCR

https://interfaze.ai/blog/cnn-plus-vlm-more-than-vlm
2•yoeven•34m ago•0 comments

Show HN: I built an open source and secure infrastructure for internal apps

https://github.com/RootCX/RootCX
1•seyz•35m ago•0 comments

This time is no different

https://czep.net/26/this-time.html
1•czep•37m ago•0 comments

How I sequenced my genome at home

https://twitter.com/sethshowes/status/2045782975380406623
4•Finbarr•37m ago•2 comments