frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: NVFP4 on Desktop Blackwell – 122B MoE on a Single RTX PRO 6000 31 tok/s

1•jcartu•2h ago
Qwen 3.5 122B-A10B (MoE, ~10B active parameters) running in native NVFP4 on a single RTX PRO 6000 Blackwell GPU. 31 tokens/sec, 89GB VRAM, piecewise CUDA graphs. No multi-GPU, no cloud.

Why this matters: NVIDIA's TRT-LLM explicitly blocks desktop Blackwell from FP4 — the error literally says "FP4 Gemm not supported before Blackwell, nor GeForce Blackwell." The RTX 5090, PRO 6000, and DGX Spark all use SM120 — same FP4 tensor cores as the B100/B200 datacenter chips (SM100). The lock is artificial product segmentation, not a hardware limitation.

CUTLASS 4.2+ already ships SM120 FP4 kernels. They're compiled into vLLM. The problem is purely dispatch logic — Python-level capability checks that only recognize SM100, not SM120.

Setup (vLLM 0.17.0, stable pip install):

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server --model Sehyo/Qwen3.5-122B-A10B-NVFP4 --port 8100 --max-model-len 4096 --gpu-memory-utilization 0.85 --compilation-config '{"cudagraph_mode": "piecewise"}'

Key gotchas: (1) Do NOT pass --quantization flag, model uses compressed-tensors format and vLLM auto-detects. (2) Full CUDA graphs OOM — use piecewise mode (31 tok/s vs 12 tok/s eager). (3) Python 3.14 breaks numba, stick with 3.13.

Results: 31 tok/s on 1 GPU vs 54 tok/s on 2 GPUs with Q8_0 llama.cpp. Half the hardware, ~60% the speed, ~98% the quality.

The broader point: SM120 and SM100 share the same FP4 tensor core architecture. CUTLASS has the kernels. The frameworks just need to route SM120 to them. A 122B MoE model on a single desktop GPU at 31 tok/s was datacenter-only six months ago.

Relevant issues: vLLM #33416, SGLang #18954, CUTLASS #2800. We're submitting a PR (~10 lines of Python).

Model: https://huggingface.co/Sehyo/Qwen3.5-122B-A10B-NVFP4

AI agents are coming for government. How one big city is letting them in

https://www.fastcompany.com/91504876/boston-cio-santi-garces-on-ai-agents-mcp-open-data
1•johnshades•29s ago•0 comments

The Government Told Courts It Could Easily Refund Tariffs. Now It Says It Can't

https://www.techdirt.com/2026/03/09/the-government-told-courts-it-could-easily-refund-unlawful-ta...
1•cdrnsf•34s ago•0 comments

How to Track Competitor Pricing Changes Automatically

https://adversa.io/blog/track-competitor-pricing-changes/
1•robinweller•56s ago•0 comments

Canadian employment trends in the era of generative artificial intelligence

https://www150.statcan.gc.ca/n1/pub/36-28-0001/2026001/article/00003-eng.htm
1•jyunwai•1m ago•0 comments

Show HN: A daily arithmetic puzzle with a hidden Hard Mode

https://make24.app
1•kapework•3m ago•0 comments

Breaking macOS Screen Time for fun and profit

https://dunkirk.sh/blog/screentime/
1•clacker-o-matic•4m ago•2 comments

CIA faces furious backlash after hidden document with potential cure for cancer

https://www.dailymail.co.uk/sciencetech/article-15629211/cia-cancer-cure-document-declassified.html
1•Bender•4m ago•1 comments

SSH Config: The File Nobody Reads

https://vivianvoss.net/blog/ssh-config
1•alwillis•4m ago•0 comments

Show HN: Time as the 4th Dimension – What if it emerges from rotational motion?

1•lisajguo•5m ago•0 comments

The internet is being flooded with AI content. How can we tell what is human?

1•01-_-•6m ago•0 comments

Unified Attestation: open-source alternative to Google Play Integrity

https://uattest.net/
1•turrini•6m ago•0 comments

Moltbook: Bot‑Only Network Full of Prompt and Scam Posts Now Monitored

https://youscan.io/blog/moltbook-monitoring/
1•defly•6m ago•0 comments

Ultrasound-Responsive Nanoparticles for Biofilm Treatment

https://pubs.acs.org/doi/10.1021/jacsau.5c01711
1•PaulHoule•7m ago•0 comments

Show HN: Quadratic Intelligence Growth from Logarithmic Routing (QIS Protocol)

https://yonderzenith.github.io/QIS-Protocol-Website/article-architecture-diagram.html
1•chris_trevethan•7m ago•1 comments

OpenAI updates privacy policy as ads expand in ChatGPT

https://searchengineland.com/openai-updates-privacy-policy-as-ads-expand-in-chatgpt-471150
5•speckx•8m ago•0 comments

Show HN: Self-hosted Chromium engine with 256 parallel stealth sessions

https://owlbrowser.net/
1•ahstanin•8m ago•0 comments

Show HN: ChatShell – 22MB AI Agent with 9 Built-In Tools (Tauri, Not Electron)

https://github.com/chatshellapp/chatshell-desktop
1•s3anw3•9m ago•1 comments

Show HN: Marque – MCP/CLI server for persistent agent design identity

https://marque-web.vercel.app/
1•Parth_Sharma_18•9m ago•0 comments

AI Is a Microcontractor

https://aartur.substack.com/p/ai-is-a-microcontractor
1•aartur•9m ago•0 comments

AI agents with memory solve problems 2x better (and 5 more papers)

https://blog.santthosh.com/distilled-weekly-mar-02-mar-08-2026/
1•santthosh01•10m ago•1 comments

Adobe's OpenPBR BSDF

https://github.com/adobe/openpbr-bsdf
1•franzb•11m ago•0 comments

Notchi: A macOS notch companion that reacts to Claude Code activity in real-time

https://github.com/sk-ruban/notchi
1•Areibman•11m ago•0 comments

Show HN: We forked KuzuDB and added concurrent writes for AI agent memory

https://www.vela.partners/blog/kuzudb-ai-agent-memory-graph-database
2•yihlamur•12m ago•0 comments

Show HN: A tiny multiplayer experiment where everyone attacks the same dragon

https://dragon-attack-game--aridora520.replit.app
1•ArisLee•13m ago•1 comments

Anthropic "Philosopher" Amanda Askell's Connection to "Effective Altruism"

https://nypost.com/2026/03/03/business/ai-giant-anthropic-philosopher-amanda-askells-oddball-blog...
2•1vuio0pswjnm7•13m ago•0 comments

Ask HN: Free hosting/cloud providers for free non-profit open source apps?

1•-thrwwy-•13m ago•1 comments

Doom Counter – between Nostradamus, Gaza and elections

https://doomcounter.com
1•kalpolintrol•15m ago•0 comments

Deepfakes for Code and the Asymmetric Internet

https://matthiasplappert.com/blog/2026/deepfakes-for-code/
2•mplappert•15m ago•0 comments

Publisher demands $500 from impersonated author to retract paper

https://retractionwatch.com/2026/03/05/publisher-demands-500-from-impersonated-author-to-retract-...
4•MaysonL•16m ago•0 comments

A Brief History of Type Systems and How AI Is Changing the Tradeoffs

https://ntnt-lang.org/blog/type-systems-for-ai-agents
1•joshcramer•18m ago•0 comments