
Furiosa: 3.5x efficiency over H100s

https://furiosa.ai/blog/introducing-rngd-server-efficient-ai-inference-at-data-center-scale
62•written-beyond•1h ago

Comments

grosswait•1h ago
How usable is this in practice for the average non-AI organization? Are you locked into a niche ecosystem that limits the options of what models you can serve?
sanxiyn•31m ago
Yes, but in principle it isn't that different from running on Trainium or Inferentia (it's a matter of degree), and plenty of non-AI organizations adopted Trainium/Inferentia.
darknoon•1h ago
Really weird graph: they're comparing to 3x H100 PCIe, which is a config I don't think anyone is using.

Are they trying to compare at iso-power? I just want to see their box vs a box of 8 H100s, because that's what people would buy instead; they can divide tokens by watts if that's the pitch.
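The "divide tokens and watts" normalization can be sketched in a few lines: reduce each server config to tokens per joule so an iso-power comparison falls out regardless of box size. All throughput and power figures below are made-up placeholders for illustration, not vendor benchmarks.

```python
# Sketch of the normalization being asked for: express each box as
# tokens generated per joule of energy consumed, so configs of
# different sizes become directly comparable.
# NOTE: all numbers are invented placeholders, not measurements.

def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Throughput divided by power draw: tokens per joule."""
    return tokens_per_second / watts

# Hypothetical boxes: (throughput in tok/s, wall power in watts).
boxes = {
    "8x H100 server": (24_000, 10_200),  # placeholder figures
    "RNGD server":    (18_000,  3_000),  # placeholder figures
}

for name, (tps, watts) in boxes.items():
    print(f"{name}: {tokens_per_joule(tps, watts):.2f} tok/J")
```

With this metric, absolute box size stops mattering: a bigger box that burns proportionally more power scores the same.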

zmmmmm•1h ago
What can it actually run? The fact that their benchmark plot refers to Llama 3.1 8B signals to me that it's hand-implemented for that model and likely can't run newer/larger models. Why else would you benchmark such an outdated model? Show me a benchmark for gpt-oss-120b or something similar.
sanxiyn•44m ago
Looking at their blog, they in fact ran gpt-oss-120b: https://furiosa.ai/blog/serving-gpt-oss-120b-at-5-8-ms-tpot-...

I think the Llama 3 focus mostly reflects demand. It may be hard to believe, but many people aren't even aware gpt-oss exists.

reactordev•13m ago
Many are aware, just can’t offload it onto their hardware.

The 8B models are easy to run on an RTX card, so you can compare against local inference. What llama does on an RTX 5080 at 40 t/s, Furiosa should do at 40,000 t/s or whatever… it's an easy way to have a flat comparison across all the different hardware llama.cpp runs on.
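The flat comparison described above is just expressing every device's throughput on one fixed model as a multiple of a chosen baseline. A minimal sketch, where the 40 t/s figure comes from the comment and every other number is an invented placeholder:

```python
# Sketch of a cross-hardware "flat comparison" on a single 8B model:
# normalize every device's throughput to one baseline device.
# Only the RTX 5080 figure comes from the discussion above; the rest
# are invented placeholders, not measurements.

baseline_device = "RTX 5080"
throughput_tok_s = {
    "RTX 5080":       40.0,      # from the comment above
    "8x H100 server": 4_000.0,   # placeholder
    "RNGD server":    3_500.0,   # placeholder
}

baseline = throughput_tok_s[baseline_device]
for device, tps in throughput_tok_s.items():
    print(f"{device}: {tps / baseline:.0f}x {baseline_device}")
```

The same table could be built from llama.cpp benchmark runs on each device, since the model and quantization stay fixed across rows.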

zmmmmm•8m ago
Now I'm interested ...

It still kind of makes the point that you are stuck with a very limited range of models that they are hand-implementing. But at least it's a model I would actually use. Give me that in a box I can put in a standard data center with a normal power supply and I'm definitely interested.

But I want to know the cost :-)

rjzzleep•2m ago
So many people focusing solely on massive LLMs is an oversight: they're narrowly focused on a tiny (but very lucrative) subdomain of AI applications.
kuil009•54m ago
The positioning makes sense, but I’m still somewhat skeptical.

Targeting power, cooling, and TCO limits for inference is real, especially in air-cooled data centers.

But the benchmarks shown are narrow, and it’s unclear how well this generalizes across models and mixed production workloads. GPUs are inefficient here, but their flexibility still matters.

whimsicalism•53m ago
Got excited, then I saw it was for inference. yawns

Seems like it would obviously be in TSMC's interest to give preferential tape-out access to Nvidia competitors; they benefit from a less consolidated customer base bidding up their prices.

roughly•51m ago
I am of the opinion that Nvidia's hit the wall with their current architecture in the same way that Intel historically did with its various architectures: their current generation's power and cooling requirements demand the construction of entirely new datacenters with different designs, which is going to blow out the economics on inference (GPU + datacenter + power plant + nuclear fusion research division + lobbying for datacenter land + water rights + ...).

The story with Intel around these times was usually that AMD or Cyrix or ARM or Apple or someone else would come around with a new architecture that was a clear generation jump past Intel's, and most importantly seemed to break the thermal and power ceilings of the Intel generation (at which point Intel typically fired their chip design group, hired everyone from AMD or whoever, and came out with Core or whatever). Nvidia effectively has no competition, or hasn't had any - nobody's actually broken the CUDA moat, so neither Intel nor AMD nor anyone else is really competing for the datacenter space, so they haven't faced any actual competitive pressure against things like power draws in the multi-kilowatt range for the Blackwells.

The reason this matters is that LLMs are incredibly nifty, often useful tools that are not AGI and also seem to be hitting a scaling wall, and the only way to make the economics of, e.g., a Blackwell-powered datacenter make sense is to assume that the entire economy is going to be running on it, as opposed to some useful tools and some improved interfaces. Otherwise the investment numbers just don't make sense: the gap between how LLMs are actually used on the ground, the real but limited value they can add, and the actual full cost of providing that service with a brand-new single-purpose "AI datacenter" is just too great.

So this is a press release, but any time I see something that looks like an actual new hardware architecture for inference, and especially one that doesn't require building a new building or solving nuclear fusion, I'll take it as a good sign. I like LLMs, I've gotten a lot of value out of them, but nothing about the industry's finances adds up right now.

kuil009•48m ago
Thanks for this. It put into words a lot of the discomfort I’ve had with the current AI economics.
bigyabai•45m ago
> but nothing about the industry's finances adds up right now.

The acquisitions do. Remember Groq?

petesergeant•38m ago
> nothing about the industry's finances adds up right now

Nothing about the industry’s finances, or about Anthropic and OpenAI’s finances?

I look at the list of providers on OpenRouter for open models, and I don’t believe all of them are losing money. FWIW Anthropic claims (iirc) that they don’t lose money on inference. So I don’t think the industry or the model of selling inference is what’s in trouble there.

I am much more skeptical of Anthropic and OpenAI's business model of spending gigantic sums on generating proprietary models. Latest Claude and GPT are very, very good, but not sufficiently better than the competition to justify the cash spend. It feels unlikely that anyone is going to "winner takes all" the market at this point. I don't see how Anthropic or OpenAI survive as independent entities, or how current owners avoid taking a gigantic haircut, other than by Sam Altman managing to do something insane like reverse-acquiring Oracle.

EDIT: also feels like Musk has shown how shallow the moat is. With enough cash and access to exceptional engineers, you can magic a frontier model out of the ether, however much of a douche you are.

flyinglizard•30m ago
You’re right, but Nvidia enjoys an important advantage Intel always used to mask its sloppy design work: the supply chain. You simply can’t source HBM at scale because Nvidia bought everything; TSMC N3 is likewise fully booked, and between Apple and Nvidia, their 18A is probably already far gone. And if you want to connect your artisanal inference hardware together, then congratulations: Nvidia is the leader here too, and you WILL buy their switches.

As for the business side, I’ve yet to hear of a transformative business outcome due to LLMs (it will come, but not there yet). It’s only the guys selling the shovels that are making money.

This entire market runs on sovereign funds and cyclical investing. It’s crazy.

segmondy•26m ago
> The reason this matters is that LLMs are incredibly nifty, often useful tools that are not AGI and also seem to be hitting a scaling wall

I don't know who needs to hear this, but the real breakthrough we've had in AI is not LLMs but generative AI; LLMs are just one specific case. Furthermore, we have hit absolutely no walls. Go download a model from Jan 2024, another from Jan 2025, and one from this year, and compare: the rate of improvement has been enormous.

re-thc•19m ago
> I am of the opinion that Nvidia's hit the wall with their current architecture

Not likely since TSMC has a new process with big gains.

> The story with Intel

Was that their fabs couldn't keep up, not their designs.

nl•19m ago
> I am of the opinion that Nvidia's hit the wall with their current architecture

Based on what?

Their measured performance on the things people care about keeps going up, and their software stack keeps getting better, unlocking more performance on existing hardware.

Inference tests: https://inferencemax.semianalysis.com/

Training tests: https://www.lightly.ai/blog/nvidia-b200-vs-h100

https://newsletter.semianalysis.com/p/mi300x-vs-h100-vs-h200... (only H100, but vs AMD)

> but nothing about the industry's finances adds up right now

Is that based just on the HN "it is lots of money so it can't possibly make sense" wisdom? Because the released numbers seem to indicate that inference providers and Anthropic are doing pretty well, and that OpenAI is really only losing money on inference because of the free ChatGPT usage.

Further, I'm sure most people heard the mention of an unnamed enterprise paying Anthropic $5,000/month per developer on inference(!!). If a company is that cost-insensitive, is there any reason why Anthropic would bother to subsidize them?
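The "do the finances add up" question above is largely back-of-envelope arithmetic: given a box's power draw, its throughput, and an electricity price, what does a million output tokens cost in energy alone? A minimal sketch, with all inputs invented for illustration (and note it deliberately ignores capex, depreciation, and staffing, which dominate real serving costs):

```python
# Back-of-envelope energy cost of generating one million tokens.
# All inputs are illustrative assumptions, not measured or
# vendor-supplied numbers; this counts electricity only.

def cost_per_million_tokens(watts: float, tokens_per_second: float,
                            usd_per_kwh: float) -> float:
    seconds = 1_000_000 / tokens_per_second  # time to emit 1M tokens
    kwh = watts * seconds / 3_600_000        # watt-seconds -> kWh
    return kwh * usd_per_kwh

# e.g. a hypothetical 10 kW box at 20k tok/s and $0.10/kWh
print(f"${cost_per_million_tokens(10_000, 20_000, 0.10):.4f} per 1M tokens")
```

Energy alone comes out to fractions of a cent per million tokens under these assumptions, which is why the debate in this thread centers on hardware and datacenter capex rather than power bills.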

richwater•39m ago
This is from September 2025, what's new?
sanxiyn•36m ago
What's new is that HN discovered it; it wasn't posted here in September 2025.
nl•12m ago
So inference only and slower than B200s?

Maybe they are cheap.

Key US Power Grid [PJM] Cuts Demand Outlook on Overstated AI Boom

https://www.bloomberg.com/news/articles/2026-01-14/biggest-us-power-grid-cuts-demand-outlook-on-o...
1•toomuchtodo•39s ago•0 comments

Social media time does not increase teenagers' mental health problems – study

https://www.theguardian.com/media/2026/jan/14/social-media-time-does-not-increase-teenagers-menta...
1•sien•3m ago•0 comments

The Inelastic Markets Hypothesis [pdf]

https://r.jordan.im/download/investing/gabaix2020.pdf
1•luu•9m ago•0 comments

The Single-Click Microsoft Copilot Attack That Silently Steals Personal Data

https://www.varonis.com/blog/reprompt
2•extesy•13m ago•0 comments

DeepSeek's technical papers show frontier innovation

https://www.scmp.com/tech/tech-trends/article/3339769/deepseek-stays-mum-next-ai-model-release-te...
2•nsoonhui•18m ago•0 comments

"Don't worry. Boys are hard to find." Trump/Epstein and... Criminal Enterprises

https://lisevoldeng.substack.com/p/dont-worry-boys-are-hard-to-find
2•Tadpole9181•19m ago•0 comments

We built a browser with GPT-5.2 in Cursor

https://xcancel.com/mntruell/status/2011562190286045552?s=20
2•aaraujo002•20m ago•0 comments

Show HN: Commosta – marketplace to share computing resources

https://www.commosta.io/
1•gkm25•20m ago•0 comments

JSON Render

https://json-render.dev/
1•handfuloflight•22m ago•0 comments

Show HN: IMSAI/Altair inspired microcomputer with web emulator

https://gzalo.github.io/microcomputer/
1•gzalo•23m ago•0 comments

Skillshare: Sync skills to all your AI CLI tools with one command

https://github.com/runkids/skillshare
1•handfuloflight•30m ago•0 comments

Show HN: Chklst – A Minimalist Checklist

https://www.chklst.xyz/
2•rgbjoy•32m ago•0 comments

Opinion: Why tech leaders can't regulate AI before releasing them?

1•lauraorchid•33m ago•1 comments

Vibe Coding Paradox

https://blog.kaplich.me/vibe-coding-paradox/
1•skaplich•33m ago•0 comments

Show HN: I built a satellite forensic engine to detect fraud in Carbon Markets

1•kccanarch•33m ago•1 comments

Google is shutting down the Tenor API

https://www.reddit.com/r/webdev/s/ZjlFO8kiW4
1•kull•34m ago•2 comments

Bubblewrap: A nimble way to prevent agents from accessing your .env files

https://patrickmccanna.net/a-better-way-to-limit-claude-code-and-other-coding-agents-access-to-se...
2•0o_MrPatrick_o0•36m ago•0 comments

Is passive investment inflating a stockmarket bubble?

https://www.economist.com/finance-and-economics/2026/01/14/is-passive-investment-inflating-a-stoc...
23•andsoitis•37m ago•25 comments

I beat Factorio on 1k Floppy disks [video]

https://www.youtube.com/watch?v=cTPBGZcTRqo
1•simonpure•37m ago•1 comments

ISS astronauts return to Earth early due to illness of crew member

https://www.cbc.ca/news/science/nasa-crew11-early-return-9.7045315?cmp=rss
2•gnabgib•39m ago•0 comments

2025 Berggruen Prize Essay Competition Winners

https://berggruen.org/eu/news/2025-berggruen-prize-essay-competition-winners
2•i7l•39m ago•0 comments

AgentDiscover Scanner – Multi-layer AI agent detection (code, network, K8s eBPF)

https://github.com/Defend-AI-Tech-Inc/agent-discover-scanner
1•DefendAI•39m ago•0 comments

Skrillex Releases Kora

https://skrlx.com/
2•Lucasoato•46m ago•0 comments

Kutt.ai – Free AI Video Generator, Text and Image to Video

https://kutt.ai/
2•zuoning•46m ago•2 comments

Personal Intelligence: Connecting Gemini to Google Apps

https://blog.google/innovation-and-ai/products/gemini-app/personal-intelligence/
1•simonpure•47m ago•1 comments

Mapping Nostr keys to DNS-based internet identifiers

https://github.com/nostr-protocol/nips/blob/master/05.md
1•gjvc•55m ago•0 comments

WAPlus' Guide to WhatsApp CRM

https://waplus.io/blog/whatsapp-crm
2•bocaiconnie•56m ago•1 comments

Verizon Is Down

https://www.macrumors.com/2026/01/14/verizon-is-down-iphone-sos/
7•vapemaster•59m ago•4 comments

Verizon outage today (but not on their map)

https://www.verizon.com/about/california-outage-map
5•kalu•1h ago•3 comments

Show HN: Quick Beats – minimalistic webapp (mobile and desktop) drum machine

https://alganet.github.io/quick-beats/
2•gaigalas•1h ago•0 comments