frontpage.

Show HN: Seedance 2.0 Release

https://seedancy2.com/
1•funnycoding•24s ago•0 comments

Leisure Suit Larry's Al Lowe on model trains, funny deaths and Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
1•thelok•27s ago•0 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases
1•edwinarbus•45s ago•0 comments

VCF West: Whirlwind Software Restoration – Guy Fedorkow [video]

https://www.youtube.com/watch?v=YLoXodz1N9A
1•stmw•1m ago•1 comments

Show HN: COGext – A minimalist, open-source system monitor for Chrome (<550KB)

https://github.com/tchoa91/cog-ext
1•tchoa91•2m ago•0 comments

FOSDEM 26 – My Hallway Track Takeaways

https://sluongng.substack.com/p/fosdem-26-my-hallway-track-takeaways
1•birdculture•3m ago•0 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•6m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•6m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•7m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•7m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•9m ago•0 comments

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•10m ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•11m ago•0 comments

Fantasy football that celebrates great games

https://www.silvestar.codes/articles/ultigamemate/
1•blenderob•11m ago•0 comments

Show HN: Animalese

https://animalese.barcoloudly.com/
1•noreplica•11m ago•0 comments

StrongDM's AI team build serious software without even looking at the code

https://simonwillison.net/2026/Feb/7/software-factory/
2•simonw•12m ago•0 comments

John Haugeland on the failure of micro-worlds

https://blog.plover.com/tech/gpt/micro-worlds.html
1•blenderob•12m ago•0 comments

Show HN: Velocity - Free/Cheaper Linear Clone but with MCP for agents

https://velocity.quest
2•kevinelliott•13m ago•2 comments

Corning Invented a New Fiber-Optic Cable for AI and Landed a $6B Meta Deal [video]

https://www.youtube.com/watch?v=Y3KLbc5DlRs
1•ksec•14m ago•0 comments

Show HN: XAPIs.dev – Twitter API Alternative at 90% Lower Cost

https://xapis.dev
2•nmfccodes•15m ago•1 comments

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

https://psychotechnology.substack.com/p/near-instantly-aborting-the-worst
2•eatitraw•21m ago•0 comments

Show HN: Nginx-defender – realtime abuse blocking for Nginx

https://github.com/Anipaleja/nginx-defender
2•anipaleja•21m ago•0 comments

The Super Sharp Blade

https://netzhansa.com/the-super-sharp-blade/
1•robin_reala•22m ago•0 comments

Smart Homes Are Terrible

https://www.theatlantic.com/ideas/2026/02/smart-homes-technology/685867/
2•tusslewake•24m ago•0 comments

What I haven't figured out

https://macwright.com/2026/01/29/what-i-havent-figured-out
1•stevekrouse•25m ago•0 comments

KPMG pressed its auditor to pass on AI cost savings

https://www.irishtimes.com/business/2026/02/06/kpmg-pressed-its-auditor-to-pass-on-ai-cost-savings/
1•cainxinth•25m ago•0 comments

Open-source Claude skill that optimizes Hinge profiles. Pretty well.

https://twitter.com/b1rdmania/status/2020155122181869666
3•birdmania•25m ago•1 comments

First Proof

https://arxiv.org/abs/2602.05192
8•samasblack•27m ago•4 comments

I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

https://mohammedeabdelaziz.github.io/articles/trendscope-market-scanner
1•mohammede•28m ago•0 comments

Kagi Translate

https://translate.kagi.com
2•microflash•29m ago•0 comments

Nvidia Nemotron 3 Family of Models

https://research.nvidia.com/labs/nemotron/Nemotron-3/
257•ewt-nv•1mo ago

Comments

Y_Y•1mo ago
Wow, Nvidia keeps on pushing the frontier of misleading benchmarks
pants2•1mo ago
If it's intelligence + speed you want, nothing comes close to GPT-OSS-120B on Cerebras or Groq.

However, this looks like it has great potential for cost-effectiveness. As of today it's free to use over API on OpenRouter, so a bit unclear what it'll cost when it's not free, but free is free!

https://openrouter.ai/nvidia/nemotron-3-nano-30b-a3b:free
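
A minimal sketch of hitting that free endpoint with the OpenAI Python client pointed at OpenRouter (the model slug is taken from the link above; the environment variable name is just a convention here, not something from the release):

    # Query the free Nemotron 3 Nano endpoint on OpenRouter.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],  # assumes you exported a key
    )

    resp = client.chat.completions.create(
        model="nvidia/nemotron-3-nano-30b-a3b:free",  # slug from the link above
        messages=[{"role": "user", "content": "Summarize the Nemotron 3 Nano release in two sentences."}],
    )
    print(resp.choices[0].message.content)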

viraptor•1mo ago
> nothing comes close to GPT-OSS-120B on Cerebras

That's temporary. Cerebras speeds up everything, so if Nemotron is good quality, it's just a matter of time until they add it.

credit_guy•1mo ago
That's unlikely. Cerebras doesn't speed up everything. Can it speed up everything? I don't know, I'm not an insider. But does it speed up everything? That is evidently not the case. Their page [1] lists only 4 production models and 2 preview models.

[1] https://inference-docs.cerebras.ai/models/overview

ahmadyan•1mo ago
They need to compile the model for their chips. Standard transformers are easier, so if there is demand they will deploy GPT-OSS, Qwen, GLM, etc.

Nemotron on the other hand is a hybrid (Transformer + Mamba-2) so it will be more challenging to compile it on Cerebras/Groq chips.

(Me thinks Nvidia is purposefully picking architecture+FP4 that is easy to ship on Nvidia chips, but harder for TPU or Cerebras/Groq to deploy)

red2awn•1mo ago
Very interesting release:

* Hybrid MoE: 2-3x faster than pure MoE transformers

* 1M context length

* Trained on NVFP4

* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)

* Open model training recipe (coming soon)

Really appreciate Nvidia being the most open lab but they really should make sure all the links/data are available on day 0.

Also interesting that the model is trained in NVFP4 but the inference weights are FP8.

bcatanzaro•1mo ago
The Nano model isn’t pretrained in FP4, only Super and Ultra are. And posttraining is not in FP4, so the posttrained weights of these models are not native FP4.
wcallahan•1mo ago
I don’t do ‘evals’, but I do process billions of tokens every month, and I’ve found these small Nvidia models to be the best by far for their size currently.

As someone else mentioned, the GPT-OSS models are also quite good (though I haven’t found how to make them great yet; I think they might age well like the Llama 3 models did and get better with time!).

But for a defined task, I’ve found task compliance, understanding, and tool call success rates to be some of the highest on these Nvidia models.

For example, I have a continuous job that evaluates whether the data for a startup company on aVenture.vc could have overlapped/conflated two similar but unrelated companies across news articles, research details, investment rounds, etc., which is a token-hungry ETL task! And I recently retested this workflow on the top 15 or so models today with <125b parameters, and the Nvidia models were among the best performing for this type of work, particularly around non-hallucination if given adequate grounding.

Also, re: cost - I run local inference on several machines that run continuously, in addition to routing through OpenRouter and the frontier providers, and was pleasantly surprised to find that if I’m a paying customer of OpenRouter otherwise, the free variant there from Nvidia is quite generous for limits, too.

btown•1mo ago
Would you mind sharing what hardware/card(s) you're using? And is https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... one of the ones you've tested?
heavyset_go•1mo ago
Support for this landed in llama.cpp recently if anyone is interested in running it locally.
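
For anyone who wants a starting point, here is a rough local-inference sketch via the llama-cpp-python bindings, assuming they have picked up the new architecture support and that you have downloaded a community GGUF quant (the filename below is a placeholder, not an official artifact):

    # Local inference through llama-cpp-python; the GGUF path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="nemotron-3-nano-30b-a3b.Q4_K_M.gguf",  # whatever quant you downloaded
        n_ctx=8192,        # modest context window; the model supports far more
        n_gpu_layers=-1,   # offload as many layers as fit on the GPU
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a prime sieve in Python."}]
    )
    print(out["choices"][0]["message"]["content"])
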
wcallahan•1mo ago
Yes, I run it locally on 3 different AMD Strix Halo machines (Framework Desktop and 2 GMKTec machines, 128GB x 2, 96GB x 1) and a Mac Studio M2 Ultra with 128GB of unified memory.

I’ve used several runtimes, including vLLM. Works great! Speedy. Best results with Ubuntu after trying a few different distributions and Vulkan and ROCm drivers.
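
For reference, a bare-bones vLLM offline-inference sketch; the Hugging Face repo id is assumed from the truncated links elsewhere in this thread, and it presumes your vLLM build supports the hybrid Mamba/Transformer architecture:

    # Offline batch inference with vLLM (repo id assumed; verify on Hugging Face).
    from vllm import LLM, SamplingParams

    llm = LLM(model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B", trust_remote_code=True)
    params = SamplingParams(temperature=0.2, max_tokens=256)

    outputs = llm.generate(["Do these two company profiles describe the same company? ..."], params)
    for o in outputs:
        print(o.outputs[0].text)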

andy99•1mo ago
What do you mean about not doing evals? Just literally that you don’t run any benchmarks or do you have something against them?
woodson•1mo ago
Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).
wcallahan•1mo ago
Yes to both comments. I said that to:

1. disclose that my method was not quantifiably measurable, because that is not important to me; speed of action/development outcomes is more important to me, and because

2. I’ve observed a large gap between benchmark toppers and my own results

But make no mistake, I like to have the terminals scrolling live across multiple monitors so I can glance at them periodically and watch their response quality, so I do care about and notice which give better/worse results.

My biggest goal right now after accuracy is achieving more natural human-like English for technical writing.

danielmarkbruce•1mo ago
He's just saying anecdotally these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered - no.
kgeist•1mo ago
>the GPT-OSS models are also quite good

I recently pitted gpt-oss 120b against Qwen3-Next 80b on a lot of internal benchmarks (for production use), and for me, gpt-oss was slightly slower (vLLM, both fit in VRAM), much worse at multilingual tasks (33 languages evaluated), and had worse instruction following (e.g., Qwen3-Next was able to reuse the same prompts I used for Gemma3 perfectly, while gpt-oss struggled and RAG benchmarks suddenly went from 90% to 60% without additional prompt engineering).

And that's with Qwen3-Next being a random unofficial 4-bit quant (compared to gpt-oss having native support) + I had to disable multi-token prediction in Qwen3-Next because vLLM crashed with it.

Has someone here tried both gpt-oss 120b and Qwen3-Next 80b? Maybe I was doing something wrong because I've seen a lot of people praise gpt-oss.

scrlk•1mo ago
gpt-oss is STEM-maxxed, so I imagine most of the praise comes from people using it for agentic coding.

> We trained the models on a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge.

https://openai.com/index/introducing-gpt-oss/

dandelionv1bes•1mo ago
Completely agree. I was working on something with TensorRT LLM and threw Nemotron in there more on a whim. It completely mopped the floor with other models for my task (text style transfer), following joint moderation with another LLM & humans. Really impressed.
selfhoster11•1mo ago
You may want to use the new "derestricted" variants of gpt-oss. While the ostensible goal of these variants is to de-censor them, it also removes the models' obsession with policy, which wastes thinking tokens that could be used towards actually reasoning through a problem.
wcallahan•1mo ago
Great advice. Have you observed any other differences? I’ve been wondering if there are any specialized variants of the GPT-OSS models yet that outperform on specific tasks (similar to the countless Llama 3 variants we’ve seen).
max002•1mo ago
I'm upvoting; I'm happy to finally see an open-source model with commercial use from Nvidia, as most of the models I've been checking from you guys couldn't be used in commercial settings. Bravo Nvidia!
teleforce•1mo ago
Just wondering: can anything with a commercial-use restriction be considered open source at all? Even the most stringent GPL allows you to commercialize [1].

We are talking about an LLM model here, not software, but the same principle should apply.

[1] Open-source license:

https://en.wikipedia.org/wiki/Open-source_license

kristianp•1mo ago
The article seems to focus on the Nano model. Where are the details of the larger ones?
shikon7•1mo ago
> We are releasing the Nemotron 3 Nano model and technical report. Super and Ultra releases will follow in the coming months.
jtbayly•1mo ago
Any chance of running this nano model on my Mac?
netghost•1mo ago
Kind of depends on your mac, but if it's a relatively recent apple silicon model… maybe, probably?

> Nemotron 3 Nano is a 3.2B active (3.6B with embeddings) 31.6B total parameter model.

So I don't know the exact math once you have a MoE, but 3.2b will run on most anything, 31.6b and you're looking at needing a pretty large amount of ram.
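
To make that concrete, a quick back-of-envelope: all 31.6B weights have to be resident even though only ~3.2B are active per token (the small active count mostly buys speed, not capacity), and KV cache plus runtime overhead come on top of these numbers.

    # Rough weight-memory estimates for a 31.6B-parameter MoE.
    total_params = 31.6e9

    for name, bytes_per_param in [("BF16", 2), ("FP8", 1), ("4-bit quant", 0.5)]:
        gib = total_params * bytes_per_param / 2**30
        print(f"{name}: ~{gib:.0f} GiB of weights")

    # BF16: ~59 GiB, FP8: ~29 GiB, 4-bit quant: ~15 GiB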

vessenes•1mo ago
Given Mac bandwidth, you'll generally want to load the whole thing in RAM. You get speed benefits based on smaller-size active experts, since the Mac compute is slow compared to Nvidia hardware. This should be relatively snappy on a Mac, if you can load the entire thing.
axoltl•1mo ago
There are MLX versions of the model, so yes. LM Studio hasn't updated their mlx-lm runtime yet though; you'll get an exception.

But if you're OK running it without a UI wrapper, mlx_lm==0.30.0 will serve you fine.
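
A sketch of that mlx-lm route, assuming a community MLX conversion exists under a name like the one below (the repo id is a guess; check Hugging Face for the exact conversion and quantization you want):

    # Run an MLX conversion with the mlx-lm Python API (repo id is a guess).
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-8bit")

    messages = [{"role": "user", "content": "Explain hybrid Mamba/Transformer layer stacks briefly."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    print(generate(model, tokenizer, prompt=prompt, max_tokens=300))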

anon373839•1mo ago
Looks like LM Studio just updated the MLX runtime, so there's compatibility now.
axoltl•1mo ago
Yep! 60t/s on the 8 bit MLX on an M4 Pro with 64GB of RAM.
mark_l_watson•1mo ago
I used Nemotron 3 Nano on LM Studio yesterday on my 32GB M2 Pro Mac mini. It is fast, passed all of my personal tool use tests, and did a good job analyzing code. Love it.

Today I ran a few simple cases on Ollama, but not much real testing.
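
If it helps anyone reproduce a quick smoke test, a minimal Ollama sketch (the model tag below is hypothetical; use whatever tag you actually pulled):

    # Simple chat call via the ollama Python client; model tag is a placeholder.
    import ollama

    resp = ollama.chat(
        model="nemotron-3-nano",  # placeholder tag
        messages=[{"role": "user", "content": "List three edge cases for a date parser."}],
    )
    print(resp["message"]["content"])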

jonrosner•1mo ago
running it on my M4 @ 90tps, takes 18GB of RAM.
pylotlight•1mo ago
M2 Max @ 17tps btw
Tepix•1mo ago
If it uses 18GB of RAM, you're not using the official model (released in BF16 and FP8), but a quantization of unknown quality.

If you write "M4", you mean M4 and not M4 Pro or M4 Max?

keyle•1mo ago
LMStudio and 32+ gb of RAM.

https://lmstudio.ai/models/nemotron-3

Simplest to just install it from the app.

sosodev•1mo ago
I love how detailed and transparent the data set statistics are on the huggingface pages. https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

I've noticed that open models have made huge efficiency gains in the past several months. Some amount of that is explainable as architectural improvements but it seems quite obvious that a huge portion of the gains come from the heavy use of synthetic training data.

In this case roughly 33% of the training tokens are synthetically generated by a mix of other open-weight models. I wonder if this trend is sustainable or if it might lead to model collapse as some have predicted. I suspect that the proliferation of synthetic data throughout open-weight models has led to a lot of the ChatGPT writing-style replication (many bullet points, em dashes, it's not X but actually Y, etc).

kristopolous•1mo ago
I was just using the embeddings model last night. Boy is it slow. Nice results but this 5090 isn't cutting it.

I'm guessing there's some sophistication in the instrumentation I'm just not up to date with.

sosodev•1mo ago
The claim that a small, fast, and decently accurate model makes a good foundation for agentic workloads seems reasonable.

However, is cost the biggest limiting factor for agent adoption at this point? I would suspect that the much harder part is just creating an agent that yields meaningful results.

all2•1mo ago
This has been my major concern, so much so that I'm going to be launching a tool to handle this specific task: agent conception and testing. There is so little visibility in the tools I've used that debugging is just a game of whack-a-mole.
sosodev•1mo ago
Did you see this HN submission? https://news.ycombinator.com/item?id=46242838

It seems similar to what you're describing.

all2•1mo ago
I did not. Thanks for the heads up!
ineedasername•1mo ago
No, I really don't think cost is the limiting factor; it's the tooling and a competent workforce to implement it. Every company of any substantial size, or near enough, is trying to implement and hire for those roles. The small number of people familiar with the specific tooling, plus the lack of maturity in the tooling raising the learning curve, are the bottlenecks.
DoctorOetker•1mo ago
Can it understand input in, and generate output for, tokens of different languages? Does it know narrow IPA transcription of sentences in arbitrary languages?
ofermend•1mo ago
We just evaluated Nemotron-3 for Vectara's hallucination leaderboard.

It scores at 9.6% hallucination rate, similar to qwen3-next-80b-a3b-thinking (9.3%) but of course it is much smaller.

https://github.com/vectara/hallucination-leaderboard
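
For context on how a number like that is produced: the leaderboard scores (source, summary) pairs with Vectara's open HHEM judge and counts the share below a consistency threshold. A sketch under the assumption that the model card's predict() helper works as I remember it (it ships via trust_remote_code); see the repo above for the exact protocol.

    # Hedged sketch: estimate a hallucination rate with the HHEM judge.
    from transformers import AutoModelForSequenceClassification

    judge = AutoModelForSequenceClassification.from_pretrained(
        "vectara/hallucination_evaluation_model", trust_remote_code=True
    )

    pairs = [  # (source document, model-written summary)
        ("Revenue grew 12% year over year.", "Revenue fell sharply."),
        ("The launch was delayed to March.", "The launch was delayed to March."),
    ]
    scores = judge.predict(pairs).tolist()      # consistency scores in [0, 1]
    rate = sum(s < 0.5 for s in scores) / len(scores)
    print(f"hallucination rate: {rate:.1%}")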

radarsat1•1mo ago
I find it really interesting that it uses a Mamba hybrid with Transformers. Is it the only significant model right now using (at least partially) SSM layers? This must contribute to lower VRAM requirements right? Does it impact how KV caching works?
dJLcnYfsE3•1mo ago
I would say it is weird that Nvidia competes with its own customers, but looking back at the "Founders Edition" cards, maybe it isn't that weird at all. The better question probably is: with every big corporation having its own LLM, what exactly is OpenAI's moat that would explain their valuation?
notyourwork•1mo ago
They and Tesla know something no one else does.
beng-nl•1mo ago
Can you tell us more? I’m curious to hear what is behind this implication.
leobg•1mo ago
A guess:

They both believe the product people focus on will commoditize. Tesla realized early that EVs without autonomy are a dead end for long-term dominance, just as NVIDIA believes models without infrastructure are a dead end for durable AI profits.

(Am I close?)

lukeinator42•1mo ago
I wonder if they also want to create more of a market for their products such as the DGX Spark.
jonrosner•1mo ago
After testing it for a little while, I am pretty disappointed. While I do get 90 tokens per second out of it on my M4 Pro, which is more than enough for a real-world use case, the quality is just not there. I gave it a codebase to analyze along with some questions to answer, and it started hallucinating right away. No replacement for a "real" coding agent; maybe for other agentic work like sorting emails, though.
Tepix•1mo ago
Is it just me or is Nvidia trolling hard by calling a model with 30b parameters "nano"? With a bit of context, it doesn't even fit on an RTX 5090.

Other LLMs with the "nano" moniker are around 1b parameters or less.

patpatpat•1mo ago
FWIW, it runs just fine on my AMD 9060 XT 16GB without any tweaks. It's very usable. I asked it to write a prime sieve in C#; it started responding in 0.38 seconds and wrote an implementation at ~20 tokens/sec.
genpfault•1mo ago
Getting ~150 tok/s on an empty context with a 24 GB 7900XTX via llama.cpp's Vulkan backend.
Tepix•1mo ago
Again, you're using some 3rd party quantisations, not the weights supplied by Nvidia (which don't fit in 24GB).
Tepix•1mo ago
But you're using a 3rd party quant of unknown quality. Nvidia is only providing weights as BF16 and FP8.
barrystaes•1mo ago
I wonder what performance remains on a 12GB VRAM GPU when local Ollama spills over into system RAM to run this huge "nano" model.

https://github.com/jameschrisa/Ollama_Tuning_Guide/blob/main...

omneity•1mo ago
Nemotron now works on LM Studio if you update the runtime (from the settings > Runtime screen).

The default chat template is incorrect, though, and will fail, but I published a corrected one you can replace it with: https://gist.github.com/omarkamali/a594b6cb07347f501babed489...

thoughtpeddler•1mo ago
Is it fair to view this release as Nvidia strategically flexing that they can compete with their own customers in the model layer -- that they can be as vertically integrated as, say, GDM?