
CUDA Ontology

https://jamesakl.com/posts/cuda-ontology/
271•gugagore•2mo ago

Comments

ArcHound•2mo ago
That is a great reference, explains a lot of small inaccuracies between various tutorials when you're trying to debug some of these issues. Saved and printed, thanks a lot!
visarga•2mo ago
Wondering why a $4T company can't afford a smart installation assistant that can auto-detect problems and apply fixes as needed. I wasted too many days chasing driver and torch versions. It's probably the worst part of working in ML. Combine this with Python's horrible package management and you got a perfect combo - like the cough and the stitch.
numbers_guy•2mo ago
They provide containers to cater to those needs: https://catalog.ngc.nvidia.com/search
threeducks•2mo ago
After being once again frustrated by the CUDA installation experience, I thought that I should give those containers a try. Unfortunately, my computer did not boot anymore after following the installation instructions for the NVIDIA container toolkit as outlined on the NVIDIA website. Reinstalling everything and following the instructions from some random blog post made it work, but I then found that the container with the CUDA version that I needed had been deprecated.

There were other problems, such as the research cluster of my university not having Docker, but that is a different issue.

YetAnotherNick•2mo ago
Containers don't include drivers which is the primary reason for issues.
torginus•2mo ago
Containers afair rely on the exact driver version matching between the host system and the container itself.

We were on AWS when we used this so setting up seemed easy enough - AWS gave you the driver, and a matching docker image was easy enough to find.

kcb•2mo ago
That's not the case: the container's CUDA user space doesn't have to match the host driver's CUDA capability exactly. The container just needs to be the same major version or lower. So a system with a CUDA 13-capable driver should be able to run all previous versions.

For some versions there are even compat layers built into the container to allow forward version compatibility.
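The rule described above can be sketched as a tiny predicate (purely illustrative, not an official NVIDIA check; it ignores the minor-version and compat-layer caveats):

```python
# Sketch of the container/driver compatibility rule: a container's CUDA
# user-space libraries generally run when their major version is at or
# below the CUDA version the host driver supports.

def container_runs(host_driver_cuda: tuple, container_cuda: tuple) -> bool:
    """Return True if a container built against container_cuda (major, minor)
    should run on a host whose driver supports host_driver_cuda."""
    return container_cuda[0] <= host_driver_cuda[0]

# A CUDA 13-capable driver runs CUDA 12.x and 13.x containers...
assert container_runs((13, 0), (12, 4))
# ...but a CUDA 12 driver can't run a CUDA 13 container
# (absent a forward-compat layer in the container).
assert not container_runs((12, 2), (13, 0))
```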

fragmede•2mo ago
Just have claude code fix it
ux266478•2mo ago
I'm wondering how a $4T company got away with shipping the absolute state of the toolchain to begin with. They have total and complete sovereignty over everything outside the OS and PCIe boundaries, with a bottomless pool of top-class labor. There's no reason it has to be cruftier or more fragile than any other low latency networked computation... and yet here we are. AMD isn't any better. I'm almost interested to see if Intel has done any better with L0, but I highly suspect it suffers from the exact same ecosystem hell problems that plague the other two.

The idea that getting a PCIe FPGA board to crunch numbers is less headache prone than a GPU is laughable, but that's the absurd reality we live in.

w-m•2mo ago
This is a good resource. But for the computer vision and machine learning practitioner most of the fun can start where this article ends.

nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers like gcc. If you install a newer CUDA toolkit on an older machine, likely you'll need to upgrade your compiler toolchain as well, and fix the paths.

While orchestration in many (research) projects happens from Python, some depend on building CUDA extensions. An innocent-looking Python project may not ship the compiled kernels and may require a CUDA toolkit to work correctly. Some package management solutions can install CUDA toolkits (conda/mamba, pixi); the pure-Python ones (pip, uv) cannot. This leaves you to match the correct CUDA toolkit to your Python environment for a project.

conda specifically provides different channels (default/nvidia/pytorch/conda-forge) and, since conda 4.6, defaults to strict channel priority, meaning "if a name exists in a higher-priority channel, lower ones aren't considered". Strict priority can make your requirements unsatisfiable even though a version of each required package exists somewhere across the channels. uv is neat and fast and awesome, but leaves you alone in dealing with the CUDA toolkit.

Also, code that compiles with older CUDA toolkit versions may not compile with newer ones. Newer hardware may require a CUDA toolkit version newer than what the project maintainer intended. PyTorch ships with a specific CUDA runtime version; if your project has additional code that is also using CUDA extensions, it needs to match the CUDA runtime version of your installed PyTorch to work. Trying to bring up a project from a couple of years ago on the latest hardware may thus blow up on you on multiple fronts.
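To make the channel juggling concrete, here is a hedged sketch of an `environment.yml` that pins the CUDA pieces explicitly (the package names `pytorch-cuda` and `cuda-toolkit` are the usual channel packages, but the exact pins are illustrative and must be checked against your setup):

```yaml
# Illustrative environment.yml. Channel order matters under strict
# priority (the conda >= 4.6 default): a package name found in an
# earlier channel shadows all later channels.
name: cuda-project
channels:
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  - python=3.11
  - pytorch::pytorch
  - pytorch::pytorch-cuda=12.1   # CUDA runtime PyTorch links against
  - nvidia::cuda-toolkit=12.1    # nvcc etc., matched to the same major.minor
```

Pinning both to the same major.minor is the point: a toolkit/runtime mismatch here is exactly what makes extra CUDA extensions fail to build or load.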

anotherpaul•2mo ago
Yes, this is the actual lived reality. Thank you for outlining it so well.
eapriv•2mo ago
Sounds like most of these problems come from using Python.
mellosouls•2mo ago
You imply these problems would go away (or wouldn't be replaced by new ones) with another language.
eapriv•2mo ago
Removing layers usually improves stability.
alecco•2mo ago
> nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers like gcc. If you install a newer CUDA toolkit on an older machine, likely you'll need to upgrade your compiler toolchain as well, and fix the paths.

Conversely, nvcc often stops working with major upgrades of gcc/clang. Fun times, indeed.

This is why a lot of people just use NVIDIA's containers even for local solo dev. It's a hassle to set up initially (docker/podman hell) but all the tools are there and they work fine.

embedding-shape•2mo ago
> This is why a lot of people just use NVIDIA's containers even for local solo dev. It's a hassle to set up initially (docker/podman hell) but all the tools are there and they work fine.

Yeah, which I feel like is fine for one project, or one-offs, but once you've accumulated projects, having individual 30GB images for each of them quickly adds up.

I found that most of my issues went away as I started migrating everything to `uv` for the python stuff, and nix for everything system related. Now I can finally go back to a 1 year old ML project, and be sure it'll run like before, and projects share a bit more data.

jcelerier•2mo ago
Yep, right now nvidia libs are broken with clang-21 and recent glibc due to stuff like rsqrt() having throw() in the declaration and not in the definition
the__alchemist•2mo ago
What trouble have you had specifically? On both Win and Linux, installing the CUDA toolkit (e.g. v13) just works for me. My use case is compiling kernels (or cuFFT FFI) using nvcc for FFI in rust programs and libs.
billti•2mo ago
> Also, code that compiles with older CUDA toolkit versions may not compile with newer CUDA toolkit versions. Newer hardware may require a CUDA toolkit version that is newer than what the project maintainer intended.

This is the part I find confusing, especially as NVIDIA doesn't make it easy to find and download the old toolkits. Is this effectively saying that just choosing the right --arch and --code flags isn't enough to support older versions? And that, as it statically links in the runtime library (by default), newer toolkits may produce code that just won't run on older drivers? In other words, is it true that to support old hardware you need to download and use old CUDA Toolkits, regardless of nvcc flags? (And to support newer hardware you may need to compile with newer toolkits.)

That's how I read it, which seems unfortunate.

zvr•2mo ago
Great explanation!

It should probably also add that everything CUDA is owned by NVIDIA, and "CUDA" itself is a registered trademark. The official way to refer to it is that the first time you spell it out as "NVIDIA® CUDA®" and then subsequently refer to just CUDA.

threeducks•2mo ago
Why should the author use the registered trademark symbol?
xpe•2mo ago
I am not a lawyer (IANAL), but here is what Gemini 3 Pro says: "You generally do not need to use the trademark symbol for CUDA in a blog post, unless you have a specific commercial relationship with NVIDIA."

Now direct from actual sources... From [1]

> Intended users of this Brand Guideline are members of the NVIDIA Partner Network (NPN), including original equipment manufacturers (OEMs), solution advisors, cloud partners, solution providers, distributors, solutions integrators, and service delivery partners.

From [2]:

> Always include the correct trademark (™ vs ®) by referring to the content documents provided or using the list of common NVIDIA products and technologies. After the first mention of the NVIDIA product or technology, which includes the appropriate trademarks, the trademark does not need to be included in future mentions within the same document, article, etc.

> CUDA®

[1]: https://brand.nvidia.com/d/wGtgoY2mtYYM/nvidia-partner-netwo...

[2]: https://brand.nvidia.com/d/wGtgoY2mtYYM/nvidia-partner-netwo...

zvr•2mo ago
Ah, my point (which I failed to make, obviously) was not regarding the post.

It was more like "the whole CUDA stuff is by a single company".

pjmlp•2mo ago
Great overview, with lots of effort placed into it.

However, it misses the polyglot part (Fortran, Python GPU JIT, all the backends that support PTX), the library ecosystem (writing CUDA kernels should be the exception not the rule), the graphical debugging tools and IDE integration.

virajk_31•2mo ago
thanks for the kernel nomenclatures
bbx•2mo ago
For reference: CUDA means "Compute Unified Device Architecture".
montyanderson•2mo ago
this is fantastic
coffeeaddict1•2mo ago
I wish GPU vendors would stick to a standard terminology, at least for common parts. It's really confusing having to deal with warps vs wavefronts vs simd groups, thread block vs workgroup, streaming multiprocessor vs compute unit vs execution unit, etc...
RYJOX•2mo ago
Interesting, does this approach change with out-of-order cores? In fact maybe I misunderstand lol
NullCascade•2mo ago
What is the cheapest CUDA-enabled VM providers one can use to learn CUDA?
eamag•2mo ago
Lightning.ai
dahart•2mo ago
This article has good info, but is the overloading premise slightly contrived? Maybe I don’t talk to enough CUDA beginners. I work with CUDA a lot but I’m not exactly a CUDA expert, and from my perspective, in practice there are default assumptions one can safely make for the base terms, and people do qualify the alternatives almost always. For example, if someone says “CUDA version”, they always mean the toolkit, and never mean compute capability, runtime, or language. The term “driver” when used without qualification always means the display driver, and never means the driver API, there really is no overload there.
einpoklum•2mo ago
I actually find it is pretty easy to get confused between the different kinds of versions. For example:

"The CUDA "driver version" looks like the CUDA runtime version - so what's the difference?" https://stackoverflow.com/q/40589814/1593077

or consider the version you get when you run nvidia-smi, versus the version you get when you run nvcc --version. Those are very different numbers...

The compatibility between different versions of the driver and the toolkit is also a cause for some headaches in my experience.
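For what it's worth, the raw integers behind those numbers add to the confusion: the versions reported by `cudaRuntimeGetVersion` and `cuDriverGetVersion` are encoded as 1000 * major + 10 * minor. A small illustrative decoder:

```python
# Decode the integer version encoding used by cudaRuntimeGetVersion /
# cuDriverGetVersion: version = 1000 * major + 10 * minor.

def decode_cuda_version(v: int) -> tuple:
    """Return (major, minor) from CUDA's packed integer version."""
    return v // 1000, (v % 1000) // 10

assert decode_cuda_version(12040) == (12, 4)   # CUDA 12.4
assert decode_cuda_version(11080) == (11, 8)   # CUDA 11.8
```

Comparing the decoded driver and runtime versions is often the quickest way to see which side of a mismatch you're on.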

dahart•2mo ago
Oh yeah, knowing the difference between the runtime API and driver API is definitely an issue, and there is common confusion around that. But that’s not an overloaded word problem, right? I wasn’t trying to say there’s no confusion, and I do think understanding the terms in the article is super helpful. To your point, I think there’s justification for needing a codex like this article has without framing it as an overloaded terminology problem.
the__alchemist•2mo ago
I suspect you're colored by your experience, despite your modesty about it. To you or I, "CUDA version" probably means something like 'v13' or w/e of the "CUDA toolkit", which you know means the user running your code needs an "nvidia driver" = "580" or higher.

I wouldn't have been able to tell you this a few months ago, and it was confusing! Machine that compiles vs machine that runs, CUDA toolkit which includes both vs nvidia driver which just includes one part of it etc... The article explicitly describes this.

einpoklum•2mo ago
> CUDA Runtime: The runtime library (libcudart) that applications link against.

That library is actually a rather poor idea. If you're writing a CUDA application, I strongly recommend avoiding the "runtime API". It provides partial access to the actual CUDA driver and its API, which is 'simpler' in the sense that you don't explicitly create "contexts", but:

* It hides or limits a lot of the functionality.

* Its actual behavior vis-a-vis contexts is not at all simple and is likely to make your life more difficult down the road.

* It's not some clean interface that's much more convenient to use.

So, either go with the driver, or consider my CUDA API wrappers library [1], which _does_ offer a clean, unified, modern (well, C++11'ish) RAII/CADRe interface. And it covers much more than the runtime API, to boot: JIT compilation of CUDA (nvrtc) and PTX (nvptx_compiler), profiling (nvtx), etc.

> Driver API ... provides direct access to GPU functionality.

Well, I wouldn't go that far, it's not that direct. Let's call it: "Less indirect"...

[1] : https://github.com/eyalroz/cuda-api-wrappers/

nickysielicki•2mo ago
If you do this, you forgo both backward and forward compatibility. You must follow the driver release cadence exactly, and rebuild all of your code for every driver you want to support when a new release happens, or you risk subtle breakage. NVIDIA guarantees nothing in terms of breakage for you.

Probably the worst part of this: for the most part, in practice, it will work just fine. Until it doesn’t. You will have lots of fun debugging subtle bugs in a closed-source black box, which reproduces only against certain driver API header versions, which potentially does not match the version of the actual driver API DSO you’ve dlopened, and which only produces problems when mixed with certain Linux kernel versions.

(I have the exact opposite opinion; people reach too eagerly for the driver API when they don't need it. Almost everything that can be done with the driver API can be done with the runtime API. If you absolutely must use the driver API, which I doubt, you should at least resolve the function pointers through cudaGetDriverEntryPointByVersion.)

einpoklum•2mo ago
Disagree, because:

* The Runtime API is also a black box - it's just differently shaped.

* CUDA runtime APIs are also incompatible with CUDA drivers which are significantly older. Although TBH I have not checked that compatibility range recently.

* C++ is a compiled language. So, yes, in some cases, you need to recompile. But - less than you might think. Specifically, the driver API headers use macros to direct your API function names to versioned names. For example:

    #define cuStreamGetCaptureInfo              __CUDA_API_PTSZ(cuStreamGetCaptureInfo_v3)

  and this versioned function will typically be available also when the signature changes to v4 (in this example, it seems two versions backwards are available in CUDA 13.0).
* ... meaning also that you don't have to "follow the driver release cadence exactly". But even if you want to follow it - there's a change every couple of years: a major CUDA version is released, and instead of functionality getting added, the API changes. And as for actual under-the-hood behavior while observing the same API - that can change whether you're using the driver or the runtime API.

* Finally, if you want something more stable, more portable, that doesn't change frequently - OpenCL can also be considered rather than CUDA.

Nydhal•2mo ago
This is a classic lesson. You can write almost the same article for Java: language vs bytecode vs JVM vs JDK vs libs ...
scotty79•2mo ago
> This article provides a rigorous ontology of CUDA components: a systematic description of what exists in the CUDA ecosystem, how components relate to each other, their versioning semantics, compatibility rules, and failure modes.

That's the first instance in my life when somebody coherently described what the word 'ontology' means. I'm sure this explanation is wrong, but still...