We Made CUDA Optimization Suck Less

https://www.rightnowai.co/
39•jaberjaber23•1d ago

Comments

jaberjaber23•1d ago
We’re RightNow AI. We built a tool that automatically profiles, detects bottlenecks, and generates optimized CUDA kernels using AI.

If you’ve written CUDA before, you know how it goes. You spend hours tweaking memory access, digging through profiler dumps, swapping out intrinsics, and praying it’ll run faster. Most of the time, you're guessing.
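To make "tweaking memory access" concrete: on NVIDIA GPUs, the cost of a global load depends heavily on how the 32 threads of a warp spread their addresses. The sketch below is a toy Python model. It is not CUDA and not part of RightNow AI. It assumes 32-byte memory sectors and 4-byte floats, ignores caches, and counts how many sectors one warp touches when lane `i` reads element `base + i * stride`. The helper name `sectors_touched` is hypothetical.

```python
# Toy model (assumptions: 32-byte sectors, float32 elements, no caching):
# how many distinct memory sectors does one warp touch for a single load?
WARP_SIZE = 32
SECTOR_BYTES = 32
ELEM_BYTES = 4

def sectors_touched(stride_elems: int, base_elem: int = 0) -> int:
    """Count distinct sectors hit when lane i reads element base + i * stride."""
    sectors = set()
    for lane in range(WARP_SIZE):
        byte_addr = (base_elem + lane * stride_elems) * ELEM_BYTES
        sectors.add(byte_addr // SECTOR_BYTES)
    return len(sectors)

if __name__ == "__main__":
    useful = WARP_SIZE * ELEM_BYTES  # bytes the warp actually asked for
    for stride in (1, 2, 8, 32):
        n = sectors_touched(stride)
        fetched = n * SECTOR_BYTES   # bytes the memory system has to move
        print(f"stride {stride:2d}: {n:2d} sectors, efficiency {useful / fetched:.1%}")
```

Stride 1 is the coalesced case: 4 sectors for 128 useful bytes. Larger strides make the warp pull in far more bytes than it uses, and a misaligned base adds an extra sector even at stride 1. This is the kind of waste profilers tend to flag as poor memory efficiency.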

We got tired of it. So we built something that just works.

What RightNow AI Actually Does

Prompt-based CUDA Kernel Generation
Describe what you want in plain English. Get fast, optimized CUDA code back. No need to know the difference between global and shared memory layouts.
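Here is one example of a shared-memory layout detail that matters for performance. Shared memory is split into banks, and when threads in a warp hit different addresses in the same bank, their accesses are serialized. The toy model below assumes 32 banks of 4-byte words and a warp in which lane `i` reads `tile[i][col]` from a row-major float tile. `max_conflict_degree` is a hypothetical helper, not a CUDA API.

```python
# Toy model of shared-memory bank conflicts (assumptions: 32 banks, 4-byte
# words, lane i reads tile[i][col] from a row-major float tile).
from collections import Counter

NUM_BANKS = 32
WARP_SIZE = 32

def max_conflict_degree(row_pitch_words: int, col: int = 0) -> int:
    """Worst-case number of lanes whose (distinct) addresses share one bank."""
    banks = Counter(
        (lane * row_pitch_words + col) % NUM_BANKS for lane in range(WARP_SIZE)
    )
    return max(banks.values())

if __name__ == "__main__":
    for pitch in (32, 33):
        print(f"row pitch {pitch} words: {max_conflict_degree(pitch)}-way access")
```

A pitch of 32 words (`__shared__ float tile[32][32];`) sends every lane to the same bank, which is a 32-way conflict. Padding each row by one word (`tile[32][33]`) spreads the column across all 32 banks. This padding trick is a common fix in transpose-style kernels.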

Serverless GPU Profiling
Run your code on real GPUs without having local hardware. Get detailed reports about where it's slow and why.

Performance Optimizations That Deliver
Not vague advice like “try more threads.” We return rewritten code. Our users are seeing 2x to 4x improvements out of the box. Some hit 20x.

Why We Built It
We needed it for our own work. Our ML stack was bottlenecked by GPU code we didn’t have time to optimize. Existing tools felt ancient. The workflow was slow, clunky, and filled with trial and error.

We thought: what if I could just say "optimize this kernel for A100" and get something useful?

So we built it.

RightNow AI is live. You can try it for free: https://www.rightnowai.co/

If you use it and hit something rough, tell us. We’ll fix it.

paulirish•5h ago
What does one of the GPU profiling reports look like?

Edit: oh is it this? https://youtu.be/b-yh3FFpSX8?t=28

PontifexCipher•5h ago
No examples of before/after? Maybe I missed something.

godelski•5h ago
I was expecting something like TensorRT or Triton, but found "Vibe Coding"

The project seems very naive. CUDA programming sucks because there are a lot of little gotchas and nuances that dramatically change performance. These optimizations can also change significantly between GPU architectures: you'll get different performance out of Volta, Ampere, or Blackwell. Parallel programming is hard in the first place, and it gets harder on GPUs because of all these little intricacies. People who have been doing CUDA programming for years are still learning new techniques. It takes a very different type of programming skill, like actually understanding that Knuth's "premature optimization is the root of all evil" means "get a profiler", not "don't optimize". All this is what makes writing good kernels take so long, and that's even with Nvidia engineers spending tons of time trying to simplify it.

So I'm not surprised people are getting 2x or 4x out of the box. I'd expect that much if a person grabbed a profiler. I'd honestly expect more if they spent a week or two with the documentation and serious effort. But nothing in the landing page is convincing me the LLM can actually significantly help. Maybe I'm wrong! But it is unclear if the lead dev has significant CUDA experience. And I don't want something that optimizes a kernel for an A100, I want kernelS that are optimized for multiple architectures. That's the hard part and all those little nuances are exactly what LLM coding tends to be really bad at.
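A simplified theoretical-occupancy calculation illustrates the point about per-architecture tuning. The per-SM limits below approximate the published values for compute capability 8.0 (A100) and 8.6 (consumer Ampere parts); verify them against the CUDA C++ Programming Guide before relying on them. The model ignores register and shared-memory allocation granularity and the per-block shared-memory reservation. The names `SmLimits` and `occupancy` are hypothetical.

```python
from dataclasses import dataclass

WARP_SIZE = 32

@dataclass(frozen=True)
class SmLimits:
    """Simplified per-SM resource limits (assumed values, see lead-in)."""
    max_warps: int
    max_blocks: int
    regs: int
    smem_bytes: int

# Approximate limits for compute capability 8.0 (A100) and 8.6 (consumer Ampere).
CC_8_0 = SmLimits(max_warps=64, max_blocks=32, regs=65536, smem_bytes=164 * 1024)
CC_8_6 = SmLimits(max_warps=48, max_blocks=16, regs=65536, smem_bytes=100 * 1024)

def occupancy(limits: SmLimits, threads_per_block: int, regs_per_thread: int,
              smem_per_block: int = 0) -> float:
    """Fraction of the SM's warp slots filled, ignoring allocation granularity."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    by_warps = limits.max_warps // warps_per_block
    by_regs = limits.regs // (regs_per_thread * warps_per_block * WARP_SIZE)
    by_smem = (limits.smem_bytes // smem_per_block) if smem_per_block else limits.max_blocks
    blocks = min(limits.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / limits.max_warps

if __name__ == "__main__":
    for name, lim in (("cc 8.0", CC_8_0), ("cc 8.6", CC_8_6)):
        regs_bound = occupancy(lim, threads_per_block=256, regs_per_thread=64)
        smem_bound = occupancy(lim, 256, 32, smem_per_block=48 * 1024)
        print(f"{name}: {regs_bound:.1%} (64 regs), {smem_bound:.1%} (48 KB smem)")
```

With 256 threads per block and 64 registers per thread, this model gives 50% occupancy under the 8.0 limits but about 67% under the 8.6 limits. A 48 KB shared-memory tile flips the ordering (37.5% vs. about 33%). Occupancy is only one input to performance, but it shows why a launch configuration tuned for one GPU is not automatically right for another.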

germanjoey•3h ago
TBH, the 2x-4x improvement over a naive implementation that they're bragging about sounded kinda pathetic to me! I mean, it depends greatly on the kernel itself and the target arch, but I'm also assuming that the 2x-4x number is their best case scenario. Whereas the best case for hand-optimized could be in the tens or even hundreds of X.
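A roofline estimate is one way to judge whether 2x-4x or 20x is a lot. A kernel cannot exceed min(peak compute, memory bandwidth times arithmetic intensity), so the ratio between that bound and the measured rate caps the remaining speedup. The sketch below uses hypothetical round-number hardware figures (20 TFLOP/s, 1.5 TB/s) and a hypothetical measured rate. None of these numbers come from this thread or from a specific GPU.

```python
# Roofline sketch: attainable throughput is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs per byte moved).

def attainable_flops(peak_flops: float, peak_bw: float, intensity: float) -> float:
    """Upper bound on FLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_flops, peak_bw * intensity)

def headroom(measured_flops: float, peak_flops: float, peak_bw: float,
             intensity: float) -> float:
    """Largest speedup still theoretically available over the measured rate."""
    return attainable_flops(peak_flops, peak_bw, intensity) / measured_flops

if __name__ == "__main__":
    PEAK_FLOPS = 20e12  # hypothetical FP32 peak, FLOP/s
    PEAK_BW = 1.5e12    # hypothetical DRAM bandwidth, bytes/s
    # SAXPY (y = a*x + y) in float32: 2 FLOPs and 12 bytes (read x, read y, write y) per element.
    saxpy_intensity = 2 / 12
    roof = attainable_flops(PEAK_FLOPS, PEAK_BW, saxpy_intensity)
    print(f"SAXPY roof: {roof / 1e9:.0f} GFLOP/s")
    measured = 125e9    # hypothetical measurement of a naive kernel
    print(f"headroom: {headroom(measured, PEAK_FLOPS, PEAK_BW, saxpy_intensity):.1f}x")
```

For a memory-bound kernel like SAXPY, bandwidth sets the bound. A version already running at half the roof can therefore gain at most about 2x. Claims of 20x or more only make sense when the starting point is far below the roof, or when the rewrite raises arithmetic intensity (for example, through tiling and data reuse).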

cjbgkagh•3h ago
The website appears vibe coded, as do the Product Hunt reviews, with "RightNow AI is an impressive..." appearing more often than would be expected by random chance.

Either someone is good at writing CUDA Kernels and a 1-10% perf improvement is impressive, or they're bad at writing CUDA Kernels and a 2x-4x over naïve very often isn't impressive.

What percentage of people who do write custom CUDA kernels are bad at it? How many are so bad at it that they leave 20x on the table as claimed on the website?

What could have helped sell it to me as a concept is an example of a before and after.

EDIT: One of the reviews states "RightNow AI is an innovative tool designed to help developers profile and optimize CUDA code efficiently. Users have praised its ability to identify bottlenecks and enhance GPU performance. For example, one user stated, 'RightNow AI is a game-changer for GPU optimization.'" I think some of the AI prompt has leaked into the output.

techbro92•2h ago
CUDA optimization actually doesn’t suck that much. I think Nsight Studio is amazing and super helpful for profiling and identifying bottlenecks in kernels.
