Wafer exists to make performance engineers more efficient. Most of the work perf engineers do is extracting signal and turning it into the next experiment: you spend hours per kernel on interpretation and bookkeeping, figuring out which counters matter, what changed, what hypothesis you're testing, and what to try next.
Wafer is building an environment where profiling, compiler analysis, and docs are first-class context in your workflow, so iteration is cheap. Long-term, that same structured context becomes the interface for an automation layer that can read the evidence, propose a change, and rerun the loop.
NVIDIA has poured an insane amount of truth into their tooling. NCU, compiler output, SASS, the counters, the sections, the warnings, the “this is why you’re slow” breadcrumbs. Serious perf engineers already live in this stuff. The real problem is that it’s still not packaged as a tight loop. You run a profile, you get a giant report, then you spend a bunch of time translating it into a plan, mapping it back to the right lines of code, deciding what to ignore, deciding what to try next, and keeping track of what you’ve already tested. That translation step is where a ton of time goes, and it’s also the part that doesn’t scale.
We're just starting out. Today, Wafer makes that translation step cheaper by keeping the evidence and the code in one place. You can run Nsight Compute profiling from your editor and view results where you're editing, so you're not flipping between terminals, report viewers, and screenshots. You can compile CUDA and inspect PTX and SASS mapped back to your source, so "what did the compiler actually do" is something you can answer in seconds and iterate on quickly. And you can query GPU documentation from inside the editor with the exact context you're working in.
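For reference, here's roughly the manual loop those features wrap, as a Python sketch over stock CUDA toolkit commands (kernel.cu and ./my_app are placeholder names; the nvcc, cuobjdump, and ncu flags are standard toolkit options):

    import subprocess

    def run(cmd):
        print("$", " ".join(cmd))
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # What did the compiler actually do? Emit PTX, then SASS mapped to source.
    run(["nvcc", "-ptx", "kernel.cu", "-o", "kernel.ptx"])
    run(["nvcc", "-lineinfo", "-cubin", "kernel.cu", "-o", "kernel.cubin"])
    sass = run(["cuobjdump", "-sass", "kernel.cubin"])

    # Profile with Nsight Compute, then pull the report back as text.
    run(["ncu", "--set", "full", "-o", "profile", "./my_app"])
    report = run(["ncu", "--import", "profile.ncu-rep", "--page", "details"])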
What we're adding next makes that loop not just faster, but more automatic and more reproducible. We're rolling out GPU Workspaces: you keep a persistent CPU environment for your repo and dependencies, and only spin up GPU execution when you actually run something. A lot of GPU dev time is editing, debugging, and iterating on hypotheses, not burning GPU cycles, but today the workflow forces you to keep a GPU box alive just to preserve state. We want the "run the experiment" part to be on-demand and reliable, without killing your environment.
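To make the split concrete, here's a conceptual sketch in Python. Every name in it (Workspace, GpuLease, acquire_gpu) is hypothetical, invented for illustration, not Wafer's actual API; the point is only where state lives and when a GPU exists.

    class GpuLease:
        """Stand-in for an on-demand GPU instance (hypothetical)."""
        def execute(self, repo, command):
            return f"ran {command!r} in {repo} on a freshly attached GPU"
        def release(self):
            pass  # GPU returns to the pool; nothing persists here

    def acquire_gpu():
        return GpuLease()  # provisioned on demand, billed per run

    class Workspace:
        """Persistent CPU-side state: repo, deps, build cache, run history."""
        def __init__(self, repo):
            self.repo = repo
            self.history = []  # survives across GPU runs

        def run_on_gpu(self, command):
            gpu = acquire_gpu()  # GPU exists only for the duration of the run
            try:
                result = gpu.execute(self.repo, command)
            finally:
                gpu.release()    # editing and debugging never hold a GPU
            self.history.append((command, result))
            return result

    ws = Workspace("~/my-kernels")
    ws.run_on_gpu("ncu --set full ./bench")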
The bigger direction is the same theme: take the evidence perf engineers already use and make it machine-legible, so an automation layer can actually act on it. We're working on tool-driven loops: read the profile, identify the highest-leverage bottleneck, propose a concrete code change, run the diff, re-profile, and keep a history of what worked and what didn't.
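Concretely, that loop has roughly this shape. This is an illustrative sketch, not Wafer's API: the helpers are hypothetical stand-ins (a real profile() would shell out to ncu and parse the report; a real propose_patch() would be the agent), and the timings are faked so the sketch runs end to end.

    from dataclasses import dataclass
    import random

    @dataclass
    class Result:
        time_ms: float
        counters: dict

    # Hypothetical stand-ins so the loop is runnable.
    def profile(kernel):
        return Result(time_ms=random.uniform(1.0, 2.0), counters={})

    def top_bottleneck(result):
        return "uncoalesced global loads"   # would come from the ncu sections

    def propose_patch(kernel, bottleneck, history):
        return f"fix: {bottleneck}"

    def apply_patch(kernel, patch):
        return kernel + [patch]

    def optimize(kernel, budget=5):
        history, best = [], profile(kernel)
        for _ in range(budget):
            bottleneck = top_bottleneck(best)           # highest-leverage issue
            patch = propose_patch(kernel, bottleneck, history)
            candidate = apply_patch(kernel, patch)      # run the diff
            result = profile(candidate)                 # re-profile
            history.append((patch, result.time_ms, best.time_ms))
            if result.time_ms < best.time_ms:           # keep only measured wins
                kernel, best = candidate, result
        return kernel, history                          # what worked, what didn't

    kernel, log = optimize(kernel=["baseline.cu"])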
If you’ve ever wished you could hand an agent your kernel plus the profiler and compiler evidence and have it do real work instead of vibes, that’s what we’re building towards.
You can see more about us here: https://wafer.ai
Or download directly from here: VS Code: https://marketplace.visualstudio.com/items?itemName=Wafer.wa... Cursor: https://open-vsx.org/extension/wafer/wafer
Would love feedback from anyone doing CUDA, CUTLASS/CuTe, Triton, training or inference perf. If you try it and something feels slow, confusing, or missing, email me at emilio@wafer.ai