But no local AMD hardware means constant SSH juggling: upload code, compile remotely, run rocprof, download results, repeat. We were spending more time managing infrastructure than optimizing kernels.
Context: DigitalOcean's AMD GPU droplets ship with ROCm preinstalled. At $1.99/hour for MI300X access, it's cheaper than buying hardware. We're currently competing in the GPU mode kernel optimization competition. We spend a good chunk of our time setting up the infra to profile these kernels.
Solution: Built Chisel to make AMD GPU development feel local. One command spins up a droplet, syncs your code, runs profiling with rocprof, and pulls results back. It handles the SSH, rsync, and teardown automatically.
Key features:
- chisel up creates an MI300X droplet in seconds
- chisel sync pushes only changed files
- chisel profile kernel.cpp compiles, profiles, and downloads traces
- chisel pull grabs artifacts back to your local machine
- Auto cleanup prevents zombie droplets
The profiling integration was the killer feature for us. Available on PyPI: pip install chisel-cli
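For context, a rough end-to-end session looks something like this, based on the commands listed above (a sketch, not exact invocations; flags and output may differ, see the README):

    pip install chisel-cli       # install the CLI from PyPI
    chisel up                    # provision an MI300X droplet on DigitalOcean
    chisel sync                  # push only the files that changed
    chisel profile kernel.cpp    # compile, run rocprof, download the traces
    chisel pull                  # grab any remaining artifacts locally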
Source: https://github.com/anthropics/chisel
Would love feedback, especially from people doing GPU work and exploring AMD alternatives to NVIDIA.
If you're interested in contributing, we'd welcome any help. Our general short-term direction is to add Grafana, concurrent runs, better error handling, and support for other cloud providers.