GPUs are expensive, and setting up infrastructure around them is painful. Most of the time, your GPUs sit idle while you’re coding, yet you still pay for uptime, even when scripts fail on the first try.
I ran into this firsthand as a “GPU poor” researcher. Tasks like downloading datasets or transforming data don’t need a GPU, yet traditional setups force you to use one. Cloud setups don’t help: VMs with GPUs require manual environment setup, CUDA installations, or Docker containers just to get started. Multi-GPU training adds more headaches, since not every image supports NCCL and communication between nodes can fail.
At my research lab [1], we run experiments across model training, synthetic data generation, and RL. We needed a setup that was flexible, reliable, easy to use, and easy to collaborate on.
I went looking for a solution that would let me write code locally and run it on GPUs instantly, without worrying about infrastructure, multi-node setups, or idle GPU time, and stumbled upon Modal [2]. After a year of using it, it’s been a game-changer: it increases our research throughput and productivity, saves a ton on GPU costs and infrastructure management, and lets us ship really fast.
I’ve compiled everything we’ve learned into this blog + hands-on tutorial [3], with three examples showing different ways to use Modal: rapidly develop on GPUs, deploy at scale, and do it all without breaking a sweat over infrastructure.
Here’s what we cover in the blog:

- Wrapping existing code to run on Modal’s serverless infrastructure (see the sketch after this list).
- Handling datasets on Modal with volumes for seamless access.
- Writing training scripts with Unsloth and Axolotl for easy fine-tuning.
- Serving models in a scalable, high-throughput way with vLLM.
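To give a feel for the first item, here is a minimal sketch of wrapping a function with Modal’s Python client. The app name, image contents, GPU type, volume name, and config path are illustrative assumptions, not the exact setup used in the tutorial.

```python
import modal

# Illustrative names; the tutorial may configure these differently.
app = modal.App("serverless-finetune-demo")

# Container image with the dependencies the remote function needs.
image = modal.Image.debian_slim().pip_install("torch", "datasets")

# Persistent volume for datasets and checkpoints (created if missing).
data_volume = modal.Volume.from_name("finetune-data", create_if_missing=True)

@app.function(gpu="A10G", image=image, volumes={"/data": data_volume}, timeout=3600)
def train(config_path: str):
    # This body runs in Modal's cloud on the requested GPU;
    # everything outside @app.function stays local.
    import torch
    print("CUDA available:", torch.cuda.is_available())
    # ... load data from /data, fine-tune, write checkpoints back to /data ...

@app.local_entrypoint()
def main():
    # `modal run this_file.py` runs this locally and dispatches train() remotely.
    train.remote("configs/sft.yaml")  # hypothetical config path
```

The appeal is that you only pay while `train()` is actually executing; once it returns, the container spins down and billing stops.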
By the end, you’ll know how to write and experiment locally, then run on GPUs instantly: no idle bills, no complex environment setup, no multi-node headaches.
[1] https://cognitivelab.in
[2] https://modal.com
[3] https://aiengineering.academy/LLM/ServerLessFinetuning/