I’m excited to share RUNRL JOB, our new one-click service for running Reinforcement Learning Fine-Tuning (RFT) workloads—think GRPO, PPO, or any custom reward-based tuning—directly on HPC-AI.com.
What It Is
Pre-wired RFT pipeline: dual-network configs, memory optimizations, logging, and reward modules are all set up for you (a reward-function sketch follows this list).
Model support: demos with Qwen-3B and Qwen-1.5 out of the box; drop in your own model if you like.
Cost & performance transparency: real-hardware benchmarks on 8× H100/H200, with live metrics in TensorBoard and built-in cost tracking.
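To give a sense of what plugging in your own reward looks like, here is a minimal illustrative sketch of a reward function as a plain callable that scores completions. This is not RUNRL JOB's actual interface; the function name, signature, and scoring logic are all made up for illustration:

```python
from typing import List

def length_penalty_reward(prompts: List[str], completions: List[str]) -> List[float]:
    """Toy reward: prefer non-empty, concise completions.
    Purely illustrative -- swap in your own scoring logic or a learned reward model.
    """
    rewards = []
    for completion in completions:
        tokens = completion.split()
        base = 1.0 if tokens else 0.0              # reward any non-empty answer
        penalty = max(0, len(tokens) - 128) * 0.01  # penalize rambling past ~128 words
        rewards.append(base - penalty)
    return rewards
```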
Why It Matters
Memory-efficient GRPO: up to 40% memory savings vs PPO, since there is no separate value network and no second backward pass for a critic (see the sketch after this list).
Zero setup: no Dockerfiles, no dependency hell—just click “Start” and your training job spins up.
Accessible RLHF: lowers the barrier for researchers, students, and indie hackers to experiment at scale.
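For context on where those savings come from: GRPO replaces PPO's learned value baseline with a group-relative one computed directly from sampled rewards. Below is a minimal sketch of that advantage computation (illustrative only, not RUNRL JOB's actual code; names and shapes are assumptions):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each completion's reward is normalized
    against the mean/std of its sampling group, so no value network
    (and no extra critic backward pass) is needed.

    rewards: shape (num_prompts, group_size), one scalar reward per
    sampled completion for each prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each
rewards = torch.tensor([[1.0, 0.5, 0.0, 2.0],
                        [0.2, 0.8, 0.8, 0.2]])
print(grpo_advantages(rewards))
```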
How to Try
Visit the blog post: https://hpc-ai.com/blog/RUNRL_JOB_is_live_on_hpc-ai
Click “Launch GPU Instances”, choose H100 or H200.
Select the RUNRL JOB template and hit “Start Job”.
Monitor progress live in JupyterLab or via TensorBoard, with zero extra setup (a sketch for logging your own metrics follows below).
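If you extend the training loop and want custom metrics to appear alongside the built-in ones, the standard PyTorch SummaryWriter works; a minimal sketch follows (the log directory and metric name are assumptions, not the template's actual paths):

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical log directory; point it at whatever the RUNRL JOB template actually uses.
writer = SummaryWriter(log_dir="./runs/grpo_demo")

for step in range(100):
    fake_reward = 0.01 * step  # stand-in for your rollout's mean reward
    writer.add_scalar("reward/mean", fake_reward, step)

writer.close()
```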