Hi HN,
I built Runfra, a decentralized compute grid designed to run AI models on "fragmented" consumer GPUs.
This started simply: I had three RTX GPUs at home sitting mostly idle, and it felt a bit ridiculous to pay for H100 cloud instances while they sat at 0% utilization. The tricky part is that home GPUs are messy: slower, less stable, limited VRAM, and machines drop offline without warning.
So instead of trying to minimize latency, I focused on delivering reliable, quality results with the following approach:
- batch-first instead of real-time
- scoring layer to filter out bad outputs and retry for quality
- simple heartbeat-based scheduling so jobs recover if a node dies
- 4-bit quantization to get models like FLUX.1 onto 8GB cards
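To make the heartbeat idea concrete, here is a minimal sketch of how that kind of recovery loop can work. This is an assumed design, not Runfra's actual code; the `Scheduler` class, `HEARTBEAT_TIMEOUT` value, and field names are all hypothetical.

```python
import time

HEARTBEAT_TIMEOUT = 30  # seconds; hypothetical threshold, not Runfra's real value

class Scheduler:
    """Sketch of heartbeat-based job recovery for flaky home-GPU nodes."""

    def __init__(self):
        self.jobs = {}       # job_id -> {"node": node_id or None, "state": str}
        self.last_beat = {}  # node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id):
        # Nodes ping periodically; going quiet marks them dead.
        self.last_beat[node_id] = time.monotonic()

    def assign(self, job_id, node_id):
        self.jobs[job_id] = {"node": node_id, "state": "running"}

    def reap(self):
        # Requeue any running job whose node stopped heartbeating,
        # so it becomes eligible for reassignment to a live node.
        now = time.monotonic()
        dead = {n for n, t in self.last_beat.items()
                if now - t > HEARTBEAT_TIMEOUT}
        for job in self.jobs.values():
            if job["state"] == "running" and job["node"] in dead:
                job["node"] = None
                job["state"] = "queued"
```

The appeal of this scheme is that the scheduler never needs to contact a dead node; silence alone is enough to trigger recovery, which suits batch workloads where a retry costs throughput but not correctness.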
Right now it’s only a PoC focused on image generation, but for the long term, I’m interested in whether something like a "scheduler for home GPUs" could actually work for broader models (LLMs, etc). Curious how people think about this tradeoff. Would you use something slower but cheaper for background jobs, or is low latency still non-negotiable? Would love to hear if this "batch + filtered" approach solves a real pain point for you.
Link: https://runfra.com/