https://buzzbear.ai/ (free tier available)
---
How I avoided cloud GPU costs:
I needed ML inference for text similarity. Cloud GPU pricing runs $700-1000/mo for instances that would sit idle most of the time. "Scale to zero" sounds great until you want to offer a generous free tier: free users either eat a cold start on every request, or you pay to keep an instance warm anyway.
My solution: the RTX 3080 I already own.
The stack (Elixir/Phoenix + Bumblebee):
- Phoenix app on Fly.io
- Bumblebee ML node on my home server (RTX 3080)
- Tailscale + libcluster to connect them (config sketch below)
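For anyone curious what the wiring looks like, here's a minimal sketch, not my literal config: the node names and 100.x tailnet IPs are placeholders. One gotcha worth knowing: Tailscale doesn't forward multicast, so libcluster's Gossip strategy won't work across the tunnel; a static host list via the Epmd strategy (or DNSPoll against Tailscale's MagicDNS) does the job:

```elixir
# config/runtime.exs -- illustrative sketch; node names and tailnet IPs
# are hypothetical placeholders, not my real setup.
config :libcluster,
  topologies: [
    tailnet: [
      # Static host list over the tailnet. Swap in Cluster.Strategy.DNSPoll
      # + MagicDNS if you want newly launched nodes discovered automatically.
      strategy: Cluster.Strategy.Epmd,
      config: [
        hosts: [
          :"phoenix@100.100.0.1", # Fly.io web node
          :"ml@100.100.0.2"       # home server with the RTX 3080
        ]
      ]
    ]
  ]

# Each node also starts {Cluster.Supervisor, [topologies, [name: ...]]}
# in its supervision tree to act on this config.
```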
From the code's perspective, they're one Erlang cluster, so I can call the ML service like a local function. Tailscale tunnels the traffic; libcluster handles discovery.
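To make "like a local function" concrete, here's a hedged sketch of both sides; the embedding model, module names, and options are assumptions for illustration, not my actual code. The GPU node starts a Bumblebee serving under its supervision tree, and since Nx.Serving registers named servings in a cluster-wide process group, the Phoenix node can call it directly:

```elixir
# ML node (home server) -- illustrative; model and names are assumptions.
defmodule MLNode.Application do
  use Application

  @impl true
  def start(_type, _args) do
    repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}
    {:ok, model_info} = Bumblebee.load_model(repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

    # Compile once for the GPU via EXLA; incoming requests get batched.
    serving =
      Bumblebee.Text.text_embedding(model_info, tokenizer,
        compile: [batch_size: 32, sequence_length: 512],
        defn_options: [compiler: EXLA]
      )

    children = [
      {Nx.Serving, serving: serving, name: MLNode.Embedding, batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MLNode.Supervisor)
  end
end

# Phoenix node (Fly.io) -- batched_run/2 routes the call to a node that
# actually runs the serving, preferring local instances when they exist.
%{embedding: vector} = Nx.Serving.batched_run(MLNode.Embedding, "some comment text")
```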
Capacity: ~1M comparisons/hour. I currently need a fraction of that.
Scaling: If I need more, I spin up cloud GPU nodes, connect them to my Tailscale network, and libcluster auto-discovers them. Zero code changes.
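The "zero code changes" part falls out of the process-group registration: every node that starts the serving joins the same group, so a fresh GPU node begins taking work the moment libcluster connects it. A hypothetical check from the Phoenix node (node names and output made up):

```elixir
iex> Node.list()
[:"ml@100.100.0.2", :"ml@100.100.0.3"]  # home RTX 3080 + new cloud GPU node
iex> Nx.Serving.batched_run(MLNode.Embedding, "hello")
%{embedding: #Nx.Tensor<f32[384] ...>}  # routed to whichever serving is free
```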
Happy to answer questions about any of this!