The ecosystem has matured: DGX Spark, high-end Mac Studios, AMD Strix Halo, upcoming DGX Station. Models are getting smaller and more efficient. Inference engines (llama.cpp, vLLM, SGLang) and frontends (Ollama, LMStudio, Jan) have made local deployment accessible. Yet I keep meeting more people researching this than actually deploying it.
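To illustrate how low the barrier has gotten, here's a minimal local chat sketch using Ollama's Python client (assumes the Ollama daemon is running and a model has already been pulled; the model name is illustrative):

```python
import ollama

# Assumes the Ollama daemon is running locally and the model has been
# pulled beforehand (e.g. `ollama pull llama3`); names are illustrative.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why run inference locally?"}],
)
print(response["message"]["content"])
```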
For those running local inference:

- What's your setup and use case?
- Is it personal or shared across a team?
- What's the real driver — privacy, regulation, latency, cost, tinkering?
I'm skeptical of the cost argument (cloud inference scales better, and APIs are subsidized, at least for now!), but curious if I'm missing something.
What would make local AI actually worth it for you?
01092026•9h ago
What's your stack?
And none of that hardware can run the larger models; smaller ones, sure, or heavily quantized versions of larger ones. Or do you have something important to say?
Blue_Cosma•9h ago
Our stack changes per project, adapting to client needs and infra: Llama 70B on a Mac Studio M1 with Ollama in 2024, vLLM on a 4xH100 private cloud for larger deployments. Most recently, we've been working on a custom workstation with 2x RTX PRO 6000 Blackwell Max-Q + 1.1TB DDR5 to run larger models locally using SGLang and KTransformers.
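For the 4xH100 tier, here's a minimal sketch of the vLLM side, assuming a Llama-70B-class model sharded across the four GPUs via tensor parallelism (the model ID and sampling values are illustrative, not our production config):

```python
from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs with tensor parallelism.
# Model ID and settings are illustrative, not a production config.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the clause below: ..."], params)
print(outputs[0].outputs[0].text)
```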
The question isn't rhetorical: I'm trying to understand whether the demand we see in regulated sectors is the whole market, or whether there's broader adoption I'm missing.
01092026•4h ago
I run the largest models I can: currently DeepSeek, with a few more coming soon. The fact that I can run a premier, high-end model locally is the main draw; a 70B model is pointless unless it's a special-purpose, task-specific model (text-to-speech, etc.).
I'm more interested in ditching Nvidia for AMD CPUs+GPUs, and not even via ROCm: just run the weights in OpenGL/Vulkan compute shaders. Faster, more control, better performance for MY architecture, etc. This is the goal.
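For what it's worth, llama.cpp already ships a Vulkan backend, so part of this works today without ROCm. A minimal sketch via the llama-cpp-python bindings, assuming the library was built with Vulkan enabled (the model path is hypothetical):

```python
from llama_cpp import Llama

# Assumes llama-cpp-python was built against a Vulkan-enabled llama.cpp
# (e.g. compiled with -DGGML_VULKAN=ON), so the heavy math runs in
# Vulkan compute shaders instead of CUDA/ROCm kernels.
llm = Llama(
    model_path="./models/deepseek-q4_k_m.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

out = llm("Q: Why run weights in Vulkan shaders? A:", max_tokens=128)
print(out["choices"][0]["text"])
```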
I don't think many people are running models, except maybe inside a company? I guess you're company/industry focused; I'm just a programmer doing this for personal use.
People don't see a need, I guess? It's complicated. Well, actually it's NOT complicated if you have lots of money to buy all the right stuff brand new.
Regular guys like me have to get creative to get shit running well; it's all we can afford.