OpenHands just announced a collaboration with AMD that lets developers run full coding agents locally on new Ryzen AI hardware — no cloud APIs, no data leaving the machine, and zero per-token cost.
The setup pairs AMD’s open-source Lemonade serving stack with the Ryzen AI Max series (CPU, GPU, and NPU totaling 126 TOPS) to run models like Qwen3-Coder-30B directly on-device. You point OpenHands at a local Lemonade endpoint and get full autonomous agent workflows running offline.
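For anyone who wants to try it: Lemonade serves an OpenAI-compatible API, so OpenHands can treat it like any custom LLM endpoint. A minimal configuration sketch — the port, API path, model name, and env var values below are assumptions on my part, so check the Lemonade and OpenHands docs for the exact values on your setup:

```shell
# Sketch, assuming Lemonade Server is running locally and exposing an
# OpenAI-compatible API on port 8000 (port and path are assumptions).

# Sanity-check that the local endpoint is up and see which models it serves:
curl http://localhost:8000/api/v1/models

# Point OpenHands at the local endpoint instead of a cloud API.
# The "openai/" model prefix tells the client to speak the OpenAI wire
# protocol; local servers typically ignore the API key but require one to be set.
export LLM_BASE_URL="http://localhost:8000/api/v1"
export LLM_MODEL="openai/Qwen3-Coder-30B"
export LLM_API_KEY="local"
```

From there you launch OpenHands as usual (e.g. via its documented docker run command), forwarding those variables into the container — no cloud key involved.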
Why it’s interesting:
Local inference for real coding agents (not just autocomplete)
Privacy/compliance: IP never leaves your workstation
Cost: no usage-based billing
Performance: NPU/GPU optimized, low latency
Open source stack end-to-end
Given how fast local LLM tooling is evolving (Apple, NVIDIA, AMD, etc.), this feels like an inflection point: true autonomous dev-agents running locally, not in the cloud.
Curious to hear from others:
Who else is running agentic workloads entirely locally?
Is this the beginning of serious local-first dev tooling?
How big will “offline AI” get as hardware accelerates?