Engineer here – I run a therapeutic AI system for elderly care in assisted living
facilities. Entire stack runs locally on Mac Minis (no cloud, no GPUs) for privacy
and reliability.
Qwen 3.5's DeltaNet architecture broke our llama.cpp inference (per-response
latency went from 1.5s to 21s). Migrating to Apple's MLX framework brought it
back down to 7s while maintaining our 100% clinical pass rate.
Happy to answer questions about MLX vs llama.cpp, DeltaNet optimization, or
running SLMs on Apple Silicon for production healthcare workloads.
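For anyone curious what "100% clinical pass rates" means operationally: before any model or runtime swap, each candidate has to clear every clinical test case and stay inside a latency budget. Here's a minimal sketch of that kind of gate — `generate_fn`, the `(prompt, must_contain)` case format, and the 7s budget are illustrative assumptions, not our production harness:

```python
import time
import statistics

def gate_model(generate_fn, clinical_cases, latency_budget_s=7.0):
    """Gate a model/runtime swap: every clinical case must pass AND
    median per-response latency must stay within budget.
    generate_fn: callable(prompt) -> reply string (e.g. wrapping an
    MLX generate call); clinical_cases: list of (prompt, must_contain)."""
    latencies, failures = [], []
    for prompt, must_contain in clinical_cases:
        start = time.perf_counter()
        reply = generate_fn(prompt)
        latencies.append(time.perf_counter() - start)
        # Hypothetical pass criterion: reply must mention the required escalation.
        if must_contain.lower() not in reply.lower():
            failures.append(prompt)
    median = statistics.median(latencies)
    return {
        "pass_rate": 1.0 - len(failures) / len(clinical_cases),
        "median_latency_s": median,
        "ok": not failures and median <= latency_budget_s,
    }

if __name__ == "__main__":
    # Stub standing in for a real inference call.
    stub = lambda prompt: "Please contact the on-call nurse immediately."
    cases = [("Resident reports sudden chest pain", "nurse")]
    print(gate_model(stub, cases))
```

The point is that "ok" is all-or-nothing on the clinical cases: a single failed escalation case blocks the rollout, regardless of how good the latency numbers look.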