Maybe the assumption is that container-oriented users can build their own if given native packages?
I suppose a Dockerfile could be included but that also seems unconventional.
[1]: https://github.com/lemonade-sdk/lemonade/releases/tag/v10.0....
This is answered by their Project Roadmap over on GitHub[0]:
Recently Completed: macOS (beta)
Under Development: MLX support
[0] https://github.com/lemonade-sdk/lemonade?tab=readme-ov-file#...
The interesting part to me isn’t just local inference, but how much orchestration it’s trying to handle (text, image, audio, etc). That’s usually where things get messy when running models locally.
Curious how much of this is actually abstraction vs just bundling multiple tools together. Also wondering if the AMD/NPU optimizations end up making it less portable compared to something like Ollama in practice.
"FastFlowLM (FLM) support in Lemonade is in Early Access. FLM is free for non-commercial use, however note that commercial licensing terms apply. "
Nowadays you get TTS, STT, and text and image generation; image editing should also be possible. It can run via ROCm or Vulkan, on CPU, GPU, or NPU. Quite a lot of options. Their development pace is good and pragmatic. Really recommend this for AMD hardware!
Edit: OpenAI-compatible (and, I think, nowadays Ollama-compatible) endpoints let me use it in VS Code Copilot as well as e.g. Open WebUI. More options are shown in their docs.
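Since the endpoints are OpenAI-compatible, any generic OpenAI-style client can talk to it. A minimal sketch, assuming a local server at `http://localhost:8000` and a placeholder model name (both are assumptions here, not documented defaults):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Hypothetical local endpoint and model name; sending this with
# urllib.request.urlopen(req) would return the usual chat-completions JSON.
req = chat_request("http://localhost:8000", "some-local-model", "Hello")
```

Tools like VS Code Copilot or Open WebUI do essentially the same thing: you point them at the local base URL and they speak this request shape.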
https://github.com/lemonade-sdk/llamacpp-rocm
But I'm not doing anything with images or audio. I get about 50 tokens a second with GPT OSS 120B. As others have pointed out, the NPU is used for low-powered, small models that are "always on", so it's not a huge win for the standard chatbot use case.