frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai
135•AbuAssar•3h ago

Comments

nijave•1h ago
Anyone compare to ollama? I had good success with latest ollama with ROCm 7.4 on 9070 XT a few days ago
iugtmkbdfil834•1h ago
Seconded. Currently on ollama for local inference, but I am curious how it compares.
9dc•1h ago
so... what does it do? i dont get it Lol
iugtmkbdfil834•1h ago
Initial read suggests it is a mini-swiss army knife, because it seems to be able to do a lot ( based on website claims anyway ). The app integration seems to suggest they want to be more of a control dashboard.
syntaxing•1h ago
Wow this is super interesting. This creates a local “Gemini” front end and all. This is more or less a generative AI aggregator where it installs multiple services for different gen modes. I’m excited to try this out on my strix halo. The biggest issue I had is image and audio gen so this seems like a great option.
jmillikin•1h ago
Surprising that the Linux setup instructions for the server component don't include Docker/Podman as an option, its Snap/PPA for Ubuntu and RPM for Fedora.

Maybe the assumption is that container-oriented users can build their own if given native packages?

freedomben•1h ago
They do have some container options, though I definitely think they should be added to the release page: https://lemonade-server.ai/install_options.html#docker
zenoprax•50m ago
Why should this be on the "Releases"? Shouldn't that just be for build artifacts? Pre-built containers belong on a registry, no?

I suppose a Dockerfile could be included but that also seems unconventional.

freedomben•47m ago
I just meant on the instructions part of the releases page (since they already have some installation instructions), not the artifacts themselves.
freedomben•1h ago
Neat, they have rpm, deb, and a companion AppImage desktop app[1]! Surprised I wasn't aware of this project before. Definitely going to give it a try.

[1]: https://github.com/lemonade-sdk/lemonade/releases/tag/v10.0....

JSR_FDED•1h ago
I’ve read the website and the news announcement, and I still don’t understand what it is. An alternative to LM Studio? Does it support MLX or metal on Macs? I’m assuming it will optimize things for AMD, but are you at a disadvantage using other GPUs?
zelphirkalt•1h ago
I think LM Studio itself uses other software to actually make use of LLMs. If that other software does not support your NPUs, then you are not going to get much performance out of those. This Lemonade thing I am guessing is one such other software, that LM Studio could be using.
molticrystal•59m ago
>Does it support MLX or metal on Macs?

This is answered from their Project Roadmap over on Github[0]:

Recently Completed: macOS (beta)

Under Development: MLX support

[0] https://github.com/lemonade-sdk/lemonade?tab=readme-ov-file#...

zozbot234•1h ago
Note that the NPU models/kernels this uses are proprietary and not available as open source. It would be nice to develop more open support for this hardware.
swiftcoder•1h ago
Are they? The docs say "You can also register any Hugging Face model into your Lemonade Server with the advanced pull command options"
zozbot234•1h ago
That won't give you NPU support, which relies on https://github.com/FastFlowLM/FastFlowLM . And that says "NPU-accelerated kernels are proprietary binaries", not open source.
moconnor•1h ago
Is... is this named because they have a lemon they're trying to make the most of?
TeMPOraL•35m ago
If life keeps giving it them, they should instead invent a combustible lemon.
eddieroger•34m ago
Do they know who you are? They're the guys who are going to blow your house up ... with the lemons.
sensitiveCal•1h ago
Feels like this is sitting somewhere between Ollama and something like LM Studio, but with a stronger focus on being a unified “runtime” rather than just model serving.

The interesting part to me isn’t just local inference, but how much orchestration it’s trying to handle (text, image, audio, etc). That’s usually where things get messy when running models locally.

Curious how much of this is actually abstraction vs just bundling multiple tools together. Also wondering if the AMD/NPU optimizations end up making it less portable compared to something like Ollama in practice.

ilaksh•1h ago
Cool but is there a reason they can't just make PRs for vLLM and llama.cpp? Or have their own forks if they take too long to merge?
cpburns2009•1h ago
Just in case anyone isn't aware. NPUs are low power, slow, and meant for small models.
kouunji•1h ago
I’m looking forward to trying this currently Strix halo’s npu isn’t accessible if you’re running Linux, and previously I don’t think lemonade was either. If this opens up the npu that would be great! Resolute raccoon is adding npu support as well.
dennemark•55m ago
Maybe you have seen NPU support via FLM already: https://lemonade-server.ai/flm_npu_linux.html

"FastFlowLM (FLM) support in Lemonade is in Early Access. FLM is free for non-commercial use, however note that commercial licensing terms apply. "

boomskats•54m ago
I thought the NPU has been available since something like 6.12?
dennemark•57m ago
I have been using lemonade for nearly a year already. On Strix Halo I am using nothing else - although kyuz0's toolboxes are also nice (https://kyuz0.github.io/amd-strix-halo-toolboxes/)

Nowadays you get TTS, STT, text & image generation and image editing should also be possible. Besides being able to run via rocm, vulkan or on CPU, GPU and NPU. Quite a lot of options. They have a quite good and pragmatic pace in development. Really recommend this for AMD hardware!

Edit: OpenAI and i think nowaday ollama compatible endpoints allow me to use it in VSCode Copilot as well as i.e. Open Web UI. More options are shown in their docs.

rpdillon•54m ago
Been running lemonade for some time on my Strix Halo box. It dispatches out to other backends that they include, like diffusion and llama. I actually don't like their combined server, and what I use instead is their llama CPP build for ROCm.

https://github.com/lemonade-sdk/llamacpp-rocm

But I'm not doing anything with images or audio. I get about 50 tokens a second with GPT OSS 120B. As others have pointed out, the NPU is used for low-powered, small models that are "always on", so it's not a huge win for the standard chatbot use case.

zozbot234•43m ago
Even small NPUs can offload some compute from prefill which can be quite expensive with longer contexts. It's less clear whether they can help directly during decode; that depends on whether they can access memory with good throughput and do dequant+compute internally, like GPUs can. Apple Neural Engine only does INT8 or FP16 MADD ops, so that mostly doesn't help.

LinkedIn Is Illegally Searching Your Computer

https://browsergate.eu/
277•digitalWestie•58m ago•117 comments

Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai
139•AbuAssar•3h ago•28 comments

Inside Nepal's Fake Rescue Racket

https://kathmandupost.com/money/2026/03/27/inside-nepal-s-fake-rescue-racket
89•lode•2h ago•21 comments

IBM Announces Strategic Collaboration with Arm

https://newsroom.ibm.com/2026-04-02-ibm-announces-strategic-collaboration-with-arm-to-shape-the-f...
168•bonzini•5h ago•98 comments

Sweden goes back to basics, swapping screens for books in the classroom

https://undark.org/2026/04/01/sweden-schools-books/
289•novaRom•3h ago•164 comments

Significant Raise of Reports

https://lwn.net/Articles/1065620/
98•stratos123•4h ago•49 comments

Bringing Clojure programming to Enterprise (2021)

https://blogit.michelin.io/clojure-programming/
110•smartmic•5h ago•51 comments

Gone (Almost) Phishin'

https://ma.tt/2026/03/gone-almost-phishin/
89•luu•2d ago•42 comments

Enabling Codex to Analyze Two Decades of Hacker News Data

https://modolap.com/publication/hn-analysis-1
24•ronfriedhaber•3h ago•8 comments

Email obfuscation: What works in 2026?

https://spencermortensen.com/articles/email-obfuscation/
222•jaden•10h ago•68 comments

Mercor says it was hit by cyberattack tied to compromise LiteLLM

https://techcrunch.com/2026/03/31/mercor-says-it-was-hit-by-cyberattack-tied-to-compromise-of-ope...
89•jackson-mcd•1d ago•27 comments

Quantum computing bombshells that are not April Fools

https://scottaaronson.blog/?p=9665
218•Strilanc•13h ago•71 comments

Steam on Linux Use Skyrocketed Above 5% in March

https://www.phoronix.com/news/Steam-On-Linux-Tops-5p
545•hkmaxpro•10h ago•257 comments

EmDash – A spiritual successor to WordPress that solves plugin security

https://blog.cloudflare.com/emdash-wordpress/
620•elithrar•21h ago•462 comments

Emacs-libgterm: Terminal emulator for Emacs using libghostty-vt

https://github.com/rwc9u/emacs-libgterm
27•signa11•3d ago•4 comments

Reinventing the Pull Request

https://lubeno.dev/blog/reinventing-the-pull-request
37•bkolobara•6d ago•28 comments

Telli (YC F24) is hiring engineers, designers, and more (on-site, Berlin)

http://hi.telli.com/join-us
1•sebselassie•6h ago

Artemis II Launch Day Updates

https://www.nasa.gov/blogs/missions/2026/04/01/live-artemis-ii-launch-day-updates/
993•apitman•20h ago•858 comments

Subscription bombing and how to mitigate it

https://bytemash.net/posts/subscription-bombing-your-signup-form-is-a-weapon/
222•homelessdino•9h ago•143 comments

New laws to make it easier to cancel subscriptions and get refunds

https://www.bbc.co.uk/news/articles/cvg0v36ek2go
104•chrisjj•5h ago•42 comments

A new C++ back end for ocamlc

https://github.com/ocaml/ocaml/pull/14701
213•glittershark•14h ago•18 comments

Order and Tension

https://slab.org/2026/03/22/order-and-tension/
11•surprisetalk•3d ago•0 comments

DRAM pricing is killing the hobbyist SBC market

https://www.jeffgeerling.com/blog/2026/dram-pricing-is-killing-the-hobbyist-sbc-market/
544•ingve•16h ago•467 comments

ReactOS Shows Improved Stability and 64-Bit Support at Chemnitz Linux Days 2026

https://old.reddit.com/r/reactos/comments/1sa26yu/back_from_chemnitz_linux_days_2026/
32•jeditobe•2h ago•8 comments

Fast and Gorgeous Erosion Filter

https://blog.runevision.com/2026/03/fast-and-gorgeous-erosion-filter.html
200•runevision•2d ago•20 comments

Built a cheap DIY fan controller because my motherboard never had working PWM

https://www.himthe.dev/blog/msi-forgot-my-fans
48•bobsterlobster•2d ago•15 comments

Show HN: I built a DNS resolver from scratch in Rust – no DNS libraries

https://github.com/razvandimescu/numa
23•rdme•3h ago•13 comments

Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs

https://github.com/hauntsaninja/git_bayesect
302•hauntsaninja•4d ago•42 comments

AI for American-produced cement and concrete

https://engineering.fb.com/2026/03/30/data-center-engineering/ai-for-american-produced-cement-and...
204•latchkey•20h ago•115 comments

What Gödel Discovered (2020)

https://stopa.io/post/269
82•qnleigh•2d ago•14 comments