But I was glad to see that Ollama developed a GUI as well - more options are always better, and maybe it will improve in the future.
Edit: I hope I'm wrong about this. Thanks for clarifying.
Building a product that we've dreamed of building is not wrong. Making money does not need to be evil. I, and the folks who worked tirelessly to make Ollama better, will continue to build our dreams.
Also is there a link to the source?
This app has a GUI.
> For pure CLI versions of Ollama, standalone downloads are available on Ollama’s GitHub releases page.
Sounds like closed source. Plus, as far as I can tell, the app seems to be a Tauri app, since it uses the system webview instead of Chromium.
If I'm being honest, I care more about multiple local AI apps on my desktop all hooking into the same Ollama instance, rather than each app downloading its own models so I end up with multiple tens of GBs of repeated weights all over the place because apps don't talk to each other.
What does it take for THAT to finally happen?
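For what it's worth, any app can already share a locally running Ollama instance by talking to its HTTP API instead of bundling its own weights. A minimal sketch, assuming the default endpoint at http://localhost:11434 and a model that has already been pulled (the model name is just an example):

    # Minimal sketch: reuse the shared local Ollama instance instead of
    # shipping duplicate weights. Assumes the default endpoint is reachable.
    import requests

    OLLAMA = "http://localhost:11434"

    # See which models are already on disk, so nothing gets re-downloaded.
    installed = [m["name"] for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]]
    print("Models already available:", installed)

    # Chat against whatever is already there ("llama3.2" is a placeholder).
    resp = requests.post(
        f"{OLLAMA}/api/chat",
        json={
            "model": "llama3.2",
            "messages": [{"role": "user", "content": "Hello from a shared instance"}],
            "stream": False,
        },
    )
    print(resp.json()["message"]["content"])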
It's finally the push I need to move away. I predict Ollama will only get worse from here on.
https://github.com/open-webui/open-webui
Curious how this compares to that, which has a ton of features and runs great
Still, choices are good, so props to the Ollama team!
Nice because it works on any text. Browser, IDE, email etc.
No prior Linux experience, beyond basic macOS Terminal commands. Surprisingly simple setup... and I used an online LLM to hold my hand as we walked through the installation / setup. If I wanted to call the CLI, I'd have to ask an online LLM what that command even is (something something ollama3.2).
>ollama is probably the easiest tool ... to experiment with LLMs locally.
Seems quite simple so far. If I can do it (blue collar electrician with no programming experience) then so can you.
https://www.reddit.com/r/LocalLLaMA/comments/1kg20mu/so_why_...
There are multiple (far better) options - e.g. LM Studio if you want a GUI, llama.cpp if you want the CLI that Ollama ripped off. IMO the only reason Ollama is even in the conversation is that it was easy to get running on macOS, allowing the SV MBP set to feel included.
[0] https://github.com/sherlock-project/sherlock/issues/2011
Trying to match the even larger local front end ecosystem is just a waste of energy.
Other critical takes say the same thing, but wrapped in far more variations of: "definitely not judging/criticising/being negative, but I don't like this."
This is clearly a new direction for Ollama, but I can't find anything at the link explaining or justifying why they're doing it, and that makes me uncomfortable as an existing regular Ollama user.
I think this move does deserve firmer feedback like yours.
I don't think it was generated. (on the basis that this can't be some cutting-edge new model whose output I haven't seen yet)
Supporting multiple backends is HARD. Originally, we thought we'd just add multiple backends to Ollama - MLX, ROCm, TRT-LLM, etc. It sounds really good on paper. In practice, you get into the lowest common denominator effect. What happens when you want to release Model A together with the model creator, and backend B doesn't support it? Do you ship partial support? If you do, then you start breaking your own product experience.
Supporting Vulkan for backwards compatibility on some hardware seems simple, right? What if I told you that in our testing, a portion of the supported hardware matrix sees a 20% decrease in performance? What about just cherry-picking which hardware uses Vulkan vs ROCm vs CUDA, etc.? Do you start managing a long and tedious support matrix, where each time a driver is updated, the support may shift?
Supporting flash attention sounds simple too, right? What if I told you that for over 20% of the hardware, and for specific models, enabling it causes a non-trivial number of errors tied to specific hardware/model combinations? We are almost in a spot where we can selectively enable flash attention per type of model architecture and hardware architecture.
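Purely as an illustration (this is not Ollama's actual code, and all the names below are hypothetical placeholders), the kind of per-architecture gating described above amounts to a small capability matrix that defaults to off for untested combinations:

    # Illustrative sketch only: selectively enable flash attention based on a
    # (model architecture, hardware family) matrix; untested combos stay off.
    FLASH_ATTENTION_OK = {
        ("llama", "cuda_ampere"): True,
        ("llama", "rocm_gfx1100"): True,
        ("gemma3", "cuda_ampere"): True,
        ("gemma3", "metal"): False,  # e.g. a known-bad combination
    }

    def use_flash_attention(model_arch: str, hw_family: str) -> bool:
        # Opt in only where the combination is known to be good.
        return FLASH_ATTENTION_OK.get((model_arch, hw_family), False)

    print(use_flash_attention("llama", "cuda_ampere"))  # True
    print(use_flash_attention("qwen3", "vulkan"))       # False (unknown combo)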
It's so easy to add features and hard to say no, but on any given day I will stand for a better overall product experience (at least to me, since it's very subjective). No is temporary and yes is forever.
Ollama focuses on running the model the way the model creators intended. I know we get a lot of negativity on naming, but oftentimes it's the naming we worked out with the model creators (which, surprisingly, may or may not be how another platform named it on release). Over time, I think this means more focus on top models, to optimize more and add capabilities to augment them.
What seems to be true is that Ollama wants to be a solution that drives the narrative and wants to choose for its users rather than with them. It uses a proprietary model library, it built itself on llama.cpp and didn't upstream its changes, it converted the standard gguf model weights into some unusable file type that only worked with itself, etc.
Sorry, but I don't buy it. These are not intractable problems to deal with. These are excuses by former Docker creators looking to destroy another ecosystem by attempting to co-opt it for their own gain.
So, questions: What are the changes that they didn't upstream, and is this listed somewhere? What is the impact? Are they also changes in ggml? What was the point of the GGUF format change?
You conceptually divide your product into a "universal experience" and a "conditional experience". You add platform-specific things to the conditional experience, while keeping the universal experience unified. I mean, do you even have a choice? The backend limits you; the only alternative you have is to change the backend upstream, which oftentimes is the same as no alternative.
The only case where this is a real problem is when the backends are so different that the universal experience is not the main experience. But I don't think this is the case here?
The most important feature for me is being able to chat with local models, remote models on my other machines, and cloud models (OpenAI API compatible). Anything that makes it easier to switch between models or query them simultaneously is important.
Here's what I've learned so far:
* Msty - my current favorite. Can do true simultaneous requests to multiple models. Nice aesthetic. Sadly not open source. Have had some freezing issues on Linux.
* Jan.ai - Can't make requests to multiple models simultaneously
* LM Studio - Not open source. Doesn't support remote/cloud models (maybe there's a plugin?)
* GPT4All - Was getting weird JSON errors with openrouter models. Have to explicitly switch between models, even if you're trying to use them from different chats.
Still to try: Librechat, Open WebUI, AnythingLLM, koboldcpp.
Would love to hear any other suggestions.
Works fully local, privacy first, and it's a native app (Swift for macOS, WPF for Windows)
It feels a bit less polished but has more functions that run locally and things work better out of the box.
My favorite thing is that I can just type my own questions / requests in markdown so I can get formatting and syntax highlighting.
Is this easier now? Specifically, I would like to easily connect anthropic models just by plugging in my API key.
I've been using this setup for several months now (over a year?) and it's very effective.
The proxy also benefits pretty much any other application you have that recognizes an OpenAI-compatible API. (Or even if it doesn't)
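To make that concrete: anything that speaks the OpenAI API usually just needs a base URL swap. A minimal sketch with the openai Python package, where the base URL and model name are placeholders (Ollama, for instance, exposes an OpenAI-compatible route at /v1):

    # Sketch: point any OpenAI-compatible client at a local proxy/server
    # instead of api.openai.com. The URL, key, and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # swap in your proxy's address
        api_key="not-needed-locally",          # many local servers ignore the key
    )

    reply = client.chat.completions.create(
        model="llama3.2",  # whatever name the proxy maps to a real model
        messages=[{"role": "user", "content": "Say hi in one sentence."}],
    )
    print(reply.choices[0].message.content)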
Why is that? It seems like the way to go to add tooling to any LLM that is tool-capable.
EDIT: To add a bit, MCP is more than just tools, which is the only use case MCPO supports.
I can create workflows that use multiple models to achieve different goals.
Electron. Python backend. Can talk to Ollama and other backends.
Need help with design and packaging.
- chatbox - https://github.com/chatboxai/chatbox - free and OSS, with a paid tier, supports MCP and local/remote, has a local KB, works well so far and looks promising.
- macai - https://github.com/Renset/macai simple client for remote APIs, does not support image pasting or MCP or anything really, very limited, crashes.
- typingmind.com - web, with a downloadable (if paid) version. Not OSS, but one-time payment, indie dev. One of the first alt chat clients I've ever tried; not using it anymore. Somewhat clunky GUI, but OK. Supports MCP, haven't tried it.
- Open WebUI - deployed for our team so that we could chat through many APIs. Works well for a multi-user web-deployment, but image generation hasn't been working. I don't like it as a personal client though, buggy sometimes but gets frequent fixes fortunately.
- jan.ai - it comes with popular models pre-populated, which makes it harder to plug into custom or local model servers. But it supports local model deployment within the app (like what Ollama is announcing), which is good for people who don't want to deal with starting a server. I haven't played with it enough, but I personally prefer to deploy a local server (i.e. Ollama, LiteLLM...) and then just have the chat GUI app give me flexible endpoint configuration for adding custom models to it.
I'm also wary of evil actors deploying chat GUIs just to farm your API keys. You should be too. Use disposable api keys, watch usage, refresh with new keys once in a while after trying clients.
I built my own Ollama macOS app written in SwiftUI: https://github.com/sheshbabu/Chital
Launches fast and weighs less than 2MB in size!
How do you handle markdown rendering?
I use this package for markdown: https://github.com/gonzalezreal/swift-markdown-ui
I recently released an Electron app for Ollama [1] and it's nowhere close to 1GB (between 300 and 350MB). A 1GB app would be really big.
Negative comments help us grow and make Ollama better anyway. We can take harsh feedback.
Some comments have been downweighted for being generic or off-topic, which is standard moderation; our role as moderators is to keep the discussion threads on-topic. But the comment that was left at the top of the thread after I'd done that seemed at least somewhat negative/critical towards the Ollama team.
Which comments seem unreasonably low to you?
> For pure CLI versions of Ollama, standalone downloads are available on Ollama’s GitHub releases page.
Nothing against that, just an observation.
Previously I tested several local LLM apps, and the 2 best ones to me were LM Studio [1] and Msty [2]. Will check this one out for sure.
One missing feature that the ChatGPT desktop app has and I think is a good idea for these local LLM apps is a shortcut to open a new chat anytime (Alt + Space), with a reduced UI. It is great for quick questions.
In fact, there were many self-made prototypes before this from different individuals. We were hooked, so we built it for ourselves.
Ollama is made for developers, and our focus is on continually improving Ollama's capabilities.
thank you guys for all your work on it, regardless
Also, if you search on ollama’s models, you’ll see user ones that you can download too
Nothing out of spite, and purely limited by the amount of effort required to support these models.
We are hopeful too -- users can technically add models to Ollama directly, although there is definitely some learning curve.
Ollama has the new 235B and 30B Qwen3 models from this week, so it’s not as if they have done nothing for a month.
ollama pull hf.co/bartowski/nvidia_OpenCodeReasoning-Nemotron-7B-GGUF:IQ4_XS
I use PetrosStav/gemma3-tools and it seems that it only works half of the time - the rest of the time, the model calls the tool but the call doesn't get properly parsed by Ollama.
We are working with Google and trying to give feedback on improving tool-calling capabilities for future Gemma models. Fingers crossed!
Lots of people are trying to begin, many of them with Ollama, and helping to create beginners is never a bad thing in tech.
Many things can be for both developers and end users. Developers can use the API directly; end users have more choices.
It was nice that it started downloading the model, but there was also no indication beforehand that I didn't have it, until I opened the drop-down and saw the download buttons.
But of course nice job guys.
As a developer feature request, it would be great if Ollama could support more than one location at once, so that it is possible to keep a couple of models 'live' but have the option to plug in an external disk with extra models being picked up auto-magically based on the OLLAMA_MODELS path, please. Or maybe the server could present a simple HTML interface next to the API endpoint?
And just to say thanks for making these models easily accessible. I am agAInst AI generally, but it is nice to be able to have a play with these models locally. I haven't found one that covers Zig, but I appreciate the steady stream of new models to try. Thanks.
I really like using ollama as a backend to OpenWebUI.
I don't have any windows machines and I don't work primarily on macos, but I understand that's where all the paying developers are, in theory.
Did y'all consider a partnership with one of the existing UIs and bundling that, similar to the DuckDB approach?
PS totally running windows here and using kesor/ollama-proxy if I need to make it externally available.
How would you compare and contrast between the two? My main use would be to use it as a tool with a chat interface rather than developing applications that talk to models.
I also tried LM Studio a few months back. The interface felt overly complex and I got weird error messages which made it look like I'd have to manually fix errors in the underlying python environment. Would have been fine if it was for work, but I just wanted to play around with LLMs in my spare time so I couldn't be bothered.
I've used Msty, but it seems like LM Studio is moving faster, which is kind of important in this space. For example, Msty still doesn't support MCP.
It would be even better if there were an installation template that checks whether Ollama is installed and, if not, downloads it as a sub-installation first, checking the user's computer specs for enough RAM and a fast enough CPU/GPU. Also an API to prompt the user (ask for permission) to install a specific model if it hasn't been installed.
That's actually what we've done for our own app [1]. It checks if Ollama and other dependencies are installed. No model is bundled with it. We prompt the user to install a model (you pick a model, click a button and we download the model; similarly if you wish to remove a model). The aim is to make it quite simple for non-technical folks to use.
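Roughly, that kind of check can be sketched like this (assuming the default local endpoint and that the ollama CLI is on PATH; the model name is just an example):

    # Rough sketch: detect a usable Ollama install, then pull a model for the user.
    import shutil
    import subprocess
    import requests

    def ollama_available() -> bool:
        if shutil.which("ollama") is None:
            return False  # binary not installed
        try:
            requests.get("http://localhost:11434/api/tags", timeout=2)
            return True   # server is up and answering
        except requests.RequestException:
            return False  # installed, but the server isn't running

    def install_model(name: str = "gemma3:4b") -> None:
        # Streams Ollama's own progress output while downloading.
        subprocess.run(["ollama", "pull", name], check=True)

    if ollama_available():
        install_model()
    else:
        print("Ollama not found -- prompt the user to install it first.")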
What's wrong with the name? Are you referring to the GPT trademark? That was rejected.
What I meant is the "Py" prefix is typically used for Python APIs/libraries, or Python bindings to libraries in other languages. Sometimes as a prefix for dev tool names like PyInstaller or PyEnv. It's just less often used for standalone apps, only to indicate the project was developed in Python.
This is exactly what I've implemented for my Qt C++ app: https://www.get-vox.com
- I like the simplicity. This would be perfect for setting up a non-technical friend or family member with a local LLM with just a couple clicks
- Multimodal and Markdown support works as expected
- The model dropdown shows both your local models and other popular models available in the registry
I could see using this over Open WebUI for basic use cases where one doesn't need to dial in the prompt or advanced parameters. Maybe those will be exposed later. But for now - I feel the simplicity is a strength.
Another commenter mentioned not being able to point the new UI to a remote Ollama instance - I agree, that would be super handy for running the UI on a slow machine but inferring on something more powerful.
I'll stick with OpenWebUI then.
(It’s a bit more complicated with starting a listening server in my laptop, making sure the port forwarded file doesn’t exist, etc, but this is the basic idea.)
Probably one that suits you pretty much out of the box.
Not a bash on Linux desktop users, just my experience.
Yeah, my wife would murder me as our kids yelled at me for various things
I sort of suspect so? Devs of parenting age trend towards being neurospicy, and dev work requires sustained attention with huge penalties for interruptions.
Please avoid internet tropes on HN.
Just download some tool and be productive within seconds, I'd say.
Then again, maybe OP's app will have some new twist that will revolutionise everything.
With a bit of help from ChatGPT etc., it was trivial to make, and I use it every day now. I may add DDG and GitHub search to it soon too.
1. VT.ai https://github.com/vinhnx/VT.ai Python
2. VT Chat https://vtchat.io.vn: my own product
Either directly or use it as a base for your own bespoke experience.
This hype bubble is as disgusting and scummy as the previous one.
The speed seems fine to me, but the hallucinations are wild, completely wrong on a few things I like to test the commercial offerings on.
For simple questions about the lua language and how to do things in Unity game engine the results look fairly OK.
Running good-enough models locally is appealing to a lot of people and kind of hard if you are not a developer. If you are, it's easy (been there, done that). That's the core premise of the company. Their tech is of course widely used, and for a while they've been focusing just on getting it to that stage. But that's never going to add up to revenue. So they need to productize what they have.
This looks like a potentially viable way.
Strongly disagree with this. It is the default go-to for companies that cannot use cloud-based services for IP or regulatory reasons (think of defense contractors). Isn't that the main reason to use "open" models, which are still weaker than closed ones?
Any whiff of a cloud service and the lawyers will freak out.
That's why we run models via Ollama on our laptops (M-series is crazy powerful) and a few servers on the intranet for more oomph.
LM Studio changed their license to allow commercial use without "call me" pricing, so we might look into that more too.
That's not what I've been seeing, but obviously my perspective (as anyone's) is limited. What I'm seeing is deployments of vLLM, SGLang, llama.cpp or even HuggingFace's Transformers with their own wrapper, at least for inference with open weight models. Somehow, the only place where I come across recommendations for running Ollama was on HN and before on r/LocalLlama but not even there as of late. The people who used to run Ollama for local inference (+ OpenWebUI) now seem to mostly be running LM Studio, myself included too.
Shameless plug: I’ve been building a native AI chat client called BoltAI[0] for the last 3 years. It’s native, feature-rich, and supports multiple AI services, including Ollama and LM Studio.
Give it a try.
[0]: https://boltai.com
I'm comfy, but some of the cutting edge local LLMs have been a little bit slow to be available recently, maybe this frontend focus is why.
I will now go and look at other options like Ollama that have either been fully UI-integrated since the start, or that are committed to just being a headless backend. If any of them seem better, I'll consider switching; I probably should have done this sooner.
I hope this isn't the first step in Ollama dropping the local CLI focus, offering a subscription and becoming a generic LLM interface like so many of these tools seem to converge on.
This looks like a version of Ollama that bundles one.
I just can't see a user-focused benefit for a backend service provider to start building and bundling their own frontend when there's already a bunch of widely used frontends available.
There are so many choices for having an interface, and as a developer you should have a choice in selecting the UI you want. It will all continue to work with Ollama. Nothing about that changes.
This is sending a very loud message that your focus is drifting away from why I use your product. If it was drifting away into something new and original that supplements my usage of your product, I could see the value, but like you said: there's already so many choices of good interface. Now you're going to have to play catchup against people whose first choice and genuine passion is LLM frontend UIs.
Sorry! I will still use ollama, and thank you so much for all the time and effort put in. I probably wouldn't have had a fraction of the local LLM fun I've had if it wasn't for ollama, even if my main usage is through openwebui. Ultimately, my personal preference is software that does 1 thing and does it well. Others prefer the opposite: tightly integrated all-bells-and-whistles, and I'm sure those people will appreciate this more than me - do what works for you, it's worked so far:)
You can pull models directly from Hugging Face: ollama pull hf.co/google/gemma-3-27b-it
Just checked: https://github.com/ollama/ollama/issues/11340 is still an open issue.
Also, a display of whether a model fits into VRAM would be nice.
For me what is needed is an assistant native app for Android, something like Perplexity's assistant mode that replaces Gemini.
It would make using your own LLM with your phone much more practical, since you can then interact with apps.
I installed this last night on one of my cheaper computers and ran gemma3:4b on a 16GB RAM laptop. I know HN loves specifics, so it's this exact computer (with an upgraded 2 TB SSD):
ASUS - Vivobook S 14 - 14" OLED Laptop - Copilot+ PC - Intel Core Ultra 5 - 16GB Memory - 512GB SSD - Neutral Black
It's a bit slower than o4-mini and probably not as smart, but I feel more secure in asking for a resume review. The GUI really makes pasting in text significantly easier. Yeah, I know I could just use the CLI app and Postman previously, but I didn't want to set that up.
Vendor lock-in: AFAIK it now uses a proprietary llama.cpp fork and built its own registry on ollama.com in a kind of Docker way (I heard Docker people are actually behind Ollama), and it's a bit difficult to reuse model binaries with other inference engines due to their use of hashed filenames on disk, etc.
Closed-source tweaks: Many llama.cpp improvements haven't been upstreamed or credited, raising GPL concerns. They have since switched to their own inference backend.
Mixed performance: Same models often run slower or give worse outputs than plain llama.cpp. Tradeoff for convenience - I know.
Opaque model naming: Rebrands or filters community models without transparency; the biggest fail was calling the smaller DeepSeek-R1 distills just "DeepSeek-R1", adding to massive confusion on social media and from "AI Content Creators" that you can run "THE" DeepSeek-R1 on any potato.
Difficult to change context window default: Using Ollama as a backend, it is difficult to change the default context window size on the fly, leading to hallucinations and endless loops in the output, especially for agents / thinking models.
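For reference, the context size can at least be overridden per request through the native API's options field (num_ctx); the friction is that the default is small and many frontends don't expose it. A minimal sketch, assuming the default endpoint and an example model:

    # Sketch: request a larger context window for a single /api/generate call.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3:8b",            # example model
            "prompt": "Summarize this long document...",
            "options": {"num_ctx": 16384},  # override the default context size
            "stream": False,
        },
    )
    print(resp.json()["response"])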
---If you want better (and in some cases more open) alternatives:
llama.cpp: Battle-tested C++ engine with minimal deps and faster with many optimizations
ik_llama.cpp: High-perf fork, even faster than default llama.cpp
llama-swap: YAML-driven model swapping for your endpoint.
LM Studio: GUI for any GGUF model—no proprietary formats with all llama.cpp optimizations available in a GUI
Open WebUI: Front-end that plugs into llama.cpp, ollama, MPT, etc.
> Vendor lock-in
That is probably the most ridiculous of the statements. Ollama is open source, llama.cpp is open source, llamafiles are zip files that contain quantized versions of models openly available to be run with numerous other providers. Their llama.cpp changes are primarily for performance and compatibility. Yes, they run a registry on ollama.com for pre-packed, pre-quantized versions of models that are, again, openly available.
> Closed-source tweaks
Oh, so many things wrong in a short sentence. llama.cpp is MIT licensed, not GPL licensed. A proprietary fork is a perfectly legitimate use. Also... "proprietary"? The source code is literally available, including the patches, on GitHub in the ollama/ollama project, in the "llama" folder, with a patch file as recent as yesterday?
> Mixed Performance
Yes, almost anything suffers degraded performance when the goal is usability instead of performance. It is why people use C# instead of Assembly or punch cards. Performance isn’t the only metric, which makes this a useless point.
> Opaque model name
Sure, their official model names have some ambiguities sometimes. I don't know that it is the "problem" people make it out to be when Ollama is designed for average people to run models, and so a decision like "ollama run qwen3" resolving to the option most people can run, rather than the absolute maximum best option possible, makes sense. Do you really think it is advantageous or user friendly, when Tommy wants to try out "DeepSeek-R1" on his potato laptop, for a 671B-parameter model too large to fit on almost any consumer computer to be the right choice, and that the smaller default is instead meant as a "deception"? That seems disingenuous. Not to mention, they are clearly listed as such on ollama.com, where it says in black and white that deepseek-r1 by default refers to the Qwen-based model, and that the full model is available as deepseek-r1:671b.
> Context Window
Probably the only fair and legitimate criticism of your entire comment.
I’m not an ollama defender or champion, couldn’t care about the company, and I barely use ollama (mostly just to run qwen3-8b for embedding). It really is just that most of these complaints you’re sharing from others seem to have TikTok-level fact checking.