https://www.reddit.com/r/LocalLLaMA/comments/1jzocoo/finally...
https://github.com/ollama/ollama/issues/5245
Sadly it is not, and the issue remains open after more than a year, meaning Ollama cannot run the latest SOTA open-source models unless they convert them to their proprietary format, which they do not do consistently.
No surprise, I guess, given they've taken VC money, refuse to properly attribute their use of things like llama.cpp and ggml, maintain their own model format for... reasons? and have over 1,800 open issues...
llama-server, RamaLama, or whatever model switcher ggerganov is working on (he showed previews recently) feel like the way forward.
Even using llama.cpp as a library seems like overkill for most use cases. Ollama could make its life much easier by spawning llama-server as a subprocess listening on a unix socket and forwarding requests to it.
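Roughly what that could look like, as an untested sketch: -m, --host, --port, and the /health and /v1/chat/completions endpoints are real llama-server features as far as I know, but I haven't verified unix socket support, so this binds a localhost TCP port instead, and the model path is hypothetical.

    # Sketch: run llama-server as a managed subprocess and forward
    # requests to it over localhost TCP.
    import subprocess
    import time
    import urllib.request

    PORT = 8089
    MODEL = "models/model.gguf"  # hypothetical path

    server = subprocess.Popen(
        ["llama-server", "-m", MODEL, "--host", "127.0.0.1", "--port", str(PORT)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    # Poll /health until the model has loaded.
    for _ in range(60):
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{PORT}/health") as r:
                if r.status == 200:
                    break
        except OSError:
            time.sleep(1)

    # Forward a request; the wrapper owns the server's lifecycle.
    req = urllib.request.Request(
        f"http://127.0.0.1:{PORT}/v1/chat/completions",
        data=b'{"messages":[{"role":"user","content":"hi"}]}',
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())

    server.terminate()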
One thing I'm curious about: does Ollama support strict structured output or strict tool calls adhering to a JSON schema? Because it would be insane to rely on a server for agentic use unless that server can guarantee the model will only produce valid JSON. AFAIK this feature is implemented by llama.cpp, which they no longer use.
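For concreteness, this is roughly what that looks like from the client side with llama-server: the json_schema field on the native /completion endpoint is compiled into a GBNF grammar, and invalid tokens are masked at sampling time, so conforming output is guaranteed by construction rather than validated after the fact. Untested sketch; the exact field names may have drifted across versions.

    # Sketch: constrain llama-server output to a JSON schema. The schema
    # is compiled to a GBNF grammar server-side, so the reply is
    # guaranteed to parse and conform.
    import json
    import urllib.request

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
    }

    payload = {
        "prompt": "Extract as JSON: 'Alice is 30 years old.'",
        "json_schema": schema,
        "n_predict": 128,
    }

    req = urllib.request.Request(
        "http://127.0.0.1:8089/completion",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])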
Here is some relevant drama on the subject:
https://github.com/ollama/ollama/issues/11714#issuecomment-3...
llama.cpp is designed to rapidly adopt research-level optimisations and features, but the downside is that reported speeds change all the time (sometimes faster, sometimes slower) and things break often. You can't establish contracts around simultaneous releases if there is no guarantee the model will even function.
By reimplementing this layer, Ollama gets to enjoy a kind of LTS status that their partners rely on. It won't be as feature-rich, and definitely won't be as fast, but that's not their goal.
Makes their VCs think they're doing more, and have more ownership, rather than being a do-nothing wrapper with some analytics and S3 buckets that rehost models from HF.
As far as I understand, this is generally not possible at the model level. The best you can do is wrap the call in a (non-LLM) JSON schema validator and emit an error JSON when the LLM output does not match the schema, which is what some APIs do for you, though it's not very complicated to do yourself.
Someone correct me if I'm wrong
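For what it's worth, the wrapper described above is only a few lines. A minimal sketch using the third-party jsonschema package, with call_llm as a stand-in for whatever client you use:

    # Sketch of the validate-or-error wrapper: parse the LLM output,
    # check it against the schema, and return an error JSON on mismatch.
    import json
    from jsonschema import ValidationError, validate

    def checked_call(call_llm, prompt: str, schema: dict) -> dict:
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
            validate(instance=parsed, schema=schema)
            return parsed
        except (json.JSONDecodeError, ValidationError) as e:
            return {"error": "schema_mismatch", "detail": str(e), "raw": raw}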
> Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible for older models for when we used it.
The linked PR is them "occasionally updating it", I guess? Note that "vendored" in a PR title usually means taking a snapshot to pin a specific version.
Figured it had to be Ollama doing Ollama things; seems that was indeed the case.