frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open models by OpenAI

https://openai.com/open-models/
1354•lackoftactics•8h ago•527 comments

Genie 3: A new frontier for world models

https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/
1105•bradleyg223•11h ago•403 comments

Spotting base64 encoded JSON, certificates, and private keys

https://ergaster.org/til/base64-encoded-json/
216•jandeboevrie•5h ago•98 comments

Ollama Turbo

https://ollama.com/turbo
238•amram_art•6h ago•145 comments

Create personal illustrated storybooks in the Gemini app

https://blog.google/products/gemini/storybooks/
69•xnx•4h ago•25 comments

Consider using Zstandard and/or LZ4 instead of Deflate

https://github.com/w3c/png/issues/39
125•marklit•7h ago•70 comments

Claude Opus 4.1

https://www.anthropic.com/news/claude-opus-4-1
638•meetpateltech•8h ago•240 comments

Things that helped me get out of the AI 10x engineer imposter syndrome

https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/
694•coltonv•11h ago•532 comments

Scientific fraud has become an 'industry,' analysis finds

https://www.science.org/content/article/scientific-fraud-has-become-industry-alarming-analysis-finds
270•pseudolus•14h ago•230 comments

What's wrong with the JSON gem API?

https://byroot.github.io/ruby/json/2025/08/02/whats-wrong-with-the-json-gem-api.html
36•ezekg•4h ago•8 comments

The First Widespread Cure for HIV Could Be in Children

https://www.wired.com/story/the-first-widespread-cure-for-hiv-could-be-in-children/
61•sohkamyung•3d ago•11 comments

Ask HN: Have you ever regretted open-sourcing something?

108•paulwilsonn•3d ago•143 comments

uBlock Origin Lite now available for Safari

https://apps.apple.com/app/ublock-origin-lite/id6745342698
963•Jiahang•16h ago•383 comments

Kyber (YC W23) is hiring enterprise account executives

https://www.ycombinator.com/companies/kyber/jobs/6RvaAVR-enterprise-account-executive-ae
1•asontha•4h ago

Show HN: Stagewise (YC S25) – Front end coding agent for existing codebases

https://github.com/stagewise-io/stagewise
31•juliangoetze•10h ago•34 comments

Build Your Own Lisp

https://www.buildyourownlisp.com/
216•lemonberry•13h ago•58 comments

US reportedly forcing TSMC to buy 49% stake in Intel to secure tariff relief

https://www.notebookcheck.net/Desperate-measures-to-save-Intel-US-reportedly-forcing-TSMC-to-buy-49-stake-in-Intel-to-secure-tariff-relief-for-Taiwan.1079424.0.html
293•voxadam•7h ago•341 comments

Quantum machine learning via vector embeddings

https://arxiv.org/abs/2508.00024
8•adbabdadb•2h ago•0 comments

Los Alamos is capturing images of explosions at 7 millionths of a second

https://www.lanl.gov/media/publications/1663/dynamics-of-dynamic-imaging
104•LAsteNERD•10h ago•86 comments

Under the Hood of AFD.sys Part 1: Investigating Undocumented Interfaces

https://leftarcode.com/posts/afd-reverse-engineering-part1/
24•omegadev•2d ago•5 comments

The mystery of Winston Churchill's dead platypus was finally solved

https://www.bbc.com/news/articles/cglzl1ez283o
43•benbreen•2d ago•7 comments

Cannibal Modernity: Oswald de Andrade's Manifesto Antropófago (1928)

https://publicdomainreview.org/collection/manifesto-antropofago/
19•Thevet•2d ago•3 comments

AI is propping up the US economy

https://www.bloodinthemachine.com/p/the-ai-bubble-is-so-big-its-propping
111•mempko•5h ago•126 comments

No Comment (2010)

https://prog21.dadgum.com/57.html
60•ColinWright•10h ago•49 comments

Tell HN: Anthropic expires paid credits after a year

176•maytc•23h ago•87 comments

Cow vs. Water Buffalo Mozzarella

http://itscheese.com/reviews/mozzarella
18•indigodaddy•3d ago•17 comments

Eleven Music

https://elevenlabs.io/blog/eleven-music-is-here
163•meetpateltech•9h ago•203 comments

Apache ECharts 6

https://echarts.apache.org/handbook/en/basics/release-note/v6-feature/
261•makepanic•18h ago•30 comments

GitHub pull requests were down

https://www.githubstatus.com/incidents/6swp0zf7lk8h
113•lr0•9h ago•150 comments

Using Dspy to Detect Document Boundaries

https://kmad.ai/Using-DSPy-to-Detect-Document-Boundaries
4•aberoham•2d ago•1 comments
Open in hackernews

Ollama Turbo

https://ollama.com/turbo
238•amram_art•6h ago

Comments

turnsout•5h ago
Man, busy day in the world of AI announcements! This looks coordinated with OpenAI, as it launches with `gpt-oss-20b` and `gpt-oss-120b`
sambaumann•5h ago
Yep, on the ollama home page (https://ollama.com/) it says

> OpenAI and Ollama partner to launch gpt-oss

jasonjmcghee•5h ago
Interested to see how this plays out - I feel like Ollama is synonymous with "local".
Aurornis•5h ago
There's a small but vocal minority of users who don't trust big companies, but don't mind paying small companies for a similar service.

I'm also interested to see if that small minority of people are willing to pay for a service like this.

recursivegirth•5h ago
Ollama, run by Facebook. Small company, huh.
mchiang•5h ago
Ollama is not run by Facebook. We are a small team building our dreams.
criddell•4h ago
I thought it was a Meta company because the name is so close to Llama which is a Meta product.

I looked up the Ollama trademark and was surprised to see it's a Canadian company.

threetonesun•4h ago
I view it a bit like I do cloud gaming, 90% of the time I'm fine with local use, but sometimes it's just more cost effective to offload the cost of hardware to someone else. But it's not an all-or-nothing decision.
moralestapia•5h ago
Ollama is great but I feel like Georgi Gerganov deserves way more credit for llama.cpp.

He (almost) single-handedly brought LLMs to the masses.

With the latest news of some AI engineers' compensation reaching up to a billion dollars, feels a bit unfair that Georgi is not getting a much larger slice of the pie.

freedomben•5h ago
Is Georgi landing any of those big-time money jobs? I could see a conflict-of-interest given his involvment with llama.cpp, but I would think he'd be well positioned for something like that
moralestapia•5h ago
(This is mere speculation)

I think he's happy doing his own thing.

But then, if someone came in with a billion ... who wouldn't give it a thought?

webdevver•5h ago
really a billion bucks is far too much, that is beyond the curve.

$50M, now thats just perfect. you're retired, nor burdened with a huge responsibility

apwell23•5h ago
https://ggml.ai/

> ggml.ai is a company founded by Georgi Gerganov to support the development of ggml. Nat Friedman and Daniel Gross provided the pre-seed funding.

mrs6969•5h ago
Agreed. Ollama itself is kind a wrapper around llamacpp anyway. Feel like the real guy is not included to the process.

Now I am going to go and write a wrapper around llamacpp, that is only open source, truly local.

How can I trust ollama to not to sell my data.

rafram•5h ago
Ollama is not a wrapper around llama.cpp anymore, at least for multimodal models (not sure about others). They have their own engine: https://ollama.com/blog/multimodal-models
iphone_elegance•1h ago
looks like the backend is ggml, am I missing something? same diff
Patrick_Devine•5h ago
Ollama only uses llamacpp for running legacy models. gpt-oss runs entirely in the ollama engine.

You don't need to use Turbo mode; it's just there for people who don't have capable enough GPUs.

extr•5h ago
Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:

- Speed

- Cost

- Reliability

- Feature Parity (eg: context caching)

- Performance (What quant level is being used...really?)

- Host region/data privacy guarantees

- LTS

And that's not even including the decision of what model you want to use!

Realistically if you want to use an OSS model instead of the big 3, you're faced with evalutating models/providers across all these axes, which can require a fair amount of expertise to discern. You may even have to write your own custom evaluations. Meanwhile Anthropic/OAI/Google "just work" and you get what it says on the tin, to the best of their ability. Even if they're more expensive (and they're not that much more expensive), you are basically paying for the priviledge of "we'll handle everything for you".

I think until providers start standardizing OSS offerings, we're going to continue to exist in this in-between world where OSS models theoretically are at performance parity with closed source, but in practice aren't really even in the running for serious large scale deployments.

coderatlarge•3h ago
true but ignores handing over all your prompt traffic without any real legal protections as sama has pointed out:

[1] https://californiarecorder.com/sam-altman-requires-ai-privil...

supermatt•3h ago
> OpenAI confirmed it has been preserving deleted and non permanent person chat logs since mid-Might 2025 in response to a federal court docket order

> The order, embedded under and issued on Might 13, 2025, by U.S. Justice of the Peace Decide Ona T. Wang

Is this some meme where “may” is being replaced with “might”, or some word substitution gone awry? I don’t get it.

kekebo•2h ago
:)) Apparently. I don't have a better guess. Well spotted
satellite2•5h ago
"All hardware is located in the United States."

If I use local/OSS models it's specifically to avoid running in a country with no data protection laws. It's a big close miss here.

bangaladore•5h ago
I think what matters more here is "All hardware is located outside of China". Located in the US means little because that's not good enough for many regulated industries even within the US.

All things considered though, Europe is getting confusing. They have GDPR but now pushing to backdoor encryption within the EU? [1]

At least there isn't a strong movement in the US trying to outlaw E2E encryption.

[1] https://www.eff.org/deeplinks/2025/06/eus-encryption-roadmap...

Which brings up the point are truly private LLMs possible? Where the input I provide is only meaningful to me, but the LLM can still transform it without gaining any contextual value out of it? Without sharing a key? If this can be done, can it be done performantly?

blitzar•5h ago
I would feel safer if the hardware was located in China than in the US.
wkat4242•4h ago
Even the backdoor is an American lobby. Ashton Kutcher and Demi Moore's Thorn.
bangaladore•4h ago
Maybe I hit a nerve with the EU part? I thought it was a fair observation, but I'm open to being corrected if there's more nuance I missed.
spookie•4h ago
The bill has been stalled since 2022.

Yes, there is gonna be a new discussion for it on October 15, but I've already seen section of governments being against their own government position on the bill (Swedish Military for example).

riazrizvi•4h ago
No I think the point is to choose the best jurisdiction to have cloud hosted data where your data is best protected from access by very wealthy entities via intelligence services bribery. That’s still hands down the USA.
pphysch•4h ago
Any evidence for this claim that e.g. Mossad has less penetration into digital systems of USA than it does RF or PRC?
observationist•4h ago
They might have access to any given machine, but they lack the broad scope of general surveillance. If they want to get you, just like most of the other nation state level threats, you will get got. For other threat models, the US works pretty well.

I guarantee that nobody cares about or will be surveilling your private AI use unless you're doing other things that warrant surveillance.

The reason big providers suck, as OpenAI is so nicely demonstrating for us, is that they retain everything, the user is the product, and court cases, other situations can unmask and expose everything you do on a platform to third parties. This country seriously needs a digital bill of rights.

riazrizvi•4h ago
Nobody cares? That seems ludicrous to me. The last 3 decades of business have been characterized most of all by the increased access of private information on people for online business competitive insights. Sure if you are just a consumer you have nothing of real value except in the aggregate, but if you are an up-and-coming business drawing customers away from other businesses, your private AI use is absolutely of interest. Which is why serious businesses here scour the ToS.

The biggest game in town has been managing platforms that give owners an information advantage. But at least the world generally trusts the USA to abide by laws and user agreements, which is why, to my mind, the USA retains the near monopoly on information platforms.

I personally wouldn’t trust a UK platform for example, being a Brit native. The top echelon talent pool is so small and incestuous I don’t believe I would experience a fair playing field if a business of mine passed a certain size of national reach/importance.

EDIT: from ChatGPT, new money entrepreneurs with no inheritence/political ties by economic region, USA ~63%, UK/HongKong/Singapore ~45%, Emerging Markets ~35%, EU ~22%, Russia ~10%

impulser_•4h ago
Then don't use it and keep using models locally?
computegabe•5h ago
Why does everything AI-related have to be $20? Why can't there be tiers? OpenAI setting the standard of $20/m for every AI application is one of the worst things to ever happen.
thimabi•5h ago
My guess is that’s the lowest price point that provides a modicum of profitability — LLMs are quite expensive to run, and even more so for providers like Ollama, which are entering the market and don’t have idle capacity.
furyofantares•4h ago
Claude has $20, $100 and $200, ChatGPT $20, and $200, Google has $20 and $250. Those all have free tiers as well, and metered APIs. Grok has $30 and $300 it looks like, the list probably goes on and on.
colesantiago•4h ago
Tokens are expensive and nobody is making any money.
senectus1•1h ago
yep. this is the 2nd half of why the AI bubble is going to pop.
joecot•4h ago
I strongly recommend together.ai, which allows you to use a lot of different open source models and charges for usage, not a monthly fee.
paxys•3h ago
https://openai.com/chatgpt/pricing/ - $0 / $20 / $200 / $25 (team) / custom enterprise pricing / on-demand API pricing

https://www.anthropic.com/pricing - $0 / $17 (if billed annually) / $20 (if billed monthly) / $100 / $25 (team) / custom enterprise pricing / on-demand API pricing

Sounds like tiers to me.

smlacy•5h ago
Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

Thankfully, this may just leave more room for other open source local inference engines.

user-•5h ago
I remember them pivoting from being infra.hq
smeeth•5h ago
Their FOSS local inference service didn't go anywhere.

This isn't Anaconda, they didn't do a bait and switch to screw their core users. It isn't sinful for devs to try and earn a living.

blitzar•5h ago
Yet. Their FOSS local inference service hasn't go anywhere ... yet.
kermatt•4h ago
Another perspective:

If you earn a living using something someone else built, and expect them not to earn a living, your paycheck has a limited lifetime.

“Someone” in this context could be a person, a team, or a corporate entity. Free may be temporary.

dcreater•2h ago
You can build this and go build something else as well. You don't need to morph the thing you built. That's underhanded
satvikpendem•5h ago
> important and well designed open source project

It was always just a wrapper around the real well designed OSS, llama.cpp. Ollama even messes up the names of models by calling distilled models the name of the actual one, such as DeepSeek.

Ollama's engineers created Docker Desktop, and you can see how that turned out, so I don't have much faith in them to continue to stay open given what a rugpull Docker Desktop became.

Philpax•3h ago
I wouldn't go as far as to say that llama.cpp is "well designed" (there be demons there), but I otherwise agree with the sentiment.
mchiang•5h ago
we have always been building in the open, and so is Ollama. All the core pieces of Ollama are open. There are areas where we want to be opinionated on the design to build the world we want to see.

There are areas we will make money, and I wholly believe if we follow our conscious we can create something amazing for the world while making sure we can keep it fueled to keep it going for the long term.

Some of the ideas in Turbo mode (completely optional) is to serve the users who want a faster GPU, and adding in additional capabilities like web search. We loved the experience so much that we decided to give web search to non-paid users too. (Again, it's fully optional). Now to prevent abuse and make sure our costs don't go out of hand, we require login.

Can't we all just work together and create a better world? Or does it have to be so zero sum?

xiphias2•4h ago
I wanted to try web search to increase my privacy but it wanted to do login.

For Turbo mode I understand the need for paying but the main poing of running a local model with web search is browsing from my computer without using any LLM provider. Also I want to get rid of the latency to US servers from Europe.

If ollama can't do it, maybe a fork.

mchiang•3h ago
login does not mean payment. It is free to use. It costs us to perform the web search, so we want to make sure it is not subject to abuse.
dcreater•2h ago
I'm sorry but your words don't match your actions.
shepardrtc•4h ago
I think this offering is a perfectly reasonable option for them to make money. We all have bills to pay, and this isn't interfering with their open source project, so I don't see anything wrong with it.
Aeolun•1h ago
> this isn't interfering with their open source project

Wait until it makes significant amounts of money. Suddenly the priorities will be different.

I don’t begrudge them wanting to make some money off it though.

otabdeveloper4•4h ago
[flagged]
mchiang•4h ago
sorry that you feel the way you feel. :(

I'm not sure which package we use that is triggering this. My guess is llama.cpp based on what I see on social? Ollama has long shifted to using our own engine. We do use llama.cpp for legacy and backwards compatibility. I want to be clear it's not a knock on the llama.cpp project either.

There are certain features we want to build into Ollama, and we want to be opinionated on the experience we want to build.

Have you supported our past gigs before? Why not be more happy and optimistic in seeing everyone build their dreams (success or not).

If you go build a project of your dreams, I'd be supportive of it too.

dangoodmanUT•4h ago
Yes everyone should just write cpp to call local LLMs obviously
api•4h ago
> Repackaging existing software while literally adding no useful functionality was always their gig.

Developers continue to be blind to usability and UI/UX. Ollama lets you just install it, just install models, and go. The only other thing really like that is LM-Studio.

It's not surprising that the people behind it are Docker people. Yes you can do everything Docker does with Linux kernel and shell commands, but do you want to?

Making software usable is often many orders of magnitude more work than making software work.

llmtosser•4h ago
This is not true.

No inference engine does all of:

- Model switching

- Unload after idle

- Dynamic layer offload to CPU to avoid OOM

ekianjo•3h ago
this can be added to llama.cpp with llama.swap currently so even without Ollama you are not far off
dang•4h ago
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

dangoodmanUT•4h ago
It was always a company
colesantiago•4h ago
ollama is YC and VC backed, this was inevitable and not surprising.

All companies that raise outside investment follow this route.

No exceptions.

And yes this is how ollama will fall due to enshittification, for lack of a better word.

TuringNYC•4h ago
>> Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

if i could have consistent and seamless local-cloud dev that would be a nice win. everyone has to write things 3x over these days depending on your garden of choice, even with langchain/llamaindex

decide1000•5h ago
It was fun because it was open. Now it's just another brand seeking dollars.
mchiang•5h ago
Ollama at its core will always be open. Not all users have the computer to run models locally, and it is only fair if we provide GPUs that cost us money and let the users who optionally want it to pay for it.
ciaranmca•4h ago
I think it’s the logical move to ensure Ollama can continue to fund development. I think you will probably end up having to add more tiers or some way for users to buy more credits/gpu time. See anthropic’s recent move with Claude code due to the usage of a number of 24/7 users.
thimabi•5h ago
I’m not throwing the towel on Ollama yet. They do need dollars to operate, but still provide excellent software for running models locally and without paying them a dime.
recursivegirth•5h ago
^ this. As a developer, Ollama has been my go-to for serving offline models. I then use cloudflare tunnels to make them available where I need them.
jnmandal•5h ago
I see a lot of hate for ollama doing this kind of thing but also they remain one of the easiest to use solutions for developing and testing against a model locally.

Sure, llama.cpp is the real thing, ollama is a wrapper... I would never want to use something like ollama in a production setting. But if I want to quickly get someone less technical up to speed to develop an LLM-enabled system and run qwen or w/e locally, well then its pretty nice that they have a GUI and a .dmg to install.

mchiang•4h ago
Thanks for the kind words.

Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.

Ollama might look like a toy and what looks trivial to build. I can say, to keep its simplicity, we go through a deep amount of struggles to make it work with the experience we want.

Simplicity is often overlooked, but we want to build the world we want to see.

dcreater•2h ago
But Ollama is a toy, it's meaningful for hobbyists and individuals to use locally like myself. Why would it be the right choice for anything more? AWS, vLLM, SGLang etc would be the solutions for enterprise

I knew a startup that deployed ollama on a customers premises and when I asked them why, they had absolutely no good reason. Likely they did it because it was easy. That's not the "easy to use" case you want to solve for.

steren•4h ago
> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per seconds. Ollama comes at the top. We hope to be able to publish these results soon.

ekianjo•3h ago
you need to benchmark against llama.cpp as well.
apitman•3h ago
Did you test multi-user cases?
liuliu•5h ago
Any more information on "Privacy first"? It seems pretty thin if just not retaining data.

For Draw Things provided "Cloud Compute", we don't retain any data too (everything is done in RAM per request). But that is still unsatisfactory personally. We will soon add "privacy pass" support, but still not to the satisfactory. Transparency log that can be attested on the hardware would be nice (since we run our open-source gRPCServerCLI too), but I just don't know where to start.

pagekicker•3h ago
I see no privacy advantage to working with Ollama, which can sell your data or have it subpoenaed just like anyone else.
liuliu•2h ago
In theory, "privacy pass" should help, as you can subpoena content, but cannot know who made these. But that is still thin (and Ollama not doing that too anyway).
pogue•1h ago
I would pay more if they let you run the models in Switzerland or some other GDPR respecting country, even if there was extra latency. I would also hope everything is being sent over SSL or something similar.
seanmcdirmid•1h ago
I had to do a double take here. Switzerland surely isn’t in the GDPR, so you mean their own privacy laws or GDPR in the EU?
jmort•6m ago
I don't see a privacy policy and their desktop app is closed source. So, not encouraging.

[full disclosure I am working on something with actual privacy guarantees for LLM calls that does use a transparency log, etc.]

colesantiago•4h ago
No matter if a project is "open source" as long as they announce that they have raised millions amount of dollars from investors...

It is completely compromised, especially if it is an AI company.

How do you think ollama was able to provide the open source AI models to everyone for free?

I am pretty sure ollama was losing money on every pull of those images from their infrastructure.

Those that are now angry at ollama charging money or not focusing on privacy should have been angry when they raised money from investors.

llmtosser•4h ago
Distractions like this probably the reason they still, over a year now, do not support sharded GGUF.

https://github.com/ollama/ollama/issues/5245

If any of the major inference engines - vLLM, Sglang, llama.cpp - incorporated api driven model switching, automatic model unload after idle and automatic CPU layer offloading to avoid OOM it would avoid the need for ollama.

jychang•4h ago
That’s just llama-swap and llama.cpp
llmtosser•4h ago
Interesting - it does indeed seem like llama-server has the needed endpoints to do the model swapping and llama.cpp as of recently also has a new flag for the dynamic CPU offload now.

However the approach to model swapping is not 'ollama compatible' which means all the OSS tools supporting 'ollama' Ex Openwebui, Openhands, Bolt.diy, n8n, flowise, browser-use etc.. aren't able to take advantage of this particularly useful capability as best I can tell.

jacekm•4h ago
What could be the benefit of paying $20 to Ollama to run inferior models instead of paying the same amount of money to e.g. OpenAI for access to sota models?
vanillax•4h ago
nothing lmao. this is just ollama trying to make money.
ibejoeb•4h ago
I run a lot of mundane jobs that work fine with less capable models, so I can see the potential benefit. It all depends on the limits though.
AndroTux•3h ago
Privacy, I guess. But at this point it’s just believing that they won’t log your data.
daft_pink•3h ago
I feel the primary benefit of this Ollama Turbo is that you can quickly test and run different models in the cloud that you could run locally if you had the correct hardware.

This allows you to try out some open models and better assess if you could buy a dgx box or Mac Studio with a lot of unified memory and build out what you want to do locally without actually investing in very expensive hardware.

Certain applications require good privacy control and on-prem and local are something certain financial/medical/law developers want. This allows you to build something and test it on non-private data and then drop in real local hardware later in the process.

dawnerd•2h ago
Quickly test… the two models they support? This is just another subscription to quantized models.
fluidcruft•42m ago
Me at home: $20/mo while I wait for a card that can run this or dgx box? Decisions, decisions.
rapind•3h ago
I'm not sure the major models will remain at $20. Regardless, I support any and all efforts to keep the space crowded and competitive.
michelsedgh•3h ago
I think its the data privacy is the main point and probably more usage before you hit limits? But mainly data privacy i guess
_--__--__•3h ago
Groq seems to do okay with a similar service but I think their pricing is probably better.
Geezus_42•2h ago
Yeah, the NAZI sex not will be great for business!
gabagool•2h ago
You are thinking of Elon Grok, not Groq
janalsncm•2h ago
When Grok originally came out I thought it was unlucky on Groq’s part. Now that Grok has certain connotations, it’s even more true.
fredoliveira•2h ago
Groq (the inference service) != Grok (xAI's model)
timmg•4h ago
It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

I pay $20 to Anthropic, so I don’t think I’d get enough use out of this for the $20 fee. But being able to spin up any of these models and use as needed (and compare) seems extremely useful to me.

I hope this works out well for the team.

ac29•3h ago
> It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

Agreed, though there are already several providers of these new OpenAI models available, so I'm not sure what ollama's value add is there (there are plenty of good chat/code/etc interfaces available if you are bringing your own API keys).

Aeolun•1h ago
I mean $20/month for API access is definitely new.
domatic1•3h ago
Open router competition?
philip1209•3h ago
Seems like an easy way to run gpt-oss for development environments on laptops. Probably necessary if you plan to self-host in production.
paxys•3h ago
A subscription fee for API usage is definitely an interesting offering, though the actual value will depend on usage limits (which are kept hidden).
mchiang•3h ago
we are learning the usage patterns to be able to price this more properly.
orliesaurus•3h ago
Does anyone know if this is like like OpenRouter?
ivape•2h ago
Often the math works out that you get a lot more for $20 a month if you settle for smaller sized but capable models (8b-30b). I don’t see how it’s better other than Ollama can “promise” they don’t store your data where as OpenRouter is dependent on which host you choose (and there’s no indicator on OpenRouter exposing which ones do or don’t).

In a universe where everything you say can be taken out of context, things like OpenAi will be a data leak nightmare.

Need this soon:

https://arxiv.org/abs/2410.02486

dcreater•2h ago
Called it.

It's very unfortunate that the local inference community has aggregated around Ollama when it's clear that's not their long term priority or strategy.

Its imperative we move away ASAP

mchiang•2h ago
hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.

Is it bad to fairly charge money for selling GPUs that cost us money too, and use that money to grow the core open-source project?

At one point, it just has to be reasonable. I'd like to believe by having a conscientious, we can create something great.

tomrod•41m ago
Everyone just wants to solarpunk this up.
idiotsecant•2h ago
Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...
mrcwinn•2h ago
Yes, better to get free sh*t unsustainably. By the way, you're free to create an open source alternative and pour your time into that so we can all benefit. But when you don't — remember I called it!
rpdillon•2h ago
What? The obvious move is to never have switched to Ollama and just use Llama.cpp directly, which I've been doing for years. Llama.cpp was created first, is the foundation for this product, and is actually open source.
tarruda•2h ago
Llama.cpp (library which ollama uses under the hoods) has its own server, and it is fully compatible with open-webui.

I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.

A4ET8a8uTh0_v2•2h ago
Interesting, admittedly, I am slowly getting to the point, where ollama's defaults get a little restrictive. If the setup is not too onerous, I would not mind trying. Where did you start?
tarruda•2h ago
Download llama-server from llama.cpp Github and install it some PATH directory. AFAIK they don't have an automated installer, so that can be intimidating to some people

Assuming you have llama-server installed, you can download + run a hugging face model with something like

    llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja

And access http://localhost:8080
mchiang•2h ago
totally respect your choice, and it's a great project too. Of course as a maintainer of Ollama, my preference is to win you over with Ollama. If it doesn't meet your needs, it's okay. We are more energized than ever to keep improving Ollama. Hopefully one day we will win you back.

Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible for older models for when we used it. The team is great, we just have features we want to build, and want to implement the models directly in Ollama. (We do use GGML and ask partners to help it. This is a project that also powers llama.cpp and is maintained by that same team)

tarruda•2h ago
> Ollama does not use llama.cpp anymore

That is interesting, did Ollama develop its own proprietary inference engine or did you move to something else?

Any specific reason why you moved away from llama.cpp?

mchiang•2h ago
it's all open, and specifically, the new models are implemented here: https://github.com/ollama/ollama/tree/main/model/models
kristjansson•43m ago
> Ollama does not use llama.cpp anymore;

> We do use GGML

Sorry, but this is kind of hiding the ball. You don't use llama.cpp, you just ... use their core library that implements all the difficult bits, and carry a patchset on top of it?

Why do you have to start with the first statement at all? "we use the core library from llama.cpp/ggml and implement what we think is a better interface and UX. we hope you like it and find it useful."

om8•1h ago
It’s unfortunate that llama.cpp’s code is a mess. It’s impossible to make any meaningful contributions to it.
kristjansson•1h ago
I'm the first to admit I'm not a heavy C++ user, so I'm not a great judge of the quality looking at the code itself ... but ggml-org has 400 contributors on ggml, 1200 on llama.cpp and has kept pace with ~all major innovations in transformers over the last year and change. Clearly some people can and do make meaningful contributions.
halJordan•1h ago
Fully compatible is a stretch, it's important we dont fall into a celebrity "my guy is perfect" trap. They implement a few endpoints.
jychang•1h ago
They implement more openai-compatible endpoints than ollama at least
janalsncm•2h ago
Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.
sitkack•1h ago
I believe that is what https://github.com/containers/ramalama set out to do.
cchance•41m ago
I stopped using them when they started doing the weird model naming bullshit stuck with lmstudio since
Aurornis•31m ago
> Its imperative we move away ASAP

Why? If the tool works then use it. They’re not forcing you to use the cloud.

jcelerier•3m ago
happy sglang user here :)
irthomasthomas•2h ago
If these are FP4 like the other ollama models then I'm not very interested. If I'm using an API anyway I'd rather use the full weights.
mchiang•2h ago
OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.
irthomasthomas•1h ago
Oh, I didn't know that. Weird!
reissbaker•1h ago
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (fits on a single H100), and to allow better utilization of B200s (which are especially fast for FP4).
irthomasthomas•1h ago
Interesting, thanks. I didn't know you could even train at FP4 on H100s
captainregex•2h ago
I am so so so confused as to why Ollama of all companies did this other than an emblematic stab at making money-perhaps to appease someone putting pressure on them to do so. Their stuff does a wonderful job of enabling local for those who want it. So many things to explore there but instead they stand up yet another cloud thing? Love Ollama and hope it stays awesome
janalsncm•2h ago
The problem is that OSS is free to use but it is not free to create or maintain. If you want it to remain free to use and also up to date, Ollama will need someone to address issues on GitHub. Usually people want to be paid money for that.
captainregex•2h ago
money is great! I like money! but if this is their version of buy me a coffee I think there’s room to run elsewhere for their skillset/area of expertise
mchiang•2h ago
hmm, I don't think so. This is more of, we want to keep improving Ollama so we can have a great core.

For the users who want GPUs, which cost us money, we will charge money for it. Completely optional.

scosman•2h ago
I build an app against the Ollama API. If this will let me test all Ollama models, I'm so in.
ahmedhawas123•2h ago
So much that is interesting about this

For one of the top local open model inference engines of choice - only supporting OSS out of the gate feels like an angle to just ride the hype knowing OSS is announced today "oh OSS came out and you can use Ollama Turbo to use it"

The subscription based pricing is really interesting. Other players offer this but not for API type services. I always imagine that there will be a real pricing war with LLMs with time / as capabilities mature, and going monthly pricing on API services is possibly a symptom of that

What does this mean for the local inference engine? Does Ollama have enough resources to maintain both?

Havoc•1h ago
That'll be an uphill battle on value proposition tbh. $20 a month for access to a widely available MoE 120B with ~5B active parameters at unspecified usage limits?

I guess their target audience values convenience and easy of use above all else so that could play well there maybe.

selcuka•40m ago
> Turbo includes hourly and daily limits to avoid capacity issues. Usage-based pricing will soon be available to consume models in a metered fashion.

Doesn't look that much better than a ChatGPT Plus subscription.

cchance•40m ago
20$ ... for the openai opensource models in preview only?
radioradioradio•40m ago
Looks like Docker's "offload" product, but with less functionality and more vendor lock-in, the simple pricing both excites and worries me.
agnishom•6m ago
> What is Turbo?

> Turbo is a new way to run open models using datacenter-grade hardware.

What? Why not just say that it is a cloud-based service for running models? Why this language?