Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI

https://github.com/ggml-org/llama.cpp/discussions/19759

215•lairv•1h ago

Comments

rvz•1h ago

This acquisition is almost the same as the acquisition of Bun by Anthropic.

Both are $0-revenue "companies" that have created software that is essential to the wider ecosystem and has mindshare value: Bun for JavaScript and Ggml for AI models.

But of course the VCs needed an exit sooner or later. That was inevitable.

andsoitis•23m ago

I believe ggml.ai was funded by angel investors, not VC.

jimmydoe•1h ago

Amazing. I like the openness of both projects and am really excited for them.

Hopefully this does not mean consolidation because resources are drying up, but a true fusion of the best.

mnewme•1h ago

Huggingface is the silent GOAT of the AI space, such a great community and platform

lairv•1h ago

Truly amazing that they've managed to build an open and profitable platform without shady practices

al_borland•1h ago

It’s such a sad state of affairs when shady practices are so normal that finding a company without them is noteworthy.

geooff_•1h ago

As someone who's been in the "AI" space for a while, it's strange how Hugging Face went from one of the biggest names to not a part of the discussion at all.

r_lee•1h ago

I think that's because there's less local AI usage now since there's all kinds of image models by the big labs, so there's really no rush of people self hosting stable diffusion etc anymore

the space moved from Consumer to Enterprise pretty fast due to models getting bigger

zozbot234•1h ago

Today's free models are not really bigger when you account for the use of MoE (with ever increasing sparsity, meaning a smaller fraction of active parameters), and better ways of managing KV caching. You can do useful things with very little RAM/VRAM, it just gets slower and slower the more you try to squeeze it where it doesn't quite belong. But that's not a problem if you're willing to wait for every answer.

LatencyKills•1h ago

It isn't necessary to be part of the discussion if you are truly adding value (which HF continues to do). It's nice to see a company doing what it does best without constantly driving the hype train.

segmondy•28m ago

part of what discussion? anyone in the AI space knows and uses HF, but the public doesn't give a care and why should they? It's just an advanced site where nerds download AI stuff. HF is super valuable with their transformers library, their code, tutorials, smol-models, etc, but how does it translate to investor dollars?

HanClinto•1h ago

I'm regularly amazed that HuggingFace is able to make money. It does so much good for the world.

How solid is its business model? Is it long-term viable? Will they ever "sell out"?

I_am_tiberius•1h ago

I once tried Hugging Face because I was working through some tutorial. They wanted my credit card details during registration, as far as I remember. After a month they invoiced me some amount of money and I had no idea what it was for. To be honest, I don't understand what exactly they do and what services I was paying for, but I cancelled my account and never touched it again. For me that was a totally non-transparent process.

shafyy•1h ago

Their pricing seems pretty transparent: https://huggingface.co/pricing

dmezzetti•1h ago

They have paid hosting - https://huggingface.co/enterprise and paid accounts. Also consulting services. Seems like a pretty good foundation to me.

dmezzetti•1h ago

This is really great news. I've been one of the strongest supporters of local AI, dedicating thousands of hours towards building a framework to enable it. I'm looking forward to seeing what comes of it!

logicallee•46m ago

>I've been one of the strongest supporters of local AI, dedicating thousands of hours towards building a framework to enable it.

Sounds like you're very serious about supporting local AI. I have a query for you (and anyone else who feels like donating) about whether you'd be willing to donate some memory/bandwidth resources p2p to hosting an offline model:

We have a local model we would like to distribute but don't have a good CDN.

As a user/supporter question, would you be willing to donate some spare memory/bandwidth in a simple dedicated browser tab you keep open on your desktop that plays silent audio (so it isn't put in the background and unloaded) and then allocates 100 MB to 1 GB of RAM and acts as a WebRTC peer, serving checksummed models?[1] (Then our server only has to check that you still have it from time to time, by sending you some salt and a part of the file to hash, and your tab proves it still has it by doing so.) This doesn't require any trust, and the receiving user will also hash it and report if there's a mismatch.
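A minimal sketch of the salted audit described above, written in Python for readability rather than as the browser code a real tab would run. The function names, the 4096-byte chunk size, and the choice of SHA-256 are all assumptions for illustration; the comment only says the server sends a salt and asks for part of the file to be hashed.

```python
import hashlib
import hmac
import secrets


def make_challenge(file_size: int, chunk_len: int = 4096) -> dict:
    """Server side: pick a fresh random salt and a random byte range to audit."""
    length = min(chunk_len, file_size)
    # randbelow(n) returns a value in [0, n), so the range always fits the file.
    offset = secrets.randbelow(file_size - length + 1)
    return {"salt": secrets.token_bytes(16), "offset": offset, "length": length}


def answer_challenge(data: bytes, challenge: dict) -> str:
    """Peer side: hash the salt followed by the requested slice of the stored file."""
    start = challenge["offset"]
    chunk = data[start:start + challenge["length"]]
    return hashlib.sha256(challenge["salt"] + chunk).hexdigest()


def verify(reference: bytes, challenge: dict, response: str) -> bool:
    """Server side: recompute the answer from the reference copy and compare."""
    expected = answer_challenge(reference, challenge)
    return hmac.compare_digest(expected, response)
```

Because the salt and the byte range change on every audit, a peer cannot answer from a precomputed hash and has to keep the bytes around. Note that in this sketch the server still needs its own copy of the file (or at least of the audited range) to check the answer.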

Our server federates the p2p connections, so when someone downloads they do so from a trusted peer (one who has contributed and passed the audits) like you. We considered building a binary for people to run, but we figured people couldn't trust our binaries, or someone would target our build process somehow. We are paranoid about trust, whereas a web model is inherently untrusted and safer. Why do all this?

The purpose of this would be to host an offline model: we successfully ported a 1 GB model from C++ and Python to WASM and WebGPU (you can see Claude doing so here, we livestreamed some of it[2]), but the model weights at 1 GB are too much for us to host.

Please let us know whether this is something you would contribute a background tab to hosting on your desktop. It wouldn't impact you much and you could set how much memory to dedicate to it, but you would have the good feeling of knowing that you're helping people run a trusted offline model if they want - from their very own browser, no download required. The model we ported is fast enough for anyone to run on their own machines. Let me know if this is something you'd be willing to keep a tab open for.

[1] filesharing over webrtc works like this: https://taonexus.com/p2pfilesharing/ you can try it in 2 browser tabs.

[2] https://www.youtube.com/watch?v=tbAkySCXyp0 and some other videos

beoberha•1h ago

Seems like a great fit - kinda surprised it didn’t happen sooner. I think we are deep in the valley of local AI, but I’d be willing to bet it breaks out in the next 2-3 years. Here’s hoping!

mythz•1h ago

I consider HuggingFace more "Open AI" than OpenAI - one of the few quiet heroes (along with Chinese OSS) helping bring on-premise AI to the masses.

I'm old enough to remember when traffic was expensive, so I've no idea how they've managed to offer free hosting for so many models. Hopefully it's backed by a sustainable business model, as the ecosystem would be meaningfully worse without them.

We still need good value hardware to run Kimi/GLM in-house, but at least we've got the weights and distribution sorted.

zozbot234•1h ago

> We still need good value hardware to run Kimi/GLM in-house

If you stream weights in from SSD storage and freely use swap to extend your KV cache it will be really slow (multiple seconds per token!) but run on basically anything. And that's still really good for stuff that can be computed overnight, perhaps even by batching many requests simultaneously. It gets progressively better as you add more compute, of course.
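A rough back-of-the-envelope sketch of where "multiple seconds per token" can come from. It assumes every active weight is read from the SSD once per generated token, nothing is cached in RAM, and compute is never the bottleneck; the 32B active parameters, 4-bit quantization, and 3 GB/s read speed below are hypothetical numbers, not measurements.

```python
def seconds_per_token(active_params: float,
                      bits_per_weight: float,
                      read_bytes_per_s: float) -> float:
    """Estimate time per token when all active weights are streamed from disk."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bytes_per_token / read_bytes_per_s


# Hypothetical setup: 32B active parameters, 4-bit weights, 3 GB/s SSD reads.
estimate = seconds_per_token(32e9, 4, 3e9)
print(f"about {estimate:.1f} s per token")
```

Anything that stays resident in RAM between tokens reduces the bytes that have to be streamed, so real numbers can be better than this estimate.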

HPsquared•39m ago

At a certain point the energy starts to cost more than renting some GPUs.

data-ottawa•1h ago

Can we toss in the work unsloth does too as an unsung hero?

They provide excellent documentation and they’re often very quick to get high quality quants up in major formats. They’re a very trustworthy brand.

cubie•50m ago

I'm a big fan of their work as well, good shout.

disiplus•26m ago

Yeah, they're the good guys. I suspect the open source work is mostly advertisements for them to sell consulting and services to enterprises. Otherwise, the work they do doesn't make sense to offer for free.

sowbug•20m ago

Why doesn't HF support BitTorrent? I know about hf-torrent and hf_transfer, but those aren't nearly as accessible as a link in the web UI.

the__alchemist•1h ago

Does anyone have a good comparison of HuggingFace/Candle to Burn? I am testing them concurrently, and Burn seems to have an easier-to-use API. (And can use Candle as a backend, which is confusing) When I ask on Reddit or Discord channels, people overwhelmingly recommend Burn, but provide no concrete reasons beyond "Candle is more for inference while Burn is training and inference". This doesn't track, as I've done training on Candle. So, if you've used both: Thoughts?

dhruv3006•54m ago

Huggingface is actually something that's driving good in the world. Good to see this collab.

androiddrew•53m ago

One of the few acquisitions I do support

tkp-415•52m ago

Can anyone point me in the direction of getting a model to run locally and efficiently inside something like a Docker container on a system with not so strong computing power (aka a MacBook M1 with 8 GB of memory)?

Is my only option to invest in a system with more computing power? These local models look great, especially something like https://huggingface.co/AlicanKiraz0/Cybersecurity-BaronLLM_O... for assisting in penetration testing.

I've experimented with a variety of configurations on my local system, but in the end it turns into a makeshift heater.

xrd•47m ago

I think a better bet is to ask on reddit.

https://www.reddit.com/r/LocalLLM/

Every time I ask the same thing here, people point me there.

zozbot234•46m ago

The general rule of thumb is that you should feel free to quantize even as low as 2 bits average if this helps you run a model with more active parameters. Quantized models are not perfect at all, but they're preferable to models with fewer, higher-precision parameters. With 8GB usable, you could run models with up to 32B active at heavy quantization.
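The arithmetic behind that 32B figure can be sketched as below. This counts weight storage only (parameters times average bits per weight, divided by 8); the KV cache, activations, and whatever the OS needs are not included, so the usable model size in practice is smaller. The three configurations are illustrative choices, not recommendations.

```python
def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone at a given average bit width."""
    return params * bits_per_weight / 8


GB = 1e9
for params_b, bits in [(32, 2), (8, 8), (4, 16)]:
    size_gb = weight_bytes(params_b * 1e9, bits) / GB
    print(f"{params_b}B params at {bits} bits/weight: {size_gb:.1f} GB of weights")
```

All three rows come out at the same weight footprint, which is the trade-off being described: the same memory budget buys either many low-precision parameters or few high-precision ones.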

mft_•21m ago

There’s no way around needing a powerful-enough system to run the model. So you either choose a model that can fit on what you have —i.e. via a small model, or a quantised slightly larger model— or you access more powerful hardware, either by buying it or renting it. (IME you don’t need Docker. For an easy start just install LM Studio and have a play.)

I picked up a second-hand 64GB M1 Max MacBook Pro a while back for not too much money for such experimentation. It’s sufficiently fast at running any LLM models that it can fit in memory, but the gap between those models and Claude is considerable. However, this might be a path for you? It can also run all manner of diffusion models, but there the performance suffers (vs. an older discrete GPU) and you’re waiting sometimes many minutes for an edit or an image.

sigbottle•17m ago

Are mac kernels optimized compared to CUDA kernels? I know that the unified GPU approach is inherently slower, but I thought a ton of optimizations were at the kernel level too (CUDA itself is a moat)

ryandrake•5m ago

I wasn't able to have very satisfying success until I bit the bullet and threw a GPU at the problem. Found an actually reasonably priced A4000 Ada generation 20GB GPU on eBay and never looked back. I still can't run the insanely large models, but 20GB should hold me over for a while, and I didn't have to upgrade my 10 year old Ivy Bridge vintage homelab.

option•46m ago

Isn't HF banned in China? Also, how are many Chinese labs on Twitter all the time?

In either case - huge thanks to them for keeping AI open!

woadwarrior01•39m ago

HF is indeed banned in China. The Chinese equivalent of HF is ModelScope[1].

[1]: https://modelscope.cn/

disiplus•24m ago

I think in the West we think everything is blocked. But for example, if you book an eSIM, when you visit you already get direct access to Western services because they route it to some other server. Hong Kong is totally different: they basically use WhatsApp and Google Maps, and everything worked when I was there.

dragonwriter•1m ago

> Isn't HF banned in China?

I think, for some definition of “banned”, that’s the case. It doesn’t stop the Chinese labs from having organization accounts on HF and distributing models there. ModelScope is apparently the HF-equivalent for reaching Chinese users.

segmondy•31m ago

Great news! I have always worried about ggml's long-term prospects and wished for them to be rewarded for their effort.
