Unsloth GLM-5.2 – How to Run Locally

50•TechTechTech•1h ago

Comments

xrd•46m ago

So close! My machine with 192GB RAM + RTX 3090 24GB can almost run this. It says it needs 24GB of VRAM and 256GB of RAM for MoE offloading.

https://unsloth.ai/docs/models/glm-5.2#usage-guide

In a prior thread, someone said it would take $500k in hardware:

https://news.ycombinator.com/item?id=48629970

mgambati•42m ago

With 2-bit you wouldn't get good results. The ideal range for coding is at least Q8.

kibibu•34m ago

According to this very article, 4-bit dynamic is essentially lossless

cheema33•29m ago

I have the RAM, but not the VRAM. What kind of speed/tps could you expect from a 3090 with 24GB of VRAM? I am somewhat tempted to pick up a GPU with 24GB of VRAM.

zuzululu•29m ago

I wonder if AMD's new AI chip can run this with ease. I'm seriously considering buying it. GLM 5.2 is just shy of GPT 5.4, so I would welcome offloading any grunt work locally.

I am very excited for local LLMs. I think we may have GPT 5.5-xhigh level of performance for under 2000 EUR.

This should put more pressure on the frontier models to avoid sitting on any fancy stuff and lower token prices as a whole.

Nothing beats a local LLM disconnected from the cloud.

Iolaum•22m ago

At full precision GLM 5.2 may be close to GPT 5.4. But at Q2, or whatever one needs in order to run it on a prosumer device, it will be worse.

Also, I'm not sure where you are getting the under-2k figure. I bought a Framework desktop 128GB last year and my setup was around 2.7k. The same setup now sells for around 4.7k.

kccqzy•18m ago

The AMD 395 chip supports up to 128GB unified RAM. So still not enough even at 1-bit quant unfortunately.

benjiro29•14m ago

"GLM 5.2 is just shy of GPT 5.4"... If you're running the full model, as in having 750GB (FP8) to 1.5TB (FP16) of memory available.

Do not mix the benchmark results of GLM 5.2 FP16/FP8 with FP4 or FP2.

* FP4 will mean an accuracy loss of about 3%. Not noticeable, but more chance for mistakes.

* FP2 ... which is what most people are able to run at home for a "reasonable" price. You're looking at over 17% loss in accuracy.

At that point, you're running at less than claude-sonnet-4.6, as the issues compound with accuracy losses. And reasonably priced is still in the ~$5000 range (192GB + GPU 32GB active/kv cache system).

For that price you could pay for a Codex / Claude Pro subscription for the next 4+ years and get better models (by default), let alone compared with an FP2 GLM 5.2 version. And you're looking at < 10 tps. A Mac Studio with 512GB will net you 18 to 20+ tps with FP4, but ... I mean, those used to be $10,000.

Unfortunately the local hardware cost is a major issue for running large models like that.
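The memory figures in this comment follow from simple arithmetic, which can be sketched roughly as below. The parameter count is an assumption back-derived from the "750GB at FP8" figure (one byte per weight), not a number stated in the thread, and the function name is made up for illustration. It counts weights only at a uniform bit width, so it ignores KV cache, activations, and the mixed bit widths that dynamic quants actually use.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Weights only: no KV cache, activations, or runtime overhead.
    """
    # params_billion * 1e9 weights * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


# Assumption: ~750B parameters, inferred from "750GB at FP8" in the comment above.
PARAMS_BILLION = 750.0

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_BILLION, bits):.1f} GB")
```

Real quantized files will differ from these numbers, since published quants usually keep some layers at higher precision.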

pheggs•28m ago

I feel like the gap is closing on being able to run good-enough models locally, even for coding, and I would assume that could make some companies a bit nervous. Am I wrong about that?

fny•27m ago

The RAM requirements are still pretty painful.

yieldcrv•9m ago

equilibrium in one or two more years on the consumer/prosumer side

think Apple M6 or M7 with a currently unforeseen denser memory style, 256gb RAM

a couple inference or cache improvements on the algorithmic side, using less ram for context windows and doubling token speed again

denser open source models, packing more experts for smaller active layers

it'll still be expensive but like $8,000 - $13,000 instead of $450,000 worth of B200s

CamouflagedKiwi•23m ago

The hardware requirements to run this locally are still very high. Seems far enough off mainstream for those companies not to be too worried yet.

cogman10•19m ago

I don't think so. I could easily see a company deciding to host and run these models for their own development. If you have a dev team of about 10 people, a one-time $50k investment in an LLM server has to be pretty tempting. Unlimited tokens, decent performance, upgrade options, and potential product integrations.

For companies wanting LLMs in their products in general, I have to think going the local LLM route is even more tempting. Somewhat dumb models are more than good enough for a lot of the things people are integrating LLMs into their products for.
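The hardware-versus-subscription trade-off raised in this thread comes down to a break-even calculation, sketched below. The $100 per seat per month fee is a hypothetical placeholder, not a quoted price, and the sketch ignores electricity, maintenance, and any quality gap between the local and hosted models.

```python
def break_even_months(hardware_cost: float,
                      monthly_fee_per_seat: float,
                      seats: int = 1) -> float:
    """Months of subscription fees needed to equal a one-time hardware cost."""
    if monthly_fee_per_seat <= 0 or seats <= 0:
        raise ValueError("fee and seats must be positive")
    return hardware_cost / (monthly_fee_per_seat * seats)


# Hypothetical fee; substitute the real per-seat price of the plan being compared.
FEE = 100.0

# A $50k server shared by a 10-person team, as in the comment above.
print(break_even_months(50_000, FEE, seats=10))

# A ~$5000 single-user box, as mentioned earlier in the thread.
print(break_even_months(5_000, FEE))
```

Under that assumed fee both cases break even at the same point, because the team server costs the same per seat as the single-user box.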
