I would be interested to know where the claim of the “killer combination” comes from. I would also like to know who the people behind Z.ai are — I haven’t heard of them before. Their plans seem crazy cheap compared to Anthropic, especially if their models actually perform better than Opus.
To be clear, Z.ai are the people who built GLM 4.5, so they're talking up their own product.
But to be fair, GLM 4.5 and GLM 4.5 Air are genuinely good coding models. GLM 4.5 Air costs about 10% of what Claude Sonnet does (when hosted on DeepInfra, at least), and it handles simple coding tasks quite quickly. I haven't tested the full GLM 4.5, but it seems to be popular as well.
If you can easily afford all the Claude Code tokens you want, then you'll probably get better results from Sonnet. But if you already know enough programming to work around any issues that arise, the GLM models are quite usable.
But you can't easily run GLM 4.5 Air locally without professional workstation- or server-grade hardware (an RTX 6000 Pro with 96 GB would be nice), at least not without a serious speed hit.
Still, it's a very interesting sign for the future of open coding models.
This is the data for that claim: https://huggingface.co/datasets/zai-org/CC-Bench-trajectorie...
Chinese software always has such a distinctive design language:
- prepay and then use credit to subscribe
- strange serif font
- that slider thing for captcha
But I'm going to try it out now.
Also fascinating how they solved the issue that Claude Code expects a model with a 200k+ token context window while GLM 4.5 has 128k.
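I don't know what they actually did, but the usual trick in a compatibility layer is to compact or drop older turns until the request fits the smaller window. A minimal sketch of that idea in Python, with a stand-in token counter (not their implementation):

  # Drop the oldest non-system turns until the request fits a smaller context window.
  # count_tokens is a rough stand-in; a real proxy would use the model's tokenizer.
  def count_tokens(msg: dict) -> int:
      return len(msg["content"]) // 4  # crude chars-per-token heuristic

  def fit_to_window(messages: list[dict], limit: int = 128_000, reserve: int = 8_000) -> list[dict]:
      budget = limit - reserve  # leave headroom for the model's reply
      system = [m for m in messages if m["role"] == "system"]
      rest = [m for m in messages if m["role"] != "system"]
      while rest and sum(map(count_tokens, system + rest)) > budget:
          rest.pop(0)  # the oldest user/assistant turn goes first
      return system + rest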
I think this is why many people have concerns about AI. This group can't express ideas neutrally; they have to hype up even a simple official documentation page.
Maybe it's best for shorter tasks or condensed context?
I find it interesting how many models are latching onto Claude Code's harness. I'm still using Cursor for work and personal projects, but I tried out opencode and Claude Code for a bit. I just miss having the checkpoints and whatnot.
I'm really concerned that some of the providers are serving quantized versions of the models so they can fit more models per card and run larger inference batches.
We are heavily incentivized to prioritize high-quality inference and make it transparent, and we have no incentive to offer quantized, poorly performing alternatives. We certainly hear plenty of anecdotal reports like this, but when we dig in, we generally don't see it.
An exception is when a model is first released -- for example, this terrific work by Artificial Analysis: https://x.com/ArtificialAnlys/status/1955102409044398415
It does take providers time to learn how to run the models in a high-quality way; my expectation is that the difference in quality will be (or already is) minimal over time. The large variance in that case was because gpt-oss had only been out for a couple of weeks.
For well-established models, our (admittedly limited) testing has not revealed much variance between providers in terms of quality. There is some, but it's not as if we see a couple of providers 'cheating' by secretly quantizing and clearly serving less intelligent versions of the model. We're going to get more systematic about it, though, and perhaps will uncover some surprises.
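If you want to spot-check this yourself, one rough approach is to pin the same prompt to different providers and diff the answers. A sketch using the OpenAI SDK against OpenRouter's provider-routing option; the provider slugs and model id are illustrative, and a single greedy run is only a crude signal, not proof of quantization:

  from openai import OpenAI

  client = OpenAI(
      base_url="https://openrouter.ai/api/v1",
      api_key="YOUR_OPENROUTER_KEY",
  )

  prompt = "Implement an LRU cache in Python with O(1) get and put."

  # Pin the request to one provider at a time and compare the answers.
  for provider in ["deepinfra", "novita"]:  # illustrative provider slugs
      resp = client.chat.completions.create(
          model="z-ai/glm-4.5-air",  # assumed model id, check the catalog
          messages=[{"role": "user", "content": prompt}],
          temperature=0,
          extra_body={"provider": {"order": [provider], "allow_fallbacks": False}},
      )
      print(provider, resp.choices[0].message.content[:200])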
This is quite nice. I'll try it out a bit longer over the weekend. I tested it using Claude Code with environment variable overrides.
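In case it's useful to anyone else, this is roughly the shape of the override, sketched with the Anthropic Python SDK; the base URL and model id are what I recall from their docs, so double-check them:

  import anthropic

  # Point the Anthropic SDK at an Anthropic-compatible endpoint instead of the default.
  # Claude Code itself picks up the same settings from ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN.
  client = anthropic.Anthropic(
      base_url="https://api.z.ai/api/anthropic",  # assumed endpoint, check Z.ai's docs
      api_key="YOUR_ZAI_KEY",
  )

  msg = client.messages.create(
      model="glm-4.5",  # assumed model id
      max_tokens=1024,
      messages=[{"role": "user", "content": "Write a binary search in Python."}],
  )
  print(msg.content[0].text)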
> GLM-4.5 and GLM-4.5-Air are our latest flagship models
Maybe it is great, but with a conflict of interest so obvious I can't exactly take their word for it.