GLM 4.5 with Claude Code

https://docs.z.ai/guides/llm/glm-4.5

184•vincirufus•23h ago

Comments

apparent•22h ago

I stopped when I got to this sentence and realized the article is written by one of the companies mentioned.

> GLM-4.5 and GLM-4.5-Air are our latest flagship models

Maybe it is great, but with a conflict of interest so obvious I can't exactly take their word for it.

JimDabell•22h ago

Z.AI is the company that created GLM and the link goes to their official documentation. It’s really weird to complain that their official documentation on their official website has a “conflict of interest”.

nicce•21h ago

Just out of curiosity, is the cost of such domain worth it or whether they were just lucky.

apparent•18h ago

The title has been changed. The original title was wildly positive, and OP has acknowledged it was inappropriate and changed it (see comments below).

My issue was with an article being posted with a title saying how amazing two things are together (making it seem like it was somehow an independent review), when it was actually just a marketing post by one of the companies.

stingraycharles•22h ago

Available on OpenRouter as well for those who want to test it: https://openrouter.ai/z-ai/glm-4.5

I would be interested to know where the claim of the “killer combination” comes from. I would also like to know who the people behind Z.ai are — I haven’t heard of them before. Their plans seem crazy cheap compared to Anthropic, especially if their models actually perform better than Opus.

ekidd•22h ago

> I would also like to know who the people behind Z.ai are — I haven’t heard of them before.

To be clear, Z.ai are the people who built GLM 4.5, so they're talking up their own product.

But to be fair, GLM 4.5 and GLM 4.5 Air are genuinely good coding models. GLM 4.5 Air costs about 10% of what Claude Sonnet does (when hosted on DeepInfra, at least), and it can perform simple coding tasks quite quickly. I haven't tested GLM 4.5 Air, but it seems to be popular as well.

If you can easily afford all the Claude Code tokens you want, then you'll probably get better results from Sonnet. But if you already know enough programming to work around any issues that arise, the GLM models are quite usable.

But you can't easily run GLM 4.5 Air quickly without professional workstation- or server-grade hardware (RTX 6000 Pro 96GB would be nice), at least not without a serious speed hit.

Still, it's a very interesting sign for the future of open coding models.

esafak•21h ago

For agentic coding I found the price difference more modest due to prompt caching, which most GLM providers on Openrouter don't offer, but Anthropic does. Look at the cache read/write columns: https://openrouter.ai/z-ai/glm-4.5

magicalhippo•19h ago

Been playing with Grok Code Fast 1 in Cline via Open Router. It supports prompt caching as far as I can tell, and it certainly is cheap. It's been quite good for the stuff I've tried. YMMV.

SparkyMcUnicorn•22h ago

When it comes to "real-world development scenarios" they claim to be closer to Sonnet 4.

This is the data for that claim: https://huggingface.co/datasets/zai-org/CC-Bench-trajectorie...

turingbook•22h ago

Actually Z.ai is a spinoff of Tsinghua University and one of the first China labs open sourcing its own large models (GLM released in 2021) . https://github.com/THUDM/GLM

throwaway314155•22h ago

It's a spinoff of the whole university?

cyp0633•21h ago

With a little search you can find it's a laboratory within the CS department of THU. It's a fairly large lab though, not those led by just one or two professors.

vincirufus•21h ago

Well I'd call them the poor person's claude code, wouldnt compare it with Opus but very close to Sonnet and Kimi

vincirufus•21h ago

update the title to not seem biased / hyped

arjie•22h ago

Okay, I'm going to try it, but why didn't you link the information on how to integrate it with Claude Code: https://docs.z.ai/scenario-example/develop-tools/claude

Chinese software always has such a design language:

- prepaid and then use credit to subscribe

- strange serif font

- that slider thing for captcha

But I'm going to try it out now.

vincirufus•21h ago

Ahh bugger I pasted the wrong link I had this one open in another tab..

tonyhart7•18h ago

I called it "chinnese chatpcha", back then chinnese chaptcha is so much harder than western counterpart

but now gchaptcha spam me with 5 different image if I missing a tiles for crossroad, so chinnese chaptcha is much better in my opinion

also there is variant that match the image based on shadow and different order of shape

its much better in my opinion because its use much more interactivity, solving western chaptcha is so much mind numbing now that they require you at least multiple image identification for crossroad,sign,cars etc

they want those self driving car are they

awestroke•18h ago

I assume both of the approaches are useless at actually stopping bots

whatevermom•17h ago

They deter newbies but this is not a problem for experienced developers.

Szpadel•18h ago

you can use any model with Claude code thanks to https://github.com/musistudio/claude-code-router

but in my testing other models do not work well, looks like prompts are either very optimized for Claude, or other models are just not great yet with such agentic environment

I was especially disappointed with grok code. it is very fast as advertised but in generating spaces and new lines in function calling until it hits max tokens. I wonder if that isn't why it gets so much tokens on openrouter.

gpt-5 just wasn't using the tools very well

I didn't tested glm yet, but with current anthropic subscription value, alternative would need to be very cheap if you consider daily use

edit: I noticed that also have very inexpensive subscription https://z.ai/subscribe, if they trained model to work well with CC this might actually be viable alternative

CuriouslyC•17h ago

You don't need claude code router to use GLM, just set the env var to the GLM url. Also, I generally advise people not to bother with claude code router, Bifrost can do the same job and it's much better software.

Szpadel•15h ago

I wasn't aware that there is an alternative

Quick glance over readme suggest it's only openai compatible but I also found HN post [1] explaining use of claude code with ollama

But anyways, claude-code-router have advantage of allowing request transformers, those are required for getting GitHub copilot as provider and grok-code limitations to messages format.

[1] https://news.ycombinator.com/item?id=45127509

sdesol•17h ago

> But in my testing, other models do not work well. It looks like prompts are either very optimized for Claude, or other models are just not great yet with such an agentic environment.

Anybody who has done any serious development with LLMs would know that prompts are not universal. The reason why Claude Code is good is because Anthropic knows Claude Sonnet is good, and that they only need to create prompts that work well with their models. They also have the ability to train their models to work with specific tools and so forth.

It really is a kind of fool's errand to try to create agents that can work well with many different models from different providers.

diggan•12h ago

> but in my testing other models do not work well, looks like prompts are either very optimized for Claude, or other models are just not great yet with such agentic environment

I think there are multiple things going on. First, models are either trained with tool calling in mind or not, the ones that don't, won't work well as agents. Secondly, each companies models are trained with the agent software in mind, and the agent software is built with their specific models in mind. Thirdly, each model responds differently to different system/user prompts, and the difference can be really stark.

I'm currently working on a tool that lets me execute the same prompts with the same environment over multiple agents. Currently I'm running Codex, Claude Code, Gemini, Qwen Code and AMP for every single change, just to see the differences in responses, and even reusing the same system prompt across all of them gives wildly different results. Not to mention how quickly the quality drops off the cliff as soon as you switch out any non-standard model for any of those CLIs. Mix-and-match models between those five tools, and it becomes clear as day that the model<>software is more interlocked than it seems.

The only project I've had success with switching out the model of, has been using GPT-OSS-120b locally with Codex, but that still required me to manually hack in support for changing the temperature, and changing the prompts Codex use a bit, to get OK results.

oceanplexian•6h ago

It’s probably not that hard to take the OSS models and fine tune them for CC. Which means with a little bit of reverse engineering and some free time you could get an open source models working perfectly with it.

Claude code router is a good first step. But you also need to MITM CC while it’s running and collect the back and forth for a while. I would do it if I had more free time, surprised someone smart hasn’t already tried.

Szpadel•5h ago

Of course it already was tried, example: https://github.com/Yuyz0112/claude-code-reverse

partyboy•14h ago

> prepaid and then use credit to subscribe

This is mainly because Chinese online payment infrastructure didn't have good support for subscriptions or auto payments (at least until relatively recently) so this pattern is the norm

rfoo•14h ago

It's more of a culture thing. People just hate the concept of "idk how much I'm going to pay let's just try this and find out later".

Also people would be confused as they expect things to be prepaid, so if you let them use the service they'd think it's a free trial or something, unless you literally put very big, clear price tag and require like triple confirmation. If not and you ask them to pay later they would perceive this as unfair deceptive tricks, and may scam you by report the loss of their credit card (!), because apparently disputing transactions in China is super hard.

tw1984•13h ago

I have been using alipay to pay for my Tencent Video monthly subscription for the past 8-9 years.

d4rkp4ttern•12h ago

For models available via Anthropic-compatible (currently Kimi-K2, GLM, Deepseek), the simplest way to use them with CC is by setting up a function in .zshrc:

https://github.com/pchalasani/claude-code-tools/tree/main?ta...

Surprised that Qwen didn’t do the same, (though I know they have their own CLI-coding agent).

steipete•22h ago

Been using that for a while, first Chinese model that works REALLY well!

Also fascinating how they solved the issue that Claude expects a 200+k token model while GLM 4.5 has 128k.

raincole•22h ago

I wonder how you justify this editorialized title, and if HN mods share your justification. The linked article has no the word "killer" in it.

I think this is why many people have concerns about AI. This group can't express neutral ideas. They have to hype about a simple official documentation page.

vincirufus•21h ago

feedback accepted got rid of the killer bits

Jcampuzano2•22h ago

Hmm with the lower context length I'm wonder how it holds up for problems requiring slightly larger context given we know most models tend to degrade fairly quickly with context length.

Maybe it's best for shorter tasks or condensed context?

I find it interesting the number of models latching onto Claude codes harness. I'm still using Cursor for work and personal but tried out open code and Claude for a bit. I just miss having the checkpoints and whatnot.

CuriouslyC•21h ago

https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...

saretup•18h ago

Interesting, although how hard is it to add a sorting functionality to the table?

chisleu•21h ago

I've been using GLM 4.5 and GLM 4.5 Air for a while now. The Air model is light enough to run on a macbook pro and is useful for Cline. I can run the full GLM model on my Mac Studio, but the TPS is so slow that it's only useful for chatting. So I hooked up with openrouter to try but didn't have the same success. Any of the open weight models I try with open router give sub standard results. I get better results from Qwen 3 coder 30b a3b locally than I get from Qwen 3 Coder 480b through open router.

I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.

vincirufus•21h ago

yeah I too have heard similar concerns with Open models on OpenRouter, but haven't been able to verify it, as I don't use that a lot

numlocked•21h ago

(OpenRouter COO here) We are starting to test this and verify the deployments. More to come on that front -- but long story short is that we don't have good evidence that providers are doing weird stuff that materially affects model accuracy. If you have data points to the contrary, we would love them.

We are heavily incentivized to prioritize/make transparent high-quality inference and have no incentive to offer quantized/poorly-performing alternatives. We certainly hear plenty of anecdotal reports like this, but when we dig in we generally don't see it.

An exception is when a model is first released -- for example this terrific work by artificial analysis: https://x.com/ArtificialAnlys/status/1955102409044398415

It does take providers time to learn how to run the models in a high quality way; my expectation is that the difference in quality will be (or already is) minimal over time. The large variance in that case was because GPT OSS had only been out for a couple of weeks.

For well-established models, our (admittedly limited) testing has not revealed much variance between providers in terms of quality. There is some but it's not like we see a couple of providers 'cheating' by secretly quantizing and clearly serving less intelligence versions of the model. We're going to get more systematic about it though and perhaps will uncover some surprises.

chandureddyvari•21h ago

Unsolicited advice: Why doesn’t open router provide hosting services for OSS models that guarantee non-quantised versions of the LLMs? Would be a win-win for everyone.

jatins•20h ago

In fact I thought that's what OpenRouter was hosting them all along

jjani•20h ago

Would make very little business sense at this point - currently they have an effective monopoly on routing. Hosting would just make them one provider among a few dozen. It would make the other providers less likely to offer their services through openrouter. It would come with lots of concerns that openrouter would favor routing towards their own offerings. It would be a huge distraction to their core business which is still rapidly growing. Would need massive capital investment. And another thousand reasons I haven't thought of.

indigodaddy•20h ago

So what's the deal with Chutes and all the throttling and errors. Seems like users are losing their minds over this.. at least from all the reddit threads I'm seeing

typpilol•15h ago

What's chutes?

arcanemachiner•15h ago

Cheap provider on OpenRouter:

https://openrouter.ai/provider/chutes

typpilol•5h ago

Ahh. Thanks

blitzar•17h ago

> We ... have no incentive to offer quantized/poorly-performing alternatives

However your providers do have such an incentive.

KronisLV•17h ago

> I get better results from Qwen 3 coder 30b a3b locally than I get from Qwen 3 Coder 480b through open router. I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.

This doesn't match my experience precisely, but I've definitely had cases where some of the providers had consistently worse output for the same model than others, the solution there was to figure out which ones those are and to denylist them in the UI.

As for quantized versions, you can check it for each model and provider, for example: https://openrouter.ai/qwen/qwen3-coder/providers

You can see that these providers run FP4 versions:

  * DeepInfra (Turbo)

And these providers run FP8 versions:

  * Chutes
  * GMICloud
  * NovitaAI
  * Baseten
  * Parasail
  * Nebius AI Studio
  * AtlasCloud
  * Targon
  * Together
  * Hyperbolic
  * Cerebras

I will say that it's not all bad and my experience with FP8 output has been pretty decent, especially when I need something done quickly and choose to use Cerebras - provided their service isn't overloaded, their TPS is really, really good.

You can also request specific precision on a per request basis: https://openrouter.ai/docs/features/provider-routing#quantiz... (or just make a custom preset)

snthpy•16h ago

Interesting. Thanks for sharing. What about qwen3-coder on Cerebras? I'm happy to pay the $50 for the speed as long as results are good. How does it compare with glm-4.5?

KronisLV•14h ago

I wish that Cerebras had a direct pay per use API option instead of pushing you towards OpenRouter and HuggingFace (the former sometimes throws 429, so either the speed is great, or there is no speed): https://www.cerebras.ai/pricing but I imagine that for most folks their subscription would be more than enough!

As for how Qwen3 Coder performs, there's always SWE-bench: https://www.swebench.com/

By the numbers:

  * it sits between Gemini 2.5 Pro and GPT-5 mini
  * it beats out Kimi K2 and the older Claude Sonnet 3.7
  * but loses out to Claude Sonnet 4 and GPT-5

Personally, I find it sufficient for most tasks (from recommendations and questions to as close to vibe coding as I get) on a technical level. GLM 4.5 isn't on the site at the time of writing this, but they should match one another pretty closely. Feeling wise, I still very much prefer Sonnet 4 to everything else, but it's both expensive and way slower than Cerebras (not even close).

Update: also seems like the Growth plan on their page says "Starting from 1500 USD / month" which is a bit silly when the new cheapest subscription is 50 USD / month.

jbellis•10h ago

Quantization matters a lot more than r/locallama wants to believe. Here's Qwen3 Coder vs Qwen3 Coder @fp8: https://brokk.ai/power-ranking?version=openround-2025-08-20&...

sergiotapia•21h ago

Used it to fix a couple of bugs just now in Elixir and it runs very fast, faster than Codex with GPT-5 medium or high.

This is quite nice. Will try it out a bit longer over the weekend. I tested it using Claude Code with env variables overrides.

cpursley•13h ago

Good the hear it’s competent with Elixir! How would you compare it to Claude?

abrookewood•20h ago

So you can use Claude Code with other models? I had assumed that it was tied to your subscription and that was that.

adastra22•18h ago

It is, but people figure out the Claude Code API and provide API compatible endpoints.

sagarpatil•20h ago

I was blown away by this model. It was definitely comparable to sonnet 4. In some of my tests, it performed as good as Opus. I subscribed to their paid plan, and now the model seems dumb? I asked it to find and replace a string. It only made the change in one file. Codex worked fine. Can Z.ai confirm if this is the model we get through their API or is it quantized for Claude Code use?

KronisLV•17h ago

This is really cool and should work well with something like RooCode as well. Usually I keep going back to either Claude Sonnet or Gemini 2.5 Pro (also tried out GPT-5, was quite unimpressed) but both of those are relatively expensive.

I've tried using the more expensive model for planning and something a bit cheaper for doing the bulk of changes (the Plan / Ask and Code modes in RooCode) which works pretty nicely, but settling on just one model like GLM 4.5 would be lovely! Closest to that I've gotten to up until now has been the Qwen3 Coder model on OpenRouter.

I think I used about 40M tokens with Claude Sonnet last month, more on Gemini and others, that's a bit expensive for my liking.

jedisct1•17h ago

Not just Claude Code. Their plans $3 and $15 plans work even better with tools like Roo Code.

After Claude models have recently become dumb, I switched to Qwen3-Coder (there's a very generous free tier) and GLM4.5, and I'm not looking back.

faangguyindia•14h ago

How do you use qwen3 coder free tier?

runze•14h ago

https://x.com/Alibaba_Qwen/status/1953835877555151134

peepee1982•14h ago

You can install Qwen Code (the Gemini CLI fork) and use OAuth for authentication. That will give you 2000 free requests per day, no token limits.

RooCode can use OAuth as well (but you have to install Qwen Code first).

unsupp0rted•16h ago

Anthropic can't compete with this on cost. They're probably bleeding money as it is.

But they can sort of compete on model quality, by no longer dumbing down their models. That'll be expensive too, but it's a lever they have.

aurareturn•13h ago

One of the articles posted here a week ago says Claude Code has about a ~20x margin on inference. So they can compete on cost if they want.

unsupp0rted•12h ago

Is Claude Code enterprise usage keeping the rest of the company afloat and able to keep raising?

prmph•15h ago

It's weird, but using Claude Code (CC) with GLM 4.5 from Z.ai, I spent $5 just starting CC and asking it to read the spec and guideline files. This on an API that advertises $0.6 per million tokens input, and $2.2 per million tokens output.

Then, I noticed that after the original prompt, any further prompt I gave CC (other that approving its actions) did not have any effect at all on it. Yes, any prompt is totally ignored. I have to stop CC and restart with the new prompt for it to take effect.

This is really strange to me. Does CC have some way of detecting when it is running against an API other than Anthropic's? The massive cost and/or token usage and crippled agentic mode is clearly not due to GLM 4.5 not being capable. Either CC is crippled in some way when using a 3rd party API, or else Z.ai's API is somehow not working properly.

conradev•12h ago

The graphs that some of these companies make are brutal: https://docs.z.ai/guides/llm/glm-4.5#higher-parameter-effici...

"So, we only know the Y-axis for some models in the scatter plot. Let's make up an X-axis value on the bad side of the graph and include the data points anyway."

Visually disingenuous!

croemer•12h ago

Which x axis was made up? You mean showing barplots in descending order of scores?

conradev•5h ago

No, the scatter plot on parameter efficiency!

“Model Parameters (B)” has an “Unknown” column to the right of 1000, implying those models are 1200B or more parameters, which is just not the case.

It renders the comparison useless, they can really only claim best efficiency against other open weight models.

saretup•11h ago

Surprised to see no one here is worried about privacy. Z.ai's privacy policy pretty much allows them to store and use prompts/code in a perpetual, irrevocable, worldwide, sublicensable license to use and train their models.

Alifatisk•11h ago

I have already assumed that on every chat service I use, they store and train on my prompts either way, since I use the service for free. If you actually want privacy, host it yourself or look for a llm service with another business-model

Alifatisk•11h ago

Something I noted is that Z.ai is the first chat app that does not ask for feedback on each response. Don't they want it?

mehdibl•10h ago

CC system prompt is tuned for Sonnet 4/Opus

Why trying CC why not using alternatives like Opencode or Goose that are open and made for this purpose instead of trying to use a tool tailored by Anthropic for Anthropic models?

anuramat•9h ago

I guess "claude code but 20x cheaper" sells better; you can still use their subscription with other agents, just use their anthropic-compatible API

exographicskip•8h ago

Was thinking the same. opencode seems like a better dx for this kind of thing

hamdingers•5h ago

I have all the CC tokens I want at work and like the tool, but can't justify the cost at home for personal projects.

I've been using it with DeepSeek to great effect, and just took GLM 4.5 for a Saturday morning test drive and early results are even better.

SingAlong•7h ago

Been using this for two days now. Notes below:

* This is an excellent alternative to Sonnet - which was my daily driver. I'm glad I tried GLM 4.5. You won't find any difference.

* My context usage per session was about 50% with Sonnet earlier, but it fills up fast with GLM, and I hit 80-90% often. Could be the lower context size that is hurting.

* Sonnet used to be very painful to work with as the context size goes beyond 80% (hence my habit of shorter conversations). GLM holds itself well even until the last bit of remaining context. Does not deviate from the assigned task.

mkw2000•5h ago

Is it just me or is the documentation for setting this up extremely unhelpful?

Utah's hottest new power source is 15k feet below the ground

How the "Kim" dump exposed North Korea's credential theft playbook

A Navajo weaving of an integrated circuit: the 555 timer

Shipping textures as PNGs is suboptimal

I'm Making a Beautiful, Aesthetic and Open-Source Platform for Learning Japanese

C++26: Erroneous Behaviour

Troubleshooting ZFS – Common Issues and How to Fix Them

A history of metaphorical brain talk in psychiatry

Over 80% of Sunscreen Performed Below Their Labelled Efficacy (2020)

Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

We hacked Burger King: How auth bypass led to drive-thru audio surveillance

The maths you need to start understanding LLMs

Oldest recorded transaction

What to Do with an Old iPad

Anonymous recursive functions in Racket

Stop writing CLI validation. Parse it right the first time

Using Claude Code SDK to reduce E2E test time

Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul

GigaByte CXL memory expansion card with up to 512GB DRAM

Microsoft Azure: "Multiple international subsea cables were cut in the Red Sea"

Why language models hallucinate

Processing Piano Tutorial Videos in the Browser

Gloria funicular derailment initial findings report (EN) [pdf]

AI surveillance should be banned while there is still time

Baby's first type checker

Qantas is cutting executive bonuses after data breach

William James at CERN (1995)

Rug pulls, forks, and open-source feudalism

Rust tool for generating random fractals

Europe enters the exascale supercomputing league with Jupiter

Utah's hottest new power source is 15k feet below the ground

How the "Kim" dump exposed North Korea's credential theft playbook

A Navajo weaving of an integrated circuit: the 555 timer

Shipping textures as PNGs is suboptimal

I'm Making a Beautiful, Aesthetic and Open-Source Platform for Learning Japanese

C++26: Erroneous Behaviour

Troubleshooting ZFS – Common Issues and How to Fix Them

A history of metaphorical brain talk in psychiatry

Over 80% of Sunscreen Performed Below Their Labelled Efficacy (2020)

Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

We hacked Burger King: How auth bypass led to drive-thru audio surveillance

The maths you need to start understanding LLMs

Oldest recorded transaction

What to Do with an Old iPad

Anonymous recursive functions in Racket

Stop writing CLI validation. Parse it right the first time

Using Claude Code SDK to reduce E2E test time

Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul

GigaByte CXL memory expansion card with up to 512GB DRAM

Microsoft Azure: "Multiple international subsea cables were cut in the Red Sea"

Why language models hallucinate

Processing Piano Tutorial Videos in the Browser

Gloria funicular derailment initial findings report (EN) [pdf]

AI surveillance should be banned while there is still time

Baby's first type checker

Qantas is cutting executive bonuses after data breach

William James at CERN (1995)

Rug pulls, forks, and open-source feudalism

Rust tool for generating random fractals

Europe enters the exascale supercomputing league with Jupiter

GLM 4.5 with Claude Code

Comments