I suppose Flash is merely a distillation of that. Filed under mildly interesting for now.
Also, according to the gpt-oss model card, the 20b model scores 60.7 on SWE-Bench Verified (GLM claims they measured 34 for that model) and the 120b scores 62.7, versus the 59.7 GLM reports.
And yet, in terms of coding performance (at least as measured by SWE-Bench Verified), it seems roughly on par with o3/GPT-5 mini, which would be pretty impressive for something you can realistically run at home, if it translates to real-world usage.
They also charge for cached tokens, so I burned through $4 on one relatively simple coding task. It would've cost under $1 with GPT-5.2-Codex, or with any other model that supports caching besides Opus and maybe Sonnet, and it would've been much faster.
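For a rough sense of where the money goes, here's a back-of-the-envelope sketch; every rate and token count in it is a made-up placeholder, not this provider's actual pricing:

    # Back-of-the-envelope cost comparison. All prices (per million tokens)
    # and token counts are hypothetical placeholders, not real pricing.
    def task_cost(prompt_tokens, cached_tokens, output_tokens,
                  input_price, cached_price, output_price):
        uncached = prompt_tokens - cached_tokens
        return (uncached * input_price
                + cached_tokens * cached_price
                + output_tokens * output_price) / 1e6

    # An agentic coding task re-sends a growing context every turn, so most
    # input tokens are repeats that a cache discount would make nearly free.
    print(task_cost(2_000_000, 0, 50_000, 2.0, 2.0, 8.0))          # ~$4.40, no caching
    print(task_cost(2_000_000, 1_800_000, 50_000, 2.0, 0.2, 8.0))  # ~$1.16, 90% cache hits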
This is a terrible "test" of model quality. All these models fail when your UI is out of distribution; Codex gets close but still fails.
In my experience, small-tier models are good for simple tasks like translation and trivia answering, but are useless for anything more complex. The 70B class and above is where models really start to shine.
Codex is notably higher quality but also has me waiting forever. Hopefully these small models get better and better, not just at benchmarks.
This user has also done a bunch of good quants:
And while it usually leads to higher-quality output, sometimes it doesn't, and I'm left with BS AI slop that Opus would have generated in just a couple of minutes anyway.
PhilippGille•1h ago
https://openrouter.ai/z-ai/glm-4.7-flash/providers
latchkey•38m ago
ssh admin.hotaisle.app
Yes, this should be made easier: you should be able to just get a VM with it pre-installed. Working on that.
omneity•34m ago
It took me quite some time to figure out the magic combination of versions and commits, and to build each dependency successfully to run on an MI325x.
latchkey•27m ago
Here is the magic (assuming a 4x)...
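(The exact version pins and commands are elided above. Purely as a hedged illustration, a minimal launch spread across 4 GPUs might look something like the vLLM sketch below, where the model id and every parameter are assumptions, not the actual setup being referenced.)

    # Hypothetical sketch: serving a model across 4 GPUs with vLLM's Python
    # API, assuming a ROCm-enabled vLLM build. The model id is an assumption.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="zai-org/GLM-4.7-Flash",  # placeholder model id
        tensor_parallel_size=4,         # shard weights across the 4 GPUs
    )
    params = SamplingParams(max_tokens=128, temperature=0.7)
    out = llm.generate(["Hello from an MI325x box"], params)
    print(out[0].outputs[0].text)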