frontpage.

Shift more left with coding agents

https://gricha.dev/blog/shift-more-left-with-coding-agents
1•Gricha•36s ago•0 comments

The BoardGameGeek Hall of Fame 2026 – Day 1 Inductee

https://boardgamegeek.com/blog/1/blogpost/182402/the-boardgamegeek-hall-of-fame-2026-day-1-inductee
1•Tomte•43s ago•0 comments

Running a vLLM LXC on Proxmox 9 with Nvidia GPU Passthrough

https://medium.com/@jakeasmith/running-a-vllm-lxc-on-proxmox-9-f7fbb8a7db2f
1•jakeasmith•2m ago•1 comments

IntelliJ IDEA: The Documentary, trailer [video]

https://www.youtube.com/watch?v=TwMXi6tDzLE
1•dmitrijbelikov•2m ago•0 comments

GameTank: Open-source, 8-bit console live on CrowdSupply

https://www.crowdsupply.com/clydeware/gametank
1•nick_g•2m ago•0 comments

An introduction to XET, Hugging Face's storage system

https://00f.net/2026/01/19/xet-intro-1/
1•zdw•3m ago•0 comments

IMF warns global economic resilience at risk if AI falters

https://www.ft.com/content/2af4d92a-452c-4d35-ab55-3afce930f98a
2•ptrhvns•4m ago•0 comments

Show HN: Picker – a simple macOS app to help you choose when you can't decide

1•gnoky•6m ago•0 comments

QMD – Quick Markdown Search

https://github.com/tobi/qmd
1•kenrose•7m ago•1 comments

Fix Your Robots.txt or Your Site Disappears from Google

https://www.alanwsmith.com/en/37/wa/jz/s1/
1•bobbiechen•7m ago•0 comments

Mozilla's Firefox Nightly .rpm package for RPM based Linux distros

https://blog.nightly.mozilla.org/2026/01/19/introducing-mozillas-firefox-nightly-rpm-package-for-...
1•pascalchevrel•8m ago•1 comments

Drawing Truchet Tiles in SVG

https://alexwlchan.net/2025/truchet-tiles/
1•surprisetalk•9m ago•0 comments

Freesound

https://freesound.org/
1•surprisetalk•9m ago•1 comments

Preserved Fish, Boss of New York City

https://signoregalilei.com/2025/12/21/preserved-fish-boss-of-new-york-city/
2•surprisetalk•9m ago•0 comments

Show HN: Antigravity-usage – CLI to check your AI quota without opening your IDE

https://github.com/skainguyen1412/antigravity-usage
2•skainguyen1412•12m ago•0 comments

What it's like to be banned from the US for fighting online hate

https://www.technologyreview.com/2026/01/19/1131384/what-its-like-to-be-banned-from-the-us-for-fi...
3•HotGarbage•13m ago•0 comments

Bootstrapping Bun

https://walters.app/blog/bootstrapping-bun
1•zerf•15m ago•0 comments

Motorola, Intel, IBM Make a Mainframe in a PC – The PC XT/370

https://thechipletter.substack.com/p/motorola-intel-ibm-make-a-mainframe
2•rbanffy•15m ago•0 comments

San Francisco and Richmond Fed Presidents on What's Happening in the Economy

https://kyla.substack.com/p/san-francisco-fed-president-mary
1•mooreds•16m ago•0 comments

Things I miss from professional networking

https://thehumansource.com/
1•salbertengo•16m ago•0 comments

I'm Going to Dig a Hole

https://randsinrepose.com/archives/im-going-to-dig-a-hole/
1•mooreds•17m ago•0 comments

AskSary – All-in-One AI Platform with GPT-5.2, Grok, and Coding Canvas

https://www.asksary.com
1•sarymismail•17m ago•0 comments

Blobject

https://en.wikipedia.org/wiki/Blobject
1•themaxdavitt•18m ago•0 comments

Show HN: Multi-Cloud Data Migration Platform

https://dataraven.io/
1•coreylane•18m ago•0 comments

What Happens When Users Hit Your Postgres at Once

https://engrlog.substack.com/p/what-happens-when-thousands-of-users
1•tirtha•21m ago•0 comments

Clockwork: Runtime agnostic async executor with powerful configurable scheduling

https://github.com/nikhilgarg28/clockwork
1•tanelpoder•22m ago•0 comments

College Football Teams Are Now Worth Billions–and Their Values Are Skyrocketing

https://www.wsj.com/sports/football/college-football-team-value-16fd09bb
2•bookofjoe•22m ago•1 comments

Project Cybersyn

https://en.wikipedia.org/wiki/Project_Cybersyn
2•cromulent•23m ago•1 comments

Cows Can Use Sophisticated Tools

https://nautil.us/the-far-side-had-it-all-wrong-cows-really-can-use-sophisticated-tools-1262026/
3•Tomte•23m ago•0 comments

Mark Nelsen's Weather Page

https://www.marknelsenweather.com/
1•oregoncurtis•26m ago•0 comments

GLM-4.7-Flash

https://huggingface.co/zai-org/GLM-4.7-Flash
155•scrlk•1h ago

Comments

epolanski•1h ago
Any cloud vendor offering this model? I would like to try it.
xena•1h ago
The model literally came out less than a couple of hours ago; it's going to take people a while to tool it up for their inference platforms.
idiliv•1h ago
Sometimes model developers coordinate with inference platforms to time releases in sync.
PhilippGille•1h ago
z.ai itself, or Novita for now, but others will probably follow soon.

https://openrouter.ai/z-ai/glm-4.7-flash/providers

epolanski•1h ago
Interesting, it costs less than a tenth of what Haiku costs.
saratogacx•41m ago
GLM itself is quite inexpensive. A year's sub to their coding plan is only $29 and it works with a bunch of different tools. I use it heavily as an "I don't want to spend my Anthropic credits" day-to-day model (mostly using Crush)
dvs13•1h ago
https://huggingface.co/inference/models?model=zai-org%2FGLM-... :)
latchkey•38m ago
We don't have a lot of GPUs available right now, but it is not crazy hard to get it running on our MI300x. Depending on your quant, you probably want a 4x.

ssh admin.hotaisle.app

Yes, this should be made easier to just get a VM with it pre-installed. Working on that.

omneity•34m ago
Unless you're using Docker, if vLLM isn't already provided and built against the ROCm dependencies, it's going to be time consuming.

It took me quite some time to figure out the magic combination of versions and commits, and to build each dependency successfully to run on an MI325x.

latchkey•27m ago
Agreed, the OOB experience kind of sucks.

Here is the magic (assuming a 4x)...

  # launch the ROCm vLLM nightly container interactively, passing the GPU devices through
  docker run -it --rm \
  --pull=always \
  --ipc=host \
  --network=host \
  --privileged \
  --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd \
  --device=/dev/dri \
  --device=/dev/mem \
  --group-add render \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -v /home/hotaisle:/mnt/data \
  -v /root/.cache:/mnt/model \
  rocm/vllm-dev:nightly
  
  # inside the container: redirect the HF cache to the host cache mounted at /mnt/model
  mv /root/.cache /root/.cache.foo
  ln -s /mnt/model /root/.cache
  
  # serve the FP8 weights across 4 GPUs with AITER kernels, FP8 KV cache, and MTP speculative decoding
  VLLM_ROCM_USE_AITER=1 vllm serve zai-org/GLM-4.7-FP8 \
  --tensor-parallel-size 4 \
  --kv-cache-dtype fp8 \
  --quantization fp8 \
  --enable-auto-tool-choice \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --load-format fastsafetensors \
  --enable-expert-parallel \
  --allowed-local-media-path / \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --mm-encoder-tp-mode data
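
Once that's up, a quick sanity check against vLLM's OpenAI-compatible endpoint looks something like this (default port 8000; the request body is just an illustration):

  # hit the OpenAI-compatible chat endpoint that vllm serve exposes by default
  curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org/GLM-4.7-FP8",
      "messages": [{"role": "user", "content": "Say hello."}],
      "max_tokens": 64
    }'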
karmakaze•1h ago
Not much info beyond it being a 31B model. Here's info on GLM-4.7[0] in general.

I suppose Flash is merely a distillation of that. Filed under mildly interesting for now.

[0] https://z.ai/blog/glm-4.7

lordofgibbons•54m ago
How interesting it is depends purely on your use-case. For me this is the perfect size for running fine-tuning experiments.
redrove•51m ago
A3.9B MoE apparently
XCSme•1h ago
Seems to be marginally better than gpt-20b, but this is 30b?
strangescript•1h ago
I find gpt-oss 20b very benchmaxxed and as soon as a solution isn't clear it will hallucinate.
blurbleblurble•32m ago
Every time I've tried to actually use gpt-oss 20b it's just gotten stuck in weird feedback loops reminiscent of the time when HAL got shut down back in the year 2001. And these are very simple tests, e.g. trying to get it to check today's date from the time tool so it can get more recent search results from the arxiv tool.
lostmsu•1h ago
It actually seems worse. gpt-20b is only 11 GB because it is prequantized in mxfp4, while GLM-4.7-Flash is 62 GB. In that sense GLM is closer to gpt-120b, which is 59 GB, and is actually slightly larger.

Also, according to the gpt-oss model card, 20b scores 60.7 on SWE-Bench Verified (GLM claims they got 34 for that model) and 120b scores 62.7, vs the 59.7 GLM reports.

vessenes•1h ago
Looks like solid incremental improvements. The UI oneshot demos are a big improvement over 4.6. Open models continue to lag roughly a year on benchmarks; pretty exciting over the long term. As always, GLM is really big - 355B parameters with 31B active, so it’s a tough one to self-host. It’s a good candidate for a cerebras endpoint in my mind - getting sonnet 4.x (x<5) quality with ultra low latency seems appealing.
mckirk•1h ago
Note that this is the Flash variant, which is only 31B parameters in total.

And yet, in terms of coding performance (at least as measured by SWE-Bench Verified), it seems to be roughly on par with o3/GPT-5 mini, which would be pretty impressive if it translated to real-world usage, for something you can realistically run at home.

HumanOstrich•34m ago
I tried Cerebras with GLM-4.7 (not Flash) yesterday using paid API credits ($10). They have per-minute rate limits, and the quota gets eaten up in the first few seconds - then you have to wait out the rest of the minute. Every minute, until your task is done. So they're "fast" at 1000 tok/sec - but not really for practical usage. With the rate limits you effectively get about 17 tok/sec.

They also charge for cached tokens, so I burned through $4 for one relatively simple coding task - it would've cost <$1 using GPT-5.2-Codex or any other model that supports caching, besides Opus and maybe Sonnet. And it would've been much faster.

behnamoh•5m ago
> The UI oneshot demos are a big improvement over 4.6.

This is a terrible "test" of model quality. All these models fail when your UI is out of distribution; Codex gets close but still fails.

twelvechess•1h ago
Excited to test this out. We need a SOTA 8B model bad though!
cipehr•56m ago
Is essentialai/rnj-1 not the latest attempt at that?

https://huggingface.co/EssentialAI/rnj-1

dfajgljsldkjag•51m ago
Interesting they are releasing a tiny (30B) variant, unlike the 4.5-air distill which was 106B parameters. It must be competing with gpt mini and nano models, which personally I have found to be pretty weak. But this could be perfect for local LLM use cases.

In my experience, small-tier models are good for simple tasks like translation and trivia answering, but are useless for anything more complex. The 70B class and above is where models really start to shine.

dajonker•45m ago
Great, I've been experimenting with OpenCode and running local 30B-A3B models on llama.cpp (4 bit) on a 32 GB GPU, so there's plenty of VRAM left for 128k context. So far Qwen3-coder gives me the best results. Nemotron 3 Nano is supposed to benchmark better but it doesn't really show for the kind of work I throw at it, mostly "write tests for this and that method which are not covered yet". Will give this a try once someone has quantized it in ~4 bit GGUF.
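
For reference, my setup is roughly along these lines (the model path, quant name, and port are placeholders rather than my exact invocation):

  # llama.cpp server: ~4-bit GGUF fully offloaded to a 32 GB GPU, 128k context
  llama-server \
    -m ./Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
    --n-gpu-layers 99 \
    --ctx-size 131072 \
    --port 8080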

Codex is notably higher quality but also has me waiting forever. Hopefully these small models get better and better, not just at benchmarks.

latchkey•41m ago
https://huggingface.co/unsloth/GLM-4.7-GGUF

This user has also done a bunch of good quants:

https://huggingface.co/0xSero

dajonker•27m ago
Yes, I usually run Unsloth models; however, you are linking to the big model, which I can't run on my consumer hardware.
latchkey•24m ago
There are a bunch of 4-bit quants in the GGUF link, and 0xSero has some smaller stuff too. It might still be too big, and you'll need to un-GPU-poor yourself.
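
If you just want one of them, something along these lines works (the quant pattern here is only an illustration, not a specific recommendation):

  # pull only the ~4-bit quant files instead of cloning the whole repo
  huggingface-cli download unsloth/GLM-4.7-GGUF \
    --include "*Q4_K_M*" \
    --local-dir ./GLM-4.7-GGUF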
disiplus•20m ago
Yeah, there is no way to run 4.7 on 32 GB of VRAM. This Flash version is something I'm also waiting to try later tonight.
behnamoh•6m ago
> Codex is notably higher quality but also has me waiting forever.

And while it usually leads to higher-quality output, sometimes it doesn't, and I'm left with BS AI slop that would have taken Opus just a couple of minutes to generate anyway.

eurekin•39m ago
I'm trying to run it, but getting odd errors. Has anybody managed to run it and can share the command?
bilsbie•34m ago
What’s the significance of this for someone out of the loop?
epolanski•20m ago
You can run GPT-5-mini-level AI on your MacBook with 32 GB of RAM.
baranmelik•11m ago
For anyone who’s already running this locally: what’s the simplest setup right now (tooling + quant format)? If you have a working command, would love to see it.