frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts

https://github.com/Zyora-Dev/zse
22•zyoralabs•3h ago
I've been building ZSE (Z Server Engine) for the past few weeks — an open-source LLM inference engine focused on two things nobody has fully solved together: memory efficiency and fast cold starts.

The problem I was trying to solve: Running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts — which kills serverless and autoscaling use cases.

What ZSE does differently:

Fits 32B in 19.3 GB VRAM (70% reduction vs FP16) — runs on a single A100-40GB

Fits 7B in 5.2 GB VRAM (63% reduction) — runs on consumer GPUs

Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B — vs 45s and 120s with bitsandbytes, ~30s for vLLM

All benchmarks verified on Modal A100-80GB (Feb 2026)

It ships with:

OpenAI-compatible API server (drop-in replacement)

Interactive CLI (zse serve, zse chat, zse convert, zse hardware)

Web dashboard with real-time GPU monitoring

Continuous batching (3.45× throughput)

GGUF support via llama.cpp

CPU fallback — works without a GPU

Rate limiting, audit logging, API key auth

Install:

----- pip install zllm-zse zse serve Qwen/Qwen2.5-7B-Instruct For fast cold starts (one-time conversion):

----- zse convert Qwen/Qwen2.5-Coder-7B-Instruct -o qwen-7b.zse zse serve qwen-7b.zse # 3.9s every time

The cold start improvement comes from the .zse format storing pre-quantized weights as memory-mapped safetensors — no quantization step at load time, no weight conversion, just mmap + GPU transfer. On NVMe SSDs this gets under 4 seconds for 7B. On spinning HDDs it'll be slower.

All code is real — no mock implementations. Built at Zyora Labs. Apache 2.0.

Happy to answer questions about the quantization approach, the .zse format design, or the memory efficiency techniques.

Comments

medi_naseri•1h ago
This is so freaking awesome, I am working on a project trying run 10 models on two GPUs, loading/off loading is the only solution I have in mind.

Will try getting this deployed.

Does cold start timings advertised for a condition where there is no other model loaded on GPUs?

Google API keys weren't secrets, but then Gemini changed the rules

https://trufflesecurity.com/blog/google-api-keys-werent-secrets-but-then-gemini-changed-the-rules
81•hiisthisthingon•8h ago•12 comments

Jimi Hendrix was a systems engineer

https://spectrum.ieee.org/jimi-hendrix-systems-engineer
369•tintinnabula•8h ago•125 comments

First Website (1992)

https://info.cern.ch
145•shrikaranhanda•5h ago•28 comments

RAM now represents 35 percent of bill of materials for HP PCs

https://arstechnica.com/gadgets/2026/02/ram-now-represents-35-percent-of-bill-of-materials-for-hp...
90•jnord•1h ago•30 comments

How will OpenAI compete?

https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x
90•iamskeole•6h ago•76 comments

Making MCP cheaper via CLI

https://kanyilmaz.me/2026/02/23/cli-vs-mcp.html
157•thellimist•8h ago•76 comments

The Pleasures and Pains of Coffee (1830)

https://quod.lib.umich.edu/m/mqrarchive/act2080.0035.002/10
16•jxmorris12•3d ago•1 comments

Artist who “paints” portraits on glass by hitting it with a hammer

https://simonbergerart.com
82•cs702•3d ago•33 comments

Windows 11 Notepad to support Markdown

https://blogs.windows.com/windows-insider/2026/01/21/notepad-and-paint-updates-begin-rolling-out-...
224•andreynering•11h ago•361 comments

Gauss's Weekday Algorithm, Visualized

https://lukasmetzner.github.io/blog/gauss-weekday.html
12•lukasmetzner•4d ago•0 comments

Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts

https://github.com/Zyora-Dev/zse
22•zyoralabs•3h ago•1 comments

Bus stop balancing is fast, cheap, and effective

https://worksinprogress.co/issue/the-united-states-needs-fewer-bus-stops/
319•surprisetalk•12h ago•483 comments

PA bench: Evaluating web agents on real world personal assistant workflows

https://vibrantlabs.com/blog/pa-bench
16•shahules•8h ago•2 comments

Show HN: Respectify – A comment moderator that teaches people to argue better

https://respectify.org/
119•vintagedave•14h ago•128 comments

Large-Scale Online Deanonymization with LLMs

https://simonlermen.substack.com/p/large-scale-online-deanonymization
218•DalasNoin•1d ago•169 comments

Tech companies shouldn't be bullied into doing surveillance

https://www.eff.org/deeplinks/2026/02/tech-companies-shouldnt-be-bullied-doing-surveillance
135•pseudolus•3h ago•29 comments

Self-improving software won't produce Skynet

https://contalign.jefflunt.com/self-improving-software/
6•normalocity•55m ago•1 comments

The First Fully General Computer Action Model

https://si.inc/posts/fdm1/
179•nee1r•2d ago•58 comments

The Om Programming Language

https://www.om-language.com/
241•tosh•10h ago•56 comments

An autopsy of AI-generated 3D slop

https://aircada.com/blog/ai-vs-human-3d-ecommerce
36•sech8420•7h ago•28 comments

Dissecting the CPU-memory relationship in garbage collection (OpenJDK 26)

https://norlinder.nu/posts/GC-Cost-CPU-vs-Memory/
62•jonasn•1d ago•17 comments

Learnings from 4 months of Image-Video VAE experiments

https://www.linum.ai/field-notes/vae-reconstruction-vs-generation
85•schopra909•1d ago•12 comments

Launch HN: TeamOut (YC W22) – AI agent for planning company retreats

https://app.teamout.com/ai
47•vincentalbouy•14h ago•55 comments

Quasi-Zenith Satellite System

https://en.wikipedia.org/wiki/Quasi-Zenith_Satellite_System
11•teleforce•3d ago•2 comments

Show HN: OpenSwarm – Multi‑Agent Claude CLI Orchestrator for Linear/GitHub

https://github.com/Intrect-io/OpenSwarm
8•unohee•2h ago•1 comments

GNU Texmacs

https://www.texmacs.org/tmweb/home/welcome.en.html
137•remywang•12h ago•45 comments

Show HN: I ported Tree-sitter to Go

https://github.com/odvcencio/gotreesitter
198•odvcencio•10h ago•82 comments

The Hydrogen Truck Problem Isn't the Truck

https://www.mikeayles.com/blog/hydrogen-refuelling-road-freight/
48•mikeayles•1d ago•49 comments

Access to a Shared Unix Computer

http://tilde.club/
56•TigerUniversity•3d ago•18 comments

The Misuses of the University

https://www.publicbooks.org/the-misuses-of-the-university/
141•ubasu•11h ago•105 comments