I benchmarked it against Ollama using the exact same GGUF files on an RTX 4070 Ti SUPER:
GPU (dlgo Vulkan vs Ollama CUDA):
- Qwen3.5 0.8B: 239 tok/s vs 187 tok/s (+28%)
- Gemma 3 270M: 456 tok/s vs 503 tok/s (−9%)
- SmolLM2 360M: 420 tok/s vs 451 tok/s (−7%)
- 10 models tested, within 7–25% of CUDA on standard architectures

CPU (dlgo vs Ollama, same GGUF):
- 6 of 10 models within 9% of Ollama
- 2 models faster (Gemma 270M +3%, SmolLM2 360M +7%)

The Qwen3.5 result surprised me. Qwen3.5 uses a hybrid Gated Delta Net + attention architecture (SSM layers with a recurrent delta rule). I wrote six custom Vulkan compute shaders for it (conv1d, delta rule recurrence, L2 normalization, sigmoid gating), and the fused Vulkan pipeline ended up outperforming llama.cpp's CUDA kernels.
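To make the recurrence concrete, here is a minimal CPU sketch of one delta rule step, the core of what those shaders compute per token. This is my own scalar-gate simplification for illustration (real Gated Delta Net kernels also L2-normalize k and q and run per attention head); it is not dlgo's actual shader code, and `deltaStep` is a hypothetical name:

```go
package main

import "fmt"

// deltaStep applies one step of a simplified gated delta rule:
//   S <- g*S + beta * (v - S·k) k^T ;  o = S·q
// S is a dim x dim recurrent state matrix stored row-major.
func deltaStep(S, k, v, q []float32, g, beta float32, dim int) []float32 {
	// err = v - S·k: the prediction error against current memory.
	err := make([]float32, dim)
	for i := 0; i < dim; i++ {
		var sk float32
		for j := 0; j < dim; j++ {
			sk += S[i*dim+j] * k[j]
		}
		err[i] = v[i] - sk
	}
	// S = g*S + beta * err k^T: decay old memory, write the correction.
	for i := 0; i < dim; i++ {
		for j := 0; j < dim; j++ {
			S[i*dim+j] = g*S[i*dim+j] + beta*err[i]*k[j]
		}
	}
	// o = S·q: read the state out with the query.
	o := make([]float32, dim)
	for i := 0; i < dim; i++ {
		for j := 0; j < dim; j++ {
			o[i] += S[i*dim+j] * q[j]
		}
	}
	return o
}

func main() {
	dim := 2
	S := make([]float32, dim*dim)
	k := []float32{1, 0}
	v := []float32{0.5, 0.25}
	// Reading back along k right after writing v recovers v.
	o := deltaStep(S, k, v, k, 1.0, 1.0, dim)
	fmt.Println(o) // [0.5 0.25]
}
```

Because the state update is a rank-1 write per token, the whole step fuses naturally into one compute dispatch, which is where a hand-written Vulkan pipeline can beat generic kernels.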
Vulkan means this runs on AMD, Intel, and mobile GPUs too, not just NVIDIA. On the models I tested, dlgo was 66–126% faster than Ollama's own Vulkan backend.
Supports LLaMA, Qwen2/3/3.5, Gemma, Phi, SmolLM2, Mistral, plus Whisper speech-to-text. 25+ quantization formats (Q4_0 through Q8_0, all K-quants).
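As a flavor of what those quant formats involve, here is a sketch of dequantizing one Q4_0 block: 32 weights packed two 4-bit values per byte, all sharing one scale. In the actual GGUF format the scale is stored as fp16; I pass it as float32 here to keep the example self-contained, and `dequantQ4_0` is an illustrative name, not dlgo's API:

```go
package main

import "fmt"

// dequantQ4_0 expands one Q4_0 block of 32 weights. Byte j holds
// element j in its low nibble and element j+16 in its high nibble;
// each 4-bit value q maps to (q - 8) * scale.
func dequantQ4_0(scale float32, qs [16]byte) [32]float32 {
	var out [32]float32
	for j := 0; j < 16; j++ {
		lo := int(qs[j]&0x0F) - 8 // elements 0..15
		hi := int(qs[j]>>4) - 8   // elements 16..31
		out[j] = float32(lo) * scale
		out[j+16] = float32(hi) * scale
	}
	return out
}

func main() {
	var qs [16]byte
	qs[0] = 0x1F // low nibble 15 -> +7, high nibble 1 -> -7
	w := dequantQ4_0(0.5, qs)
	fmt.Println(w[0], w[16], w[1]) // 3.5 -3.5 -4
}
```

K-quants layer more structure on top (per-sub-block scales and minimums), but the same nibble-unpacking pattern is the common core.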
Three lines to run:
```go
model, _ := dlgo.LoadLLM("model.gguf")
response, _ := model.Chat("", "What is the capital of France?")
fmt.Println(response)
```