The parser is the part I'm most proud of. Instead of allocating strings for each parsed field, everything is a Span { off: u16, len: u16 }, a 4-byte view into the original buffer. The full header table is [Header; 64] on the stack (640 bytes). During parsing, the parser also extracts Content-Length, chunked transfer encoding, and keep-alive state, and builds an O(1) known-header index (21 common headers tracked in a fixed array). Header lookup after parsing is a single array dereference: about 0.6 ns, vs 20-23 ns for a linear scan.
I benchmarked head-to-head against httparse (the parser behind hyper/axum/actix-web), same machine, same inputs, Criterion:

- Small request (35B): 42 ns vs 52 ns (1.25x faster)
- Medium request (368B, 9 headers): 200 ns vs 230 ns (1.15x faster)
- Large request (733B, 20 headers): 420 ns vs 466 ns (1.11x faster)
synapserve does strictly more work per parse than httparse (semantic extraction + header indexing) and is still faster. The gap widens to 1.38-1.46x when you add equivalent semantic extraction to httparse. SIMD scanning (AVX2/SSE4.2 with runtime detection, NEON on ARM64) handles header name validation, header value validation, and URI scanning at 16-32 bytes per instruction.
The I/O layer uses io_uring with:

- Multishot accept (one SQE, N connections)
- Multishot recv with provided buffer rings (the kernel picks the buffer, no userspace allocation)
- Zero-copy send (SEND_ZC) and splice for static files and proxy relay
- kTLS: rustls does the TLS 1.3 handshake, then session keys are installed in the kernel via setsockopt(SOL_TLS). After that, the kernel handles encrypt/decrypt transparently, so SEND_ZC and splice still work through TLS.
Each worker thread owns its connections, buffers, and ring. Connection state is a flat array indexed by slot, with generation counters for stale CQE detection.

What works today:

- HTTP/1.1 request handling
- Radix-tree router and virtual hosts
- Static file serving (ETag, Range, Brotli)
- Reverse proxy with upstream load balancing (weighted round-robin, least-conn, IP hash, health tracking, automatic failover, zero-copy splice relay)
- TLS 1.3 with kTLS
Static file serving benchmarks (wrk, 256 connections): 205K req/s on small files (+79% vs nginx), 14.5MB RSS.
What doesn't exist yet: HTTP/2, HTTP/3, WebSocket. These are next.

Honest limitations:

- Linux-only (io_uring). No plans for macOS/Windows support.
- HTTP/1.1 only for now. HTTP/2 is in progress.
- The parser uses u16 spans, so the maximum header area is 64KB. Fine for real traffic, but it's a hard limit.
- Single-machine only. No clustering or distributed config.
- Not production-battle-tested yet. It works and benchmarks well, but it hasn't handled real traffic at scale.
All the benchmark code is a separate crate with the exact same inputs for both parsers — nothing cherry-picked. The parser deep dive with methodology is on the site.
Parser benchmark writeup: https://synapserve.io/posts/http-parser-performance/

Happy to answer any questions about the architecture, the io_uring integration, or the SIMD scanning approach.