Design highlights:
- worker thread per CPU (pinned) and RX/TX queue
- one io_uring instance per worker
- SO_REUSEPORT listener sharding
- lock-free shared KV store
- Redis and Memcached protocols (small subset)
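For readers unfamiliar with the listener sharding pattern, here is a rough Python sketch of the general idea. It is not the project's code: it uses plain blocking sockets instead of io_uring, all names (`make_shard_listener`, `worker`, `start_shards`) are made up for illustration, and it assumes Linux, where `SO_REUSEPORT` lets several sockets bind the same port and `os.sched_setaffinity(0, ...)` pins the calling thread.

```python
import os
import socket
import threading


def make_shard_listener(host, port):
    """Create one listening socket; several may share a port via SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def worker(cpu, listener, stop):
    """Pin this thread to one CPU, then serve connections from its own listener."""
    os.sched_setaffinity(0, {cpu})  # on Linux, pid 0 means the calling thread
    listener.settimeout(0.1)        # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue
        with conn:
            # Placeholder reply (a Redis-style simple string); a real worker
            # would parse requests and hit the shared KV store here.
            conn.sendall(b"+PONG\r\n")


def start_shards(host, port, cpus):
    """Start one pinned worker and one listener per CPU, all on the same port."""
    stop = threading.Event()
    listeners, threads = [], []
    for cpu in cpus:
        listener = make_shard_listener(host, port)
        port = listener.getsockname()[1]  # reuse the kernel-chosen port if 0
        thread = threading.Thread(
            target=worker, args=(cpu, listener, stop), daemon=True
        )
        thread.start()
        listeners.append(listener)
        threads.append(thread)
    return port, stop, listeners, threads


if __name__ == "__main__":
    cpus = sorted(os.sched_getaffinity(0))
    port, stop, listeners, threads = start_shards("127.0.0.1", 0, cpus)
    print(f"{len(threads)} pinned workers listening on port {port}")
    stop.set()
    for thread in threads:
        thread.join()
    for listener in listeners:
        listener.close()
```

The kernel spreads incoming connections across the listeners bound to the port, so each worker accepts and serves its own share without a shared accept queue or a dispatcher thread.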

Quick benchmark using memtier_benchmark:
- ~3.9M ops/sec
- 230 µs average client latency
- 823 µs p99.99 client latency
- ~18.8M ops/sec with pipelining

About 90% of the code was written using AI coding agents. I mainly focused on the architecture and reviewed the generated code.