The project is designed for education, systems research, and latency instrumentation, not for live trading. It focuses on understanding exactly where every nanosecond goes in a trading execution path.
Key features:
- Kernel-bypass networking: Direct userspace access to NICs via custom drivers, 20-50 ns RX latency - Lock-free SPSC/MPSC queues: Zero-copy architecture - SIMD feature extraction: About 40 ns per update using AVX-512 - Deterministic replay: Bit-identical execution paths, SHA-256 verified - Nanosecond-level metrics: Full audit logs and performance dashboard
Technical stack: C++17 and Rust, NUMA-aware memory allocation, cache-line alignment, inline assembly for hot paths.
The framework is modular, allowing experimentation with different NIC drivers, feature extraction pipelines, or order-flow models such as Hawkes processes or Avellaneda-Stoikov logic. Everything is open source and documented.
Links:
Live demo: https://submicro.krishnabajpai.me/ Source code: https://github.com/krish567366/submicro-execution-engine Bare-metal NIC drivers: https://baremetalnic.krishnabajpai.me/
I would welcome feedback from anyone working on low-latency systems, networking, or HFT research.
Some questions for discussion:
- Which part of the execution path is typically hardest to optimize? - What measurement techniques do you trust for sub-microsecond systems?
This project is for research and educational purposes only. It does not connect to exchanges or execute real trades. It is intended as a sandbox for understanding ultra-low-latency execution.
I am happy to answer questions about methodology, performance, or design trade-offs.