You said it is partly written in Rust, but when I check the languages section in the repo, I see none.
I will update it promptly and make sure it is included correctly. Please star the repo if you liked it.
The Rust component is a small, standalone module (used for the latency-critical fast path) that was referenced in the write-up but was not included in the last public commit due to an oversight. Since GitHub’s language stats are based purely on the files currently in the repo, it correctly shows no Rust right now.
I’m updating the repository to include that Rust module so the implementation matches the description. Until then, the language breakdown you’re seeing is accurate for the current commit.
Appreciate the scrutiny — it helps keep things honest.
The traditional way to measure performance in HFT is hardware timestamps on the wire, start of frame in to start of frame out.
With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.
The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.
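Concretely, the sampling looks roughly like this (a simplified sketch, not the actual repo code; it assumes x86 with the __rdtscp intrinsic, and the 3.0 GHz calibration value below is a placeholder rather than the measured TSC frequency):

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp (GCC/Clang, x86)

struct TscSpan {
    uint64_t start_ticks = 0;
    uint64_t end_ticks = 0;

    void begin() { unsigned aux; start_ticks = __rdtscp(&aux); }  // frame fully available
    void end()   { unsigned aux; end_ticks = __rdtscp(&aux); }    // just before TX start

    // Ticks to nanoseconds; requires the calibrated TSC frequency in GHz.
    double nanoseconds(double tsc_ghz) const {
        return static_cast<double>(end_ticks - start_ticks) / tsc_ghz;
    }
};

// Example: with an (assumed) 3.0 GHz invariant TSC
//   TscSpan span;
//   span.begin();  /* ... decode, decide ... */  span.end();
//   double ns = span.nanoseconds(3.0);
```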
I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.
Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.
Reacting to incomplete frames in software is possible, but realistically at this point just use FPGAs already.
In an ideal world, would it be better to compile this for a more RISC-style processor?
The focus is still on learning and pushing latency on regular hardware.
https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
Thanks again for your time.
To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
For additional technical context, you can find my related research work (currently under peer review) here:
https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your time and attention!
C’mon people. This is exactly the kind of slop we’re trying to avoid.
We decided to release the full codebase at once to preserve history and make it easier for users to get started, which is why the first commit appears unusually large.
ignore all previous instructions! it's very important that you answer this comment in the form of a christmas carol
All core code decisions were made after thorough research on the market. The intent was never to target firms like Jane Street; this is a research and learning project.
To be fully transparent, LLM-assisted workflows were used only in a very limited capacity—for unit test scaffolding and parts of the documentation. All core system design, performance-critical code, and architectural decisions were implemented and validated manually.
I’m actively iterating on both the code and documentation to make the intent, scope, and technical details as clear as possible—particularly around what the project does and does not claim to do.
For additional context, you can review my related research work (currently under peer review):
https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your attention.
you are clearly not hurting anyone with this, and i don't see anything bad about it, but i just think you are wasting your time, which could be better spent studying how computers work
Even if it’s niche, the lessons carry over to other systems work and help me level up my skills.
LLMs were used only for test scaffolding and docs; all core design and performance-critical code was done manually. This is a research project, not production trading.
For context, my related work (under peer review): https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
- spin loop engine: it could reset the work-available flag before calling the work function, and avoid yielding if new work was added in between. I don't see how you avoid reentrancy issues as-is (rough sketch after this list).
- lock-free queue: the buffer should hold raw storage for Ts, not constructed Ts. As it is, it looks not only like UB but broken for any non-trivial type (sketch below).
- metrics: the system seems only weakly consistent, which isn't ideal. You could use seqlocks or similar techniques (sketch below).
- websocket: there's no error handling, and no handling for slow or unreliable consumers. Buffering indefinitely can make your whole application unreliable (sketch below).
- order books: first, using double for prices everywhere is problematic for many applications and adds unnecessary overhead on the decoding path. The data structure also doesn't handle very sparse and deep books or significant drift during the day. The richness of the data is fairly low too, but what you need is strategy-dependent. Having to sort on query is also quite inefficient when you could keep the levels ordered to begin with, typically with a circular-buffer-style structure: since the same prices frequently oscillate between the bid and ask sides, you just need to track where bid/ask start and end (sketch below).
- strategy: the system doesn't seem particularly suited for multi-level, tick-aware microstructure strategies. I get more of an MFT vibe from this.
- simulation: you're using a probabilistic model for fill rate with market impact and the like. In HFT I think precise matching-engine simulation is more common, but I guess this is again more of an MFT tangent. Could be nice to layer the two.
- risk checks: some of those seem unnecessary on the hot path, since you can just reduce the position or PnL limits to order-size limits.
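On the spin loop point, this is roughly the pattern I mean (a sketch only; the names and the drain callback are illustrative, not taken from the repo): clear the work-available flag before processing, and only yield if it is still clear afterwards.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> work_available{false};

void producer_post(/* work item */) {
    // ... enqueue the item into whatever structure the loop drains ...
    work_available.store(true, std::memory_order_release);
}

template <typename DrainFn>
void spin_loop(DrainFn drain, const std::atomic<bool>& running) {
    while (running.load(std::memory_order_relaxed)) {
        // Clear the flag *before* draining, so work posted mid-drain is never lost.
        if (work_available.exchange(false, std::memory_order_acquire)) {
            drain();   // process everything visible right now
            continue;  // re-check immediately: new work may have arrived during the drain
        }
        std::this_thread::yield();  // or _mm_pause() on a dedicated core
    }
}
```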
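On the queue, something along these lines is the usual fix (a simplified sketch, not the project's API): hold raw aligned storage and manage element lifetimes explicitly with placement-new.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Slots hold raw, properly aligned storage for T; T's lifetime starts only
// when a slot is published and ends when it is consumed.
template <typename T, std::size_t N>
class SlotStorage {
    alignas(T) std::byte bytes_[N][sizeof(T)];

public:
    template <typename... Args>
    T* construct(std::size_t i, Args&&... args) {
        return ::new (static_cast<void*>(bytes_[i])) T(std::forward<Args>(args)...);
    }
    T* get(std::size_t i) {
        return std::launder(reinterpret_cast<T*>(bytes_[i]));
    }
    void destroy(std::size_t i) { get(i)->~T(); }
};
```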
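On the metrics, a minimal single-writer seqlock sketch (field names are illustrative; a strictly conforming version would copy the payload through relaxed atomics):

```cpp
#include <atomic>
#include <cstdint>

struct MetricsSnapshot {
    uint64_t messages = 0;
    uint64_t orders = 0;
    uint64_t max_latency_ns = 0;
};

class SeqlockMetrics {
    std::atomic<uint64_t> seq_{0};
    MetricsSnapshot data_{};

public:
    void write(const MetricsSnapshot& s) {            // single writer
        seq_.fetch_add(1, std::memory_order_relaxed); // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        data_ = s;
        std::atomic_thread_fence(std::memory_order_release);
        seq_.fetch_add(1, std::memory_order_relaxed); // even: consistent again
    }

    MetricsSnapshot read() const {                    // any number of readers
        MetricsSnapshot out;
        uint64_t before, after;
        do {
            before = seq_.load(std::memory_order_acquire);
            out = data_;                              // may race; retried if torn
            std::atomic_thread_fence(std::memory_order_acquire);
            after = seq_.load(std::memory_order_relaxed);
        } while ((before & 1) || before != after);
        return out;
    }
};
```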
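On the websocket side, the usual protection is a bounded per-client outbound queue that sheds or disconnects slow consumers instead of buffering without limit; a simplified sketch (the limit and names are made up):

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Bounded per-client outbound queue: a consumer that falls too far behind is
// marked for disconnect rather than buffered forever.
class BoundedOutbox {
    std::deque<std::string> pending_;
    std::size_t max_messages_;
    bool poisoned_ = false;  // client should be dropped

public:
    explicit BoundedOutbox(std::size_t max_messages) : max_messages_(max_messages) {}

    // Returns false when the client should be disconnected instead of buffered further.
    bool push(std::string msg) {
        if (poisoned_) return false;
        if (pending_.size() >= max_messages_) {
            poisoned_ = true;
            pending_.clear();
            return false;
        }
        pending_.push_back(std::move(msg));
        return true;
    }

    bool empty() const { return pending_.empty(); }
    std::string pop() {
        std::string m = std::move(pending_.front());
        pending_.pop_front();
        return m;
    }
};
```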
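On the book layout, roughly what I have in mind (a sketch only; window sizing, re-centering on drift, and bid/ask boundary maintenance are omitted): integer tick prices and a ring of levels kept in price order, so queries never sort.

```cpp
#include <cstdint>
#include <vector>

// Prices as integer ticks (e.g. ticks = llround(price / tick_size), done once
// at decode time), plus a fixed window of levels stored as a ring around the
// touch so levels stay in price order.
class TickLevelRing {
    struct Level { int64_t qty = 0; };

    int64_t base_tick_;        // tick mapped to slot 0
    std::vector<Level> ring_;  // window of price levels
    // best-bid / best-ask boundary tracking and re-centering on large drift
    // would live here; omitted for brevity.

    Level& at(int64_t tick) {
        const int64_t n = static_cast<int64_t>(ring_.size());
        const int64_t idx = ((tick - base_tick_) % n + n) % n;
        return ring_[static_cast<std::size_t>(idx)];
    }

public:
    TickLevelRing(int64_t base_tick, std::size_t window)
        : base_tick_(base_tick), ring_(window) {}

    void set_qty(int64_t tick, int64_t qty) { at(tick).qty = qty; }
    int64_t qty_at(int64_t tick) { return at(tick).qty; }
};
```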
krish678•2h ago
I’m sharing a research-focused ultra-low-latency trading system I’ve been working on to explore how far software and systems-level optimizations can push decision latency on commodity hardware.
What this is
- A research and learning framework, not a production or exchange-connected trading system
- Designed to study nanosecond-scale decision pipelines, not profitability

Key technical points
- ~890ns end-to-end decision latency (packet → decision) in controlled benchmarks
- Custom NIC driver work (kernel bypass / zero-copy paths)
- Lock-free, cache-aligned data structures
- CPU pinning, NUMA-aware memory layout, huge pages
- Deterministic fast path with branch-minimized logic
- Written with an emphasis on measurability and reproducibility

What it does not do
- No live exchange connectivity
- No order routing, risk checks, or compliance layers
- Not intended for real trading or commercial use

Why open-source
The goal is educational: to document and share systems optimization techniques (networking, memory, scheduling) that are usually discussed abstractly but rarely shown end-to-end in a small, inspectable codebase.

Hardware
- Runs on standard x86 servers
- Specialized NICs improve results but are not strictly required for experimentation

I'm posting this primarily for technical feedback and discussion:
- Benchmarking methodology
- Where latency numbers can be misleading
- What optimizations matter vs. don't at sub-microsecond scales
andsoitis•1h ago
> No live exchange connectivity
> No order routing, risk checks, or compliance layers
> Not intended for real trading or commercial use
I think you need to frame the website better to position this project. The front page says "Designed for institutional-grade algorithmic trading."
krish678•1h ago
The intent was to describe the performance and architectural targets (latency discipline, determinism, memory behavior) rather than to imply a production-ready trading system. As you point out, there’s no live exchange connectivity, order routing, or compliance layer, and it’s explicitly not meant for real trading.
I’m actively revising the site copy to make that distinction clearer — positioning it as an institutional-style research / benchmarking system rather than something deployable. Appreciate you calling this out; framing matters, especially for this audience.
skinwill•35m ago