Tailslayer: Library for reducing tail latency in RAM reads

https://github.com/LaurieWired/tailslayer

31•hasheddan•3h ago

Comments

shaicoleman•2h ago

* Announcement [1]

* Video [2]

1. https://x.com/lauriewired/status/2041566601426956391 (https://xcancel.com/lauriewired/status/2041566601426956391)

2. https://www.youtube.com/watch?v=KKbgulTp3FE

jeffbee•1h ago

Neither the readme nor the header seems to discuss the tradeoff, which is that you pay for lower tail latency with higher median latency, by the same factor. Nobody thinks of a load as taking 800 cycles, but that is the baseline load latency here.

Also, having sacrificed my own mental health to watch the disgustingly self-promoting hour-long video that announces this small git commit, I can confidently say that "Graviton doesn't have any performance counters" is one of the wrongest things I've heard in a long time.

Overall, I give it an F.

Anyway if you want to hide memory refresh latency, IBM zEnterprise is your platform. It completely hides refresh latency by steering loads to the non-refreshing bank, and it only costs half the space, not up to 92% of your space like this technique.

PunchyHamster•1h ago

The video was about how rowhammer works, the lib was byproduct.

lauriewired•59m ago

Nope, there isn’t a tradeoff; median latency isn’t affected. I don’t think you understand the code. The p50 is identical between a single read and the hedged strategy.

The clflush is there because the technique targets data that will miss the cache anyway. If your working set fits in L1, you don’t need this.

Also, AWS Graviton instances absolutely do not expose per-channel memory controller counter PMUs. That’s why you have to use timing-based channel discovery.

The IBM z-system is neat! But my technique will work on commodity hardware in userspace, and you can easily only sacrifice half the space if you accept 2-way instead of 8+ way hedging. It’s entirely up to you how many channel copies you want to use.

Your reply was quite rude, but I hope this is informative.
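
For reference, the space cost of N-way hedging mentioned in this exchange can be worked out directly. This is a minimal sketch, assuming every hedged copy is a full replica of the data; the function name `redundant_fraction` is hypothetical and is not part of tailslayer.

```python
def redundant_fraction(copies: int) -> float:
    """Fraction of allocated memory that holds redundant replicas.

    Assumption: N-way hedging stores N full copies of the data,
    so only 1/N of the allocated space is unique.
    """
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return (copies - 1) / copies


if __name__ == "__main__":
    for n in (1, 2, 8, 12):
        print(f"{n:>2}-way: {redundant_fraction(n):.1%} of space is redundant")
```

Under this assumption, 2-way hedging leaves half the space redundant, and a figure near 92% would correspond to roughly 12 copies.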

hedgehog•54m ago

I was just trying to reconcile his reply with the charts. Have you tested how this scales down for smaller systems, as one might find on the management side of a network switch?

jeffbee•35m ago

I won't be tone-policed by a person who is clearly trying to mislead and confuse people. I leave it to the other HNers to read your benchmark code and see for themselves that it is an exercise in absurdity: a work-around for its own library that measures nothing except that, with N threads and by the laws of probability, this technique of reading timestamps as fast as possible and cramming them into a vector yields lower measurements with higher N.
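
The disagreement above about median latency can be framed with a toy model. The sketch below is not the library's benchmark: it treats a hedged read as the minimum of N independent simulated reads, and every number in it (base cost, jitter, stall size, stall probability) is made up for illustration. All function names are hypothetical.

```python
import random


def sample_latency(rng, base=80.0, jitter=0.0, stall=300.0, stall_prob=0.05):
    """One simulated read: base cost, optional uniform jitter, rare stall."""
    latency = base + rng.uniform(0.0, jitter)
    if rng.random() < stall_prob:
        latency += stall  # e.g. the read collided with a refresh
    return latency


def hedged_latency(rng, copies, **kwargs):
    """Model a hedged read as the fastest of `copies` independent reads."""
    return min(sample_latency(rng, **kwargs) for _ in range(copies))


def percentile(values, q):
    """Nearest-rank style percentile on a sorted copy of `values`."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def summarize(copies, jitter, trials=20000, seed=1):
    """Return (p50, p99) of the simulated hedged latency."""
    rng = random.Random(seed)
    samples = [hedged_latency(rng, copies, jitter=jitter) for _ in range(trials)]
    return percentile(samples, 0.50), percentile(samples, 0.99)


if __name__ == "__main__":
    for jitter in (0.0, 40.0):
        for copies in (1, 2, 4):
            p50, p99 = summarize(copies, jitter)
            print(f"jitter={jitter:>4} copies={copies} p50={p50:6.1f} p99={p99:6.1f}")
```

In this model, when the non-stalled latency is constant (`jitter=0.0`), the median is the same for any number of copies and only the tail shrinks. When every read carries some random variation (`jitter=40.0`), taking the minimum of more samples also lowers the median. The model says nothing about which regime real DRAM or the actual benchmark falls into.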

ysleepy•59m ago

Loved the details about how memory access actually maps addresses to channels, ranks, blocks and whatever, this is rarely discussed.

Not sure how this works for larger data structures, but my first thought was that this should be implemented as some microcode or instruction.

Most computation is not that jitter-sensitive, and perception is not really on the nano- to microsecond scale, but this might be a cool gadget for things like dtrace or interrupt handlers.

jagged-chisel•32m ago

My understanding is that this is making a trade off of using more space to get shorter access times. Do I have that right?

OT: Tail Slayer. Not Tails Layer. My brain took longer to parse that than I’d have wanted.

addaon•15m ago

This addresses the “short long tail” (known bounded variance due to the multiple physical operations underlying a single logical memory op), but for hard real-time applications the “long long tail” of correctable-ECC-error-and-scrub may be the critical case.
