
Zen Tools

http://postmake.io/zen-list
1•Malfunction92•35s ago•0 comments

Is the Detachment in the Room? – Agents, Cruelty, and Empathy

https://hailey.at/posts/3mear2n7v3k2r
1•carnevalem•57s ago•0 comments

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•3m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
1•rcarmo•3m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•4m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•4m ago•0 comments

Every major galaxy is speeding away from the Milky Way, except one

https://www.livescience.com/space/cosmology/every-major-galaxy-is-speeding-away-from-the-milky-wa...
2•Brajeshwar•4m ago•0 comments

Extreme Inequality Presages the Revolt Against It

https://www.noemamag.com/extreme-inequality-presages-the-revolt-against-it/
1•Brajeshwar•5m ago•0 comments

There's no such thing as "tech" (Ten years later)

1•dtjb•5m ago•0 comments

What Really Killed Flash Player: A Six-Year Campaign of Deliberate Platform Work

https://medium.com/@aglaforge/what-really-killed-flash-player-a-six-year-campaign-of-deliberate-p...
1•jbegley•6m ago•0 comments

Ask HN: Anyone orchestrating multiple AI coding agents in parallel?

1•buildingwdavid•7m ago•0 comments

Show HN: Knowledge-Bank

https://github.com/gabrywu-public/knowledge-bank
1•gabrywu•13m ago•0 comments

Show HN: The Codeverse Hub Linux

https://github.com/TheCodeVerseHub/CodeVerseLinuxDistro
3•sinisterMage•14m ago•2 comments

Take a trip to Japan's Dododo Land, the most irritating place on Earth

https://soranews24.com/2026/02/07/take-a-trip-to-japans-dododo-land-the-most-irritating-place-on-...
2•zdw•14m ago•0 comments

British drivers over 70 to face eye tests every three years

https://www.bbc.com/news/articles/c205nxy0p31o
15•bookofjoe•14m ago•4 comments

BookTalk: A Reading Companion That Captures Your Voice

https://github.com/bramses/BookTalk
1•_bramses•15m ago•0 comments

Is AI "good" yet? – tracking HN's sentiment on AI coding

https://www.is-ai-good-yet.com/#home
2•ilyaizen•16m ago•1 comments

Show HN: Amdb – Tree-sitter based memory for AI agents (Rust)

https://github.com/BETAER-08/amdb
1•try_betaer•17m ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
2•anhxuan•17m ago•0 comments

Show HN: Seedance 2.0 Release

https://seedancy2.com/
2•funnycoding•18m ago•0 comments

Leisure Suit Larry's Al Lowe on model trains, funny deaths and Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
1•thelok•18m ago•0 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases
1•edwinarbus•18m ago•0 comments

VCF West: Whirlwind Software Restoration – Guy Fedorkow [video]

https://www.youtube.com/watch?v=YLoXodz1N9A
1•stmw•19m ago•1 comments

Show HN: COGext – A minimalist, open-source system monitor for Chrome (<550KB)

https://github.com/tchoa91/cog-ext
1•tchoa91•20m ago•1 comments

FOSDEM 26 – My Hallway Track Takeaways

https://sluongng.substack.com/p/fosdem-26-my-hallway-track-takeaways
1•birdculture•20m ago•0 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•24m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•24m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•25m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•25m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•27m ago•0 comments

Bypassing the Branch Predictor

https://nicula.xyz/2025/03/10/bypassing-the-branch-predictor.html
70•signa11•2mo ago

Comments

tux3•2mo ago
My first instinct for a poorly predicted branch would be to use a conditional move.

This isn't always a win, because you prevent the CPU from speculating down the wrong path, but you also prevent it from speculating the correct path.
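A minimal sketch of that trade-off: with a conditional move, both values must already be computed, and the select itself carries nothing for the predictor to get wrong. Names here are illustrative, and whether `cmov` is actually emitted depends on the compiler and flags:

```c
#include <assert.h>
#include <stdint.h>

/* Both arms are evaluated up front; the select itself has no branch
   for the predictor to mispredict. Names are illustrative. */
static inline int64_t select_result(int should_send,
                                    int64_t send_value,
                                    int64_t abandon_value) {
    /* GCC/Clang typically lower this side-effect-free ternary to a
       cmov on x86-64 at -O1 and above. */
    return should_send ? send_value : abandon_value;
}
```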

If you really don't care about the failure path and really don't mind unmaintainable low-level hacks, I can think of a few ways to get creative.

First there's the whole array of anti uarch-speculation-exploit tricks in the kernel that you can use as inspiration to control what the CPU is allowed to speculate. These little bits of assembly were reviewed by engineers from Intel and AMD, so they can't stop working without also breaking the kernel.

Another idea is to take inspiration from anti-reverse engineering tricks. Make the failure path an actual exception. I don't mean software stack unwinding, I mean divide by your boolean and then call your send function unconditionally. If the boolean is true, it costs nothing because the result of the division is unused and we just speculate past it. If the boolean is false, the CPU will raise a divide by 0 exception, and this invisible branch will never be predicted by the CPU. Then your exception handler recovers and calls the cold path.

sparkie•2mo ago
We could potentially use a conditional move and an unconditional jump to make the branch target predictor do the work instead - and flood it with a bunch of targets which are intended to mispredict. Eg, we could give 255 different paths for abandon and select one randomly:

    #define BYTE_VALUES \
        X(0x00) X(0x01) X(0x02) X(0x03) X(0x04) X(0x05) X(0x06) X(0x07) X(0x08) X(0x09) X(0x0A) X(0x0B) X(0x0C) X(0x0D) X(0x0E) X(0x0F) \
        X(0x10) X(0x11) X(0x12) X(0x13) X(0x14) X(0x15) X(0x16) X(0x17) X(0x18) X(0x19) X(0x1A) X(0x1B) X(0x1C) X(0x1D) X(0x1E) X(0x1F) \
        X(0x20) X(0x21) X(0x22) X(0x23) X(0x24) X(0x25) X(0x26) X(0x27) X(0x28) X(0x29) X(0x2A) X(0x2B) X(0x2C) X(0x2D) X(0x2E) X(0x2F) \
        X(0x30) X(0x31) X(0x32) X(0x33) X(0x34) X(0x35) X(0x36) X(0x37) X(0x38) X(0x39) X(0x3A) X(0x3B) X(0x3C) X(0x3D) X(0x3E) X(0x3F) \
        X(0x40) X(0x41) X(0x42) X(0x43) X(0x44) X(0x45) X(0x46) X(0x47) X(0x48) X(0x49) X(0x4A) X(0x4B) X(0x4C) X(0x4D) X(0x4E) X(0x4F) \
        X(0x50) X(0x51) X(0x52) X(0x53) X(0x54) X(0x55) X(0x56) X(0x57) X(0x58) X(0x59) X(0x5A) X(0x5B) X(0x5C) X(0x5D) X(0x5E) X(0x5F) \
        X(0x60) X(0x61) X(0x62) X(0x63) X(0x64) X(0x65) X(0x66) X(0x67) X(0x68) X(0x69) X(0x6A) X(0x6B) X(0x6C) X(0x6D) X(0x6E) X(0x6F) \
        X(0x70) X(0x71) X(0x72) X(0x73) X(0x74) X(0x75) X(0x76) X(0x77) X(0x78) X(0x79) X(0x7A) X(0x7B) X(0x7C) X(0x7D) X(0x7E) X(0x7F) \
        X(0x80) X(0x81) X(0x82) X(0x83) X(0x84) X(0x85) X(0x86) X(0x87) X(0x88) X(0x89) X(0x8A) X(0x8B) X(0x8C) X(0x8D) X(0x8E) X(0x8F) \
        X(0x90) X(0x91) X(0x92) X(0x93) X(0x94) X(0x95) X(0x96) X(0x97) X(0x98) X(0x99) X(0x9A) X(0x9B) X(0x9C) X(0x9D) X(0x9E) X(0x9F) \
        X(0xA0) X(0xA1) X(0xA2) X(0xA3) X(0xA4) X(0xA5) X(0xA6) X(0xA7) X(0xA8) X(0xA9) X(0xAA) X(0xAB) X(0xAC) X(0xAD) X(0xAE) X(0xAF) \
        X(0xB0) X(0xB1) X(0xB2) X(0xB3) X(0xB4) X(0xB5) X(0xB6) X(0xB7) X(0xB8) X(0xB9) X(0xBA) X(0xBB) X(0xBC) X(0xBD) X(0xBE) X(0xBF) \
        X(0xC0) X(0xC1) X(0xC2) X(0xC3) X(0xC4) X(0xC5) X(0xC6) X(0xC7) X(0xC8) X(0xC9) X(0xCA) X(0xCB) X(0xCC) X(0xCD) X(0xCE) X(0xCF) \
        X(0xD0) X(0xD1) X(0xD2) X(0xD3) X(0xD4) X(0xD5) X(0xD6) X(0xD7) X(0xD8) X(0xD9) X(0xDA) X(0xDB) X(0xDC) X(0xDD) X(0xDE) X(0xDF) \
        X(0xE0) X(0xE1) X(0xE2) X(0xE3) X(0xE4) X(0xE5) X(0xE6) X(0xE7) X(0xE8) X(0xE9) X(0xEA) X(0xEB) X(0xEC) X(0xED) X(0xEE) X(0xEF) \
        X(0xF0) X(0xF1) X(0xF2) X(0xF3) X(0xF4) X(0xF5) X(0xF6) X(0xF7) X(0xF8) X(0xF9) X(0xFA) X(0xFB) X(0xFC) X(0xFD) X(0xFE) X(0xFF)
        
    void resolve(Transaction *t) {
        void* jt[] = {
            #define X(n) [n] = &&abandon##n,
            BYTE_VALUES
            #undef X
            [0] = &&send,   // label address (&&send), not the function pointer
        };
        
        static uint8_t branch = 0;
        bool csend = should_send(t);
        branch = csend ? 0 : branch;      // cmov or SETcc
        goto * jt[branch];
        
        #define X(n) \
        abandon##n: \
            abandon(t); \
            branch = random() % 256; \
            return;
        BYTE_VALUES
        #undef X
        
        send:
            if (csend) send(t);
            else abandon(t);
            return;
    }
    
Assuming no inherent bias in the low byte produced by `random`, there's only a ~1/255 chance that any given abandon branch will be predicted correctly, and the same is true for the send branch. The conditional branch inside send, though, should mispredict only ~1/256 of the time (when random returns 0).

If we're sending significantly more often than 1/256 calls to resolve, it may be possible to train the BTP to prefer the send branch, as it will correctly predict this branch more often than the others which are chosen randomly - though this would depend on how the branch target predictor is implemented in the processor.

Dwedit•2mo ago
rand() doesn't do what you think it does. It multiplies, then adds, then returns particular bits of the current seed. See "Linear congruential generator" for more information.

On GCC, it's multiply by 1103515245 then add 12345, and return low 30 bits. On MSVC, it's multiply by 214013 and add 2531011 then return bits 16...30.
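Those two generators can be sketched as follows (simplified: glibc's default `rand()` is actually an additive-feedback generator, and the pure LCG shown is its older TYPE_0 mode; the MSVC constants are as quoted above):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified LCGs using the constants quoted above. */
static uint32_t gcc_seed = 1, msvc_seed = 1;

static int gcc_rand(void) {
    gcc_seed = gcc_seed * 1103515245u + 12345u;
    return (int)(gcc_seed & 0x3FFFFFFFu);        /* low 30 bits */
}

static int msvc_rand(void) {
    msvc_seed = msvc_seed * 214013u + 2531011u;
    return (int)((msvc_seed >> 16) & 0x7FFFu);   /* bits 16..30 */
}
```

With the default seed of 1, the MSVC recurrence produces the well-known first value 41.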

sparkie•2mo ago
I don't specifically mean stdlib `rand` (hence the comment about assuming no inherent bias) - just some arbitrary PRNG. I updated the code above to say `random` so as not to confuse.
IshKebab•2mo ago
Interesting problem! Not a very satisfying solution but I can't think of anything better. Even if there were hints that were respected, you'd still have the problem of the code not being in icache, unless you actually execute it occasionally.
nneonneo•2mo ago
What a fun problem to think about.

My first instinct, knowing less about this domain than maybe I should, would be to abuse the return address predictor. I believe CPUs will generally predict the target of a “ret” instruction using an internal stack of return addresses; some ARM flavours even make this explicit (https://developer.arm.com/documentation/den0042/0100/Unified...).

The way to abuse this would be to put send() on the normal return path and call abandon() by rewriting the return address. In code:

  void resolve(Transaction *t) {
    predict(t);
    send(t);
  }

  void predict(Transaction *t) {
    if (!should_send(t)) *(void **)__builtin_return_address(0) = &abandon;
  }
This isn’t exactly correct because it ignores control flow integrity (which you’d have to bypass), doesn’t work like this on every architecture, and abandon() would need to be written partly in assembly to deal with the fact that the stack is in a weird state post-return, but hopefully it conveys the idea anyway.

The if in predict() is implementable as a branchless conditional move. The return address predictor should guess that predict() will return to send(), but in most cases you’ll smash the return address to point at abandon() instead.

jcul•2mo ago
Why not just turn all the abandon transactions into fake discarded transactions, and discard them at send time? E.g. by poisoning the frame checksum or setting something invalid on them, so they get discarded.

Seems you'd be doing this anyway with the dummy transactions.

Then you have no branch, though may want to add dummy transactions anyway to keep the code in cache.

andrepd•2mo ago
This is literally what it says in TFA lol
jcul•2mo ago
I don't believe that's true.

The article suggests flooding the system with dummy "should send" transactions so that they become the majority.

Quote:

> One such solution that I know of is one that Carl Cook talked about during his CppCon 17 talk2: we can fill our system with mocked transaction data for which should_send(t) returns true. We have to do this enough times such that the mocked transaction data becomes the vast majority over the real data, so the branch predictor will be primed to assume that the send() path will be executed practically on every resolve() call. Overall, this may actually be a better approach than hard-coding prediction rules, because those rules wouldn’t really guarantee us that the whole send() path would get executed (the assumption may just lead to partial execution, until the should_send(t) condition is actually evaluated and the pipeline is flushed; so at the end we may still have important stuff not placed in the instruction/data cache).

What I am suggesting is to remove the branch entirely, and instead poison the "should abandon" transactions so they get dropped or null routed on the NIC. This is the kind of thing that low latency cut through L1 switches do.

Thereby removing the CPU branch predictor from the equation entirely.
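The poisoning idea might look something like this sketch - `frame`, `fcs32`, and the poison constant are all illustrative stand-ins; a real implementation would corrupt the Ethernet FCS that the NIC or switch actually verifies:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy frame; a real one carries the FCS the hardware checks. */
struct frame {
    uint8_t  payload[64];
    uint32_t fcs;
};

static uint32_t fcs32(const uint8_t *p, size_t n) {
    uint32_t h = 0;                  /* toy stand-in for CRC32 */
    for (size_t i = 0; i < n; i++)
        h = h * 31u + p[i];
    return h;
}

/* Always finalize the frame; branchlessly poison the checksum when the
   transaction should be abandoned, so the hardware drops it. */
void finalize(struct frame *f, int should_send) {
    uint32_t good = fcs32(f->payload, sizeof f->payload);
    uint32_t mask = (uint32_t)(should_send - 1);  /* 1 -> 0, 0 -> all-ones */
    f->fcs = good ^ (mask & 0xA5A5A5A5u);         /* poisoned iff abandoning */
}
```

No data-dependent branch ever executes on the CPU; the "decision" is carried in the checksum and resolved by the network hardware.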

kklisura•2mo ago
> I asked Claude if there is such a way to basically hard-code branch prediction rules into the machine code, and the answer was that there’s no way to do this on x86, but there is a way on ARM: the BEQP (predict branch taken) and BEQNP (predict branch not taken) instructions.

> Those ARM instructions are just hallucinated, and the reality is actually the other way around: ARM doesn’t have a way of hard-coding ‘predictions’, but x86 does.

This made me chuckle. Thanks.

nandomrumber•2mo ago
If a human wrote that here (on HN) someone would note the error and the poster would reply:

Yes, sorry, you’re correct. I’ve usually had 97 more double ristrettos by this time in the morning.

Some schools of thought suggest this has already happened.

zavec•2mo ago
LLMs are just what happens when we hit the singularity of caffeine consumption?
api•2mo ago
I asked Claude once if there was a way to open a tun/tap on Windows without a driver. It hallucinated an entire supposedly undocumented NT kernel API that as far as I can tell has never existed, complete with parts of this hallucinated API being deprecated in favor of newer non-existent parts.

It was so detailed it makes me wonder if maybe it was in the training data somewhere. Maybe it ingested an internal MS doc for a proposed API or something. The case of the missing ARM instructions makes me wonder the same. Maybe someone on a forum proposed them and they were ingested.

I did actually verify on the OS that the calls do not exist in the kernel or any driver or DLL on the system.

Polizeiposaune•2mo ago
LLMs generating text referencing nonexistent APIs or machine instructions seems to me exactly like LLMs generating text referencing nonexistent legal cases, or generating fictional text referencing nonexistent people.

They've been fed enough real text in each of these various styles that they can parrot the shallow surface style without reference to the internals.

monocasa•2mo ago
To be fair, on the x86 side those branch prediction hint prefixes have been functionally ignored by pretty much all cores for about two decades.
dooglius•2mo ago
They've been brought back in recent uarches
ViolentTurkey•2mo ago
Make sure your global branch history is the same when "mistraining" and predicting with your BTB. You may end up in the wrong BTB entry and still mess up your prediction :).
bluGill•2mo ago
Do CPUs really track that much about branches? I know a JIT does, but where does a CPU find the memory needed to store those counters - and then how does reading those counters not result in a different miss, given that the CPU can't speculate until it has the prediction?

Last time I checked a CPU's documentation, they had a simple rule that branches are always taken, which would be easy for the compiler to handle by ordering the taken path first. However, I don't recall which CPU that was. Still, this whole thing feels citation-needed to me, and I'm suspicious it's false. CPU designers know this matters, and sometimes compilers have information they don't that users care about: they document how it works. (This is the CPU, not the instruction set.)

nkurz•2mo ago
Maybe I'm misinterpreting you, but I think you are vastly underestimating both the accuracy and benefit of branch prediction. Yes, CPU's track that much about branches. Your suspicion that this is false is misguided. Here's Dan Luu's 2017 overview: https://danluu.com/branch-prediction/.
umanwizard•2mo ago
> last time I checked a cpu documentation they had a simple rule that branches are always taken, that would be easy for the compiler to code order first.

Was that in the 80s? Modern performant CPUs all use dynamic branch prediction.

I’m not really sure I understand the “where does it get the memory” point. Yes, this requires some amount of memory per tracked branch. This memory is hardwired into the CPU, just like all the other memory a CPU needs to do its job (registers, L1 cache, TLB, various in-flight instruction state, etc.)

bluGill•2mo ago
Over the 15 million lines of code I maintain there are a lot of branches. The CPU can track the most common ones, but as soon as the code spills out of cache, where does that memory come from?
umanwizard•2mo ago
It doesn’t track all of them, it’s a cache. The ones that were hit long enough ago get evicted, and then if you ever hit them again the processor will indeed have to fall back to a naive static prediction strategy with a high miss probability.
bluGill•2mo ago
Thanks, that makes sense now
monocasa•2mo ago
Modern cores can have more sram for branch prediction than they do for L1I$.
kronks•2mo ago
All of the side-channel attacks on CPUs have come from in-depth exploitation of the branch prediction techniques that make up a significant part of every modern processor's performance. It's one of the main things we can do aside from just clocking them higher.
amelius•2mo ago
> On modern x86 processors, those instruction prefixes are simply ignored

This sucks.

sparkie•2mo ago
The branch taken hint (3EH) was re-added in Redwood Cove (2023), but it only applies to static prediction, where the branch predictor has not yet encountered the branch - i.e., it's useful for branches that run only once or twice but are likely to be taken. Once the branch predictor has some information, the static hint is ignored, so it's best to omit it for anything that will eventually get dynamic prediction (i.e., runs many times), because you'd be consuming bytes in the i-cache that serve no purpose.
tehjoker•2mo ago
If the branch is only taken once, how can you realize a significant performance benefit of more than a few ns?
ack_complete•2mo ago
Cold branches come to mind -- something like an interrupt handler that runs often enough, but not in high enough bursts.
lordnacho•2mo ago
Sounds like you want to react to the market when you see a certain price in the incoming stream.

The trigger doesn't happen often, but when it does, the naive implementation incurs a misprediction penalty.

I wonder if the solution here might be to avoid the branch altogether?

Write some boolean logic monster that isn't an if condition. It unconditionally creates the reaction (new/cancel order), but if the trigger was not present, the payload is invalid and downstream processing will not let it out of your system onto the network.

Arcuru•2mo ago
I love the `[[likely]]` and `[[unlikely]]` tags since they nicely encapsulate the Modern C++ philosophy.

1. They don't work that well.

2. The intended use case is that you'd label an unlikely branch that you want to speed up as `[[likely]]`, which is confusing.

They are certainly motivated by good intentions (like the HFT use-case as mentioned in TFA, I remember that talk too, I think I was a volunteer that year for CppCon), but even after knowing both of my points above _they were still added to the standard_.

113245•2mo ago
1 - Can you elaborate? To what standard do you hold "work well"?

2 - Can you back that up? Generally [[likely]] is for the codepath that you want to be faster, where common path = fast path. It is a specific HFT use case to desire the fast path to be the uncommon path. [[likely]] is definitely intended to be the fast path per https://en.cppreference.com/w/cpp/language/attributes/likely

thatSteveFan•2mo ago
Not the parent, but wanting the rare case to be fast is definitely not a HFT exclusive thing.

I write software packet processors for a living, and it's common to have requirements stated as "process X packets per second, no matter what", i.e. you're judged by the worst-case kind of packets. Also common is optimizing 99.9th-percentile latency.

There's also just cases like if (being ddosed) or if (overloaded) where you definitely want the rare case to be faster.

As to point 1, there's significant doubt as to whether these actually change performance. I have never personally managed to confirm a performance gain from them, and not for lack of trying. CPUs also seem not to give them much weight, and will probably override you if they see a much more common other path.
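For reference, the GCC/Clang building block underneath these attributes is `__builtin_expect`, which mainly influences static code layout so that the expected arm becomes the fall-through path - a sketch in C:

```c
#include <assert.h>

/* In C, the analogue of C++'s [[likely]]/[[unlikely]] is
   __builtin_expect (GCC/Clang): it biases static code layout so the
   expected arm becomes the fall-through path. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Toy function: the negative case is laid out as the cold path. */
int clamp_positive(int x) {
    if (unlikely(x < 0))
        return 0;       /* cold, placed out of line */
    return x;           /* hot fall-through */
}
```

Like the attributes, this only affects layout and whatever static prediction the CPU still does; it cannot stop a trained dynamic predictor from overriding the hint.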

m4nu3l•2mo ago
I remember reading about this or a similar problem before. I think it was posted on HN, but the solution was much different. They entirely removed the branch and instead marked the requests in such a way that the packets would be discarded by the network adapter. I can't remember the details. That only works for that kind of network transactions, though.
yeoyeo42•2mo ago
5 microseconds is a whole lot of time for just a branch misprediction, which is on the order of a few nanoseconds. There's probably a lot more important stuff going on, and this focuses on the wrong thing.

For example, predicting a different branch can lead to different memory access patterns. Maybe you can eliminate most of the real-world cost of the misprediction by just forcefully prefetching the memory the send() path is going to use.
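That prefetch idea could be sketched like so - `Transaction` and its layout are invented for illustration, and `__builtin_prefetch` is only a hint the hardware is free to ignore:

```c
#include <assert.h>

/* Invented for illustration: the data the hot path would touch. */
typedef struct {
    int  id;
    char payload[256];
} Transaction;

/* Unconditionally prefetch what send() would need before the branch
   resolves, so a misprediction costs the pipeline flush but not the
   cache misses on top of it. __builtin_prefetch(addr, rw, locality)
   is a GCC/Clang hint. */
int resolve_hinted(Transaction *t, int should_send) {
    __builtin_prefetch(t->payload, /*rw=*/0, /*locality=*/3);
    if (should_send)
        return t->payload[0];   /* stand-in for send(t) */
    return -1;                  /* stand-in for abandon(t) */
}
```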