frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

A leap year check in three instructions

https://hueffner.de/falk/blog/a-leap-year-check-in-three-instructions.html
133•gnabgib•2h ago

Comments

qingcharles•2h ago
I love these incomprehensible magic number optimizations. Every time I see one I wonder how many optimizations like this we missed back in the old days when we were writing all our inner loops in assembly?

Does anyone have a collection of these things?

masfuerte•2h ago
We didn't miss them. In those days they weren't optimizations. Multiplications were really expensive.
kurthr•2h ago
and divides were worse. (1 cycle add, 10 cycle mult, 60 cycle div)
genewitch•2h ago
That's fair but mod is division, or no? So realistically the new magic number version would be faster. Assuming there is 32 bit int support. Sorry, this is above my paygrade.
qingcharles•1h ago
Yeah, I'm thinking more of ones that remove all the divs from some crazy math functions for graphics rendering and replace them all with bit shifts or boolean ops.
tylerhou•2h ago
You should look at supercompilation.
owl_vision•1h ago
there is "Hacker's Delight" by Henry S. Warren, Jr.

https://en.wikipedia.org/wiki/Hacker's_Delight

qingcharles•1h ago
Looks awesome, thank you :)
ryao•35m ago
Here is a short list:

https://graphics.stanford.edu/~seander/bithacks.html

It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic branchless comparisons for things that want UNIX-style comparators. If you compare the output against 0 to check for some form of greater than, less than or equality, the compiler should automatically simplify it. For example, CMP(X, Y) > 0 is simplified to (X > Y) by a compiler.

The signum(x) function that is equivalent to CMP(X, 0) can be done in 3 or 4 instructions depending on your architecture without any comparison operations:

https://www.cs.cornell.edu/courses/cs6120/2022sp/blog/supero...

It is such a famous example, that compilers probably optimize CMP(X, 0) to that, but I have not checked.

There are a few more superoptimized mathematical operations listed here:

https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Reso...

Note that the assembly code appears to be for the Motorola 68000 processor and it makes use of flags that are set in edge cases to work.

Finally, there is a list of helpful macros for bit operations that originated in OpenSolaris (as far as I know) here:

https://github.com/freebsd/freebsd-src/blob/master/sys/cddl/...

There used to be an Open Solaris blog post on them, but Oracle has taken it down.

Enjoy!

captaincrunch•2h ago
This is fast, READABLE, and accurate:

bool is_leap_year(uint32_t y) { // Works for Gregorian years in range [0, 65535] return ((!(y & 3)) && ((y % 25 != 0) || !(y & 15))); }

andrepd•1h ago
This impl is mentioned in TFA.. It's much slower and includes branches.
hoten•1h ago
I'd expect even without optimizations on, there wouldn't be branches in the output for that code.
kragen•52m ago
There are, even with optimizations on. You could have checked: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...

I didn't find any way to get a compiler to generate a branchless version. I tried clang and GCC, both for amd64, with -O0, -O5, -Os, and for clang, -Oz.

mmozeiko•39m ago
If you change logic and/or to bitwise and/or then it'll be branchless.
kragen•56m ago
You commented out your entire function body and the closing }. Also, on 32-bit platforms, it doesn't stop working at 65535.
drewg123•2h ago
I tend to be of the opinion that for modern general purpose CPUs in this era, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.

But do you know what's not free? Memory accesses[1]. So when I'm optimizing things, I focus on making things more cache friendly.

[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...

drewg123•1h ago
To temper this slightly, these sorts of optimizations are useful on embedded CPUs for device firmware, IOT, etc. I've worked on smart NIC CPUs where cycles were so precious we'd do all kinds of crazy unreadable things.
sitzkrieg•1h ago
on the flip side of the topic, trying to do any datetime handling on the edge of embedded compute is going to be wrong 100% of the time anyway
RaoulP•1h ago
Would you mind elaborating?
bigiain•51m ago
I suspect most IOT device manufacturers expect/design their device to be landfill before worrying about leap year math. (In my least optimistic moments, I suspect some of them may intentionally implement known broken algorithms that make their eWaste stop working correctly at some point in the near future that's statistically likely to bear beyond the warranty period.)
kreco•1h ago
> such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.

I'm amazed by the fact there is always someone who will say that such optimization are totally unnecessary.

recursive•1h ago
Some people have significant positions on CPU manufacturers, so there will always be at least a few.
andrepd•1h ago
> I tend to be of the opinion that for modern general purpose CPUs in this era, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.

What does this mean? Free? Optimisations are totally unnecessary because... instructions are free?

The implementation in TFA is probably on the order of 5x more efficient than a naive approach. This is time and energy as well. I don't understand what "free" means in this context.

Calendar operations are performed probably trillions of times every second across all types of computers. If you can make them more time- and energy-efficient, why wouldn't you?

If there's a problem with modern software it's too much bloat, not too much optimisation.

jdlshore•1h ago
GP made an important point that you seemed to have missed: in modern architectures, it’s much more important to minimize memory access than to minimize instructions. They weren’t saying optimization isn’t important, they were describing how to optimize on modern systems.
drewg123•1h ago
If this is indeed done trillions of times a second, which I frankly have a hard time believing, then sure, it might be worth it. But on a modern CPU, focusing on an optimization like this is a poor use of developer resources. There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.

Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.

wtetzner•1h ago
> But on a modern CPU, focusing on an optimization like this is a poor use of developer resources.

How many people are rolling their own datetime code? This seems like a totally fine optimization to put into popular datetime libraries.

andrepd•1h ago
> There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.

How is cache / memory access relevant in a subroutine that performs a check on a 16bit number?

> Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.

1: comments are your friend

2: a unit test can assert that this function is equivalent to the naive one in about half a millisecond.

croes•1h ago
Related

> The world could run on older hardware if software optimization was a priority

https://news.ycombinator.com/item?id=43971464

PaulKeeble•1h ago
The thing is about these optimisations (assuming they test as higher performance) is that they can get applied in a library and then everyone benefits from the speedup that took some hard graft to work out. Very few people bake their own date API nowadays if they can avoid it since it already exists and techniques like this just speed up every programme whether its on the critical path or not.
codexb•1h ago
That's basically compilers these days. It used to be that you could try and optimize your code, inline things here and there, but these days, you're not going to beat the compiler optimization.
ryao•45m ago
Meanwhile, GCC will happily implement bsearch() without cmov instructions and the result will be slower than a custom implementation on which it emits cmov instructions. I do not believe anyone has filed a bug report specifically about the inefficient bsearch(), but the bug report I filed a few years ago on inefficient code generation for binary search functions is still open, so I see no point in bothering:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001

Meanwhile, binary searches in lookups OpenZFS B-Tree nodes are faster in part because we did not wait for the compiler:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

Eliminating comparator function overhead via inlining is also a part of the improvement, which we would not have had because the OpenZFS code is not built with LTO, so even if the compiler fixes that bug, the patch will still have been useful.

kragen•26m ago
That is a meme that people repeat a lot, but it turns out to be wrong:

https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf (though see https://blog.regehr.org/archives/1515: "This piece (...) explains why Daniel J. Bernstein’s talk, The death of optimizing compilers (audio [http://cr.yp.to/talks/2015.04.16/audio.ogg]) is wrong", citing https://news.ycombinator.com/item?id=9397169)

https://blog.royalsloth.eu/posts/the-compiler-will-optimize-...

http://lua-users.org/lists/lua-l/2011-02/msg00742.html

https://web.archive.org/web/20150213004932/http://x264dev.mu...

achierius•1h ago
Just because CPU performance is increasing faster than DRAM speeds doesn't mean that CPU performance is "free" while memory is "expensive". One thing that you're ignoring is the impact of caches and prefetching logic which have significantly improved the performance of memory-bound workloads in the last 5-10 years. DRAM might be slow, but if you avoid going out to DRAM...

More broadly, it 100% depends on your workload. You'd be surprised at how many workloads are compute-bound, even today: LLM inference might be memory bound (thus how it's possible to get remotely good performance on a CPU), but training, esp. prefill, is very much not. And once you get out of LLMs, I'd say that most applications of ML tend to be compute bound.

GuB-42•1h ago
Integer division (and modulo) is not cheap on most CPUs. Along with memory access and branch prediction, it is something worth optimizing for.

And since you are talking about memory. Code also goes in memory. Shorter code is more cache friendly.

I don't see a use case where it matters for this particular application (it doesn't mean there isn't) but well targeted micro-optimizations absolutely matter.

klysm•1h ago
I think about branches a lot too when optimizing
adonovan•1h ago
Quite right. If you use a compiled language (e.g. Go) the difference between the two implementations is indeed negligible.

https://go.dev/play/p/i72xCyhqRkC

kragen•34m ago
It's true that this code was optimized from 2.6ns down to 0.9ns, a saving of 1.7ns, while an L2 cache miss might be 80ns. But 1.7ns is still about 2% of the 80ns, and it's about 70% of the 2.6ns. You don't want to start optimizing by reducing things that are 2% of your cost, but 2% isn't insignificant.

The bigger issue is that probably you don't need to do leap-year checks very often so probably your leap-year check isn't the place to focus unless it's, like, sending a SQL query across a data center or something.

__turbobrew__•26m ago
This is why linear arrays are the fastest datastructure unless proven otherwise.
22c•2h ago
Part-way through the section on bit-twiddling, I thought to myself "Oh I wonder if we could use a solver here". Lo and behold, I was pleasantly surprised to see the author then take that exact approach. Love the attention to detail in this post!
andrepd•1h ago
Interesting cppcon talk related to this (author is cited in TFA): https://www.youtube.com/watch?v=0s9F4QWAl-E
olq•1h ago
This is gold! HN Kino! Taking the hardest problem in the world, date checking, and casually bitflipping the hell out of it. Hats off man :D
dndn1•1h ago
If you need to know a leap year and it's before the year 6000, I made an interactive calculator and visualization [1].

It's >3 machine instructions (and I admire the mathematical tricks included in the post), but it does do thousands of calculations fairly quickly :)

[1] https://calculang.dev/examples-viewer?id=leap-year

high_pathetic•1h ago
https://news.ycombinator.com/item?id=42915723#42924485
esafak•54m ago
I'd be impressed if an LLM derived this independently.
NelsonMinar•49m ago
It's rare to read code that makes me literally laugh out loud. What a delight.
userbinator•20m ago
At first I thought it would be using the BCD instructions in some way, as shown near the bottom of this article: https://news.ycombinator.com/item?id=8477254
kragen•11m ago
Yeah, I was thinking AAM, but that wouldn't get you to 3.

Altman mocks Musk's Grok AI over its sudden 'white genocide' obsession

https://www.neowin.net/news/altman-mocks-musks-grok-ai-over-its-sudden-white-genocide-obsession/
1•bundie•2m ago•0 comments

Marine life's latest hotspot could be an underwater volcano primed to erupt

https://text.npr.org/nx-s1-5398591
1•mooreds•4m ago•0 comments

"You Knew What You Were Signing Up For" – A Harmful Narrative in DFIR?

https://www.forensicfocus.com/articles/you-knew-what-you-were-signing-up-for-a-harmful-narrative-in-dfir/
1•WaitWaitWha•9m ago•1 comments

Removing atmospheric CO₂ through scaleup of crops with enhanced root systems

https://iopscience.iop.org/article/10.1088/1748-9326/adc31b
1•PaulHoule•9m ago•0 comments

British ICC chief prosecutor lost access to email and has bank accounts frozen

https://www.lbc.co.uk/world-news/british-icc-chief-prosecutor-lost-email-bank-accounts-frozen-trump-sanctions/
4•perihelions•11m ago•0 comments

The Colosseum: Hidden Mechanisms of Ancient Rome [video]

https://www.youtube.com/watch?v=bmvxRMYxlhE
1•gmays•11m ago•0 comments

The Teal Programming Language

https://teal-language.org/
4•generichuman•13m ago•0 comments

Concepts, Techniques, and Models of Computer Programming [pdf]

https://webperso.info.ucl.ac.be/~pvr/VanRoyHaridi2003-book.pdf
1•nextos•13m ago•0 comments

Google's Advanced Protection for Vulnerable Users Comes to Android

https://www.wired.com/story/google-advanced-protection-vulnerable-users-lockdown-android-16/
1•dadrian•14m ago•0 comments

Show HN: Gigapixel AI – Free AI-Powered Image Enhancer with 4x Upscaling

https://gigapixelai.app
1•0xCafeBabee•19m ago•0 comments

Sitting for a long time shrinks your brain even if you exercise

https://alz-journals.onlinelibrary.wiley.com/doi/full/10.1002/alz.70157
2•codexon•21m ago•1 comments

I Heart Cardioids

https://divisbyzero.com/2018/04/02/i-heart-cardioids/
2•thunderbong•22m ago•0 comments

NASA keeps ancient Voyager 1 spacecraft alive with Hail Mary thruster fix

https://www.theregister.com/2025/05/15/voyager_1_survives_with_thruster_fix/
10•nullhole•24m ago•0 comments

A deep research agent with 10,000 sources per query

https://graphthem.com/
1•georgehill•27m ago•0 comments

The Anti-Tech Canon: 30 Books

https://www.honest-broker.com/p/the-anti-tech-canon-30-books
1•paulpauper•28m ago•0 comments

Book Review: Selfish Reasons to Have More Kids

https://www.astralcodexten.com/p/book-review-selfish-reasons-to-have
1•paulpauper•28m ago•0 comments

So You Want a Healthy Brain?

https://domofutu.substack.com/p/so-you-want-a-healthy-brain
3•wjb3•29m ago•0 comments

Ask HN: Sites Sharing Setting Files, for Apps Like Open Shell, etc.?

1•MollyRealized•36m ago•0 comments

Migrating from Postgres to CockroachDB

https://engineering.squarespace.com/blog/2025/leveraging-change-data-capture-for-database-migrations-at-scale
2•shenli3514•41m ago•0 comments

My husband was laid off from Microsoft by an algorithm – after 25 years

https://old.reddit.com/r/TrueOffMyChest/comments/1knj1sd/my_husband_was_laid_off_from_microsoft_by_an/
8•thenaturalist•42m ago•3 comments

Nuxt 3.17 Is Out

https://nuxt.com/blog/v3-17
3•danboarder•47m ago•0 comments

Show HN: Transform GitHub repositories into interactive knowledge bases

https://deepwiki.com/
1•jerawaj740•47m ago•0 comments

Once 'dead' thrusters on the farthest spacecraft from Earth are in action again

https://www.cnn.com/2025/05/14/science/voyager-1-thruster-fix
3•everybodyknows•48m ago•0 comments

Most Americans don't earn enough to afford basic costs of living

https://www.cbsnews.com/news/cost-of-living-income-quality-of-life/
10•ripe•48m ago•1 comments

Code Fast, Crash Hard: The Million-Dollar Quality Crisis

https://thenewstack.io/code-fast-crash-hard-the-million-dollar-quality-crisis/
3•MarcoDewey•48m ago•0 comments

Distributed SQL's Moment Has Arrived: How TiDB Is Leading the Way

https://sanjmo.medium.com/distributed-sqls-moment-has-arrived-how-tidb-is-leading-the-way-e448b118c897
2•shenli3514•49m ago•0 comments

Asus' Latest NUC Mini PC Is Exceptionally Powerful

https://www.howtogeek.com/asus-nuc-15-pro-plus-debut/
3•teleforce•56m ago•0 comments

Safer Intersection Legislation Signed into Law in Illinois

https://activetrans.org/blog/we-won-safer-intersections-for-illinois/
2•toomuchtodo•1h ago•1 comments

Inside the house that Asus built: New NUCs and powerful laptops

https://www.tomshardware.com/laptops/inside-the-house-that-asus-built-new-nucs-and-powerful-laptops
1•teleforce•1h ago•0 comments

Show HN: B2B Vibe Check – avoid pilots/customers who won't buy

https://b2bvibecheck.com/
1•BohoHacker•1h ago•0 comments