
I Ported SAP to a 1976 CPU. It Wasn't That Slow

https://github.com/oisee/zvdb-z80/blob/master/ZVDB-Z80-ABAP.md
28•weinzierl•8h ago

Comments

U1F984•6h ago
From the article: "Lookup tables are always faster than calculation" - is that true? I'd think that while that may have held in the distant past, today, with memory being much slower than the CPU, the picture is different. If you're calculating a very expensive function over a small domain, so the lookup fits in L1 cache, then I can see it being faster - but you can do a lot of calculating in the time needed for a single main-memory access.
haiku2077•6h ago
> Lookup tables are always faster than calculation - is that true?

I know it's not always true on the Nintendo 64, because it shared a single bus between the RAM and "GPU": https://youtu.be/t_rzYnXEQlE?t=94, https://youtu.be/Ca1hHC2EctY?t=827

bayindirh•5h ago
Depends on the hardware and what you are making with that hardware. Some processors can do complicated things stupidly fast (e.g. when SIMD done right), and for some hardware platforms, a mundane operation can be very costly since they are designed for other things primarily.

My favorite story is about an embedded processor whose ISA I've forgotten. The gist was: there was a time budget, and a normal memory access alone would consume 90% of it. The trick was to use the obscure DMA engine to pump data into the processor caches asynchronously. That way, moving data cost only ~4% of the same budget, and they beat their performance targets by a large margin.

ay•5h ago
You will need to first sit and ballpark, then sit and benchmark - and discover your ballpark was probably wrong anyhow :-)

Some pointers I've found useful in that regard, for both:

1. https://www.agner.org/optimize/instruction_tables.pdf - an extremely nice resource on micro architectural impacts of instructions

2. https://llvm.org/docs/CommandGuide/llvm-mca.html - tooling from the LLVM project that lets you see some of these effects on real machine code

3. https://www.intel.com/content/www/us/en/developer/articles/t... - shows you whether the above matches reality (besides the CPU alone, more often than not your bottleneck is actually memory accesses - at least on the first access, the one that wasn't triggered by a hardware prefetcher or a hint to it). On Linux it would be staring at "perf top" results.

So the answer is, as it very often is: "it depends".

bayindirh•5h ago
...and we always circle back to "premature optimization is the root of all evil", since processors are a wee bit more intelligent with our instructions than we thought. :)
mananaysiempre•48m ago
Not that intelligent. If you have two loads and one store per cycle, then that’s it. (Recall that we have SSDs with 14 GB/s sequential reads now, yet CPU clocks are below 6 GHz.) Most of the computational power of a high-performance CPU is in the vector part; still the CPU won’t try to exploit it if you don’t, and the compiler will try but outside of the simplest cases won’t succeed. (Most of the computational power of a high-performance computer is in the GPU, but I haven’t gotten that deep yet.)

I don’t mean to say that inefficient solutions are unacceptable; they have their place. I do mean to say that, for example, for software running on end-users’ computers (including phones), the programmer is hardly entitled to judge the efficiency on the scale of the entire machine—the entire machine does not belong to them.

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

D. E. Knuth (1974), “Structured Programming with go to Statements”, ACM Comput. Surv. 6(4).

yorwba•5h ago
In this case, the lookup table is used for popcount, and there's a comment in the Z80 assembly that says "One lookup vs eight bit tests." If the code could make use of a hardware popcount instruction, the lookup table would lose, but if that isn't available, a 256-byte lookup table could be faster. So it's less "lookup tables are always faster" and more "lookup tables can be faster, this is one such case."
whizzter•5h ago
The article does mention cache friendly access patterns in the same context.

But yes, you're right. Back when I started with optimizations in the mid-90s, memory _latencies_ were fairly minor compared to complex instructions, so most things that weren't additions (or multiplications, on the Pentium) would be faster from a lookup table. Over time, memory latencies grew and grew as clock speeds and other improvements made the distance to physical memory an actual factor, and lookup tables became less useful compared to recomputing things.

Still today there are computations expensive enough, with tables small enough not to get evicted from cache during the computation, that a lookup wins - but they're few.

devnullbrain•4h ago
You are correct, and I've even run into a situation where build-time evaluation was slower than runtime calculation, thanks to code size.
mananaysiempre•4h ago
> Lookup tables are always faster than calculation - is that true?

Maybe on the Z80. Contemporary RAM was quite fast compared to it, by our sad standards.

A table lookup per byte will see you hit a pretty hard limit of about 1 cycle per byte on all x86 CPUs of the last decade. If you’re doing a state machine or a multistage table[1] where the next table index depends on both the next byte and the previous table value, you’ll be lucky to see half that. Outracing your SSD[2] you’re not, with this approach.

If instead you can load a 64-bit chunk (or several!) at a time, you’ll have quite a bit of leeway to do some computation to it before you’re losing to the lookup table, especially considering you’ve got fast shifts and even multiplies (another difference from the Z80). And if you’re doing 128- or 256-bit vectors, you’ve got even more compute budget—but you’re also going to spend a good portion of it just shuffling the right bytes into the right positions. Ultimately, though, one of your best tools there is going to be ... an instruction that does 16 resp. 32 lookups in a 16-entry table at a time[3].

So no, if you want to be fast on longer pieces of data, in-memory tables are not your friend. On smaller data, with just a couple of lookups, they could be[4]. In any case, you need to be thinking about your code’s performance in detail for these things to matter—I can’t think of a situation where “prefer a lookup table” is a useful heuristic. “Consider a lookup table” (then measure), maybe.

[1] https://www.unicode.org/versions/latest/ch05.pdf

[2] https://lemire.me/en/talk/perfsummit2020/

[3] http://0x80.pl/notesen/2008-05-24-sse-popcount.html

[4] https://tia.mat.br/posts/2014/06/23/integer_to_string_conver...

Taniwha•4h ago
I'm a sometime CPU architect and came here to argue just this - modern CPUs have far, far slower memory access (in clocks) than Z80 memory access. To be fair, you can probably fit any Z80 table you're using into a modern L1 cache, but even so you're looking at multiple clocks rather than one.
flohofwoe•3h ago
On the Z80 any memory access had a fixed cost of 3 clock cycles (in reality the memory system could inject wait cycles, but that was an esoteric case). Together with the instruction fetch of 4 clock cycles the fastest instruction to load an 8-bit value from an address that's already in a 16-bit register (like LD A,(HL)) takes 7 clock cycles.

The fastest instructions that didn't access memory (like adding two 8-bit registers) were 4 clock cycles, so there's really not much room to beat a memory access with computation.

Today "it depends", I still use lookup tables in some places in my home computer emulators, but only after benchmarking showed that the table is actually slightly faster.

peteforde•6h ago
I have only one question: does the author know anything about coding ABAP like it's a Z80? I wish that they'd addressed this.
ooisee•2h ago
Yes, it is addressed in another repo: https://github.com/oisee/zvdb

and another article about zvdb-abap from ~1.5 years ago.

bravesoul2•4h ago
It's amusing that the writing style is akin to a LinkedIn "what XYZ taught me about B2B sales" post.
ooisee•1h ago
Guilty as charged—and totally intentional.

Turns out "I want a burger" beats "we have equivalent burger at home" every time—even when the home solution is objectively better.

So yes, this reads like "What my goldfish taught me about microservices." But unlike those posts, this story has no moral—just nerdy fun with enterprise software roasting.

Sometimes you gotta speak the language they actually read there +)).

_notreallyme_•4h ago
Optimizing code on an MMU-less processor versus an MMU-equipped, or even NUMA-capable, processor is vastly different.

The fact that the author achieves only a 3 to 6 times speedup on a processor running at a frequency roughly 857 times higher should have led to the conclusion that old optimization tricks are awfully slow on modern architectures.

To be fair, execution-pipeline optimization still works the same, but not taking into account the different layers of cache, the way memory management works, and how and when actual RAM is queried will only lead to suboptimal code.

ooisee•1h ago
Seems like you've got it backwards - and that makes it so much worse. ^_^

I ported from ABAP to Z80. Modern enterprise SAP system → 1976 processor. The Z80 version is almost as fast as the "enterprise-grade" ABAP original. On my 7MHz ZX Spectrum clone, it's neck-and-neck. On the Agon Light 2, it'll probably win. Think about that: 45-year-old hardware competing with modern SAP infrastructure on computational tasks. This isn't "old tricks don't work on new hardware." This is "new software is so bloated that Paleolithic hardware can keep up." (but even this is nonsense - ABAP is not designed for this task =)

The story has no moral, it is just for fun.

orbifold•3h ago
Writing style and length of paragraphs strongly suggest that this is AI generated in full.
ooisee•1h ago
Not yet in full, it is ~45% generated.

Two reasons: 1) English is not my native tongue, and 2) I hate the LinkedIn article style -> so I let an LLM convert my hardcore-oldschool style into something like "You won't believe what my grandmother's cat taught me about ..."
