https://beej.us/guide/bgc/html/split-wide/structs-ii-more-fu...
...goes into offsets, padding bytes, bit-field, unions, etc etc.
> Well, dear reader, this padding is added because the CPU needs memory to be aligned in sets of 4 bytes because it’s optimized in that fashion.
> ...
> Remember: since structs are aligned to 4 bytes, any padding is therefore unnecessary if the size of the struct is a multiple of 4 without the padding.
Individual data types have their own alignment (e.g., `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long` may be 8, etc.), and the alignment of a compound type (like a struct) defaults to the maximum alignment of its constituent types.
In this article, `struct Monster` has an alignment of 4 because `int` and `float` have an alignment of 4 for the author's configuration. Expanding one of the `int`s to a `long` could increase the alignment to 8 on some CPUs, and removing the `int` and `float` fields would decrease the alignment to 1 for most CPUs.
(Technically there is still an advantage to items being aligned to their size: such an item can never straddle adjacent cache lines.)
And there's also still tons of different alignment requirements when working with GPU data - and interestingly those alignment requirements may differ from C's alignment rules, so you may need to explicitly use packed structs (which are still not a standard C feature!) with manual padding.
https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...
TL;DR: 10% difference on what in 2012 was a low-end CPU, no difference on "new in 2012" CPUs. So my guess is that by now it really doesn't matter anymore :)
I happen to have a copy of AAPCS64 in front of me, and you can find the specification of bit packing in section 10.1.8. The sysv ABI for x86/x86_64 has its own wording (but I'm pretty sure is compatible). MSVC does something else I believe, etc...
> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
In practice, I've been using this feature for ages (along with __attribute__((packed))). There comes a point where you can start depending on compiler specific features instead of relying solely on the standard.
The original (?) DNS library used bit-fields to read and write the packet header status and flags fields, and that original code is still used today: https://github.com/openbsd/src/blob/master/include/arpa/name... It does rely on implementation-defined behavior--it swaps the order of fields based on whether the machine is little-, big-, or middle-endian--but even that behavior has remained remarkably consistent, at least on Unix ABIs.
I think the wisdom comes from dealing with niche compilers, especially from years ago, where bit-fields were one of the areas where standard adherence was less consistent; and from embedded and kernel contexts, where the implementation-defined and (implicitly) undefined behavior, such as with respect to atomicity or the intersection of bit-fields and (extension) packed pragmas, counseled against any use of bit-fields.
https://github.com/PeterCDMcLean/BitLib
You can inherit from bit_array and create a pseudo bitfield:
    class mybits_t : public bit::bit_array<17> {
    public:
        using bit::bit_array<17>::bit_array;  // inherit constructors

        // slice a 1-bit field; returns a proxy reference to the single bit
        auto field1() {
            return (*this)(0, 1);
        }
        // slice a 3-bit field; returns a proxy reference to the 3 bits
        auto field2() {
            return (*this)(1, 4);
        }
    };

    mybits_t mybits(7);
    assert(mybits.field1() == 1);
    mybits.field1() = 0;
    assert(static_cast<int>(mybits) == 6); // no implicit cast to integers, but explicit is supported
There is minimal overhead depending on how aggressively the compiler inlines or optimizes. bit_array can be compile-time storage (a la bitset) or dynamic at construction time. There is also bit_vector, which is entirely dynamic.
Critically though, is that the type/library is built on a proxy iterator/pointer that lets you do certain things with STL or the bit algorithms that you cannot otherwise do with bitset.
But back to the bitfields idea. The bit array backing storage is configurable through templates so you will be able to know for certain exactly how much stack or heap your type will take up
Bitfields in C leave way too much behavior to the compiler or undefined. It's really intolerable.
Arrow's struct is called StructArray. Fields of StructArray have a StructType.
StructType has .bit_width and .byte_width attrs in Python and probably the other implementations too: https://arrow.apache.org/docs/python/generated/pyarrow.Struc...
Arrow supports bitfields with BooleanArray, and enums with categoricals but
"BUG: Categorical columns using the PyArrow backend requires 4x more memory" (open) https://github.com/pandas-dev/pandas/issues/58062 :
> On disk Parquet appears to store the category data as logical type String which is compressed with snappy and encoded
Arrow Flight RPC handles nested structs with enums over the wire somehow too FWIU
I'm not sure portability is the reason though. I've seen it in places that have no problem spamming #define _GNU_SOURCE
It's absolutely wonderful for IO.
Big-endian systems sort-of still exist (I do indeed have an Amiga, and an Atari STe, and an Xbox 360, and they all work, so they must count) - but they're only slightly less weird from a practical C programming perspective than stuff like the DSPs with CHAR_BIT==24, and Harvard architecture microcontrollers with 2-byte ints. And so you can ignore them. People that still actually use these systems will be used to it.
S->buffer_free = 1;
This is a very serious limitation if the structure/class is being updated in a multi-threaded environment.
>”I’m by far an expert and I could be wrong in some specifications”
You’re missing a “not” on that 2nd sentence ;)
In all seriousness, you don’t need this disclaimer, you’re selling yourself short.
There’s nothing wrong with your post to make the disclaimer.
And if there is something wrong with the post, let the comments flood in and tell you that rather than yourself.
The next thing to really explore is how the struct is being used. Smaller is always nice, but sometimes you need it to be fast. To do that, you'll want to group together fields that are commonly read. For example, x, y, and speed are likely always read together. So, it'd make sense to group those fields together to optimize for cache reads.
The next thing to question is if your Monster is being interacted with as an individual thing or as a group of things. If it's as a group of things then it might make sense to create a "Struct of arrays" of monsters. Imagine, for example, you have a system which removes all the dead monsters. It'd be faster to iterate through an `isAlive` array rather than an array of `Monster` structs.
And now you are on your way to reinventing an ECS :)
Many a spacecraft telemetry stack works like that.
You can eliminate the cost of a wrapping class with no members by taking advantage of the empty base class optimization (or rather: by avoiding members in a base class you will be taking advantage of EBCO).
You can use an untagged union in C++ (taking care to avoid undefined behavior) to implement small object optimization. In its general form, the idea is that you store some smaller struct in a union (as a char[], since reading an inactive union member of another type is undefined behavior in C++) alongside a larger structure, and use a sentinel in the larger structure to determine which is active. E.g., if the large struct has some field that must never be zero, make it zero in the smaller struct and you can distinguish between the two without a tag (as would be required with std::variant).
I'm also a huge fan in general of "make invalid states unrepresentable" and following this can often reduce redundancy and eliminate extra fields.
In C, alignment for basic types is linked to size. A type may have an alignment requirement that is as much as its size: e.g. an 8 byte data type might be aligned as strictly as addresses divisible by 8.
The alignment of an array aggregate is that of its element type.
The alignment of a struct is that of its most strictly aligned member.
In portable coding, we don't exactly know, but we can make the worst-case assumption that every basic type has the strictest possible alignment---i.e. its size. We also don't know the sizes of all the types, but we know that integers of higher ranks are at least as large as lower ranks; i.e. sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long), etc.
Things are tricky when you have a mixture of floating-point, pointers and integers. E.g. pointer could be the same size as a float, or bigger.
Here is a trick you use: combine together clumps of like types, and try to make combinations that are powers of two. For instance, if your structure has two or four pointers to things, try to make them adjacent. E.g.:
    struct s {
        void *p1, *p2, *p3, *p4;
        short s1, s2, s3, s4;
        float f1, f2, f3, f4;
    };
Even though float is almost certainly larger than short and may have stricter alignment, it's moot because I have four of everything. Even if this is on a tiny system where pointers are 16 bits wide, four of them will make 8 bytes. So when the f1 member is being placed into the struct, the offset is at least 8-byte aligned: we don't expect padding. If you aggregate enough members of the same type in a power-of-two sized clump, you can easily create alignment which meets or exceeds the strictest alignment in the system.
So those are the main principles: group together members of the same type. Then keeping those groups together, sort by descending alignment strictness.
But it does mean that we've always had those programmers who have had characteristic holes in their education as a result.
Personally I enjoy having a mix of self-taught and formally-taught around; I think there are quite a few things the self-taught are often better at than the formally educated as a group. I respect both sides. I'm just saying that there has always been a market for this sort of thing on HN because the self-taught have always been around.
Would you mind expanding on this a bit?
(various vector types)
int128_t
long double
long long, int64_t, int_fast64_t, double
off_t
time_t (can be 64 bits on 32-bit platforms only if off_t also is)
register_t, int_fast16_t, int_fast32_t (64-bit on e.g. x32)
void *, void(*)(), intptr_t
ptrdiff_t (theoretically wrong if fully aligned and bigger than pointer)
size_t, ssize_t
long (32-bit on win64; generally has 16-bit alignment if bigger than pointer)
int32_t, float
int (can be 16-bit)
short, int16_t
char, bool, int8_t, int_fast8_t
(a trailing array member must come last; note when calculating combined size you should use offsetof instead of sizeof to not waste padding)
Besides this, `int_leastN_t` goes with `intN_t`, and all unsigned types go with their signed types. Note that types like `id_t` are hard to place, however. Although there do exist many platforms where this is not in order of size, the bigger types generally have smaller alignment on such platforms. Still, I have occasionally split out rows even without knowing of any such platform.
Why are you making your structs smaller? Probably for "performance". If your goal is literally just reducing memory usage, then this is all fine. Maybe you're running on a small microcontroller and you just straight up don't have much RAM.
But these days, in most areas, memory is cheap and plentiful. What's expensive is CPU cache. If you're optimizing memory use for that, it's really about runtime speed and less about total memory usage. You're trying to make things small so you have fewer cache misses and the CPU doesn't get stuck waiting as much.
In that case, the advice here is only part of the story. Using bit-fields and smaller integers saves memory. But in order to access those fields and do anything with them, they need to get loaded into registers. I'm not an expert here, but my understanding is that loading a bit-field or small int into a word-sized register can carry a bit of overhead (extra shift and mask instructions), so you have to pay attention to the trade-off.
If I was optimizing for overall runtime speed and considering packing structs to do that, I'd really want to have good profiling and benchmarks in place first. Otherwise it's easy to think you're making progress when you're actually going backwards.
In general, keeping commonly accessed things together on the same cache line is a good idea, and the work needed to remove excessive padding overlaps with the work needed to do this.
and that's why I am contemplating freaking 128G of ram for my next machine, I have 64 right now and OOM regularly, 32 is unuseable
The complication is that cache is a global resource. Code that has larger data structures, even if it runs faster, can contribute to a larger working set and a slightly higher cache miss rate in other code. This can lead to a death-by-thousand cuts scenario and it's hard to get solid data on this when profiling.
You're right, though, that there are a number of ways that smaller structs can be slower, either directly by execution time or indirectly by larger code causing more misses in the instruction cache. Some other factors include whether the compiler can coalesce accesses to multiple fields grouped together, or whether the struct hits a magic power of two size for faster array indexing or vectorization. Which means you can end up speeding up code occasionally by making the struct bigger.
It's also a reasonably common case for parsing binary data formats (e.g. a file's header has fields at exact byte offsets, and it's much faster and easier to just grab all the bytes of that header and interpret those bytes as a packed struct).
CPU cache is memory. Also memory isn't cheap, it is relatively expensive to access. CPU cache is way cheaper. You have it backwards.
Personally, I only pay attention to the cache line and layout when the structure is a concurrent data structure and needs to sync across multiple cores.
And there is a related talk by Andrew Kelley implementing these ideas on Zig toolchain: https://www.youtube.com/watch?v=IroPQ150F6c
Example framework:
https://github.com/skypjack/entt?tab=readme-ov-file
It has benefits besides memory/cache optimizations. It organizes logic in a much more composable way that is friendly for reuse.
I like the examples though; it's very clear what the changes are and the specific impact. Always a good reminder to review for these things after a session hacking in new things!