Implementing a Struct of Arrays

https://brevzin.github.io/c++/2025/05/02/soa/

85•mpweiher•6h ago

Comments

TheHideout•4h ago

So, basically it's allowing you to use a struct like you would in OOP, but get the array benefits of ECS when that struct is in a vector?

throwawaymaths•4h ago

Isn't ECS often Type-Heterogeneous? I think you mean DoD.

dgb23•3h ago

It's not necessary to think about the data interface in terms of object orientation.

You can think about it as being a composition of fields, which are individually stored in their respective array.

(Slightly beside the point: Often they are also stored in pairs or larger, for example coordinates, slices and so on are almost always operated on in the same function or block.)

The benefit comes from execution. If you have functions that iterate over these structs, they only need to load the arrays that contain the fields they need.

Zig does some magic here based on comptime to make this happen automatically.

An ECS does something similar at a fundamental level. But usually there's a whole bunch of additional stuff on top, such as mapping ids to indices, registering individual components on the fly, selecting a components for entities and so on. So it can be a lot more complicated than what is presented here and more stuff happens at runtime. It is also a bit of a one size fits all kind of deal.

The article recommends watching Andrew Kelley's Talk on DoD, which inspired the post. I agree wholeheartedly, it's a very fun and interesting one.

One of the key takeaways for me is that he didn't just slap on a design pattern (like ECS), but went to each piece individually, thought about memory layout, execution, trade offs in storing information versus recalculating, doing measurements and back of the envelope calculations etc.

So the end result is a conglomerate of cleverly applied principles and learnings.

jayd16•2h ago

More like they used reflection to take a struct and generate a SOA collection for that type. Funnily enough, they skip the part where you can actually get at the arrays and focus on the struct type deconstruction and construction.

monkeyelite•4h ago

Interest in SOA is bringing to mind the “art of the meta object protocol” which argues for a stage between class definition and implementation that would allow you to choose the layout and access method for instances of a class.

corysama•3h ago

Yep. We’re in a situation where C-like languages couple layout and access interface very tightly. But, now cache is such an overriding issue in optimization, you really want to rapidly experiment with different layouts without rewriting your whole algorithm every time. AOS, SOA, AOSOA, hot/cold data for different stages, etc…

Jon Blow’s Jai language famously added a feature to references that allowed you to easily experiment with moving data members between hot/cold arrays of structs.

https://halide-lang.org/ tackles a related problem. It decouples the math to be done from the access order so as to allow you to rapidly test looping over data in complicated ways to achieve cache-friendly access patterns for your specific hardware target without rewriting your whole core loop every time.

Halide is primarily an about image processing convolution kernels. I’m not sure how general purpose it can get.

tialaramex•3h ago

AIUI Jai no longer has that SOA versus AOS functionality which Jon was very proud of when he invented it, I expect it's one of those "hot idea" things where for one week it seems as though this is a breakthrough with applications everywhere and then a week later you realise it was not as fundamental and you were just seeing applications everywhere due to recency illusion.

The week after I first saw a Bloom Filter I imagined lots of places this would surely be great, in the decades since I probably have one bona fide use for a Bloom Filter per year, maybe less.

monkeyelite•2h ago

Exactly. The SOA thing makes sense in so few cases - usually the one big data type your program operates on.

jayd16•1h ago

I suppose something like this would be considered a violation of OOO...

That said Java might be able to naturally decompose record types into SoA collections without a big lift. Same for C# struct.

You might even be able to access views types that still code like objects but point to the backing fields in the SoA collection.

mpweiher•3h ago

Yep. Objective-S[1] takes the ideas of "the art of the metaobject protocol" and expands on them...using software architectural principles [2].

One early result was the idea of storage combinators[3], and with those, AoS/SoA pretty much just falls out.

Storage Combinators are basically an interface for data. Any kind of data, so generalized a bit from the variations of "instances of a class" that you get in CLOS's MOP.

If you code against that interface, it doesn't matter how your data is actually stored: objects, dictionaries, computed on-demand, environment variables, files, http servers, whatever. And the "combinator" part means that these can be stacked/combined.

While you can do this using a library, and a lot is in fact just implemented in libraries, you need the language support so that things like objects/structs/variables can be accessed quickly using this mechanism.

SoA storage for tables has been on the list of things to do for a long time. I just started to work on some table ideas, so might actually get around to it Real Soon Now™.

Currently I am working on other aspects of the table abstraction, so for example just integrated query, so I ca write the following:

    invoices[{Amount>30}]

And have that query work the same against an array of objects (also a kind of table) and a SQL database.

[1] https://objective.st/

[2] https://dl.acm.org/doi/10.1145/3689492.3690052

[3] https://2019.splashcon.org/details/splash-2019-Onward-papers...

sph•4h ago

        define_aggregate(^^Pointers,
            nsdms(^^T)
            | std::views::transform([](std::meta::info member){
                return data_member_spec(add_pointer(type_of(member)),
                                        {.name = identifier_of(member)});
            }));

What operator is ^^type?

diath•3h ago

https://isocpp.org/files/papers/P2996R4.html#the-reflection-...

tialaramex•3h ago

back in R4 this operator is spelled ^ but in the accepted draft it was spelled ^^

tialaramex•3h ago

This is the reflection operator. The committee† spent some time bikeshedding different squiggles for this operator, but eventually it looks like ^^ won because it was least awful.

† WG21 of SC22 of JTC1 between ISO and the IEC, "the C++ committee".

See P2996R11 for the latest draft of the work - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p29...

PessimalDecimal•3h ago

I started thinking this would be a rehashing of how column-oriented storage formats can be far more efficient for certain access patterns. But it really was a showcase of what a bloated mess C++ has become.

monkeyelite•3h ago

And you just do not need to be doing this. Just write out which columns you need and use an index. You can write generic sort and remove, and just overload the swap function for your SOA thing.

pragma_x•41m ago

That was my takeaway as well. Pivoting a conventional structure in a column-major storage system just smells like a look-aside of some kind (a database). Plus, you lose the benefits of bulk copy/move that the compiler will likely give you in a standard struct (e.g. memcpy()), should it be on the big side. From there, we can use lots of other tricks to further speed things up: hashes, trees, bloom filters...

At the same time, I don't completely understand how such a pivot would result in a speedup for random access. I suppose it would speed up sequential access, since the multiple array storage scheme might force many more cache line updates.

dexterous8339•1h ago

I recently went down this rabbit hole myself and came out with my own column-major array type.

The explanation drifts into thinking of 2D byte arrays as 3D bit matrices, but in the end it was a 20-30x improvement in speed and binary size.

I was honestly surprised that C++ doesn't have anything built in for this, but at least it's trivial to write your own array type

mac3n•3h ago

JOVIAL language had TABLE structures, which could be declared SERIAL or PARALLEL (https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDeta... page 281).

gr4vityWall•3h ago

Interesting article. It does show how modern C++ can be quite scary.

It reminded me of a Haxe macro used by the Dune: Spice Wars devs to transform an AoS into a SoA at compile time to increase performance: https://youtu.be/pZcKyqLcjzc?t=941

The end result is quite cool, though those compile time type generation macros always look too magical to me. Makes me wonder if just getting values using an index wouldn't end up being more readable.

Enthouan•3h ago

I’m definitely missing the point, but reading the article, I kept thinking "This would’ve been so much easier in C."

trealira•2h ago

I don't see how it would be easier. Whatever you can do in C, you can do in C++. That said, I was also a bit confused, since I guess I don't keep up with the latest C++ updates enough.

quotemstr•2h ago

No, it would be impossible in C, the "it" here being the automatic conversion of a struct to an array of the struct's fields. Sure, you can do that pretty easily by hand in C, but you can also do it easily by hand in C++, so whatever. The point of the article is the metaprogramming.

gpderetta•2h ago

The use of reflection is interesting, but is there a significant advantage, compared to something like this:

  template<template<class> class G>
  struct Point {
     G<int> x;
     G<int> y;
     auto get() { return std::tie(x,y); }
  };

  template<template<template<class> class> class C>
  struct SOA  {
    template<class T> using Id = T;
    template<class T> using Ref = T&;
    C<std::vector> vs;
    void push_back(C<Id> x) {
        std::apply([&] (auto&&... r) {
            std::apply([&](auto&&... v){ ( (r.push_back(v)),...); }, x.get());
        }, vs.get());
    }
    C<Ref> operator[](size_t i) {
        return std::apply([&] (auto&&... r) { return C<Ref>{ r[i]...}; }, vs.get());
    }
  };

  int main() {
    SOA<Point> soa_point;
    soa_point.push_back({1,2});
    auto [x,y] = soa_point[0];
  }

the__alchemist•2h ago

I think I've lost the thread on the abstractions. (Me not being very familiar with Zig outside of its most basic syntax is probably why.) I've been doing a lot of SoA work in rust lately; specifically because I have numerical/scientific code that uses CPU SIMD and CUDA; SoA works great for these.

The workflow is, I set up Vec3x8, and Quaternionx8, which wrap a simd instrinsic for each field (x: f32x8, y: f32x8...) etc.

For the GPU and general flattening, I just pack the args as Vecs, then the fn signature contains them like eps: &[f32], sigma: &[f32] etc. So, I'm having trouble mapping this SoA approach to the abstractions used in the article. Then the (C++-like CUDA) kernel sees these as *float3 params etc. It also feels like a complexity reverse of the Rust/Zig stereotypes...

Examples:

  struct Vec3x8 {
    x: f32x8,
    y: f32x8,
    z: f32x8
  } // appropriate operator overloads...


  struct Setup {
      eps: Vec<f32>,
      sigma: Vec<f32>,
  }

So, Structs of Arrays, plainly. Are the abstractions used here something like Jai is attempting, where the internal implementation is decoupled from the API, so you don't compromise on performance vice ergonomics?

jayd16•2h ago

Towards the middle you realize this is about the reflection more than anything else.

I do like how directly accessing the fields individually (the whole reason you would do this) is a hypothetical presented as an after thought. Enjoyably absurd.

Show HN: AtomCard – Instant Virtual Crypto Cards with No KYC, Global Payments

Looking for Federal Data? Go Local

Richard Bernstein, Pioneer of Diabetics' Self-Monitoring Blood Sugar, Dies at 90

The Combined Cipher Machine, 1942-1962

Why Shouldn't I Invert That Matrix? (2020)

The Graphing Calculator Story (2006) [video]

Create macOS automations using a little-known app

Quasiparticle and superfluid dynamics in Magic-Angle Graphene

Leaked Interview with NIH Director Jay Bhattacharya

Show HN: Chat With Cluster – Debug k8s in natural language

Show HN: DeepCue – The Anti-Cluely Tool to Help Detect Interview Fraud

Ergodic Literature

China approves building of 10 new nuclear power units for $27B

DDoS in 2025: 358% Spike and 6.5 Tbps Record [video]

Generative AI and the War on Writing [video]

Show HN: One-click security scanner for web, code, and AI vulnerabilities

Talking to Peter Farkas of FerretDB on Talking Postgres Podcast

Proposed law would require Apple and Google to verify your age to use app stores

Orders for Pahalgam satellite images from US firm peaked 2 months before attack

The female body is not the problem

Giving V8 a Heads-Up: Faster JavaScript Startup with Explicit Compile Hints

Writing a Social Insect Civilization

Ive used deepwiki from devin.ai to document my project and its awesome

Statistically Speaking, We Should Have Heard from Aliens by Now – Universe Today

Saliva Is a Critical but Underestimated Bodily Fluid

Girls on the Run Northern Arizona Direct Give Fundraiser 2024-2025

Ghana Reaps the Fruits of Science Investment

Detecting if an expression is constant in C

UCSD Pascal: In depth 1/n – markbessey.blog

'Orwellian': planetary scientists outraged over deletion of research records

Implementing a Struct of Arrays

Comments

Show HN: AtomCard – Instant Virtual Crypto Cards with No KYC, Global Payments

Looking for Federal Data? Go Local

Richard Bernstein, Pioneer of Diabetics' Self-Monitoring Blood Sugar, Dies at 90

The Combined Cipher Machine, 1942-1962

Why Shouldn't I Invert That Matrix? (2020)

The Graphing Calculator Story (2006) [video]

Create macOS automations using a little-known app

Quasiparticle and superfluid dynamics in Magic-Angle Graphene

Leaked Interview with NIH Director Jay Bhattacharya

Show HN: Chat With Cluster – Debug k8s in natural language

Show HN: DeepCue – The Anti-Cluely Tool to Help Detect Interview Fraud

Ergodic Literature

China approves building of 10 new nuclear power units for $27B

DDoS in 2025: 358% Spike and 6.5 Tbps Record [video]

Generative AI and the War on Writing [video]

Show HN: One-click security scanner for web, code, and AI vulnerabilities

Talking to Peter Farkas of FerretDB on Talking Postgres Podcast

Proposed law would require Apple and Google to verify your age to use app stores

Orders for Pahalgam satellite images from US firm peaked 2 months before attack

The female body is not the problem

Giving V8 a Heads-Up: Faster JavaScript Startup with Explicit Compile Hints

Writing a Social Insect Civilization

Ive used deepwiki from devin.ai to document my project and its awesome

Statistically Speaking, We Should Have Heard from Aliens by Now – Universe Today

Saliva Is a Critical but Underestimated Bodily Fluid

Girls on the Run Northern Arizona Direct Give Fundraiser 2024-2025

Ghana Reaps the Fruits of Science Investment

Detecting if an expression is constant in C

UCSD Pascal: In depth 1/n – markbessey.blog

'Orwellian': planetary scientists outraged over deletion of research records