Nobody ever got fired for using a struct

https://www.feldera.com/blog/nobody-ever-got-fired-for-using-a-struct

70•gz09•3d ago

Comments

SoftTalker•2h ago

> But SQL schemas often look like this. Columns are nullable by default, and wide tables are common.

Hard disagree. That database table was a waving red flag. I don't know enough (or any) Rust, so I don't really understand the rest of the article, but I have never in my life worked with a database table that had 700 columns. Or even 100.

gz09•2h ago

Hi, I'm the author of the article.

As to your hard disagree, I guess it depends... While this particular user is on the higher end (in terms of columns), it's not our only user where column counts are huge. We see tables with 100+ columns on a fairly regular basis especially when dealing with larger enterprises.

sublinear•56m ago

Can you clarify which knowledge domains those enterprises fall under with examples of what problems they were trying to solve?

If it's not obvious, I agree with the hard disagree. Every time I see a table with that many columns, I have a hard time believing there isn't some normalization possible.

Schemas that stubbornly stick to high-level concepts and refuse to dig into the subfeatures of the data are often seen from inexperienced devs or dysfunctional/disorganized places too inflexible to care much. This isn't really negotiable. There will be issues with such a schema if it's meant to scale up or be migrated or maintained long term.

fiddlerwoaroof•23m ago

Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.

Also, normalization solves a problem that’s present in OLTP applications: OLAP/Big Data applications generally have problems that are solved by denormalization.

Mikhail_Edoshin•2h ago

I saw tables with more than a thousand columns. It was a law firm's home-grown FileMaker tool. I didn't inspect it too closely, so I don't know what was inside.

I remember a phrase from one of C. J. Date's books: every record is a logical statement. It really stood out for me and I keep returning to it. Such an understanding implies a rather small number of fields or the logical complexity will go through the roof.

unclad5968•2h ago

It might not be common in typical software shops. I work in manufacturing and our database has multiple tables with hundreds of columns.

ambicapter•2h ago

What's in them?

jayanmn•1h ago

Property1 to 20 or more is an example. There are better ways to do it but I have seen columns for storing ‘anything’

Spivak•25m ago

Sounds like a generic form of single table inheritance. I don't honestly see any other way to do it (punting to a JSON field is effectively the same thing) when you have potentially thousands of parts all with their own super specific relevant attributes.

I've worked on multiple products with a concept of "custom fields" that did it this way too.

unclad5968•1h ago

Data from measurement tools. Everything about the tool configuration, time of measurement, operator ID, usually a bunch of electrical data (we make laser diodes) like current, potential, power, and a bunch of emission related data.

pizza-wizard•1h ago

I’m working on migrating an IBM Maximo database from the late 90s to a SQL Server deployment on my current project. Also charged with updating the schema to a more maintainable and extensible design. Manufacturing and refurbishing domain - 200+ column tables is the norm. Very demoralizing.

holden_nelson•2h ago

https://jimmyhmiller.com/ugliest-beautiful-codebase

roblh•1h ago

I kinda love this. That sounds like an incredibly entertaining place to work for between 1 and 2 years in your late 20s and not a second longer.

tdeck•49m ago

If you enjoyed this, you'd probably enjoy thedailywtf.com, which is full of stories like that.

bobson381•1h ago

This is like the functional ugly tier of buildings from "how buildings learn". Excellent stuff

linolevan•43m ago

This is awesome. Got completely lost reading this and was struggling to figure out where I got this link from. Amazing story.

woah•1h ago

No idea what these guys do exactly but their tagline says "Feldera's award-winning incremental compute engine runs SQL pipelines of any complexity"

So it sounds like helping customers with databases full of red flags is their bread and butter

gz09•1h ago

> it sounds like helping customers with databases full of red flags is their bread and butter

Yes that captures it well. Feldera is an incremental query engine. Loosely speaking: it computes answers to any of your SQL queries by doing work proportional to the incoming changes for your data (rather than the entire state of your database tables).

If you have queries that take hours to compute in a traditional database like Spark/PostgreSQL/Snowflake (because of their complexity, or data size) and you want to always have the most up-to-date answer for your queries, Feldera will give you that answer 'instantly' whenever your data changes (after you've back-filled your existing dataset into it).

There is some more information about how it works under the hood here: https://docs.feldera.com/literature/papers
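
To make "work proportional to the incoming changes" concrete, here is a minimal Rust sketch of an incrementally maintained `SELECT key, SUM(amount) ... GROUP BY key`. It is an illustration only, not Feldera's implementation or API: `Change`, `IncrementalSum`, and the +1/-1 `weight` convention are names assumed for this example.

```rust
use std::collections::HashMap;

/// One change to the input table (hypothetical type, for illustration).
/// `weight` is +1 for an inserted row and -1 for a deleted row.
pub struct Change {
    pub key: String,
    pub amount: i64,
    pub weight: i64,
}

/// Keeps the current answer of a grouped sum up to date.
#[derive(Default)]
pub struct IncrementalSum {
    /// key -> (running sum, number of live rows in the group)
    groups: HashMap<String, (i64, i64)>,
}

impl IncrementalSum {
    /// Apply a batch of changes. The cost depends on `batch.len()`,
    /// not on how many rows were ingested before.
    pub fn apply(&mut self, batch: &[Change]) {
        for c in batch {
            let group = self.groups.entry(c.key.clone()).or_insert((0, 0));
            group.0 += c.weight * c.amount;
            group.1 += c.weight;
            let is_empty = group.1 == 0;
            if is_empty {
                // The last row of this group was deleted, so drop the group.
                self.groups.remove(&c.key);
            }
        }
    }

    /// Current sum for `key`, or `None` if the group has no rows.
    pub fn sum(&self, key: &str) -> Option<i64> {
        self.groups.get(key).map(|g| g.0)
    }
}
```

After the initial back-fill, each later call to `apply` only touches the groups named in the new batch, which is the property described above. Joins and nested queries need considerably more machinery than this; the linked papers cover that.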

nikhilsimha•1h ago

It is very common to find tables with 1000+ columns in machine learning training sets at e-commerce companies. The largest I have seen had over 10000 columns.

bananamogul•1h ago

That statement jumped out at me as well. I've worked as a DBA on tons of databases backing a wide variety of ERPs, web apps, analytics, data warehouses...700 columns?!? No.

randallsquared•1h ago

I have seen tables (SQL and parquet, too) that have at least high hundreds of optional columns, but this was always understood to be a terrible hack, in those cases.

wombatpm•1h ago

Not everyone understands normal forms, much less third normal form. I’ve seen people do worse with Excel files where they ran out of columns and had to link across spreadsheets.

vharuck•1h ago

https://apps.naaccr.org/data-dictionary/data-dictionary/vers...

771 columns (and I've read the definitions for them all, plus about 50 more that have been retired). In the database, these are split across at least 3 tables (registry, patient, tumor). But when working with the records, it's common to use one joined table. Luckily, even that usually fits in RAM.

orthoxerox•43m ago

It's OLAP; it's very common for analytical tables to be denormalized. As an example, each UserAction row can include every field from Device and User to maximize the speed at which fraud detection works. You might even want to store multiple Devices in a single row: current, common 1, 2 and 3.

arcrwlock•2h ago

Why not use a struct of arrays?

https://en.wikipedia.org/wiki/Data-oriented_design

mustache_kimono•2h ago

> Why not use a struct of arrays?

I would assume because then the shape of the data would be too different? SoA is super effective when it suits the shape of the data. Here, it would be like the difference between an OLTP and an OLAP DB. And you wouldn't use an OLAP DB for an OLTP workload?
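
For readers who haven't met the two layouts, here is a small Rust sketch of array-of-structs versus struct-of-arrays. The type names are made up for this example, and the fields borrow from the measurement data mentioned elsewhere in the thread; nothing here comes from the article's code.

```rust
/// Array of structs (AoS): one record per row, all fields of a row adjacent.
#[derive(Clone, Copy)]
pub struct Measurement {
    pub current: f64,
    pub potential: f64,
    pub power: f64,
}

/// Struct of arrays (SoA): one contiguous vector per column.
#[derive(Default)]
pub struct MeasurementColumns {
    pub current: Vec<f64>,
    pub potential: Vec<f64>,
    pub power: Vec<f64>,
}

impl MeasurementColumns {
    /// Transpose row-shaped data into column-shaped data.
    pub fn from_rows(rows: &[Measurement]) -> Self {
        let mut cols = Self::default();
        for r in rows {
            cols.current.push(r.current);
            cols.potential.push(r.potential);
            cols.power.push(r.power);
        }
        cols
    }

    /// A column scan reads only the `power` vector (the OLAP-friendly case).
    pub fn mean_power(&self) -> Option<f64> {
        if self.power.is_empty() {
            return None;
        }
        Some(self.power.iter().sum::<f64>() / self.power.len() as f64)
    }

    /// A row lookup has to gather one value from every column
    /// (the case where AoS is the more natural fit).
    pub fn row(&self, i: usize) -> Option<Measurement> {
        Some(Measurement {
            current: *self.current.get(i)?,
            potential: *self.potential.get(i)?,
            power: *self.power.get(i)?,
        })
    }
}
```

With three columns the difference is small. With hundreds of columns, `row` touches hundreds of separate vectors, which is roughly the mismatch the comment above is pointing at for row-at-a-time workloads.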

SigmundA•1h ago

Looks like they just recreated a tuple layout in Rust, null bitmap and everything. Next up would be storing the tuples in pages and memory-mapping the pages.

https://www.postgresql.org/docs/current/storage-page-layout....
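
As a rough idea of what "a tuple layout with a null bitmap" means, here is a minimal Rust sketch that stores a row of nullable 32-bit integers as a bitmap followed by only the non-NULL values. The function names, the little-endian encoding, and the "set bit means NULL" convention are assumptions for this example; it matches neither the article's code nor PostgreSQL's exact on-disk format.

```rust
/// Encode a row as: [null bitmap][present values, 4 bytes each, little-endian].
/// In this sketch, bit `i` of the bitmap is 1 when column `i` is NULL.
pub fn encode_row(row: &[Option<i32>]) -> Vec<u8> {
    let bitmap_len = (row.len() + 7) / 8;
    let mut out = vec![0u8; bitmap_len];
    for (i, col) in row.iter().enumerate() {
        match col {
            // NULL columns cost one bit and no payload bytes.
            None => out[i / 8] |= 1 << (i % 8),
            Some(v) => out.extend_from_slice(&v.to_le_bytes()),
        }
    }
    out
}

/// Decode a row of `num_cols` columns; returns `None` if `bytes` is too short.
pub fn decode_row(bytes: &[u8], num_cols: usize) -> Option<Vec<Option<i32>>> {
    let bitmap_len = (num_cols + 7) / 8;
    if bytes.len() < bitmap_len {
        return None;
    }
    let (bitmap, mut values) = bytes.split_at(bitmap_len);
    let mut row = Vec::with_capacity(num_cols);
    for i in 0..num_cols {
        let is_null = (bitmap[i / 8] & (1 << (i % 8))) != 0;
        if is_null {
            row.push(None);
        } else {
            if values.len() < 4 {
                return None;
            }
            let (head, rest) = values.split_at(4);
            row.push(Some(i32::from_le_bytes([head[0], head[1], head[2], head[3]])));
            values = rest;
        }
    }
    Some(row)
}
```

A row of nine columns with three non-NULL values encodes to 14 bytes here (2 bytes of bitmap plus 3 × 4 bytes of values). A plain `[Option<i32>; 9]` typically takes 8 bytes per column on common targets, so the saving grows with the number of mostly-NULL columns.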

gz09•1h ago

Absolutely, it's a very common technique :)

I wasn't sure about writing the article in the first place because of that, but I figured it may be interesting anyways because I was kind of happy with how simple it was to write this optimization when it was all done (when I started out with the task I wasn't sure if it would be hard because of how our code is structured, the libraries we use etc.). I originally posted this in the rust community, and it seems people enjoyed the post.

astrostl•1h ago

I have mixed feelings about it, but I'm going to fire somebody tomorrow for using a struct just to prove a point to the author.

gz09•1h ago

Point them to us https://github.com/feldera/feldera -- we are hiring ;)

adampunk•1h ago

You folks have too many structs already! I just finished reading about it!

dyauspitr•1h ago

No one has written a struct in 10 years.

jimbokun•1h ago

They’re pretty popular in Go?

kstrauser•53m ago

And pervasive in Rust.

duc_minh•1h ago

> Sometimes the best optimization is not a clever algorithm. Sometimes it is just changing the shape of the data.

This is basically Rob Pike's Rule 5: If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. (https://users.ece.utexas.edu/~adnan/pike.html)

jeswin•49m ago

I wouldn't give too much credit to rules like this. Data structures are often created with an approach in mind. You can't design a data structure without knowing how you will use it.

If anything it's the other way round, if you're not talking about business domain modeling (where data structures first is a valid approach).

sublinear•48m ago

If you don't know enough to design a data structure, requirements are missing and someone talking to the client is dropping the ball big time.

jeswin•43m ago

Where did I say any of that?

I'm saying that if you care about performance, data structures should be designed with approach specific tradeoffs in mind. And like I've said above, in typical business apps, it's ok to start with data structures because (a) performance is usually not a problem, (b) staying close to the domain is cleaner.

reverius42•23m ago

You said: "You can't design a data structure without knowing how you will use it."

But the whole discussion involves knowing how you will use it; the advocacy is for careful consideration of data structures (based on how you will use them) resulting in less pain when designing/choosing algorithms.

jeswin•18m ago

My point is that one doesn't follow from the other. To design good data structures, you need to know how they'll get used (the algorithm).

> If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.

This is what I was responding to.

reverius42•22m ago

See also:

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."

https://en.wikiquote.org/wiki/Fred_Brooks

amluto•48m ago

There are many systems that take a native data structure in your favorite language and, using some sort of reflection, make an on-disk structure that resembles it. Python pickles and Java’s serialization system are infamous examples, and rkyv is a less alarming one.

I am quite strongly of the opinion that one should essentially never use these for anything that needs to work well at any scale. If you need an industrial strength on-disk format, start with a tool for defining on-disk formats, and map back to your language. This gives you far better safety, portability across languages, and often performance as well.

Depending on your needs, the right tool might be Parquet or Arrow or protobuf or Cap’n Proto or even JSON or XML or ASN.1. Note that there are zero programming languages in that list. The right choice is probably not C structs or pickles or some other language’s idea of pickles or even a really cool library that makes Rust do this.

(OMG I just discovered rkyv_dyn. boggle. Did someone really attempt to reproduce the security catastrophe that is Java deserialization in Rust? Hint: Java is also memory-safe, and that has not saved users of Java deserialization from all the extremely high severity security holes that have shown up over the years. You can shoot yourself in the foot just fine when you point a cannon at your foot, even if the cannon has no undefined behavior.)

gz09•33m ago

> Depending on your needs, the right tool might be Parquet or Arrow or protobuf or Cap’n Proto

I think parquet and arrow are great formats, but ultimately they have to solve a similar problem that rkyv solves: for any given type that they support, what does the bit pattern look like in serialized form and in deserialized form (and how do I convert between the two).

However, it is useful to point out that parquet/arrow on top of that solve many more problems needed to store data 'at scale' than rkyv (which is just a serialization framework after all): well defined data and file format, backward compatibility, bloom filters, run length encoding, compression, indexes, interoperability between languages, etc. etc.

everyone•45m ago

Just because structs and classes work differently, and classes are much more common. I tend to make everything a class, unless there is a really good reason to make it a struct.

saghm•36m ago

I feel like I'm missing something, but the article started by talking about SQL tables, and then in-memory representations, and then on-disk representation, but...isn't storing it on a disk already what a SQL database is doing? It sounds like data is being read from a disk into memory in one format and then written back to a disk (maybe a different one?) in another format, and the second format was not as efficient as the first. I'm not sure I understand why a third format was even introduced in the first place.
