The real time sink is everything before that. Most real-world predictive problems live across many relational tables. So the majority of the work ends up being:
• Discovering which tables are actually relevant
• Understanding foreign keys and entity relationships
• Figuring out cardinality (1:1, 1:N, N:M)
• Aggregating child tables into meaningful features
• Handling time windows and leakage
• Integrating everything into a single training table
Only after all of that can you actually train the model. In many projects, 80–90% of the effort is spent on data discovery and multi-table aggregation, while the modeling step itself takes minutes.
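To make the list above concrete, here is a minimal pandas sketch of the collapse-to-one-table workflow (this is not GraphReduce's API — the table names, columns, and cutoff date are all made up for illustration): aggregate a 1:N child table into per-entity features, with a time cutoff so post-prediction-date rows can't leak in, then join back to the parent.

```python
import pandas as pd

# Hypothetical parent "customers" table and child "orders" table.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "signup_date": pd.to_datetime(["2023-01-01", "2023-02-01"]),
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2023-03-01", "2023-06-15", "2023-03-10", "2023-07-01"]),
    "amount": [50.0, 20.0, 75.0, 10.0],
})

# Leakage guard: only aggregate child rows observed before the cutoff.
cutoff = pd.Timestamp("2023-06-01")
visible = orders[orders["order_date"] < cutoff]

# Collapse the 1:N child table into per-customer features.
feats = visible.groupby("customer_id").agg(
    order_count=("amount", "size"),
    total_spend=("amount", "sum"),
).reset_index()

# Integrate into a single training table; customers with no visible
# orders keep zeros rather than being dropped.
train = customers.merge(feats, on="customer_id", how="left").fillna(
    {"order_count": 0, "total_spend": 0.0}
)
```

With an N:M relationship you would hop through the junction table first, and in a real project each of these joins is where the cardinality and leakage questions from the list actually bite.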
Tabular foundation models reduce the amount of tuning required, but they don’t remove the fundamental need to collapse relational data into a single learning table. The bottleneck in tabular AI has always been the data graph, not the model.
GraphReduce is a project I've been incrementally building for a few years that addresses the real problem in tabular predictive AI: data prep.
https://wesmadrigal.github.io/GraphReduce/