
Why Most Valuable AI Systems Are Still Tabular Models

5•madman2890•2h ago
The Hard Part of Predictive AI Isn’t the Model

I’ve spent most of my career building predictive systems on tabular data.

The highest-value AI systems I’ve seen in production aren’t LLMs. They’re predictive models that operate on structured operational data: customers, orders, shipments, transactions, support events, etc.

These systems quietly generate millions in value by replacing expensive third-party services, improving operational decisions, and turning predictions into products.

Examples include churn prediction, fraud detection, ETA prediction, inventory demand forecasting, and operational anomaly detection.

In practice, the model itself is rarely the bottleneck.

The real bottleneck is integrating signals from relational data.

Why Tabular Data Is Hard

Most operational systems store data across many relational tables, not a single ML-ready dataset.

For example, consider a simple commerce schema:

customers -> orders -> order_notifications

If you want to train a model predicting something like:

Will this customer churn in the next 30 days?

the model does not train directly on these tables.

Instead you must first construct a training table like:

customer_id, num_orders_last_30_days, avg_order_value, days_since_last_order, num_notifications_last_7_days, notification_rate_per_order, ..., target_churn

Building this dataset requires joins, aggregations, time windows, handling one-to-many relationships, and preventing data leakage.

For example:

num_orders_last_30_days = COUNT(orders WHERE order_timestamp >= now() - 30d)

num_notifications_last_7_days = COUNT(order_notifications WHERE timestamp >= now() - 7d)
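A minimal pandas sketch of the first aggregation, using the post's hypothetical schema (table and column names are illustrative). One detail worth making explicit: computing features as of a fixed cutoff timestamp, rather than `now()`, is what prevents leakage when the churn label is defined over the window after the cutoff.

```python
import pandas as pd

# Hypothetical orders table from the commerce schema above.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_timestamp": pd.to_datetime(
        ["2026-02-20", "2026-01-01", "2026-02-25"]
    ),
})

# Features are computed as of a fixed cutoff, not now(): only rows
# strictly before the cutoff may contribute, because the churn target
# is defined over the 30 days *after* the cutoff.
cutoff = pd.Timestamp("2026-03-01")
window = orders[
    (orders.order_timestamp >= cutoff - pd.Timedelta(days=30))
    & (orders.order_timestamp < cutoff)
]
num_orders_last_30_days = (
    window.groupby("customer_id").size().rename("num_orders_last_30_days")
)
```

The `num_notifications_last_7_days` feature is the same pattern with a 7-day window over the notifications table.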

This sounds simple, but at scale it quickly becomes hundreds of features, dozens of tables, and complex temporal joins.

In most organizations this data preparation step dominates the project.

Not the model.

Where Tabular Foundation Models Fit

Recently there has been a lot of excitement around tabular foundation models, such as TabPFN, TabTransformer variants, and other pretrained tabular architectures.

These models are interesting because they can often produce strong predictions with very little tuning.

You can often train them with something as simple as:

model.fit(X_train, y_train)

and they work surprisingly well.

However, these models typically expect a single flat table.

Something like:

customer_id, f1, f2, f3, f4, ..., target

They generally do not operate directly on relational schemas.

So the fundamental bottleneck remains:

How do you turn relational data into a useful feature table?

GraphReduce: Treating Relational Data as a Graph

This is where approaches like GraphReduce come in.

Relational schemas naturally form a graph structure.

Using the previous example:

customers -> orders -> order_notifications

Each edge represents a relationship where signals can propagate.

For example, orders can propagate to customers, and notifications can propagate to orders and then to customers.

GraphReduce treats the schema as a propagation graph.

Each table contributes signals that are aggregated upward.

Example propagation:

From order_notifications to orders:

notifications_per_order, max_notification_delay, notification_count

Then from orders to customers:

total_orders, avg_order_value, orders_last_30_days, notification_rate_per_order

The result is a feature table at the target level:

customer_id, orders_last_30_days, avg_order_value, notification_rate, days_since_last_order, ...

This table can then be fed directly into a predictive model.
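The two-hop propagation can be sketched in plain pandas (the schema and column names are the post's hypothetical example, not GraphReduce's actual API): aggregate each child table up one edge at a time until everything sits at the prediction grain.

```python
import pandas as pd

# Hypothetical tables from the commerce schema.
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "order_value": [20.0, 40.0, 15.0],
})
notifications = pd.DataFrame({
    "order_id": [10, 10, 12],
})

# Hop 1: order_notifications -> orders.
notif_per_order = (
    notifications.groupby("order_id")
    .size()
    .rename("notification_count")
    .reset_index()
)
orders = orders.merge(
    notif_per_order, on="order_id", how="left"
).fillna({"notification_count": 0})

# Hop 2: orders -> customers. Child signals are aggregated upward
# to the target grain (customer_id).
customer_features = orders.groupby("customer_id").agg(
    total_orders=("order_id", "count"),
    avg_order_value=("order_value", "mean"),
    notification_rate_per_order=("notification_count", "mean"),
)
```

GraphReduce's contribution, as described, is automating exactly this traversal over many tables and time windows instead of hand-writing each hop.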

Why This Matters for Tabular Foundation Models

Tabular foundation models are strongest when operating on a well-constructed flat dataset.

GraphReduce helps produce that dataset automatically by traversing relational graphs, aggregating signals, and generating structured features.

The pipeline looks like this:

Relational DB -> GraphReduce -> Unified feature table -> Tabular foundation model (e.g. TabPFN) -> Prediction

In practice this can dramatically increase the throughput of building predictive systems, because the hardest step, data integration, becomes much easier.

Why This Is Still an Open Problem

Most AI discussion today focuses on models.

But for structured data systems, the real challenges are relational structure, temporal aggregation, signal propagation, and feature construction.

Until those problems are solved, the modeling layer will always be limited.

Tabular foundation models may significantly reduce the modeling effort.

But relational data preparation remains the gating step.

The interesting opportunity is combining both.

Example Implementation

Here is a simple end-to-end example combining relational aggregation with a tabular foundation model:

https://wesmadrigal.github.io/GraphReduce/end_to_end_examples/predictive_ai_tabpfn/

There is some similar research coming out of the University of Hong Kong: https://arxiv.org/pdf/2602.13697

Thoughts?

Comments

amazonbezos•1h ago
totally agree
madman2890•1h ago
with all of it? wow :)