I imagine we're not alone in this type of abstraction layer, and some type safety would be very welcome there. I tried to build our system on top of Kysely (https://kysely.dev/), but the ClickHouse extension wasn't far enough along to make sense for our use case. As such, we basically had to build our own parser that compiles down to SQL, but there are many type-error edge cases, especially when we're joining against data from S3 that could be CSV, Parquet, etc.
Side note: one of the things I love most about ClickHouse is how easy it is to combine data from sources other than the source database at query time. I imagine this makes the problem of building an ORM much harder as well, since you may need to type-check SQL queries against external databases rather than against the source table itself.
No one needs an ORM: https://dev.to/cies/the-case-against-orms-5bh4
The article opens with "ORMs have proven to be useful for many developers" -- I believe the opposite is true.
The devil is of course in the details, but it's a nice dream.
ActiveRecord, RoR's ORM, is the complete opposite.
PL/pgSQL (or PL/SQL in Oracle) and variants.
ORMs, like all abstractions, are leaky. But I would argue that, because of the ubiquity and utility of SQL itself, they are a very leaky one: eventually you are going to need to work around them.
After switching to just using SQL in all situations, I found my life got a lot simpler. Performance also improved, as most ORMs (Rails's in particular) are not very well implemented from a performance standpoint, even for very simple use cases.
I cannot recommend enough that people skip the ORM entirely.
What is your concern re: random types popping up? SQLite springs to mind as a prime offender due to not enforcing column types OOTB, but most dialects have rather strong typing.
If we’re talking about mapping UUIDs and datetimes from their DB representations to types defined by the language stdlib, that’s usually the responsibility of the DB driver, no?
Pydantic knows nothing of your database. It’s schema-on-read (a great pattern that pydantic is well suited for), or serialization, or validation, but not an ORM.
Basically, we don't always know what future data source we may be reading from, and the schema of the source might change; but we can define what we expect on the receiving end (in pydantic) and have it fail loudly when our assumptions change.
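That fail-loudly, schema-on-read pattern can be sketched with stdlib dataclasses (pydantic does the same with far less ceremony; the `Event` schema and its fields here are hypothetical, purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class Event:
    user_id: int
    amount: float

    def __post_init__(self):
        # Fail loudly when an upstream source changes its schema or types.
        if not isinstance(self.user_id, int):
            raise TypeError(f"user_id: expected int, got {type(self.user_id).__name__}")
        if not isinstance(self.amount, (int, float)):
            raise TypeError(f"amount: expected float, got {type(self.amount).__name__}")

row = {"user_id": 42, "amount": 9.99}   # a row from some unknown upstream source
event = Event(**row)                     # validates on read

try:
    Event(user_id="42", amount=9.99)     # upstream silently switched to strings
except TypeError as e:
    print("schema drift caught:", e)
```

The point is that the contract lives at the receiving end, so a silent upstream change becomes a hard error instead of corrupt analytics.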
Perhaps saying "ORM" is a bit of a misnomer, but they're discussing the DX ergonomics of an ORM and acknowledging the exact challenges you describe.
And for most of the code, the performance overhead was negligible. C# with LINQ is even better: it provides strong type checking for queries and often has almost zero overhead.
I'm using Go now, and I don't even want to touch any of the available ORMs, because they all suck compared to the state of the art in Java circa 20 years ago.
This can only be achieved by using some sort of type system, whether that means reflecting on the tables, generating code on the fly, or writing custom adapters for each structure. All of these can be greatly simplified with an ORM.
It's not going to help much with bespoke report asks from the business though.
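The "reflect on the result set" option above is cheap to sketch with the stdlib: build a record type on the fly from cursor metadata instead of hand-writing an adapter per query (table and column names here are illustrative):

```python
import sqlite3
from collections import namedtuple

def fetch_records(conn, sql, params=()):
    """Run a query and return rows as named tuples derived from the cursor metadata."""
    cur = conn.execute(sql, params)
    Row = namedtuple("Row", [col[0] for col in cur.description])
    return [Row(*r) for r in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")

rows = fetch_records(conn, "SELECT id, name FROM users")
print(rows[0].name)  # prints "ada" -- attribute access without a hand-written mapping class
```

This is roughly the floor of what an ORM automates; real ORMs add static types and relationships on top.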
Schemas as application code mean you get version control, PR review, and type-safe changes. A query builder that feels like SQL lets you write "real" ClickHouse queries with IDE autocompletion and compile-time checking. Local development and CI should mirror production so you can preview schema changes before they apply to prod.
I believe this is what dbt set out to accomplish. They came at the problem from the point of view of a data transformation language (essentially pseudo-type-checked SQL for analytical engines, with some batteries included, i.e., macros), but the motivation was similar. I've always felt that what has held dbt back from more mainstream adoption by the dev community is that they've prioritized data transformation over data access from the application layer: business intelligence tools over a web app.
Moosestack looks interesting; will definitely check it out.
And I wanted it to emit the raw SQL, because that's generally what I want for OLAP.
So I had a go at building it. If anyone's interested, a very rough demo/prototype is here: https://www.robinlinacre.com/vite_live_pg_orm/
Load in the demo Northwind schema and click some tables/columns to see the generated joins.
The sane middle ground is libraries that give you nicer ergonomics around SQL without hiding it (like Go's sqlx, https://github.com/jmoiron/sqlx). Engineers should be writing SQL, period.
The blog suggests that an ORM for OLAP would do exactly that
I’ve written a lot about this particular topic: https://www.jacobelder.com/2025/01/31/where-shift-left-fails...
There's no rule saying you can't integrate your own manually written SQL with an ORM, and in fact, any production-ready, feature-complete ORM will allow you to do it, because it's effectively a requirement for any non-trivial use case.
Is it a variation of: "I suffered when I was young, so everyone must suffer as I did?"
SQL is terrible, however you slice it. It's a verbose bass-ackwards language. ORMs that remove the need to deal with SQL for 99% of trivial cases are great.
The remaining 1% can stay painful.
SQL remains the only way to efficiently perform MANY computations in the database, precisely because it's the lingua franca of the database planner. If you're not writing SQL, it doesn't mean that you're unable to cover 1% of the queries, it only means that you're leaving 99% of performance on the table. You can tell a bad programmer by their refusal to use views and materialized views. Not to mention normalisation! I've yet to see a coder using an ORM produce a sane schema. And don't get me started on aggregates. Relational databases represent relations, not objects, period.
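For what it's worth, the views point is cheap to demonstrate (a stdlib sketch; SQLite has no materialized views, so a plain view stands in, and the `sales` schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('eu', 10), ('eu', 5), ('us', 7);
    -- The view keeps the aggregation in the database, where the planner can see it,
    -- instead of re-deriving it in application code.
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(amount) AS revenue FROM sales GROUP BY region;
""")

print(conn.execute("SELECT * FROM revenue_by_region ORDER BY region").fetchall())
# [('eu', 15.0), ('us', 7.0)]
```

In Postgres the view could be `MATERIALIZED` and refreshed on a schedule, which is the part most ORMs have no first-class story for.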
> If you're not writing SQL, it doesn't mean that you're unable to cover 1% of the queries, it only means that you're leaving 99% of performance on the table.
Honestly? You're spewing bullshit. In most apps, most SQL is trivial: typically simple joins (with indexes) and simple filters. Anything more complicated is usually not suitable for OLTP anyway. Heck, all our SQL is linted to disallow full-table scans.
This kind of SQL is perfectly auto-generated by ORMs.
Those multi-page queries that required mystic DB knowledge for placing hints, burning incense, and paying $1000 per hour to Oracle consultants? They're entirely useless in modern software stacks: either you can keep the entire working set in RAM so these queries can be trivially rewritten, or you just won't use regular SQL for them at all.
> You can tell a bad programmer by their refusing to use views and materialized views.
You can tell a bad programmer by their using the DB in a way that requires materialized views. It typically ends with moving app logic into SQL, and may even lead to "SELECT '<td>' + row.cust_name + '</td>'".
As another commenter wrote:
”If you're doing OLAP…SQL isn't wholly adequate for this, it's hard work to get the SQL right and if there's joins involved it's not hard to accidentally fan out and start double counting.”
I feel this in my bones. Anytime someone in the business changes something in a source system, the joins create extra rows. Trying to address this with SQL is like plowing a field with a spoon.
And I don’t think ORMs are the answer. Just imperative code. The ability to throw in a few lines of sanity checking outside of SQL is such a massive boost to reliability and data quality, when the source systems are moving underneath your feet.
Double agree on the sanity checks.
SQL isn't wholly adequate for this, it's hard work to get the SQL right and if there's joins involved it's not hard to accidentally fan out and start double counting.
If you ask me, you want an analytic model of the data designed around measures and dimensions, with an anointed time dimension, and a way of expressing higher-level queries such that it automatically aggregates depending on which dimensions you leave out, and gives you options to sort, pivot, filter, etc. dynamically.
This doesn't look like entities, really, but it is a model between you and the SQL.
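The "aggregate automatically depending on which dimensions you leave out" idea can be sketched in a few lines (a toy, stdlib-only illustration with hypothetical table and column names, not a real semantic layer):

```python
def build_query(table, measures, dimensions):
    """Build a GROUP BY query: any dimension left out is aggregated away."""
    select_cols = list(dimensions) + [f"SUM({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_cols)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql

print(build_query("sales", ["revenue"], ["region", "month"]))
# SELECT region, month, SUM(revenue) AS revenue FROM sales GROUP BY region, month

# Dropping "month" rolls revenue up to the region level automatically:
print(build_query("sales", ["revenue"], ["region"]))
# SELECT region, SUM(revenue) AS revenue FROM sales GROUP BY region
```

A real version would also know each measure's aggregation function, guard against fan-out across joined fact tables, and treat the time dimension specially; this is exactly the model that sits between you and the SQL.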
From my quick (not detailed) scan of the article, Moose looks too low-level, and not a useful abstraction to sit in the same logical place that ORMs do for OLTP databases.
While the domain modeling introduces some (or a lot of) friction, an OLAP ORM could emerge as an adaptation forked from their concept.
bob1029•5mo ago
I guess I have a wildly different interpretation of typical OLAP scenarios. To me this acronym mostly means "reporting". And in 99% of cases where the business desires a new report, the ideal views or type systems have not been anticipated. In these cases (most of them), I can't imagine a faster way to give the business an answer than just writing some sql.
ElatedOwl•5mo ago
In my experience these one-off reports are very brittle. The app ends up making breaking schema changes for these one-off reports, and you usually don't find out until it goes to production.
I’ve dealt with the maintenance nightmare before. At current gig we’re exploring solutions, curious what a robust pipeline looks like in 2025.
The ORM piece is interesting — we use ActiveRecord and Ruby, and accidentally breaking schema changes within the app will get caught by the unit test suite. I would love a way to bring OLAP reports in similarly, to test at CI time.
lpapez•5mo ago
Surely there is a way to run a raw query in Rails/ActiveRecord and use it in a smoke test?