> The basic promise of a query optimizer is that it picks the “optimal” query plan.
This is kind of true, but the optimiser has to do all this under very tight constraints on available time and resources. A planner that returned a perfect plan every time would be useless if it took 500ms and 500MB, so I'd say a better phrasing would be
> The basic promise of a query optimizer is that it picks a good plan most of the time and does it quickly and efficiently.
Hmm, no. The basic promise of a query optimizer is that it picks a good plan all of the time. Otherwise it's worse than useless, and you'd be better off with a DB where you can pin the plan for every query.
But yes, the emphasis is on "good", not "optimal".
The poster you responded to is correct: plan selection is a combinatorial problem, so the optimizer can't be expected to pick a good plan all of the time within normal data and time constraints.
Yes. There are techniques in high-end RDBMSes, like adaptive query processing, which allow the engine to update its approach during execution.
https://learn.microsoft.com/en-us/sql/relational-databases/p...
https://www.ibm.com/docs/en/i/7.6.0?topic=overview-adaptive-...
https://apex.oracle.com/pls/apex/features/r/dbfeatures/featu...
For queries issued by machines, where predictability is the most important thing, it probably makes sense to hard-code plans, or at least to pin a plan so it can't change out from underneath you at midnight in production. I'm not sure about Postgres, but you can do this on Oracle DB. There's probably a better way to express schemas in which both indexes and plans can be constructed ahead of time: in many cases the developer already knows the likely distribution of the data, and not having performance cliffs matters more than the convenience of the planner. Example:
@Entity class SomeTable { @Mostly("SUCCESSFUL", "FAILED") SomeEnum row; }
and then the DB mapper - not necessarily an ORM - could provide the query hints needed to force the DB onto a reasonable plan for the annotated distribution.
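e.g. since the annotation says most rows are SUCCESSFUL or FAILED, a query for the rare remaining rows should go through an index, and the mapper could emit something like this (a sketch only; Oracle-style hint, with table, index, and column names made up):

  -- Force an index scan for the selective "neither SUCCESSFUL nor FAILED" predicate
  SELECT /*+ INDEX(t some_table_status_idx) */ *
  FROM some_table t
  WHERE t.status NOT IN ('SUCCESSFUL', 'FAILED');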
Not possible in Postgres. It's explicitly a non-goal: sub-optimal plans are considered a bug to be fixed, and you can't even force the use of an index.
To their credit, the Postgres query planner is amazing and generally works well enough that you don't need to force indexes or plans. But that's little comfort when it doesn't, and you have a production incident on your hands.
Edit: I think RDS Aurora Postgres does let you have managed query plans
I like that Postgres has a lot of tools at its disposal for executing queries. But you can't rely on it once data reaches even modest scale. It's rare that it does a good job with interesting queries.
Even session-scoped flags are a really coarse tool: they don't guarantee you'll get the plan you want, and they might unexpectedly impact other queries in the session.
Materialised CTEs are one of the only tools that give real control, but an optimisation barrier is often the opposite of what you want.
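For reference, the two escape hatches look roughly like this (a sketch in Postgres syntax; table and column names are made up):

  -- Session-scoped flag: steers the planner away from nested-loop joins
  -- for *every* query in the session, not just the one you care about
  SET enable_nestloop = off;

  -- Materialised CTE: the filter is evaluated before the join,
  -- at the price of an optimisation barrier around the CTE
  WITH filtered AS MATERIALIZED (
      SELECT id, payload FROM mytable1 WHERE created_at > now() - interval '1 day'
  )
  SELECT * FROM filtered JOIN mytable2 ON mytable2.fk_id = filtered.id;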
alexisread•3d ago
This applies the filter before the join, compared with:
  SELECT ... FROM mytable1 JOIN mytable2 ON ... WHERE ...
which relies on the query planner to work out where to apply the WHERE, and hopes it puts the filter before the join.
This becomes more important in the cloud, where Iceberg tables cannot be benchmarked in the same way as single-cluster relational DBs.
brudgers•1d ago
On the other hand, if you are hand-coding the query, you are hand-coding whether you use SQL, LINQ, or anything else. And a strength of SQL is a robust ecosystem of documentation, tools, employment candidates, and consultants.
gigatexal•7h ago
databases are complicated beasts, and tens of thousands of working years (if not more) from some of the smartest people have gone into them since the 70s
hyperpape•5h ago
Yes, but they've also set themselves a much harder problem. Instead of "find the proper execution for this particular query, armed with all the knowledge of the database you have" they have the problem of "find the proper execution for all possible queries, using only information that's encoded in DB statistics."
So it's no surprise that it's quite easy to find places where the optimizer does silly things. To take one instance that bit me recently: OR clauses referencing different tables cannot be optimized well at all in Postgres (https://www.cybertec-postgresql.com/en/avoid-or-for-better-p...).
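Schematically (made-up tables), the problem and the usual workaround look like:

  -- Hard for Postgres to optimize: the OR spans columns of two different tables,
  -- so no single index can satisfy the condition
  SELECT o.id
  FROM orders o JOIN customers c ON c.id = o.customer_id
  WHERE o.total > 1000 OR c.vip;

  -- Usual workaround: split the OR into a UNION so each branch can use its own index
  SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id WHERE o.total > 1000
  UNION
  SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id WHERE c.vip;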
I'd still rather have optimizers than not: the optimizer does well most of the time, and my current work has far too many queries to hand-optimize. But you do end up seeing a lot of poorly optimized queries.
gigatexal•7h ago
so I don't get it
hobofan•7h ago
Filter pushdown ("correct application of where, and hope it puts the where before the join") is table stakes for a query planner and has been for decades.
And no, Iceberg tables are not special in any way here. Iceberg tables contain data statistics, just like the ones described in the article, that let the optimizer choose the right query plan.
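i.e. for a query like the following (made-up tables), essentially any planner will evaluate the WHERE while scanning mytable1, before any join rows are produced, with no CTE needed to force it:

  -- The filter on mytable1 is pushed below the join by the planner itself
  SELECT *
  FROM mytable1 JOIN mytable2 ON mytable2.fk_id = mytable1.id
  WHERE mytable1.created_at > now() - interval '1 day';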
adamzochowski•6h ago
If someone has a complex query or a complex performance problem, they could restructure it with either subselects or CTEs.
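For example, the same restructuring in both shapes (made-up tables):

  -- As a subselect
  SELECT c.name, big.total
  FROM (SELECT customer_id, total FROM orders WHERE total > 1000) big
  JOIN customers c ON c.id = big.customer_id;

  -- As a CTE
  WITH big AS (SELECT customer_id, total FROM orders WHERE total > 1000)
  SELECT c.name, big.total
  FROM big JOIN customers c ON c.id = big.customer_id;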