frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Materialized views are obviously useful

https://sophiebits.com/2025/08/22/materialized-views-are-obviously-useful
120•gz09•1d ago

Comments

erulabs•1d ago
What a great post. Humble and honest and simple and focused on an issue most developers think is so simple (“why not just vibe code SQL?”, “whatever, just scale up the RDS instance”).

Compliments aside, where this article stops is where things get exciting. Postgres shines here, as does Vitess, Cassandra, ScyllaDB, even MongoDB has materialized views now. Vitess and Scylla are so good, it’s a shame they’re not more popular among smaller startups!

What I haven’t seen yet is a really good library for managing materialized views.

malthejorgensen•7h ago
Don’t you have to manually “refresh” Postgres materialized views, essentially making it an easier to implement cache (the Redis example in the blog post) rather than the type always-auto-updating materialized view the blog post author is actually touting?
lbreakjai•6h ago
Out of the box, you're right, but there are extensions that do just that:

https://github.com/sraoss/pg_ivm

It's however not available on RDS, so I've never had the chance to try it myself.

ropable•2h ago
Yes, you need to refresh the materialized views periodically. Which mean that, just like any other caching mechanism, you're solving one problem (query performance) but introducing another (cache invalidation). I've personally used Postgres MVs to great success, but there are tradeoffs.
striking•1h ago
The real bummer is not that you have to manually refresh them, it's that refreshing them involves refreshing the entire view. If you could pick and choose what gets refreshed, you might just sometimes have a stale cache here and there while parts of it get updated. But refreshing a materialized view that is basically just not small or potentially slightly interesting runs the risk of blowing your write instance up.

For this reason I would strongly advise, in the spirit of https://wiki.postgresql.org/wiki/Don't_Do_This, that you Don't Do Materialized Views.

Sure, Differential/Timely Dataflow exist and they're very interesting; I have not gotten to build a database system with them and the systems that provide them in a usable format to end users (e.g. Materialize) are too non-boring for me to want to deploy in a production app.

recroad•12m ago
So the author is wrong that they’re automatic kept in sync?
enedil•4h ago
Materialized views in ScyllaDB are (were?) known to be a buggy implementation. In particular, they often depended on the cluster being healthy at the time of propagating the changes.
bdcravens•2h ago
In lieu of good MV support, you can always just run a scheduled query to store results in a persisted table that you identify as a materialized view. For example, when doing this in SQL Server, I give the table name a "cache" prefix.
quectophoton•1d ago
> And then by magic the results of this query will just always exist and be up-to-date.

With PostgreSQL the materialized view won't be automatically updated though, you need to do `REFRESH MATERIALIZED VIEW` manually.

4ndrewl•1d ago
Just landed here to write this. Materialized Views are _very_ implementation specific and are definitely _not_ magic.

It's important to understand how your implementation works before committing to it.

shivasaxena•1d ago
Curious if anyone know any implementation where they would be automatically updated?

Now that would be awesome!

EDIT: come to think of it, it would require going through CDC stream, and figuring out if any of the tables affected are a dependency of given materialized view. Maybe with some ast parsing as well to handle tenants/partitions. Sounds like it can work?

magicalhippo•17h ago
MSSQL and Sybase SQLAnywhere has options for that, we use it a fair bit with both.

At least on SQLAnywhere it seems to be implemented using triggers, ie not unlike what one would do if rolling your own.

cyanydeez•9h ago
Postgesql knows when you try to drop its dependencies so shouldnt be to hard to watch traffic
4ndrewl•6h ago
Oracle will do - a couple of options, either a full rebuild or an incremental rebuild.
tanelpoder•6h ago
Also, latest Oracle version (23ai) has added "concurrent on-commit fast refresh" functionality, where concurrent transactions' changes are rolled up to the MV concurrently (previously these refreshes were serialized).

https://oracle-base.com/articles/23/materialized-view-concur...

From the article: "Concurrent refreshes will only happen if there are no overlaps in the materialized view rows updated by each session."

lsuresh•5h ago
Do give us at Feldera a shot -- full IVM for arbitrary SQL + UDFs: https://github.com/feldera/feldera/
dalyons•1d ago
Postgres materialized views are pretty terrible / useless compared to other rdbms. I’ve never found a usecase for the very limited pg version.
thrown-0825•20h ago
having a dataset refresh on a timer and cache the result for future queries is pretty useful
JohnBooty•7h ago
I've only used Postgres' and (ages ago) MSSQL's materialized views. What is pg missing compared to the others?

I've found them VERY useful for a narrow range of use cases but, I probably don't realize what I'm missing.

globular-toast•7h ago
In Postgres a materialized view is basically a table that remembers the query used to generate it. Useful if you want to trigger the refreshes without knowing the queries.
jelder•14h ago
Did I miss in the article where OP reveals the magic database that actually does this?

3rd party solutions like https://readyset.io/ and https://materialize.com/ exist specifically because databases don’t actually have what we all want materialized views to be.

sophiebits•8h ago
These startups (and a handful of others) are what I meant!
jelder•8h ago
What others do you know of?
jamesblonde•7h ago
Feldera
sophiebits•7h ago
Feldera, RisingWave, DeltaStream, Epsio, Decodable, Confluent all seem to have some offerings in this space. Probably others too!
rapind•8h ago
You can do targeted materialized view updates via triggers. It's definitely verbose but does give you a lot of control.

I'm currently parking PostgREST behind Fastly (varnish) for pretty much the same benefits plus edge CDNs for my read APIs. I really just use materialized views for report generation now.

anon84873628•7h ago
In the analytics world, BigQuery MVs seem pretty cool. You can tune the freshness parameters, it will maintain user-specific row-level security, and even rewrite regular queries to use the pre-computed aggregates if possible.

But I don't think there is anything similar in the GCP transactional db options like Spanner or CloudSQL.

oftenwrong•8h ago
There is a PostgreSQL extension that adds support for incremental updates to materialised views: https://github.com/sraoss/pg_ivm
Jupe•1d ago
> (Technically speaking, if 100 people load the same page at the same time and the cache isn’t populated yet, then we’ll end up sending 100 queries to the database which isn’t amazing, but let’s just pretend we didn’t hear that.)

Isn't their tech to address that, like golang's "singleflight"?

bob1029•16h ago
> I don’t know yet if the implementations of this yet are good enough to use at scale. Maybe they’re slow or maybe the bugs aren’t ironed out yet.

This technique is very well supported in the big commercial engines. In MSSQL's Indexed View case, the views are synchronously updated when the underlying tables are modified. This has implications at insert/update/delete time, so if you are going to be doing a lot of these you might want to do it on a read replica to avoid impact to production writes.

https://learn.microsoft.com/en-us/sql/relational-databases/v...

https://learn.microsoft.com/en-us/sql/t-sql/statements/creat...

sophiebits•8h ago
TIL, thanks! I know Postgres and MySQL don’t include an equivalent.
MangoToupe•3h ago
Can we inspect MSSQL's source or is it shipped as a blob? I can't find any serious information about how this works. I can't imagine who would want to spend money on this.
porridgeraisin•9h ago
In general counts are just so commonly needed in 1:N relationships. Specifically counts grouped by "status" of the object on the N side of the relationship. I've never seen a CRUD app where this didn't eventually make the feature list.

So I just pre-emptively break "normal form" and maintain counts in the database right from the start. I either update it with multiple stmts everytime in a txn, or with triggers. In sqlite I prefer triggers mostly because I know the operation and edge cases of sqlite trigger impl better than I know other DBs'.

Caching can just be invalidation-based in this case.

mkleczek•8h ago
That way you increase contention and harm scalability. In sqlite it does not matter as writes are single-threaded anyway. But in other DBMSes it does. What's more: in PostgreSQL you increase bloat as every update creates a new row version.

Most of the times the solution is not pre-aggregation but proper indexes on foreign key columns (missing fk indexes is one of the most common mistakes in RDB design).

the_sleaze_•7h ago
I have to admit I do the same as GP here. However I would always do both - index and keep a count on the parent as a first step.

> increase contention and harm scalability

Contention, concurrent connections, high-throughput and the associated race conditions are absolute table stakes for RDBMs. They just won't be taken seriously if they can't handle thousands of concurrent updates. So imho for 90% of projects this just won't be an issue.

> PostgreSQL you increase bloat as every update creates a new row version

This is true, but the complexity has to live somewhere. There will be as many rows added to a materialized view as there will be in the original table.

> Most of the times the solution is not pre-aggregation

This is wrong. Caching = pre-aggregation and is almost always either the solution or a part of the solution to scalability, latancy etc. Don't compute what you can retrieve.

porridgeraisin•6h ago
Yeah if index + count(*) works fine, then that is enough. This will usually not be a purely index-only scan though in practice, in MVCC databases. Since it sometimes has to inspect the actual row to see if its visible to the current transaction or not. So it's "mostly" an index scan. If you're not getting an index-only scan, count(*) can become a bottleneck in my experience even with FK indexes.

[All just my understanding]

wavemode•7h ago
This shouldn't require denormalization. `SELECT COUNT(*)` ought to be very fast if you have a foreign key index.
lblume•5h ago
Denormalization in order to improve performance like this does not seem like the right tradeoff to me. You are making the database much less robust, and anomalies much more likely, in what should be as simple as what the original post mentions.
thom•8h ago
Materialize.com and Snowflake have pretty reliable incremental materialised views now, with caveats that aren’t back breaking. If you can transform in SQL rather than having to build a whole new pipeline or microservice to do the work that’s a pure operational win. I consider this alongside hybrid transactional/analytical databases to be the holy grail of data infrastructure. Finally we can stop just shuffling data around, support almost all workloads in one place, and get some work done.
viccis•3h ago
Yep Databricks does this pretty well too. I think their sales jargon for it is the "Enzyme engine"
MangoToupe•3h ago
> Finally we can stop just shuffling data around

Bro that's your entire job description. What is left?

jmull•8h ago
I curious why an index can't handle that first query well.
th0ma5•7h ago
This is my thing, I often thought of these views as a way to bridge organizational divides rather than technical ones. Still cool! But if you own everything you can do all kinds of other stuff just as easily.
mb7733•7h ago
Indexes can only help narrow down to the issues for the project (more generally: matching rows for the query). Once the index narrows down the rows, Postgres still has to count them all, and Postgres isn't particularly fast at that, especially in an active table[0]. That's what the author meant by 'complete index scan of the tasks for the project'.

Of course this isn't really relevant until there are a very large number of rows to count for a given query. Much larger than what is likely for "tasks in a project". I've run into this only with queries that end up counting 10e7/8/9 rows, i.e. more like OLAP workloads

[0] https://wiki.postgresql.org/wiki/Slow_Counting

crazygringo•3h ago
There is no reason. An index is the proper solution for dealing with <1K tasks per project, conservatively. (On modern SSD's you'd probably still be plenty fast for <100K tasks.)

In fact, the query would return the result straight from counting the project_id index entries, never even needing to scan the table itself (as the author acknowledges).

If you're really dealing with many, many thousands of tasks per project, then materialized views are going to be just as slow to update as to view. They're not "magic". The standard performant solution would be to keep a num_tasks field that was always incremented or decremented in a transaction together with inserting or deleting a task row. That will actually be lightning fast.

Materialized views aren't even supported in many common relational databases. They're a very particular solution that has very particular tradeoffs. Unfortunately, this article doesn't go into the tradeoffs at all, and picks a bad example where they're not even an obviously good solution in the first place.

jamesblonde•7h ago
This triggered me in the article

'There are a few startups these days peddling a newfangled technology called “incremental view maintenance” or “differential dataflow”. '

Incremental view maintenance can change recomputation cost of view updates from O(N) to O(1). DBSP is based on z-sets, a generalization of relational algebra. The paper won best paper at SIGMOD. There is a startup, Feldera, commercializing it.

This is just ignorance to dismiss as 'new fangled'.

PerryStyle•7h ago
+1. Learned about this in DB research course during grad school. Feldera is really cool.

Also I love their website design.

lsuresh•5h ago
Thanks for the kind words (Feldera co-founder here). I'll pass it on to the design team. :)
anon84873628•7h ago
I mean, everything you said sounds exactly like the definition of "new fangled" to me. I don't think the term is meant to be so pejorative or dismissive, just that the tech is currently intimidating to people not on the cutting edge. (Edit: e.g. taking graduate level database courses, as mentioned by a sibling comment :-)

There is constantly so much new stuff in software, you have to be a bit willfully ignorant of some things some of the time.

jamesblonde•6h ago
Does going from O(N) to O(1) sound like "new fangled"? That is the smell of progress
Zen1th•6h ago
You're taking "new fangled" too literally. Is it new, as in not everyone concerned is aware of it? Yes! I think the author is as enthusiastic about this as you are.
lysergic•7h ago
Here are some commercial and source-available options:

I'm not affiliated with any of these names, I'm just really interested in IVM:

Materialize Readyset Feldera RisingWave

globular-toast•7h ago
This is a weird article. The author doesn't even mention what database they are talking about then just drops in some SQL that looks like Postgres. If you think Postgres will magically have the right values in it for a materialized view you will be very disappointed...
quasarj•6h ago
Yeah, that was my thoughts as well. What database is this? In Postgres you definitely have to update materialized views manually....
xixixao•4h ago
Convex's queries[0] are another example, with perhaps a somewhat simpler approach to the tracking and invalidation.

[0] https://stack.convex.dev/how-convex-works

aboodman•4h ago
FYI, Zero uses incremental view maintenance internally as the core of its sync engine:

https://zero.rocicorp.dev/docs/introduction

IVM is what allows the core feature of our DX: that users can just do a fairly large query with sorts and joins, like:

  zero.query.posts.where('authorID', 42)
    .orderBy('created', 'desc')
    .limit(1000)
    .related('comments',
      c => c.orderBy('created', 'desc').limit(10))
... and we sync updates to this query incrementally to the client.

TanStack DB also uses IVM, but only client-side currently.

If you are interested in such things, you can take a poke around the source here: https://github.com/rocicorp/mono/tree/main/packages/zql. Or come find us in Discord: https://discord.rocicorp.dev/

wslh•3h ago
A 2020 HN thread on differential dataflow and some comments about materialized views: https://news.ycombinator.com/item?id=24837031
Aeolun•3h ago
This is like React or SolidJS for database queries.
joaomacp•2h ago
[After just using `count()`]

> Uh oh, someone is tapping you on the shoulder and saying this is too slow because it has to do a complete index scan of the tasks for the project, every time you load the page

Just ask them if that's actually the bottleneck and go for a walk outside, before sweating over anything else discussed in this post.

ropable•2h ago
Materialized views are great and (obviously) useful, but they have the usual tradeoffs of any caching mechanism (e.g. now you have to worry about cache data age and invalidation.

IMO a slept-on database feature is table partitioning to improve query performance. If you have a frequently-used filter field that you can partition on (e.g. creation timestamp), then you can radically improve query performance of large databases by having the DB only need to full-scan the given partitions. The database itself manages where records are placed, so there is no additional overhead complexity beyond initial setup. I've only used this for PostgreSQL, but I assume that other databases have similar partition mechanisms.

npn•8m ago
This is silly. The proper way is to keep a task_count field on projects table and update it on create/delete action. Way more flexible and perfomant.

Show HN: Sping – An HTTP/TCP latency tool that's easy on the eye

https://dseltzer.gitlab.io/sping/docs/
36•zorlack•3h ago•3 comments

The two versions of Parquet

https://www.jeronimo.dev/the-two-versions-of-parquet/
130•tanelpoder•3d ago•31 comments

Busy beaver hunters reach numbers that overwhelm ordinary math

https://www.quantamagazine.org/busy-beaver-hunters-reach-numbers-that-overwhelm-ordinary-math-202...
33•defrost•2d ago•4 comments

Ghrc.io appears to be malicious

https://bmitch.net/blog/2025-08-22-ghrc-appears-malicious/
242•todsacerdoti•3h ago•30 comments

Is 4chan the perfect Pirate Bay poster child to justify wider UK site-blocking?

https://torrentfreak.com/uk-govt-finds-ideal-pirate-bay-poster-boy-to-sell-blocking-of-non-pirate...
171•gloxkiqcza•10h ago•127 comments

Prison isn’t set up for today’s tech so we have to do legal work the old way

https://prisonjournalismproject.org/2025/08/19/prisons-outdated-technology-hurts-our-chances-at-f...
69•danso•3h ago•34 comments

We put a coding agent in a while loop

https://github.com/repomirrorhq/repomirror/blob/main/repomirror.md
127•sfarshid•10h ago•94 comments

Making games in Go: 3 months without LLMs vs. 3 days with LLMs

https://marianogappa.github.io/software/2025/08/24/i-made-two-card-games-in-go/
259•maloga•12h ago•180 comments

My ZIP isn't your ZIP: Identifying and exploiting semantic gaps between parsers

https://www.usenix.org/conference/usenixsecurity25/presentation/you
41•layer8•3d ago•14 comments

A Brilliant and Nearby One-off Fast Radio Burst Localized to 13 pc Precision

https://iopscience.iop.org/article/10.3847/2041-8213/adf62f
48•gnabgib•7h ago•7 comments

Y Combinator files brief supporting Epic Games, says store fees stifle startups

https://www.macrumors.com/2025/08/21/y-combinator-epic-games-amicus-brief/
80•greenburger•3d ago•62 comments

How to check if your Apple Silicon Mac is booting securely

https://eclecticlight.co/2025/08/21/how-to-check-if-your-apple-silicon-mac-is-booting-securely/
40•shorden•3h ago•8 comments

Trees on city streets cope with drought by drinking from leaky pipes

https://www.newscientist.com/article/2487804-trees-on-city-streets-cope-with-drought-by-drinking-...
151•bookofjoe•2d ago•79 comments

How many paths of length K are there between A and B? (2021)

https://horace.io/walks
15•jxmorris12•7h ago•2 comments

Burner Phone 101

https://rebeccawilliams.info/burner-phone-101/
285•CharlesW•4d ago•108 comments

Everything I know about good API design

https://www.seangoedecke.com/good-api-design/
199•ahamez•8h ago•77 comments

Cloudflare incident on August 21, 2025

https://blog.cloudflare.com/cloudflare-incident-on-august-21-2025/
143•achalshah•2d ago•29 comments

Halt and Catch Fire Syllabus (2021)

https://bits.ashleyblewer.com/halt-and-catch-fire-syllabus/
105•Kye•6h ago•27 comments

Show HN: Clearcam – Add AI object detection to your IP CCTV cameras

https://github.com/roryclear/clearcam
164•roryclear•15h ago•47 comments

GNU cross-tools: musl-cross 313.3M

https://github.com/cross-tools/musl-cross
17•1vuio0pswjnm7•4h ago•2 comments

Iterative DFS with stack-based graph traversal (2024)

https://dwf.dev/blog/2024/09/23/2024/dfs-iterative-stack-based
27•cpp_frog•3d ago•2 comments

NASA's Juno mission leaves legacy of science at Jupiter

https://www.scientificamerican.com/article/how-nasas-juno-probe-changed-everything-we-know-about-...
65•apress•3d ago•27 comments

Stepanov's biggest blunder? The curious case of adjacent difference

https://mmapped.blog/posts/43-stepanovs-biggest-blunder
39•signa11•3d ago•8 comments

Comet AI browser can get prompt injected from any site, drain your bank account

https://twitter.com/zack_overflow/status/1959308058200551721
491•helloplanets•11h ago•173 comments

Bash Strict Mode

http://redsymbol.net/articles/unofficial-bash-strict-mode/
4•dcminter•2d ago•2 comments

OS Yamato lets your data fade away

https://github.com/osyamato/os-yamato
17•tsuyoshi_k•3d ago•13 comments

Claim: GPT-5-pro can prove new interesting mathematics

https://twitter.com/SebastienBubeck/status/1958198661139009862
118•marcuschong•4d ago•79 comments

Using acetaminophen during pregnancy may increase childrens autism and ADHD risk

https://hsph.harvard.edu/news/using-acetaminophen-during-pregnancy-may-increase-childrens-autism-...
8•spchampion2•3h ago•0 comments

Show HN: I Built a XSLT Blog Framework

https://vgr.land/content/posts/20250821.xml
31•vgr-land•9h ago•11 comments

Will at centre of legal battle over Shakespeare’s home unearthed after 150 years

https://www.theguardian.com/culture/2025/aug/21/will-at-centre-of-legal-battle-over-shakespeares-...
43•forthelose•1d ago•14 comments