Today it suffers from feature creep and too many pivots & acquisitions. That they are insanely bad at naming features doesn't help either.
The biggest gripe I have is how crazy expensive it is.
But these days just use Trino or whatever. There are lots of new ways to work on data that are all a bigger step up over Spark (in ergonomics, performance, and price) than Spark was over Hadoop.
What an impressive feat of engineering.
Can't imagine someone incapable of building a website would deliver a good (digital) product.
The product is still centered on Spark, but most companies don't want or need Spark, and a combination of Iceberg and DuckDB will work for 95% of companies. It's cheaper, just as fast or faster, and way easier to reason about.
We're building a data platform around that premise at Definite[0]. It includes everything you need to get started with data (ETL, BI, data lake).
There are some examples[0] of enabling DuckDB to manage distributed workloads, but these are pretty experimental.
I really don't understand the valuation for this company. Why is it so high?
But if you're inclined to use it, Databricks' setup of Spark just saves you an incredible amount of time that you'd normally waste on configuration and wiring infrastructure (storage, compute, pipelines, unified access, VPNs, etc.). It's expensive and opinionated, but the cost of the data engineers you'd otherwise need to deal with constant Spark OOM errors is greater. Also, Databricks' default configs give you MUCH better performance out of the box than anything DIY, and you don't have to fiddle with partitions and super-niche config options to get even medium workloads stable.
What are some bad UX choices you generally dislike in data products?
* No persist(). Not being able to cache dataframes is a nightmare in a workflow that takes a massive source of data, does some rough filtering to get it down to a tiny subset, and then does more complex stuff with that subset.
* No good way to get usage info programmatically that I've found. For things like monitoring for periodic queries that get out of hand.
* Can't set Spark config. There are often ways to get around this, like when I recently had to set S3A credentials and needed a way that wasn't OS environment variables (those don't work for worker nodes). Eventually, after much documentation browsing and an exasperated hail-mary question to ChatGPT (which told me what to pass into options()), I got it working. But all the documentation and online Q&A resources just say to use Spark config.
* This is more of a Unity Catalog problem, but it kind of applies because Serverless and UC go hand in hand (particularly for things that used to be stored on a cluster, like credentials). It drives me insane that I can only mount external volumes on the same block storage as my workspace provider, so I can't mount an external volume for an AWS bucket on an Azure UC. That means if I want to write stuff that runs the same regardless of which cloud my customer's Databricks workspace is on, I need to use less sophisticated approaches.
It's still nowhere near the pain that Databricks' attempt at copying Snowflake's VARIANT data type has caused me, but there are many times when I find myself having to work around serverless limitations. Especially when these limitations aren't really mentioned much upfront while Databricks pushes serverless aggressively.
Databricks' proprietary notebook format that introduced subtle incompatibilities with Jupyter was infuriating embrace-extend-extinguish style bullshit, but on-prem cluster instability causing jobs to crash on a daily basis was way more infuriating, and at that time, enterprises were more than happy to pay a premium to accelerate analytics teams.
In the 2010s, Databricks had a solid billion-dollar business. But Spark-as-a-Service by itself was never going to be a unicorn idea. AWS EMR was the giant tortoise lurking in the background, slowly but surely closing the gap. The status quo couldn't hold, and who doesn't want to be a unicorn? So, they bloated the hell out of the product, drank that off-brand growth-hacker Kool-Aid, and started spewing some of the most incoherent buzz-word salad to ever come out of the Left Coast. Just slapping data, lake, and house onto the ends of everything, like it was baby oil at a Diddy Party.
Now, here we are in 2025, deep into the terminal decline of enshittification, and they're just rotting away, waiting for One Real Asshole Called Larry Ellison to scoop them up and take them straight to Hell. The State of Florida, but for Big Data companies.
It would be a mystery to me too why anyone would pick Databricks today for a greenfield project, but those enterprises from 5+ years ago are locked in hard now. They'll squeeze those whales, and the whales will shit money like a golden goose for a few more years, but their market share will steadily decline.
It's the cycle of life. Entropy always wins. Eventually the Grim Reaper Larry comes for us all. I wouldn't hate on them too hard. They had a pretty solid run.
As it happens, we've just launched our new Xata platform (https://xata.io/) which has some of the key Neon features: instant copy-on-write branching and separation of storage and compute. As an extra twist, we also can do anonymization (PII masking) between your production database and developer branches.
The way we do copy-on-write branches is a bit different. We haven't done any modifications to Postgres but do it completely at the storage layer, which is a distributed system in itself. This also brings some I/O performance opportunities.
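For intuition, the copy-on-write idea can be sketched as a toy page store (a simplification for illustration only, not Xata's or Neon's actual design):

```python
# Toy model of page-level copy-on-write branching: a branch starts as a
# cheap reference to its parent's pages and only stores pages it overwrites.

class Branch:
    def __init__(self, parent=None):
        self.parent = parent  # parent branch, or None for the root
        self.pages = {}       # page_no -> bytes, only locally modified pages

    def read(self, page_no):
        # Walk up the branch chain until some ancestor has the page.
        branch = self
        while branch is not None:
            if page_no in branch.pages:
                return branch.pages[page_no]
            branch = branch.parent
        raise KeyError(page_no)

    def write(self, page_no, data):
        # Writes land only on this branch; the parent stays untouched.
        self.pages[page_no] = data

main = Branch()
main.write(0, b"customers v1")

dev = Branch(parent=main)      # instant "copy": no pages duplicated
dev.write(0, b"customers v2")  # copy-on-write: page 0 materialized only now

assert main.read(0) == b"customers v1"  # production branch is unaffected
assert dev.read(0) == b"customers v2"
```

Creating the branch is O(1) no matter how big the parent is; storage is only consumed as branches diverge.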
While Xata has been around for a while, we're just launching this new platform, and it is in Private Beta. But we are happy to work with you if you are interested.
Btw, congrats to the Neon team!
These are the open source components:
* pgstream for the anonymization from the production branch
* pgroll for schema changes
* Xata Agent for the LLM-powered optimizations
If you want to get anonymization from your RDS/Aurora instance into Xata branches, you only need to run a CLI command (`xata clone`), which does something similar to pg_dump/pg_restore but with masking. It is based on our open-source pgstream project.
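The shape of the idea is just a transform applied to rows in flight (the column names and masking scheme below are made up for illustration; pgstream's real mechanics differ):

```python
# Toy of anonymizing rows in-flight while cloning a table.
import hashlib

MASKED_COLUMNS = {"email", "name"}  # hypothetical columns marked as PII

def mask(value):
    # Deterministic masking keeps uniqueness (and thus joins) roughly intact.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def clone_rows(rows):
    # Stream rows from source to target, masking PII columns as they pass.
    for row in rows:
        yield {k: (mask(v) if k in MASKED_COLUMNS else v) for k, v in row.items()}

prod = [{"id": "1", "email": "alice@example.com", "plan": "pro"}]
dev = list(clone_rows(prod))
assert dev[0]["plan"] == "pro"                 # non-PII passes through
assert dev[0]["email"] != "alice@example.com"  # PII is masked
```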
Happy to organize a demo any time.
(Disclaimer: I work at Xata.) Just wanted to mention that we also support anonymization, in case that’s something you're looking into: https://xata.io/postgres-data-masking
2. FYI, I couldn't request access via the BYOC form, so I sent an email as the error suggested: "There was an error, please try again or contact us at info@xata.io."
2. Thanks, I see you sent the email already, not sure why it failed. Will reach out over email.
Cynically, am I the only one who takes pause because of an acquisition like this? It worries me that they will need to be more focused on the needs of their new owners, rather than their users. In theory, the needs should align — but I’m not sure it usually works out that way in practice.
Same! I remember it too. I found it quite fascinating. Separation of storage and compute was something new to me, and I was asking them about Pageserver [0]. I also asked for career advice on how to get into database development [1].
Two years later, I ended up working on very similar disaggregated storage at Turso database.
Congrats to the Neon team!
I really do hope that their OSS strategy does not change due to this, as it's really friendly to people who want to learn their product and run smaller deployments. It's (intentionally or not) really hard to run at a big scale as the control plane is not open-source, which makes the model actually work.
I'm biased, I'm a big-data-tech refugee (ex-Snowflake) and am working on https://tower.dev right now, but we're definitely seeing the open source trend supported by Iceberg. It'll be really interesting to see how this plays out.
To be honest, this is a little sad for me. I'd hoped that Neon would be able to fill the vacuum left by CockroachDB going "business source".
Being bought by Databricks makes Neon far less interesting to me. I simply don't trust such a large organisation, one that has previously had issues acquiring companies, to really care about what is pretty much the most important piece of infrastructure I've got.
There certainly is enough demand for a more "modern" PostgreSQL, but pretty much all of the direct alternatives are straying far from its roots, whether in pricing, compatibility, or source-available licensing.
Back when I was looking at alternatives to postgres these were considered:
1. AWS RDS: We were already on AWS RDS, but it is expensive, and has scaling and operations issues
2. AWS Aurora: The one that ended up being recommended; it solved some operations issues but came with other niche downsides, pretty much the same as other wire-compatible PostgreSQL alternatives
3. CockroachDB: Very interesting and wire-compatible (and open source at the time), but it had deeper compatibility issues and didn't fit with our tooling
4. Neon: Was considered to be too immature at the time, but certainly interesting, looked to be able to solve most of our challenges, maybe except for some of the operations problems with postgresql, I didn't look deeper into it at the time
5. Yugabyte: Interesting technology; it had some of the same compatibility issues, but fewer than the others, as they also use the query engine from PostgreSQL as far as I can tell.
There are also various self-hosting utilities for PostgreSQL that I looked at, specifically CloudPG, but we didn't have the resources to maintain a stateful deployment of Kubernetes and Postgres ourselves. It would fulfill most of our requirements, but with an extra maintenance burden for both Kubernetes and PostgreSQL.
Self-hosting PostgreSQL didn't offer mature enough replication and operations features at that point. It is steadily maturing, but since we had many databases, manual upgrades and patches would have been very time-consuming, as PostgreSQL has some not-so-nice upgrade quirks. You basically have to dump and reload all data during major upgrades, unless you use extensions or other services to circumvent this issue.
I'm interested if you'd care to elaborate.
Neon is Postgres.
https://neon.tech/databricks-faq
We're really excited about this, and will try to respond to some of the questions people have here later.
We’ve all read glowing blog posts and reassuring FAQs after an acquisition enough times, only to see a complete about-face a few months or a year later.
I quite enjoyed using Neon but as a solo founder running my business on Neon I can’t help but think it’s insanity to not be looking for alternatives.
Databricks is _not_ a company I trust at all.
[0] if you don’t know, databricks acquired bit.io and shut down all databases within 30 days. Production databases had <30 days to migrate.
Something is always going to change, almost always in a way that impacts customers. In the best case it's something simple like a different name on the bill, other times it will leave customers scrambling for an alternative before a ridiculous deadline. It could happen within weeks, after a month, or it might take a year. The answers at the time of the announcement are the same regardless.
Most likely a holding state for a bit before databricks ruins it or shuts it down. I started looking around when the news broke last week or so for alternatives.
I really would prefer a managed DB for multiple reasons but I might need to look at just self-hosting. I might have spent less time futzing with my DB if I had done that from the start instead of going Aurora Serverless v1 -> Planetscale -> Neon.
[0] https://xata.io/
I think one of the coolest features of neon is being able to quickly spin up new DBs via the API (single tenant DBs) and while that is cool, my client list is small so manually creating the DBs is not a problem (B2B).
https://news.ycombinator.com/item?id=43899016
Databricks in talks to acquire startup Neon for about $1B (174 comments)
This would’ve been three acquisitions straight for me and… I’m okay, they’re awful. I just want stability.
Congrats to the neon team! I use and love neon. Really hope this doesn’t change them too much.
In a couple of cases I’ve been recruited because I have a history of scaling and integrating acquisitions into companies successfully.
Walking into something like that is tough because the two teams sort of don’t like each other and you’re really “neither”. I’d want to make sure I was interviewed by both teams
IMO, this is where the power of being hired into the situation is. No existing bias for either company and all the baggage that comes with that.
Allows a person to see the pros and cons of how things get done on both sides of the fence, and act accordingly
I mean, hindsight is 20/20 here, but I would have loved the theoretical money @ 1 billion. But those exits are so rare, and my experience in the past 15 years hasn’t matched those unicorns.
Basically, I’ve come to the conclusion that unless you have serious equity or you’re a founder, acquisitions suck. You’re the one doing the work of making the two companies come together, while the founders usually bounce or are stripped of any real power to change things.
My guess is that this team gets rolled into Online Tables tech, which would make product sense.
https://docs.databricks.com/aws/en/machine-learning/feature-...
It led to some serious burnout and I took several months off. I'm now happily working as an IC again.
Of course, there might be other agents creating Neon databases, so we might be under-counting.
I guess it’s time to go back to the well of managed/serverless Postgres options…
I'm seeing a lot of DBX hate in this thread overall. I think it's warranted. At Tower[0], we're trying to provide a decent open solution. It starts with owning your own data, and Iceberg helps you break free.
[0] - https://tower.dev
It's fine. Probably actually a good place to work.
Databricks is the antithesis of Neon. Neon is driven by product, Databricks is driven by sales. Opinions of Databricks in a thread about Neon are going to be on the negative side (but not necessarily representative).
I've been an SA at Databricks for the past two years and love it here. The people you get to work with here are world-class and our customers legitimately love our product.
I too am a little confused about comments in threads on HN about Databricks, they seriously don't reflect what I see internally and what my customers say. I don't think I'd be working here if they did.
I like how they’re innovating, but it can be rough around the edges sometimes.
Is there an alternative for that? Scale-to-zero postgres, basically?
I used Serverless v1 and then they doubled the prices for v2 while removing features so I moved to PlanetScale. They were great but as I grew and wanted multiple smaller DBs they didn't really have a good way to do that and I moved to Neon. Now, with this news, I guess I'll be looking for an alternative.
Was just about to react to someone being wrong on the internet and say that this is not true. Instead, TIL that this is, in fact, the case. Since 2024Q4.
Thanks for invalidating my stale cache.
One click turns it off, or you can just leave it on. A $5 VM will run a lot of small postgres.
I use both neon and coolify, and could live with either, though apples and oranges when it comes to the data branching feature. But a quick pg_dump/restore which could even be scripted solves my problem. Disclaimer: I like devops in addition to just dev.
If I can throw together a random project, completely isolated, that costs $0.10 per month, that enables me to do many more random projects than something that costs me $5 per month.
This seems like quite the pivot though
Neon filled their product gap of not having an operational (row-oriented) DB.
On-premise open-source S3 is a problem though. MinIO is not something we're touching and other than that it looks a bit empty with enterprise ready solutions.
https://news.ycombinator.com/item?id=32148007
https://news.ycombinator.com/item?id=35299665
Ceph would be a theoretical option, but a) we don't have a lot of experience with it and b) it's relatively complex to operate. We'd really love to add a lighter option to our stack that's under the stewardship of a foundation.
MinIO is a good fit if you want a small cluster that doesn't require day 2 operational complexity, as you only store a few TBs.
I have not looked into them recently, but I doubt the core has changed. Being VC-funded and looking for an exit makes them overinvest in marketing and storytelling.
They might have added an index by now, but gatekept it to their enterprise AIStor offering, since at this point they’ve abandoned any investment in open source, or even the appearance of caring about it. Their initial inclination in response to this issue says everything - https://github.com/minio/minio/issues/20845#issuecomment-259...
Rook/ceph with object storage is pretty bulletproof: https://www.rook.io/docs/rook/v1.17/Storage-Configuration/Ob...
I do wish more systems had high quality operators out there. A lot of operators I have looked into are half baked, not reliable, or not supported.
One example from Snowflake is hybrid tables which adds rowstore next to columnar.
OLAP + OLTP = HTAP
ps: I worked at SingleStore. https://www.mooncake.dev/blog/htap-is-dead
The problem isn't in the CDC / replication tools in the market.
The problem is that columnar stores (especially Iceberg) are not designed for the write/upsert patterns of OLTP systems.
They just can't keep up...
This is a big problem we're hoping to solve at Mooncake [0]: turn Iceberg into an operational columnstore, so that it can keep up (sub-second freshness) with your Postgres.
And second-latency replication is more involved: you actually need to build a layer on top of Iceberg to track primary keys and apply deletions.
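To make the write-amplification point concrete, here's a toy contrast between the two common upsert strategies in columnar table formats (grossly simplified compared to real Iceberg, which uses manifests, sequence numbers, etc.):

```python
# Toy contrast of upsert strategies in a columnar table format.

def upsert_copy_on_write(data_file, pk, row):
    """Rewrite the whole file to change one row: O(file size) per upsert."""
    return [row if r["pk"] == pk else r for r in data_file]

def upsert_merge_on_read(delete_log, new_files, pk, row):
    """Append a tiny delete record and a tiny new file: O(1) per upsert,
    but every reader must now reconcile deletes at query time."""
    delete_log.append(pk)
    new_files.append([row])

def read_merge_on_read(base_file, delete_log, new_files):
    deleted = set(delete_log)
    rows = [r for r in base_file if r["pk"] not in deleted]
    for f in new_files:
        rows.extend(f)
    return rows

base = [{"pk": i, "v": 0} for i in range(100_000)]

# One upsert under copy-on-write rewrites all 100k rows.
rewritten = upsert_copy_on_write(base, 42, {"pk": 42, "v": 1})
assert len(rewritten) == 100_000

# The same upsert under merge-on-read writes only 2 records total.
deletes, extras = [], []
upsert_merge_on_read(deletes, extras, 42, {"pk": 42, "v": 1})
assert len(deletes) + sum(len(f) for f in extras) == 2
assert {"pk": 42, "v": 1} in read_merge_on_read(base, deletes, extras)
```

Merge-on-read shifts the cost to queries, which is why a compaction/reconciliation layer on top is needed before a columnstore can absorb OLTP-rate upserts.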
I managed to add one startup and so far it’s done very well, but it was an exceptional case and the global CEO wanted the functionality. But it used MongoDB and ops team didn’t have any skills, so rather than learn one tiny thing for an irrelevant data store they added cash to use Atlas with all the support and RBAC etc etc. They couldn’t use the default Azure firewall because they only know one firewall, so added one of those too. Also loaded with contracts. Kept hiring load down, one number to call, job done. Startups cost is $5-10k per year. Support BS about $40k. (I forget the exact numbers but it dwarfed the startup costs.)
Startups are from Venus, enterprise are from Jupiter.
It’s basically a luxury minivan. It may not be the fastest or prettiest or cheapest, but it’s a safe way for a large family of “data and AI people” to traverse a large organisation.
More seriously, I like to call it an “analytics workbench” in a professional setting.
Hence IBM talking up Iceberg: https://www.ibm.com/think/topics/apache-iceberg
I bet they had VMware all over the place.
Neon is still early‑stage and, AFAIK, not profitable. It’s a perfect snapshot of 2025: anything that’s (1) serverless, and (2) even vaguely AI‑adjacent is trading at a multiple nobody would have believed two years ago. Also supports my hypothesis that the next 12 months will be filled with cash acquisitions.
> Databricks will ruin Neon;
I certainly hope not. Focus on DX, friendly free tier, and community support is what made it special. If that vanishes behind Databricks’ enterprise guardrails, the goodwill will vanish with it.
What the hell do profits have to do with valuing tech startups?
Valuations like this only make sense if there’s a clear path to significant strategic leverage or future cash flow.
I've been hearing that Neon is burning through cash pretty aggressively, which raised eyebrows for me. But you're right: high margins and scalability mean profits can be deferred.
It's vastly more complicated to do this efficiently than you might imagine. Postgres' internal architecture is built around a very different set of assumptions (pages, WAL, local disk etc.) than what the S3 API offers.
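A toy sketch of how a disaggregated design bridges that mismatch (my own illustration, not Neon's actual code): instead of overwriting 8 KB pages in place on local disk, keep an immutable base image plus an append-only log of WAL records, and materialize any page version on demand.

```python
# Toy "get page at LSN": materialize a page by replaying WAL records
# (modeled here as simple byte patches) over an immutable base image.
# Immutable bases + append-only logs suit object stores like S3, where
# you can't cheaply overwrite small pages in place.

PAGE_SIZE = 16  # tiny pages for the example

base_pages = {0: bytes(b"A" * PAGE_SIZE)}

# WAL: (lsn, page_no, offset, new_bytes), append-only.
wal = [
    (100, 0, 0, b"B"),
    (200, 0, 4, b"CC"),
]

def get_page(page_no, lsn):
    """Return the page contents as of the given LSN."""
    page = bytearray(base_pages[page_no])  # start from the base image
    for rec_lsn, rec_page, off, data in wal:
        if rec_page == page_no and rec_lsn <= lsn:
            page[off:off + len(data)] = data  # replay the patch
    return bytes(page)

assert get_page(0, 99) == b"A" * 16       # before any WAL applied
assert get_page(0, 100)[0:1] == b"B"      # first record applied
assert get_page(0, 200)[4:6] == b"CC"     # both records applied
```

A nice side effect is that any historical page version stays addressable, which is what makes point-in-time branches cheap.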
> Can’t you use a cloud provider and have them host this for you?
If it really is all OSS, then I guess the moat is the impressive execution of this team.
>As Neon became GA last year, they noticed an interesting stat: 30% of the databases were created by AI agents, not humans. When they looked at their stats again recently, the number went from 30% to over 80%. That is, AI agents were creating 4 times more databases versus humans.
For me this has alarm bells all over it. Databricks is trying to pump postgres as some sort of AI solution. We do live in weird times.
And yes, congratulations to the Neon team! (Nikita is, after all, YC)
The OP and I built an HTAP system at SingleStore. A single database with one copy of data for both OLTP and OLAP workloads. HTAP never took off [0].
What we learned was that OLTP (Postgres) should handle OLTP, while OLAP (data warehouses/lakes) should handle OLAP, with replication between them.
Designing the 'up-to-date' replication between these systems is hard: columnar stores just aren't built for OLTP-style writes and can't keep up with your OLTP tables.
Let’s see if Databricks and Neon can pull this off
“Give me up-to-date Postgres tables in Unity Catalog”, with no debezium --> kafka --> flink --> Iceberg pipeline and no Spark jobs in the back ensuring that Iceberg stays in an optimal state.
Interesting trend: modern serverless databases choosing Rust for its memory safety and performance predictability. Makes sense for systems where reliability and efficiency are non-negotiable.
We've been running OpenRaft in production for several years now and have found it to be quite stable. It's designed as a generic, feature-complete Raft library that handles the complexities of distributed consensus well. If you're looking for a mature Rust Raft implementation, it's definitely worth considering.
I believe that future data platforms will adopt an all-in-one approach, offering OLTP, OLAP, as well as support for other hybrid workloads such as vector, graph, and time series. This will lower user costs and be more friendly to applications in the AI era.
It's already available and open source by the nice folks from MIT and they even wrote a book on it [1],[2].
You can use it to develop modern datahub for data engineering and analytics [3],[4].
[1] D4M: Dynamic Distributed Dimensional Data Model:
[2] Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs:
https://mitpress.mit.edu/9780262038393/mathematics-of-big-da...
[3] Technical Report: Developing a Working Data Hub:
https://arxiv.org/abs/2004.00190
[4] Collaborative data analytics with DataHub:
Databricks now reminds me of Oracle. Still a great product, but it's a delicious ice cream that's melting.
Smaller companies don't usually need Databricks until they grow and become larger with more complex needs, and enough data that queries start to take a long time to run.
As bare metal gets so much faster the point where you hit scaling issues with a DB becomes further and further away.
As this trend continues more companies might decide they don't need Databricks they just need a few Postgres replicas and nothing else.
Neon is kind of like an insurance policy against this outcome.
WSJ article: https://www.wsj.com/articles/databricks-to-buy-startup-neon-...