Do you even need a database?

https://www.dbpro.app/blog/do-you-even-need-a-database

33•upmostly•3h ago

Comments

the_inspector•2h ago

In many cases not. E.g. for caching with python, diskcache is a good choice. For small amounts of data, a JSON file does the job (you pointed to JSONL as an option). But for larger collections, that should be searchable/processable, postgres is a good choice.

Memory of course, as you wrote, also seems reasonable in many cases.

vovanidze•2h ago

people wildly underestimate the os page cache and modern nvme drives tbh. disk io today is basically ram speeds from 10 years ago. seeing startups spin up managed postgres + redis clusters + prisma on day 1 just to collect waitlist emails is peak feature vomit.

a jsonl file and a single go binary will literally outlive most startup runways.
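
For illustration, the flat-file approach described above might look like the following minimal sketch. It is written in Python rather than Go, and the file name and the helper names (`append_record`, `read_records`) are assumptions made up for this example, not anything taken from the article.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one JSON object as a single line (the JSONL convention)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_records(path):
    """Read every record back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical usage: a waitlist kept in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "waitlist.jsonl")
append_record(path, {"email": "a@example.com"})
append_record(path, {"email": "b@example.com"})
emails = [r["email"] for r in read_records(path)]
```

Appending one line per record keeps writes cheap, but every read scans the whole file, which is likely fine at waitlist scale.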

also, the irony of a database gui company writing a post about how you dont actually need a database is pretty based.

upmostly•2h ago

The irony isn’t lost on us, trust me. We spent a while debating whether to even publish this one.

But yeah, the page cache point is real and massively underappreciated. Modern infrastructure discourse skips past it almost entirely. A warm NVMe-backed file with the OS doing the caching is genuinely fast enough for most early-stage products.

vovanidze•1h ago

props for actually publishing it tbh. transparent engineering takes are so rare now, usually its just seo fluff.

weve basically been brainwashed to think we need kubernetes and 3 different databases just to serve a few thousand users. gotta burn those startup cloud credits somehow i guess.

mad respect for the honesty though, actually makes me want to check out db pro when i finally outgrow my flat files.

upmostly•1h ago

I feel like I could write another post: "Do you even need serverless/cloud?" We've also been brainwashed into thinking we need to spend hundreds or thousands a month on AWS when a tiny VPS will do.

Similar sentiment.

hilariously•1h ago

You are both right, with the exception that it requires knowledge and taste to accomplish, both of which are in short supply in the industry.

Why set up a go binary and a json file? Just use google forms and move on, or pay someone for a dead simple form system so you can capture and communicate with customers.

People want to do the things that make them feel good - writing code to fit in just the right size, spending money to make themselves look cool, getting "the right setup for the future so we can scale to all the users in the world!" - most people don't consider the business case.

What they "need" is an interesting one because it requires a forecast of what the actual work to be done in the future is, and usually the head of any department pretends they do that when in reality they mostly manage a shared delusion about how great everything is going to go until reality hits.

I have worked for companies getting billions of hits a month and ones that I had to get the founder to admit there's maybe 10k users on earth for the product, and neither of them was good at planning based on "what they need".

grep_it•53m ago

Except that eventually you'll find you lose a write when things go down because the page cache is write behind. So you start issuing fsync calls. Then one day you'll find yourself with a WAL and buffer pool wondering why you didn't just start with sqlite instead.
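
As a rough sketch of the first step on that path, this is roughly what "issuing fsync calls" means for an append-only file in Python. The function name `durable_append` and the file name are hypothetical. It only covers the file contents: making a newly created file's directory entry durable, and recovering from a half-written last line, are separate problems that this sketch does not address.

```python
import json
import os
import tempfile

def durable_append(path, record):
    """Append a JSONL record and ask the OS to flush it to stable storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its cached pages to the device

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
durable_append(path, {"event": "signup", "user": 1})
with open(path, encoding="utf-8") as f:
    lines = f.readlines()
```

Calling fsync on every write trades throughput for durability, which is the tradeoff that batching schemes such as a write-ahead log are meant to soften.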

ghc•1h ago

I'm so old I remember working on databases that were designed to use RAW, not files. I'm betting some databases still do, but probably only for mainframe systems nowadays.

bob1029•1h ago

https://docs.oracle.com/cd/B16276_01/doc/win.102/b14305/arch...

chuckadams•1h ago

I need a filesystem that does some database things. We got teased with that with WinFS and Beos's BFS, but it seems the football always gets yanked away, and the mainstream of filesystems always reverts back to the APIs established in the 1980s.

z3ugma•1h ago

At some point, don't you just end up making a low-quality, poorly-tested reinvention of SQLite by doing this and adding features?

gorjusborg•1h ago

Only if you get there and need it.

upmostly•1h ago

Exactly. And most apps don't get there and therefore don't need it.

evanelias•1h ago

Your article completely ignores operational considerations: backups, schema changes, replication/HA. As well as security, i.e. your application has full permissions to completely destroy your data file.

Regardless of whether most apps have enough requests per second to "need" a database for performance reasons, these are extremely important topics for any app used by a real business.

z3ugma•1h ago

but it's so trivial to implement SQLite, in almost any app or language...there are sufficient ORMs to do the joins if you don't like working with SQL directly...the B-trees are built in and you don't need to reason about binary search, and your app doesn't have 300% test coverage with fuzzing like SQLite does

you should be squashing bugs related to your business logic, not core data storage. Local data storage on your one horizontally-scaling box is a solved problem using SQLite. Not to mention atomic backups?
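
On the backup point, here is a small sketch using the `Connection.backup` method from Python's standard `sqlite3` module. The table and file names are invented for the example.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "app.db")       # hypothetical file names
backup_path = os.path.join(workdir, "backup.db")

live = sqlite3.connect(live_path)
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.execute("INSERT INTO orders (total) VALUES (9.99), (25.00)")
live.commit()

# Connection.backup copies a consistent snapshot into another database
# while the source connection stays open.
dest = sqlite3.connect(backup_path)
live.backup(dest)
dest.close()
live.close()

# Reopen the copy to confirm the rows arrived.
check = sqlite3.connect(backup_path)
row_count = check.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
check.close()
```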

9rx•1h ago

> and your app doesn't have 300% test coverage with fuzzing like SQLite does

Surely it does? Otherwise you cannot trust the interface point with SQLite and you're no further ahead. SQLite being flawless doesn't mean much if you screw things up before getting to it.

RL2024•1h ago

That's true, but relying on a highly tested component like SQLite means that you can focus your tests on the interface and your business logic, i.e. you can test that you are persisting to your datastore rather than testing that your datastore implementation is valid.

9rx•53m ago

Your business logic tests will already, by osmosis, exercise the backing data store in every conceivable way to the fundamental extent that is possible with testing given finite time. If that's not the case, your business logic tests have cases that have been overlooked. Choosing SQLite does mean that it will also be tested for code paths that your application will never touch, but who cares about that? It makes no difference if code that is never executed is theoretically buggy.

moron4hire•1h ago

Came here to also throw in a vote for it being so much easier to just use SQLite. You get so much for so very little. There might be a one-time up-front learning effort for tweaking settings, but that is a lot less effort than what you're going to spend on fiddling with stupid issues with data files all day, every day, for the rest of the life of your project.

gorjusborg•1h ago

Honestly, there is zero chance you will implement anything close to sqlite.

What is more likely, if you are making good decisions, is that you'll reach a point where the simple approach will fail to meet your needs. If you use the same attitude again and choose the simplest solution based on your _need_, you'll have concrete knowledge and constraints that you can redesign for.

hirvi74•1h ago

Sqlite is also the only major database to receive DO-178B certification, which allows Sqlite to legally operate in avionic environments and roles.

freedomben•1h ago

Sometimes yes, I've seen it. It even tends to happen on NoSQL databases as well. Three times I've seen apps start on top of Dynamo DB, and then end up re-implementing relational databases at the application level anyway. Starting with postgres would have been the right answer for all three of those. Initial dev went faster, but tech debt and complexity quickly started soaking up all those gains and left a hard-to-maintain mess.

leafarlua•1h ago

This always confuses me because we have decades of SQL and all its issues as well. Hundreds of experienced devs talking about all the issues in SQL and the quirks of queries when your data is not trivial.

One would think that for a startup of sorts, where things change fast and are unpredictable, NoSQL is the correct answer. And when things are stable and the shape of the entities is known, going for SQL becomes a natural path.

There are also cases for having both, and there are cases for graph-oriented databases or even column-oriented ones such as duckdb.

Seems to me, with my very limited experience of course, that everything leads to the same boring fundamental issue: the problem rarely lies in the infrastructure, and is mostly bad design decisions and poor domain knowledge. Realistically, how often is the bottleneck really the type of database rather than the quality of the code and the system design?

dalenw•51m ago

It's almost always a system design issue. Outside of a few specific use cases with big data, I struggle to imagine when I'd use NoSQL, especially in an application or data analytics scenario. At the end of the day, your data should be structured in a predictable manner, and it most likely relates to other data. So just use SQL.

greenavocado•42m ago

System design issues are a product of culture, capabilities, and prototyping speed of the dev team

noveltyaccount•1h ago

As soon as you need to do a JOIN, you're either rewriting a database or replatforming on Sqlite.
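
To make the JOIN point concrete, here is a minimal sketch with Python's built-in `sqlite3` module. The `users` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# Total spend per user: the engine does the matching and grouping.
rows = conn.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.name
""").fetchall()
conn.close()
```

Doing the same over two flat files means loading both, building a lookup on `user_id` by hand, and keeping that code correct as the data shape changes.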

freedomben•1h ago

I avoided DBs like the plague early in my career, in favor of serialized formats on disk. I still think there's a lot of merit to that, but at this point in my career I see a lot more use case for sqlite and the relational features it comes with. At the least, I've spent a lot less time chasing down data corruption bugs since changing philosophy.

Now that said, if there's value to the "database" being human readable/editable, json is still well worth a consideration. Dealing with even sqlite is a pain in the ass when you just need to tweak or read something, especially if you're not the dev.

giva•1h ago

> Dealing with even sqlite is a pain in the ass when you just need to tweak or read something, especially if you're not the dev.

How? With SQL it is super easy to search, compare, and update data. That's what it’s built for.

freedomben•1h ago

Pain in the ass was way too strong, I retract that. Mainly I meant relative. For example `nvim <filename>.json` and then /search for what I want, versus tracking down the sqlite file, opening, examining the schema, figuring out where the most likely place is that I care about, writing a SQL statement to query, etc.

giva•1h ago

Well, you still need to track down the <filename> part and know what you want to search for, so you need to examine the schema anyway.

However, if all your application state can be represented in a single json file of less than a dozen MB, yes, a database can be overkill.

freedomben•14m ago

> Well, you still need to track down the <filename> part and knowing what you want to search, so you need to examine the schema anyway.

Yes agreed, but it's usually a lot easier to find the filename part, especially if the application follows XDG. Sqlite databases are usually buried somewhere because they aren't expected to be looked at.

fatih-erikli-cg•1h ago

I agree. Databases are useless. You don't even need to load the data into memory. Reading it from disk whenever something needs to be read should be fine. I don't buy the argument that there are billions of records, so the database must be optimized to handle them. That many records are most likely something like access logs, which I think should not be stored at all.

Even if it's postgres, it is still a file on disk. If there is a need for something like partitioning the data, it is much easier to write the code that partitions the data.

If there is a need to add something with text inputs, checkboxes, etc., a database with its admin tools may be a good thing. If the data is something that gets imported and exported, a database may be a good thing too. But I still don't believe in such cases: in my ten-something years of software development, something like that has never happened.

Sharlin•1h ago

Not sure if sarcastic…

bsenftner•1h ago

I worked as a software engineer for 30 years before being forced to use a database, and that was for a web site. I've been coding actively, daily, since the 70's. Forever we just wrote proprietary files to disk, and that was the norm, for decades. Many a new developer can't even imagine writing their own proprietary file formats, the idea literally scares them. The engineers produced today are a shadow of what they used to be.

anonymars•1h ago

Yeah, it scares me because I'm experienced enough to know all the difficulties involved in keeping durable data consistent, correct, and performant

vlapec•42m ago

>The engineers produced today are a shadow of what they used to be.

…and it won’t get better anytime soon.

zeroonetwothree•32m ago

Poe’s law in action?

gavinray•1h ago

Not to nitpick, but it would be interesting to see profiling info of the benchmarks

Different languages and stdlib methods can often spend time doing unexpected things that make what look like apples-to-apples comparisons not quite equivalent

srslyTrying2hlp•1h ago

I tried doing this with csv files (and for an online solution, Google Sheets)

I ended up just buying a VPS, putting openclaw on it, and letting it Postgres my app.

I feel like this article is outdated since the invention of OpenClaw/Claude Opus level AI Agents. The difficulty is no longer programming.

fifilura•1h ago

Isn't this the same case the NoSQL movement made?

jbiason•1h ago

Honestly, I have been thinking about the same topic for some time, and I do realize that direct files could be faster.

In my (hypothetical, 'cause I never actually sat down and wrote that) case, I wanted the personal transactions in a month, and I realized I could just keep one single file per month, and read the whole thing at once (also 'cause the application would display the whole month at once).

Filesystems can be considered a key-value (or key-document) database. The funny thing about the example used in the link is that one could simply create a structure like `user/[id]/info.json` and directly access the user by ID instead of scanning some file to find them. Then again, with the examples used, search by name would be a pain, and that is one point where databases would handle things better.
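
A minimal sketch of that layout, assuming a temporary directory as the data root and hypothetical helper names (`put_user`, `get_user`, `find_by_name`). It also shows why lookup by name is the painful case: without an index, every file has to be opened.

```python
import json
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for the data directory

def put_user(user_id, info):
    """Write user/<id>/info.json; the path itself acts as the key."""
    user_dir = root / "user" / str(user_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    (user_dir / "info.json").write_text(json.dumps(info), encoding="utf-8")

def get_user(user_id):
    """Look a user up by ID with a single file read, no scan needed."""
    info_path = root / "user" / str(user_id) / "info.json"
    return json.loads(info_path.read_text(encoding="utf-8"))

def find_by_name(name):
    """Search by name has no index, so it has to open every file."""
    for path in root.glob("user/*/info.json"):
        info = json.loads(path.read_text(encoding="utf-8"))
        if info["name"] == name:
            return info
    return None

put_user(1, {"name": "ada"})
put_user(2, {"name": "bob"})
```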

m6z•1h ago

I have found that SQLite can be faster than using text or binary files, confirming their claims here: https://sqlite.org/fasterthanfs.html

forinti•1h ago

Many eons ago I wrote a small sales web application in Perl. I couldn't install anything on the ISP's machine, so I used file-backed hashes: one for users, one for orders, another for something else.

As the years went by, I expected the client to move to something better, but he just stuck with it until he died after about 20 years, the family took over and had everything redone (it now runs Wordpress).

The last time I checked, it had hundreds of thousands of orders and still had good performance. The evolution of hardware made this hack keep its performance well past what I had expected it to endure. I'm pretty sure SQLite would be just fine nowadays.

jwitchel•1h ago

This is a great, incredibly well-written piece. Nice work showing the under-the-hood buildup of how a db works. It makes you think.

randusername•1h ago

Separate from performance, I feel like databases are a sub-specialty that has its own cognitive load.

I can use databases just fine, but will never be able to make wise decisions about table layouts, ORMs, migrations, backups, scaling.

I don't understand the culture of "oh we need to use this tool because that's what professionals use" when the team doesn't have the knowledge or discipline to do it right and the scale doesn't justify the complexity.

ForHackernews•1h ago

Surprised to see this beating SQLite after previously reading https://sqlite.org/fasterthanfs.html

XorNot•1h ago

I've just built myself a useful tool which now really would benefit from a database and I'm deeply regretting not doing that from the get-go.

So my opinion has thoroughly shifted to "start with a database, and if you _really_ don't need one it'll be obvious."

But you probably do.

kabir_daki•1h ago

We built a PDF processing tool and faced this exact question early on.

For our use case — merge, split, compress — we went fully stateless. Files are processed in memory and never stored. No database needed at all.

The only time a database becomes necessary is when you need user accounts, history, or async jobs for large files. For simple tools, a database is often just added complexity.

The real question isn't "do you need a database" but "do you need state" — and often the answer is no.

JohnMakin•1h ago

everyone thinks this is a great idea until they learn about file descriptor limits the hard way

Joeboy•1h ago

Don't know if it counts, but my London cinema listings website just uses static json files that I upload every weekend. All of the searching and stuff is done client side. Although I do use sqlite to create the files locally.

Total hosting costs are £0 ($0) other than the domain name.

shafoshaf•1h ago

Relational Databases Aren’t Dinosaurs, They’re Sharks. https://www.simplethread.com/relational-databases-arent-dino...

The very small bonus you get on small apps is hardly worth the time you spend redeveloping the wheel.

MattRogish•1h ago

"Do not cite the deep magic to me witch, I was there when it was written"

If you want to do this for fun or for learning? Absolutely! I did my CS Masters thesis on SQL JOINS and tried building my own new JOIN indexing system (tl;dr: mine wasn't better). Learning is fun! Just don't recommend people build production systems like this.

Is this article trolling? It feels like trolling. I struggle to take an article seriously that conflates databases with database management systems.

A JSON file is a database. A CSV is a database. XML (shudder) is a database. PostgreSQL data files, I guess, are a database (and indexes and transaction logs).

They never actually posit a scenario in which rolling your own DBMS makes sense (the only pro is "hand rolled binary search is faster than SQLite"), and their "When you might need" a DBMS misses all the scenarios, the addition of which would cause the conclusion to round to "just start with SQLite".

It should basically be "if you have an entirely read-only system on a single server/container/whatever" then use JSON files. I won't even argue with that.

Nobody - and I mean nobody - is running a production system processing hundreds of thousands of requests per second off of a single JSON file. I mean, if req/sec is the only consideration, at that point just cache everything to flat HTML files! Node and Typescript and code at all is unnecessary complexity.

PostgreSQL (MySQL, et al) is a DBMS (DataBase Management System). It might sound pedantic but the "MS" part is the thing you're building in code:

concurrency, access controls, backups, transactions: recovery, rollback, committing, etc., ability to do aggregations, joins, indexing, arbitrary queries, etc. etc.

These are not just "nice to have" in the vast, vast majority of projects.

"The cases where you'll outgrow flat files:"

Please add "you just want to get shit done and never have to build your own database management system". Which should be just about everybody.

If your app is meaningfully successful - and I mean more than just like a vibe-coded prototype - it will break. It will break in both spectacular ways that wake you up at 2AM and it will break in subtle ways that you won't know about until you realize something terrible has happened and you lost your data.

Didn't we just have this discussion like yesterday (https://ultrathink.art/blog/sqlite-in-production-lessons)?

It feels like we're throwing away 50 years of collective knowledge, skills, and experience because it "is faster" (and in the same breath note that nobody is gonna hit these req/sec.)

I know, it's really, really hard to type `yarn add sqlite3` and then `SELECT * FROM foo WHERE bar='baz'`. You're right, it's so much easier writing your own binary search and indexing logic and reordering files and query language.

Not to mention now you need an AGENTS.md that says "We use our own home-grown database nonsense if you want to query the JSON file in a different way just generate more code." - NOT using standard components that LLMs know backwards-and-forwards? Gonna have a bad time. Enjoy burning your token budget on useless, counter-productive code.

This is madness.

stackskipton•1h ago

SRE here. My "Huh, neat" side of my brain is very interested. The SRE side of my brain is screaming "GOD NO, PLEASE NO"

Overhead in any project is understanding it and onboarding new people to it. Keeping on the "mainline" path is key to lowering friction here. All 3 languages have well-supported ORMs that support SQLite.

matja•1h ago

If you think files are easier than a database, check out https://danluu.com/file-consistency/
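
As one small illustration of the kind of care this involves, here is a sketch of the common write-to-temp-then-rename pattern for replacing a file. The function name `atomic_write_json` is hypothetical, and this is not a summary of the linked article: it skips syncing the parent directory and makes no claims about filesystem-specific behaviour.

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Replace path so readers see the old or the new file, not a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

path = os.path.join(tempfile.mkdtemp(), "state.json")
atomic_write_json(path, {"version": 1})
atomic_write_json(path, {"version": 2})
with open(path, encoding="utf-8") as f:
    state = json.load(f)
```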

ktzar•54m ago

Writing your own storage is a great way to understand how databases work (if you do it efficiently, keeping indexes, correct data structures, etc.) and to come to the conclusion that if your intention wasn't just tinkering, you should've used a database from day 1.

koliber•5m ago

I love this article as it shows how fast computers really are.

There is one conclusion that I do not agree with. Near the end, the author lists cases where you will outgrow flat files. He then says that "None of these constraints apply to a lot of applications."

One of the constraints is "Multiple processes need to write at the same time." It turns out many early stage products need crons and message queues that execute on a separate worker. These multiple processes often need to write at the same time. You could finagle it so that the main server is the only one writing, but you'd introduce architectural complexity.

So while from the pure scale perspective I agree with the author, if you take a wider perspective, it's best to go with a database. And sqlite is a very sane choice.

If you need scale, cache the most often accessed data in memory and you have the best of both worlds.

My winning combo is sqlite + in-memory cache.
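
A minimal sketch of that combination, assuming Python's `sqlite3` plus `functools.lru_cache` as the in-memory cache. The table, the function names, and the clear-everything invalidation are illustrative choices, not a description of any particular production setup.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
conn.commit()

@lru_cache(maxsize=1024)
def get_user_name(user_id):
    """Hit SQLite on a cache miss; later calls for the same id stay in memory."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

def rename_user(user_id, name):
    """Writes go to SQLite, then the cache is cleared so reads cannot go stale."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    conn.commit()
    get_user_name.cache_clear()

first = get_user_name(1)
second = get_user_name(1)  # served from the cache
hits = get_user_name.cache_info().hits
rename_user(1, "grace")
after = get_user_name(1)
```

Clearing the whole cache on every write is the bluntest possible invalidation. It is easy to reason about, but a write-heavy workload would probably want per-key eviction instead.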
