https://fly.io/blog/litestream-revamped/ https://news.ycombinator.com/item?id=44045292
At the end of the day, with Litestream, when you respond to a client with a successful write you are only guaranteeing a replication factor of 1.
SQLite is designed for one local client at a time. Client-server relational databases are designed for many clients at a time.
If you have lots of concurrent writes SQLite isn't the right solution. For concurrent reads it's fine.
SQLite also isn't a network database out-of-the-box. If you want to be able to access it over the network you need to solve that separately.
(Don't try and use NFS. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke... )
That becomes an instant problem if users ever write to your database. You can't duplicate the environment unless it's read-only.
And even if the database is read-only for users, having to redeploy the database to every client each time you update it is pretty annoying.
That's why it's usually better to start with Postgres or MySQL. A single source of truth for data makes everything vastly easier.
If you have a read-heavy app (99% of SaaS) that runs on one server and you don't have millions of users, then SQLite is a great option.
Usually you want to be able to run multiple webservers against a single database though, since that's the first thing you'll usually need to scale.
Now if I'm building a local app then absolutely sqlite makes the most sense but I don't see it otherwise.
dpkg -i litestream.deb
systemctl enable litestream
systemctl start litestream
The fact it's so simple is my favourite thing about it. systemctl is only in there to restart it if it crashes; Litestream itself is (IIRC) a single CLI binary.
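For illustration, here's roughly what day-to-day usage looks like (a sketch: the database path and bucket name are made up, and in practice you'd usually point it at a config file rather than pass everything on the command line):
# continuously replicate a local SQLite database to S3
litestream replicate /var/lib/myapp/app.db s3://my-bucket/app.db
# recover the latest copy onto a fresh machine
litestream restore -o /var/lib/myapp/app.db s3://my-bucket/app.db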
I love postgres but in no way is it as simple to run as sqlite (pretty sure even postgres core team would agree that postgres is more complex than sqlite).
People care about things like long-term support. Postgres 13, from 2020, is still officially supported. Litestream 0.1.0 was the first release, also from 2020, but I can't tell if it is still supported. Worrying about the maturity, stability, and support of an application database is very reasonable in risk-averse projects.
Besides, the question wasn't Litestream vs. Postgres backup apps; it was SQLite vs. Postgres.
More than once I've started a project with sqlite and then had to migrate to postgres. In my experience it's because of the way sqlite has to lock the whole database file while writing to it, blocking concurrent reads - this isn't a problem in postgres. (There's WAL mode, but it still serialises all writes, and requires periodic explicit checkpointing IME)
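(For anyone who hasn't used it: WAL mode and a manual checkpoint are just pragmas, e.g.)
PRAGMA journal_mode = WAL;        -- persists in the database file once set
PRAGMA wal_checkpoint(TRUNCATE);  -- force a checkpoint and reset the WAL file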
You may also find you really want a feature postgres has, for example more advanced data types (arrays, JSON), more advanced indices (GIN inverted index on array members), replication...
If you’re building something as a hobby project and you know it will always fit on one server, sqlite is perfect.
If it’s meant to be a startup and grow quickly, you don’t want to have to change your database to horizontally scale.
Deploying without downtime is also much easier with multiple servers. So again, it depends whether you’re doing something serious enough that you can’t tolerate dropping any requests during deploys.
For sure zero-downtime deploys are easier with Kubernetes etc., but again that's overkill for 99.99% of apps.
On the other side, why not just store everything in memory and flush to a local json file if you won't have any users? sqlite is overkill!
It's definitely really nice though that if you do choose sqlite initially to keep things as small and simple as possible, you don't immediately need to switch databases if you want to scale.
This is essentially what GraphQL does: instead of crafting super-tailored API endpoints for each of your screens, you use its query language to ask for the data you want, and it queries the DB for you and gets the data back in a single network round trip from the user's perspective.
(Not an expert, so I trust comments to correct what I got wrong)
For the latter, it's along the lines of `select * from posts where ...` and `select * from authors where id in {posts.map(author_id)}`. And then once it's in memory you manually work out the associations (or rely on your ORM to do it).
In particular, JSON aggregations mean you can have a single query that does things like fetch a blog entry and the earliest 10 comments in a single go. I wrote up some patterns for doing that in SQLite and PostgreSQL here: https://github.com/simonw/til/blob/main/sqlite/related-rows-...
Here's an example PostgreSQL query that does this to fetch tags for posts: https://simonwillison.net/dashboard/json-agg-example
select
blog_entry.id,
title,
slug,
created,
coalesce(json_agg(json_build_object(blog_tag.id, blog_tag.tag)) filter (
where
blog_tag.tag is not null
), json_build_array()) as tags
from
blog_entry
left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
left join blog_tag on blog_entry_tags.tag_id = blog_tag.id
group by
blog_entry.id
order by
blog_entry.id desc
The alternative, more common path is the pattern that Django calls "prefetch_related". Effectively it looks like this:
select id, title, created from posts order by created desc limit 20
-- Now extract the id values from that and run:
select
blog_entry.id,
blog_tag.tag
from
blog_entry
join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
join blog_tag on blog_entry_tags.tag_id = blog_tag.id
where
blog_entry.id in (?, ?, ?, ...)
-- Now you can re-assemble the list of tags for
-- each entry in your application logic
Once you have a list of e.g. 20 IDs you can run a bunch of cheap additional queries to fetch extra data about all 20 of those items.
No, JOINs should be orders of magnitude faster.
> What are you really supposed to do if you need to, e.g. fetch a list of posts along with the number of comments on each post?
You're really supposed to do a JOIN, together with a GROUP BY and a COUNT(). This is elementary SQL.
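For the posts-with-comment-counts case, that looks something like this (posts/comments are hypothetical table names, not anyone's actual schema):
select
  posts.id,
  posts.title,
  count(comments.id) as comment_count
from
  posts
  left join comments on comments.post_id = posts.id
group by
  posts.id,
  posts.title
order by
  posts.id desc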
Of course, it's all possible with custom SQL, but it gets complicated quickly.
The filtering you describe is trivial with COUNT(*) FILTER (WHERE flag IN (...) AND state = ...) and the like.
If you want to retrieve data on the last comment, as opposed to an aggregate function of all comments, you can do that with window functions (or with JOIN LATERAL for something idiomatic specifically to Postgres).
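A sketch of the JOIN LATERAL version of "latest comment per post" (same hypothetical tables as above):
select
  posts.id,
  posts.title,
  latest.body as latest_comment
from
  posts
  left join lateral (
    select body
    from comments
    where comments.post_id = posts.id
    order by created desc
    limit 1
  ) latest on true
order by
  posts.id desc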
Learning how to do JOINs in SQL is like learning pointers in C -- in that it's a basic building block of the language. Learning window functions is like learning loops. These are just programming basics.
For anything that isn't just a basic row lookup from a single table, you should really just be writing the SQL yourself.
Alternately, if you can separate query issuance from result parsing, you can make N+1 palatable. E.g., do your query to get the IDs, wait for the results, loop and issue the N queries, then loop and wait for the results in order. That will be two-ish round trips rather than N+1 round trips. But you have to search to find database APIs that allow that kind of separation.
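As one concrete (and entirely hypothetical) illustration of that separation, psycopg 3's pipeline mode lets you queue the N follow-up queries and read the results afterwards, so the batch costs roughly one extra round trip instead of N; the schema and query shapes here are made up:
import psycopg

# Sketch: fetch the 20 newest post ids, then queue one tags query per id.
# Inside the pipeline block, queries are sent without waiting on results;
# the cursors are read once the whole batch has been flushed to the server.
with psycopg.connect("dbname=blog") as conn:
    ids = [row[0] for row in conn.execute(
        "select id from posts order by created desc limit 20")]
    with conn.pipeline():
        cursors = [conn.execute(
            "select tag from post_tags where post_id = %s", (post_id,))
            for post_id in ids]
    tags_by_post = {post_id: [r[0] for r in cur.fetchall()]
                    for post_id, cur in zip(ids, cursors)}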
[1] You can almost always express the query you want in SQL, but that doesn't mean it will have a reasonable runtime. Sometimes server side join and client side join have about the same runtime... if it's significant and you have the usual case where clients are easier to scale than database servers, it might be worthwhile to have the join run on the client to reduce server load.
In general, you want to ask the remote server once for all the data you need, then read all the results. It applies to databases as well as APIs.
Pipelined requests also solve the problem and can be more flexible.
Also, joins can be optimised in different ways. Sometimes the optimal way to do a join isn't to query each row one by one but, when the rows you want are a large fraction of the rows that exist, to do something like building a Bloom filter of the rows you want and then sequentially reading all the rows in the table.
More formally, the number of queries should be constant and not linearly scaling with the number of rows you're processing.
You can install MySQL/PostgreSQL on the application server, connect over a unix socket and get the same benefits as if you'd used SQLite on the application server (no network latency, fast queries). Plus the other benefits that come from using these database servers (Postgres extensions, remote connections, standard tooling etc). I'm guessing more RAM is required on the application server than if you used SQLite but I haven't benchmarked it.
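For example, with psql the difference is just the connection string (the socket directory varies by distro; this assumes a Debian-style layout and peer/password auth):
# connect over the local unix socket instead of TCP
psql "host=/var/run/postgresql dbname=myapp user=myapp"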
But I'd love to hear more from someone more well-versed in the use cases for reliable SQLite.
In my opinion, if you have an easy way to run Postgres, MySQL, etc. - just run that.
There are usually a lot of quirks in the details of DB usage, even when it doesn't immediately seem like it - I got bitten by this a few times: features not supported, different semantics, ...
IMO every project has an "experimental stuff" budget, and if you go over it the project is too broken to recover; for most projects there's just not that much to win by spending that budget on a new database thing.
This is an interesting take; why do you see recent hype around the most boring and stone-age of technologies, SQLite?
Not super sure who followed who but there was all of a sudden a lot of excitement
SQLite's "buzz" isn't new, type "sqlite" into my https://tools.simonwillison.net/hacker-news-histogram tool and you'll see interest (on HN at least) has been pretty stable since 2021.
While I don't have stats about every conference talk for the last decade, my experience has been that SQLite has been featured more in Rails conference talks. There's a new book titled "SQLite on Rails: The Workbook" that I don't think would have had an audience five years ago. And I've noticed more blog posts and more discussion in Rails-related discussion platforms. Moreover, I expect we'll see SQLite gain even more in popularity as it simplifies multi-agent development with multiple git worktrees.
Bit more stretched out than I thought it had been
But most apps should just use a classic n-tier database architecture like Postgres. We mostly do too (though Litestream does back some stuff here like our token system).
In every case where I had a SQLite vertical that required resilience, the customer simply configured the block storage device for periodic snapshots. Litestream is approximately the same idea, except you get block device snapshots implicitly as part of being in the cloud. There is no extra machinery to worry about and you won't forget about a path/file/etc.
Also, streaming replication to S3 is not that valuable an idea to me when we consider the recovery story. All other solutions support hot & ready replicas within seconds.
MySQL, Postgres, etc. have a much greater overhead for setup, unless you want to pay for a managed database, which is not going to be worth the price for small quantities of data.
If you have access to a database that is well managed on your behalf, I would definitely still go with that for many use cases.
So this fills that gap by giving you a database-as-a-service level of QoL without needing to provision a database-as-a-service backend. Otherwise you're dicking about maintaining a service with all that comes with that (provisioning, updating, etc.) when really all you need is a file that is automagically backed up or placed somewhere on the web to avoid the drawbacks of the local file system.
> If there are many client programs sending SQL to the same database over a network, then use a client/server database engine instead of SQLite.
I once was responsible for migrating a legacy business app to Azure, and the app had a local MSSQL server co-running with the app (the same pattern that Litestream is using).
As has been mentioned below, the app had been developed assuming local access (and thus <1ms latency), so it had a ton of N+1 queries everywhere.
This made it almost impossible to migrate/transition to another configuration.
So, if this style of app hosting doesn't take off and you're at all worried about this being a dead end storage once you reach a certain scale, I'd recommend not doing this, otherwise your options will be very limited.
Then again - I bet you could get very very far on a single box, so maybe it'd be a non factor! :)
It was a Rails app, therefore easy to get into the N+1 but also somewhat easy to fix.
def test_homepage_queries(django_assert_max_num_queries, client):
with django_assert_max_num_queries(10):
assert client.get("/").status_code == 200
Or django_assert_num_queries to assert an exact number.
Probably some of the most valuable code I've ever written on a per-LOC basis lol.
But anyhow, merging that into a new project was always a fun day. But on the other side of the cleanup the app stops falling down due to memory leaks.
It's usually a big problem for database performance because each query carries additional overhead for the network round trip to the database server.
SQLite queries are effectively a C function call accessing data on local disk so this is much less of an issue - there's an article about that in the SQLite docs here: https://www.sqlite.org/np1queryprob.html
A naive ORM setup will often end up doing 1 query to get the list of objects it needs, and then performing N queries, one per object, usually fetching each object individually by ID or key.
So for example, if you wanted to see “all TVs by Samsung” on a consumer site, it would do 1 query to figure out the set of items that match, and then if say 200 items matched, it would do 200 queries to get those individual items.
ORMs are better at avoiding it these days, depending on the ORM or language, but it still can happen.
It used to be a very common pitfall - and often not at all obvious. You’d grab a collection of objects from the ORM, process them in a loop, and everything looked fine because the objects were already rehydrated in memory.
Then later, someone would access a property on a child object inside that loop. What looked like a simple property access would silently trigger a database query. The kicker was that this could be far removed from any obvious database access, so the person causing the issue often had no idea they were generating dozens (or hundreds) of extra queries.
I really wish there was a way to compose SQL so you can actually write the dumb/obvious thing and it will run a single query. I talked with a dev once who seemed to have the beginnings of a system that could do this. It leveraged async and put composable query-ish objects into a queue, kept track of which callers needed which results, merged and executed the single query, and then returned the results. Obviously far from generalizable for arbitrary queries, but it did seem to work.
e.g. for ActiveRecord there's ar_lazy_preloader[0] or goldiloader[1] which fix many N+1s by keeping track of a context: you load a set of User in one go, and when you do user.posts it will do a single query for all, and when you then access post.likes it will load all likes for those and so on. Or, if you get the records some other way, you add them to a shared context and then it works.
Doesn't solve everything, but helps quite a bit.
stories = get_stories(query)
which results in a SQL query like SELECT id FROM stories WHERE author = ?
with the '?' being bound to some concrete value like "Jim". Then, the framework will be used to do something like this:
for id in stories {
story = get_story_by_id(id)
// do something with story
}
which results in N SQL queries with SELECT title, author, date, content FROM stories WHERE id = ?
and there's your N+1.
We ran into this while building, funnily enough, a database management app called DB Pro (https://dbpro.app). At first we were doing exactly that: query for all schemas, then for each schema query its tables, and then for each table query its columns. On a database with hundreds of tables it took ~3.8s.
We fixed it by flipping the approach: query all the schemas, then all the tables, then all the columns in one go, and join them in memory. That dropped the load time to ~180ms.
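Roughly, that means one query per level rather than one per parent row - something like this sketch against information_schema (simplified, and not necessarily the exact queries DB Pro runs):
select table_schema, table_name
from information_schema.tables;

select table_schema, table_name, column_name, data_type
from information_schema.columns;
-- then group the column rows by (table_schema, table_name) in application code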
N+1 is one of those things you only really “get” when you hit it in practice.
It's not its fault. :)
With single instances topping out at 20+ TB of RAM and hundreds of cores, I think this is likely very under-explored as an option
Even more if you combine this with cell-based architecture, splitting on users / tenants instead of splitting the service itself.
Configure the init flags to disable all controllers and other doodads, deploy them yourself with Helm. Helm sucks to work with but someone has already gone through the pain for you.
AI is GREAT at K8s since K8s has GREAT docs, which it has been trained on.
A good mental model helps: it's an API with a bunch of control loops.
Then you are off to the races. You can add more nodes etc. later to give it a try.
Use k9s (not a misspelling) and headlamp to observe your cluster if you need a gui.
If you need to deploy it elsewhere, you just install k3s/k8s or whatever and apply the yamls (except for stateful things like db).
It also handles name resolution with service names, restarts, etc.
It's amazing.
* Very flexible, but rigid deployments (can build anywhere, deploy from anywhere, and roll out deployments safely with zero downtime)
* Images don't randomly disappear (ran into this all the time with dokku and caprover)
* If something goes wrong, it heals itself as best it can
* Structured observability (i.e. logs, metrics, etc. are easy to capture, unify, and ship to places)
* Very easy to setup replicas to reduce load on services or have safe failovers
* Custom resource usage (I can give some pods more/less CPU/memory limits depending on scale and priority)
* Easy to self-host FOSS services (queues, dbs, observability, apps, etc.)
* Total flexibility when customizing ingress/routing. I can keep private services private and only expose public services
* Certbot can issue ssl certs instantly (always ran into issues with other self-hosting platforms)
* Tailscale Operator makes accessing services a breeze (can opt-in services one by one)
* Everything is yaml, so easy to manipulate
* Adding new services is a cake-walk - as easy as creating a new yaml file, building an image and pushing it. I'm no longer disincentivized to spin up a new codebase for something small but worthwhile, because it's easy to ship it.
All-in-all I spent many years trying "lightweight" deployment solutions (dokku, elastic beanstalk, caprover, coolify, etc.) that all came with the promise of "simple" but ended up being infinitely more of a headache to manage when things went wrong. Even something like heroku falls short because it's harder to just spin up "anything" like a stateful service or random FOSS application. Dokku was probably the best, but it always felt somewhat brittle. Caprover was okay. And coolify never got off the ground for me. Don't even get me started on elastic beanstalk.
I would say the biggest downside is that managing databases is less rigid than using something like RDS, but the flip side is that my DB is far more performant and far cheaper (I own the CPU cycles! no noisy neighbors.), and I still run daily backups to external object storage.
Once you get k8s running, it kind of just works. And when I want to do something funky or experimental (like splitting AI bots to separate pods), I can go ahead and do that with ease.
I run two separate k8s "clusters" (both single node) and I kind of love it. k9s (obs. tool) is amazing. I built my own logging platform because I hated all the other ones, might release that into its own product one day (email in my profile if you're interested).
I love Litestream and use it in every app I build now.
They advertise it as costing "pennies per day," but it's even less expensive than that. It obviously varies depending on how much storage you need, but I had a real app in production, and Litestream replication to S3 only cost me 2-3 cents ($0.02-$0.03) per month.[0]
[0] https://mtlynch.io/litestream/#using-logpaste-in-production
Such a system would also require a side channel propagating WAL updates (over Kafka or similar) to replicas, so that the running replicas can update themselves incrementally and stay fresh without loading anything from object storage.
https://litestream.io/guides/s3/
I think this is also roughly what Turso is, although it's becoming a SQLite-compatible db rather than vanilla
https://docs.turso.tech/features/embedded-replicas/introduct...
Turso looks cool and is a project I will keep an eye on, but its replica mode seems to be designed for apps rather than as a mechanism to rapidly scale out read replicas in a server cluster. Also, the web site warns that it's not ready for production use.
Litestream is working on that now - the code is already in https://github.com/benbjohnson/litestream/blob/v0.5.0/vfs.go but it's not yet a working, documented feature.
They already have a prototype, and... it's pretty rough around the edges.
I'm porting it to my Go SQLite driver and already ran into a bunch of issues. But it seems at least feasible to get it into a working shape.
https://github.com/benbjohnson/litestream/issues/772
https://github.com/ncruces/go-sqlite3/compare/main...litestr...
What exactly are you trying to port?
When I saw the v0.5.0 tag, I dived into just porting it. It's just over a couple hundred lines, and I have more experience with SQLite VFSes than most, so why not.
But it's still pretty shaky.
After some testing, I expect it to be working correctly in the vast majority of cases, but the performance may disappoint. The original still needs work.
https://developers.cloudflare.com/d1/best-practices/read-rep...
>But the market has spoken! Users prefer Litestream.
>Yeah, we didn’t like those movies much either.
The NATS Jetstream use case is something I'm curious about.
Cheers and keep up the great work on Litestream.
How does Litestream handle backing up through a spotty connection, and can we consolidate the backups into a central DB and query against it?
So far I’ve stuck with the SFTP solution, since I don’t use any of the cloud object storage services that are hardcoded into the tool.[^2]
Big thanks to the developers.
I would probably run both litestream and full backups, to get extra safety
The advantage of sqlite3_rsync is that, if on the other end is an SSH server with sqlite3_rsync, you only transfer the changed pages, and still get a “perfect” copy of the file on the other end.
The advantage of Litestream is that on the other end does not need to live a “server,” and still only changes are uploaded. If you do it continuously, you get many points to recover from, and many of those cross reference each other, saving storage space too. On the flip side, you need the tool to restore.
So I would treat sqlite3_rsync as more of a demo than a stable product right now.
Litestream provides near-real-time offsite replication and point in time recovery, which sqlite3_rsync won't do on its own. You could probably build a litestream-like product based on sqlite3_rsync but it probably won't be as fast or as efficient on storage.
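For reference, sqlite3_rsync usage is roughly this (host and paths are invented; the remote side needs sqlite3_rsync installed and reachable over SSH):
# copy only the changed pages of a live database to a remote replica
sqlite3_rsync /var/lib/myapp/app.db backup@backuphost:/backups/app.db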
I wish they'd put a bit more effort into the DX here, but it probably doesn't make much sense from a biz PoV since big customers aren't going to be running these kinds of workloads.
Curious if anybody here is deploying SQLite apps to production and what host they're using?
Only fuss I remember encountering was fighting with Rails migrating Solid Queue properly, but this seemed like a Rails issue and I don't remember it being a Fly issue.
I’ve been contemplating migrating my pg primary to SQLite too. Anyways don’t have much else to offer other than an anecdote that I’m happily using fly with partial SQLite.
What's the Fly.io issue here? Aren't the issues you're describing in Rails not Fly.io?
I run several Go apps in production on Fly.io[0, 1, 2] and I've never had an issue with the Fly.io + SQLite part of it.
SQLite + Litestream makes things slightly more complicated, but again, I've never had issues with Fly.io specifically making that harder. I use a custom launch script in my Dockerfile that starts litestream with the -exec flag to start my app as a child process.[3]
[0] https://github.com/mtlynch/logpaste
[1] https://github.com/mtlynch/picoshare
[2] https://github.com/mtlynch/screenjournal
[3] https://github.com/mtlynch/logpaste/blob/0.3.1/docker-entryp...
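For anyone curious without clicking through, the launch script mentioned above is roughly this shape (a sketch, not the exact script at [3]; the bucket and paths are placeholders):
#!/bin/sh
set -e
# restore the newest replica if there's no local database yet
litestream restore -if-db-not-exists -if-replica-exists -o /data/app.db s3://my-bucket/app.db
# run the app as a child process so replication lives as long as the app
exec litestream replicate -exec "/app/server" /data/app.db s3://my-bucket/app.db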
This is why I prefer to take backup stuff in a side container, e.g.: https://github.com/atrakic/gin-sqlite/blob/main/compose.yml
As a side note, you might consider revisiting your dockerfiles and skip litestream build steps, eg. in your final stage just add line like this:
COPY --from=litestream/litestream /usr/local/bin/litestream /bin/litestream
Where are you seeing a 31 MB binary? Latest releases are 10-11 MB.[0]
>This is why I prefer to take backup stuff in a side container, e.g.: https://github.com/atrakic/gin-sqlite/blob/main/compose.yml
Yeah, I agree that's cleaner, but once you're dealing with multiple containers, it's a big step up in complexity, and you can't do the simple install on places like Fly.io and Lightsail.
>As a side note, you might consider revisiting your dockerfiles and skip litestream build steps, eg. in your final stage just add line like this:
>COPY --from=litestream/litestream /usr/local/bin/litestream /bin/litestream
Ah, thanks! The litestream Docker image is new as of 0.5.0.
Their managed Postgres has gotten better, but it's still a little sparse, so after about 6 months using it I am going to just take my DB to either Supabase or PlanetScale.
It’s certainly not intuitive. It would be awesome if they sweat these details, but their deal is “here’s a bag of sharp knives”, which is good for some use cases.
Every time I deploy something, it spins up 2 instances that are in some suspended state. I have to restart them like 3 times before they actually boot? And why can I never just pick one instance when launching an app.
Apps will randomly go into a suspended state, without explaining why. Contacting support says they ran out of capacity, but didn't automatically start them back when capacity was available?! That's the whole point of Apps (vs Machines), you keep my app running...
Fly is set up to be the best compute provider, but there are too many reliability and ergonomics issues.
Please stop updating flyctl every time i go to deploy an app
> In Litestream, we’re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.
Reading this blog post though, I couldn't see any reference to this. Is this supported in Litestream v0.5.0, or will it be for a future release?
Will Litestream freak out about the database being replaced underneath it?
Will I still be able to restore old versions of the DB?
> But the market has spoken! Users prefer Litestream. And honestly, we get it: Litestream is easier to run and to reason about. So we’ve shifted our focus back to it.