I added this build step before `scp`ing the binary to the server:
docker run --rm --platform=linux/amd64 \
-v "$PWD":/app -w /app \
golang:1.22-bullseye \
/bin/bash -c "apt update && apt install -y gcc sqlite3 libsqlite3-dev && \
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o app-linux-amd64 ./cmd/main.go"
Looking at the article, I should give the modernc sqlite driver a try.

SQLite over stdin, to a subprocess, and it's fast!
That's the kind of result that makes me wonder if there is something odd with the benchmarking.
I wonder if the following things make the C-driven version slower (see the sketch after this list)...
- prepare the send buffers (sqlite side)
- prepare the receive buffers (go side)
- do the call
- get the received data into go buffers of some kind
- free up the send buffers (happens automatically)
- free up the receive buffers (semi-automatically, in Go).
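Concretely, the "get the received data into Go buffers" step in a cgo driver usually means a copy from SQLite's C buffer into a fresh Go allocation. A rough sketch, assuming the SQLite C headers are installed; this is not any particular driver's code:

```go
package cgosketch

/*
#cgo LDFLAGS: -lsqlite3
#include <sqlite3.h>
*/
import "C"

// readBlob sketches the per-value work a cgo driver does on read:
// SQLite owns the column bytes, so they are copied into Go memory
// (C.GoBytes allocates a new slice) before the caller may keep them.
func readBlob(stmt *C.sqlite3_stmt, col int) []byte {
	p := C.sqlite3_column_blob(stmt, C.int(col))  // pointer into SQLite's own buffer
	n := C.sqlite3_column_bytes(stmt, C.int(col)) // length of that buffer
	return C.GoBytes(p, n)                        // copy into a fresh Go allocation
}
```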
When using stdin/stdout, the system looks after the send/receive buffers. It's simply reading and writing them; no allocation is needed. The stream can be as big or as small as wanted/needed. The OS looks after the integrity of the streams, and these are probably fairly well-tested subsystems on most operating systems.
stdin/stdout becomes a "library" for "fast data transfer".
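A minimal sketch of the idea in Go; the helper binary name and the line-oriented request/response are made up, this is not sqinn's actual wire protocol:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Start the helper process; "sqlite-helper" is a made-up name standing in
	// for whatever binary speaks the protocol over stdin/stdout.
	cmd := exec.Command("sqlite-helper")
	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Fatal(err)
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Write a request and read a response; the OS pipe does the buffering,
	// so neither side has to allocate or free transfer buffers itself.
	fmt.Fprintln(stdin, "SELECT 1;")
	reply, err := bufio.NewReader(stdout).ReadString('\n')
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(reply)

	stdin.Close()
	cmd.Wait()
}
```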
Pretty neat.
Still, I can see a few downsides. Though sqinn-go is pure Go, the forked process is pure C, so you'll need to either download a prebuilt binary (Linux and Windows only atm) or build it yourself. This rather defeats Go's killer feature of "single-binary distribution".
Still, I agree it's wild that it's so fast.
Perhaps I misunderstand though.
I quickly ruled out using database/sql drivers, as the indirection through interface types added a bunch of overhead and stymied my attempts at a reasonable memory layout. For my use-case, I found the crawshaw driver performed the best, but I ended up forking it, as well as the Golang standard library CSV parser, because defensive copying & allocation was the largest bottleneck.

I ended up cycling several very large arenas among a CSV-parser thread that filled the arena with column bytes and several threads writing to different temporary sqlite databases. Then at the end I ATTACHed them together and copied them into one big file (idk exactly why this is faster, but my profiles showed most CPU time spent in sqlite doing query-binding things, so MOAR CORES).
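The merge step at the end might look roughly like this (a sketch using the crawshaw driver; the table name and paths are made up, not the actual code):

```go
package merge

import (
	"fmt"

	"crawshaw.io/sqlite"
	"crawshaw.io/sqlite/sqlitex"
)

// mergeShards copies the rows each worker wrote into its own temporary
// database into one big destination file via ATTACH. It assumes dest
// already contains an empty "rows" table with the same schema.
func mergeShards(dest string, shards []string) error {
	conn, err := sqlite.OpenConn(dest)
	if err != nil {
		return err
	}
	defer conn.Close()

	for i, shard := range shards {
		// Paths are interpolated for brevity; real code should validate them.
		script := fmt.Sprintf(`
			ATTACH DATABASE '%s' AS shard%d;
			INSERT INTO rows SELECT * FROM shard%d.rows;
			DETACH DATABASE shard%d;
		`, shard, i, i, i)
		if err := sqlitex.ExecScript(conn, script); err != nil {
			return err
		}
	}
	return nil
}
```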
One notable optimization was exposing a way to bind borrowed bytes to query parameters without inducing a copy in either the Golang caller code or the SQLite library code. The crawshaw driver upstream only exposes sqlite3_bind_blob with SQLITE_TRANSIENT mode, which tells SQLite to copy the input to a private allocation before returning from the sqlite3_bind* call. I added a version that passes SQLITE_STATIC, which means "trust me, I won't touch these bytes until the query is done, and I'll free them afterwards". This is safe in Rust, whose "borrow" and "lifetime" concepts model this perfectly, but I guess in Golang it's dicey enough to not expose in your public package.
Here's the relevant commit in my fork: https://github.com/crawshaw/sqlite/commit/82ad4f03528e8fdc6a...
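Mechanically, the difference is just which destructor constant goes into sqlite3_bind_blob. A rough cgo sketch of the two paths (not the linked commit's actual code; the C shims exist because SQLITE_STATIC and SQLITE_TRANSIENT are cast macros that cgo can't use directly):

```go
package bindsketch

/*
#cgo LDFLAGS: -lsqlite3
#include <sqlite3.h>

static int bind_blob_transient(sqlite3_stmt *stmt, int i, const void *p, int n) {
	return sqlite3_bind_blob(stmt, i, p, n, SQLITE_TRANSIENT);
}
static int bind_blob_static(sqlite3_stmt *stmt, int i, const void *p, int n) {
	return sqlite3_bind_blob(stmt, i, p, n, SQLITE_STATIC);
}
*/
import "C"
import "unsafe"

// BindCopy is the safe path: SQLITE_TRANSIENT makes SQLite copy b before the
// call returns, so the Go slice can be reused or collected immediately.
func BindCopy(stmt *C.sqlite3_stmt, i int, b []byte) C.int {
	return C.bind_blob_transient(stmt, C.int(i), unsafe.Pointer(&b[0]), C.int(len(b)))
}

// BindBorrowed is the zero-copy path: SQLITE_STATIC tells SQLite to keep the
// pointer. The caller must keep b alive and unchanged until the statement is
// reset or finalized, and because SQLite retains the pointer past the call,
// this also needs care with cgo's pointer-passing rules -- which is exactly
// why it's dicey to expose in a public Go package.
func BindBorrowed(stmt *C.sqlite3_stmt, i int, b []byte) C.int {
	return C.bind_blob_static(stmt, C.int(i), unsafe.Pointer(&b[0]), C.int(len(b)))
}
```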
I'm curious how OP's https://github.com/cvilsmeier/sqinn would fare, I'm somewhat sus about copying 200GB to stdin but the benchmark results are pretty good so ¯\_(ツ)_/¯
I see what you mean; there are some categories there (cgo-based or not) that lend themselves to quick understanding via a table.
I feel like SQLite is undervalued. I do agree that in particular cases it might not be the best choice, but more often than not I find that SQLite is more than enough database. Using Postgres or MySQL just for the sake of being "production grade" is never a good idea; SQLite is also production grade. Looking at the statistics (see sqinn), I would say that 90% of the internet could use SQLite without any issues, and with only benefits.
How do you overcome this with SQLite and Django?
It's on by default in many sqlite drivers because it really is the best default. But it isn't on by default in upstream sqlite even though it's been out for ages now.
What's there to deal with? You turn it on with a pragma and forget about it.
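For reference, assuming the setting being discussed is WAL mode, turning it on from Go can look like this (a sketch using the modernc.org/sqlite driver mentioned upthread):

```go
package main

import (
	"database/sql"
	"log"

	_ "modernc.org/sqlite" // registers the "sqlite" driver
)

func main() {
	db, err := sql.Open("sqlite", "app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One-time switch; the journal mode is persistent, so every later
	// connection to app.db opens in WAL mode automatically.
	if _, err := db.Exec(`PRAGMA journal_mode=WAL;`); err != nil {
		log.Fatal(err)
	}
}
```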
sqlite3 source_database.db ".backup backup_database.db"
Now the WAL content is rolled into your new backup file. Stick a timestamp in the backup file name and run this as a cron job every N minutes and you have all the recovery you need. Another one-liner to sync to S3 and you're all set.

Edit: And just to clarify, that command can be run on a live DB as it's being used by your app server. SQLite handles external concurrent readers just fine.
It also looks like sqinn is the clear leader for most, but not all, of the benchmarks.
Even though it's "not scientific", it's still very useful as a baseline - thanks for taking the effort and publishing your results!
Also took a look at monibot.io, looks cool.