What Does a Database for SSDs Look Like?

https://brooker.co.za/blog/2025/12/15/database-for-ssd.html
31•charleshn•2h ago

Comments

mrkeen•1h ago
> Design decisions like write-ahead logs, large page sizes, and buffering table writes in bulk were built around disks where I/O was SLOW, and where sequential I/O was order(s)-of-magnitude faster than random.

Overall speed is irrelevant; what matters is the relative speed difference between sequential and random access.

And since there's still a massive difference between sequential and random access with SSDs, I doubt the overall approach of using buffers needs to be reconsidered.

crazygringo•34m ago
Can you clarify? I thought a major benefit of SSDs is that there isn't any difference between sequential and random access. There's no physical head that needs to move.
b112•32m ago
Read up on IOPS, conjoined with requests for sequential reads.
crazygringo•28m ago
Very interesting, thank you. TIL.
threeducks•27m ago
Let's take the Samsung 9100 Pro M.2 as an example. It has a sequential read rate of ~6700 MB/s and a 4k random read rate of ~80 MB/s:

https://i.imgur.com/t5scCa3.png

https://ssd.userbenchmark.com/ (click on the orange double arrow to view additional columns)

That is a latency of about 50 µs for a random read, compared to 4-5 ms latency for HDDs.
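
The ~50 µs figure can be checked with a rough calculation. This is only a sketch: it assumes 4 KiB requests, decimal megabytes, and a single request in flight at a time (queue depth 1), none of which the benchmark page is confirmed to use. The function names are made up for illustration.

```python
# Back-of-the-envelope: per-request latency implied by a 4 KiB random-read
# throughput figure, assuming requests are served strictly one at a time.

BLOCK_BYTES = 4096  # assumed request size (4 KiB)


def implied_latency_us(throughput_mb_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Microseconds per request at queue depth 1 (assumption)."""
    bytes_per_second = throughput_mb_s * 1_000_000
    return block_bytes / bytes_per_second * 1_000_000


def implied_iops(throughput_mb_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Requests per second implied by the same throughput figure."""
    return throughput_mb_s * 1_000_000 / block_bytes


# Figures quoted in the comment above: ~6700 MB/s sequential, ~80 MB/s 4k random.
random_latency = implied_latency_us(80)
random_iops = implied_iops(80)
seq_to_random_ratio = 6700 / 80

print(f"{random_latency:.1f} us per 4 KiB random read")
print(f"{random_iops:.0f} IOPS at queue depth 1")
print(f"sequential throughput is ~{seq_to_random_ratio:.0f}x the random throughput")
```

With deeper queues the drive can overlap many random reads, so the throughput gap narrows even though the per-request latency stays roughly the same.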

yyyk•20m ago
SSD controllers and VFSs are often optimized for sequential access (e.g. readahead cache), which leads to software being written to do sequential access for speed, which leads to further optimization for that access pattern, and so on.
PunchyHamster•14m ago
SSD block size is far bigger than 4 kB. They still benefit from sequential writes.
londons_explore•1h ago
Median database workloads are probably doing writes of just a few bytes per transaction, e.g. 'set last_login_time = now() where userid=12345'.

Due to the interface between SSD and host OS being block based, you are forced to write a full 4k page. Which means you really still benefit from a write ahead log to batch together all those changes, at least up to page size, if not larger.
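
To put rough numbers on that, here is a small sketch. It assumes a 4096-byte minimum write unit, 16-byte updates that each land on a different page, and no per-record log overhead. These are all illustrative assumptions, not measurements.

```python
# Compare pages written when each tiny update is written in place on its own
# page versus when the updates are packed together into a shared log.

PAGE_BYTES = 4096  # assumed minimum write unit exposed to the host


def pages_written(update_sizes: list[int], batch: bool) -> int:
    """Count PAGE_BYTES-sized writes needed for the given updates."""
    if batch:
        # Pack all updates back to back, then round up to whole pages.
        return -(-sum(update_sizes) // PAGE_BYTES)
    # Each update dirties (at least) one page of its own.
    return sum(-(-size // PAGE_BYTES) for size in update_sizes)


updates = [16] * 100  # 100 transactions, 16 bytes changed in each

unbatched = pages_written(updates, batch=False)
batched = pages_written(updates, batch=True)
amplification = unbatched * PAGE_BYTES / sum(updates)

print(f"unbatched: {unbatched} page writes")
print(f"batched:   {batched} page writes")
print(f"write amplification without batching: {amplification:.0f}x")
```

The batched number only applies if several transactions share one log flush (group commit). If every commit is flushed on its own, each one still costs at least one page write.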

esperent•1h ago
Don't some SSDs have a 512-byte page size?
zokier•1h ago
They might present 512-byte blocks to the host, but internally the SSD almost certainly manages data in larger pages.
cm2187•1h ago
And the filesystem will also likely be 4k block size.
digikata•54m ago
I would guess by now none have that internally. As a rule of thumb every major flash density increase (SLC, TLC, QLC) also tended to double internal page size. There were also internal transfer performance reasons for large sizes. Low level 16k-64k flash "pages" are common, and sometimes with even larger stripes of pages due to the internal firmware sw/hw design.
Sesse__•23m ago
Also due to error correction issues. Flash is notoriously unreliable, so you get bit errors _all the time_ (correcting errors is absolutely routine). And you can make more efficient error-correcting codes if you are using larger blocks. This is why HDDs went from 512 to 4096 byte blocks as well.
Sesse__•25m ago
A write-ahead log isn't a performance tool to batch changes, it's a tool to get durability of random writes. You write your intended changes to the log, fsync it (which means you get a 4k write), then make the actual changes on disk just as if you didn't have a WAL.

If you want to get some sort of sub-block batching, you need a structure that isn't random in the first place, for instance an LSM (where you write all of your changes sequentially to a log and then do compaction later)—and then solve your durability in some other way.
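
For anyone who wants the log-then-apply ordering spelled out, here is a minimal sketch. `TinyWAL` is a made-up name, the "database" is just an in-memory dict rebuilt from the log on startup, and the JSON-lines record format is an assumption for illustration. A real engine would also write the changes to its data files and checkpoint the log.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key-value store: log the change, fsync, then apply it."""

    def __init__(self, directory: str) -> None:
        self.log_path = os.path.join(directory, "wal.log")
        self.table: dict = {}
        self._replay()
        self.log = open(self.log_path, "ab")

    def _replay(self) -> None:
        """Rebuild state from the log, dropping a torn record at the tail."""
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # incomplete final record from a crash mid-write
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # corrupt record: stop replaying here
                self.table[record["key"]] = record["value"]
                good_bytes += len(raw)
        with open(self.log_path, "r+b") as f:
            f.truncate(good_bytes)  # discard anything after the last good record

    def put(self, key: str, value) -> None:
        line = (json.dumps({"key": key, "value": value}) + "\n").encode()
        self.log.write(line)
        self.log.flush()
        os.fsync(self.log.fileno())  # durability point: the change is now on disk
        self.table[key] = value      # only now apply the change itself

    def close(self) -> None:
        self.log.close()


with tempfile.TemporaryDirectory() as d:
    db = TinyWAL(d)
    db.put("last_login:12345", "2025-12-19T10:00:00Z")
    db.close()

    recovered = TinyWAL(d)  # stands in for a restart after a crash
    recovered_value = recovered.table["last_login:12345"]
    recovered.close()

print(recovered_value)
```

Note that every `put` pays for its own fsync here, which is the point made above: the log buys durability, and any batching has to be added on top.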

throw0101a•5m ago
> A write-ahead log isn't a performance tool to batch changes, it's a tool to get durability of random writes.

Why not both?

danielfalbo•1h ago
Reminds me of: Databases on SSDs, Initial Ideas on Tuning (2010) [1]

[1] https://www.dr-josiah.com/2010/08/databases-on-ssds-initial-...

zokier•1h ago
Author could have started by surveying the current state of the art instead of just falsely assuming that DB devs have been resting on their laurels for the past decades. If you want to see a (relational) DB for SSDs, just check out stuff like myrocks on zenfs+; it's pretty impressive stuff.
raggi•57m ago
It may not matter for clouds with massive margins but there are substantial opportunities for optimizing wear.
ljosifov•46m ago
Not for SSD specifically, but I assume the compact design doesn't hurt: duckdb saved my sanity recently. Single file, columnar, with built-in compression I presume (given that in a columnar layout even the simplest compression may be very effective), and with $ duckdb -ui /path/to/data/base.duckdb opening a notebook in the browser. Didn't find a single thing to dislike about duckdb - as a single user. To top it off - afaik it can be zero-copy 'overlaid' on top of a bunch of parquet binary files to provide sql over them?? (didn't try it; would be amazing if it works well)
dist1ll•10m ago
Is there more detail on the design of the distributed multi-AZ journal? That feels like the meat of the architecture.
PunchyHamster•10m ago
> WALs, and related low-level logging details, are critical for database systems that care deeply about durability on a single system. But the modern database isn’t like that: it doesn’t depend on commit-to-disk on a single system for its durability story. Commit-to-disk on a single system is both unnecessary (because we can replicate across storage on multiple systems) and inadequate (because we don’t want to lose writes even if a single system fails).

And then a bug crashes your database cluster all at once and now instead of missing seconds, you miss minutes, because some smartass thought "surely if I send request to 5 nodes some of that will land on disk in reasonably near future?".

I love how this industry invents best practices that are actually good then people just invent badly researched reasons to just... not do them.

dist1ll•7m ago
> "surely if I send request to 5 nodes some of that will land on disk in reasonably near future?"

That would be asynchronous replication. But IIUC the author is instead advocating for a distributed log with synchronous quorum writes.
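
A toy illustration of the difference, under heavy assumptions: `Replica` and `quorum_commit` are hypothetical names, appending to a list stands in for a durable write plus fsync on that node, and there is no networking, retry, or recovery logic. It is not the article's design, just the basic acknowledgement rule.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Stand-in for one log node; `log` plays the role of its durable storage."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append_durably(self, record: str) -> bool:
        """Return True only once the record is (pretend) on stable storage."""
        if not self.up:
            return False
        self.log.append(record)  # stands in for write + fsync on this node
        return True


def quorum_commit(replicas: list[Replica], record: str) -> bool:
    """Acknowledge the write only if a majority of replicas stored it."""
    quorum = len(replicas) // 2 + 1
    acks = sum(1 for r in replicas if r.append_durably(record))
    return acks >= quorum


nodes = [Replica(f"az{i}") for i in range(5)]

nodes[0].up = False
nodes[1].up = False
two_down = quorum_commit(nodes, "txn-1")    # 3 of 5 acked

nodes[2].up = False
three_down = quorum_commit(nodes, "txn-2")  # only 2 of 5 acked

print(two_down, three_down)
```

With asynchronous replication the client would get its acknowledgement before any of those appends are known to have succeeded. Here the caller is told the commit failed when a majority is unreachable. A real system also has to deal with the copies of "txn-2" left behind on the two live nodes.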
