
Build your own database

https://www.nan.fyi/database
547•nansdotio•3mo ago
Content source: Designing Data-Intensive Applications https://www.oreilly.com/library/view/designing-data-intensiv...

Comments

4ndrewl•3mo ago
> Databases were made to solve one problem:

>

> "How do we store data persistently and then efficiently look it up later?"

Isn't that two problems?

dayjaby•3mo ago
"Store data persistently so it can be looked up efficiently" sounds like a single problem.
SirFatty•3mo ago
Definitely two.
cjbgkagh•3mo ago
It’s not persistent if it can’t be recovered later
stvltvs•3mo ago
Puts message in a bottle and tosses into the most convenient black hole.
BetaDeltaAlpha•3mo ago
Doesn't the black hole compress the bottle beyond recovery?
stvltvs•3mo ago
Not necessarily, opinions vary.

https://www.sciencenewstoday.org/do-black-holes-destroy-or-s...

SahAssar•3mo ago
"Store data persistently" implies "it can be looked up" since if you cannot look it up it is impossible to know if it is stored persistently.

The "efficiently" part can be considered a separate problem though.

prerok•3mo ago
Well, if you just want to store data, you can use files. Lookup is a bit tedious and inefficient.

So, if we consider that persistent storage is a solved problem, then we can say that the reason for databases was how to look up data efficiently. In fact, that is why they were invented, even if persistent storage is a prerequisite.

nonethewiser•3mo ago
How about "store data in a certain way"? That sounds more like 1 problem and encompasses an even larger problem space.
grokgrok•3mo ago
How do we reconstruct past memory states? That's the fundamental problem.

Efficiency of storage or retrieval, reliability against loss or corruption, security against unwanted disclosure or modification are all common concerns, and the relative values assigned to these features and others motivate database design.

kiitos•3mo ago
> How do we reconstruct past memory states? That's the fundamental problem.

reconstructing past memory states is rarely, if ever, a requirement that needs to be accommodated in the database layer

nonethewiser•3mo ago
Can you elaborate? That certainly seems to be what happens in a typical crud app. You have some model for your data which you persist so that it can be loaded later. Perhaps partially at times.

In another context, perhaps you're ingesting data to be used in analytics, which seems to fit "reconstruct past memory states" less.

grokgrok•3mo ago
Presumably the analysis will retrieve stored memory states from the ingestion phase to then perform useful calculation, or else why is there a database?
i_k_k•3mo ago
I always wanted to ship a write-only database. Lightning fast.
elygre•3mo ago
Back in the 80s a professor at our college got a presentation on the concept of «write-only memory» accepted for some symposium.

Good times.

thomasjudge•3mo ago
Very secure!
pcdevils•3mo ago
Pretty much how eventstoredb works. Deleting data fully only happens at scavenge which rewrites the data files.
hxtk•3mo ago
I think it was a joke. It sounds like you read it as append-only, like most LSM tree databases (not rewriting files in the course of write operations), but I think GP meant it as write-only to the exclusion of reads, roughly equivalent to `echo $data > /dev/null`
datadrivenangel•3mo ago
I've forgotten how to count that low. [0]

0 - https://www.youtube.com/watch?v=3t6L-FlfeaI

archerx•3mo ago
That would be useful for logging.
warkdarrior•3mo ago
If it's write-only, and no reads ever happen, one can write to /dev/null without loss of utility.
mewpmewp2•3mo ago
It would be good for before going to sleep then.
Etheryte•3mo ago
Also useful for backups, so long as you don't need to restore.
pratik661•3mo ago
This is analogous to an elevator that’s unidirectional
rzzzt•3mo ago
One that lets people enter. We will figure out exiting later, with exiting on a different floor as a stretch goal.
theideaofcoffee•3mo ago
Or just a paternoster
nonethewiser•3mo ago
It's amusing to me that this is really quite a pedantic observation, yet it's driving very earnest engagement from Hacker News. Myself included. Absolutely nothing in this article rides on whether it's 1 or 2 problems; it's an aside at best. Yet I'm still trying to think through whether it's 1 or 2. I mean, the "and" is right there, which clearly suggests two. It's almost comical even, to say "Here is one problem: X and Y." Yet in another way it seems like 2 sides of the same coin.

I guess there is a rather fine line between philosophy and pedantry.

Maybe we can think about it from another angle. If they are 2 problems databases were designed to solve, then that means this is a problem databases were designed to solve: storing data persistently.

Is that really a problem databases were designed to solve? Not really. We had that long before databases. It was already solved. It's a pretty fundamental computer operation. Isn't it fair to say this is one thing? "Storing data so it can be retrieved efficiently."

gingersnap•3mo ago
You're thinking of regex
mrighele•3mo ago
It is a single problem that contains two smaller problems, but the actual hard part (a third problem, if you wish) is putting them together. If you limit yourself to solve those two problems independently you won't have a (useful) database.
didip•3mo ago
Off by 1 error is indeed a hard problem.
whartung•3mo ago
> Isn't that two problems?

No, that would be regexes.

mamcx•3mo ago
You can decompose it into 2 problems because, well, that's easier, but it is in fact one. It can be argued that there is only this single problem:

How do you store data, in an ACID way, so that it can be looked up efficiently later by an unknown number of clients with unknown access patterns, concurrently, without blocking all the participants, and fast?

And then add SQL (ouch!)

lelanthran•3mo ago
>> "How do we store data persistently and then efficiently look it up later?"

> Isn't that two problems?

Only if you're creating a write-only database, in which case just write it to /dev/null.

cube2222•3mo ago
I clicked through a couple of the articles in the OP, and I must say, the design and animations are extremely pretty!

Kudos for that!

liqilin1567•3mo ago
The framework OP uses for his blog: https://github.com/nandanmen/NotANumber
235ylkj•3mo ago
Here's a simple key-value store inspired by D.B. Cooper:

  ~/bin/cooper-db-set
  ===================
  #! /bin/bash

  key="$1"
  value="$2"

  echo "${key}:${value}" >> /dev/null


  ~/bin/cooper-db-get
  ===================
  #! /bin/bash

  key="$1"
  </dev/null awk -F: -v key="$key" '$1 == key {result = $2} END {print result}'
MathMonkeyMan•3mo ago
/dev/null is persistent across restarts and cache friendly, so it's got you covered.
skeptrune•3mo ago
I love the design and examples in this post. Easy to read for sure.

Exercises like this also seem fun in general. It's a real test of how much you know to start anything from scratch.

kevinqi•3mo ago
my only minor critique is using lorem ipsum examples. It tends to make me want to gloss over instead of reading; I prefer seeing realistic data. other than that, it's a really cool post
WD-42•3mo ago
Was going to post the same thing. Lorem Ipsum makes the data too hard to distinguish. I get that due to the dynamic nature of the examples the text needed to be generated, but Latin isn't the best choice IMO.

Otherwise great article, thank you!

doublerabbit•3mo ago
It's the same for me when foo and bar are used as examples.
samwho•3mo ago
Someone gave me the advice to use animals, ideally animals of very different sizes or colours. People instantly picture them and remember them.
kragen•3mo ago
You may be interested in http://canonical.org/~kragen/sw/dev3/exampledb.py -sql -n 2 example.sql:

    begin;
    insert into cust (id, name, company, streetaddress, city, state, zip) values (1, 'Jacqueline Gagnon', 'Baker Group', '218 Miller Dr.', 'Riverside', 'KS', '51859');
    commit;
    begin;
    insert into cust (id, name, company, streetaddress, city, state, zip) values (2, 'Wayne Bennett', 'FF Petroleum LLC', '4375 Moore Dr.', 'Mount Vernon', 'MS', '98270');
    select setval('cust_id_seq', 2);
    commit;
    begin;
    insert into product (id, name, unitprice) values (1, 'Biological blue steel doll', 30.4);
    commit;
    begin;
    insert into product (id, name, unitprice) values (2, 'Gray cotton electronic boxers, size L', 13.3);
    insert into product (id, name, unitprice) values (3, 'Blue cotton intimate blazer, ages 2–5', 37.3);
    insert into product (id, name, unitprice) values (4, 'Daily beige steel car', 14.6);
    insert into product (id, name, unitprice) values (5, 'Black spandex daily blazer, size L', 24.1);
    insert into product (id, name, unitprice) values (6, 'Blue wool dynamic briefs, ages 3–10', 79.0);
    insert into product (id, name, unitprice) values (7, 'Blue spandex ultrasonic dress, child’s size', 31.9);
    insert into product (id, name, unitprice) values (8, 'Gold wool daily boxers, ages 3–10', 8.85);
    insert into product (id, name, unitprice) values (9, 'Red cotton utility boxers, ages 2–5', 28.9);
    insert into product (id, name, unitprice) values (10, 'Gray polyester ultrasonic briefs, ages 3–10', 15.3);
    -- ...
It also creates the tables, including invoice and lineitem tables. It's still a bit of a dull accounting example, rather than something like food, superheroes, social networks, zoo animals, sports, or dating, but I think the randomness does add a little bit of humor.

Although now we have LLMs, and maybe they'd do a better job.

ashleyn•3mo ago
I was tempted to knee-jerk dismiss this as "don't write your own database, don't even use a KV database, just use SQL". And then I remembered the only reason I'd say this is because I went through designing my own DB or using KV databases just to avoid SQL...only to realise I was badly reinventing SQL. It could be worth the lesson.
shellfishgene•3mo ago
It's very nice, but I think he should expand on why hash tables have fast, constant-time lookups. It's a central concept to why the index makes the db fast.
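A minimal sketch of the idea, assuming a single-process, append-only log of `key,value` lines (the `HashIndexedLog` class is hypothetical): an in-memory dict maps each key to the byte offset of its most recent record. A dict lookup hashes the key and jumps to its bucket, so it takes constant time on average no matter how many keys there are, and the read then needs one seek instead of a scan of the whole file.

```python
import os
import tempfile
from typing import Dict, Optional

class HashIndexedLog:
    """Hypothetical sketch: append-only log file plus an in-memory key -> byte offset index."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        open(path, "ab").close()  # create the file if it doesn't exist

    def set(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # where this record will start
            f.write(record)
        self.index[key] = offset  # a newer offset replaces the older one

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)  # O(1) on average: hash the key, jump to its bucket
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # go straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.partition(",")[2]

db = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "data.log"))
db.set("cat", "small")
db.set("elephant", "large")
db.set("cat", "medium")
print(db.get("cat"))  # medium
```

The catch is that every key has to fit in memory, and the dict must be rebuilt by rescanning the log after a restart.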
codezero•3mo ago
If you want to store key values in a file, one fun and ridiculous technique is to use xattrs :)

I made a pointless program to help w/ this on macOS for kicks: https://github.com/radiofreejohn/xattrkv

When I made it I also found a bug in the xattr implementation for Darwin and submitted it to Apple, they eventually fixed it.

FpUser•3mo ago
> Problem: "How do we store data persistently and then efficiently look it up later?"

I would say without transactions it is not a database yet from a practical standpoint.
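To make the point concrete, a minimal sketch using Python's built-in `sqlite3` module (the `accounts` table and `transfer` function are hypothetical): the two updates of a transfer either both take effect or neither does, which a bare key-value file does not guarantee by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Hypothetical helper: move `amount` from src to dst atomically."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back if it raises; it does not close the connection.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The second UPDATE violated the CHECK; the rollback also undid the first UPDATE.
        return False

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 30)]
```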

dangoodmanUT•3mo ago
I think a lot of databases would disagree
FpUser•3mo ago
You might be on to something here ;)
alecco•3mo ago
But they are web scale!
socketcluster•3mo ago
You can implement two-phase commit instead. It requires a bit of additional planning in terms of data management but I actually find it much more elegant and it scales better. DB transactions are expensive and unnecessarily complicated.

You can have a really simple two-phase commit system where you initially mark all records as 'pending' and then update them as 'settled' once all the necessary associated rows have been inserted into their respective tables. You can record a timestamp so that you know where to resume the settlement process from. I once had multiple processes doing settlement in parallel by hashing the ids to map to specific processes so it scales really well.
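A minimal sketch of the pending/settled pattern described above, using `sqlite3` in autocommit mode so there is no enclosing multi-statement transaction (the `orders`/`line_items` tables, the `expected_items` column, and `settle` are hypothetical; the resume-from-timestamp part is left out). Each statement is still atomic on its own; the pattern replaces one large transaction with small steps, and readers only trust rows marked `settled`. A simple modulo stands in for hashing ids to workers.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement commits on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, expected_items INTEGER NOT NULL,
                         status TEXT NOT NULL DEFAULT 'pending');
    CREATE TABLE line_items (order_id INTEGER NOT NULL, sku TEXT NOT NULL);
""")

def settle(conn: sqlite3.Connection, worker: int, n_workers: int) -> None:
    """Hypothetical settlement pass: flip pending orders whose line items are all present.
    `id % n_workers` stands in for hashing ids, so parallel workers never touch the same row."""
    conn.execute(
        """UPDATE orders SET status = 'settled'
           WHERE status = 'pending' AND id % ? = ?
             AND expected_items = (SELECT COUNT(*) FROM line_items
                                   WHERE line_items.order_id = orders.id)""",
        (n_workers, worker),
    )

# Step 1: insert parent rows as 'pending' (the column default), then their children.
conn.execute("INSERT INTO orders (id, expected_items) VALUES (1, 2)")
conn.execute("INSERT INTO line_items VALUES (1, 'cat-food')")  # order 1 is still missing an item
conn.execute("INSERT INTO orders (id, expected_items) VALUES (2, 1)")
conn.execute("INSERT INTO line_items VALUES (2, 'dog-bed')")

# Step 2: settle, here with two workers run one after the other.
for worker in range(2):
    settle(conn, worker, 2)

# Readers only trust settled rows.
print(conn.execute("SELECT id FROM orders WHERE status = 'settled'").fetchall())  # [(2,)]
```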

FpUser•3mo ago
>"You can implement two-phase commit instead"

Two-phase commit is a particular way of implementing transactions when the system is distributed. There is no "instead" here.

socketcluster•3mo ago
The system doesn't have to be distributed. In general, it just needs to separate the insertion of records from their settlement.
myth_drannon•3mo ago
I also recommend this free online book to build a database https://build-your-own.org/database/
bionsystem•3mo ago
I remember an article here, maybe a year ago, where somebody showed some database concepts with bash examples (like "write your db in bash"), but I can't find it anywhere. Does anybody have it?
pandaec•3mo ago
https://tontinton.com/posts/database-fundementals/
DiabloD3•3mo ago
It looks like it got hugged to death already.
winrid•3mo ago
Needs a faster database
keybored•3mo ago
Part of the reason why I'm not a "maker" is because my mind gets ahead of me with all the things that I would need to do in order to do things properly. So the article starts out interesting and then gets more and more, well, not exactly stressful but I get a bit weary by it.

Not that I would aspire to implement a general-purpose database. But even smaller tasks can make my mind spin too much.

browningstreet•3mo ago
I don't disagree with your take in general, but I do think it's different reading about minutiae than being invested in it. If you actually are curing these requirements it's probably quite engaging. If not, the eyes and mind start to gloss over them.

As a different example: I'm moving this week. I've known I'm moving for a while. Thinking about moving -- and all the little things I have to do -- is way more painful than doing them. Thinking about them keeps me up at night, getting through my list today is only fractionally painful.

I'm also leveling up a few aspects of my "way of living" in the midst of all this, and it'd be terribly boring to tell others about it, but when next Monday comes.. it'll be quite sweet indeed.

keybored•3mo ago
> As a different example: I'm moving this week. I've known I'm moving for a while. Thinking about moving -- and all the little things I have to do -- is way more painful than doing them. Thinking about them keeps me up at night, getting through my list today is only fractionally painful.

this sounds familiar... :)

nawgz•3mo ago
Have you considered if you have ADHD?
keybored•3mo ago
I have thought about it.
ozgrakkurt•3mo ago
I have this same issue but lately I am realizing it is about belief and made great progress fixing it.

For me it is all about believing that I’ll succeed and realizing that the belief doesn’t really correlate with technical aspect as much as I think it does.

If I believe I won’t succeed, I spend every moment trying to find the problem that will finally end me. And every problem becomes a death sentence.

If I believe I’ll succeed, problems become temporary obstacles and all my focus is on how I’ll overcome the current obstacle.

keybored•3mo ago
Thanks. That’s helpful.
chrisallick•3mo ago
if author is reading, can you add an rss feed to your site? i want to add to feedly.
jamwil•3mo ago
I was quite excited to add this one! And shocked to not find it, given the overall high production quality.
jayfair•3mo ago
I've found Kill The Newsletter works pretty well for the few things I want to follow that still insist on email delivery. https://kill-the-newsletter.com/
chrisallick•3mo ago
omg genius lol. that's such a simple and smart utility. i might try and make a chrome extension for this.
constantcrying•3mo ago
I absolutely love this "first principles" approach of explaining a topic. You can really go through this and at each time understand what problem needs to be solved and what other problems this introduces, until you get at a reasonably satisfying solution.
exdeejay_•3mo ago
The first example in the "Sorting in Practice" section appears to be broken. The text makes it seem like the list should be sorted in-memory and then written to disk sorted, but the example un-sorts the list when it's written to disk.

Edit: the flush example (2nd one) in the recap section does the same thing, when the text says that the records are supposed to be written to the file in sorted order.
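For reference, a minimal sketch of the behaviour the text describes, assuming a dict-based memtable and one-record-per-line segment files (the `Memtable` class, its size limit, and the `segment-N.txt` names are hypothetical; real engines typically keep the memtable in an already-sorted structure such as a skip list instead of sorting at flush time). When the memtable fills up, its entries go into a brand-new segment file in key order, and the memtable is emptied.

```python
import os
import tempfile
from typing import Dict, List

class Memtable:
    """Hypothetical sketch: in-memory writes, flushed to sorted segment files when full."""

    def __init__(self, directory: str, max_entries: int = 3) -> None:
        self.directory = directory
        self.max_entries = max_entries
        self.entries: Dict[str, str] = {}
        self.segments: List[str] = []  # segment file paths, oldest first

    def set(self, key: str, value: str) -> None:
        self.entries[key] = value
        if len(self.entries) >= self.max_entries:
            self.flush()

    def flush(self) -> None:
        # Each flush creates a NEW segment file; existing segments are never modified.
        path = os.path.join(self.directory, f"segment-{len(self.segments)}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.entries):  # written in sorted key order
                f.write(f"{key},{self.entries[key]}\n")
        self.segments.append(path)
        self.entries.clear()

mt = Memtable(tempfile.mkdtemp())
for key, value in [("zebra", "striped"), ("ant", "tiny"), ("moose", "big")]:
    mt.set(key, value)
with open(mt.segments[0], encoding="utf-8") as f:
    print(f.read().splitlines())  # ['ant,tiny', 'moose,big', 'zebra,striped']
```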

0xb0565e486•3mo ago
I have been spending the last ~4 weeks writing a triple store!

I wish this came out earlier; there are a few insights in there that took me a while to understand :)

saxelsen•3mo ago
Nice interactivity, but this is taken straight from Designing Data-Intensive Applications. Literally all the content here is an interactive version of chapter 3.

Maybe give credit?

tomhow•3mo ago
Thanks, we've added this to the thread's top text.
oneeyedpigeon•3mo ago
Come on, that's not enough. a) The parent said "taken straight from" but you've watered that down to "inspired by"; which is it? b) You've edited this post on HN, but the actual original article still makes no mention of the source.
tomhow•3mo ago
Yep, fair enough. We've had contact with the post's publisher, and whilst it would be unfair of us to disclose the details of the communication, I've now updated the header text (to what I originally posted there when I first saw the root comment's allegation), and have down-weighted the post.
nansdotio•3mo ago
Hey! Author of the post here.

While the text itself is my own words, the logical structure and the examples were indeed based off DDIA's chapter 3. I dropped the ball here - the site has been updated with proper attribution.

vladpowerman•3mo ago
Great read. I’ve been modeling developer activity as a time series key value system where each developer is a key and commits are values. Faced the same issues: logs grow fast, indexes get heavy, range queries slow down. How do you decide what to drop when compacting segments? Balancing freshness and retention is tricky.
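For the generic part of that question, a minimal sketch of the usual rule in log-structured compaction, with segments modeled as in-memory lists of `(key, value)` pairs and `None` standing in for a deletion tombstone (both are assumptions for illustration): when merging, only the newest value of each key survives, and a key whose newest entry is a tombstone is dropped entirely. Dropping the tombstone is only safe when every segment that could hold an older value of that key takes part in the same merge, as here. A retention policy such as "drop commits older than N days" would be an extra, application-specific filter on top.

```python
from typing import Dict, List, Optional, Tuple

Segment = List[Tuple[str, Optional[str]]]  # (key, value); value None = tombstone

def compact(segments: List[Segment]) -> Segment:
    """Hypothetical sketch: merge segments (ordered oldest -> newest) into one sorted segment."""
    latest: Dict[str, Optional[str]] = {}
    for segment in segments:  # later segments overwrite earlier ones
        for key, value in segment:  # later records within a segment also win
            latest[key] = value
    # Drop keys whose newest entry is a tombstone; emit the rest in key order.
    return [(k, v) for k, v in sorted(latest.items()) if v is not None]

old = [("alice", "commit-1"), ("bob", "commit-7")]
new = [("alice", "commit-9"), ("bob", None), ("carol", "commit-3")]
print(compact([old, new]))  # [('alice', 'commit-9'), ('carol', 'commit-3')]
```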
withinboredom•3mo ago
I'm curious: how much data do you have? I have 12 years of dev data, and reports are generated in seconds, if not milliseconds. What are your key patterns? It sounds like a key-design problem.
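One common key-design pattern for per-developer time series, sketched here as an assumption rather than anything described in the comments: make the key a composite of developer id and a fixed-width timestamp, so that one developer's commits sit next to each other in sorted key order and a time-range query becomes a single contiguous range scan. The key format and helper names are hypothetical, and `bisect` over a sorted Python list stands in for a sorted on-disk structure.

```python
import bisect
from typing import List

def make_key(dev_id: str, unix_ts: int) -> str:
    # Hypothetical key format; assumes dev ids never contain '#'.
    # Zero-padding makes string order match numeric order for the timestamp.
    return f"{dev_id}#{unix_ts:012d}"

def range_scan(sorted_keys: List[str], dev_id: str, start_ts: int, end_ts: int) -> List[str]:
    """Return the keys for dev_id with start_ts <= ts < end_ts."""
    lo = bisect.bisect_left(sorted_keys, make_key(dev_id, start_ts))
    hi = bisect.bisect_left(sorted_keys, make_key(dev_id, end_ts))
    return sorted_keys[lo:hi]

keys = sorted(make_key(d, t) for d, t in [("ana", 100), ("ana", 250), ("ana", 900), ("ben", 200)])
print(range_scan(keys, "ana", 200, 1000))
# ['ana#000000000250', 'ana#000000000900']
```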
orliesaurus•3mo ago
am i the only one who IS a huge fan of this blogpost layout
jumploops•3mo ago
“LSM trees are the underlying data structure used for [..] DynamoDB, and they have proven to perform really well at scale [..] 80 million requests per second!”

This is a tad bit misleading, as the LSM is used for the node-level storage engine, but doesn’t explain how the overall distributed system scales to 80 million rps.

iirc the original Dynamo paper used BerkeleyDB (b-tree or LSM), but the 2012 paper shifted to a fully LSM-based engine.
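To illustrate that distinction: the per-node storage engine decides how one machine lays out its data, while a separate partitioning layer decides which machine owns each key. The original Dynamo paper describes consistent hashing with virtual nodes for that layer. Below is a minimal sketch of such a ring; the node names, the 64 virtual nodes per machine, and the use of SHA-256 are assumptions for illustration, not DynamoDB's actual implementation.

```python
import bisect
import hashlib
from typing import Dict, List

def ring_hash(value: str) -> int:
    # Used only to spread strings evenly around the ring, not for security.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest(), "big")

class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.owners: Dict[int, str] = {}
        for node in nodes:
            for i in range(vnodes):
                # Several positions per physical node smooth out the key distribution.
                self.owners[ring_hash(f"{node}#{i}")] = node
        self.positions: List[int] = sorted(self.owners)

    def node_for(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self.positions, ring_hash(key)) % len(self.positions)
        return self.owners[self.positions[i]]

keys = [f"user#{n}" for n in range(1000)]
three = HashRing(["node-a", "node-b", "node-c"])
four = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if three.node_for(k) != four.node_for(k)]
# Adding node-d should move only the keys node-d now owns (roughly a quarter).
print(len(moved), {four.node_for(k) for k in moved})
```

With naive `hash(key) % N` placement, changing N would remap most keys instead; keeping data movement small is what lets a cluster grow node by node.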

tombert•3mo ago
I more or less built my own database in Erlang a few months ago. I say "more or less" because I did use Bitcask as the underlying store, and I used the riak_core libraries initially, but I did handle replication and different fault tolerance techniques on my own.

It was actually very fun; a key-value database is something that can be any level of difficulty that you want. If you want a simple KV "database", you could just serialize and deserialize a JSON string all the time, or write a protobuf, but there is of course no limit to the level of complexity.

I use the JSON example because that was actually how I started; I was constantly serializing and deserializing JSON with base64 binary encoded strings, just because it was easy and good enough, and over the course of building the project I ended up making a proper replicated database. I even had a very basic "query language" to handle some basic searches.
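A minimal sketch of that starting point (the `JsonKV` class and file name are hypothetical): the whole store is one JSON object that is read on every lookup and rewritten on every write. Writing to a temporary file in the same directory and then calling `os.replace` means a crash mid-write leaves either the old file or the new one, never a half-written one, because the rename is atomic on POSIX systems.

```python
import json
import os
import tempfile
from typing import Any, Dict

class JsonKV:
    """Hypothetical sketch: a whole-file JSON key-value store."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> Dict[str, Any]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def get(self, key: str) -> Any:
        return self._load().get(key)  # None if the key is missing

    def set(self, key: str, value: Any) -> None:
        data = self._load()
        data[key] = value
        # Write the whole store to a temp file next to the real one, then swap it in.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp, self.path)  # atomic rename on POSIX

store = JsonKV(os.path.join(tempfile.mkdtemp(), "store.json"))
store.set("greeting", "hello")
store.set("count", 3)
print(store.get("greeting"), store.get("count"))  # hello 3
```

Every write still costs time proportional to the total data size, which is what eventually pushes a project like this toward logs and indexes.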

marci•3mo ago
Any repo? even if not production ready. I'm curious about how you approached replication, compared to mnesia or couchdb, especially now that erlang natively supports json.
tombert•3mo ago
Sadly it’s in a private repo as I had ambitions of trying to sell the product, which I haven’t completely given up on yet.

That said, a lot of the concepts come from riak_core, which is FOSS: https://github.com/OpenRiak/riak_core

mirpoker•3mo ago
thanks for sharing
TheAnkurTyagi•3mo ago
A very nice explanation, with visual interactions, of how databases work internally. The author is a great teacher.
darkstar_16•3mo ago
One of our final projects during university was to design and program a basic database in C. Even after 20 years, I think that was some of the most fun I've had on a project.
curtisblaine•3mo ago
This gets fuzzy around the end - indexes are depicted as separate (partial) entities. Do we store all of those separated in different files? If so, do we need to open them all to search for a record?
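One common answer, sketched under simplifying assumptions (segments are in-memory sorted lists of `(key, value)` pairs standing in for files, every `sparse_every`-th key goes into a per-segment sparse index, deletes are not modeled, and all names are hypothetical): each segment has its own small index, and a read checks segments from newest to oldest, stopping at the first hit. For a key that doesn't exist, every segment gets checked, which is why LSM-style engines compact segments and often keep a Bloom filter per segment to skip ones that cannot contain the key.

```python
import bisect
from typing import List, Optional, Tuple

class Segment:
    """Hypothetical SSTable-style segment: records sorted by key, plus a sparse index."""

    def __init__(self, records: List[Tuple[str, str]], sparse_every: int = 2) -> None:
        self.records = sorted(records)  # assumes unique keys within a segment
        self.sparse_every = sparse_every
        # Sparse index: every n-th key and its position, not every key.
        self.index_keys = [k for k, _ in self.records[::sparse_every]]
        self.index_pos = list(range(0, len(self.records), sparse_every))

    def get(self, key: str) -> Optional[str]:
        # Find the last indexed key <= key, then scan at most sparse_every records.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None
        start = self.index_pos[i]
        for k, v in self.records[start:start + self.sparse_every]:
            if k == key:
                return v
        return None

def lookup(segments_newest_first: List[Segment], key: str) -> Optional[str]:
    for segment in segments_newest_first:  # newer data wins, so stop at the first hit
        value = segment.get(key)
        if value is not None:
            return value
    return None

old = Segment([("ant", "tiny"), ("cat", "small"), ("dog", "medium")])
new = Segment([("cat", "large"), ("eel", "long")])
print(lookup([new, old], "cat"))  # large (found in the newest segment)
print(lookup([new, old], "dog"))  # medium (falls through to the older segment)
print(lookup([new, old], "yak"))  # None (every segment checked)
```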
LourensT•3mo ago
Great post and beautiful website. I got a bit confused by the flush operation that happens when the memtable is full. A quick note that a new on-disk segment is created would help. In the recap at the end, segmentation is also not mentioned.
kragen•3mo ago
This article is attractively presented and seems to be well written, but I disagree with the framing:

> Databases were made to solve one problem:

> How do we store data persistently and then efficiently look it up later?

There are indeed things described as "databases" which are made to solve that problem, but more commonly such things are instead called "file formats" or, to old IBMers, "access methods".

As I see it, the much more interesting problem that databases solve, the one that usually distinguishes what we call "databases" from what we call "file formats", is query evaluation:

> How do we organize a large body of information such that we can easily answer questions efficiently from it?

Prolog, Datalog, QBE, QUEL, and SQL are approaches to expressing the questions, and indexing, materialized views, query planning, resolution, the WAM, and tabled resolution are approaches to answering them efficiently.

dbm is not a database. ISAM is not a database. But SQLite in :memory: is still a database.
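A minimal illustration of that distinction using Python's `sqlite3` module (the table and data are made up): the database lives only in memory, so nothing is persisted, yet it answers a question that no key was designed for. With a dbm-style key-value file, answering the same question would mean writing the scan and aggregation by hand.

```python
import sqlite3

# No file at all: the data lives only in memory, yet this is still a query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (animal TEXT, zoo TEXT, weight_kg REAL)")
conn.executemany(
    "INSERT INTO sightings VALUES (?, ?, ?)",
    [("elephant", "north", 4000.0), ("elephant", "south", 3500.0),
     ("otter", "south", 10.0), ("otter", "north", 12.0), ("otter", "north", 11.0)],
)
# An ad hoc question: sightings and average weight per animal, heaviest first.
rows = conn.execute(
    """SELECT animal, COUNT(*), AVG(weight_kg) FROM sightings
       GROUP BY animal ORDER BY AVG(weight_kg) DESC"""
).fetchall()
print(rows)  # [('elephant', 2, 3750.0), ('otter', 3, 11.0)]
```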

mb2100•3mo ago
seems you've overlooked the word "efficiently" in "store data persistently and then efficiently look it up later"?
kragen•3mo ago
No, I did not. Possibly you think I did because you don't know what ISAM or dbm are.