Kudos for that!
~/bin/cooper-db-set
===================
#! /bin/bash
key="$1"
value="$2"
echo "${key}:${value}" >> /dev/null
~/bin/cooper-db-get
===================
#! /bin/bash
key="$1"
</dev/null awk -F: -v key="$key" '$1 == key {result = $2} END {print result}'

Exercises like this also seem fun in general. It's a real test of how much you know to start anything from scratch.
Otherwise great article, thank you!
begin;
insert into cust (id, name, company, streetaddress, city, state, zip) values (1, 'Jacqueline Gagnon', 'Baker Group', '218 Miller Dr.', 'Riverside', 'KS', '51859');
commit;
begin;
insert into cust (id, name, company, streetaddress, city, state, zip) values (2, 'Wayne Bennett', 'FF Petroleum LLC', '4375 Moore Dr.', 'Mount Vernon', 'MS', '98270');
select setval('cust_id_seq', 2);
commit;
begin;
insert into product (id, name, unitprice) values (1, 'Biological blue steel doll', 30.4);
commit;
begin;
insert into product (id, name, unitprice) values (2, 'Gray cotton electronic boxers, size L', 13.3);
insert into product (id, name, unitprice) values (3, 'Blue cotton intimate blazer, ages 2–5', 37.3);
insert into product (id, name, unitprice) values (4, 'Daily beige steel car', 14.6);
insert into product (id, name, unitprice) values (5, 'Black spandex daily blazer, size L', 24.1);
insert into product (id, name, unitprice) values (6, 'Blue wool dynamic briefs, ages 3–10', 79.0);
insert into product (id, name, unitprice) values (7, 'Blue spandex ultrasonic dress, child’s size', 31.9);
insert into product (id, name, unitprice) values (8, 'Gold wool daily boxers, ages 3–10', 8.85);
insert into product (id, name, unitprice) values (9, 'Red cotton utility boxers, ages 2–5', 28.9);
insert into product (id, name, unitprice) values (10, 'Gray polyester ultrasonic briefs, ages 3–10', 15.3);
-- ...
It also creates the tables, including invoice and lineitem tables. It's still a bit of a dull accounting example, rather than something like food, superheroes, social networks, zoo animals, sports, or dating, but I think the randomness does add a little bit of humor.

Although now we have LLMs, and maybe they'd do a better job.
I made a pointless program to help w/ this on macOS for kicks: https://github.com/radiofreejohn/xattrkv
When I made it, I also found a bug in the xattr implementation for Darwin and submitted it to Apple; they eventually fixed it.
I would say that, from a practical standpoint, it is not a database yet without transactions.
You can have a really simple two-phase commit system where you initially mark all records as 'pending' and then update them to 'settled' once all the necessary associated rows have been inserted into their respective tables. You can record a timestamp so that you know where to resume the settlement process. I once had multiple processes doing settlement in parallel by hashing the ids to map them to specific processes, so it scales really well.
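A minimal sketch of the pending/settled pattern described above, using sqlite3 and a made-up `record` table; the modulo-on-id mapping stands in for the hashing the commenter mentions, and all names here are hypothetical:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table record (id integer primary key, payload text, status text, updated_at real)"
)

def insert_pending(rid, payload):
    # Phase 1: the record lands in 'pending' state with a timestamp,
    # so a crashed settler knows where to resume.
    conn.execute(
        "insert into record values (?, ?, 'pending', ?)", (rid, payload, time.time())
    )

def settle(worker_id, num_workers):
    # Phase 2: each worker settles only the ids that map to it,
    # so multiple settlement processes can run in parallel without conflicts.
    for (rid,) in conn.execute("select id from record where status = 'pending'").fetchall():
        if rid % num_workers == worker_id:
            conn.execute(
                "update record set status = 'settled', updated_at = ? where id = ?",
                (time.time(), rid),
            )

insert_pending(1, "a")
insert_pending(2, "b")
settle(0, 2)
settle(1, 2)
statuses = [s for (s,) in conn.execute("select status from record")]
```

After both workers run, every record has moved from 'pending' to 'settled'.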
Two-phase commit is a particular way of implementing transactions when the system is distributed. There is no "instead" here.
Not that I would aspire to implement a general-purpose database. But even smaller tasks can make my mind spin too much.
As a different example: I'm moving this week. I've known I'm moving for a while. Thinking about moving -- and all the little things I have to do -- is way more painful than doing them. Thinking about them keeps me up at night, getting through my list today is only fractionally painful.
I'm also leveling up a few aspects of my "way of living" in the midst of all this, and it'd be terribly boring to tell others about it, but when next Monday comes... it'll be quite sweet indeed.
this sounds familiar... :)
For me it is all about believing that I’ll succeed and realizing that the belief doesn’t really correlate with technical aspect as much as I think it does.
If I believe I won’t succeed, I spend every moment trying to find the problem that will finally end me. And every problem becomes a death sentence.
If I believe I’ll succeed, problems become temporary obstacles and all my focus is on how I’ll overcome the current obstacle.
Edit: the flush example (the 2nd one) in the recap section does the same thing, even though the text says the records are supposed to be written to the file in sorted order.
I wish this came out earlier, there are a few insights in there that took me a while to understand :)
Maybe give credit?
While the text itself is my own words, the logical structure and the examples were indeed based on DDIA's chapter 3. I dropped the ball here - the site has been updated with proper attribution.
This is a tad misleading: the LSM is used for the node-level storage engine, but that doesn't explain how the overall distributed system scales to 80 million rps.
iirc the original Dynamo paper used BerkeleyDB (b-tree or LSM), but the 2012 paper shifted to a fully LSM-based engine.
It was actually very fun; a key-value database is something that can be any level of difficulty that you want. If you want a simple KV "database", you could just serialize and deserialize a JSON string all the time, or write a protobuf, but there is of course no limit to the level of complexity.
I use the JSON example because that was actually how I started; I was constantly serializing and deserializing JSON with base64 binary encoded strings, just because it was easy and good enough, and over the course of building the project I ended up making a proper replicated database. I even had a very basic "query language" to handle some basic searches.
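The serialize-everything-as-JSON starting point described above can be sketched in a few lines; the `kv.json` path and the `save`/`load` names are hypothetical, and binary values are base64-encoded exactly as the comment describes:

```python
import base64
import json
import os

DB_PATH = "kv.json"  # hypothetical on-disk location

def save(db):
    # Binary values are base64-encoded so the whole store
    # fits in a single JSON document.
    encoded = {k: base64.b64encode(v).decode("ascii") for k, v in db.items()}
    with open(DB_PATH, "w") as f:
        json.dump(encoded, f)

def load():
    if not os.path.exists(DB_PATH):
        return {}
    with open(DB_PATH) as f:
        return {k: base64.b64decode(v) for k, v in json.load(f).items()}

db = load()
db["greeting"] = b"hello"
save(db)
```

Easy and good enough to start with, as the comment says - everything past this point (replication, a query language) is where the real database work begins.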
That said, a lot of the concepts come from riak_core, which is FOSS: https://github.com/OpenRiak/riak_core
> Databases were made to solve one problem:
> How do we store data persistently and then efficiently look it up later?
There are indeed things described as "databases" which are made to solve that problem, but more commonly such things are instead called "file formats" or, to old IBMers, "access methods".
As I see it, the much more interesting problem that databases solve, the one that usually distinguishes what we call "databases" from what we call "file formats", is query evaluation:
> How do we organize a large body of information such that we can easily answer questions efficiently from it?
Prolog, Datalog, QBE, QUEL, and SQL are approaches to expressing the questions, and indexing, materialized views, query planning, resolution, the WAM, and tabled resolution are approaches to answering them efficiently.
dbm is not a database. ISAM is not a database. But SQLite in :memory: is still a database.
4ndrewl•3mo ago
> "How do we store data persistently and then efficiently look it up later?"
Isn't that two problems?
stvltvs•3mo ago
https://www.sciencenewstoday.org/do-black-holes-destroy-or-s...
SahAssar•3mo ago
The "efficiently" part can be considered a separate problem though.
prerok•3mo ago
So, if we consider that persistent storage is a solved problem, then we can say that the reason for databases was how to look up data efficiently. In fact, that is why they were invented, even if persistent storage is a prerequisite.
grokgrok•3mo ago
Efficiency of storage or retrieval, reliability against loss or corruption, security against unwanted disclosure or modification are all common concerns, and the relative values assigned to these features and others motivate database design.
kiitos•3mo ago
reconstructing past memory states is rarely, if ever, a requirement that needs to be accommodated in the database layer
nonethewiser•3mo ago
In another context, perhaps you're ingesting data to be used in analytics, which seems to fit the "reconstruct past memory states" framing less.
elygre•3mo ago
Good times.
datadrivenangel•3mo ago
0 - https://www.youtube.com/watch?v=3t6L-FlfeaI
nonethewiser•3mo ago
I guess there is a rather fine line between philosophy and pedantry.
Maybe we can think about it from another angle. If those are two problems that databases were designed to solve, then one of them is this: storing data persistently.
Is that really a problem databases were designed to solve? Not really. We had that long before databases; it was already solved. It's a pretty fundamental computer operation. Isn't it fair to say this is one thing: "storing data so it can be retrieved efficiently"?
whartung•3mo ago
No, that would be regexes.
mamcx•3mo ago
How, in an ACID way, do you store data so it can be efficiently looked up later by an unknown number of clients with unknown access patterns, concurrently, without blocking all the participants, and fast?
And then add SQL (ouch!)
lelanthran•3mo ago
> Isn't that two problems?
Only if you're creating a write-only database, in which case just write it to /dev/null.