The principles of database design, or, the Truth is out there

https://ebellani.github.io/blog/2025/the-principles-of-database-design-or-the-truth-is-out-there/
50•b-man•4h ago

Comments

AnonHP•2h ago
Seems like this article places too much emphasis on normalization, which is appropriate in many cases but can carry a huge cost and performance penalty for requirements like reporting. You will probably need different kinds of schemas and data storage structures for different requirements in the same application, which may result in duplicated data but is an acceptable trade-off.
weinzierl•2h ago
" Every base relation should be in its highest normal form (3, 5 or 6th normal form). "

If I remember my database lessons correctly there is no strictly highest normal form. It progresses from 1NF to BCNF, but above that it is more choosing different trade-offs.

Even below it is always a trade-off with performance and that is why we most of the time aim for 3NF, and sometimes BCNF.

moi2388•1h ago
That’s what I was taught as well. And even then I use it more as a rule of thumb
zeroCalories•1h ago
Putting aside performance implications, I get kinda irritated by having to do joins for basic queries all the time.
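One common way to soften the join fatigue is to keep the base tables normalized and write the join once in a view. A minimal sketch using Python's built-in sqlite3; the `customer`, `purchase` and `purchase_report` names are made up for illustration and are not from the article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables (hypothetical schema, for illustration only).
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id  INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        amount_cents INTEGER NOT NULL
    );
    -- The join lives here once instead of in every "basic" query.
    CREATE VIEW purchase_report AS
        SELECT c.name                           AS customer_name,
               COUNT(p.purchase_id)             AS purchases,
               COALESCE(SUM(p.amount_cents), 0) AS total_cents
        FROM customer AS c
        LEFT JOIN purchase AS p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id;
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO purchase (customer_id, amount_cents) VALUES (?, ?)",
    [(1, 500), (1, 250)],
)

# Callers query the view like a flat table.
report = conn.execute(
    "SELECT customer_name, purchases, total_cents "
    "FROM purchase_report ORDER BY customer_name"
).fetchall()
```

A view does not remove the cost of the join at query time; for heavy reporting a separate, denormalized store may still be the better trade-off.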
adamcharnock•2h ago
> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and a natural key almost certainly isn’t ideal. Off the top of my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except when someone dies, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, there would be no unique constraint, and instead form validation plus a warning any time someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.
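A minimal sketch of the layout described above, using Python's built-in sqlite3: an integer key for joins, a random UUID for the outside world, and a national ID column that is indexed but deliberately not unique. The table, column and function names are hypothetical, and SQLite's `INTEGER PRIMARY KEY` stands in for a BIGINT:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id          INTEGER PRIMARY KEY,   -- internal key, used for joins
        public_id   TEXT NOT NULL UNIQUE,  -- random UUID, safe to put in URLs
        national_id TEXT NOT NULL,         -- deliberately NOT unique
        full_name   TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX person_national_id ON person (national_id)")


def add_person(national_id: str, full_name: str) -> str:
    """Insert a person and return the identifier exposed to the world."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO person (public_id, national_id, full_name) VALUES (?, ?, ?)",
        (public_id, national_id, full_name),
    )
    return public_id


def shares_national_id(public_id: str) -> bool:
    """True if some other row has the same national_id (show a warning)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM person AS me
        JOIN person AS other ON other.national_id = me.national_id
        WHERE me.public_id = ? AND other.id <> me.id
        """,
        (public_id,),
    ).fetchone()
    return row[0] > 0


alice = add_person("123", "Alice Example")
bob = add_person("123", "Bob Example")      # duplicate is accepted, not rejected
carol = add_person("456", "Carol Example")
```

Here `shares_national_id(alice)` is what a screen would call to decide whether to display the warning.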

Jarwain•1h ago
Why have both a database ID and UUIDv7, versus just a UUIDv7?
sroussey•58m ago
There is a security principle to not expose real identifiers to the outside world. It makes a crack in your system easier to open.
jandrewrogers•46m ago
Actually, it should be a database ID and an encrypted database ID, which doesn’t require storing a second ID. Even better, you can make that key unique per session so that users can’t share keys. For security reasons, it is a bad idea to leak private state, which UUIDv7 does.

A single AES encryption block is the same size as a UUID and cheap to compute.

adamcharnock•13m ago
> A single AES encryption block is the same size as a UUID and cheap to compute.

I didn’t realise this! The UUID spec mandates some values for specific digits, so I assume these would not be strictly valid UUIDs?
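A rough sketch of the encrypted-ID idea. Python's standard library does not ship AES, so a toy 4-round Feistel permutation built on HMAC-SHA256 stands in for the AES block here; it is for illustration only and is not a vetted cipher. All function names are hypothetical. The point is only that a 128-bit block round-trips through 16 bytes, the same size as a UUID:

```python
import hashlib
import hmac
import uuid

ROUNDS = 4  # illustrative stand-in for AES, NOT a vetted cipher


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Keyed round function: 8 bytes derived from the round number and one half."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Permute one 16-byte block under the given key."""
    if len(block) != 16:
        raise ValueError("block must be 16 bytes")
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    """Invert encrypt_block by running the rounds backwards."""
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def session_key(master_key: bytes, session_id: bytes) -> bytes:
    """Per-session key, so tokens from one session are useless in another."""
    return hmac.new(master_key, session_id, hashlib.sha256).digest()


def public_token(key: bytes, db_id: int) -> uuid.UUID:
    """Encrypt the integer database ID; nothing extra is stored."""
    return uuid.UUID(bytes=encrypt_block(key, db_id.to_bytes(16, "big")))


def db_id_from_token(key: bytes, token: uuid.UUID) -> int:
    return int.from_bytes(decrypt_block(key, token.bytes), "big")


key = session_key(b"hypothetical-master-key", b"session-1")
token = public_token(key, 42)
```

On the question above: the token prints in UUID form because `uuid.UUID(bytes=...)` accepts any 16 bytes, but its version and variant bits are whatever the cipher produced, so in general it would not be a strictly valid UUID.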

jiggawatts•1h ago
> will absolutely swear that there will never be two people with the same national ID...

I suddenly got flash-backs.

There are duplicate ISBN numbers for books, despite the system being carefully designed to avoid this.

There are ISBN numbers that have invalid checksums, but are valid ISBNs with the invalid number in the barcode and everything. Either the calculation was incorrectly done, or it was simply a mis-print.

The same book can have hundreds of ISBNs.

There is no sane way to determine if two such ISBNs are truly the same (page numbers and everything), or a reprint that has renumbered pages or even subtly different content with corrected typos, missing or added illustrations, etc...

Our federal government publishes a master database of "job id" numbers for each profession one could have. This is critical for legislation related to skilled migrants, collective workplace agreements, etc...

The states decided to add one digit to these numbers to further subdivide them. They did it differently, of course, and some didn't subdivide at all. Some of them have typos with "O" in place of "0" in a few places. Some states dropped the leading zeroes, and then added a suffix digit, which is fun.

On and on and on...

The real world is messy.

Any ID you don't generate yourself is fraught with risk. Even then there are issues such as what happens if the database is rolled back to a backup and then IDs are generated again for the missed data!
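For the ISBN checksum point, a small sketch of the standard ISBN-13 check digit (alternating weights of 1 and 3, sum divisible by 10). Given that real barcodes sometimes carry a wrong digit, the hypothetical `classify_isbn13` below flags a mismatch rather than rejecting the value:

```python
def isbn13_check_digit(first12: str) -> int:
    """Check digit for the first 12 digits of an ISBN-13."""
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(first12)
    )
    return (10 - total % 10) % 10


def classify_isbn13(raw: str) -> str:
    """Return 'ok', 'checksum-mismatch' or 'malformed' for a candidate ISBN-13."""
    digits = raw.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return "malformed"
    if isbn13_check_digit(digits[:12]) != int(digits[12]):
        # A misprinted ISBN can still be the one on the book: keep it, flag it.
        return "checksum-mismatch"
    return "ok"
```

A passing checksum says nothing about uniqueness, so this can only ever be a sanity check, not an identity guarantee.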

bruce511•1h ago
I'm with you. I've used natural keys in the past, and they've always been a problem eventually.

On the other hand I've used surrogate keys for 20 years, and never encountered an issue that wasn't simple to resolve.

I get there are different camps here, and yes your context matters, but "I'm not really interested in why natural keys worked for you." They don't work for me. So arguments for natural keys are kinda meh.

I guess they work for some folk (shrug).

jandrewrogers•55m ago
> Allowing a client to generate IDs can be a big help here (ie UUIDs)

Trusting the client to generate a high-quality ID has a long history of being a bad idea in practice. It requires the client to not be misconfigured, to not be hacked, to not be malicious, to not have hardware bugs, etc. A single server can generate hundreds of millions of IDs per second and provides a single point of monitoring and control.

treyd•30m ago
In context I read that as database client, meaning the application server (which is a client to the database) providing the service to the user. Having that be able to generate IDs could be useful when needing to refer to the same entity, even if there is data that has to exist in some separate database for some reason.
adamcharnock•19m ago
That is indeed what I had in mind, although I did leave it intentionally vague as everyone can assess what’s best for their own situation
qazxcvbnm•11m ago
I’ve once attempted to implement a solution where ids are generated by UUIDv5 from a certain owner and the relationship of the new item to the owner; that way, users cannot generate arbitrary ids but can still predict ahead of time their new ids to ease optimistic behaviour.
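A minimal sketch of that scheme with Python's `uuid.uuid5`, which hashes a namespace UUID together with a name. Using the owner's ID as the namespace and a string such as `"note:1"` as the relationship is an assumption made for illustration:

```python
import uuid


def child_id(owner_id: uuid.UUID, relationship: str) -> uuid.UUID:
    """Deterministic ID for an item, derived from its owner and relationship."""
    return uuid.uuid5(owner_id, relationship)


# Hypothetical owner; in practice this would be the owner's stored ID.
owner = uuid.UUID("12345678-1234-5678-1234-567812345678")

first_note = child_id(owner, "note:1")
second_note = child_id(owner, "note:2")
```

Because the result depends only on the inputs, a client can compute `first_note` before the server confirms the insert, and the server can recompute it to verify that the client did not pick an arbitrary value.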
Joel_Mckay•1m ago
In a way, you are both right...

Modern distributed systems almost always use a compound, binary-packed GUID: EPOCH_TIME, IP, MAC, PID, memory address offset, account ID, and/or signed object hash. Thus, the node knows 100% for sure the key is always globally unique, and still preserves its origin.

This makes for inefficient SQL design, given that it de-normalizes most structures, but memory and storage are cheap compared to the features gained by abandoning incremental/indexed keys. Also, combining expected-state pre-conditions on localized transaction state in the query with a key packed with breadcrumbs solves problems you don't know you have yet (including non-blocking options.)

In general, many projects end up just implementing an object store in SQL eventually. Yes it is terrible design, but also a convenient bodge =3
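A rough sketch of such a packed key, assuming a 16-byte layout of millisecond timestamp, MAC-derived node, process ID and a per-process counter. The layout and the names `make_id` and `unpack_id` are hypothetical. Uniqueness here rests on the clock not going backwards, the node value being distinct per machine, and fewer than 65,536 IDs per millisecond per process, so it is weaker than "100% for sure":

```python
import itertools
import os
import time
import uuid

_counter = itertools.count()


def make_id() -> bytes:
    """Pack origin breadcrumbs into 16 bytes: time, node, pid, sequence."""
    millis = time.time_ns() // 1_000_000   # fits in 48 bits for millennia
    node = uuid.getnode()                  # 48-bit value, usually the MAC
    pid = os.getpid() & 0xFFFF             # truncated to 16 bits in this sketch
    seq = next(_counter) & 0xFFFF          # wraps after 65,536 IDs
    return (
        millis.to_bytes(6, "big")
        + node.to_bytes(6, "big")
        + pid.to_bytes(2, "big")
        + seq.to_bytes(2, "big")
    )


def unpack_id(raw: bytes) -> dict:
    """Recover the breadcrumbs from a packed ID."""
    return {
        "millis": int.from_bytes(raw[0:6], "big"),
        "node": int.from_bytes(raw[6:12], "big"),
        "pid": int.from_bytes(raw[12:14], "big"),
        "seq": int.from_bytes(raw[14:16], "big"),
    }


ids = [make_id() for _ in range(1000)]
```

Note that a key like this leaks when and where a row was created, which is the private-state concern raised earlier in the thread about UUIDv7.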

sitharus•12m ago
This isn’t a new principle; it was part of database design courses in the early 2000s at least. However, from a couple of decades of bitter experience, I say external keys should never be your primary keys. They’ll always change for some reason.

Yes you can create your tables with ON UPDATE CASCADE foreign keys, but are those really the only places the ID is used?

Sometimes your own data does present a natural key though, so if it’s fully within your control then it’s a decent idea to use.
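A small sketch of the cascade question using Python's built-in sqlite3. The `address` table references the natural key with `ON UPDATE CASCADE`, while a hypothetical `audit_log` table holds a copy of the key as free text, standing in for logs, exports and other places the database cannot see:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE citizen (
        national_id TEXT PRIMARY KEY,
        full_name   TEXT NOT NULL
    );
    CREATE TABLE address (
        national_id TEXT NOT NULL
            REFERENCES citizen(national_id) ON UPDATE CASCADE,
        street      TEXT NOT NULL
    );
    -- A copy of the key that no foreign key knows about.
    CREATE TABLE audit_log (message TEXT NOT NULL);

    INSERT INTO citizen VALUES ('A1', 'Alice Example');
    INSERT INTO address VALUES ('A1', '1 Example St');
    INSERT INTO audit_log VALUES ('created citizen A1');
""")

# The externally issued ID changes.
conn.execute("UPDATE citizen SET national_id = 'B2' WHERE national_id = 'A1'")

address_key = conn.execute("SELECT national_id FROM address").fetchone()[0]
log_message = conn.execute("SELECT message FROM audit_log").fetchone()[0]
```

After the update, `address_key` follows the change but `log_message` still refers to the old ID.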

madduci•2h ago
Many of the principles, and also the example provided for PED, cannot be mapped easily through an ORM library, and AFAIK Java JPA doesn't handle them either.

Why does it matter? I have seen many developers rely entirely on application code to manage entities in the database, instead of on prepared statements and plain SQL queries. This obviously opens the door to poor optimisation, since these entity management libraries don't support certain SQL capabilities.

jbverschoor•2h ago
That’s a non-argument. Just use a better ORM. Hibernate has been able to do that for about 20 years.

That said, I’m not a fan of natural keys as primary keys. Especially composite keys. This just takes everybody back to the 80s/early 90s.

It only makes sense when there’s a huge storage benefit.

mrkeen•2h ago
> Principle of Essential Denotation (PED): A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

  create table citizen (
    national_id national_id primary key,
    full_name text);

Is national_id really a natural key, or is it someone else's synthetic key? If so, should the owner of that database have opted for a natural key rather than a synthetic key?

More arguments for synthetic over natural keys: https://blog.ploeh.dk/2024/06/03/youll-regret-using-natural-...

tuatoru•2h ago
The "natural key" for a (natural) person is compound: full name and mother's full name, plus date, time and place of birth. Your birth certificate is your primary identification document.

However that still runs into problems of nondurability of the key in cultures that delay naming for a few years. To name one problem.

So yeah, use a big enough integer as your key, and have appropriate checks on groups of attributes like this.

However, if you are only interested in citizens, then a "natural" key is the citizen id issued by the government department responsible for doing that. (Citizen is a role played by a natural person so probably doesn't have too many attributes.) I still wouldn't use that as a primary key on the citizen table, though.

bouke•1h ago
That natural key isn’t guaranteed to be unique.
stevoski•28m ago
I know someone who doesn’t know when she was born, nor who her mother is.

She doesn’t have a birth certificate.

She was born in a country that was enduring several years of brutal war.

I know another person whose national ID was changed. Systems that use national ID as primary key failed to accept this change.

anon7000•6m ago
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
rawgabbit•2h ago
I was going to comment on this. Natural keys sound like a good idea, and they should be enforced, maybe by using a unique constraint.

Natural keys are important. But the real world and the databases that represent it are messy. People’s identities get stolen. Data entry mistakes happen, and integrations between systems fail and leave the data in an inconsistent state.

In my experience I find arguments about natural keys unproductive. I usually try to steer the conversation to the scenarios I mentioned above. Those who listen to me will have a combination of synthetic and natural keys. The first is used to represent system state. The second is used to represent business processes.

atomicnumber3•1h ago
Natural keys are also all too often PII. A surrogate key that's just pure entropy is much safer to blast all over the place in logs and error messages and so on.
rawgabbit•1h ago
I usually encourage people to place all PII in a separate table. Only those who engage with customers, e.g. to verify customers’ identities, should have access. Furthermore, images of customer identity cards are strictly forbidden. You can enter their passport number, name, address, birthdate etc., but copies of identity documents will make you a target of hackers and angry customers. The rep can ask the customer to show the document or, in the worst case, present a copy, but the copy should immediately be deleted.
sroussey•50m ago
PII in a separate db. Encrypted like you would a credit card number.

BTW: email+password should be separated too. An early draft of GDPR specifically mentioned that, though the final version got less into the weeds.

I’m sure if you vibe code any of this, it will all be plaintext, lol.
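A minimal sketch of the separation discussed above, with two in-memory SQLite databases standing in for a core store and a locked-down PII store. The schema and function names are hypothetical, and the encryption and access control that the PII store would need are left out:

```python
import secrets
import sqlite3

core = sqlite3.connect(":memory:")  # broadly accessible application data
pii = sqlite3.connect(":memory:")   # stand-in for a restricted, encrypted store

core.execute(
    "CREATE TABLE account (account_key TEXT PRIMARY KEY, plan TEXT NOT NULL)"
)
pii.execute(
    "CREATE TABLE account_pii ("
    " account_key TEXT PRIMARY KEY,"
    " full_name   TEXT NOT NULL,"
    " passport_no TEXT NOT NULL)"
)


def create_account(full_name: str, passport_no: str, plan: str) -> str:
    """Store PII and non-PII separately, linked only by a random surrogate key."""
    account_key = secrets.token_hex(16)  # pure entropy, carries no meaning
    core.execute("INSERT INTO account VALUES (?, ?)", (account_key, plan))
    pii.execute(
        "INSERT INTO account_pii VALUES (?, ?, ?)",
        (account_key, full_name, passport_no),
    )
    return account_key


def log_line(event: str, account_key: str) -> str:
    """Log lines carry the surrogate key only, never the natural identifiers."""
    return f"{event} account={account_key}"


key = create_account("Alice Example", "X1234567", "free")
line = log_line("login_failed", key)
```

Anyone reading `line` can correlate events for one account, but resolving it to a person requires access to the PII store.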

xlii•1h ago
I have a joke in context that I often like to tell:

The Devil captured a Physicist, an Engineer and a Mathematician. He gave each of them a big can of spam and locked them in an empty room, saying “you will be here for 2 weeks: open the can and survive, or die of starvation”. After 2 weeks the Devil opens the Physicist’s cell. It’s covered floor to ceiling in complex scribbles. One piece of wall is clean of etching, but a small dent is visible. The can of spam is opened and eaten clean; the Physicist sits in the corner, visibly annoyed. Next is the Engineer. The cell walls are covered in dents and pieces of spam. The Engineer is bruised almost as much as the can, but it is ultimately opened and the Engineer is alive.

Finally the Devil opens the Mathematician’s cell and finds him dead. Only “given the cylinder” is etched on the wall.

---

The punchline isn’t about engineering, but it has always helped me draw the line between software engineering and computer science.

moi2388•1h ago
“ Principle of Full Normalization (POFN) : Every base relation should be in its highest normal form (3, 5 or 6th normal form)”

No, it shouldn’t.

JSR_FDED•21m ago
Please, feel free to elaborate so we can all learn
pretoriusdre•1h ago
I really don't like using natural keys as primary keys.

Natural keys sometimes need to change for unforeseen reasons, such as identity theft, and this is really tricky to manage if those keys are cascaded into many tables as foreign keys.

Natural keys are often not unique either. Using the national ID example, there are millions of duplicate SSNs issued within the USA. https://www.computerworld.com/article/1687803/not-so-unique....

So, don't use natural keys as primary keys. Use a surrogate key as the primary key and store the natural key as an ordinary column, ideally with a unique constraint.

jiggawatts•1h ago
The "natural ID" for people design reminds me of a story from a state department of education: They had two students, both named John Smith Jr. They were identical twins and attending the same class.

They had the same birth date, parents, phone number, street address, first name, last name, school, teachers, everything...

The story was that their dad was John Smith Sr in a long line of John Smiths going back a dozen generations. It was "a thing" for the family line, and there was no way he was going to break centuries of tradition just because he happened to have twins.

Note: In very junior grades the kids aren't expected to memorise and use a student ID because they haven't (officially) learned to read and write yet! (I didn't use one until University.)
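The twins case can be reproduced in a few lines with Python's built-in sqlite3. The table, the compound key and the sample values (including the birth date) are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        first_name     TEXT NOT NULL,
        last_name      TEXT NOT NULL,
        birth_date     TEXT NOT NULL,
        street_address TEXT NOT NULL,
        -- Compound "natural key": everything we know about the child.
        PRIMARY KEY (first_name, last_name, birth_date, street_address)
    )
""")

twin = ("John", "Smith Jr", "2015-03-01", "1 Example St")
conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", twin)

try:
    # The second twin has exactly the same attribute values.
    conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", twin)
    second_twin_accepted = True
except sqlite3.IntegrityError:
    second_twin_accepted = False
```

The database rejects the second insert, so the schema claims that a real, enrolled student cannot exist.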

jandrewrogers•1h ago
This takes an overly simple view of what domains can look like. There are data models that necessarily violate these principles, and they aren’t all that rare.

Some examples:

> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity

In some domains there is no natural key because the identity is literally an inference problem and relations are probabilistic. The objective of the data model is to aggregate enough records to discover and attribute natural keys with some level of confidence. A common class of data models with this property are entity resolution data models.

> All information in the database is represented explicitly and in exactly one way

Some data models have famously dual natures. Cartographic data models, for example, must be represented both as graph models (for routing and reachability relationships) and as geometric models (for spatial relationships). The “one true representation” has been a perennial argument in mapping for my entire life and both sides are demonstrably correct.

> Every base relation should be in its highest normal form (3, 5 or 6th normal form).

This is one of those things that sounds attractive because it ignores the fact that it requires no ambiguities about domain boundaries or semantics, a condition that doesn’t exist in practice. I bought into this idea too when I was a young and naive data modeler. Trying to stamp out these ambiguities adds an unbounded number of data model epicycles, which bring a lot of complexity and performance loss. At some point, strict normalization is not worth the cost in several aspects.

In almost all cases, it is far more important that the data model be efficient to work with than it be the abstract platonic ideal of a domain model. All of these principles have to work on real hardware in real operational environments with all of the messy limitations that implies.
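To make the entity resolution point concrete, here is a toy sketch in which identity is a score rather than a key. It compares names with `difflib.SequenceMatcher` from the standard library; the 0.8 threshold and the function names are arbitrary assumptions, and real entity resolution systems use far richer evidence than a single string similarity:

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8  # arbitrary cut-off chosen for this sketch


def name_similarity(a: str, b: str) -> float:
    """Similarity between two names, from 0.0 (unrelated) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def probably_same_entity(a: str, b: str) -> bool:
    """An inference with a confidence level, not a key lookup."""
    return name_similarity(a, b) >= MATCH_THRESHOLD


close_score = name_similarity("Jon Smith", "John Smith")
far_score = name_similarity("John Smith", "Mary Jones")
```

Nothing in either record serves as a natural key; the grouping of records into one entity is an output of the model and can change as more records arrive.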

peanut-walrus•25m ago
No. Real life rarely has natural keys that are unique and do not change. For example the national id number in several countries can change in some circumstances...and that is already a synthetic key.
stevoski•24m ago
Bad luck if you don’t yet have (or know) your national ID.

National id is not something issued at birth in the country I live in. It’s something applied for at a certain age.

sitharus•18m ago
Where I live there’s no such thing as national ID. There are a few documents that can be used as such depending on the purpose, and some of those change the number on every update!

Never trust something outside your system to be stable.

cess11•18m ago
"Databases are representations of reality"

"tell the truth that is out there"

Both truth and representation are very slippery, many-faceted concepts, encumbered with millennia of use and philosophy. Using them in this way is deceptive to the junior and useless to the senior.