Nullable but not null

https://efe.me/posts/nullable-but-not-null/

63•efeoge•17h ago

Comments

lblume•16h ago

> But a field that is nullable in the schema and never null in practice is a silent lie.

This seems to be the central claim. But just as lies require intent, so does database design to some degree.

A column that is nullable but never null does not conclusively say anything, really. That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.

A nullable column signals that the data may be left empty, which is entirely different from that column actually having to be empty with the current data. Is-ought distinction, the potentiality of null ("nullability") is not to be confused with the actuality of current vacancy ("null values"). The programmer extracting data from the database must check for null if the data is nullable, not just if is null now as a principle of robust fault-tolerant code.

matsemann•16h ago

Problem is you end up other places with the assumption thar it's never null. So in the future when you actually set it to null somewhere it will blow up.

tuyiown•16h ago

People that assume that a (nullable) value is never null because the sample at hand does not contain a null value _will_ learn their lesson the hard way, the one you describe.

But even non-nullable does not always resist to time, I'd argue that use cases where the field _has_ to be null eventually emerges and somehow have to be mitigated. There is no easy solution to safely workaround that without either tons work that duplicates lots of things or taking risks by adapting the code base.

darkwater•16h ago

I think their point is that for all intents, the column IS not nullable. It's nullable as an artifact of making live schema migration easier, with no blocking/downtime. But as the data model is concerned, it should not be nullable.

lblume•16h ago

Sure, if one just leaves a column nullable due to negligence, one should check the actual requirements and enforce them to make invalid states unrepresentable. The author still makes too strong of a claim that becomes detached from the migration aspect, insinuating that one can just empirically test the database to check whether this is the case, to which I disagree.

jerf•16h ago

I don't think the author is talking generally about fields that could be NULL but just happen to never be so in the production DB. The piece is specifically in the context of a new database field that is fully intended by its designer to be NOT NULL, which was NULL only for migration purposes, and which was never updated to be NOT NULL once the migration is complete. The point was not meant to be extended beyond that.

One could write a separate piece about maybe using that as a clue that the field could be NOT NULL'd in the future but that's not what this post is.

lblume•16h ago

In that case we read it entirely differently — the last paragraph explicitly mentions inferring "Nullable fields that never hold nulls are often the result of incomplete migrations or forgotten cleanup steps". The author also proposes a script "to identify cases where the field could safely be changed to non-nullable". But alas, we cannot do that with full generality, due to there being a big difference in intent that cannot be captured purely by counting nulls, and surely not by only calculating percentages.

dataflow•16h ago

> That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.

No it's not, because they specifically started with the premise that the field was initially intentionally non-null and was only temporarily made nullable for migration purposes. That is obviously not the situation you are describing, right?

codingdave•16h ago

How would the database know whether the other app layers depend on that value or not? You could absolutely have an app that does not require data in a specific field to function, yet all records happen to have data. This is actually fairly common in single-tenant apps, where some tenants populate a field and others do not. You need to look at how the data is used across the entire stack to know whether or not it should be nullable, not whatever the current data happens to be.

cerved•15h ago

It doesn't. That's why it's the responsibility of the application layer to correctly implement the data model of the database and not the other way around

danbruc•16h ago

A column that is nullable but never null might indicate that it should be non-nullable but does not necessarily imply so. Say you have an optional comment field, it might just happen by accident that a comment was recorded for each row, but that of course becomes increasingly unlikely with each additional row in the table. There is probably no harm in checking your database for such columns, especially in tables with many rows, but in the end it is up to the semantics of the data model whether a column should be nullable or not. The absence of NULLs is an indicator but not a determiner.

Szpadel•15h ago

there is also other scenario, field might obviously looks like it should have value, and in check it might event always have it, but it might be "lazy" value.

eg. you might have some bug CSV uploaded and your have number of rows in it, your app could insert record without this number and async process would fill that later.

there might be even some corner case where null value is possible

I believe solution here isn't to check what fields do not use null, but to improve process of creating such migration. either you should create second ticket for next release to update db or commit new migration to some kind of next branch.

marcosdumay•15h ago

> There is probably no harm in checking your database for such columns

The harm is the same as any other unreliable linter rule.

Each one such rule is almost harmless. And on most places that use that kind of rule, they are extremely harmful.

danbruc•14h ago

With that I agree, once you start to treat the output of any linter as the truth, you are actively lowering the quality of your codebase. At the very least if it has reasonably good quality, if it is in bad shape, blindly following linters might still increase code quality, you will just not reach the peak.

mdavid626•15h ago

I completely agree. Unfortunately, this is one of those things that’s hard to convince people of. You often hear: ‘It doesn’t matter,’ ‘It works fine without it,’ or ‘Why overcomplicate things?’—but that mindset can be limiting.

deepsun•15h ago

I've seen worse. Some teams use JSON for their data. Not only each field can be missing (aka NULL), it can also be "null". Or a different type.

I envy your team who's only mistake is to forget setting NULLABLE. Rainbows and unicorns ;)

thomas-st•15h ago

That's just the JSON equivalent of "we have data, and it's null" vs "data is missing", and consistency could be enforced using a constraint or by making it non-NULL in the first place.

It's more common in string fields, which in many cases just get rendered on a web form that doesn't differentiate blank (empty string) from null state, therefore in the database we should in most cases set it up as follows:

- Value required: field should be non-NULL + at least length 1 (via check constraint)

- Value optional: either field is non-NULL, or field is nullable + at least length 1

I'm curious if you prefer to store optional strings as non-NULL and rely on the length, or as nullable and have a length constraint.

lock1•15h ago

What if almost everything is NULLABLE? including the supposedly primary key and foreign keys of the table?

I've had the firsthand experience building a consumer for that kind of DB and it's hell to get anything running correctly without literally writing a dedicated layer to sanity check everything.

phplovesong•14h ago

Sounds like a bad domain, and/or poor design.

gopher_space•11h ago

> without literally writing a dedicated layer to sanity check everything

It's not attractive to developers for a variety of reasons, but encapsulation is generally cheaper (in every sense) than the alternatives by orders of magnitude. If a system is hard to alter and other people rely on it to do their jobs, thinking about it as set in stone during my planning process will save me grief and heartache.

Starting with an independent client in mind makes it easier to plan around things like reconstructing tables from composed exports or dealing with authoritatively incorrect data. It's a good collection point for hard-coded edge case handling, and a great location for all of your discovery notes.

antif•9h ago

No shame on JSON.. XML can do this too!

toledocavani•6h ago

Worked in an enterprise project where they use a BPMN engine for all business logic (designed by non-technical people).

Each process can either have predefined fields (take time to implement) or key-string value store (yes, they chose this).

Either the BPMN logic or some JSs (some, as not all people can code) in PBMN nodes or some Java hooks (a lot, wrote under those other people requests) edit those values.

So when something's wrong, we swam in the sea of key-value, search all the code for who update this field, and why it's value is not JSON nor null nor boolean but a string "Not used, waiting for policy 1234".

After that project, I'm happy every time I can use a DB to store data, more so if it can run in local, much more so if it's relational.

Trigg3r•15h ago

Am I missing something here in my (MS) SQL world? if a new field is added as null, I do that to the (now) 20 year old system to we don't break 100's of stored procs - any (new) code that needs that field, has to check for it being null...

minkeymaniac•12h ago

in sql server, you can simply add a not null check constraint with nocheck (see my comment with full code)

ALTER TABLE foo WITH NOCHECK ADD CONSTRAINT CheckNotnull CHECK (id IS NOT NULL)

any new values coming in cannot be null but the values already in the table with null are fine... then you can update them to not null over time

stux•14h ago

This is interesting! A field being nullable because it's legitimately optional in the domain model is one thing, but for new fields which shouldn't be nullable in the domain model, unless you can pick a reasonable identity value, you need a concept of absence that's different from null. Luckily the intersection of "non-nullable fields" and "fields with no reasonable identity value" and "fields which didnt exist in v1 of the domain model" is normally pretty small, but it's painful when it happens.

This reminds me of frozen/nonfrozen enums in Swift. You can do exhaustive case analysis on frozen enums, but case analysis on nonfrozen enums requires adding an `@unknown default` case.

https://docs.swift.org/swift-book/documentation/the-swift-pr...

nlitened•1h ago

> you need a concept of absence that's different from null

Could you give an example? Null _is_ absence, the way I see it

PeterZaitsev•14h ago

Should not we look for database to be able to do online, efficient non locking addition of column with any default value, not just NULL rather than application to have a complicated and fragile logic ?

avg_dev•14h ago

I believe PostgreSQL does this since v11, which was released in 2018: (current is v17)

> Many other useful performance improvements, including the ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN with a non-null column default

https://www.postgresql.org/docs/release/11.0/

I think there is some restriction there, like the default can't be "volatile" - I can't remember the precise definition here but I think current_timestamp would be volatile, but any static value would not.

tudorg•13h ago

That is correct, for non-volatile default values Postgres is quick, which means that it is generally a safe operation.

Also interesting, `now()` is non-volatile because it's defined as "start of the transaction". So if you add a column with `DEFAULT now()` all rows will get the same value. But `timeofday()` is not volatile, so `DEFAULT timeofday()` is going to lock the table for a long time. A bit of a subtle gotcha.

avg_dev•12h ago

Thanks for the info. One minor point:

> But `timeofday()` is not volatile, so `DEFAULT timeofday()` is going to lock the table for a long time.

Perhaps the “not” was a typo?

HocusLocus•14h ago

I'm glad to see people discussing [zero,infinitesimals,false,emptystring,isnull,unpopulated,missing] as if each one is a thing. They've always been things! We've just been buried in compromises and shortcuts all these years.

There should also be a [the-asteroid-has-hit-y'all-are-so-stupid] and global data systems should just just pass that around after impact until the power goes out for good.

comrade1234•13h ago

I've never seen step 5 happen...

tudorg•13h ago

For a tooling solution for this problem, and many others, pgroll (https://github.com/xataio/pgroll) automates the steps from the blog post in a single higher-level operation. It can do things like adding a hidden column, backfill it with data, then adds the constraint, and only then expose it in the new schema.

jonny_eh•13h ago

I recommend using checklists for schema changes like this.

minkeymaniac•12h ago

If you don't care for old data having null , you could add a check contraint with nocheck (this is sql server fwiw)

for example

create table foo(id int) insert foo values (1), (2), (3)

insert foo values (null)

select * from foo

NULL

ALTER TABLE foo with nocheck ADD CONSTRAINT CheckNotnull check (id IS NOT NULL)

insert foo values (null)

Msg 547, Level 16, State 0, Line 13 The INSERT statement conflicted with the CHECK constraint "CheckNotnull". The conflict occurred in database tempdb", table "dbo.foo", column 'id'. The statement has been terminated.

However be aware that if you update an existing value to NULL, you will still get the error

update foo set id = null where id = 2

Msg 547, Level 16, State 0, Line 20 The UPDATE statement conflicted with the CHECK constraint "CheckNotnull". The conflict occurred in database "tempdb", table "dbo.foo", column 'id'.

nlitened•46m ago

From my experience, any new field that gets added to existing tables (after a non-trivial system got deployed to production environment with actual clients) must always be nullable.

Firstly, it’s more practical and brings fewer surprises to all people involved (especially if there are many people involved).

Secondly, if we’re being pedantic, it is a mere acknowledgement of the fact that the field was not there before and things worked without it, and now the field is here, so in all senses the field _is actually_ optional, so all the touching code should know how to handle this.

Do not download the app, use the website

Open Sauce is a confoundingly brilliant Bay Area event

Turn any diagram image into an editable Draw.io file. No more redrawing

It's time for modern CSS to kill the SPA

CCTV Footage Captures the First-Ever Video of an Earthquake Fault in Motion

Show HN: Auto Favicon MCP Server

The Rise and Fall of the Hanseatic League

Users claim Discord's age verification can be tricked with video game characters

Simon Tatham's Portable Puzzle Collection

It's a DE9, not a DB9 (but we know what you mean)

Never write your own date parsing library

Windsurf employee #2: I was given a payout of only 1% what my shares where worth

Why MIT switched from Scheme to Python (2009)

Efficient Computer's Electron E1 CPU – 100x more efficient than Arm?

Why I Do Programming

Vanilla JavaScript support for Tailwind Plus

Animated Cursors

Ambigrammia: Between Creation and Discovery (Hofstadter, 2025)

Experimental surgery performed by AI-driven surgical robot

The future is not self-hosted

Steam, Itch.io are pulling ‘porn’ games. Critics say it's a slippery slope

Developing our position on AI

What is X-Forwarded-For and when can you trust it? (2024)

A Union Pacific-Norfolk Southern combination would redraw the railroad map

CO2 Battery

Programming vehicles in games

Women dating safety app 'Tea' breached, users' IDs posted to 4chan

Generic Containers in C: Vec

Researchers value null results, but struggle to publish them

Show HN: Apple Health MCP Server

Nullable but not null

Comments

Do not download the app, use the website

Open Sauce is a confoundingly brilliant Bay Area event

Turn any diagram image into an editable Draw.io file. No more redrawing

It's time for modern CSS to kill the SPA

CCTV Footage Captures the First-Ever Video of an Earthquake Fault in Motion

Show HN: Auto Favicon MCP Server

The Rise and Fall of the Hanseatic League

Users claim Discord's age verification can be tricked with video game characters

Simon Tatham's Portable Puzzle Collection

It's a DE9, not a DB9 (but we know what you mean)

Never write your own date parsing library

Windsurf employee #2: I was given a payout of only 1% what my shares where worth

Why MIT switched from Scheme to Python (2009)

Efficient Computer's Electron E1 CPU – 100x more efficient than Arm?

Why I Do Programming

Vanilla JavaScript support for Tailwind Plus

Animated Cursors

Ambigrammia: Between Creation and Discovery (Hofstadter, 2025)

Experimental surgery performed by AI-driven surgical robot

The future is not self-hosted

Steam, Itch.io are pulling ‘porn’ games. Critics say it's a slippery slope

Developing our position on AI

What is X-Forwarded-For and when can you trust it? (2024)

A Union Pacific-Norfolk Southern combination would redraw the railroad map

CO2 Battery

Programming vehicles in games

Women dating safety app 'Tea' breached, users' IDs posted to 4chan

Generic Containers in C: Vec

Researchers value null results, but struggle to publish them

Show HN: Apple Health MCP Server