In POSIX, you can theoretically use inode zero

https://utcc.utoronto.ca/~cks/space/blog/unix/POSIXAllowsZeroInode

68•mfrw•3d ago

Comments

Animats•1d ago

It's been a long time since what user space sees as an "inode" has anything to do with the representation within the file system.

nulld3v•1d ago

Also, there seems to be an effort brewing in the kernel to push userspace away from depending on inode #s due to difficulty in guaranteeing uniqueness and stability across reboots. https://youtu.be/TNWK1zbTMOU

AndrewDavis•1d ago

They definitely aren't unique even without reboots. Postfix uses the inode number as a queue id. At $dayjob we've seen reuse surprisingly quickly, even within a few hours. Which is a little annoying when we're log spelunking and we get two sets of results because of the repeating id!

(there is now a long queue id option which adds a time component)
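To illustrate why a time component helps, here is a minimal sketch. It is not Postfix's actual encoding: the `make_queue_id` helper and its hex layout are made up for this example. It combines a timestamp with the file's inode number, so a recycled inode still yields a different ID as long as the timestamps differ.

```python
import os
import tempfile
import time


def make_queue_id(path, now=None):
    """Hypothetical helper: build an ID from a timestamp plus st_ino.

    This is NOT Postfix's real format; it only shows the idea that a
    reused inode number no longer produces a repeated ID.
    """
    if now is None:
        now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    inode = os.stat(path).st_ino
    return f"{seconds:x}.{micros:05x}.{inode:x}"


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(make_queue_id(f.name))
```

With an inode-only ID, two messages that land on the same recycled inode hours apart are indistinguishable in the logs; with the timestamp prefix they are not.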

amiga386•1d ago

...but it's unique while the file exists, right?

The combination of st_dev and st_ino from stat() should be unique on a machine, while the device remains mounted and the file continues to exist.

If the file is deleted, a different file might get the inode, and if a device is unmounted, another device might get the device id.
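A minimal sketch of that rule, assuming a local filesystem that behaves the way POSIX describes (the `file_identity` helper name is made up for this example): two hard links to one file share an identity, while two files that exist at the same time do not.

```python
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair POSIX uses to identify a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        c = os.path.join(d, "c")
        open(a, "w").close()
        os.link(a, b)         # hard link: same file, second name
        open(c, "w").close()  # unrelated file that exists at the same time
        print(file_identity(a) == file_identity(b))  # same file
        print(file_identity(a) == file_identity(c))  # distinct files
```

Nothing here says anything about what happens after `a` and `b` are both unlinked; at that point the pair is free to be handed to a new file.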

the_mitsuhiko•1d ago

> The combination of st_dev and st_ino from stat() should be unique on a machine

It should, but it seems no longer to be the case. I believe there was an attempt to get a sysctl flag in to force the kernel to return the same inode for all files to see what breaks.

AndrewDavis•1d ago

Yes! It's reusable, but not duplicated.

londons_explore•1d ago

> ...but it's unique while the file exists, right?

I don't think all filesystems guarantee this. Especially network filesystems.

the_mitsuhiko•1d ago

It's effectively impossible to guarantee this when you have a file system that unifies and re-exports. Network file systems being an obvious one, but overlayfs is in a similar position.

Even if inodes still work nowadays they will eventually run into issues a few years down the line.

account42•17h ago

Then unifying file systems is not something that POSIX supports, and a POSIX system shouldn't do it unless it can somehow map inodes with POSIX semantics. E.g. for a network mount spanning multiple remote filesystems you could also have multiple st_dev locally.

amiga386•1d ago

That's a problem for programs that do recursive fs descent (e.g. find, tar) because they use st_dev and st_ino alone for remembering what directories they've been in. They can't just use the absolute path, because symbolic links allow for loops.
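The technique described above can be sketched roughly like this. It is an illustration of the general idea, not the code find or tar actually use, and the `walk_dirs` name is made up: every directory is remembered by its `(st_dev, st_ino)` pair, so a symlink that points back up the tree is recognised and skipped.

```python
import os
import tempfile


def walk_dirs(root):
    """Yield each directory under root once, following symlinks.

    Directories are remembered by (st_dev, st_ino) so a symlink that
    points back up the tree does not cause an endless loop.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks
        except OSError:
            continue  # dangling link or permission problem
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited under some other name
        seen.add(key)
        yield path
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=True):
                        stack.append(entry.path)
        except OSError:
            continue


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        os.symlink(root, os.path.join(sub, "loop"))  # loop back to the top
        for directory in walk_dirs(root):
            print(directory)
```

If two distinct directories ever report the same pair, this walker silently skips one of them, which is exactly the failure mode under discussion.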

find:

* https://cgit.git.savannah.gnu.org/cgit/findutils.git/tree/fi...

tar:

* https://cgit.git.savannah.gnu.org/cgit/tar.git/tree/src/crea...

* https://cgit.git.savannah.gnu.org/cgit/tar.git/tree/src/name...

* https://cgit.git.savannah.gnu.org/cgit/tar.git/tree/src/incr...

In particular, I'm intrigued by the comment in the last link:

      /* With NFS, the same file can have two different devices
         if an NFS directory is mounted in multiple locations,
         which is relatively common when automounting.
         To avoid spurious incremental redumping of
         directories, consider all NFS devices as equal,
         relying on the i-node to establish differences.  */

So GNU tar expects an inode to be unique across _all_ NFS mounts...
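A toy model of the rule in that comment, to show where it can go wrong. This is not GNU tar's code; the `FileId` record, the `on_nfs` flag and all the numbers are invented for the example.

```python
from collections import namedtuple

# Hypothetical record for this sketch; GNU tar works on struct stat instead.
FileId = namedtuple("FileId", ["dev", "ino", "on_nfs"])


def same_directory(a, b):
    """Model of 'treat all NFS devices as equal, compare only inodes'."""
    if a.ino != b.ino:
        return False
    if a.on_nfs and b.on_nfs:
        return True  # device numbers are ignored for NFS
    return a.dev == b.dev


# One export automounted at two places: different st_dev, same inode.
first_mount = FileId(dev=40, ino=1234, on_nfs=True)
second_mount = FileId(dev=41, ino=1234, on_nfs=True)

# An unrelated directory on another server that happens to share the inode.
other_server = FileId(dev=52, ino=1234, on_nfs=True)

print(same_directory(first_mount, second_mount))  # True, the intended case
print(same_directory(first_mount, other_server))  # True, a false match
```

The first result is what the comment is after; the second is the price of dropping st_dev from the comparison.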

the_mitsuhiko•1d ago

You are not wrong, but the issues with tar are well known. Linus himself had this to say [1]:

> Well, the fact that it hits snapshots, shows that the real problem is just "tar does stupid things that it shouldn't do".

> Yes, inode numbers used to be special, and there's history behind it. But we should basically try very hard to walk away from that broken history.

> An inode number just isn't a unique descriptor any more. We're not living in the 1970s, and filesystems have changed.

You might still get away with it most of the time today, but it's causing more and more issues.

[1]: https://lkml.iu.edu/hypermail/linux/kernel/2401.3/04127.html

amiga386•1d ago

That sounds like blaming userspace.

If it's not the 1970s anymore, then update the POSIX standard with a solution that works for all OSes (including the BSDs) and can be relied upon. Definitely don't suggest a Linux-only solution for a Linux-only problem.

dwattttt•1d ago

Have you checked what POSIX has to say about inode numbers? It may say less than you think.

amiga386•1d ago

https://pubs.opengroup.org/onlinepubs/009696799/basedefs/sys...

> The st_ino and st_dev fields taken together uniquely identify the file within the system.

It says exactly what it ought to say.

dwattttt•1d ago

Issue 8 (2024, https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sy...) relaxed the language there. st_ino and st_dev still uniquely identify a file, but it now notes that the duration it identifies it for is not indefinite.

As an example, it offers that the identity of a file that's deleted could be reused.

amiga386•1d ago

It says pretty much what I said at the start of the thread (https://news.ycombinator.com/item?id=44157026), and yet this is what Linux is having problems complying with:

> A file identity is uniquely determined by the combination of st_dev and st_ino. At any given time in a system, distinct files shall have distinct file identities; hard links to the same file shall have the same file identity. Over time, these file identities can be reused for different files. For example, the st_ino value can be reused after the last link to a file is unlinked and the space occupied by the file has been freed, and the st_dev value associated with a file system can be reused if that file system is detached ("unmounted") and another is attached ("mounted").
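One practical reading of that wording is that an identity is only meaningful while something keeps the file alive. A minimal sketch, assuming a local filesystem that follows the quoted text (the `path_still_names` helper is made up for this example): holding an open descriptor pins the file, so its `(st_dev, st_ino)` cannot be handed to another file in the meantime, and comparing `fstat()` with `stat()` tells you whether a path still names the same file.

```python
import os
import tempfile


def path_still_names(fd, path):
    """True if path currently refers to the file that fd has open."""
    try:
        return os.path.samestat(os.fstat(fd), os.stat(path))
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            print(path_still_names(fd, path))  # same file
            os.unlink(path)  # last link gone, but fd keeps the file alive
            os.close(os.open(path, os.O_CREAT | os.O_RDWR, 0o600))  # new file, same name
            print(path_still_names(fd, path))  # a different file now
        finally:
            os.close(fd)
```

Comparing a remembered `(st_dev, st_ino)` pair against a later `stat()` without holding anything open gives no such assurance, because the old file may be gone and its number recycled.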

I still think POSIX says exactly what it needs to say, and Linux ought to either comply with it, or lead the standardisation process on what should be done instead.

Don't say "tar is old". Tar's problems with Linux are the same problems that find, zip, rsync, cp and all other fs walking programs have. If memorising st_dev and st_ino is no good, tell us what cross-platform approach should be taken instead.

Brian_K_White•1d ago

This. You can't break a fundamental assumption without providing its replacement, and call anyone else stupid.

"A centimeter is no longer based on anything and has an unpredictable length. Rulers always did stupid things relying on that assumption."

the_mitsuhiko•1d ago

> You can't break a fundamental assumption without providing its replacement, and call anyone else stupid.

Sure, within the bounds of what's documented you are right. However, tar is going beyond what either the standard or Linux guarantees, so a lot of bets are off.

The guarantee that tar wants is not given by any FS that recycles inodes and most importantly, tar already completely disregards the file-system locality when network drives are involved.

The actual issue here is that both tar and Linux are just in a tough situation because a) the POSIX spec is problematic and b) no alternative API exists today. Something has to give.

the_mitsuhiko•1d ago

Again, you are not wrong. This is all clearly not intended. However it has become a challenge to map things like Btrfs subvolumes (when seen from a Btrfs mount) onto POSIX semantics [1].

You are absolutely right that ideally there is an update to the POSIX standard. But things like this take time and it's also not necessarily clear yet what the right path here is going forward. You can consider a lot of what is currently taking place as an experiment to push the envelope.

As for whether this is a Linux-specific problem, I'm not sure. I'm not sufficiently familiar with the situation on other operating systems to know what conversations are taking place there.

[1]: https://lwn.net/Articles/866582/

db48x•1d ago

ZFS has the same problem, for the same reasons. But it also has additional reasons. The simplest of them is that inode numbers are 64-bit integers but ZFS filesystems can have up to 2¹²⁸ files.

jcranmer•1d ago

There is no solution, much less one that is portable across different Unixen.

The core problem is that, because of the ability of filesystems to effectively contain other filesystems within them, the number of bits to uniquely identify a file within a filesystem is not a constant number across different filesystem types. It's a harder problem on Linux because Linux is also full of filesystems that aren't really filesystems, where trying to come up with a persistent, unique identifier for people to use is a lot more bother than it's really worth.

account42•17h ago

> is a lot more bother than it's really worth

According to who? Clearly there are user space utilities that need a (somewhat) persistent identifier to work correctly.

account42•17h ago

Sounds like Linus is advocating for ... breaking userspace. We are truly living in the end times.

koverstreet•1d ago

The combination of st_ino and the inode generation is guaranteed to be unique (excepting across subvolumes, because snapshots screw everything up). Filesystems maintain a generation number that's incremented when an inode number is reused, for NFS.

Unfortunately, it doesn't even seem to be exposed in statx (!). There's change_cookie, but that's different.

If anyone wants to submit a patch for this, I'll be happy to review it.
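For what it's worth, on Linux some filesystems do hand the generation to userspace through the older `FS_IOC_GETVERSION` ioctl rather than through statx. A minimal sketch follows, with loud assumptions: the request number below is the x86-64 encoding, the `inode_generation` helper is made up for this example, and filesystems that lack the ioctl simply make it return `None`.

```python
import fcntl
import os
import struct
import tempfile

# _IOR('v', 1, long) as encoded on x86-64 Linux; an assumption of this
# sketch, other architectures may encode the request differently.
FS_IOC_GETVERSION = 0x80087601


def inode_generation(path):
    """Return the inode generation, or None if it is not available."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(8)
        fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
        # Filesystems typically store a 32-bit value; ignore the rest.
        return struct.unpack_from("=I", buf)[0]
    except OSError:
        return None  # e.g. ENOTTY on filesystems without this ioctl
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        st = os.stat(f.name)
        print(st.st_dev, st.st_ino, inode_generation(f.name))
```

Where it works, `(st_dev, st_ino, generation)` is a stronger key than the pair alone, but it is Linux-only and per-filesystem, so it does not answer the portability complaint.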

quotemstr•1d ago

The problem isn't relying on inode numbers; it's inode numbers being too short. Make them GUIDs and the problems of uniqueness disappear. As for stability: that's just a matter of filesystem durability in general.

the_mitsuhiko•1d ago

> The problem isn't relying on inode numbers; it's inode numbers being too short.

It's a bit of both. inodes are conflating two things in a way. They are used by the file system to identify a record but they are _also_ exposed in APIs that are really cross file system (and it comes to a head in case of network file systems or overlayfs).

What's a more realistic path is to make inodes just an FS thing, let it do its thing, and then create a set of APIs that is not relying on inodes as much. Linux for instance is trying to move towards file handles as being that API layer.

bastawhiz•1d ago

You could make it bigger, but then your inode table gets pretty big. If an inode number is 32 bits today, then UUIDs would take up four times the space. I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.

And really, it's not even super necessary. 64-bit inode numbers already exist in modern file systems. You don't need UUIDs to have unique IDs forever: you'll never run out of 64-bit integers. But the problem wasn't really ever that you'd run out, the problem is in the way they're handled.

quotemstr•1d ago

> You could make it bigger, but then your inode table gets pretty big.

You could do it like Java's Object.identityHashCode() and allocate durable IDs only on demand.

> If an inode number is 32 bits today, then UUIDs would take up four times the space.

We probably waste more space on filesystems that lack tail-packing.

> I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.

We're hashing filenames for H-tree indexing anyway, aren't we?

> you'll never run out of 64-bit integers

Yeah, but with 128-bit ones you'll additionally never collide.
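A back-of-the-envelope check of that claim, assuming IDs are drawn uniformly at random (which is what GUID-style allocation implies; a sequential 64-bit counter does not collide at all until it wraps). The file count is an arbitrary example and `collision_probability` is a made-up helper that applies the usual birthday-bound approximation.

```python
import math


def collision_probability(n_ids, bits):
    """Birthday-bound estimate for n_ids uniformly random IDs of the given width."""
    pairs = n_ids * (n_ids - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    n = 2 ** 32  # about four billion files, an arbitrary example size
    print(f"64-bit:  {collision_probability(n, 64):.3g}")
    print(f"128-bit: {collision_probability(n, 128):.3g}")
```

At that scale random 64-bit IDs have roughly a 39% chance of at least one collision, while random 128-bit IDs are around 3 in 10^20, so the width matters for random allocation even though 64 bits are plenty for a counter.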

account42•17h ago

> Yeah, but with 128-bit ones you'll additionally never collide.

dd if=/dev/sda of=/dev/sdb would like a word with you.

quotemstr•12h ago

Touché. TBF, all forms of stored ID can collide in that case, so I'm not sure it counts.

inkyoto•15h ago

> You could do it like Java's Object.identityHashCode() and allocate durable IDs only on demand

The two real issues here are 1) inode numbers are no longer sequentially allocated as they were back in the 1970s-80s, and 2) an inode number in a modern file system usually carries extra significant information.

In XFS, the inode number is derived from the disk address (block address) of the inode within the file system. Specifically, the inode number encodes:

  • The allocation group (AG) number.

  • The block offset within the AG.

  • The inode’s index within its block.
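The three fields listed above can be pictured as bit fields packed into one integer. A minimal sketch: the two width parameters stand for values that come from the filesystem's superblock, and the widths and numbers used in the demo are made up for illustration.

```python
def split_inode_number(ino, agblklog, inopblog):
    """Split an XFS-style inode number into (ag, block_in_ag, index_in_block).

    agblklog: bits used for the block offset within an allocation group.
    inopblog: bits used for the inode's index within its block.
    Both widths come from the filesystem's superblock; the values used
    below are made up for illustration.
    """
    index = ino & ((1 << inopblog) - 1)
    block = (ino >> inopblog) & ((1 << agblklog) - 1)
    ag = ino >> (inopblog + agblklog)
    return ag, block, index


def join_inode_number(ag, block, index, agblklog, inopblog):
    """Inverse of split_inode_number."""
    return (ag << (agblklog + inopblog)) | (block << inopblog) | index


if __name__ == "__main__":
    ino = join_inode_number(ag=3, block=100, index=5, agblklog=20, inopblog=4)
    print(ino, split_inode_number(ino, agblklog=20, inopblog=4))
```

Because the number is an address rather than a label, it cannot be swapped for an arbitrary hash without adding a lookup table from ID to location.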

It is quite plausible that other modern file systems apply similar heuristics to support dynamic allocation of on-disk data structures. So turning to pseudorandom hash values is not going to work, at least not easily, and not without a redesign of the on-disk data structures.

Then, there is an issue of the finite size (or width) of the inode field (64 bits, 128 bits etc). On an exascale file system with a very high file turnover and a huge number of files, at some point a previously allocated inode will have to be recycled, irrespective of whether it was a pseudorandom or a calculated number. It is not a problem for most installations as the exascale is not there, but I don't think the problem can be solved using non-esoteric approaches.

the_mitsuhiko•1d ago

The OpenBSD UFS documentation says this:

> The root inode is the root of the file system. Inode 0 can't be used for normal purposes and historically bad blocks were linked to inode 1 (inode 1 is no longer used for this purpose; however, numerous dump tapes make this assumption, so we are stuck with it). Thus the root inode is 2.

This is also echoed on the Wikipedia page for it.

The Linux kernel also has this comment for why it does not dish out that inode for shmem for instance:

> Userspace may rely on the the inode number being non-zero. For example, glibc simply ignores files with zero i_ino in unlink() and other places.

On macOS it's pretty clear that inode 0 is reserved:

> Users of getdirentries() should skip entries with d_fileno = 0, as such entries represent files which have been deleted but not yet removed from the directory entry

duckerude•1d ago

A file descriptor can't be -1 but it's not 100% clear whether POSIX bans other negative numbers. So Rust's stdlib only bans -1 (for a space optimization) while still allowing for e.g. -2.

jcalvinowens•1d ago

It used to happen on Linux with tmpfs, but the kernel doesn't allow it anymore: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds...

It turns out that glibc readdir() assumes inode zero doesn't happen and the files are "invisible" to anything using libc. But you can call getdents() directly and see them.

I actually ran into this on a production machine a few years ago: a service couldn't restart because a directory appeared to be "stuck", since it had one of these invisible zero-inode files in it. It was very amusing; I figured it out by spotting the invisible filename in the strace output from getdents().
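A simplified model of that behaviour. It is not glibc's code; the `RawEntry` record and the directory contents are invented. It shows how a readdir()-style wrapper that treats `d_ino == 0` as "deleted" hides a name that the raw getdents() listing still contains.

```python
from collections import namedtuple

# Simplified stand-in for a raw directory entry as returned by getdents().
RawEntry = namedtuple("RawEntry", ["d_ino", "d_name"])


def readdir_like(raw_entries):
    """Model of a readdir() wrapper that treats d_ino == 0 as 'deleted'."""
    for entry in raw_entries:
        if entry.d_ino == 0:
            continue  # silently skipped, so callers never see this name
        yield entry.d_name


raw = [
    RawEntry(2, "."),
    RawEntry(2, ".."),
    RawEntry(0, "ghost"),  # hypothetical file that was given inode zero
    RawEntry(17, "normal"),
]

print(list(readdir_like(raw)))     # ghost is missing
print([e.d_name for e in raw])     # the raw listing still has it
```

A tool that deletes everything it can see and then calls rmdir() would fail with "directory not empty" in this situation, which matches the "stuck" directory described above.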

etbebl•21h ago

Isn't writing an article like this just daring someone to make the system in question more cursed, more likely to produce errors, harder to reason about, etc.? In the same spirit as, "well technically this common assumption about C behavior is undefined, so let's add some nasal demons to save 2 µs (and make me look clever)"?

Am I missing something or is this just evil? I guess I'm taking it too seriously.

account42•17h ago

Not really, you should understand the limitations of the system you are working with, including expected limitations that don't actually exist or only exist for some parts.

For example, knowing that inode 0 is technically possible will tell you not to rely on inode 0 as a special magic number. It's when parts of the system disagree about details like this that you get issues, not when you learn about them.
