
The length of file names in early Unix

https://utcc.utoronto.ca/~cks/space/blog/unix/UnixEarlyFilenameLenghts
77•ingve•8mo ago

Comments

raydenvm•8mo ago
File names were limited to just 14 characters in the UNIX "Version 6 filesystem". Directory entries consisted of: 2 bytes for the inode number and 14 bytes for the filename.

After that, the next file systems went up to 255 characters.

jmclnx•8mo ago
Interesting read and I found out things I did not know.

My first UNIX was Wang IN/ix, which was 14 characters. That was in the very late 80s. Not long afterwards for home use I got Coherent OS, that also had a max of 14.

Later was Slackware, seemed the sky was the limit.

But to be honest, I wish there were still a smaller file name limit; I usually keep my file names small. But I know I am in a tiny minority when it comes to file name length :)

rwmj•8mo ago
Pretty sure the original Minix filesystem name length was also limited to 14 bytes.

Edit: Yes:

https://github.com/gdevic/minix1/blob/3475e7ed91a3ff3f8862b2...

https://github.com/gdevic/minix1/blob/master/fs/type.h

trollbridge•8mo ago
I don't miss dealing with 8.3 filename conventions at all (or the 255-byte size limit DOS had for file paths, so a list of subdirectories can't typically be more than 30 or so deep).
thesuitonym•8mo ago
Windows 11 still has this limit baked in for some reason.

EDIT: Upon further reading, it appears MS did finally get rid of this restriction, but some 32-bit apps still don't play nice.

qingcharles•8mo ago
I still feel there is some limit because I'm constantly finding that I can't nest long folder names very deep.
AStonesThrow•8mo ago
This would be something that depends on the filesystem. Windows 11 supports various different types.

Some would even have configurable parameters.

dcminter•8mo ago
Huh, I never knew there was a Wang Unix before. Presumably this was on the VS8000 systems? I found this brochure mentioning it:

https://www.1000bit.it/ad/bro/wang/vs8000.pdf

"Wang also offers a UNIX System V.2-compatible operating system, IN/ix for the VS8000 series" and a footnote mentions that it's due in 1990

My Dad did some work with Wang 2200 systems, but by 1990 it had become clear that the IBM PC compatible was inevitable and he'd switched to Niakwa Basic instead of Wang systems for his customers (mostly running a bespoke small business payroll system).

clausecker•8mo ago
One generation of the BSD FFS had a 64 B limit. May have been only 2.11BSD though.
zabzonk•8mo ago
Anyone else remember messing around with inode fixing tools? Used to leave me with my bowels quaking. This would have been mid-80s, but can't remember unix version or the tool names now.
flyinghamster•8mo ago
I recall icheck, dcheck, and clri, and never had to use fsdb.

Also, mknod for populating entries in /dev, creating fifos, etc.

kjs3•8mo ago
Luxury. You haven't lived until you had to use 'adb' to try and debug a filesystem.
rootbear•8mo ago
I share your trepidation. I once had to use clri and such to clean up a mess and it was terrifying.
qingcharles•8mo ago
I think I was still doing this in the mid-90s?
themadsens•8mo ago
Oh yes. For one thing "hexdump -C ." would work out nicely in those days.
somat•8mo ago
Isn't hexdump the modern punk upstart?

In my day we had to use od, and we were happy to have it. Now get off my lawn.

Full disclosure, I am one of those "modern punk kids", but my first boss was firmly in the od generation, and all his documentation referenced it as such. hexdump is ergonomic heaven in comparison.

Someone•8mo ago
IIRC, that 14-byte limit stayed relevant for quite a while because ar (https://en.wikipedia.org/wiki/Ar_(Unix)#File_header), on some systems, was limited to 14-byte file names, even though the default file system allowed way longer names.

I remember hitting that problem on HP-UX while building gcc and GNU tools.

Aside: https://en.wikipedia.org/wiki/Ar_(Unix) says both “The ar format has never been standardized” and “Depending on the format, many ar implementations include a global symbol table (aka armap, directory or index) for fast linking without needing to scan the whole archive for a symbol. POSIX recognizes this feature, and requires ar implementations to have an -s option for updating it.”

Does that mean POSIX defines what the CLI ar tool must do, but not what the files it writes must look like?

MathMonkeyMan•8mo ago
Looks like it. There's [ar][1] and that's all I can find about it. [pax][2] is linked at the bottom, but is not required to understand the ar format.

[1]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/ar.html

[2]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/pax.html

mhw•8mo ago
I think pax was more of a POSIX replacement for cpio and tar. While those three and ar all could be used to bundle files together into a single file, only ar had the special affinity with the linker as a way of bundling object files together into libraries. The others were more aimed at backing up directories to a tape drive, as the example on the pax man page shows.
pjmlp•8mo ago
Yes, that is the joy people discover regarding UNIX/POSIX portability, or any other standards-based environment actually.
rafaepta•8mo ago
Never wondered why 14. But makes perfect sense. Early Unix feels like a masterclass in minimalism under constraint. Also fun detail: Unix V3 had 8-character limits before DOS made it famous.
layer8•8mo ago
To be fair, it was 8 plus 3 characters in DOS, which with the separator dot would translate to 12 characters on Unix.
jaoane•8mo ago
Just to be clear the dot wasn’t stored in the DOS FAT. Therefore a name would take up 11 bytes.
layer8•8mo ago
Yes, and just to be clear, presenting that name with the same readability under Unix would take up 12 bytes. Or for example, just copying files from DOS to Unix.
_mlbt•8mo ago
I am still convinced that they made the wrong design trade-off when it comes to C-style strings. A Pascal-style string with a length-prefix byte takes up the same amount of memory as a null-terminated string but is immune to buffer overflow and faster to strlen as well. The only time C-style strings offer any advantage is when dealing with strings longer than 255 bytes, which, given the memory constraints of the era, were incredibly rare.
pjmlp•8mo ago
See PL/I, NEWP, JOVIAL, predating C for a decade.

Also even with 255 limitation, there could be optimizations in place with struct/union, similar to small string optimization done in C++ and modern languages.

However, even granting that this was a non-starter on the PDP-7/11, there is no excuse for why, since 1989, WG14 never considered adding capabilities to the standard library similar to SDS, or language improvements like fat pointers (there was even a proposal from Ritchie himself).

They had 36 years to improve this.

staplung•8mo ago
I feel like C-style strings were nearly inevitable given C's array-pointer duality law. Square bracket array operations just translate into simple pointer arithmetic. To have a length prefix byte you'd need to either make the compiler treat char* differently than other arrays/pointers or you'd need to have the programmer always account for the special byte at the beginning of the string.
rootbear•8mo ago
I’ve wondered if there should be two string types in C, analogous to different length numeric types. Short strings (“tokens”? “words”?) could be Pascal style strings and longer strings (“buffers”) could be standard C-style null terminated strings. I’ve never tried to work this out for real, it just seems to me that one-size-fits-all strings might not be the best model.
zzo38computer•8mo ago
I think it would be better for the string type to be an address and length pair. Pascal strings store the length immediately before the data; what I think is better is to store the length separately.
tiffanyh•8mo ago
I’m old enough to recall local user account names being limited to 8-characters.
layer8•8mo ago
I was wondering whether this was strongly tied to the initial file name limit, since home directories were named after the user.

And what do you mean "old enough", I still had new support accounts restricted to 8 characters by clients this year. ;)

einr•8mo ago
This wasn't even that long ago if you were on Solaris. We used to be a Solaris shop, and even though we migrated our last Solaris boxes to Linux about 7 years ago, most of our usernames still adhere to the 8 character limit for the sake of consistency.
TheOtherHobbes•8mo ago
DEC's TOPS-10 had six chars plus a three char extension - 6.3

CP/M and MS-DOS extended this to a much more generous 8.3

tssva•8mo ago
TRSDOS for the TRS-80 line of computers was, like CP/M, heavily influenced by TOPS-10. It allowed 8-character names with a 3-letter extension but used "/" instead of "." as the separator.
drob518•8mo ago
I worked at an F500 company a couple years ago that still restricts all its email addresses to 8 characters because those double as your corporate ID and sometimes got used in Unix systems with an 8-character restriction. I don’t know if they still have any systems with that restriction. They probably don’t know either, and it has become a cargo cult.
bitwize•8mo ago
I remember the 14-character limit from the Xenix system I worked on as a kid. Xenix was one of the reasons I got into Linux in my late teens, and it blew my mind how much more flexible the latter system was, despite being very familiar.
mixmastamyk•8mo ago
I tried Minix as my first unix-like, which had similar limitations. But was used to Commodore and DOS so it felt more advanced. Linux was still a year or so away.

I remember the thrill of multitasking at the command line for the first time with &, blew my mind. :-D

Can’t remember how I obtained Minix before having access to the internet. Maybe downloaded from a BBS? Or possibly an instructor had a copy on floppy.

AStonesThrow•8mo ago
Minix was generally distributed with Andrew Tanenbaum’s book. When I installed it on my 286, it was on 3.5” floppies — two dozen or so.
mixmastamyk•8mo ago
Hmm, maybe I borrowed from a teacher. I only remember maybe two floppies however. Perhaps there was a smaller subset to install.

I do remember that many floppies for Win95 and Slackware, though a few years later.

bitwize•8mo ago
Early versions of Minix, as I recall, were supposed to be runnable off two floppies, for the benefit of those with PCs or XTs that didn't have a hard disk.
AStonesThrow•8mo ago
Surely I am exaggerating from memory. It looks like the base system for Minix could fit on one HD floppy (1.44MB).

My PS/2 HDD was only 20MB, and I recall having enough free space after the install. For Minix 1.5, there's another estimate here of 9 DD floppies -- 6.5MB compressed. That seems about right.

I definitely went for the compiler and all the accessories. The C compiler was far from ANSI-compliant and barely K&R. It made for some interesting times.

Minix 3 is BSD-licensed, and distributed on CD-ROM now, and can still be purchased from Pearson! https://www.pearson.com/en-us/subject-catalog/p/operating-sy...

didgetmaster•8mo ago
Can you imagine trying to have meaningful file names today on these new 20+ TB drives that hold a few hundred million files, if file names were still limited to 8 or even 14 characters?
megapoliss•8mo ago
For the average user nothing changes - each folder will just have "thumb.db" with all the metadata and long names.
layer8•8mo ago
You could maintain an index for the 2^64 or 2^112 names this allows.
duped•8mo ago
PATH_MAX is still comically small today.
throw0101b•8mo ago
> PATH_MAX is still comically small today.

Per getconf -a, I see PATH_MAX (and _POSIX_PATH_MAX) as 4096. Is that small? What would be not-small?

duped•8mo ago
Yes, that's small, and it's also incorrect. The correct thing is to recognize that PATH_MAX can't be defined by the kernel or in limits.h, so what you're seeing is a hint and not the actual limit.

"Not small" is "limited only by resource constraints." Software often breaks when it hits large (but correct!) paths even though there's no technical limitation to using them, and valid ways to construct them, even if POSIX APIs are required by spec to fail for some valid paths because of arbitrary limits.

Linux is actually pretty good about ignoring unnecessary error conditions even if it violates the spec, other unixes not so much.

wahern•8mo ago
The Linux kernel doesn't actually support path names longer than 4095 (4096 with NUL). See fs/namei.c:getname_flags, which is used early in every syscall that takes a file path. That function will attempt to copy the path into kernel space, but fails when the length exceeds 4095: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... (strncpy_from_user is defined in lib/strncpy_from_user.c)

IIRC, some real-world systems set PATH_MAX to INT_MAX (Solaris?), but I don't know if any modern Unix systems support arbitrary length paths. Everybody demanded features that required the kernel to cache the path in the kernel, at which point the notion of just walking the userspace path buffer (no separate allocation, no need for a limit) went out the window.

EDIT: glibc sets NL_TEXTMAX to INT_MAX. I always get confused when discussing this issue. NL_TEXTMAX made good on the threat that these MAX macros might be (effectively) unlimited and so shouldn't be used to size buffers, but I don't think any system ever did so with PATH_MAX.

AStonesThrow•8mo ago
So, a user should be able to exhaust resources by making paths longer and longer until it just errors out?

And programmers should allocate buffers of absurd size in order to contain pathnames of unpredictable depth?

rurban•8mo ago
Also, paths need to stay identifiable, and you would not be able to recognize differences in the middle of such overlong identifiers, Unicode confusables aside.
lupusreal•8mo ago
I regularly hit it when using yt-dlp, particularly from twitter where it puts the text of the tweet into the file name by default.
jwilk•8mo ago
That's more likely NAME_MAX, which is 255.
NoMoreNicksLeft•8mo ago
Yeh, but everyone still has DOS PTSD. Underscores instead of spaces, sticking to 7bit ascii characters, etc. so not much has changed. On the rare occasion that I need a slash in a filename, I've been using the full-width solidus unicode character...

Also discovered that I could fool the Macos Finder file sorting the other day if I put zero-width spaces in between numerals in the filename. Not sure how I feel about that one though.

hulitu•8mo ago
> Underscores instead of spaces

After being bitten so many times by spaces or special characters in filenames, one learns.

pjmlp•8mo ago
By the way, C compilers had a similar limitation on their symbol tables.
pavlov•8mo ago
I vaguely recall there was an early-1980s microcomputer Forth that only supported 3-letter identifiers. You could use longer variable names if you wanted, but the parser just silently truncated them, so "TYPE" and "TYPOGRAPHY" would both refer to "TYP" internally.

This sounds like a very onerous limitation, so the company's ads had copy that consisted only of three-letter words, to make the case that it's not so bad after all...

(I'm guessing the engine used four bytes to store a reference, and it needed one byte for other metadata.)

layer8•8mo ago
Who would ever need more than 17576 identifiers, especially with the memory limitations of the time? ;)
drob518•8mo ago
Forths created for constrained environments will sometimes use this sort of dictionary header structure, but it’s not just 3 characters. It also typically includes a length of the whole thing. So, TYPE would be TYP/4 (inventing a notation there) versus TYPOGRAPHY as TYP/10. While you can still get clashes, you’d be surprised how much better it does than a straight prefix.

EDIT: Sorry, I realize I didn't explain that very well. A Forth system has a "dictionary" that stored the names of words (think procedures or functions) as well as their code. The dictionary is compressed like this (e.g., TYP/4 or TYP/10). When you're writing code, you write TYPE or TYPOGRAPHY in the code and the compiler searches the dictionary for either TYP/4 or TYP/10 depending on what you typed, ignoring the other one. Most of the time this works. If you have a bunch of words that are the same length with different suffixes, they will clash (e.g., AVGX and AVGY both reduce to AVG/4).

pavlov•8mo ago
Thanks, that must be it. It makes sense as a very simple way to generate fixed 32-bit keys from words.
drob518•8mo ago
Exactly. It's a tradeoff between storage space and flexibility. Forths built for larger environments will store the whole string and won't worry about it. If you're a 32-bit Forth with GBs of memory in the system, it's no big deal. But remember that Forths are often used in 16-bit systems with KBs of memory (max 64 KB), so they're trying to be quite careful with each byte allocated. The fact that Forth can operate in such environments is what makes them popular for embedded programming.
shakna•8mo ago
I wonder if adopting soundex for such a system would be more pleasant, for dealing with spelling mistakes, or worse, because of shadowing.
drob518•8mo ago
I think that would typically be worse. Lots of letters "sound the same" and get collapsed into a reduced number of distinct Soundex letters.
teo_zero•8mo ago
> I vaguely recall there was an early-1980s microcomputer Forth that only supported 3-letter identifiers

That's still one letter more than the earlier Microsoft BASIC ;)

sdf_pubnix•8mo ago
You can play with various versions of UNIX from UNIX v0 to SVR3 and BSD at https://unix50.org through SDF Vintage Systems. Additionally, access to Multics as well as various other historical operating systems is available through their museum.