I'm curious to take a closer look at fmtlib/fmt: which of its APIs treat FILE as non-opaque?
Edit: ah, found some of the magic, I think: https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd...
I'm curious how much speedup is gained from this.
It was a then-important optimization to do the most common operations with macros, since calling a function for every getc()/putc() would have slowed I/O down too much.
That's why there are also fgetc()/fputc(): they're the same as getc()/putc(), but they're always defined as functions, so calling them generates less code at the call site at the expense of always requiring a function call. A classic speed-vs-space tradeoff.
But, yeah, it was a mistake that it originally used a "char" to store the file descriptor. Back then it was typical to limit processes to 20 open files ( https://github.com/dspinellis/unix-history-repo/blob/Researc... ) so a "char" I'm sure felt like plenty.
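For flavor, here is roughly what this looked like in V7-era stdio (paraphrased from memory; exact names and details varied between systems):

    /* Paraphrased V7-style stdio internals -- note the char-sized descriptor. */
    struct _iobuf {
        int   _cnt;    /* characters left in the buffer */
        char *_ptr;    /* next character to read */
        char *_base;   /* start of the buffer */
        char  _flag;   /* status flags */
        char  _file;   /* the file descriptor, stored in a char */
    };

    int _filbuf(struct _iobuf *);   /* refills the buffer on underflow */

    /* getc() as a macro: the common case is a few inline instructions,
       and a function call happens only when the buffer runs dry. */
    #define getc(p) (--(p)->_cnt >= 0 ? *(p)->_ptr++ & 0377 : _filbuf(p))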
"For you the day you changed your ABI was the most important day in your life, but for me? It was Tuesday"
I enjoy the dichotomy between how bad the Linux project is at changing their ABI and how good OpenBSD is at the same task.
For the most part, Linux just decides to live with the bad ABI forever, and if they do decide it actually needs to be changed, it becomes a multi-year drama with much crying and many missteps.
I mean, sure, Linux has additional considerations that make breaking the ABI very scary for them. The big one is the corpus of closed-source software, but being an orders-of-magnitude bigger project with overall looser integration does not help any.
From my perspective, as a user who wants his programs to keep working whenever the OS updates and as a programmer who does not want to waste time playing nanny with broken dependency upgrades for previously working code (working in the sense that it did what it was supposed to do), the Linux project is actually doing things the right way and OpenBSD the bad way. It is basically the #1 reason I never considered using OpenBSD.
Linux's stance on not breaking backwards compatibility is exactly what I want from an OS. Now if only the userspace libraries weren't so happy to break things too...
[1]: https://github.com/freebsd/freebsd-src/commit/c17bf9a9a5a3b5...
[2]: https://github.com/freebsd/freebsd-src/commit/19e03ca8038019...
[3]: https://github.com/freebsd/freebsd-src/blob/main/include/std...
Obviously making FILE opaque completely breaks every program that used this feature, so no surprise it was reverted.
Several things did not break:

* stdin, stdout, and stderr were already pointers rather than array element addresses, and the external symbol references to __stdinp, __stdoutp, and __stderrp did not change.
* Compiled code using the old macros continued to work, as the actual structure layout was not changed.
* Compiled code using FILE* would have continued to work, as the pointer implementation didn't change.
* Compiled C++ code with C++ function parameter overloading would have continued to link, as the underlying struct type did not change.
* Source code using the ferror_unlocked() and suchlike function-like macros would not have needed changing, as there were already ferror_unlocked() and suchlike functions, and those remained.
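A sketch of the macro-plus-function coexistence being described (illustrative, not FreeBSD's actual headers):

    #include <stdio.h>

    /* The function always exists in libc, so already-compiled binaries
       that reference the symbol keep working... */
    int ferror_unlocked(FILE *);

    /* ...while the header can also define a same-named function-like
       macro for speed, which newly compiled source picks up instead.
       #undef'ing it, or writing (ferror_unlocked)(f), still reaches the
       real function. A hypothetical macro body, assuming a _flags field
       and a __SERR error bit, would look like:

           #define ferror_unlocked(f) (((f)->_flags & __SERR) != 0)   */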
Looking at things like https://reviews.freebsd.org/D4488 from 2015 there was definitely stuff in the ports tree that would have broken back in 2008. But that won't break now should this change be made again, and that's not base.
What actually broke was libftpio, a library that was in base up until 2011, which definitely won't break now, nearly 14 years after being removed as orphaned once sysinstall(8) itself went away.
* https://cgit.freebsd.org/src/commit/lib/libftpio?id=430f2c87...
It may not be all-encompassing, but I was referring to GNU/Linux. You can swap out bits and pieces, but what mainstream distros include by default is what I meant.
Quite an acceptable price for not having the headache of things breaking.
When you have major platform updates like the Windows NT rewrite back in 2000, Windows 8, and now Windows 11, they're opportunities to shed legacy things. The choice should have been to keep supporting a long-term-stable version of Windows for security fixes (like XP or Win7) and get rid of the tech designed to support old software.
Their problem now is they want everyone to be on win10 and then win11 and then whatever they come up with, or else.
You can carry legacy dependencies with you to new major versions or you can support old versions for security fixes long-term.
That said, there isn't a one-to-one correlation between the gains of dropping a legacy component and the increased capability to innovate either. Windows is highly modular, and maintaining legacy compatibility is rarely a blocker for a line of innovation, nor is it a high-budget workload.
I wonder though, what kind of innovation do you want to see but think that it's held back by legacy components?
>FILE Encapsulation: In previous versions, the FILE type was completely defined in <stdio.h>, so it was possible for user code to reach into a FILE and muck with its internals. We have refactored the stdio library to improve encapsulation of the library implementation details. As part of this, FILE as defined in <stdio.h> is now an opaque type and its members are inaccessible from outside of the CRT itself.
https://devblogs.microsoft.com/cppblog/c-runtime-crt-feature...
Ugh, no, it should not. As a user I prefer my existing programs to keep working whenever I update my OS, and as a developer I prefer working on new code to playing nanny with existing, previously working code (working code here means code that did the task it was supposed to do) because some dependency broke itself.
Sadly, the userland does not have the same mindset as very few libraries and languages care about backwards compatibility.
However, at least on Linux it is possible to do that if you care, by choosing languages and libraries that do not break compatibility, without worrying that the OS will break things. On OpenBSD you may write your program in C (which has strong backwards compatibility as a language) and use only libraries such as curl, cairo, OpenGL, OpenAL, and X11 that in general do not break backwards compatibility, and yet have the OS throw all that effort out of the window.
Features shouldn't be gotten rid of "just because" either; there needs to be a good reason, like a better alternative.
On Windows the user space APIs do not change though - and Windows has a MUCH larger API exposed compared to Linux. Of course stuff does break occasionally, but considering how much software is released constantly for Windows, that breakage is incredibly minimal (and Microsoft doesn't seem to care as much about it as they used to in the past).
> For example, why hasn't ripgrep (and similar tools) replaced grep as a default?
I do not think this is a case of replacing anything; you can use ripgrep alongside grep just fine, as they are two different programs.
However, way more people know grep, and a ton of scripts, applications, etc. assume grep is there, so it makes sense both to avoid breaking existing working stuff and to take advantage of people's existing knowledge. If someone needs ripgrep's performance they can use it instead.
> i get supporting old stuff, that's why there are LTS releases.
LTS releases exist largely exactly because stuff breaks way more than it should.
> But I think it is the fear of not being able to shed enough old things in new releases to displace the maintenance burden of LTS type releases that's the big issue.
No, there is nothing about "fear" here; it is purely practical: breaking stuff that works is not a good thing. If there is some better way to do things then that is fine, but in 99.9999% of cases it can be implemented alongside the existing stuff (sometimes even having the existing stuff use the new thing) instead of breaking people's code (and, even worse, forcing them to waste time learning how to use a new tool just so they can do the exact same thing they were already doing previously).
> Features shouldn't be gotten rid of "just because" either; there needs to be a good reason, like a better alternative.
No, a "better alternative" is not a good reason by itself; to get rid of something, you must make sure that practically nobody uses it. For example, if there was a bug in a feature or program that made it unusable and yet nobody bothered to report it in years, that is a good indicator nobody is using that feature/program, so it can be removed. But as long as there are people who rely on things, you should never break them.
You can’t just memcpy the bits and then mix calls to fread using pointers to the old and the new FILE struct, for example. I think the standard library need not even support calls using a pointer to a FILE struct it didn’t create.
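For instance, a sketch of the kind of code that has never been guaranteed to work (and that only even compiles on platforms where FILE is a complete type; the filename is made up):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("data.txt", "rb");
        if (!f) return 1;

        FILE copy;
        memcpy(&copy, f, sizeof(FILE));   /* FILE is not required to be copyable */

        char buf[16];
        fread(buf, 1, sizeof buf, &copy); /* the library may not recognize &copy
                                             as a stream it created */
        fclose(f);
        return 0;
    }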
You certainly shouldn't, but sadly this is something that people do.
See https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/202306...
When the pointer is passed back into libc, libc can combine the pointer with an internal capability that has the actual size/range of the structure.
This isn't _too_ different to having libc just hand out arbitrary integers as FILE; libc has to have some way to map the 'FILE' back to the real structure.
The big breaking change is usually the historical implementation of the standard streams as addresses of elements of an array rather than as named pointers. (Plauger's example implementation had them as elements 0, 1, and 2 of a _Files[] array, for example.) It's possible to retain binary compatibility with unrecompiled code that uses the old getc/putc/feof/ferror/clearerr/&c. macros by preserving structure layouts, but changing stdin, stdout, and stderr can make things not link.
And indeed that has happened here.
The warning, and the bumping of several shared-library major version numbers, is most definitely about the standard streams breaking binary compatibility, not source compatibility as you have it. Any newly compiled binary that uses the C standard streams won't run on old shared libraries because of the new symbol references for __stdin, __stdout, and __stderr.
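Roughly the two styles at issue, with names after Plauger's example implementation (illustrative, not any particular libc's actual headers):

    /* "FILE" here stands in for the real stream structure. */
    typedef struct __sketch_FILE FILE;

    #ifdef OLD_STYLE
    /* Streams as addresses of array elements: the array's layout and
       element size get baked into every compiled binary. */
    extern FILE _Files[];
    #define stdin  (&_Files[0])
    #define stdout (&_Files[1])
    #define stderr (&_Files[2])
    #else
    /* Streams as named pointers: the objects behind them can move or
       change size without changing the symbols binaries reference. */
    extern FILE *__stdinp;
    extern FILE *__stdoutp;
    extern FILE *__stderrp;
    #define stdin  __stdinp
    #define stdout __stdoutp
    #define stderr __stderrp
    #endif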
Does anyone know why this change was done? Security reasons? Preparing for future changes?
    ...
    struct _IO_FILE;

    /* The opaque type of streams.  This is the definition used elsewhere.  */
    typedef struct _IO_FILE FILE;

    ...

    struct _IO_FILE
    {
      int _flags;
      ...
so I suppose this means it's indeed expanded somewhere and, thus, not opaque? EDIT: after some more thinking, I assume the key is that we wouldn't be able to have a variable of type FILE, but a pointer, whose size is always known.
> a pointer, whose size is always known
Yeah, this is exactly how it works. You work with a pointer that acts like a void* in your code, and the library with the definition is allowed to reach into the fields behind that pointer. Normally you'd have a C API like

    typedef struct Op Op;   /* the definition of struct Op is hidden in the library */

    Op* init_op(void);
    void free_op(Op*);
    void do_something_with_op(Op*);

in the header provided by the library, which you compile as part of your code, with the definition/implementation in some .a or .so/.dll that you'll link against.
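For completeness, a minimal sketch of the matching implementation side (hypothetical names, continuing the example above):

    /* op.c -- compiled into the library; the struct layout lives only here. */
    #include <stdlib.h>

    typedef struct Op Op;   /* same typedef the public header exposes */

    struct Op {
        int state;          /* callers never see this field */
    };

    Op* init_op(void) {
        Op *op = malloc(sizeof *op);
        if (op) op->state = 0;
        return op;
    }

    void do_something_with_op(Op *op) {
        op->state++;        /* only the library touches the internals */
    }

    void free_op(Op *op) {
        free(op);
    }

Because user code only ever holds an Op*, the library can reorder or add fields without breaking anything, which is exactly the property FILE loses once its definition appears in a public header.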
pjmlp•6mo ago
So it wouldn't surprise me that a few folks would do some tricks with FILE internals.
recipe19•6mo ago
https://github.com/openbsd/src/commit/b7f6c2eb760a2da367dd51...
If you expose it, someone will probably sooner or later use it, but probably not in any sane / portable code. On the face of it, it doesn't seem like a consequential change, but maybe they're mopping up after some vulnerability in that one weird package that did touch this.
fweimer•6mo ago
https://cgit.git.savannah.gnu.org/cgit/gnulib.git/tree/lib/s...
Yes, it's not a good idea to do this. There are more questionable pieces in gnulib, like closing stdin/stdout/stderr (because fflush and fsync are deemed too slow, and a regular close reports some errors on NFS on some systems that would otherwise go unreported).
collinfunk•6mo ago
P.S. Hi Florian :)
collinfunk•6mo ago
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=69a0...
quotemstr•6mo ago
Hyrum's law strikes again. People cast dl_info and poke at internal bits all the time too.
glibc and others should be using kernel-style compiler-driven struct layout randomization to fight it.
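The kernel mechanism being alluded to is the randstruct GCC plugin, which shuffles the fields of structs marked for it. A sketch of the shape (the attribute is ignored, with a warning, when building without the plugin):

    /* Under the randstruct plugin, fields of a struct marked this way are
       reordered at build time, so code poking at internals via hard-coded
       layout assumptions breaks loudly rather than silently. */
    struct parser_state {
        int   flags;
        char *buffer;
        long  offset;
    } __attribute__((randomize_layout));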
jancsika•6mo ago
Is there a name for APIs that are drawn directly from some subset of observed behaviors?
Like Crockford going, "Hey, there's a nice little data format buried in these JS objects. Schloink"
quotemstr•6mo ago
Desire paths. https://en.wikipedia.org/wiki/Desire_path
pm215•6mo ago
https://cgit.git.savannah.gnu.org/cgit/nmh.git/tree/sbr/m_ge...
It's basically searching an email file to find the contents of either a given header or the mail body. These days there is no need to go under the hood of libc for this (and this code got ripped out over a decade ago), but back when the mail client was running on elderly VAXen, this ate up significant time. Sneaking in and reading directly from the internal stdio buffer lets you avoid copying all the data the way an fread would. The same function also used to have a bit of inline VAX assembly for string searching...
The only reason this "works" is that traditionally the FILE struct is declared in a public header so libc can have some of its own functions implemented as macros for speed, and that there was not (when this hack was originally put in, in the 1980s) yet much divergence in libc implementations.
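Very roughly, the kind of trick being described, using the traditional BSD-style FILE fields _p and _r (field names varied by libc, and none of this is portable; it is precisely the poking-at-internals this thread is about):

    #include <stdio.h>

    /* Peek at whatever is already sitting in stdio's internal read buffer,
       avoiding the copy that fread() into a private buffer would do. */
    static size_t peek_buffered(FILE *f, const unsigned char **out) {
        if (f->_r > 0) {          /* _r: bytes remaining in the buffer */
            *out = f->_p;         /* _p: current read position */
            return (size_t)f->_r;
        }
        return 0;                 /* nothing buffered; caller must refill */
    }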
krylon•6mo ago
OTOH, when coding, I consider FILE to be effectively opaque in the sense that it probably is not portable, and that the implementers might change it at any time.
I am reminded of this fine article by Raymond Chen, which covers a similar situation on Windows way back when: https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42...
brokencode•6mo ago
But the sad reality is that many developers (myself included earlier in my career) will do insane things to fix a critical bug or performance problem when faced with a tight deadline.
ars•6mo ago
Or functionality. Happens to me all the time: I have some Java class that's marked final, so instead of just extending the class and moving on, I have to copy/paste the entire class wholesale to accomplish my goal.
Personally I hate "nanny" languages that block you from accessing things. It's my computer, and my code, and my compiler. Please don't do things "for my own good", I can decide that for myself.
(And yes, I am aware of the argument that this lets the original programmer change the internals; in practice it's not such a big problem. Or the cure is worse than the problem, as with my copy/paste example.)
Another example is a private constant. Instead of allowing me to reference it, I have to copy it. How is that any better? If the programmer has to change how the constant works then they can do so, and at that point my code will break and I'll .... copy the constant. But until then I can just use the constant.
Tractor8626•6mo ago
All projects mentioned should have forked stdio and added their hacks/optimisations/functionality to that.
They were just too lazy. Can't blame them, though. Writing C code is torture, after all. One should cut all the corners one can.
ars•6mo ago
Other way around. When I first started, I thought these access restrictions were a great idea: make sure I, and others, only program "correctly".
When I got more mature I found how often they are impediments, and how little they actually help.
I don't want someone else protecting me from myself. I just don't want that, no matter how well intentioned.
Tractor8626•6mo ago
About access restrictions: we have two nice examples here.
1. Stdio devs can't freely make modifications because someone's code depends on private implementation details.
2. And your example: you tried to do the wrong thing, and those access restrictions made you suffer and do the right thing.
Works as intended, I'd say.
high_na_euv•6mo ago
You can usually use other langs
bitwize•6mo ago
"Once upon a time, pointers on the Macintosh had 24 bits. The upper 8 bits were reserved for flags. Apple warned developers not to look directly at the flags in the upper 8 bits, but to use the macros that were supplied as part of the API -- but third-party developers looked directly at the upper 8 bits anyway. When System 7 came out with full 32-bit pointers, a lot of old applications broke because of this!"
Of course, what he didn't mention at the time was that System 7 provided a toggle that allowed these programs to run with old-school 24-bit pointers -- the equivalent concession is something I don't think OpenBSD is willing to make.
Nevertheless, vendors can and have broken full backward compatibility in cases where the developers "should've known better". Hyrum's Law just states that there will be a few that don't get the message and will watch their software break when these changes are made...
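A sketch of the tag-bits trick being described (illustrative C, not actual Mac OS code):

    #include <stdint.h>

    /* While addresses were only 24 bits wide, the top byte of a 32-bit
       pointer value was "free", so software stashed flags there. */
    #define ADDR_MASK 0x00FFFFFFu

    uint32_t tag_pointer(uint32_t addr, uint8_t flags) {
        return (addr & ADDR_MASK) | ((uint32_t)flags << 24);
    }

    /* Only valid while real addresses fit in 24 bits; once the system
       uses all 32 bits, masking silently corrupts the address, which
       is exactly why those applications broke under System 7. */
    uint32_t untag_pointer(uint32_t tagged) {
        return tagged & ADDR_MASK;
    }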
skissane•6mo ago
I guess part of why it is not in the standard is that it is rarely requested functionality, but there are rare use cases where it may have value. And I think it is an unfortunate lack of orthogonality to have a setter but no corresponding getter.
stdio_ext.h offers some functionality like a "getvbuf", but not quite. For example, __fbufsize tells you a stream's buffer size, and __flbf whether it is line-buffered, but it isn't clear how to distinguish fully buffered from unbuffered streams. And stdio_ext.h has never been standardised; it is an extension invented on Solaris and copied by Linux (and a few other platforms, e.g. IBM z/OS).
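A small example of what those extensions give you on glibc/Solaris (the functions are the ones from <stdio_ext.h> mentioned above; the program itself is just a sketch):

    #include <stdio.h>
    #include <stdio_ext.h>   /* non-standard: glibc/Solaris extension */

    static void describe(FILE *f, const char *name) {
        /* __flbf: nonzero if line-buffered; __fbufsize: buffer size.
           As noted above, fully buffered vs. unbuffered is not directly
           distinguishable from these alone. */
        printf("%s: %s, buffer size %zu\n", name,
               __flbf(f) ? "line-buffered" : "not line-buffered",
               __fbufsize(f));
    }

    int main(void) {
        describe(stdout, "stdout");
        describe(stderr, "stderr");
        return 0;
    }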