My takeaway is: glibc is bloated but fast. Quite an unexpected combination. Am I right?
Something like glibc has had decades to swap in complex, fast code for simple-looking functions.
Independently of that, glibc implements a lot of stuff that could be considered bloat:
- Extensive internationalization support
- Extensive backward compatibility
- Support for numerous architectures and platforms
- Comprehensive implementations of optional standards
Is there a fork of glibc that strips ancient or bizarre platforms?
That’s not the strongest example. I just meant it to be illustrative of the idea.
Math functions aren't going to be strongly impacted by diverse hardware support. In practice, you largely care about 32-bit and 64-bit IEEE 754 types, which means your macros to decompose floating-point types to their constituent sign/exponent/significand fields are already going to be pretty portable even across different endianness (just bitcast to a uint32_t/uint64_t, and all of the shift logic will remain the same). And there's not much reason to vary the implementation except to take advantage of hardware instructions that implement the math functions directly... which are generally better handled by the compiler anyways.
https://github.com/lattera/glibc/blob/master/string/strlen.c
The author of musl made a chart that focused on the things they cared about, benchmarked them, and found that for the things they prioritized they beat the other standard library implementations (at least going by the count of green rows)? Neat.
I mean I'm glad they made the library, that it's useful, and that it's meeting the goals they set out to solve, but what would the same chart created by the other library authors look like?
For example, Chimera Linux uses musl with mimalloc and it is quite snappy.
However, it does have super-optimized string/memory functions: hand-tuned assembly implementations that use SIMD, for dozens of different CPUs.
Having them be the same means that if there is any libc function that is best implemented by having userland call a Fil-C runtime wrapper for the yololand implementation (say because what it’s doing requires platform specific assembly) then I can be sure that the yololand libc really implements that function the same way with all the same corner cases.
But there aren’t many cases of that, and they’re hacks that I might someday remove. So I probably won’t have this “libc sandwich” forever.
What's yoyoland? All I can find is an amusement park in Bangkok, and some 1990s-era communication software for Classic Mac OS: https://www.macintoshrepository.org/39495-yoyo-2-1
- Userland: the place where your C code lives. Like the normal userland you're familiar with, but everything is compiled with Fil-C, so it's memory safe.
- Yololand: the place where Fil-C's runtime lives. Fil-C's runtime is about 100,000 lines of C code (almost entirely written by me), which currently has libc as a dependency (because the runtime makes syscalls using the normal C functions for syscalls rather than using assembly directly; also the runtime relies on a handful of libc utility functions that aren't syscalls, like memcpy).
So Fil-C has two libc's. The yololand libc (compiled with a normal C compiler, only there to support the runtime) and the userland libc (compiled with the Fil-C compiler like everything else in Fil-C userland, and this is what your C code calls into).
On Linux, if all you need is syscalls, you can just write your own syscall wrappers, like Go does.
Doesn’t work on some other operating systems (e.g. Solaris/Illumos, OpenBSD, macOS, Windows), where the system call interface is private to the system shared libraries.
Unless you do special things, the compiler turns __builtin_memcpy into a call to memcpy. :-)
There is __builtin_memcpy_inline, but then you're at the compiler's whims. I don't think I want that.
A faithful implementation of what you're proposing would have the Fil-C runtime provide a memcpy function so that whenever the compiler wants to call memcpy, it will call that function.
> On Linux, if all you need is syscalls, you can just write your own syscall wrapper-like Go does.
I could do that. I just don't, right now.
You're totally right that I could remove the yolo libc. This is one of like 1,000 reasons why Fil-C is slower than it needs to be right now. It's a young project so it has lots of this kind of "expedient engineering".
I needed a fun term to refer to the C that isn’t Fil-C. I call it Yolo-C.
Hence yololand - the part of the Fil-C process that contains a bit of Yolo-C code for the Fil-C runtime.
> It's even possible to allocate memory using malloc from within a signal handler (which is necessary because Fil-C heap-allocates stack allocations).
Hmm, really? All stack allocations are heap-allocated? Doesn't that make Fil-C super slow? Is there no way to do stack allocation? Or did I misread what you meant by 'stack allocations'?
And that GC allocation only happens if the compiler can’t prove that it’s nonescaping. The overwhelming majority of what look like stack allocations in C are proved nonescaping.
Consequently, while Fil-C does have overheads, this isn’t the one I worry about.
You say you don't have to instrument malloc(), but somehow you must learn of the allocation size. How?
Are aliasing bugs detected?
I assume that Fil-C is a whole-program-only option. That is, that you can't mix libraries not compiled with Fil-C and ones compiled with Fil-C. Is that right?
So one might want a whole distro built with Fil-C.
How much are you living with Fil-C? How painful is it, performance-wise?
BTW, I think your approach is remarkable and remarkably interesting. Of course, to some degree this just highlights how bad C (and C++) is (are) at being memory-safe.
Not sure exactly what you mean by aliasing bugs. I'm assuming strict aliasing violations. Fil-C allows a limited and safe set of strict aliasing optimizations, which end up having the effect of loads/stores moving according to a memory model that is weaker than maybe you'd want. So, Fil-C doesn't detect those. Like in any clang-based compiler, Fil-C allows you to pass `-fno-strict-aliasing` if you don't want those optimizations.
That's right, you have to go all in on Fil-C. All libs have to be compiled with Fil-C. That said, separate compilation of those modules and libraries just works. Dynamic linking just works. So long as everything is Fil-C.
Yes you could build a distro that is 100% Fil-C. I think that's possible today. I just haven't had the time to do that.
All of the software I've ported to Fil-C is fast enough to be usable. You don't notice the perf "problem" unless you deliberately benchmark compute workloads (which I do - I have a large and ever-growing benchmark suite). I wrote up my thoughts about this in a recent twitter discussion: https://x.com/filpizlo/status/1920848334429810751
A bunch of us PL implementers have long "joked" that the only thing unsafe about C is the implementations of C. The language itself is fine. Fil-C sort of proves that joke true.
I meant that if the same allocation were accessed as different kinds of objects, as if through a union, ... I guess what I really meant to ask is: does Fil-C know the types of objects being pointed to by a pointer, and therefore also the number of elements in arrays?
So, if you store a pointer to a location in memory and then load from that location using a pointer type, you get the capability that was last stored. But if the thing stored at the location was an integer, you get an invalid capability.
So Fil-C’s “type” for an object is ever evolving. The memory returned from malloc starts out as nothing but invalid capabilities for each pointer-width word in the allocation, but as soon as you store pointers into it, the locations you stored those pointers to are understood as pointer locations. This makes unions and weird pointer casts just work. But you can never type-confuse an int with a pointer, or different pointer types, in a manner that would let you violate the capability model (i.e. achieve the kind of weird state where you can access any memory you like).
Lots of tricks under the hood to make this thread safe and not too expensive.
GCC replaces memcpy/memmove/memset with its own intrinsics when compiling at high optimization levels.
Yeah, pretty obvious when they state as much in the first paragraph.
This sort of thing makes me really appreciate Zig’s comptime. Even Rust uses a macro for println!().
That being said, in those larger programs, it's still likely going to be a negligible part of the binary size, and the additional code paths are unlikely to affect performance unless you're doing string formatting in multiple hot-paths which is generally a poor choice anyway.