Windows support is a requirement, and no, WSL2 doesn’t count.
The C standard library is pretty bad, and it’d be great if not using it were a little easier and more common.
I wrote this little system-wide mute utility for Windows that way. It's annoying to be missing some parts of the CRT, but not bad. Code here: https://github.com/pablocastro/minimute
The project has some of the properties discussed above, such as not having a typical main() (or WinMain), because there’s no CRT to call it.
You have your usual Win32 API functions in libraries like Kernel32, User32, and GDI32, but since Windows XP those don't actually make system calls themselves. The actual system calls are found in NTDLL and Win32U. There are lots of functions you can import, and they're basically one instruction long: just SYSENTER (SYSCALL on x64) for the native version, or a switch back to 64-bit mode for a WOW64 DLL. The names of the functions always begin with Nt, like NtCreateFile. There's a corresponding kernel-mode call that starts with Zw instead, so in kernel mode you have ZwCreateFile.
But the system call numbers are indeed reordered between Windows versions, so if you want to make a system call directly, you call the stub in NTDLL or Win32U instead of issuing the instruction yourself.
Why, exactly?
For what?
There is some software for which Windows support is required. There is other software for which it is not, and never will be. (And for an article about running ELF files on RISC-V with a Linux OS, the "Windows support" complaint seems a bit odd...)
I’ve spent quite a lot of time dealing with code that "will only ever run on Linux" that did not, in fact, only ever run on Linux!
Obviously for hobby projects anyone can do what they want. But adult projects should support Windows imho and consider Windows support from the start. Cross-platform is super easy unless you choose to make it hard.
Probably a server that is only ever run by a single company on a single CPU type. That company will have complete control of the OS stack, so if it says no Windows, then no Windows has to be supported.
[1] "svc 0" on ARM, "int 0x80" on i386, etc...
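For comparison, on x86-64 Linux the instruction is `syscall`. The sketch below issues write(2) directly with GCC/Clang inline assembly, assuming an x86-64 Linux target; on other targets it falls back to libc's generic `syscall()` helper. The name `raw_write` is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue write(2) without going through libc's write() wrapper.
   x86-64 Linux convention: syscall number in rax, arguments in rdi, rsi,
   rdx; the kernel clobbers rcx and r11; the result comes back in rax. */
static long raw_write(int fd, const void *buf, size_t len) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
    return ret; /* bytes written, or -errno on failure (errno is not set) */
#else
    /* Other targets: libc's generic syscall() helper, which returns -1
       and sets errno on failure instead of returning -errno. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Note the different error conventions: the raw instruction hands back a negative errno, and it is the libc wrapper that turns that into -1 plus `errno`.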
So the only part that gets "bloated" is the Win32 API itself (which is spread across multiple DLLs and doesn't actually bloat RAM usage). Most of the time even those functions and structures are carefully designed with some future-proofing, but it is common to see APIs like CreateFile, CreateFile2, CreateFile3. Internally, the earlier versions are upgraded to call the latest version, so there's not much bloat there either.
When the C runtime and the OS system calls are combined into a single library, as on POSIX systems, it creates the ABI hell we're in with the modern Unix-likes. Either the OSes have to regularly break C ABI compatibility for updates, or we have to live with terrible implementations.
The GNU libc and Linux combo is particularly bad. On GNU/Linux (and with any of the current libc replacements), dynamic loading is also provided by the C library. This makes "forever" binary compatibility particularly tricky to achieve. Glibc broke certain games / Steam by removing some parts of its ELF implementation: https://sourceware.org/bugzilla/show_bug.cgi?id=32653 . They backed down due to huge backlash from the community.
If "the year of the Linux desktop" is ever going to happen, they need to either do an Android and change the definition of what a software package is, or split glibc into three parts: syscalls, the dynamic loader, and the actual C library.
PS: There is actually a catch to your "C runtime is optional" argument. Microsoft still intentionally holds back the ability to compile native-ABI Windows programs without Visual Studio.
The structured exception handlers (Windows' equivalent of SIGILL, SIGBUS, etc., though not of SIGINT or SIGTERM) are populated by object files from the C runtime libraries (called VCRuntime/VCStartup). So it is actually not possible to produce official Windows binaries without MSVC or another C runtime, like MinGW-w64, that provides those symbols. It looks like some developers at Microsoft wanted to open-source VCRuntime / VCStartup, but it was ~vetoed~ not fully approved by some people: https://github.com/microsoft/STL/issues/4560#issuecomment-23... , https://www.reddit.com/r/cpp/comments/1l8mqlv/is_msvc_ever_g...
What is left of the C standard library, if you remove syscall wrappers?
> ABI hell
Is that really the case? From my understanding, the problem is more that Linux isn't an OS, so you can't rely on any *.so being there.
> What is left of the C standard library, if you remove syscall wrappers?
Still quite a bit, actually. Stuff like malloc, realloc, free, fopen, FILE, getaddrinfo, getlogin, math functions like cos, sin, tan, stdatomic implementations, and some string functions are all defined in the C library. They are not direct system calls, unlike open, read, write, ioctl, setsockopt, capget, capset ....
> > ABI hell
> Is that really the case? From my understanding the problem is more, that Linux isn't an OS, so you can't rely on any *.so being there.
That's why I used the more specific term GNU/Linux at the start. There is no guarantee that any .so file can be successfully loaded, even if it is there. Glibc can break anything. With the Steam bug I linked, this is exactly what happened: the shared object files were there, but glibc stopped supporting a certain ELF file field.
There is one and only one guarantee with Linux-based systems: syscalls (and other ways of talking to the kernel, like ioctl struct memory layouts, etc.) keep working.
There is so much invisible dependence on glibc behavior. Glibc also controls how DNS resolution works for programs, for example. That also needs to be split into a different library. Same for managing user info like `getlogin`. Moreover, all this functionality is implemented in glibc as dynamically loaded plugins (NSS, the Name Service Switch) that rely on ld.so, which is also shipped by glibc. It is literally a Medusa head of snakes that bite multiple tails. It is extremely hard to test ABI breakages like this.
Wrapper around sbrk, mmap, etc., whatever the modern variant is.
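To make the "wrapper around mmap" point concrete, here is a deliberately naive bump allocator built directly on mmap(2). It is a sketch assuming Linux/POSIX with `MAP_ANONYMOUS`; the `bump_*` names are hypothetical, and real malloc implementations add free lists, size classes, per-thread arenas, and also use brk.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena: one anonymous mapping from the kernel, handed out
   in 16-byte-aligned slices. There is no per-object free. */
typedef struct { unsigned char *base; size_t cap, used; } bump_arena;

static int bump_init(bump_arena *a, size_t cap) {
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p; a->cap = cap; a->used = 0;
    return 0;
}

static void *bump_alloc(bump_arena *a, size_t n) {
    size_t start = (a->used + 15) & ~(size_t)15;  /* round up to 16 */
    if (start > a->cap || n > a->cap - start) return NULL;  /* out of space */
    a->used = start + n;
    return a->base + start;
}

static void bump_release(bump_arena *a) {
    munmap(a->base, a->cap);  /* the whole arena goes back at once */
    a->base = NULL; a->cap = a->used = 0;
}
```

The only kernel interaction is the single mmap/munmap pair; everything in between is plain pointer arithmetic in user space, which is the part a C library's allocator actually provides.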
> fopen, FILE
Wrapper around open, write, read, close.
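As a rough illustration of what FILE adds on top of those syscalls, here is a hypothetical `mini_file` that puts a user-space buffer in front of write(2). It is a sketch assuming POSIX; real stdio also handles reading, line buffering, locking, and more.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mini FILE: buffered output in front of write(2). */
typedef struct { int fd; size_t len; char buf[4096]; } mini_file;

static int mini_flush(mini_file *f) {
    size_t off = 0;
    while (off < f->len) {
        ssize_t n = write(f->fd, f->buf + off, f->len - off);
        if (n < 0) {
            if (errno == EINTR) continue;  /* retry interrupted writes */
            return -1;
        }
        off += (size_t)n;                  /* handle short writes */
    }
    f->len = 0;
    return 0;
}

static int mini_write(mini_file *f, const void *data, size_t n) {
    const char *p = data;
    while (n > 0) {
        /* Only call into the kernel when the buffer is full. */
        if (f->len == sizeof f->buf && mini_flush(f) != 0) return -1;
        size_t room = sizeof f->buf - f->len;
        size_t chunk = n < room ? n : room;
        memcpy(f->buf + f->len, p, chunk);
        f->len += chunk; p += chunk; n -= chunk;
    }
    return 0;
}
```

Many small `mini_write` calls become a few large write(2) calls, which is the main performance reason stdio exists as a layer at all.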
> stdatomic implementations
You could argue these are wrappers around thread syscalls.
> math functions like cos, sin tan, some string functions are all defined in C library
True for these, but they are so small that they could just be inlined directly; on their own they wouldn't necessarily deserve a library.
> That's why I used more specific term GNU/Linux at the start.
While GNU/Linux does describe a complete OS, it doesn't describe any specific OS. Every distro does its own thing, so I think the distro is what you actually need to call an OS. But everything is built so that the user can take control over the architecture and over which components the OS consists of, so every installation can be a snowflake, and then it is technically its own OS.
I personally consider libc and the compiler (which together make up a C implementation) to be part of the OS. I think this is grounded both in theory and in practice. Only in some weird middle ground between theory and practice can you consider them not to be.
This caused me a lot of pain while trying to debug a 3rd-party Java application that was trying to launch an executable script and kept throwing an IO error: "java.io.IOException: error=2, No such file or directory." I was puzzled because I knew the script was right there (I was using its full path) and it had the executable bit set. It turned out that the shebang in the script was wrong, so the OS was complaining about the missing interpreter (the actual error from a shell would be "The file specified the interpreter '/foo/bar', which is not an executable command."), but the Java error was completely misleading :|
Note: If you wonder why I didn't see this error by running the script myself: I did, and it ran fine locally. But the application was running on a remote host that had a different path for the interpreter.
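That error=2 is ENOENT, and it comes straight from execve(2): when the #! interpreter doesn't exist, the kernel reports ENOENT even though the script itself is present and executable. A minimal reproduction, assuming Linux and a writable /tmp that is not mounted noexec; the script path and helper name are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write an executable script whose #! line names a missing interpreter,
   then try to exec it and return the resulting errno. */
static int exec_errno_for_bad_shebang(const char *script_path) {
    const char body[] = "#!/nonexistent/interpreter\necho hi\n";
    int fd = open(script_path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0) return -1;
    if (write(fd, body, sizeof body - 1) != (ssize_t)(sizeof body - 1)) {
        close(fd);
        return -1;
    }
    close(fd);
    chmod(script_path, 0755);  /* make sure the exec bit survives the umask */

    char *argv[] = { (char *)script_path, NULL };
    char *envp[] = { NULL };
    execve(script_path, argv, envp);  /* only returns on failure */
    return errno;                     /* expected: ENOENT */
}
```

The script path is valid; it is the interpreter lookup inside the kernel that fails, and the caller just sees "No such file or directory".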
This is not how dynamic linking works on GNU/Linux. The kernel processes the program headers for the main program (mapping the PT_LOAD segments, without relocating them) and notices the PT_INTERP program interpreter (the path to the dynamic linker) among the program headers. The kernel then loads the dynamic linker in much the same way as the main program (again without relocation) and transfers control to its entry point. It's up to the dynamic linker to self-relocate, load the referenced shared objects (this time using plain mmap and mprotect; the kernel ELF loader is not used for that), relocate them and the main program, and then transfer control to the main program.
The scheme is not that dissimilar to the #! shebang lines, with the dynamic linker taking the role of the script interpreter, except that ELF is a binary format.
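The PT_INTERP "shebang" can be read out of a binary with a few lines of C. This is a sketch assuming an ELF64 file in native byte order and the Linux `<elf.h>` header; `read_interp` is a hypothetical name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP string (the dynamic linker path) of an ELF64 file
   into out. Returns 0 on success, 1 if there is no PT_INTERP (e.g. a
   static binary), -1 on error. */
static int read_interp(const char *path, char *out, size_t outlen) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    Elf64_Ehdr eh;
    int rc = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize < sizeof(Elf64_Phdr))
        goto done;
    rc = 1;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            rc = -1;
            break;
        }
        if (ph.p_type != PT_INTERP) continue;
        /* The on-disk string includes its terminating NUL. */
        if (ph.p_filesz == 0 || ph.p_filesz > outlen ||
            fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(out, 1, ph.p_filesz, f) != ph.p_filesz) {
            rc = -1;
            break;
        }
        out[ph.p_filesz - 1] = '\0';
        rc = 0;
        break;
    }
done:
    fclose(f);
    return rc;
}
```

Called on "/proc/self/exe" from a dynamically linked program on a typical x86-64 distro, this should yield something like the ld-linux-x86-64.so.2 path that `readelf -l` prints as "Requesting program interpreter".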
Loading ELFs and processing relocations is actually not too bad. It’s fun after the initial learning curve.
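For a taste of the relocation side: in a position-independent x86-64 binary, R_X86_64_RELATIVE entries are typically the most numerous, and their value is simply the load base plus the addend. The sketch below applies only that type to an image already copied into memory. It assumes x86-64 relocation semantics, skips symbol lookup entirely, and the function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply R_X86_64_RELATIVE relocations (value = base + addend) to an image
   of image_len bytes loaded at base. Other types are ignored here.
   Returns the number of relocations applied. */
static size_t apply_relative_relocs(unsigned char *base, size_t image_len,
                                    const Elf64_Rela *rela, size_t n) {
    size_t applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE) continue;
        /* Skip entries that would write outside the image. */
        if (image_len < sizeof(uint64_t) ||
            rela[i].r_offset > image_len - sizeof(uint64_t)) continue;
        uint64_t value = (uint64_t)(uintptr_t)base + (uint64_t)rela[i].r_addend;
        memcpy(base + rela[i].r_offset, &value, sizeof value); /* unaligned-safe */
        applied++;
    }
    return applied;
}
```

Symbol-based relocations (GLOB_DAT, JUMP_SLOT) additionally need a symbol lookup across loaded objects, which is where the dlopen bookkeeping mentioned next starts to hurt.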
Then one has to worry about handling of “dlopen” and the loader creating the data structures it cares about. Yuck!!!
It’s kinda a shame because the glibc loader is a bit bloated with all the audit and preload handling. Great for flexibility, not for security.
https://www.kernel.org/doc/html/latest/admin-guide/binfmt-mi...
iirc, execve maps the PT_LOAD segments from the program headers, populates the aux vector on the stack, and jumps straight to the ELF interpreter's entry point. Any linked objects are loaded in userspace by the ELF interpreter. The kernel has no knowledge of the PLT/GOT.
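The aux vector is visible from any C program through getauxval(), a C library helper (glibc 2.16+, musl) rather than a syscall. This sketch, assuming Linux, prints a few of the entries the kernel left for the interpreter; `show_auxv` is a hypothetical name.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Print a few auxiliary-vector entries that execve(2) put on the stack. */
static void show_auxv(FILE *out) {
    fprintf(out, "AT_PHDR   = %#lx  program headers of the main executable\n",
            getauxval(AT_PHDR));
    fprintf(out, "AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
    fprintf(out, "AT_ENTRY  = %#lx  entry point of the main executable\n",
            getauxval(AT_ENTRY));
    fprintf(out, "AT_BASE   = %#lx  load address of the ELF interpreter (0 if none)\n",
            getauxval(AT_BASE));
    fprintf(out, "AT_PAGESZ = %lu\n", getauxval(AT_PAGESZ));
}
```

In a dynamically linked program AT_BASE is non-zero and AT_ENTRY still points into the main executable, even though the kernel actually jumped to the interpreter's entry point first; the interpreter uses AT_ENTRY to hand over control once relocation is done.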
Most diagrams in books and slides use an old hardware-centric convention: they draw higher addresses at the top of the page and lower addresses at the bottom. People sometimes justify this with an analogy like “floors in a building go up,” so address 0x7fffffffe000 is drawn “higher” than 0x400000.
But this is backwards from how humans read almost everything today. When you look at code in VS Code or any other IDE, line 1 is at the top, then line 2 is below it, then 3, 4, etc. Numbers go up as you go down. Your brain learns: “down = bigger index.”
Memory in a real Linux process actually matches the VS Code model much more closely than the textbook diagrams suggest.
You can see it yourself with:
cat /proc/$$/maps
(pick any PID instead of $$).
...
[0x00000000]          lower addresses
...
[0x00620000]          HEAP start
[0x00643000]          HEAP extended ↓ (more allocations => higher addresses)
...
[0x7ffd8c3f7000]      STACK top (<- stack pointer) ↑ the stack pointer starts here and moves upward (toward lower addresses) when you push
[0x7ffd8c418000]      STACK start
...
[0xffffffffff600000]  higher addresses
The output is printed from low addresses to high addresses. At the top of the output you'll usually see the binary, shared libs, heap, etc. Those all live at lower virtual addresses. Farther down in the output you'll eventually see the stack, which lives at a higher virtual address. In other words: as you scroll down, the addresses get bigger, exactly like scrolling down in an editor gives you bigger line numbers.
The phrases “the heap grows up” and “the stack grows down” aren't wrong. They're just describing what happens to the numeric addresses: the heap expands toward higher addresses, and the stack moves into lower addresses.
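The low-to-high ordering of the maps file is easy to check programmatically. A small sketch, assuming Linux and the usual "start-end perms ..." line format; `maps_are_ascending` is a hypothetical name.

```c
#include <stdio.h>

/* Returns 1 if each mapping in /proc/self/maps starts above the previous
   one (sorted low to high), 0 if not, -1 if the file can't be read. */
static int maps_are_ascending(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;
    unsigned long long start, end, prev = 0;
    int first = 1, ok = 1;
    /* Read the "start-end" range, then skip the rest of the line. */
    while (fscanf(f, "%llx-%llx%*[^\n]", &start, &end) == 2) {
        if (!first && start <= prev) ok = 0;
        prev = start;
        first = 0;
        fgetc(f); /* consume the newline */
    }
    fclose(f);
    return first ? -1 : ok;
}
```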
The real problem is how we draw it. We label “up” on the page as “higher address,” which is the opposite of how people read code or even how /proc/<pid>/maps is printed. So students have to mentally flip the diagram before they can even think about what the stack and heap are doing.
If we just drew memory like an editor (low addresses at the top, high addresses further down) it would click instantly. Scroll down, addresses go up, and the stack sits at the bottom. At that point it’s no longer “the stack grows down”: it’s just the stack pointer being decremented, moving to lower addresses (which, in the diagram, means moving upward).
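The decrement is easy to observe, although C itself makes no promise about stack direction. This sketch, assuming GCC or Clang (for `noinline`) on a mainstream Linux target such as x86-64 or AArch64, compares the address of a local in a caller with one in its callee; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if this (callee) frame's local lives below the caller's local. */
__attribute__((noinline)) static int callee_is_below(uintptr_t caller_local) {
    volatile int inner = 0;
    return (uintptr_t)&inner < caller_local;
}

/* Returns 1 when calling one level deeper moved the stack pointer to lower
   addresses. The write after the call keeps the compiler from turning it
   into a tail call, which could reuse the caller's frame. */
__attribute__((noinline)) static int stack_moves_to_lower_addresses(void) {
    volatile int outer = 0;
    int result = callee_is_below((uintptr_t)&outer);
    outer = result;
    return result;
}
```

On those targets the result is 1: the deeper frame sits at a smaller address, i.e. higher up in an editor-style diagram.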
In your example, "the stack grows down" seems to be wrong in the image.
The problem is that most textbooks draw the opposite, so the student leaves my lecture, opens a book or a slide deck, and now “down” means a different thing.
It gets worse when they get curious and look at a real process with /proc/<pid>/maps. Linux prints mappings from low address to high address as you scroll down (which matches my representation). That is literally reversed from the usual textbook diagram. Students notice and ask why the book is “wrong.”
So I've learned I have to explicitly call this out as notation.
Same story as electronics classes still teaching conventional current flow (positive to negative), even though electrons move the other way (negative to positive). Source: https://www.allaboutcircuits.com/textbook/direct-current/chp.... Historical convention, and then pedagogy has to patch it forever.
Related: In notation, one thing that I used to struggle with is how addresses (e.g. 0xAB_CD) actually have the byte representation [0xCD, 0xAB]. I wonder if there's a common way to address that?
About the address notation you're describing, I'm not sure I fully get the problem. Can you spell out the question with a concrete example?
This is what the address space of a real bash process looks like on my machine:
__
$ cat /proc/$(pidof bash)/maps
5e6e8fd0f000-5e6e8fd3f000 r--p 00000000 fc:00 3539412 /usr/bin/bash
5e6e8fd3f000-5e6e8fe2e000 r-xp 00030000 fc:00 3539412 /usr/bin/bash
5e6e8fe2e000-5e6e8fe63000 r--p 0011f000 fc:00 3539412 /usr/bin/bash
5e6e8fe63000-5e6e8fe67000 r--p 00154000 fc:00 3539412 /usr/bin/bash
5e6e8fe67000-5e6e8fe70000 rw-p 00158000 fc:00 3539412 /usr/bin/bash
5e6e8fe70000-5e6e8fe7b000 rw-p 00000000 00:00 0
5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]
7ec3d1400000-7ec3d16eb000 r--p 00000000 fc:00 3550901 /usr/lib/locale/locale-archive
7ec3d1800000-7ec3d1828000 r--p 00000000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ec3d1828000-7ec3d19b0000 r-xp 00028000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ec3d19b0000-7ec3d19ff000 r--p 001b0000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ec3d19ff000-7ec3d1a03000 r--p 001fe000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ec3d1a03000-7ec3d1a05000 rw-p 00202000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ec3d1a05000-7ec3d1a12000 rw-p 00000000 00:00 0
7ec3d1a2b000-7ec3d1a84000 r--p 00000000 fc:00 3549063 /usr/lib/locale/C.utf8/LC_CTYPE
7ec3d1a84000-7ec3d1a85000 r--p 00000000 fc:00 3549069 /usr/lib/locale/C.utf8/LC_NUMERIC
7ec3d1a85000-7ec3d1a86000 r--p 00000000 fc:00 3549072 /usr/lib/locale/C.utf8/LC_TIME
7ec3d1a86000-7ec3d1a87000 r--p 00000000 fc:00 3549062 /usr/lib/locale/C.utf8/LC_COLLATE
7ec3d1a87000-7ec3d1a88000 r--p 00000000 fc:00 3549067 /usr/lib/locale/C.utf8/LC_MONETARY
7ec3d1a88000-7ec3d1a89000 r--p 00000000 fc:00 3549066 /usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES
7ec3d1a89000-7ec3d1a8a000 r--p 00000000 fc:00 3549070 /usr/lib/locale/C.utf8/LC_PAPER
7ec3d1a8a000-7ec3d1a8b000 r--p 00000000 fc:00 3549068 /usr/lib/locale/C.utf8/LC_NAME
7ec3d1a8b000-7ec3d1a8c000 r--p 00000000 fc:00 3549061 /usr/lib/locale/C.utf8/LC_ADDRESS
7ec3d1a8c000-7ec3d1a8d000 r--p 00000000 fc:00 3549071 /usr/lib/locale/C.utf8/LC_TELEPHONE
7ec3d1a8d000-7ec3d1a90000 rw-p 00000000 00:00 0
7ec3d1a90000-7ec3d1a9e000 r--p 00000000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
7ec3d1a9e000-7ec3d1ab1000 r-xp 0000e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
7ec3d1ab1000-7ec3d1abf000 r--p 00021000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
7ec3d1abf000-7ec3d1ac3000 r--p 0002e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
7ec3d1ac3000-7ec3d1ac4000 rw-p 00032000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
7ec3d1ac4000-7ec3d1ac5000 r--p 00000000 fc:00 3549065 /usr/lib/locale/C.utf8/LC_MEASUREMENT
7ec3d1ac5000-7ec3d1ac6000 r--p 00000000 fc:00 3549064 /usr/lib/locale/C.utf8/LC_IDENTIFICATION
7ec3d1ac6000-7ec3d1acd000 r--s 00000000 fc:00 3548984 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7ec3d1acd000-7ec3d1acf000 rw-p 00000000 00:00 0
7ec3d1acf000-7ec3d1ad0000 r--p 00000000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ec3d1ad0000-7ec3d1afb000 r-xp 00001000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ec3d1afb000-7ec3d1b05000 r--p 0002c000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ec3d1b05000-7ec3d1b07000 r--p 00036000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ec3d1b07000-7ec3d1b09000 rw-p 00038000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]
7ffd2678a000-7ffd2678e000 r--p 00000000 00:00 0 [vvar]
7ffd2678e000-7ffd26790000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
___
Each line is a memory mapping. The first field is the address range: the start address, a dash, and then the end address. So an entry like
7ffd266f8000-7ffd26719000
means "this mapping covers virtual addresses from 0x7ffd266f8000 up to (but not including) 0x7ffd26719000."
The addresses are always increasing:
- left to right: within a single line you go from lower address to higher address
- top to bottom: as you go down the list you also go to higher and higher addresses
Exactly like reading a book: left to right and then top to bottom.
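Because the end address is exclusive, the size of a mapping is just end minus start. A tiny parser, assuming the maps line format shown above; `parse_maps_range` is a hypothetical name.

```c
#include <stdio.h>

/* Parse the "start-end" range at the beginning of a /proc/<pid>/maps line.
   Returns 0 on success, -1 if the line doesn't start with a range. */
static int parse_maps_range(const char *line,
                            unsigned long long *start, unsigned long long *end) {
    return sscanf(line, "%llx-%llx", start, end) == 2 ? 0 : -1;
}
```

For the [stack] line in the dump above, 0x7ffd26719000 - 0x7ffd266f8000 = 0x21000, i.e. 135,168 bytes (132 KiB).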
Storing the least significant byte at the lowest address is called little-endian. On x86 this convention goes back to early Intel chips and was kept for backward compatibility. It also has a practical benefit: it makes some basic arithmetic and conversions between integer widths cheaper in hardware. The "low" part of the value is always at the base address, so the CPU can load 8 bits, then 16 bits, then 32 bits, etc. starting from the same address without extra offset math.
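A small illustration of that last point, assuming a little-endian host such as x86-64: reading the 32-bit value 0x11223344 as 8 and then 16 bits from the same address yields 0x44 and 0x3344, always the low-order part. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read an 8-bit or 16-bit integer starting at address p, the same address
   a wider value was stored at. memcpy avoids alignment/aliasing issues. */
static uint8_t load8(const void *p)   { uint8_t v;  memcpy(&v, p, sizeof v); return v; }
static uint16_t load16(const void *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
```

On a big-endian host the same reads would return the high-order part instead (0x11 and 0x1122), so narrowing a value in place would need an offset.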
So when you say an address like 0xABCD shows up in memory as [0xCD, 0xAB] byte-by-byte, that's not the address being "reversed". That's just the little-endian in-memory layout of that numeric value.
There are also big-endian architectures, where the most significant byte is stored at the lowest address. That matches how humans usually write numbers (0xABCD in memory as [0xAB, 0xCD]). But most mainstream desktop/server CPUs today are little-endian, so you mostly see the little-endian view.
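Both layouts can be seen side by side by dumping bytes. htons() from POSIX `<arpa/inet.h>` converts to network byte order, which is big-endian, so its result is stored as {0xAB, 0xCD} on any host, while the native value is {0xCD, 0xAB} on little-endian machines. A sketch with hypothetical helper names:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy a 16-bit value's in-memory bytes, lowest address first. */
static void bytes_of_u16(uint16_t v, unsigned char out[2]) {
    memcpy(out, &v, sizeof v);
}

/* 1 on little-endian hosts (x86-64, typical AArch64), 0 on big-endian. */
static int host_is_little_endian(void) {
    unsigned char b[2];
    bytes_of_u16(1, b);
    return b[0] == 1;
}
```

Usage: `bytes_of_u16(0xABCD, b)` shows the native layout, and `bytes_of_u16(htons(0xABCD), b)` shows the big-endian one.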
hagbard_c•6h ago
> Yeah, that’s it. Now, 2308 may be slightly bloated because we link against musl instead of glibc, but the point still stands: There’s a lot of stuff going on behind the scenes here.
Slightly bloated is a slight understatement. The same program linked to glibc tops out at 36 symbols in .symtab:
amitprasad•6h ago
More generally, I'm not surprised at the symtab bloat from static linking, given the absolute size increase of the binary.
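Counts like these are what `readelf -s` lists under .symtab. The sketch below reads the number straight from the section headers, assuming an ELF64 file in native byte order and the Linux `<elf.h>` header. The count includes the null entry at index 0 (as readelf's listing does), extended section numbering is not handled, and `count_symtab_entries` is a hypothetical name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Number of entries in the SHT_SYMTAB section of an ELF64 file.
   Returns 0 if there is no .symtab (e.g. a stripped binary), -1 on error. */
static long count_symtab_entries(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    Elf64_Ehdr eh;
    long count = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        (eh.e_shnum != 0 && eh.e_shentsize < sizeof(Elf64_Shdr)))
        goto done;
    count = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        long off = (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
            count = -1;
            break;
        }
        if (sh.sh_type == SHT_SYMTAB && sh.sh_entsize != 0) {
            count = (long)(sh.sh_size / sh.sh_entsize);
            break;
        }
    }
done:
    fclose(f);
    return count;
}
```

Note that .symtab is separate from .dynsym: stripping removes the former, while the dynamic linker only ever needs the latter.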