Linux is unusual in OS kernels in that direct system calls from arbitrary userspace code are supported and ABI-stable. This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.
If, instead, as on OpenBSD, the kernel enforced the rule that all system calls had to go through libc (or perhaps a big ntdll.dll-like VDSO), then the whole problem the linked article tries in vain to solve would disappear. If you wanted to hook a system call, you'd just change the libc/VDSO dispatch. No need to rewrite any instructions.
If I were Linus, I'd make a new rule: starting today, all new system calls must go through VDSO. No exceptions. SYSCALL from anywhere else? SIGKILL.
This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
yjftsjthsd-h•1h ago
> This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.
This model has always been a trade-off. It has downsides, but it also has upsides, including an immense boost in flexibility; decoupling from any particular userspace is useful.
> This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
Can you LD_PRELOAD in front of the vDSO? I was under the (possibly mistaken) impression that the kernel injects it directly.
Gualdrapo•53m ago
> If I were Linus, I'd make a new rule
Or, you know, just propose your idea to him
yjftsjthsd-h•20m ago
Based on https://www.phoronix.com/news/Linus-Torvalds-No-Random-vDSO , I had been under the impression that he wasn't fond of adding more use of vDSO. On rereading, I can't tell if that's a vDSO thing or a preference against fast randomness being provided by the kernel.
throwaway7356•20m ago
> all system calls had to go through libc (or perhaps a big ntdll.dll-like
Which makes containers crap on Windows and *BSD as they have to run the currect libc or equivalent. Thus you need to build a different container per OS version which sucks compared to Linux.
quotemstr•1h ago
If, instead, as on OpenBSD, the kernel enforced the rule that all system calls had to go through libc (or perhaps a big ntdll.dll-like VDSO), then the whole problem the linked article tries in vain to solve would disappear. If you wanted to hook a system call, you'd just change the libc/VDSO dispatch. No need to rewrite any instructions.
If I were Linus, I'd make a new rule: starting today, all new system calls must go through VDSO. No exceptions. SYSCALL from anywhere else? SIGKILL.
This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
yjftsjthsd-h•1h ago
This model has always been a trade-off. It has downsides, but it also has upsides, including an immense boost in flexibility; decoupling from any particular userspace is useful.
> This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
Can you LD_PRELOAD in front of the vDSO? I was under the (possibly mistaken) impression that the kernel injects it directly.
Gualdrapo•53m ago
Or, you know, just propose your idea to him
yjftsjthsd-h•20m ago
throwaway7356•20m ago
Which makes containers crap on Windows and *BSD as they have to run the currect libc or equivalent. Thus you need to build a different container per OS version which sucks compared to Linux.