Edit: Oh I guess it seems to be intentional, I clicked around and I like the rgbcube site map.
I knew, and I've known since reading the "C shell considered harmful" paper, which offhandedly mentioned that sh-based shells can use an arbitrary number of file descriptors (maybe they have to be one-digit integers though). csh can't, of course.
It's discussed in the first section here
I always wanted to ignore all errors from this (there were a lot of "permission denied"), but tcsh just didn't have a simple way to do so. This taught me a valuable lesson about some software just being better than others. And to this day, I keep wondering why people would choose to use csh/tcsh voluntarily.
Also, file handle inheritance by default was such a big mistake.
Working dir, env vars, uid/gid, socket handles, file descriptors, (some) file locks, message queues. AFAIK the only exception is the argv, everything else is inherited on fork or exec.
Sometimes this makes sense, but programmers always forget about this, resulting in security incidents. Eventually most programming languages gave up and updated their stdlibs to set CLOEXEC when opening files and sockets, knowing that it would break POSIX compatibility and API compatibility on their stdlibs. Python is one example: https://peps.python.org/pep-0446/
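For anyone who hasn't run into the flag itself, here is a minimal C sketch of what "setting CLOEXEC" means at the syscall level. The helper names `fd_is_cloexec` and `fd_set_cloexec` are made up for illustration; `fcntl`, `F_GETFD`, `F_SETFD`, `FD_CLOEXEC` and `O_CLOEXEC` are the real POSIX names.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes O_CLOEXEC under strict -std= modes */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd is closed on exec, 0 if it would be
 * inherited across exec, and -1 if fcntl fails (e.g. fd is not open). */
int fd_is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: turns the close-on-exec flag on (1) or off (0).
 * Returns 0 on success, -1 on failure. */
int fd_set_cloexec(int fd, int on) {
    assert(on == 0 || on == 1);
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```

With these, a descriptor from a plain `open(path, O_RDONLY)` reports 0 (inheritable), while one from `open(path, O_RDONLY | O_CLOEXEC)` reports 1. Note that the flag only affects exec; a forked child still gets a copy of every descriptor.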
The "inherit by default" behavior also makes it very difficult to evolve the shell interface. The nushell devs are looking for a reliable way to request JSON output/input on processes spawned by the shell (if supported by the program). Naively passing env vars or FDs to the process causes problems because if the process spawns any children of its own, they would inherit those env vars or FDs too.
environment (in a broader sense, not just environment variables, but also CWD, file handles, uid/gid, sec context, namespaces) is there for a reason: to use. if you don't want your child processes to read stdin in place of you, don't give it to them. it's the parent process's responsibility to set up the env for the children.
although subprocesses were invented to do (some of) the parent's job: the parent delegates smaller steps and leaves the details to them. for example an http server would read the request (first) line, then delegate the rest of the input to a subprocess (worker) depending on who is free, who handles which type of request, etc. this is the original idea behind inheritance, IMO.
Oh for fuck's sake! Why are you using random file descriptors nobody told you about? Those open fds are there for a reason, thank you: I've put an end of an open pipe specifically so I could notice when it will become closed.
If the user set up the environment of your application in a specific way, that means he wants your application to run in such an environment. If you were invoked with 10 non-standard file descriptors open and two injected threads — you'll have to live with it. Because, believe it or not, your application's purpose is to serve the user's goals. So don't break composability that the user relies on, please.
Can't wait for scripts using this variable for something unrelated to break when they call my scripts.
This should be a parameter or argv[0]-based.
This is a larger concern I've started to see in a certain class of younger developers, where existing conventions are just ignored without any attempt at understanding why they exist. Things are only going to get worse as naive vibe coders start flinging more AI-generated garbage out into the world. I pity the poor folks trying to maintain these systems a couple of decades from now.
Nushell's front page [1] shows an example of rounded, and here's an example of an even further customized version [2].
I think these are very readable. There is alignment too, but it's "local" alignment to cells in the same sub-table, not "global" to the entire table -- this is good for fitting more stuff into your terminal width without wrapping.
A supporting font is required though, yes.
int subprocess_stdin = open("/dev/null", O_RDONLY);
int subprocess_stdout = open("some_output", O_WRONLY);
int subprocess_stderr = STDERR_FILENO; // Let the subprocess use the same stderr as me.
int subprocess_fds[] = {subprocess_stdin, subprocess_stdout, subprocess_stderr};
posix_spawn_with_fds("my process", [...], subprocess_fds, 3);
Never understood why POSIX makes all of this so hard.
I honestly can't say in this particular instance, but my (unpopular?) instinct in such a situation is always to assume there is a good reason and I just haven't understood it yet. It may have become irrelevant in the meantime, but I can't know until I understand, and it's served me well to give the patriarchs the benefit of the doubt in such cases.
After the fork() (or clone, on Linux), you run a for loop that closes every FD except the ones you want to keep. On Linux there is a close_range system call to close a range of FDs in one call.
POSIX is an API designed to be a thin layer over the operating system, and designed to make as few assumptions as possible about the underlying system. This is why POSIX is nowadays implemented even on low-resource embedded devices and the like.
At a higher level it's possible to use higher-level abstractions to manipulate processes (e.g. a C++ library that does all of the above with a modern interface).
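The close-everything-but loop described above can be sketched roughly like this. `close_fds_except` and `fd_in_list` are hypothetical helpers, not POSIX or Linux APIs; the inclusive `[first, last]` range is borrowed from the way Linux's close_range takes its arguments, which is an arbitrary choice for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: linear search, fine for a handful of kept fds. */
static int fd_in_list(int fd, const int *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (list[i] == fd) return 1;
    return 0;
}

/* Hypothetical helper: closes every descriptor in [first, last] (inclusive)
 * that is not listed in keep. Returns how many descriptors were actually
 * closed; numbers that were not open are skipped silently. */
int close_fds_except(int first, int last, const int *keep, size_t nkeep) {
    assert(first >= 0 && first <= last);
    int closed = 0;
    for (int fd = first; ; fd++) {
        /* close() failing here normally just means fd was not open. */
        if (!fd_in_list(fd, keep, nkeep) && close(fd) == 0)
            closed++;
        if (fd == last) break;  /* checked here so last == INT_MAX cannot overflow */
    }
    return closed;
}
```

In a child between fork() and exec, one would typically call it with `first = 3` and `last` taken from something like `sysconf(_SC_OPEN_MAX) - 1`. That upper bound is what makes the portable version cost one syscall per possible descriptor, whereas close_range handles a contiguous range in one call but has no keep list.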
> but it is always an bug to close FDs you don't know the origin of.
And I would agree. I'm replying to the poster above me, who is staking the claim that POSIX permits closing all open file descriptors other than a desired set.
So, I suppose it can, at a cost of a few thousand syscalls that'll all be pointless…
That said, it is trivial to write a loop that takes a set of known old and new fd numbers (including e.g. swapping) and produces a set of calls to `dup2` and `fcntl` to give them the new numbers, while correctly leaving all open fds open.
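One way such a renumbering loop could look, as a sketch rather than the only approach: copy every source to a temporary number above everything mentioned in the plan, then `dup2` the temporaries onto their targets. `struct fd_move`, `remap_fds` and `MAX_MOVES` are names invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_MOVES 64  /* arbitrary limit for this sketch */

struct fd_move { int from; int to; };  /* hypothetical type: "make `to` refer to what `from` refers to" */

/* Applies all moves as if simultaneously, so swaps and cycles work.
 * Returns 0 on success, -1 if any fcntl/dup2 call failed. */
int remap_fds(const struct fd_move *moves, size_t n) {
    int tmp[MAX_MOVES];
    int floor_fd = 0;
    int rc = 0;
    size_t i;

    assert(n <= MAX_MOVES);

    /* Temporaries must sit above every fd number mentioned in the plan. */
    for (i = 0; i < n; i++) {
        if (moves[i].from >= floor_fd) floor_fd = moves[i].from + 1;
        if (moves[i].to >= floor_fd) floor_fd = moves[i].to + 1;
    }

    /* Phase 1: park a copy of each source on a free fd >= floor_fd. */
    for (i = 0; i < n; i++) {
        tmp[i] = fcntl(moves[i].from, F_DUPFD, floor_fd);
        if (tmp[i] == -1) {
            while (i > 0) close(tmp[--i]);  /* undo the copies made so far */
            return -1;
        }
    }

    /* Phase 2: install each copy on its target, then drop the temporary. */
    for (i = 0; i < n; i++) {
        if (dup2(tmp[i], moves[i].to) == -1) rc = -1;
        close(tmp[i]);
    }
    return rc;
}
```

A swap is then just `{ {a, b}, {b, a} }`. Descriptors not named as a target are never touched, and a source that is not also a target stays open under its old number. As with plain `dup2`, whatever was previously open at a target number is replaced, and the new descriptor does not carry the close-on-exec flag.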
#include <fcntl.h>
#include <spawn.h>

extern char **environ;

int
main(void) {
    posix_spawn_file_actions_t file_actions;
    posix_spawn_file_actions_init(&file_actions);
    /* In the child, open /dev/null as stdin (fd 0) and as stderr (fd 2). */
    posix_spawn_file_actions_addopen(&file_actions, 0, "/dev/null", O_RDONLY, 0);
    posix_spawn_file_actions_addopen(&file_actions, 2, "/dev/null", O_WRONLY, 0);
    /* argv is declared char *const[], so the literal must not be const char *[]. */
    posix_spawnp(NULL, "ls", &file_actions, NULL, (char *[]){"ls", "-l", "/proc/self/fd", NULL}, environ);
    posix_spawn_file_actions_destroy(&file_actions);
    return 0;
}
Also, to answer your question with a guess, I would suppose it’s because they wanted to use JSON and they wrote the feature.
UTF-8 is already a great wire format.
I've never found a "binary JSON" that's significantly better than this; I mean you can beat it, but you need awkward encodings (prefix indices & other weird shit). You end up burning nearly a byte for any particularly clever integer encoding.
Most data structures are just nested arrays of integers. If you need an integer keyed OBJECT you're SOL, but I just play fiddly games with astral plane UTF-8 characters. (Yeah yeah yeah ad hoc encodings are nasty news.)
If you've got a BUTT LOAD of data just fire up a compressing SQLite DB like a normal human.
This is the first time I hear about stddata though. Is this a thing that's going into a standard? Is there one already? Or is it just a name someone gave to it and it's not a real thing?
FreeBSD has libxo[0] integrated into some of its tools:
I am especially confused by this:
> Surely, nothing will happen if I just assume that the existence of a specific file descriptor implies something, as nobody is crazy or stupid enough to hardcode such a thing?
Wait, what? But "you" (tree authors) just hardcoded such a thing. Do "you" have some special permission to do this nonsense?
Instead maybe we need new system calls that return dups of a hidden stddata FD or create/replace it.
NoboruWataya•21h ago
jamessb•21h ago
> As of version 2.0.0, in Linux, tree will attempt to automatically output a compact JSON tree on file descriptor 3 (what I call stddata,) if present
https://github.com/Old-Man-Programmer/tree/blob/d501b58ff9cb...
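To make the "if present" part concrete, here is a generic sketch of how a program can test whether a descriptor number is open and write to it only in that case. This is not taken from tree's source; `fd_is_open`, `emit_if_open` and the `STDDATA_FILENO` macro name are invented for illustration, and only the number 3 comes from the quote above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define STDDATA_FILENO 3  /* fd number from the quote; the macro name is made up */

/* Hypothetical helper: 1 if fd refers to an open descriptor, 0 otherwise. */
int fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical helper: writes json to fd only if fd is open.
 * Returns bytes written, 0 if fd is not open, -1 on a write error. */
ssize_t emit_if_open(int fd, const char *json) {
    assert(json != NULL);
    if (!fd_is_open(fd)) return 0;
    return write(fd, json, strlen(json));
}
```

A program would call something like `emit_if_open(STDDATA_FILENO, "[]\n")`. The check can only tell that descriptor 3 is open, not who opened it or why, so a descriptor inherited for an unrelated purpose looks exactly the same as one set up to receive JSON.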
deathanatos•18h ago
stdout would be the canonical location for putting JSON output (and the "data" of a command, generally). Then things like `| jq` just work.