3-JSON

97•RGBCube•4d ago

Comments

NoboruWataya•21h ago

I've never heard of stddata. What distro/environment provides it?

jamessb•21h ago

Nor have I; I think it is just what the developer of tree has chosen to call file descriptor 3, rather than being a wider convention or standard thing provided by the environment.

> As of version 2.0.0, in Linux, tree will attempt to automatically output a compact JSON tree on file descriptor 3 (what I call stddata,) if present

https://github.com/Old-Man-Programmer/tree/blob/d501b58ff9cb...
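For illustration, here is a minimal sketch of how a program could detect that an extra descriptor is "present" before writing to it. This is not taken from tree's source; `fd_is_open` and `emit_json_if_open` are hypothetical helper names, and the only real API used is `fcntl(F_GETFD)`, which fails with `EBADF` when nothing is open at that number.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Returns true if `fd` is an open descriptor in this process.
   fcntl(F_GETFD) fails with EBADF when the descriptor is not open. */
static bool fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical helper: write `json` to `fd` only if something is open there.
   Returns true only when the whole string was written. */
static bool emit_json_if_open(int fd, const char *json) {
    if (!fd_is_open(fd)) {
        return false;
    }
    size_t len = strlen(json);
    return write(fd, json, len) == (ssize_t)len;
}
```

With behaviour like the one quoted, a caller would opt in through an ordinary redirection such as `cmd 3>out.json`. The catch is that the check cannot tell a descriptor opened for this purpose from one that was merely inherited for some unrelated reason.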

deathanatos•18h ago

It's a local invention of TFA's, AFAIK. It's not "std".

stdout would be the canonical location for putting JSON output (and the "data" of a command, generally). Then things like `| jq` just work.

zbendefy•21h ago

Edit: Oh I guess it seems to be intentional, I clicked around and I like the rgbcube site map.

omnicognate•13h ago

gerikson•21h ago

> Okay, apparently the stddata addition is causing havoc (who knew how many scripts just haphazardly hand programs random file descriptors, that's surely not a problem.)

I knew, and I've known since reading the "C shell considered harmful" paper, which offhandedly mentioned that sh-based shells can use an arbitrary number of file descriptors (maybe they have to be one-digit integers though). csh can't, of course.

It's discussed in the first section here

https://harmful.cat-v.org/software/csh

theamk•16h ago

this brings memories - university, first Unix exposure, Sun Ray terminals, "tcsh" as default shell, and me doing "find / -name ..." a lot.

I always wanted to ignore all errors from this (there were a lot of "permission denied" messages), but tcsh just didn't have a simple way to do so. This taught me a valuable lesson about some software just being better than others. And to this day, I keep wondering why people would choose to use csh/tcsh voluntarily.

layer8•16h ago

Tcsh originally was more user-friendly for interactive use. The rest is inertia.

mmastrac•20h ago

It's a shame that stdX streams were never spec'd as sockets, with appropriate handling available in the various shells.

Also, file handle inheritance by default was such a big mistake.

nulld3v•20h ago

Yeah, POSIX made choices that looked sane and even elegant at the time, but nowadays I think it is fair to say that they have not aged well. Like it's not just FDs getting inherited by default, almost everything gets inherited by default:

Working dir, env vars, uid/gid, socket handles, file descriptors, (some) file locks, message queues. AFAIK the only exception is the argv, everything else is inherited on fork or exec.

Sometimes this makes sense, but programmers always forget about this, resulting in security incidents. Eventually most programming languages gave up and updated their stdlibs to set CLOEXEC when opening files and sockets, knowing that it would break POSIX compatibility and API compatibility on their stdlibs. Python is one example: https://peps.python.org/pep-0446/
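As a rough sketch of what opting out of inheritance looks like in C: either pass `O_CLOEXEC` when opening, or set `FD_CLOEXEC` afterwards with `fcntl`. The function names below are made up for the example; the flags and calls are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open a file read-only so the descriptor is closed automatically on exec*().
   O_CLOEXEC sets the flag atomically at open time. */
static int open_private(const char *path) {
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Mark an already-open descriptor close-on-exec.
   Returns 0 on success, -1 on error. */
static int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Returns 1 if close-on-exec is set, 0 if not, -1 on error. */
static int has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return (flags & FD_CLOEXEC) != 0;
}
```

Note that the flag only affects exec; a forked child that never execs still holds its copy of the descriptor.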

The "inherit by default" behavior also makes it very difficult to evolve the shell interface. The nushell devs are looking for a reliable way to request JSON output/input on processes spawned by the shell (if supported by the program). Naively passing env vars or FDs to the process causes problems because if the process spawns any children of its own, they would inherit those env vars or FDs too.

bandie91•18h ago

process inheritance was the best invention, because it models reality quite closely. you don't have new things just sitting in an empty universe all alone and initializing everything themselves from ... somewhere ... because everything is reset around them.

environment (in a broader sense, not just environment variables, but also CWD, file handles, uid/gid, sec context, namespaces) is there for a reason: to use. if you don't want your child processes to read the stdin in place of you, don't give it to them. it's the parent process's responsibility to set up the env for the children.

although subprocesses were invented to do (some of) the parent's job by delegating smaller steps and leaving the details to them. for example a http server would read the request (first) line, then delegate the rest of the input to a subprocess (worker) depending on who is free, who handles which type of request, etc. this is the original idea behind inheritance, IMO.

smarx007•20h ago

This is long overdue. PowerShell has long supported passing structured output (objects) via pipes and this is the closest attempt to approximate that without breaking the world.

account-5•18h ago

I don't know, Nushell does a pretty good job.

https://www.nushell.sh/

williamcotton•20h ago

For this the key would be to eliminate serialization and deserialization between steps in the pipeline.

superdisk•20h ago

Tangential but I was surprised to see that tree(1), at least the popular implementation, is made in Terre Haute (which is where I'm from). Maybe I should invite the author for lunch or something :)

Joker_vD•20h ago

> who knew how many scripts just haphazardly hand programs random file descriptors, that's surely not a problem.

Oh for fuck's sake! Why are you using random file descriptors nobody told you about? Those open fds are there for a reason, thank you: I've put an end of an open pipe specifically so I could notice when it will become closed.

If the user set up the environment of your application in a specific way, that means he wants your application to run in such an environment. If you were invoked with 10 non-standard file descriptors open and two injected threads — you'll have to live with it. Because, believe it or not, your application's purpose is to serve the user's goals. So don't break composability that the user relies on, please.

listeria•15h ago

This is the first I've heard of using an open pipe to poll for subprocess termination. Don't get me wrong, I don't hate it, but you could just as easily have a SIGCHLD handler write to your pipe (or do nothing, since poll(2) will fail with EINTR), and you don't have to worry about the subprocess closing the pipe or considering it some weird stddata fd like tree does here.
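For anyone else who hasn't seen the pipe trick, a minimal sketch of it follows. It assumes the child neither writes to nor closes the inherited descriptor; `run_and_wait_via_pipe` is a made-up name, and the child here simply exits instead of exec'ing a real program.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that inherits the write end of a pipe and exits with `code`.
   The parent blocks in read() until every copy of the write end is closed,
   which happens once the child (and anything that inherited the descriptor
   from it) has exited or closed it.
   Returns the child's exit status, or -1 on error. */
static int run_and_wait_via_pipe(int code) {
    int p[2];
    if (pipe(p) == -1) {
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);   /* child keeps only the write end and never writes */
        _exit(code);   /* the kernel closes p[1] when the child exits */
    }

    close(p[1]);       /* parent must drop its own write end to ever see EOF */
    char byte;
    ssize_t n;
    do {
        n = read(p[0], &byte, 1);   /* returns 0 (EOF) once all writers are gone */
    } while (n > 0 || (n == -1 && errno == EINTR));
    close(p[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real event loop the read end would go into poll(2) alongside other descriptors rather than being read in a blocking loop. A child that decides the extra descriptor is its data channel and writes to it, or closes it early, breaks the scheme.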

o11c•15h ago

`SIGCHLD` is extremely unreliable in a lot of ways, `pidfd` is better (but Linux-specific), though it doesn't handle the case of wanting to be notified of all grandchildren's terminations after the direct child dies early.

veltas•20h ago

The environment variable isn't much better, both are akin to using a global var in your reentrant code, but at least STDDATA_FD is less likely to collide than 3.

Can't wait for scripts using this variable for something unrelated to break when they call my scripts.

This should be a parameter or argv[0]-based.
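To make the comparison concrete, here is a sketch of reading a descriptor number from an environment variable. How tree actually interprets STDDATA_FD is not shown in this thread, so treat the semantics as an assumption; `fd_from_env` is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>

/* Read a descriptor number from the environment variable `name`.
   Returns the descriptor if the value parses as a non-negative decimal
   integer naming an open descriptor, otherwise -1. */
static int fd_from_env(const char *name) {
    const char *value = getenv(name);
    if (value == NULL || *value == '\0') {
        return -1;
    }

    char *end;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > INT_MAX) {
        return -1;
    }

    /* F_GETFD fails with EBADF if nothing is open at that number. */
    if (fcntl((int)n, F_GETFD) == -1) {
        return -1;
    }
    return (int)n;
}
```

The variable is inherited by every descendant process just like the descriptor itself, which is the global-variable problem described above.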

bcrl•19h ago

That doesn't work reliably either. No existing code scrubs STDDATA_FD from their environment variables, and there's no way to know if anyone uses STDDATA_FD in the wild. Why not just use a command line parameter like everyone else? Different isn't better in a situation like this.

This is a larger concern I've started to see in a certain class of younger developer where existing conventions are just ignored without any attempt to understand why they exist. Things are only going to get worse as naive vibe coders start flinging more AI generated garbage out into the world. I pity the poor folks trying to maintain these systems a couple of decades from now.

veltas•19h ago

That's what I really meant by saying a parameter, it should be an option/flag that's given explicitly at invocation, or just a different program name.

kps•17h ago

Just go for `--json-output=filename` rather than playing games.

8n4vidtmkvmk•2h ago

Why filename? It doesn't need to know how to write files. That's what `>` is for. Do --output=json
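A sketch of what such an explicit flag could look like with `getopt_long`. The `--output` option and its `json`/`text` values are hypothetical and only mirror the suggestion above; they are not tree's actual interface.

```c
#include <getopt.h>
#include <string.h>

/* Parse a hypothetical --output=FORMAT (or -o FORMAT) option.
   Returns 1 for "json", 0 for "text" or when the option is absent,
   and -1 for an unknown option, a missing argument, or an unknown format. */
static int parse_output_format(int argc, char *argv[]) {
    static const struct option longopts[] = {
        {"output", required_argument, NULL, 'o'},
        {NULL, 0, NULL, 0},
    };
    int json = 0;
    int c;

    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* report errors through the return value, not stderr */
    while ((c = getopt_long(argc, argv, "o:", longopts, NULL)) != -1) {
        if (c != 'o') {
            return -1;
        }
        if (strcmp(optarg, "json") == 0) {
            json = 1;
        } else if (strcmp(optarg, "text") == 0) {
            json = 0;
        } else {
            return -1;
        }
    }
    return json;
}
```

The result goes to stdout, so `cmd --output=json > file` and `cmd --output=json | jq` both work without the program knowing anything about files or extra descriptors.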

dodomodo•19h ago

every time I see the output of nushell I get so disappointed, they got the formatting so wrong, all the extra delimiters make it hard to actually read the data. powershell got it right, using alignment. if you look at virtually all shell programs until the last few years you are going to see similar, alignment-based output. only recently, with the rise of the abuse of ligatures, we started seeing these kinds of incomprehensible blobs surrounding our text.

secret-noun•17h ago

The author states they're using nushell's `markdown` table style because of issues with their font rendering certain characters. `rounded` is the default and indeed, `markdown` looks truly horrible in comparison.

Nushell's front page [1] shows an example of rounded, and here's an example of an even further customized version [2].

I think these are very readable. There is alignment too, but it's "local" alignment to cells in the same sub-table, not "global" to the entire table -- this is good for fitting more stuff into your terminal width without wrapping.

A supporting font is required though, yes.

[1]: https://www.nushell.sh/

[2]: https://i.imgur.com/U4MnYLe.png

dodomodo•16h ago

nushell's front page is exactly what I was referring to. Compare the legibility of the ls command on the front page to a regular ls command, it's insane how much more cluttered the nushell version is.

EdSchouten•19h ago

If only there was a variant of execve() / posix_spawn() that simply took a literal array of which file descriptors would need to be present in the new process. So that you can say:

    int subprocess_stdin = open("/dev/null", O_RDONLY);
    int subprocess_stdout = open("some_output", O_WRONLY);
    int subprocess_stderr = STDERR_FILENO; // Let the subprocess use the same stderr as me.
    int subprocess_fds[] = {subprocess_stdin, subprocess_stdout, subprocess_stderr};
    posix_spawn_with_fds("my process", [...], subprocess_fds, 3);

Never understood why POSIX makes all of this so hard.

Y_Y•19h ago

> Never understood why POSIX makes all of this so hard

I honestly can't say in this particular instance, but my (unpopular?) instinct in such a situation is always to assume there is a good reason and I just haven't understood it yet. It may have become irrelevant in the meantime, but I can't know until I understand, and it's served me well to give the patriarchs the benefit of the doubt in such cases.

alerighi•18h ago

It's something trivial to write (~20 lines of code); there is no point in the standard library providing that kind of function, in my opinion.

After the fork() (or clone, on Linux), you run a for loop that closes every FD except the ones you want to keep. On Linux there is a close_range system call to close a whole range of FDs in one call.

POSIX is an API designed to be a thin layer over the operating system, and designed to make as few assumptions as possible about the underlying system. This is why POSIX is nowadays implemented even on low-resource embedded devices and the like.

At a higher level it's possible to use higher-level abstractions to manipulate processes (e.g. a C++ library that does all of the above with a modern interface).

deathanatos•18h ago

… what POSIX API gets you the open FDs? (Or even just the maximum open FD, and we'll just cause a bunch of errors closing non-existent FDs.)

o11c•15h ago

That's `sysconf(_SC_OPEN_MAX)`, but it is always a bug to close FDs you don't know the origin of. You should be specifying `O_CLOEXEC` by default if you want FDs closed automatically.

deathanatos•12h ago

That won't return the maximum open file descriptor. You could perhaps use that value in lieu of the maximum open file descriptor and loop through a crap ton more FDs than even my previous post implied, I suppose, and this is getting less efficient and more terribly engineered by the comment, which I think proves the point…

> but it is always a bug to close FDs you don't know the origin of.

And I would agree. I'm replying to the poster above me, who is staking the claim that POSIX permits closing all open file descriptors other than a desired set.

So, I suppose it can, at a cost of a few thousand syscalls that'll all be pointless…

o11c•15h ago

It is always a bug to call `close_range` since you never know if a parent process has deliberately left a file descriptor open for some kind of tracing. If the parent does not want this, it must use `O_CLOEXEC`. Maybe if you clear the entire environment you'll be fine?

That said, it is trivial to write a loop that takes a set of known old and new fd numbers (including e.g. swapping) and produces a set of calls to `dup2` and `fcntl` to give them the new numbers, while correctly leaving all open fds open.

oguz-ismail•18h ago

It's not hard, just a bit too long:

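    #include <fcntl.h>
    #include <spawn.h>
    #include <sys/wait.h>
    
    extern char **environ;
    
    int
    main(void) {
      pid_t pid;
      posix_spawn_file_actions_t file_actions;
      posix_spawn_file_actions_init(&file_actions);
      /* Point the child's stdin and stderr at /dev/null; stdout is inherited. */
      posix_spawn_file_actions_addopen(&file_actions, 0, "/dev/null", O_RDONLY, 0);
      posix_spawn_file_actions_addopen(&file_actions, 2, "/dev/null", O_WRONLY, 0);
      /* argv must be char *const[]; hand our own environment to the child. */
      int err = posix_spawnp(&pid, "ls", &file_actions, NULL, (char *[]){"ls", "-l", "/proc/self/fd", NULL}, environ);
      posix_spawn_file_actions_destroy(&file_actions);
      if (err != 0)
        return 1;
      waitpid(pid, NULL, 0); /* let ls finish before we exit */
      return 0;
    }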

js8•19h ago

Why not use a saner protocol than JSON, e.g. CBOR?

totallymike•18h ago

Is CBOR as popularly supported as JSON?

Also, to answer your question with a guess, I would suppose it’s because they wanted to use JSON and they wrote the feature.

thechao•17h ago

I do a lot of very low level programming with awful performance-maintenance trade-offs. Here's a great trick for a "binary" JSON: remove all of the extra whitespace, normalize your numbers, and then LZ4 the resulting string.
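A minimal sketch of the first step only (stripping insignificant whitespace), assuming well-formed JSON input. Number normalization and the LZ4 pass are left out, since the latter needs an external library; `json_minify` is a name invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Strip insignificant whitespace from JSON text in place.
   Whitespace inside string literals is kept, and backslash escapes are
   honoured so an escaped quote does not end the string.
   Assumes well-formed JSON; nothing is validated. Returns the new length. */
static size_t json_minify(char *s) {
    char *out = s;
    int in_string = 0;

    for (const char *in = s; *in != '\0'; in++) {
        char c = *in;
        if (in_string) {
            *out++ = c;
            if (c == '\\' && in[1] != '\0') {
                *out++ = *++in;          /* copy the escaped character verbatim */
            } else if (c == '"') {
                in_string = 0;
            }
        } else if (c == '"') {
            in_string = 1;
            *out++ = c;
        } else if (strchr(" \t\r\n", c) == NULL) {
            *out++ = c;                  /* keep everything except JSON whitespace */
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

The output never grows, so the rewrite can reuse the input buffer, and the returned length is what a compressor would be fed next.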

UTF-8 is already a great wire format.

I've never found a "binary JSON" that's significantly better than this; I mean you can beat it, but you need awkward encodings (prefix indices & other weird shit). You end up burning nearly a byte for any particularly clever integer encoding.

Most data structures are just nested arrays of integers. If you need an integer keyed OBJECT you're SOL, but I just play fiddly games with astral plane UTF-8 characters. (Yeah yeah yeah ad hoc encodings are nasty news.)

If you've got a BUTT LOAD of data just fire up a compressing SQLite DB like a normal human.

js8•16h ago

If you're interested in performance, what about all the number conversion (to decimals, presumably) that is incurred with JSON?

thechao•14h ago

If I'm interested in performance I'll build my data out of offset handles and lay everything down into a block and mmap() it around. That's parsing free, up to an htons() — but that's only a worst case scenario. Everything else is about not inventing something custom & being able to use easily vendored high-trust 3rd party tools. (In this case: a JSON library, LZ4, and/or SQLite.)

tonyarkles•18h ago

Do you have a CBOR implementation that you like? Ideally one with decent schema support? I was looking into CBOR as a replacement for Protobufs for an embedded system I work on and it's got a lot going for it but every implementation I looked at seemed to support a very different subset of the schema spec and it was brutal to try to find a pair of libraries (C for the embedded side, C++ for the host side) that could actually share a set of schema files.

crabbone•18h ago

The company I work for is guilty of abusing 3. We use it for debug output of user-supplied scripts that are meant to implement monitoring / metrics :'(

This is the first time I've heard of stddata, though. Is this a thing that's going into a standard? Is there one already? Or is it just a name someone gave to it and it's not a real thing?

fvwmuser•17h ago

I wouldn't have said this is anything new.

FreeBSD has libxo[0] integrated into some of its tools:

[0] https://github.com/Juniper/libxo

theamk•16h ago

Except they went with --libxo command-line option, which is extremely unlikely to cause any problems in the existing scripts.

krick•16h ago

Can somebody explain what's going on here? It seems I'm missing some important piece of background info. Why don't they just add -J flag for everyone who wants to output JSON? Oh, wait, tree already has -J flag to output JSON. So WTF are they doing here?

I am especially confused by this:

> Surely, nothing will happen if I just assume that the existence of a specific file descriptor implies something, as nobody is crazy or stupid enough to hardcode such a thing?

Wait, what? But "you" (tree authors) just hardcoded such a thing. Do "you" have some special permission to do this nonsense?

cryptonector•14h ago

Sorry, but this is going to be very dangerous because much code will close unwanted FDs then open others. It's 50 years too late to add this convention.

Instead maybe we need new system calls that return dups of a hidden stddata FD or create/replace it.
