Output goes to standard out, or to a file specified by the --outfile switch. Input comes from standard in, or from a file when using the --file switch.
It looks like both the JavaScript version and the new Python C# wrapper have equivalent CLI tools as well.

I somewhat regularly use this on Linux. I think it also works on OS X.
It doesn’t align tables like FracturedJson, but it does format values on a single line where possible. The pretty printer is based on the classic A Prettier Printer by Philip Wadler; the algorithm is quite elegant. Any value will be formatted wide if it fits the target width, otherwise tall.
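For anyone curious, a minimal sketch of that wide-or-tall rule in Python (just the idea; not FracturedJson's or the paper's actual code):

import json

def fmt(value, width=80, indent=0, step=2):
    """Render a value on one line ("wide") if it fits in width, otherwise break it up ("tall")."""
    wide = json.dumps(value)
    if indent + len(wide) <= width or not isinstance(value, (dict, list)):
        return wide
    pad = " " * (indent + step)
    if isinstance(value, dict):
        items = [pad + json.dumps(k) + ": " + fmt(v, width, indent + step, step)
                 for k, v in value.items()]
        return "{\n" + ",\n".join(items) + "\n" + " " * indent + "}"
    items = [pad + fmt(v, width, indent + step, step) for v in value]
    return "[\n" + ",\n".join(items) + "\n" + " " * indent + "]"

print(fmt({"short": [1, 2, 3], "long": list(range(40))}, width=40))

The real algorithm is cleverer about measuring, but the fits-or-breaks decision is the part described above.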
I can see the potential usefulness of this in debug-mode APIs, where comments are somehow sent along as well and rendered nicely. Especially useful for game dev JSON files.
country: no

It is parsed as equivalent to a boolean falsy value:

country: false

It is a relatively common source of problems. One solution is to escape the value:

country: "no"

More context: https://www.bram.us/2022/01/11/yaml-the-norway-problem/

Roles: [editor, product_manager]
End tags, I'm not sure what that is. But three dashes are part of the spec, to delineate sections:

something:
  setting: true
---
another:
  thing: false

If I'm working with Java it's indeed conceivable that I could update with some effort.
If I’m working with Node it’s conceivable that I could update with some effort.
If I'm working with YAML, is it not conceivable that I could update with some effort?
PHP is stupid because version 3 did not support object oriented programming.
CSS is bad because version 2 did not support grid layouts or flexbox.
Why should I critique these based on something that was fixed a long time ago, instead of updating to the version that contains the fix I am complaining about?
There is a gradient, but at some point the onus shifts squarely to one side: once the spec has changed and a number of libraries have begun supporting the new spec.
E.g. Kubernetes wrote about solving this only five months ago[1], by moving from YAML to KYAML, a YAML subset.
[1]: https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak...
Do you have a source? Afaik v1.1 didn’t introduce such a change, v1.0 specified the same behavior for quoted strings, i.e. in v1.0 a quoted “no” would remain a string “no” as well.
>>> import yaml
>>> yaml.safe_load("country: NO")
{'country': False}
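And quoting the scalar keeps it a string, for what it's worth:

>>> yaml.safe_load('country: "NO"')
{'country': 'NO'}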
Other people did not stop having this problem.

It might be that there’s some setting that fixes this or some better library that everyone should be switching to, but YAML has nothing that I want and has been a repeated source of footguns, so I haven’t found it worth looking into. (I am vaguely aware that different tools do configure YAML parsing with different defaults, which is actually worse. It’s another layer of complexity on an already unnecessarily complex base language.)
Yaml - just say Norway
Stuff that would have been structurally impossible in XML will happen in yaml. And I don't even like XML.
Available and kept up-to-date. I found these for PHP and Python:
https://github.com/j13k/yaml-lint
https://github.com/adrienverge/yamllint
Also .net:
https://github.com/aaubry/YamlDotNet
And NPM/js:
Both objects desugar to a sequence of segments (lines).
The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
Because comments are handled as right floating, all comments nicely align.
I vibe coded the base layer in an hour. I'm using it with autogenerated code, so output is manually coded based on my input. The tricky bit would be "discovering" tables & blocks. I'd just use a combo of an LSP and direct observation of sequential statements.
There's an older pure Python version but it's no longer maintained - the author of that recently replaced it with a Python library wrapping the C# code.
This looks to me like the perfect opportunity for a language-independent conformance suite - a set of tests defined as data files that can be shared across multiple implementations.
This would not only guarantee that the existing C# and TypeScript implementations behaved exactly the same way, but would also make it much easier to build and then maintain more implementations across other languages.
Interestingly the now-deprecated Python library does actually use a data-driven test suite in the kind of shape I'm describing: https://github.com/masaccio/compact-json/tree/main/tests/dat...
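A sketch of what a per-implementation test driver for that kind of shared suite could look like in Python (the tests/data layout and the format_json import are made-up names here, not the real packages):

import json
from pathlib import Path

import pytest

# Hypothetical binding for whichever implementation is under test.
from myformatter import format_json

# Each shared fixture file holds {"input": ..., "options": {...}, "expected": "..."}.
CASES = sorted(Path("tests/data").glob("*.json"))

@pytest.mark.parametrize("case_file", CASES, ids=lambda p: p.stem)
def test_conformance(case_file):
    case = json.loads(case_file.read_text())
    actual = format_json(case["input"], **case.get("options", {}))
    assert actual == case["expected"]

Each implementation would only need a thin driver like this, while the data files live in one shared repository.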
That new Python library is https://pypi.org/project/fractured-json/ but it's a wrapper around the C# library and says "You must install a valid .NET runtime" - that makes it mostly a non-starter as a dependency for other Python projects because it breaks the ability to "pip install" them without a significant extra step.
And OK it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...
So I think under some computer-science-theory case for arbitrary functions it's not possible, but for the actual shape of behavior in question from this library, I think it's realistic that a decent corpus of 'real' examples plus differential fuzzing would give you more confidence than anyone has in nearly any program's correctness here on real Earth.
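Something like this is what I mean by a differential check; the generator is just a placeholder, and format_a/format_b stand in for any two implementations:

import json
import random

def random_json(depth=0):
    """Generate a small random JSON value to use as fuzz input."""
    if depth > 3 or random.random() < 0.3:
        return random.choice([None, True, False, random.randint(-1000, 1000),
                              random.random(), "text", ""])
    if random.random() < 0.5:
        return [random_json(depth + 1) for _ in range(random.randint(0, 5))]
    return {f"k{i}": random_json(depth + 1) for i in range(random.randint(0, 5))}

def differential_fuzz(format_a, format_b, rounds=10_000):
    """Compare two formatter implementations on the same random inputs."""
    for _ in range(rounds):
        doc = json.dumps(random_json())
        a, b = format_a(doc), format_b(doc)
        assert a == b, f"divergence on input: {doc!r}"
        # Formatting must never change the data itself, only its layout.
        assert json.loads(a) == json.loads(doc)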
When I hear guarantee, it makes me think of correctness proofs.
Confidence is more of a practical notion for how much you trust the system for a given use case. Testing can definitely provide confidence in this scenario.
There are only 8 32-bit Mersenne primes, 4 of which are byte-valued. Fuzzing might catch the bug, if it happened to hit one of the four other 32-bit Mersenne primes (which, in many fuzzers, is more likely than a uniform distribution would suggest), but I'm sure you can imagine situations where it wouldn't.
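For the record, the count checks out (quick brute-force confirmation):

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

# Mersenne primes that fit in 32 bits: 2**p - 1 for p up to 31.
mersenne = [2**p - 1 for p in range(2, 32) if is_prime(2**p - 1)]
print(mersenne)                              # [3, 7, 31, 127, 8191, 131071, 524287, 2147483647]
print(len(mersenne), sum(m < 256 for m in mersenne))  # 8 4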
Or branch coverage for the lesser version; the idea is still to generate interesting cases based on each implementation, not based solely on one of them.
Sure you would. If the mutation tester mutates that lookup table. Which is quite easy to do, and which mutmut will do (if that lookup table is inside a function, because mutmut is based on mutant schemata).
More details in a sibling comment:

https://github.com/fcoury/fracturedjson-rs

https://crates.io/crates/fracturedjson
Comment with details: https://news.ycombinator.com/item?id=46468641
{
  foo: "bar",
  ans: 42,
  comments: {
    ans: "Douglas Adams"
  }
}

If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
multiply that for a long file... it takes a toll
---
also sometimes one field contains a lot of separate data (because it's straight up easier to deserialize into a single std::vector and then do stuff) - so you need comments between data points
That way, the original JSON file stays clean and isn’t polluted with extra data.
FracturedJson does not add any extra data; it only changes the formatting (it is a way of automatically formatting JSON data, not a new file format). However, the documentation mentions that in some cases it does reorder or rewrite things (such as the order of keys, the number of decimal places, etc.).

If you set CommentPolicy=TreatAsError, then programs that convert it to a canonical form (whether that canonical form is JSON or some JSON-like binary format) should (hopefully) produce the same output for the original and for the version converted by FracturedJson, depending on what things are considered significant. (I tested this with a program I wrote that converts JSON to DER (which is a canonical form of ASN.1, and in my opinion usually the only good one), and which does not consider the order of keys or the representation of numbers to be significant (the conversion of numbers does not lose any precision and is exact, but e.g. "1.2" is considered the same as "1.200" and "80e1" the same as "800").)
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt the model's patterns that much or introduce unintended bias.

I suspect this happened because most of the pre-training corpus was pretty-printed JSON, and the LLM was forced to derail from its likely path and also lost all the "visual cues" of nesting depth.
This might happen here too, but maybe to a lesser extent. Anyways, I'll stop building castles in the air now and try it sometime.
I've also been working in the other direction, making JSON more machine-readable:
https://github.com/kstenerud/bonjson/
It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.
No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.
Thanks for sharing your work!
I'm actually having second thoughts about Concise Encoding. It's gotten very big with all the features it has, which makes it less likely to be adopted (people don't like new things).
I've been toying around with a less ambitious format called ORB: https://github.com/kstenerud/orb
It's essentially an extension of BONJSON (so it can read BONJSON documents natively) that adds extra types and features.
I'm still trying to decide what types will actually be of use in the real world... CE's graph type is cool, but if nobody uses it...
Your extensions of JSON with comments, hexadecimal notation, optional commas, etc. are useful though (my own program to convert JSON to DER does treat commas as spaces, although that is an implementation detail).
Unrelated JSON experience:
I worked on a serializer which saves/loads JSON files as well as binary files (using a common interface).

From my own use case I found JSON to be restrictive for no benefit (because I don't use it in a JavaScript ecosystem).

So I changed the JSON format into something way more lax (optional commas, optional colons, optional quotes, multi-line strings, comments).

I wish we would stop pretending JSON is a good human-readable format outside of where it makes sense, and that we had a standard alternative for those non-JSON-centric cases.

I know a lot of formats already exist, but none has really taken off so far.
It sucks, but we're stuck with JSON. So the idea here is to make it suck a little less by stopping all this insane text processing for data that never ever meets a human directly.
The progression I envisage is:
1. Dev reaches for JSON because it's easy and ubiquitous.
2. Dev switches to BONJSON because it's more efficient and requires no changes to their code other than changing the codec library.
3. Dev switches to a sane format after the complexity of their app reaches a certain level where a substantial code change is warranted.
If you need custom data types, you can use tagged elements, but that requires you to have functions registered to convert the data type to/from representable values (often strings).
It natively supports quite a bit more than JSON does, without writing custom data readers/writers.
I've found more comprehensive documentation here. [1]

At first glance, I would say it's a bit more complex than it should be for a "human readable" format.
As for FracturedJson, it looks great. The basic problem statement of "either minified and unreadable or prettified and verbose" isn't one I had put my finger on before, but now that it's been said I can't unsee it.
Simplest example, "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may have in it. Doesn't it end up falling short of 'Anything JSON can do, it can do" to refuse to serialize that string?
The spec on GitHub says that including NUL is banned, on a security rationale: after parsing, someone might call strlen and accidentally truncate to a shorter string in C.

Which I think has some premise, but it's valid string content in JSON (and in UTF-8), so it is deliberately breaking 1:1 parity with JSON in the name of a security hypothetical.
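For reference, ordinary JSON libraries take it in stride; Python's stdlib, for example, round-trips it without complaint:

>>> import json
>>> json.loads('"a\\u0000b"')
'a\x00b'
>>> json.dumps('a\x00b')
'"a\\u0000b"'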
Users can of course enable NUL in the rare cases where they need it, but I want safe defaults.
Actually, I'll make that section clearer.
Just focusing narrowly on the \0 part to explain why I say so: the proposed spec is that implementations must either hard-ban embedded \0 or disallow it by default with an opt-in. So if someone comes along with a dataset that has it, they can only get support by configuring both the serializer and the parser to allow it. But if you're willing to exert that level of special-case control, I think all of the preexisting binary-JSON implementations meet the top-line definition you are setting as well. For a binary-JSON implementation that has additional types, if someone is in full end-to-end control to special-case things, then they could just choose not to use those types either; the mere existence of extra types in the binary format is no more of a "problem" for 1:1 than this choice.
IMO the deliverable that a 1:1 mapping would give us is "there is no BONJSON data that won't losslessly round-trip to JSON, and vice versa". The benefit is that it holds over all future data you haven't seen yet; the downside of using something that isn't bijective is that you can run fine for a long time and then suddenly hit data-dependent failures in your system because you can't 1:1 map legal data.
And especially with this guarantee, what will inevitably happen is that some downstream handling will also take as a given that it can strlen(), since it "knew" the BONJSON spec banned NUL. So when NUL does show up as in-bounds data, you won't be able to trivially flip the switch either; instead you're stuck with legal JSON that you can't ingest without an expensive audit, because the reduction from 1:1 got entrenched as an invariant in the handling code.
Note that my vantage point might be a bit skewed here: I work on Protobuf, and this shape of ecosystem-interoperability topic is top of mind for me in ways it doesn't necessarily need to be for small projects. I also recognize that "what even is legal JSON" itself is not actually completely clear, so take it all with a grain of salt (and again, I do think it looks like a very nice encoding in general).
Friction? yeah, but that's just how it's gonna be.
For the invalid Unicode and duplicate key handling, I'll offer no quarter. The needs of the many outweigh the needs of the few.
But I'll still say it's 1:1 because marketing.
Isn't that lying? Marketing is when you help connect people who require a product or service (the market) with a provider of that product or service.
Nevertheless, I believe your claims are mostly accurate, except for a few issues with which things are allowed or not allowed, due to JavaScript and other things (although in some of these cases, the BONJSON specification allows options to control this). Sometimes rejecting certain things is helpful, but not always; for example sometimes you do want to allow mismatched surrogates, and sometimes you might want to allow null characters. (The defaults are probably reasonable, but are often the result of a bad design anyway, as I mentioned above.) Also, the top of the specification says it is safe against many attacks, but these are a feature of the implementation, which would also be the case if you implement JSON or other formats (although the BONJSON specification does require implementations to check for these things to make them safe).
(The issue of overlong UTF-8 encodings in IIS web servers is another security issue, which is using a different format for validation and for usage. In this case there are actually two usages though, because one of these usages is the handling of relative URLs (using the ASCII format) and the other is the handling of file names on the server (which might be using UTF-16 here; in addition to that is the internal format of the file paths into individual pieces with the internal handling of relative file paths). There are reasons to avoid and to check for overlong UTF-8 encodings, although this is a different more general issue than the character encoding.)
Another issue is canonical forms; the canonical form of JSON can be messy, especially for numbers (I don't know what the canonical form for numbers in JSON is, but I read that apparently it is complicated).
I think DER is better. BONJSON is more compact, but that also makes the framing more complicated to handle than DER (which uses consistent framing for all types). I also wrote a program to convert JSON to DER (I made up some nonstandard types, although the conversion from JSON to DER only uses one of them (a key/value list); the other types it needs are standard ASN.1 types). Furthermore, DER is already a canonical form (and I made up SDER and SDSER for when you do not want canonical form but also do not want the messiness of BER; SDSER does have chunking and does not require the length to be known ahead of time, so it is more like BONJSON in these ways). Because of the consistent framing, you can easily ignore any types that you do not use; even though there are many types, you do not necessarily need all of them.
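To make the consistent-framing point concrete: every DER element is just tag, length, contents, so a reader can skip any type it does not understand. A minimal sketch of that framing (not taken from any of the programs mentioned above):

def der_tlv(tag: int, contents: bytes) -> bytes:
    """Wrap contents in DER tag-length-value framing (definite lengths only)."""
    n = len(contents)
    if n < 0x80:
        length = bytes([n])                      # short form: one length byte
    else:
        size = (n.bit_length() + 7) // 8
        length = bytes([0x80 | size]) + n.to_bytes(size, "big")  # long form
    return bytes([tag]) + length + contents

# A UTF8String (tag 0x0C) holding "hi" encodes as 0c 02 68 69.
print(der_tlv(0x0C, b"hi").hex())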
Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing. Falling into success rather than falling into failure.
BONJSON is a small spec, and easy to implement ( https://github.com/kstenerud/ksbonjson/blob/main/library/src... and https://github.com/kstenerud/ksbonjson/blob/main/library/src... ).
It's not the end-all-be-all of data formats; it's just here to make the JSON pipeline suck less.
JSON implementations can be made just as safe, but the issue is that unsafe JSON implementations are still considered valid implementations (and so almost all JSON implementations are unsafe because nobody is an authority on which design is correct). Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
Yes, I agree (if you want to use it at all; as I have mentioned, you should also consider whether you should not use JSON or something related instead), although some of the things that you specify as not having options will make it more restrictive than JSON is, even if those restrictions might be reasonable by default. One of these is mismatched surrogates (although matched surrogates should always be disallowed, an option to allow mismatched surrogates should be permitted (but not required)). Also, I think checking for duplicate names probably should not use normalized Unicode. Furthermore, the part that says that names MUST NOT be null seems redundant to me, since it already says that names MUST be strings (for compatibility with JSON) and null is not a string.
> Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
OK, this is a valid point, although there is still the possibility of incorrect implementations (adding test cases would help with that problem, though).
I am writing this because I work on a related topic: https://replicated.wiki/blog/args.html
What I like about fractured json is the middle ground between too-sparse pretty printing, and too-compact non-pretty printing, nu doesn't give me that by default.
One thing that neither fractured json nor nushell gives me, which I'd like, is the ability to associate an annotation with a particular datum, convert to json, convert back to the first language, and have that comment still be attached to that datum. Of course the intermediate json would need to have some extra fields to carry the annotations, which would be fine.
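A sketch of what I mean, using a made-up __comment__/__value__ convention for those extra fields:

def annotate(value, comment):
    """Wrap a datum together with its annotation for the JSON round trip."""
    return {"__comment__": comment, "__value__": value}

def strip_annotations(node):
    """Recursively unwrap annotated values when converting back."""
    if isinstance(node, dict):
        if set(node) == {"__comment__", "__value__"}:
            return strip_annotations(node["__value__"])
        return {k: strip_annotations(v) for k, v in node.items()}
    if isinstance(node, list):
        return [strip_annotations(v) for v in node]
    return node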
That, and the fact that it has enough bells and whistles that there are YAML parser exploits out there.
But then I found it's in C#. And apparently the CLI app isn't even published any more (apparently nobody wanted it? Surprises me but ok). Anyway, I don't think I want this enough to install .NET to get it, so that's that. But I'd have liked a version in Go or Rust or whatever.
I plan to take a new look at that when I have the time. But a port to a more CLI-friendly platform could probably do a better job.
https://github.com/zaboople/bin/blob/master/mommyjson.groovy
(btw I would happily upvote a python port, since groovy is not so popular)
[0] - https://github.com/tomnomnom/gron

[1] - https://github.com/ckampfe/jstream
Being a lazy slob, I never saw fit to make a dedicated repo (or even, directory) so I have no place for a readme. After all the whole thing fits in one script.
Gron is a great name and the output looks pretty good... I like the idea of outputting perfectly valid javascript (with semicolons, even...)
Let's see if HN will format this sample output from mommyjson right:
employees: [199] profile: projects: [0] tasks: [0] description: Optimized grid-enabled parallelism
employees: [199] profile: projects: [0] tasks: [0] taskId: T368
employees: [199] profile: projects: [0] tasks: [0] assignedTo: {
employees: [199] profile: projects: [0] tasks: [0] assignedTo: id: E00200
employees: [199] profile: projects: [0] tasks: [0] assignedTo: name: Timothy Mullins
employees: [199] profile: projects: [0] tasks: [0] assignedTo: skills: {
employees: [199] profile: projects: [0] tasks: [0] assignedTo: skills: primary: C++
employees: [199] profile: projects: [0] tasks: [0] assignedTo: skills: experience: {
employees: [199] profile: projects: [0] tasks: [0] assignedTo: skills: experience: years: 10

https://github.com/fcoury/fracturedjson-rs
https://crates.io/crates/fracturedjson
And install with:
cargo install fracturedjson
> $ fjson --help
Rust port of FracturedJsonJs: human-friendly JSON formatter with optional comment support.
Usage: fjson [OPTIONS] [FILE]...
Arguments:
[FILE]... Input file(s). If not specified, reads from stdin
Options:
-o, --output <FILE>
Output file. If not specified, writes to stdout
-c, --compact
Minify output (remove all whitespace)
-w, --max-width <MAX_WIDTH>
Maximum line length before wrapping [default: 120]
-i, --indent <INDENT>
Number of spaces per indentation level [default: 4]
-t, --tabs
Use tabs instead of spaces for indentation
--eol <EOL>
Line ending style [default: lf] [possible values: lf, crlf]
--comments <COMMENTS>
How to handle comments in input [default: error] [possible values: error, remove, preserve]
--trailing-commas
Allow trailing commas in input
--preserve-blanks
Preserve blank lines from input
--number-align <NUMBER_ALIGN>
Number alignment style in arrays [default: decimal] [possible values: left, right, decimal, normalize]
--max-inline-complexity <MAX_INLINE_COMPLEXITY>
Maximum nesting depth for inline formatting (-1 to disable) [default: 2]
--max-table-complexity <MAX_TABLE_COMPLEXITY>
Maximum nesting depth for table formatting (-1 to disable) [default: 2]
--simple-bracket-padding
Add padding inside brackets for simple arrays/objects
--no-nested-bracket-padding
Disable padding inside brackets for nested arrays/objects
-h, --help
Print help
-V, --version
Print version

It should even be possible to compile the dotnet library to a C-compatible shared library and provide packages for many other languages.
In this case they are formatting JSON in an easier to read way. It’s not an alternative to CRDT, it is a totally different issue.
Though, I guess, the only(?) great XML workflow is with C# LINQ
Personally, I don't spend much time looking at complex JSON; a binary format like Protobuf along with a typed DSL is often what you need. You can still derive JSON from Proto if you need that. In return, you get faster transport and type safety.
Also, on another note, tools like jq are so ubiquitous that any format that isn't directly supported by jq will have a really hard time seeing mass adoption.
This tool also looks super useful; I spend so much time at work looking at JSON logs that it will surely come in handy. It's the kind of thing I didn't even know I needed, but now that I've seen it, it makes perfect sense.
damnitbuilds•1mo ago
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard JSON is silly ("they would be used for parsing directives").
Xymist•1mo ago
A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
nodja•1mo ago
I don't disagree with the choice, but seeing how things turned out I can't help but look at the greener grass on the other side.
libria•1mo ago
Better not let me near your JSON files then. I pound in wall anchors with the bottom of my drill if my hammer is not within arm's reach.
mystifyingpoi•1mo ago
While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
If your screwdriver bends like rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is being used for driving screws and thus behaves correctly.
damnitbuilds•1mo ago
I can not follow this law by making my API depend on, say, the contents of a string value. Preventing APIs from depending on the value of a comment is no different, so your argument is not a reason for not having comments.
patates•1mo ago
I also would have wanted comments, but I see why Crockford must have been skeptical. He just didn't want JSON to be the next XML.
cromulent•1mo ago
> Insignificant whitespace is allowed before or after any token.