Who are these hypothetical lunatics?
I've done it. An internal library I maintain already had a YAML parser, since we were ingesting YAML files for other reasons. When we later added new files that someone decided should be in JSON instead, it was easier and cleaner to keep using the existing YAML parser than to add a separate JSON parser alongside it.
Imagine building some small app that reads an external config file. You personally only care about YAML files, but your avid users ask for official JSON support as well, because a few of their files are JSON.
It's not high priority, but you remember the old saying that JSON is just a subset of YAML, so you try a bunch of files, confirm it works well enough, and decide to pipe JSON files to your YAML parser as well. Done and done.
And it's not a big block of code or anything that jumps out at users; the dev just happens to share a parser between the formats.
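A minimal sketch of what that shared loader ends up looking like (the function name is hypothetical; PyYAML stands in for whatever YAML parser the app already ships):

import yaml  # PyYAML, a YAML 1.1-era parser like most in the wild

def load_config(path):
    # One parser for both extensions, on the theory that
    # JSON is just a subset of YAML.
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config('settings.json')  # fine, until a file hits a 1.1 edge case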
# ERB template emitting a value into a YAML field via to_json -- leaning on JSON output being valid YAML
field: <%= File.read('/foo/bar').to_json %>
Though I have myriad criticisms of YAML, this article's arguments are not a concern at all if you can ensure that your YAML parser always uses 1.2 regardless of any %YAML directive.

If all you do with YAML is serialize data, from one tool, to the exact same tool, it's fine. For all other purposes you should seek a different data format (if you don't want to deal with the eventual bugs).
(Note that I don't mean parser/library, I mean tool. The tool using the library will often use, or not use, certain options, which increases the complexity of the interactions and leads to more possible failures.)
GitHub Actions, Dependabot, and Docker Compose never complain.
YAML.load '[FI,NO,SE]'   # YAML 1.1 resolves the bare scalar NO to a boolean
=> ["FI", false, "SE"]
Ah yes, I remember that. %YAML 1.2
Absolutely no truck with this either. If you want another whitespace-obsessed bug farm, you can give it a new name.

Stay with XML. It's fine. I wrote a bunch earlier this evening, even though you're not really meant to edit it by hand, and that was fine too. Emacs totally understands what XML is.
XML grows on you. XSL transforms are _probably_ not a good idea but they also kind of grow on you. It turns into HTML rather easily as well.
Then a program can be made that writes it as "1.0e+2", which is valid in both JSON and YAML, regardless of what the reader expects. (However, some formats will never need numbers that require scientific notation anyway.)
It does not help if you are trying to use a YAML parser to parse a JSON file, but at least it avoids a different problem.
If you are making your own format which does not need to work with an existing one, then you do not necessarily need to use YAML or JSON; you can choose the appropriate format for it. You can consider what numbers and what character sets you intend to use, whether you will use binary data, etc. If you like the structured data but do not need a text format, then DER is another way.
In [1]: import json, yaml
In [2]: v = "\N{PILE OF POO}"
In [3]: yaml.load(json.dumps(v), yaml.SafeLoader) == v
Out[3]: False
The specification is woefully underspecified with regards to Unicode escapes. E.g., it uses "Unicode characters" practically throughout, a construct that doesn't exist in Unicode and (AFAICT) is not defined by YAML. A reasonable interpretation of that leads us to \uabcd (which the spec says is an "Escaped 16-bit Unicode character.") decoding to the Unicode scalar value (USV) 0xabcd. But that's not compatible with JSON.

(PyYAML is not the only library with that reading of the spec, either. Rust's will outright error on the input here, as its `str` type is equivalent to a sequence of USVs, whereas Python's `str` is not. The value Python decodes in the example above is a representable but illegal value.)
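Concretely (a small PyYAML sketch: json.dumps escapes the astral character as a UTF-16 surrogate pair, and PyYAML decodes each \uXXXX escape independently):

import json
import yaml

s = json.dumps("\N{PILE OF POO}")
print(s)                                    # "\ud83d\udca9" -- JSON's surrogate-pair escape
print(repr(yaml.load(s, yaml.SafeLoader)))  # '\ud83d\udca9' -- two lone surrogates, not U+1F4A9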
sshine•9h ago
If you assume that YAML 1.2 is the default, you don't need that nasty %YAML header.
This doesn't translate to arbitrary, open environments, but you can make that choice in closed environments.
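That choice is available today; for example, Python's ruamel.yaml targets YAML 1.2 by default (a minimal sketch, assuming ruamel.yaml is installed):

from ruamel.yaml import YAML

yaml12 = YAML(typ="safe")           # safe loading, YAML 1.2 core schema by default
print(yaml12.load("[FI, NO, SE]"))  # ['FI', 'NO', 'SE'] -- NO stays a string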
While JSON numbers are grammatically simple, they almost never map cleanly onto the number types of whatever language is parsing them, in syntax, exactness, and precision.
So while YAML is a lot more complex, with JSON you still need to limit which kinds of numbers you actually try to express. This is especially true for scientific notation, big numbers, and numbers exact to many digits.
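A quick Python illustration of that limit (a grammatically valid JSON number that a stock parser silently rounds):

import json

# Valid JSON, exact to 19 significant digits...
doc = '{"precise": 0.1234567890123456789}'
# ...but the default parser hands back a 64-bit float.
print(json.loads(doc)["precise"])   # 0.12345678901234568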
jorams•8h ago
Indeed the YAML 1.2 spec says a document without a YAML directive should be assumed to be 1.2[1]:
> A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification.
It's only the YAML 1.2 spec that says it's a superset of JSON. The YAML authors weren't aware of JSON when publishing version 1.1[2]:
> The YAML 1.1 specification was published in 2005. Around this time, the developers became aware of JSON. By sheer coincidence, JSON was almost a complete subset of YAML (both syntactically and semantically).
> In 2006, Kyrylo Simonov produced PyYAML and LibYAML. A lot of the YAML frameworks in various programming languages are built over LibYAML and many others have looked to PyYAML as a solid reference for their implementations.
> The YAML 1.2 specification was published in 2009. Its primary focus was making YAML a strict superset of JSON. It also removed many of the problematic implicit typing recommendations.
The middle paragraph there is the reason this is a problem people keep running into: most implementations are based on LibYAML, which is an implementation of YAML 1.1 that does not really support 1.2[3]. Indeed, the last example from the post doesn't actually work for me on Ruby 3.4.4 with LibYAML 0.2.5; it produces the exact same output as the one before it.
[1]: https://yaml.org/spec/1.2.2/#681-yaml-directives
[2]: https://yaml.org/spec/1.2.2/#12-yaml-history
[3]: https://github.com/yaml/libyaml/issues/20
jmillikin•8h ago
By the time YAML 1.2 had been published and implementations written, greenfield projects were using either JSON5 (a true superset of JSON) or TOML.
For statically-typed languages the range and precision are determined by the type of the destination value passed to the parser; it's straightforward to reject (or clamp) a JSON number `12345` being parsed into a `uint8_t`.

For dynamically-typed languages there's less emphasis on performance, so using an arbitrary-precision numeric type (Python's Decimal, Go's "math/big" types) provides lossless decoding.
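Python's json module exposes a hook for exactly that kind of lossless decoding (a sketch of the approach, not anyone's specific codebase):

import json
from decimal import Decimal

doc = '{"pi": 3.141592653589793238462643}'
print(json.loads(doc)["pi"])                       # 3.141592653589793 -- float64 rounds it
print(json.loads(doc, parse_float=Decimal)["pi"])  # Decimal('3.141592653589793238462643')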
The only language I know of that really struggles with JSON numbers is, ironically, JavaScript -- its BigInt type is relatively new and not well integrated with its JSON API[0], and it doesn't have an arbitrary-precision type.
[0] See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... for the incantation needed to encode a BigInt as a number.