Who are these hypothetical lunatics?
I've done it. An internal library I maintain already had a YAML parser, since we were ingesting YAML files for other reasons. When we later added new files that someone decided should be in JSON instead, it was easier and cleaner to keep using the existing YAML parser than to add a separate JSON parser alongside it.
Imagine building some small app that reads an external config file. You personally only care about YAML files, but your avid users ask for official JSON support as well, because a few of their files are JSON.
It's not high priority, but you remember the old saying that JSON is just a subset of YAML, so you try a bunch of files, confirm it works well enough, and decide to pipe JSON files to your YAML parser as well. Done and done.
And it's not a big block of code or anything that jumps out at users; the dev just happens to share a parser between the formats.
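A minimal sketch of what that shared loader ends up looking like (the function name is hypothetical; PyYAML stands in for whatever YAML parser the app already ships):

import yaml  # PyYAML, a YAML 1.1-era parser like most in the wild

def load_config(path):
    # One parser for both extensions, on the theory that
    # JSON is just a subset of YAML.
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config('settings.json')  # fine, until a file hits a 1.1 edge case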
# ERB template emitting a value into a YAML field via to_json -- leaning on JSON output being valid YAML
field: <%= File.read('/foo/bar').to_json %>
Though I have myriad criticisms of YAML, this article's arguments are not a concern at all if you can ensure that your YAML parser always uses 1.2 regardless of any %YAML directive.

If all you do with YAML is serialize data, from one tool, to the exact same tool, it's fine. For all other purposes you should seek a different data format (if you don't want to deal with the eventual bugs).
(Note that I don't mean parser/library, I mean tool. The tool using the library will often use, or not use, certain options, which increases the complexity of the interactions and leads to more possible failures.)
GitHub Actions, Dependabot, and Docker Compose never complain.
YAML.load '[FI,NO,SE]'   # YAML 1.1 resolves the bare scalar NO to a boolean
=> ["FI", false, "SE"]
Ah yes, I remember that. %YAML 1.2
Absolutely no truck with this either. If you want another whitespace-obsessed bug farm, you can give it a new name.

Stay with XML. It's fine. I wrote a bunch earlier this evening, even though you're not really meant to edit it by hand, and that was fine too. Emacs totally understands what XML is.
XML grows on you. XSL transforms are _probably_ not a good idea but they also kind of grow on you. It turns into HTML rather easily as well.
Then a program can be made that writes it as "1.0e+2", which is valid in both JSON and YAML, regardless of what the reader expects. (However, some formats will never need numbers that require scientific notation anyway.)
It does not help if you are trying to use a YAML parser to parse a JSON file, but at least it avoids a different problem.
If you are making your own format which does not need to work with an existing one, then you do not necessarily need to use YAML or JSON; you can choose the appropriate format for it. You can consider what numbers and what character sets you intend to use, whether you will use binary data, etc. If you like the structured data but do not need a text format, then DER is another way.
In [1]: import json, yaml
In [2]: v = "\N{PILE OF POO}"
In [3]: yaml.load(json.dumps(v), yaml.SafeLoader) == v
Out[3]: False
The specification is woefully underspecified with regards to Unicode escapes. E.g., it uses "Unicode characters" practically throughout, a construct that doesn't exist in Unicode and (AFAICT) is not defined by YAML. A reasonable interpretation of that leads us to \uabcd (which the spec says is an "Escaped 16-bit Unicode character.") decoding to the Unicode scalar value (USV) 0xabcd. But that's not compatible with JSON.

(PyYAML is not the only library with that reading of the spec, either. Rust's will outright error on the input here, as its `str` type is equivalent to a sequence of USVs, whereas Python's `str` is not. The value Python decodes in the example above is a representable but illegal value.)
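Concretely (a small PyYAML sketch: json.dumps escapes the astral character as a UTF-16 surrogate pair, and PyYAML decodes each \uXXXX escape independently):

import json
import yaml

s = json.dumps("\N{PILE OF POO}")
print(s)                                    # "\ud83d\udca9" -- JSON's surrogate-pair escape
print(repr(yaml.load(s, yaml.SafeLoader)))  # '\ud83d\udca9' -- two lone surrogates, not U+1F4A9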
sshine•9h ago
If you assume that YAML 1.2 is the default, you don't need that nasty %YAML header.
This doesn't translate to arbitrary, open environments, but you can make that choice in closed environments.
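That choice is available today; for example, Python's ruamel.yaml targets YAML 1.2 by default (a minimal sketch, assuming ruamel.yaml is installed):

from ruamel.yaml import YAML

yaml12 = YAML(typ="safe")           # safe loading, YAML 1.2 core schema by default
print(yaml12.load("[FI, NO, SE]"))  # ['FI', 'NO', 'SE'] -- NO stays a string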
While JSON numbers are grammatically simple, they almost never map cleanly onto the number types of whatever language is parsing them, in syntax, exactness, and precision.
So while YAML is a lot more complex, with JSON you still need to limit which kinds of numbers you actually try to express. This is especially true for scientific notation, big numbers, and numbers exact to many digits.
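A quick Python illustration of that limit (a grammatically valid JSON number that a stock parser silently rounds):

import json

# Valid JSON, exact to 19 significant digits...
doc = '{"precise": 0.1234567890123456789}'
# ...but the default parser hands back a 64-bit float.
print(json.loads(doc)["precise"])   # 0.12345678901234568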
jorams•8h ago
Indeed the YAML 1.2 spec says a document without a YAML directive should be assumed to be 1.2[1]:
> A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification.
It's only the YAML 1.2 spec that says it's a superset of JSON. The YAML authors weren't aware of JSON when publishing version 1.1[2]:
> The YAML 1.1 specification was published in 2005. Around this time, the developers became aware of JSON. By sheer coincidence, JSON was almost a complete subset of YAML (both syntactically and semantically).
> In 2006, Kyrylo Simonov produced PyYAML and LibYAML. A lot of the YAML frameworks in various programming languages are built over LibYAML and many others have looked to PyYAML as a solid reference for their implementations.
> The YAML 1.2 specification was published in 2009. Its primary focus was making YAML a strict superset of JSON. It also removed many of the problematic implicit typing recommendations.
The middle paragraph there is the reason this is a problem people keep running into: most implementations are based on LibYAML, which is an implementation of YAML 1.1 that does not really support 1.2[3]. Indeed, the last example from the post doesn't actually work for me on Ruby 3.4.4 with LibYAML 0.2.5; it produces the exact same output as the one before it.
[1]: https://yaml.org/spec/1.2.2/#681-yaml-directives
[2]: https://yaml.org/spec/1.2.2/#12-yaml-history
[3]: https://github.com/yaml/libyaml/issues/20
jmillikin•8h ago
By the time YAML 1.2 had been published and implementations written, greenfield projects were using either JSON5 (a true superset of JSON) or TOML.
For statically-typed languages the range and precision are determined by the type of the destination value passed to the parser; it's straightforward to reject (or clamp) a JSON number `12345` being parsed into a `uint8_t`.

For dynamically-typed languages there's less emphasis on performance, so using an arbitrary-precision numeric type (Python's Decimal, Go's "math/big" types) provides lossless decoding.
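Python's json module exposes a hook for exactly that kind of lossless decoding (a sketch of the approach, not anyone's specific codebase):

import json
from decimal import Decimal

doc = '{"pi": 3.141592653589793238462643}'
print(json.loads(doc)["pi"])                       # 3.141592653589793 -- float64 rounds it
print(json.loads(doc, parse_float=Decimal)["pi"])  # Decimal('3.141592653589793238462643')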
The only language I know of that really struggles with JSON numbers is, ironically, JavaScript -- its BigInt type is relatively new and not well integrated with its JSON API[0], and it doesn't have an arbitrary-precision type.
[0] See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... for the incantation needed to encode a BigInt as a number.