The lost art of XML

https://marcosmagueta.com/blog/the-lost-art-of-xml/

166•Curiositry•2w ago

Comments

shadowgovt•2w ago

XML was abandoned because we realized bandwidth costs money and while it was too late to do anything about how verbose HTML is, we didn't have to repeat the mistake with our data transfer protocols.

Even with zipped payloads, it's just way unnecessarily chatty without being more readable.

_heimdall•2w ago

That doesn't match my memory, though it's been a while now!

I remember the arguments largely revolving around verbosity and the prevalence of JSON use in browsers.

That doesn't mean bandwidth wasn't a consideration, but I mostly remember hearing devs complain about how verbose or difficult to work with XML was.

johngossman•2w ago

Your memory is correct. Once compression was applied, the size on the wire was mostly a wash. Parsing costs were often greater but that's at the endpoints.

shadowgovt•2w ago

But one of those endpoints is a client on a mobile phone, which when we started with Internet on mobile devices wasn't a particularly powerful CPU architecture.

voidfunc•2w ago

OK, but XML is a pretty solid format for a lot of other stuff that doesn't necessarily need network transmission.

shadowgovt•2w ago

This is true, but if other formats work for those purposes and also network transmission, they'll start to edge out the alternative of supporting two different protocols in your stack.

cosmotic•2w ago

The article addresses this.

howdyhowdyhowdy•2w ago

if bandwidth was a concern, JSON was a poor solution. XML compresses nicely and efficiently. Yes it can be verbose to the human eyes, but I don't know if bandwidth is the reason it's not used more often.
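The compression claim is easy to check against your own payloads. A minimal sketch, assuming made-up book records and Python's standard gzip module; the sizes it prints depend entirely on the data, so treat it as a way to measure rather than as a result:

```python
import gzip
import json

# Hypothetical records, invented for this sketch.
records = [{"id": i, "title": f"Book {i}", "year": 2000 + i % 25} for i in range(500)]

json_bytes = json.dumps(records).encode("utf-8")
xml_bytes = (
    "<books>"
    + "".join(
        f"<book><id>{r['id']}</id><title>{r['title']}</title><year>{r['year']}</year></book>"
        for r in records
    )
    + "</books>"
).encode("utf-8")

# Print raw and gzipped sizes side by side for both encodings.
for name, raw in (("json", json_bytes), ("xml", xml_bytes)):
    packed = gzip.compress(raw)
    print(f"{name}: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Repetitive tag names are exactly what LZ-style compressors handle well, which is why the raw size difference tends to shrink on the wire.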

adgjlsfhk1•2w ago

JSON absolutely isn't perfect, but it's a spec that you can explain in ~5 minutes, mirrors common PL syntax for Dict/Array, and is pretty much superior to XML in every way.

howdyhowdyhowdy•2w ago

Sure, but the argument was about bandwidth, and that is the axis I’m comparing the two solutions on.

_heimdall•2w ago

This is a debate I've had many times. XML, and REST, are extremely useful for certain types of use cases that you quite often run into online.

The industry abandoned both in favor of JSON and RPC for speed and perceived DX improvements, and because for a period of time everyone was in fact building only against their own servers.

There are plenty of examples over the last two decades of us having to reinvent solutions to the same problems that REST solved way back then though. MCP is the latest iteration of trying to shoehorn schemas and self-documenting APIs into a sea of JSON RPC.

locknitpicker•2w ago

Your comment doesn't sound well researched or thought all the way through. REST by definition is used nowhere at all, and virtually all RESTful APIs are RPC-over-HTTP designs loosely inspired by REST.

There are virtually zero scenarios where anyone at all ever said "This thing we're using JSON for would be easier if we just used XML".

JSON was the undisputed winner of a competition that never was, in great part because of the vast improvements in DX. I remind you that JSON is the necessary and sufficient subset of JavaScript for defining data, and to parse it all anyone had to do was pipe it to a very standard and ubiquitous eval(). No tooling, no third-party module, no framework. Nothing. There is no competition at all.

ivan_gammel•2w ago

>REST by definition is used nowhere at all

There exist plenty of people actually using REST. It can reduce complexity of SPAs.

locknitpicker•2w ago

Name one application which uses HATEOAS.

_heimdall•2w ago

Any server rendered HTML site or application?

pjmlp•2w ago

Never used it on ASP.NET or Java EE/Jakarta EE/Spring.

_heimdall•2w ago

Were you exclusively building SPAs with all of those frameworks? If you ever rendered state/content to HTML on the server you were using HATEOAS (and REST) principles.

ivan_gammel•2w ago

I have built a few. And of course there’s a lot of interest in the community around various HATEOAS specs. People build with HAL, Siren, JSON-LD etc.

_heimdall•2w ago

Your argument isn't researched either, if your metric is based on including sources.

You seem to be arguing that REST lost because if you look around today you will only find RPC. I agree. My point wasn't that REST won. Part of my point, though, was that REST lost and the industry has tried multiple times to bolt JSON RPC solutions onto the same problems REST already addressed. If you would like to see some of those examples just look up Swagger, OpenAPI, or MCP.

I agree JSON won, and I agree that it was picked based on arguments over DX. I'm not sure where you and I disagree here.

imtringued•1w ago

Swagger, OpenAPI have nothing to do with REST. In fact, they are the antithesis of REST, because they are the very out of band information that a REST client isn't allowed to depend on. A REST client for a pizza delivery restaurant isn't allowed to know that the "order" action is called "order", because that is out of band information. If the server is replaced by a hotel booking server with a completely different workflow, the client is supposed to continue working.

Only MCP is loosely related to REST and it's because one of the defining characteristics of REST is that the API can be discovered at runtime, which is useful if you have a pseudo human level intelligence such as an LLM, but not if you have a dumb static application that has finite capabilities.
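A rough sketch of what "discovered at runtime" means in practice. Everything here is hypothetical: the in-memory RESPONSES table stands in for a server, and the link names are invented. The point is only that the client hardcodes the entry URL and nothing else:

```python
# Hypothetical in-memory "server": each response carries its own links.
RESPONSES = {
    "/": {"title": "Pizza shop", "links": {"menu": "/menu"}},
    "/menu": {"title": "Menu", "links": {"order": "/orders", "home": "/"}},
    "/orders": {"title": "Your order", "links": {"home": "/"}},
}

def fetch(url):
    """Stand-in for an HTTP GET; returns the representation at url."""
    return RESPONSES[url]

def follow(start, choose, max_steps=10):
    """Walk from start, letting choose() pick among the advertised links."""
    url, visited = start, []
    for _ in range(max_steps):
        doc = fetch(url)
        visited.append(doc["title"])
        rel = choose(doc["links"])
        if rel is None:
            break
        url = doc["links"][rel]
    return visited

# The chooser stands in for a human (or an LLM) reading the page: it picks
# the first link that is not "home" and stops when none is left.
def first_non_home(links):
    return next((rel for rel in links if rel != "home"), None)

print(follow("/", first_non_home))  # ['Pizza shop', 'Menu', 'Your order']
```

Swap the RESPONSES table for a hotel-booking one and follow() keeps working, which is also why the chooser has to be something smarter than a static program.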

_heimdall•1w ago

I was trying to say the same thing - Swagger and OpenAPI were attempts to solve problems that only exist because we didn't stick with REST. We wouldn't have needed Swagger if APIs were discoverable and self documenting.

striking•2w ago

I tried using XML on a lark the other day and realized that XSDs are actually somewhat load bearing. It's difficult to map data in XML to objects in your favorite programming language without the schema being known beforehand as lists of a single element are hard to distinguish from just a property of the overall object.

Maybe this is okay if you know your schema beforehand and are willing to write an XSD. My usecase relied on not knowing the schema. Despite my excitement to use a SAX-style parser, I tucked my tail between my legs and switched back to JSONL. Was I missing something?
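The single-element problem can be reproduced in a few lines. A minimal sketch using Python's xml.etree.ElementTree and a naive schema-less converter (to_obj is a made-up helper, not a library function):

```python
import xml.etree.ElementTree as ET

def to_obj(elem):
    """Naive schema-less conversion: repeated child tags become lists."""
    children = list(elem)
    if not children:
        return elem.text
    out = {}
    for child in children:
        value = to_obj(child)
        if child.tag in out:
            # Second occurrence of a tag: promote the existing value to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

one = to_obj(ET.fromstring("<order><item>pen</item></order>"))
two = to_obj(ET.fromstring("<order><item>pen</item><item>ink</item></order>"))

print(one)  # {'item': 'pen'}
print(two)  # {'item': ['pen', 'ink']}
```

Without a schema there is no way to tell that the first document's item was meant to be a one-element list, so the shape of the result changes with the data.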

mkozlows•2w ago

XML was designed as a document format, not a data structure serialization format. You're supposed to parse it into a DOM or similar format, not a bunch of strongly-typed objects. You definitely need some extra tooling if you're trying to do the latter, and yes, that's one of XSD's purposes.

froh•2w ago

that's underselling xml. xml is explicitly meant for data serialization and exchange, xsd reflects that, and it's the reason for jaxb Java xml binding tooling.

don't get me wrong: JSON is superior in many aspects, xml is utterly overengineered.

but xml absolutely was _meant_ for data exchange, machine to machine.

mkozlows•2w ago

No. That use case was grafted onto it later. You can look at the original 1998 XML 1.0 spec first edition to see what people were saying at the time: https://www.w3.org/TR/1998/REC-xml-19980210#sec-origin-goals

Here's the bullet point from that verbatim:

  The design goals for XML are:

    XML shall be straightforwardly usable over the Internet.
    XML shall support a wide variety of applications.
    XML shall be compatible with SGML.
    It shall be easy to write programs which process XML documents.
    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
    XML documents should be human-legible and reasonably clear.
    The XML design should be prepared quickly.
    The design of XML shall be formal and concise.
    XML documents shall be easy to create.
    Terseness in XML markup is of minimal importance.

Or heck, even more concisely from the abstract: "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."

It's always talking about documents. It was a way to serve up marked-up documents that didn't depend on using the specific HTML tag vocabulary. Everything else happened to it later, and was a bad idea.

froh•2w ago

please bear with me...

data exchange was baked into xml from the get go, the following predate the 1.0 release and come from people involved in writing the standard:

XML, Java, and the future of the Web Jon Bosak, *Sun Microsystems* Last revised *1997.03.10*

section on Database interchange: the universal hub

https://www.ibiblio.org/bosak/xml/why/xmlapps.htm

Guidelines for using XML for Electronic Data Interchange Version 0.04

*23rd December 1997*

https://xml.coverpages.org/xml-ediGuide971223.html

the origin of the latter, the edi/xml WG, was the successor of an edi/sgml WG which had started in the early 1990s, and was born out of the desire to get a "universal electronic data exchange" that would work cross platform, vms, mainframes, unix and even DOS hehe, and to leverage the successful sgml doc book interoperability.

was it niche? yes. was it starting in sgml already? and baked into xml/xsd/xslt? I think so.

fsckboy•2w ago

to be fair

>XML shall be straightforwardly usable over the Internet.

is machine to machine communication

to me, XML is an example of worse is better, or rather, better is worse. it would never have come out of Bell Labs in the early 70s. Neither would JSON for that matter.

mkozlows•2w ago

And as for JAXB, it was released in 2003, well into XML's decadent period. The original Java APIs for XML parsing were SAX and DOM, both of which are tag and document oriented.

zarzavat•2w ago

You have to use the right tool for the job.

XML is extensible markup, i.e. it's like HTML that can be applied to tasks outside of representing web pages. It's designed to be written by hand. It has comments! A good use for XML would be declaring a native UI: it's not HTML but it's like HTML.

JSON is a plain text serialization format. It's designed to be generated and consumed by computers whilst being readable by humans.

Neither is a configuration language but both have been abused as one.

ahf8Aithaex7Nai•2w ago

> It's designed to be written by hand

Are you sure about that? I've heard XML gurus say the exact opposite.

This is a very good example of why I detest the phrase “use the right tool for the job.” People say this as an appeal to reason, as if there weren't an obvious follow-up question that different people might answer very differently.

zarzavat•2w ago

Perfectly sure. XML is eXtensible Markup Language, the generalized counterpart to Hypertext Markup Language.

XML, HTML, SGML are all designed to be written by hand.

You can generate XML, just like you can generate HTML, but the language wasn't designed to make that easy.

Computers don't need comments, matching </end> tags, or whitespace stripping.

There was a time, in the early-mid 2000s when XML was the hammer for every screw. But then JSON was invented and it took over most of those use cases. Perhaps those XML gurus are stuck in a time warp.

XML remains a good way to represent tree structures that need to be human editable.

unscaled•2w ago

SGML was designed for documents, and it can be written by hand (or by a machine). HTML (another descendant of SGML) is in fact written by hand regularly. When you're using SGML descendants for what they were meant for (documents) they're pretty good for this purpose. Writing documents — not configuration files, not serialized data, not code — by hand.

XML can still be used as a very powerful generic document markup language that is more restricted (and thus easier to parse) than SGML. The problems started when people began using XML for other things, especially for configuration files, data interchange and even for programming languages.

So I don't think GP is wrong. The authors of the original XML spec probably envisioned people writing this by hand. But XML is very bad for writing by hand the things that it eventually got used for.

locknitpicker•2w ago

> It's designed to be written by hand.

This assertion is comically out of touch with reality, particularly when trying to describe JSON as something that is merely "readable by humans". You could not do anything at all with XML without having to employ half a dozen frameworks and tools and modules.

g-b-r•2w ago

You can do everything you can do with JSON by just knowing the basic syntax (<element attribute=""></element>).

The complexity about XML comes from the many additional languages and tools built on top of it.

Many are too complex and bloated, but JSON has little to nothing comparable, so it's only simple because it doesn't support what XML does.

ahf8Aithaex7Nai•2w ago

> The complexity about XML comes from the many additional languages and tools built on top of it.

It's not just that, is it? There are also attributes versus child elements, dealing with white space including the xml:space attribute, namespaces, schemas, integration of external document fragments with xinclude:include or &extern;. Each of these is a huge can of worms in its own right. There are probably more that I'm not even aware of right now.

A few years ago, I wrote a fully functional parser for JSON that is easy to verify for correctness and that isn't just lying around somewhere as a toy, but is actually used (by me) in various projects time and again. Overall, building this parser was almost trivial. With XML, I'm not even sure I would be able to write a correct and complete parser.

But I agree with you that XML-based languages and XML tools make things even worse. I had to work with XML a lot over ten years ago. I still get annoyed when I think about XSLT, or dealing with schemas, or the challenge of finding usable tools that are reasonably compliant with standards.

You can only have a positive view of XML when you think of something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <booklist>
      <book>
        <title>Example Book</title>
        <author>Max Mustermann</author>
        <year>2025</year>
      </book>
      <book>
        <title>Second Book</title>
        <author>Erika Musterfrau</author>
        <year>2026</year>
      </book>
    </booklist>

And at that level, I have (almost) no problem with XML. But as soon as things get more demanding and you really take the various aspects of XML's value proposition seriously, you enter a world of pain and despair. At least, that's how it was for me back then. Maybe I would see things differently today, but I'm not really interested in finding out.

g-b-r•2w ago

First, you're describing the parsing side, while the message I was replying to claimed that it can't be written by hand.

Anyhow, schemas, XInclude and even namespaces are what I was referring to as additional languages and tools.

In your application you use them if you want, they're not really part of XML.

Of course even a parser for plain XML is a lot more complex than one for JSON, but people usually use libraries for that...

In any case, in your application nothing prevents you from using a dumbed-down version of XML, without entities, white space handling, and even only looking at elements and attributes; there were some applications that did that.

That already gives you a format that's easier to read and write manually than json.

I had more to say about "attributes versus child elements", but it's taking me too much time, I'll probably do that tomorrow.

ahf8Aithaex7Nai•2w ago

I think I understand your point. I only brought parsing into play to illustrate that XML is complicated, not because it's my general focus. I wouldn't classify namespaces, etc. as additional languages and tools, but that's beside the point.

> in your application nothing prevents you from using a dumbed-down version of XML

That's right. And if XML were exactly that, then there wouldn't be so many people frustrated with it. Unfortunately, in a professional work context, you don't always have control over whether it stays within this manageable subset. Sometimes the less pleasant aspects simply come into play, and then you have to deal with the whole complicated mess.

froh•2w ago

there were tools that derive the schema from sample data

and relaxng is a human friendly schema syntax that has transformers from and to xsd.

g947o•2w ago

Is there anything new on this topic that has never been said before in 1000 other articles posted here?

I didn't see any.

rerdavies•2w ago

What's new is that they WANT to revert to the horror of XML. :-P

mkozlows•2w ago

This is performance art, right? The very first bullet point it starts with is extolling the merits of XSD. Even back in the day when XML was huge, XSD was widely recognized as a monstrosity and a boondoggle -- the real XMLheads were trying to make RELAX NG happen, but XSD got jammed through because it was needed for all those monstrous WS-* specs.

XML did some good things for its day, but no, we abandoned it for very good reasons.

froh•2w ago

xslt was a stripped down dsssl in xml syntax.

dsssl was the scheme based domain specific "document style semantics and specification language"

the syntax change was in the era of general lisp syntax bashing.

but to xml syntax? really? that was so surreal to me.

WorldMaker•2w ago

Also, as someone else pointed out, the same complaints that JSON Schema "isn't in the standard, it's a separate standard" apply to XSD. It is still a different standard even though during the height of XML mania it sometimes seemed like XSD was inseparable. XML did have DTD baked in, and maybe the author meant DTD in that section, but that was even worse than XSD (and again both were why RELAX NG happened).

PantaloonFlames•4d ago

XSD was (is) not so easy to adopt, but I don't agree that it's a monstrosity.

Schemas are complicated. XSD is a response to that reality.

The XML ecosystem is messy. But people don't need to adopt everything. Ignore Relax-NG, ignore DTD, use namespaces sparingly, adopt conventions around NOT using attributes. It generally works quite well.

It's a challenge to get comfortable with XSD but once that happens, it's not a monstrosity. Similarly, XSLT. It requires a different way of thinking, and once you get that, you're productive.

in_a_society•2w ago

Smells like an article from someone that didn’t really USE the XML ecosystem.

First, there is modeling ambiguity, too many ways to represent the same data structure. Which means you can’t parse into native structs but instead into a heavy DOM object and it sucks to interact with it.

Then, schemas sound great, until you run into DTD, XSD, and RelaxNG. Relax only exists because XSD is pretty much incomprehensible.

Then let’s talk about entity escaping and CDATA. And how you break entire parsers because CDATA is a separate incantation on the DOM.
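For anyone who hasn't hit this: the same character data arrives as two different node types depending on how it was written. A small sketch with Python's xml.dom.minidom (the note element is invented for the example):

```python
from xml.dom import Node
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

# Same payload, once entity-escaped and once wrapped in a CDATA section.
escaped = parseString("<note>" + escape("a < b & c") + "</note>")
cdata = parseString("<note><![CDATA[a < b & c]]></note>")

for doc in (escaped, cdata):
    child = doc.documentElement.firstChild
    kind = "CDATA" if child.nodeType == Node.CDATA_SECTION_NODE else "text"
    print(kind, repr(child.data))
```

Code that only checks for text nodes silently skips the CDATA variant, even though both carry the same string.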

And in practice, XML is always over engineered. It’s the AbstractFactoryProxyBuilder of data formats. SOAP and WSDL are great examples of this, vs looking at a JSON response and simply understanding what it is.

I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.

mkozlows•2w ago

The part where it favorably mentioned namespaces also blew my mind. Namespaces were a constant pain point!

riffraff•2w ago

Namespaces are a cool idea that didn't really seem to pan out in practice.

masklinn•2w ago

Namespaces were fun! But mostly used for over engineering formats and interacted with by idiots who do not give a toss. Shout out to every service that would break as soon as elementtree got involved. And my idiot colleagues who work on EDI.
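One plausible reading of the ElementTree remark, offered as a guess at what is meant: Python's xml.etree.ElementTree keeps the namespace URI but not the original prefix, so a round trip rewrites prefixes. The URI and element names below are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
source = '<inv:invoice xmlns:inv="urn:example:invoice"><inv:total>10</inv:total></inv:invoice>'

root = ET.fromstring(source)
print(root.tag)  # {urn:example:invoice}invoice

# Serializing again invents a prefix unless one was registered beforehand.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```

The output is equivalent XML, but any consumer that matches on the literal "inv:" prefix instead of the namespace URI will stop recognizing it. ET.register_namespace can pin the prefix when you control the producer.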

pjmlp•2w ago

Nope, they were great.

Our AOLServer like clone in 2000 used them to great effect in our widget component library.

VerifiedReports•2w ago

I took an XML class as it neared its heyday, and even the teacher was rolling his eyes at the inclusion of namespaces.

Amateur hour.

aezart•2w ago

We use Mulesoft where I work, and XML namespaces are a constant issue. We never managed to define an API spec in such a way that the RAML compiler and the APIKit validator would both accept the same payload. In the end we just had to turn off validations in APIkit.

Mikhail_Edoshin•1w ago

Namespaces give you human readable GUIDs as element names. This is important. I agree their implementation and integration is a bit inconvenient.

bornfreddy•2w ago

You managed to convey my thoughts exactly, and you only used term "SOAP" once. Kudos!

SOAP was terrible everywhere, not just in Nigeria as OP insinuates. And while the idea of XML sounds good, the tools that developed on top of it were mostly atrocious. Good riddance.

locknitpicker•2w ago

> I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.

I remember a decade ago seeing job ads that explicitly requested XML skills. The fact that being able to do something with XML was considered a full time job requiring a specialist says everything there is to be said about XML.

g-b-r•2w ago

They probably didn't mean "doing something with XML", but knowing a lot of its complex ecosystem

nine_k•2w ago

XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea. Entities were a so-so idea, which became unapologetically terrible when URLs and file references were allowed. CDATA was an interesting idea but an error-prone one, and likely it just did not belong.

OTOH namespaces, XSD, XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (It was also homoiconic, like Lisp, amenable to metaprogramming.) Namespaces were a lifesaver when multiple XML documents from different sources had to be combined. XPath was also quite nice for querying.
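To make the XPath point concrete, a short sketch using the limited XPath subset in Python's xml.etree.ElementTree (full XPath needs a third-party library); the book data is invented:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<booklist>
  <book><title>Example Book</title><year>2025</year></book>
  <book><title>Second Book</title><year>2026</year></book>
</booklist>
""")

# Select the title of every book whose <year> child has the text 2026.
titles = [t.text for t in doc.findall("./book[year='2026']/title")]
print(titles)  # ['Second Book']
```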

XML is noisy because of the closing tags, but it also guarantees a level of integrity, and LZ-type compressors, even gzip, are excellent at compacting repeated strings.

Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting, no commas between list items, etc.

Complexity killed XML. JSON was stupid simple, and thus contained far fewer footguns, which was a very welcome change. It was devised as a serialization format, a bit human-hostile, but mapped ideally to bag-of-named-values structures found in basically any modern language.

Now we see XML tools adapted to JSON: JSONSchema, JSONPath, etc. JSON5 (as used in e.g. VSCode) allows for comments, trailing commas and other creature comforts. With tools like that, and dovetailing tools like Pydantic, XML lost any practical edge over JSON it might ever have had.

What's missing is a widespread replacement for XSLT. Could be a fun project.

sparqlittlestar•2w ago

XSLT 3.0 does JSON

https://www.w3.org/TR/xslt-30/#json

sam_lowry_•2w ago

XSLT ended at 1.1 for me. Everything that was "designed-by-committee" later was subverted to serve the bottom line of Michael Kay enterprises, although I hesitate to attribute to malice the grueling incompetence of the working group at the time.

wombatpm•2w ago

Don’t forget the whole DOM vs SAX processing mess. Big documents would routinely kill parsers by running out of Memory.

XSLT was cool. Too bad XSL and Apache-FOP never took off.

jjkaczor•2w ago

If-I-Recall-Correctly, it was typically a 10x memory load to open an XML file in a DOM parser. Which could get really ugly, really fast when you were dealing with many files.
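The usual way around the memory problem is to stream and discard as you go. A minimal sketch with ElementTree's iterparse, assuming an invented log format and an in-memory stream standing in for a large file:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file on disk; the element names are made up.
stream = io.BytesIO(
    b"<log>" + b"".join(b"<entry><size>%d</size></entry>" % n for n in range(1000)) + b"</log>"
)

total = 0
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "entry":
        total += int(elem.findtext("size"))
        elem.clear()  # drop the entry's children once it has been handled

print(total)  # 499500
```

The root still accumulates the emptied entry elements, so for truly huge inputs you would also detach them from the parent, but this already avoids holding every subtree at once.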

downsplat•2w ago

It still works well in the appropriate settings. LibreOffice (nee OpenOffice) uses ODF, an XML format, for its document files, and it has been working nicely enough for a long time.

nine_k•2w ago

MS Office's own XLSX and DOCX formats are trees of XML files, zipped.

smitty1e•2w ago

> Complexity killed XML. JSON was stupid simple

I say "the ditt-ka-pow" for The Dumbest Thing That Could Possibly Work (DTTCPW).

debugnik•2w ago

> and it brought from it a bunch of things that are useless outside a markup language

It is a markup language. The mistake was trying to use it for anything else.

johnthescott•2w ago

amen

panick21_•2w ago

I really like Clojure EDN. It's very simple, but adds just enough on top to make a difference: namespaces, a few more types, and a way to add custom stuff in a reasonable, standard way.

downsplat•2w ago

> XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea.

That's exactly what I wanted to say. The author talks as if XML was well designed to represent structured data, but it was not, it grew out of the idea of marking up text, which is a completely different problem. The hilarious part is that he doesn't recognize the problem when he gives his example of "or with attributes".

The other thing, is that the JSON model doesn't just give you a free parser/serializer in JavaScript. It actually maps to the basic data model of the entire generation of dynamic languages that the Web grew on: perl, Python, JS, PHP and Ruby. Arrays and maps are the basic way to represent structured data in these languages, and JSON just serializes that. Which means that getting data in and out of your language is just a single line.
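The "single line" claim, sketched in Python with the standard json module (the book record is invented):

```python
import json

text = '{"title": "Example Book", "authors": ["Max", "Erika"], "year": 2025}'

book = json.loads(text)                      # a dict with a list inside
print(book["authors"][1])                    # Erika
print(json.loads(json.dumps(book)) == book)  # True
```

No mapping layer is involved: objects become dicts, arrays become lists, and the round trip gives back an equal structure.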

The author seems to think that XML maps a proper conceptual model and JSON doesn't, but the model of "nodes with attributes and content" is a worse match for structured data than JSON's model of "arrays and maps of values".

Other than that, it's really a question of how much tooling you want to use. Both JSON and XML grew entire ecosystems of it, and nowadays if you want to read your JSON according to a schema into typed objects, you can, and for any good-sized project, you probably should.

Also: > There are cases where other formats are appropriate: small data transfers between cooperating services and scenarios where schema validation would be overkill.

That's actually most of the cases for your average web dev!

inkyoto•2w ago

> XML grew from SGML […]

… as an effort to simplify SGML which was deemed to be too complex.

Oh, the irony.

riwsky•2w ago

> What's missing is a widespread replacement for XSLT

jq says hello!

somat•2w ago

> OTOH namespaces, XSD, XSLT were great

I don't know, the few times I have had to use XML, I went "This is not so bad, I don't know what all the fuss is about" until I hit namespaces. I don't know if I was just using an inferior library but namespaces sucked. The minute namespaces came into the picture all the joy left the project. And XSLT... I only ever did one thing with it "use the browser to turn demarc XML records into a webpage" and that was pretty cool. But it also firmly convinced me that XML is very much the wrong form factor for a programming language.

My personal thought is that CSS is not SGML-like as a sort of rebellion against the way XML was taking over the world. It feels like the author had written one too many XSLTs and said "Nope, it ends here, we are not doing that again." Because really, it is very weird that CSS does not use an XML syntax.

On the topic of the wrong form factor for a programming language: another good contender is Ansible when you try to use its YAML looping constructs.

cxr•2w ago

CSS predates XML.

wvenable•2w ago

I read the article and my first thought was it was entirely missing the complexity of XML. It started out relatively simple and easy to understand and most people/programs wrote simple XML that looked a lot like HTML still does.

But it didn't take long before XML might well be a binary format for all it matters to us humans looking at it, parsing it, dealing with it.

JSON came along and its simplicity was baked in. Anyone can argue it's not a great format but it forcefully maintains the simplicity that XML lost quite quickly.

mickael-kerjean•2w ago

> It started out relatively simple and easy to understand ....

When the specs for a data representation format evolved to the point of enabling XML bombs, it had gone too far in trying to please everyone, and that is probably why JSON won in the long run: it's not perfect, but it is stable and simple, without crazy issues you have to worry about when parsing it. If XML had had a Torvalds-style dictator who could afford to say no, I doubt JSON would have won.
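For context, an "XML bomb" relies on nested entity definitions that multiply on expansion. A deliberately tiny sketch with Python's xml.etree.ElementTree, where two levels of nesting yield only nine copies; real attacks stack more levels, and recent Expat versions cap the amplification:

```python
import xml.etree.ElementTree as ET

# Deliberately tiny: two levels of nesting, 3 x 3 = 9 copies of "ha".
doc = """<!DOCTYPE r [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<r>&c;</r>"""

root = ET.fromstring(doc)
print(len(root.text))  # 18
```

Each extra level multiplies the output again, so a document of a few hundred bytes can ask the parser for gigabytes. JSON has no comparable substitution mechanism.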

tolciho•2w ago

And of course XML libraries haven't had any security issues (oh look CVE-2025-49796) and certainly would not need to make random network requests for a DTD of "reasonable" complexity. I also dropped XML, and that's after having a website that used XML, XSLT rendering to different output forms, etc. There were discussions at the time (early to mid 2000s) of moving all the config files on unix over to XML. Various softwares probably have the scars of that era and therefore an XML dependency and is that an embiggened attack surface? Also namespaces are super annoying, pretty sure I documented the ughsauce necessary to deal with them somewhere. Thankfully, crickets serenade the faint cries of "Bueller".

The contrast with only JSON is far too simplistic; XML got dropped from places where JSON is uninvolved, like why use a relational database when you can have an XML database??? Or those config files on unix are for the most part still not-XML and not-JSON. Or there's various flavors of markdown which do not give you the semi-mythical semantic web but can be banged out easily enough in vi or whatever and don't require schemas and validation or libraries with far too many security problems and I wouldn't write my documentation (these days) using S-expressions anyhow.

This being said there probably are places where something that validates strictly is optimal, maybe financial transactions (EDIFACT and XML are different hells, I guess), at least until some cheeky git points out that data can be leaked by encoding with tabs and spaces between the elements. Hopefully your fancy and expensive XML security layer normalizes or removes that whitespace?

ivan_gammel•2w ago

>First, there is modeling ambiguity, too many ways to represent the same data structure. Which means you can’t parse into native structs but instead into a heavy DOM object and it sucks to interact with it.

I don’t get this argument. There exist streaming APIs with convenient mapping. Yes, there can exist schemas with weird structure, but in practice they are uncommon. I have seen a lot of integration formats in XML, never had the need to parse to DOM first.

pjmlp•2w ago

I used it, and agree 100% with the author.

Hence why in 2026, I still hang around programming stacks, like Java and .NET, where XML tooling is great, instead of having to fight with YAML format errors, Norway error, or JSON without basic stuff like comments.

sam_lowry_•2w ago

Indeed. XML should be compared with YAML, not JSON.

While they equal each other in complexity, YAML does not even have namespaces )

downsplat•2w ago

I guess one thing we can agree with the author is that YAML is technically a piece of crap.

EdwardDiego•2w ago

> YAML format errors

This is why I hate (HATE. LET ME TELL YOU HOW MUCH I'VE COME TO HATE YAML SINCE I BEGAN TO WORK WITH K8S) working with Helm charts. As an example from the Helm docs...

https://helm.sh/docs/chart_template_guide/yaml_techniques#in...

> Note how we do the indentation above: indent 2 tells the template engine to indent every line in "myfile.txt" with two spaces. Note that we do not indent that template line. That's because if we did, the file content of the first line would be indented twice.

So you end up with YAML that looks weird, and heaven help you if you refactor and now have to adjust all the `indent N` functions to a new value of N.

That said, Helm's approach of "YAML, but with Go templating" is the main source of my hatred - why they didn't take the "It's a tree, and this child node is designated to be replaced" approach is something that's always baffled me.

imtringued•1w ago

YAML is meant to be written by humans. If some lunatic insists on some expression evaluation system, then it should be ${expression} where the YAML file is parsed as is without a template, and the software that reads the YAML file interprets the expression.
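A rough sketch of that split, under some loud assumptions: the dict below stands in for whatever a YAML parser would return (no YAML library is used here), the keys and values are invented, and `string.Template` only does `${name}` substitution, not real expression evaluation. The file is parsed untouched and the application resolves placeholders afterwards.

```python
from string import Template


def interpolate(node, variables):
    """Recursively substitute ${name} placeholders in every string of parsed config."""
    if isinstance(node, str):
        return Template(node).substitute(variables)
    if isinstance(node, dict):
        return {key: interpolate(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [interpolate(item, variables) for item in node]
    return node  # numbers, booleans, None pass through unchanged


# Stand-in for the result of parsing a YAML file that contains plain ${...}
# strings; no template engine has rewritten the file's text or indentation.
parsed = {
    "image": "registry.example/${app}:${tag}",
    "replicas": 2,
    "args": ["--name", "${app}"],
}

resolved = interpolate(parsed, {"app": "web", "tag": "1.4"})
```

Because substitution happens on the parsed tree, indentation never enters the picture, which is the part Helm's text templating gets wrong.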

If you hate YAML instead of Helm for their insane choices, then enjoy barking up the wrong tree for the rest of your life.

I've been using application.yml files for 9 years without experiencing a single YAML related issue and I see no reason to switch to any other format.

https://www.baeldung.com/spring-boot-yaml-vs-properties

EdwardDiego•1w ago

Did you miss this bit in my comment?

> That said, Helm's approach of "YAML, but with Go templating" is the main source of my hatred

pkphilip•2w ago

For most data that is structured in JSON now, you could have easily done the same in XML using a simple text editor.

I agree with the author that XML is very similar to S expressions but with the brackets replaced by closing tags.

Parsing XML wasn't complex either. There have been many good libraries for it in pretty much most languages

jklowden•2w ago

If only there was one good library. libxml2 is the leading one, and it has been beleaguered by problems internal and external. It has had ABI instability and been besieged by CVE reports.

I agree it shouldn’t be hard. On the evidence, though, it is. I suspect the root problem is lack of tools. Lex and yacc tools for Unicode are relatively scarce. At least that’s what’s set me back from rolling my own.

lenkite•2w ago

What is wrong with xml-rs ?

badgersnake•2w ago

I had great experiences with XSD as a contract in systems integration scenarios, particularly with big systems integrators. It's pretty clear whose fault it is when somebody's XML doesn't validate.

inkyoto•2w ago

The issue is that XSD came along much later, and its use did not become binding in XML validation scenarios, hence partial success, even when the XSD-based validation tooling was available at the time.

XSD provides a clean abstraction for the technical validation that sits separately from the application / business / processing layers and dramatically increases the chances of a «clean» request reaching the aforementioned layers without having to roll multiple defensive checks in there.

Granted, an XSD can become complex very quickly, especially if indulged in too much, but it does not have to be.

jancsika•2w ago

> First, there is modeling ambiguity, too many ways to represent the same data structure.

Boy, are you telling me!

Boy are you a person, one of whose attributes is telling me!

Boy are you a person whose telling-me attribute is set to true!

Boy-who-is-telling-me, this space is left intentionally blank!

Out of all the key value pairs, you are the boy key and your adjacent sibling string type value is "Telling Me!"

Edit: fixed a CVE

codeduck•2w ago

This is both painfully hilarious and hilariously painful. It might even be hilarious, but my JVM ran out of memory while trying to build the DOM model.

VerifiedReports•2w ago

He also mentions namespaces as a plus, so credibility is pretty low.

unscaled•2w ago

> JSON has no such mechanism built into the format. Yes, JSON Schema exists, but it is an afterthought, a third-party addition that never achieved universal adoption.

This really seems like it's written by someone who _did not_ use XML back in the day. XSD is no more built-in than JSON Schema is. XSD was first-party (it was promoted by W3C), but it was never a "built-in" component of XML, and there were alternative schema formats. You can perfectly write XML without XSD and back in the heyday of XML in the 2000s, most XML documents did not have XSD.

Nowadays most of the remaining XML usages in production rely heavily on XSD, but that's a bit of a survivorship bias. The projects that used ad-hoc XML as configuration files, simple document files or as an interchange format either died out, converted to another format or eventually adopted XSD. Since almost no new projects are choosing XML nowadays, you don't get an influx of new projects that skip the schema part to ship faster, like you get with JSON. When new developers encounter XML, they are generally interacting with long-established systems that have XSD schemas.

This situation is purely incidental. If you want to get the same result with JSON, you can just use JSON Schema. But if we somehow magically convince everybody on the planet to ditch JSON and return to XML (please not), we'll get the same situation we have had with JSON, only worse. We'll just get back to where we were in the early 2000s, and no, this wasn't good.

hinkley•2w ago

See, I was just gonna say, “what art?” but you put far more wood behind that arrowhead.

- XML-DSIG survivor.

imtringued•1w ago

SOAP is the biggest mindfuck I've ever interacted with. You have a complex and verbose XML based format where you repeat the tag name, but then you realize that the implementations don't care what the parameters are named and just read them in a fixed order. Coming from JSON based APIs this took me extremely long to realize.

kenforthewin•2w ago

> This is insanity masquerading as pragmatism.

> This is not engineering. This is fashion masquerading as technical judgment.

The boring explanation is that AI wrote this. The more interesting theory is that folks are beginning to adopt the writing quirks of AI en masse.

Tiberium•2w ago

At least I'm not the only one who noticed. It's genuinely weird and unsettling how such AI-written blog posts nowadays get to the top of HN easily.

tliltocatl•2w ago

I feel more like AI has adopted some preexisting disagreeable writing styles from the beginning and now we associate these with AI.

kitku•2w ago

The way I like to phrase this sentiment is "This guy is the training data."

dqv•2w ago

Yeah, this is why I have transitioned from "this seems like it was written with AI" to "this is full of clichés." Maybe it was only written by a human or maybe it was written entirely with AI or somewhere in between, but in any case, clichés make it tiresome to read.

kennethallen•2w ago

The fundamental reason JSON won over XML is that JSON maps exactly to universal data structures (lists and string-keyed maps) and XML does not.

tilt_error•2w ago

This article/blog post [1] has been on HN several times before, but it is well worth a reminder.

[1] https://seriot.ch/software/parsing_json.html

lighthouse1212•2w ago

XML was designed for documents; JSON for data structures. The 'lost art' framing implies we forgot something valuable, but what actually happened is we stopped using a document format for data serialization. That's not forgetting - that's learning. XML is still the right choice for its original domain (markup, documents with mixed content). It was never the right choice for API payloads and config files.

locknitpicker•2w ago

> XML was designed for documents; JSON for data structures.

JSON wasn't even designed for anything. It's literally the necessary and sufficient part of JavaScript that you could pass to an eval() to get a data structure out. It required zero tooling, not even a third-party module, to hit the ground running.

wvenable•2w ago

As Douglas Crockford says: JSON was discovered, not invented.

wvenable•2w ago

I think XML for documents lost to markdown.

Between markdown and HTML, there is no need for XML in that domain anymore either.

benrutter•2w ago

XML is still the implementation tool for Microsoft Office and Open Office docs. I wouldn't hold those up as the gold standard or anything, but it's hard to see how Markdown could capture everything that XML does for, say, powerpoint or excel.

wvenable•2w ago

> XML is still the implementation tool for Microsoft Office and Open Office docs.

It is and that is a good thing. I can't tell you the number of times that an application storing its data in XML has made it possible for me to do things that would otherwise be impossible.

But nobody authors these documents in XML. It's just an application storage format. It could just as easily be Sqlite.

small_scombrus•2w ago

Unfortunately Word documents are XML. Microsoft has a LOT of customisation going on, but at the core it's very ugly, incredibly complex xml :(

Source: https://learn.microsoft.com/en-us/office/open-xml/word/worki...

ablob•2w ago

There's also HTML, LaTeX and Typst for documents. I don't think that there is a clear winner here.

Const-me•2w ago

> It was never the right choice for API payloads and config files

Partially agree about API payloads; when I design my APIs I typically use binary formats.

However, IME XML is actually great for config files.

Comments are crucial for config files. Once the complexity of the config grows, a hierarchy of nested nodes becomes handy, two fixed levels of hierarchy found in old Windows ini files, and modern Linux config files, is less than ideal, too many sections. Attributes make documents easier to work with due to better use of horizontal screen space: auto-formatted JSON only has single key=value per line, XML with attributes have multiple which reduces vertical scrolling.
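To make the comments-and-attributes point concrete, a small sketch with Python's standard library. Both config snippets are invented for illustration. The XML one carries a comment and several attributes per line; the JSON one fails to parse as soon as a comment appears.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config: several settings per line via attributes, plus a comment.
XML_CONFIG = """
<server host="0.0.0.0" port="8080" workers="4">
  <!-- raise this if uploads time out -->
  <timeouts connect="5" read="30"/>
</server>
"""

# The same idea in JSON, with a comment that strict JSON does not allow.
JSON_CONFIG = """
{
  // raise this if uploads time out
  "timeouts": {"connect": 5, "read": 30}
}
"""

root = ET.fromstring(XML_CONFIG)
port = int(root.get("port"))
read_timeout = int(root.find("timeouts").get("read"))

try:
    json.loads(JSON_CONFIG)
    json_accepts_comments = True
except json.JSONDecodeError:
    json_accepts_comments = False
```

Formats like JSON5 or JSONC relax this, but they are separate dialects that need their own parsers.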

com2kid•2w ago

I remember spending hours just trying to properly define the XML schema I wanted to use.

Then if there were any problems in my XML, trying to decipher horrible errors determining what I did wrong.

The docs sucked and were "enterprise grade", the examples sucked (either too complicated or too simple), and the tooling sucked.

I suspect it would be fine nowadays with LLMs to help, but back when it existed, XML was a huge hassle.

I once worked on a robotics project where a full 50% of the CPU was used for XML serialization and parsing. Made it hard to actually have the robot do anything. XML is violently wordy and parsing strings is expensive.

badgersnake•2w ago

There are a lot of good arguments against the XML ecosystem, but "I'm too lazy or dumb to understand it" is not one of them.

com2kid•2w ago

It is called DevUx and it is certainly a thing.

If the tooling sucks and the entire ecosystem is hard to understand, people won't adopt a technology.

XML was forced down everyone's throat for a decade! The second something else came along literally everyone who could jumped ship.

rcbdev•2w ago

Interestingly, I've never heard the term 'DevUx' before. I suspect it's the same concept as Developer Experience, which I also find supremely important and historically underappreciated. Companies like JetBrains for example make a killing by being a company that really takes this aspect seriously.

On the other hand I've had a fellow developer laugh at me when trying to explain how this is important, so I'm unsure this is as important to others as it is to me.

com2kid•2w ago

Yeah devux is just short for developer experience.

The apple app store had an amazing initial devux, vs the blackberry app store which famously was a huge pain just to apply to and all the tooling was horrible.

acabal•2w ago

XML lost because 1) the existence of attributes means a document cannot be automatically mapped to a basic language data structure like an array of strings, and 2) namespaces are an unmitigated hell to work with. Even just declaring a default namespace and doing nothing else immediately makes your day 10x harder.

These items make XML deeply tedious and annoying to ingest and manipulate. Plus, some major XML libraries, like lxml in Python, are extremely unintuitive in their implementation of DOM structures and manipulation. If ingesting and manipulating your markup language feels like an endless trudge through a fiery wasteland then don't be surprised when a simpler, more ergonomic alternative wins, even if its feature set is strictly inferior. And that's exactly what happened.

I say this having spent the last 10 years struggling with lxml specifically, and my entire 25 year career dealing with XML in some shape or form. I still routinely throw up my hands in frustration when having to use Python tooling to do what feels like what should be even the most basic XML task.

Though xpath is nice.
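The default-namespace complaint is easy to reproduce. A minimal sketch with the standard library's ElementTree (lxml behaves the same way for this case); the document and the `http://example.com/ns` URI are placeholders.

```python
import xml.etree.ElementTree as ET

PLAIN = "<feed><entry>hello</entry></feed>"
NAMESPACED = '<feed xmlns="http://example.com/ns"><entry>hello</entry></feed>'

plain_root = ET.fromstring(PLAIN)
ns_root = ET.fromstring(NAMESPACED)

# Without a namespace, the obvious lookup works.
plain_hit = plain_root.find("entry")

# With a default namespace declared, the same lookup silently finds nothing...
ns_miss = ns_root.find("entry")

# ...because every tag is now qualified with the namespace URI.
qualified_tag = ns_root.tag

# Each lookup has to carry a prefix-to-URI mapping; the prefix "x" is arbitrary.
ns_hit = ns_root.find("x:entry", {"x": "http://example.com/ns"})
```

One `xmlns` attribute on the root changes how every query in the program has to be written, which is the 10x-harder day in practice.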

matkoniecz•2w ago

> even if its feature set is strictly inferior

and often having less bizarre and overly complex features is a feature by itself

small_scombrus•2w ago

Base JSON not supporting comments is a sometimes-annoying 'feature': without comments, no one can use comment tags to smuggle extra functionality into their JSON files, so you don't end up with a million custom JSON+ formats.

masklinn•2w ago

> Plus, some major XML libraries, like lxml in Python, are extremely unintuitive in their implementation of DOM structures and manipulation.

Lxml, or more specifically its inspiration ElementTree, is specifically not a (W3C) DOM or DOM-style API. It was designed for what it called “data-style” XML documents where elements would hold either text or sub-elements but not both, which is why mixed-content interactions are a chore (lxml augments the API by adding more traversal axes but ElementTree does not even have that, it’s a literal tree of elements). effbot.org used to have a page explaining its simplified infoset before Fredrik passed and registration lapsed, it can be accessed through archive.org.

That means lxml is, by design, not the right tool to interact with mixed-content documents. But of course the issue is there isn’t really a right tool for that, as to my knowledge nobody has bothered building a fast DOM-style library for Python.

If you approach lxml as what ElementTree was designed as it’s very intuitive: an element is a sequence of sub-elements, with a mapping of attributes. It’s a very straightforward model and works great for data documents, as well as fits great within the language. But of course that breaks down for mixed content documents as your text nodes get relegated to `tail` attributes (and ElementTree straight up discards comments and PIs, though lxml reverted that).
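For anyone who has not run into `tail` before, a short sketch with the standard library's ElementTree (lxml exposes the same `text`/`tail` attributes). The sample paragraph is made up.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and child elements interleaved inside <p>.
para = ET.fromstring("<p>Hello <b>bold</b> world</p>")
bold = para[0]

leading = para.text    # text before the first child of <p>
inner = bold.text      # text inside <b>
trailing = bold.tail   # text after </b>, stored on the <b> element, not on <p>

# Getting the full text back means walking text and tail in document order.
full_text = "".join(para.itertext())
```

There is no text node object to move around, so relocating `<b>` means remembering to carry or re-home its `tail` by hand.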

culebron21•2w ago

XML was a product of its time, when after almost 20 years of CPUs rapidly getting quicker, we contemplated that the size of data wouldn't matter, and data types won't matter (hence XML doesn't have them, but after that JSON got them back) -- we expected languages with weak type systems to dominate forever, and that we would be working and thinking levels above all this, abstractly, and so on.

I remember XML proponents back then argued that it allows semantics -- although, it was never clear how a non-human would understand it and process.

The funny thing about namespaces is that the prefix, by the XML docs, should be meaningless -- instead you should look at the URL of the namespace. It's like if we read a doc with snake:front-left-paw, and ask how come a snake has paws? -- Because it's actually a bear -- see the definition of snake in the URL! It feels like mathematical concepts -- coordinate spaces, numeric spaces with different number 1 and base space vectors -- applied to HTML. It may be useful in rare cases. But few can wrap their heads around it, and right from the start, most tools worked only with exactly named prefixes, and everyone had to follow this way.

g-b-r•2w ago

> right from the start, most tools worked only with exactly named prefixes, and everyone had to follow this way

What tools? Namespaces being defined by their urls is sure not the reason XML is complex, and the tools I remember running into supported it well

culebron21•2w ago

Ok, I remember people complaining of this, so I have got it wrong.

Mikhail_Edoshin•2w ago

Semantic in machine processing is actually very simple: if a machine has an instruction to process an element and we know what it does, then the element is semantic.

So, for example, <b> and <i> have perfect semantics, while <article> not so much. What does the browser do with an <article>? Or maybe it is there for an indexing engine? I myself have no idea (not that I investigated that, I admit).

But all that was misunderstood, very much like XML itself.

zzo38computer•2w ago

The <article> command in HTML can be useful, even if most implementations do not do much with it. For example, a browser could offer the possibility to print or display only the contents of a single <article> block, or to display marks in the scrollbar for which positions in the scrollbar correspond to the contents of the <article> block. It would also be true of <time>; although many implementations do not do much with it, they could do stuff with it. And, also of <h1>, <h2>, etc; although browsers have built-in styles for them, allowing the end user to customize them is helpful, and so is the possibility of using them to automatically display the table of contents in a separate menu. None of these behaviours should need to be standardized; they can be by the implementation and by the end user configuration etc; only the meaning of the commands will be standardized, not their behaviour.

Mikhail_Edoshin•2w ago

"Meaning" has a rather vague meaning, but behavior is exact. If I know the behavior, it becomes a tool I can employ. If I only know supposed behavior, I cannot really use that. E.g. why we have so much SEO slop and so little "semantic" HTML? Because the behavior of search engines is real and thus usable, even when it is not documented much.

zzo38computer•2w ago

> data types won't matter (hence XML doesn't have them, but after that JSON got them back)

JSON does not have very much or very good data types either, but (unlike XML) at least JSON has data types. ASN.1 has more data types (although standard ASN.1 lacks one data type that JSON has (key/value list), ASN.1X includes it), and if DER or another BER-related format is used then all types use the same framing, unlike JSON. One thing JSON lacks is octet string type, so instead you must use hex or base64, and must be converted after it has been read rather than during reading because it is not a proper binary data type.
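The octet-string point can be sketched in a few lines of Python. The payload bytes and the `"blob"` key are arbitrary choices for illustration; base64 is one of the two encodings mentioned above.

```python
import base64
import json

payload = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary octets, not valid UTF-8 text

# Writing: the bytes must be turned into text before json.dumps will accept them.
document = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})

# Reading: the parser only hands back a str. Turning it into bytes is a
# separate step that the application has to know about and perform itself.
parsed = json.loads(document)
as_text = parsed["blob"]
restored = base64.b64decode(as_text)
```

Nothing in the JSON itself says that `"blob"` is binary; that knowledge lives in the schema or the documentation.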

> The funny thing about namespaces is that the prefix, by the XML docs, should be meaningless -- instead you should look at the URL of the namespace. It's like if we read a doc with snake:front-left-paw, and ask how come does a snake have paws? -- Because it's actually a bear -- see the definition of snake in the URL!

This is true of any format that you can import with your own names though, and since the names might otherwise conflict, it can also be necessary. This issue is not only XML (and JSON does not have namespaces at all, although some application formats that use it try to add them in some ways).

stmw•2w ago

There were efforts to make XML 1. more ergonomic and 2. more performant, and while (2) was largely successful, (1) never got there, unfortunately - but see https://github.com/yaml/sml-dev-archive for some history of just one of the discussions (sml-dev mailing list).

edbaskerville•2w ago

Worse is better. Because better, it turns out, is often much, much worse.

cgio•2w ago

Not convincing. I was hoping it would go down the xslt path, which is a lost art. I despised and loved xslt at the same time, and there’s no question it was an artful enterprise using it.

masklinn•2w ago

XSLT I see as a tragedy. The match / patch processing model is so elegant, but the programming language built around it is such a disaster (the XML, various language semantics e.g. the implicit context, the gimped semantics, and the development environment or lack thereof).

I think a simplified Haskell-ish script host (à la Elm) with a smattering of debugging capabilities would have been amazing.

Mikhail_Edoshin•2w ago

I like XML and I use it for myself daily. E.g. all documentation is XML; it is just the perfect tool for the task. Most comments that denigrate XML are very superficial. But I disagree with this article too.

My main point is that the very purpose of XML is not to transfer data between machines. XML use case is to transfer data between humans and machines.

Look at the schemas. They are all grammatical. DTD is a textbook grammar. Each term has a unique definition. XSD is much more powerful: here a term may change definition depending on the context: 'name' in 'human/name' may be defined differently than 'name' in 'pet/name' or 'ship/name'. But within a single context the definition stays. As far as I know Relax NG is even more powerful and can express even finer distinctions, but I don't know it too well to elaborate.

Machines do not need all that to talk to each other. It is pure overhead. A perfect form to exchange data between machines is a dump of a relational structure in whatever format is convenient, with pretty straightforward metadata about types. But humans cannot author data in the relational form; anything more complex than a toy example will drive a human crazy. Yet humans can produce grammatical sequences in spades. To make it useful for a machine that grammatical drive needs only a formal definition and XML gives you exactly that.

So the use case for XML is to make NOTATIONS. Formal in the sense they will be processed by a machine, but otherwise they can be pretty informal, that is have no DTD or XSD. It is actually a power of XML that I can just start writing it and invent a notation as I go. Later I may want to add formal validation to it, but it is totally optional and manifests as a need only when the notation matures and needs to turn into a product.

What makes one XML a notation and another not a notation? Notations are about forming phrases. For example:

    <func name="open">
      <data type="int"/>
      <args>
        <addr mode="c">
          <data type="char"/>
        </addr>
        <data type="int"/>
        <varg/>
      </args>
    </func>

This is a description of a C function, 'open'. Of course, a conventional description is much more compact:

    int open(char const*, int, ...)

But let's ignore the verbosity for a moment and stay with XML a bit longer. What is grammatical about this form? 'func' has '@name' and contains 'data' and 'args'. 'data' is result type, 'args' are parameters. Either or both can be omitted, resulting in what C calls "void". Either can be 'data' or 'addr'. 'data' is final and has '@type'; addr may be final (point to unknown, 'void') or non-final and may point to 'data', 'func' or another 'addr', as deep as necessary. 'addr' has '@mode' that is a combination of 'c', 'v', 'r' to indicate 'const', 'volatile', 'restrict'. Last child of 'args' may be 'varg', indicating variable parameters.
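As an illustration of how little machinery such a notation needs, here is a sketch in Python that turns the XML above into the compact C form. It is not the tooling used by the commenter. Two things are assumptions: function pointers inside 'addr' are left out, and 'c'/'v' are rendered as qualifiers on the pointee while 'r' is placed after the star (the example only fixes how 'c' renders).

```python
import xml.etree.ElementTree as ET

SPEC = """
<func name="open">
  <data type="int"/>
  <args>
    <addr mode="c">
      <data type="char"/>
    </addr>
    <data type="int"/>
    <varg/>
  </args>
</func>
"""


def render_type(node):
    """Render a <data> or <addr> element as a C type (function pointers omitted)."""
    if node.tag == "data":
        return node.get("type")
    if node.tag == "addr":
        # An empty <addr/> points to the unknown, i.e. void.
        pointee = render_type(node[0]) if len(node) else "void"
        mode = node.get("mode", "")
        # Assumed placement: const/volatile qualify the pointee, restrict the pointer.
        quals = [q for m, q in (("c", "const"), ("v", "volatile")) if m in mode]
        suffix = "* restrict" if "r" in mode else "*"
        return " ".join([pointee, *quals]) + suffix
    raise ValueError(f"unsupported element: {node.tag}")


def render_func(func):
    """Render a <func> element as a C prototype string."""
    result, params = "void", []
    for child in func:
        if child.tag == "args":
            params = ["..." if a.tag == "varg" else render_type(a) for a in child]
        else:
            result = render_type(child)
    return f"{result} {func.get('name')}({', '.join(params) or 'void'})"


prototype = render_func(ET.fromstring(SPEC))
```

No parser had to be written; the tree is already there, and the two functions are the whole "compiler".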

Do you see that these terms are used as words in a mechanically composed phrase? Change a word; omit a word; link words into a tree-like structure? This is the natural form of XML: the result is phrase-like, not data-like. It can, of course, be data-like when necessary but this is not using the strong side of XML. The power of XML comes when items start to interact with each other, like commands in Vim. Another example:

    <aaaa>
      <bbbb/>
    </aaaa>

This would be some data. Now assume I want to describe changes to that data:

    <aaaa>
      <drop>
        <bbbb/>
      </drop>
      <make>
        <cccc/>
      </make>
    </aaaa>

See those 'make' and 'drop'? Is it clear that they can enclose arbitrary parts of the tree? Again, what we do is that we write a phrase: we add a modifier, 'make' or 'drop' and the contents inside it get a different meaning.
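A sketch of how a processor might apply such a change description, in Python. The semantics are an assumption read off the example: whatever sits inside 'drop' is removed, whatever sits inside 'make' is kept without its wrapper. `apply_changes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def apply_changes(elem):
    """Return a copy of elem with <drop> contents removed and <make> contents unwrapped."""
    result = ET.Element(elem.tag, dict(elem.attrib))
    result.text = elem.text
    for child in elem:
        if child.tag == "drop":
            continue  # everything inside <drop> disappears
        if child.tag == "make":
            # New content replaces its <make> wrapper.
            result.extend([apply_changes(c) for c in child])
        else:
            result.append(apply_changes(child))
    return result


CHANGES = "<aaaa><drop><bbbb/></drop><make><cccc/></make></aaaa>"

after = apply_changes(ET.fromstring(CHANGES))
child_tags = [child.tag for child in after]
```

Because the function recurses, the modifiers can wrap arbitrary parts of the tree at any depth, which is the "phrase" property described above.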

This only makes sense if XML is composed by hand. For machine-to-machine exchange all this is pure overhead. It is about as convenient as if programs talked to each other via shell commands. It is much more convenient to load a library and use it programmatically than to compose a command-line call.

But all this verbosity? Yes, it is more verbose. This is a no-go for code you write 8 hours a day. But for code that you write occasionally it may be fine. E.g. a build script. An interface specification. A diagram. (It is also perfect for anything that has human-readable text, such as documentation. This use is fine even for an 8-hour workday.) And all these will be compatible. All XML dialects can be processed with the same tools, merged, reconciled, whatever. This is powerful. They require no parsing. Parsing may appear a solved problem, but to build a parser you still must at least describe the grammar for a parser generator and it is not that simple. And all that this description gives you is that the parser will take a short form and convert it into an AST, which is exactly what XML starts with. The rest of the processing is still up to you. With XML you can build the grammar bottom up and experiment with it. Wrote a lot of XML in some grammar and then found a better way? Well, write a script to transform the old XML into the new grammar and continue. The transformer is a part of the common toolset.

shmerl•2w ago

For machine to machine communication use Protobuf, not JSON.

JodieBenitez•2w ago

Another compile step, just what I needed.

shmerl•1w ago

Otherwise you'll pay with poor performance. Totally what your users needed.

JodieBenitez•1w ago

Nope, performance is fine, users are happy.

brunoborges•2w ago

XML and XSD were not meant to be edited by hand, by humans. They thrived when we used proper XML/XSD editing tools.

Although ironically there are fewer production-time human mistakes when editing an XML file that is properly validated with an XSD than a YAML file, because Norway.

bni•2w ago

Developers (even web developers!) were familiar with XML for many years before JSON was invented.

Also "worse is better". Many developers still prefer to use something that is similar to notepad.exe, instead of actual tools that understand the formats on a deeper level.

Mikhail_Edoshin•2w ago

Another thing I disagree with is the idea that JSON uses fewer characters. This is not true: JSON uses more characters. Example:

    <aaaa bbbb="bbbb" cccc="cccc"/>
    {"bbbb":"bbbb","cccc":"cccc"}

See that the difference is only two characters? Yet XML also has a four-character element name, which JSON lacks. And JSON is packed to the limit, while XML is written naturally and is actually more readable than JSON.
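The counts are easy to check. A tiny Python sketch; the wrapped JSON form at the end is just one assumed way of giving the JSON the element name it lacks.

```python
xml_form = '<aaaa bbbb="bbbb" cccc="cccc"/>'
json_form = '{"bbbb":"bbbb","cccc":"cccc"}'

xml_len = len(xml_form)
json_len = len(json_form)

# Assumption: carry the element name in JSON by wrapping the object in an outer key.
json_named = '{"aaaa":' + json_form + '}'
json_named_len = len(json_named)
```

So the bare JSON is two characters shorter, and once it has to carry the name as well it comes out longer than the XML, at least for this attribute-only shape.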

coffeebeqn•2w ago

This is an extremely cherry picked example. One liner with only attributes?

benrutter•2w ago

I work in the UK energy sector and have been exposed to more than my fair share of bad, crufty APIs. I don't know the reason, but those returning XML are, practically speaking, much worse.

I've seen a bunch of times where an API returns invalid XML that has to be manipulated before parsing but never that for JSON.

I think that's the real sell for JSON. A lot of APIs are terrible, and JSON being simpler, terrible JSON beats terrible XML.

ajxs•2w ago

I'm just not convinced by this article. XSLT was a great technology in its time, but these days if you need to transform data into markup, modern templating engines are just way easier to use. I've said it before on HN: Being able to transform data into markup natively in the browser with a declarative language is still a neat idea. I enjoy thinking about an 'alternate future' where the web evolved in this direction instead.

apimade•2w ago

I spent the better half of my first professional decade writing RESTful abstractions over SOAP services and XML RPC monstrosities. I’ve done it for probably upwards of 200 or 300 systems (not interfaces, systems).

There’s one improvement XML had over JSON; and that’s comments.

The author laments features and functionality that were largely broken, or implemented in ways that countered their documentation. There were very few industries that actually wrote good interfaces and ensured documentation matched implementation, but they were nearly always electrical engineers who’d re-trained as software engineers through the early to late 90s.

Generally speaking namespaces were a frequent source of bugs and convoluted codepaths. Schemas, much like WSDL’s or docs, were largely unimplemented or ultimately dropped to allow for faster service changes. They’re from the bygone era of waterfall development, and they’re most definitely not coming back.

Then there’s the insane XML import functionality, or recursive parsing, which even today results in legacy systems being breached.

Then again, I said “author” at the start of this comment, but it’s probably disingenuous to call an LLM an author. This is 2026 equivalent of blogspam, but even HN seems to be falling for it these days.

The AI seems to also be missing one of the most important points; migration to smaller interfaces, more meaningful data models and services that were actually built to be used by engineers - not just a necessary deliverable as part of the original system implementation. API specs in the early 2000’s were a fucking mess of bloated, Rube-Goldbergesque interdependent specs, often ready to return validation errors with no meaningful explanation.

The implementation of XML was such a mess it spawned an entire ecosystem of tooling to support it; SoapUI, parsers like Jackson and SAX (and later StAX), LINQ to XML, xmlstarlet, Jing, Saxon..

Was some of this hugely effective and useful? Yes. Was it mostly an unhinged level of abstraction, or a resulting implementation by engineers who themselves didn’t understand the overly complex features? The majority of the time.

zzo38computer•2w ago

Different formats are good for different purposes. XML does have some benefits (like described in there), as well as some problems; the same is true of JSON. They do not mention ASN.1, although it also has many benefits. Also, the different formats have different data types, different kind of structures, etc, as well.

XML only has text data (although other kinds can be represented, it isn't very good at doing so), and the structure is named blocks which can have named attributes and plain text inside; and is limited to a single character set (and many uses require this character set to be Unicode).

XML does not require a schema, although it can use one, which is a benefit, and like they say does work better than JSON schema. Some ASN.1 formats (such as DER) can also be used without a schema, although it can also use a schema.

My own nonstandard TER format (for ASN.1 data) does have comments, although the comments are discarded when being converted to DER.

Namespaces are another benefit in XML, that JSON does not have. ASN.1 has OIDs, which have some of this capability, although not as much as XML (although some of my enhancements to ASN.1 improve this a bit). However, there is a problem with using URIs as namespaces which is that the domain name might later be assigned to someone else (ASN.1 uses OIDs which avoids this problem).

My nonstandard ASN1_IDENTIFIED_DATA type allows a ASN.1X data file to declare its own schema, and also has other benefits in some circumstances. (Unlike XML and unlike standard ASN.1, you can declare that it conforms with multiple formats at once, you can declare conformance with something that requires parameters for this declaration, and you can add key/value pairs (identified by OIDs) which are independent of the data according to the format it is declared as.)

(I have other nonstandard types as well, such as a key/value list type (called ASN1_KEY_VALUE_LIST in my implementation in C).)

XSLT is a benefit with XML as well, although it would also be possible to make a similar thing with other formats (for databases, there is SQL (and Tutorial D); there is not one for ASN.1 as far as I know but I had wanted such a thing, and I have some ideas about it).

The format XML is also messy and complicated (and so is YAML), compared with JSON or DER (although there are many types in DER (and I added several more), the framing is consistent for all of them, and you do not have to use all of the types, and DER is a canonical form which avoids much of the messiness of BER; these things make it simpler than what it might seem to some people).

Any text format (XML, JSON, TER, YAML, etc) will need escaping to properly represent text; binary formats don't, although they have their own advantages and disadvantages as well. As mentioned in the article, there are some binary XML formats as well; it seems to say that EXI requires a schema (which is helpful if you have a schema, although there are sometimes reasons to use the format without a schema; this is also possible with ASN.1, e.g. PER requires a schema but DER does not).
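To make the escaping point concrete, here is a minimal sketch using only the Python standard library. The sample string is made up for illustration; the point is only that the same text needs different escaping depending on the text format it is embedded in.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample text containing characters that are special in both formats.
raw = 'Tom & "Jerry" <cat>'

# XML character data must escape &, < and >.
xml_text = f"<title>{escape(raw)}</title>"

# JSON strings escape double quotes and backslashes instead.
json_text = json.dumps(raw)

print(xml_text)
print(json_text)

# Both encodings round-trip back to the original text.
xml_roundtrip = ET.fromstring(xml_text).text
json_roundtrip = json.loads(json_text)
```

A length-prefixed binary format such as DER would store the bytes of the string as they are, which is the trade-off the comment describes.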

Data of any format is not necessarily fully self-descriptive, because although some parts may be self-described, it cannot describe everything without the documentation. The schema also cannot describe everything (although different schema formats might have different capabilities, they never describe everything).

> When we discarded XML, we lost: ...

As I had mentioned, other formats are capable of this too

> What we gained: Native parsing in JavaScript

If they mean JSON, then JSON was made from the syntax of JavaScript, although before JSON.parse was added to standard JavaScript people might have used eval, which caused many kinds of problems. Also, if you are using JavaScript then the data model is what JavaScript does, although that is a bit messy. Although JavaScript now has an integer type, it did not have one at the time JSON was created, so JSON cannot use the integer type.
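A small sketch of what the number issue looks like in practice, written in Python rather than JavaScript. It assumes a consumer that stores every JSON number as an IEEE 754 double, which is how JavaScript's Number type behaves; `parse_int=float` is used here only to imitate that.

```python
import json

# One past 2**53, the point where doubles stop representing every integer.
big = 2**53 + 1

# Python's json module keeps arbitrary-precision integers intact.
text = json.dumps({"id": big})
exact = json.loads(text)["id"]

# Imitate a reader that turns every integer literal into a double.
as_double = json.loads(text, parse_int=float)["id"]

print(exact, int(as_double))
```

The JSON text itself is fine; the loss happens in whichever data model the reader maps numbers onto.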

> I am tired of lobotomized formats like JSON being treated as the default, as the modern choice, as the obviously correct solution. They are none of these things.

I agree and I do not like JSON either, but usually XML is not good either. I would use ASN.1 (although some things do not need structured data at all, in which case ASN.1 is not necessary either).

(Also, XML, JSON, and ASN.1 are all often badly used; even if a format is better does not mean that the schema for the specific application will be good; it can also be badly designed, and in my experience it often is.)

g-b-r•2w ago

The core of the article is at the bottom:

> the various XML-based "standards" spawned by enterprise committees are monuments to over-engineering. But the core format (elements, attributes, schemas, namespaces) remains sound. We threw out the mechanism along with its abuses.

It's mostly only arguing for using the basic XML in place of the basic JSON.

I largely agree with that, although I wouldn't count schemas as part of its core; go read the Schema specifications and tell me when you come out.

But I agree that a good part of XML's downfall was due to its enterprise committees: no iteration, and few incentives to make things lean and their specifications simple; a lot of the companies designing them had an interest in making them hard to implement.

zombot•2w ago

Is XML Turing-complete yet? I need something to run Doom on.

zerkten•2w ago

This is a better article than other recent ones on XML vs JSON. "The S-Expression Connection" is something that resonates, having been in the .NET space where Don Box was active and a whole bunch of web services things (good and bad) overlapped.

Devasta•2w ago

It's really bizarre: you talk about Rust or TypeScript and everyone understands how doing a little bit of extra planning up front yields great results, as everyone can work from solid foundations, but you suggest they do the same for their data by using XML and it's wailing and gnashing of teeth, bringing up anecdotes about SOAP and DTDs like we're all still living in 2003, concatenating strings together for our XML and trying to find answers to problems on forums or on ExpertSexChange.

The vast, vast majority of devs today have never known anything except JSON for their React frontends, but honestly if they gave XML a try and weren't working from second hand horror stories from 20 years ago I think a lot more people would like it than you expect.

cbondurant•2w ago

> the mapping is direct ... > or with attributes

so it isn't direct? That's what you're saying. You're saying that there's two options for how to map any property of structured data. That's bad, you know that right? There's no reason to have two completely separate, incompatible ways of encoding your data. That's a good way to get parsing bugs. That's just a way to give a huge attack surface for adversarially generated serialized documents.
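To illustrate the "two options" complaint, here is a minimal sketch with Python's standard `xml.etree.ElementTree`. The `person` record and the `read_person` helper are hypothetical; they only show that a reader has to pick, or support, both encodings of the same data.

```python
import xml.etree.ElementTree as ET

# The same hypothetical record, encoded two different ways.
as_elements = "<person><name>Ada</name><age>36</age></person>"
as_attributes = '<person name="Ada" age="36"/>'

def read_person(xml_text: str) -> dict:
    """Hypothetical reader that accepts either encoding of the record."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)        # attribute-style fields
    for child in root:                # element-style fields
        record[child.tag] = child.text
    return record

print(read_person(as_elements))
print(read_person(as_attributes))
```

Without a schema, nothing tells the consumer which of the two shapes a producer will emit, so tolerant readers tend to end up with code like the above.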

Also, self documentation is useless. A piece of data only makes sense within the context of the system it originates from. To understand that system, I need the documentation for the system as a whole anyway. If you can give me any real life situation where I might be handed a json/xml/csv/etc file without also being told what GENERATED that file, I might be willing to concede the point. But I sure can't think of any. If I'm writing code that deserializes some data, it's because I know the format or protocol I'm interested in deserializing already. You can't write code that just ~magically knows~ how its internal representation of data maps to some other arbitrary format, just because both have a concept of a "person" and a concept of a "name" for that person.

The problem with tags in XML isn't that they are verbose; it's that putting the tag name in the closing tag makes XML a context-sensitive grammar, and those are NIGHTMARES to parse in comparison to context-free grammars.

Comments are only helpful when I'm directly looking at the serialized document. and again, that's only gonna happen when I'm writing the code to parse it which will only happen when I also have access to the documentation for the thing that generated it.

"tooling that can verify correctness before runtime" what do you even mean. Are you talking like, compile time deserialization? What serialized data needs to be verified before runtime? Parsing Is Validation, we know this, we have known this for YEARS. Having a separate parsing and validation step is the way you get parsing differential bugs within your deserialization pipeline.

bob1029•2w ago

I would encourage anyone who thinks that XML is strictly inferior to attempt integration with certain banking vendors without use of their official XSD/WSDL sources. I've generated service references that are in the tens of megabytes. This stuff is not bloat. There are genuinely this many types and properties in some business systems. There is no way you could hand code this and still get everything else done.

The entire point of heavy-handed XML is to 1:1 the type system across the wire. Once I generate my service references, it is as if the service is on my local machine. The productivity gains around having strongly typed proxies of the remote services are impossible to overstate. I can wire up entirely new operations without looking at the documentation most of the time. Intellisense surfaces everything I need automatically as I drill into the type system.

JSON can work and provide much of the same, but XML has already proven to work in some of the nastiest environments. It's not the friendliest or most convenient technology, but it is an extremely effective technology. I am very confident that the vendors I work with will continue to use XML/WCF/SOAP into 2030.

tomjen3•2w ago

Openapi can do that too. But the real benefit is that it forces a simplification of the interface. XML has too many outs for architectural astronauts. JSON has close to none.

vee-kay•2w ago

UAE Central Bank has launched its Digital transformation initiatives: AANI and Jaywan.

UAE's AEP (Al Etihad Payments) launched AANI as its digital payments platform. (It is actually based on India's phenomenally successful "UPI"; its technology stack was licensed to UAE.)

Jaywan is UAE's domestic cards scheme, in competition with Visa, MasterCard, etc. (It is actually based on India's successful RuPay technology stack, licensed to UAE.)

And Jaywan uses XML for its files!

So these brand new banking initiatives in the Middle East use XML as the primary file format, because those banks know that all the thousands of fields/columns in the CBS (Core Banking System) and in upstream and downstream systems need a strict file format specification for file loading, processing, reconciliations, settlement, disputes/chargebacks, etc.

unscaled•2w ago

You must have been very lucky. Every SOAP service I had the (dis)pleasure to integrate with was a wholly different nightmare-ish can of worms. Even when we get to the very binding of WSDL, there are way too many variations on SOAP: RPC-Encoded? RPC-Literal? Document-Literal? Wrapped Document-Literal?

The problem is part of the same myth many people (like the OP author) have about XML and SOAP: There was "One True Way™" from the beginning, XML schemas were always XSD, SOAP always required WSDL service definition and the style was always wrapped document-literal, with everything following WS-I profiles with the rest of the WS-* suite like WS-Security, WS-Trust, etc. Oh, and of course we don't care about having a secure spec and avoiding easy-to-spoof digital signatures and preventing XML bombs.

Banking systems are mature and I guess everybody already settled and standardized the way they use SOAP, so you don't have to get into all this mess (And security? Well, if most banks in the world were OK with mandatory maximum password lengths of 8 characters until recently, they probably never heard about XMLDSig issues or the billion laughs attack).

But you know what also gives you auto-generated code that works perfectly without a hitch, with full schema validation? OpenAPI. Do you prefer RPC style? gRPC and Avro will give you RPC with 5% of the wire bloat that XML does. Message size does matter some times after all.

All of the things that you mentioned are not unique to XML and SOAP. Any well-specified system that combines an interchange format, a schema format, an RPC schema format and an RPC transport can achieve the same thing. Some standards had all of this settled from day one: I think Cap'n Proto, Avro and Thrift fit this description. Other systems like CORBA or Protocol Buffers missed some of the components or did not have a well-defined standard[1].

JSON is often criticized by XML-enthusiasts for not having a built-in schema, but this seems like selective amnesia (or maybe all of these bloggers are zoomers or younger millennials?). When XML was first released, there was nothing. Yes, you could cheat and use DTD[2]. But DTD was hard to use and most programmers eschewed writing XML schemas until XSD and Relax-NG came out. SOAP was also very basic (and lightweight!) when it first came out. XSD and WSDL quickly became the standard way to use SOAP, but it took at least a decade to standardize the WSDL binding style (or was it ever standardized?). Doing RPC in JSON now is still as messy as SOAP has been, but if you want RPC instead of REST, you wouldn't be going to JSON in the first place.

---

[1] IIRC, Protocol Buffers 2 had a rudimentary RPC system which never gained traction outside of Google and has been entirely replaced by gRPC after version 3 was released.

[2] DTD wasn't really designed for XML, but since XML was a subset of SGML, you could use the SGML DTD. But DTD wasn't a good fit for XML, and it was quickly replaced by XSD (and for a while - Relax-NG) for a reason.

sebazzz•2w ago

Even XBRL caved to JSON with XBRL-JSON. XBRL, of all standards.

_micheee•2w ago

We do XML processing, albeit with XQuery, as a small business.

It is a very niche solution but actually very stable and quite handy for all kinds of data handling; web-based applications and APIs as it nicely integrates with all kinds of text-based formats such as JSON, CSV or XML.

Yet I can easily comprehend how people get lost in all kinds of standards, meta-standards, DTDs, schemas, namespaces, and modeling the whole enterprise in SOAP.

However, you can do simple things simply and small, but in my experience, most tools promised to solve problems with ever-layered complexities.

Little disclaimer, I am probably biased, as I am with BaseX, an open-source XQuery processor :-)

cyocum•2w ago

I am a BaseX user and I really appreciate it! I actually do not mind XML at all. XQuery and BaseX make searching large numbers of XML files, or just one large XML file, really easy.

hackrmn•2w ago

I am one of those people who will call out those patronisingly asserting something like "Oh, god, XML! So happy we could finally evolve past _that_".

And yeah, XML wasn't perfect -- people harping on it are literally flogging a dead horse. Had the horse been given a pasture, it would have recovered. Instead we have very tiny pig-horses like JSON and YAML and three dozen other "weekend project candidates to someone's claim to fame and CS history" which haven't got half of XML's _useful_ features -- like namespaces, being one.

YAML has anchors, which is a useful feature in itself -- so no, we don't just regress or reinvent the wheel, there's room for improving XML. The tragedy is throwing the baby out with the bathwater, or so it seems to me that we have.

Giving XML largely the collective boot was the wrong decision. That's my firm opinion. Tools like XSLT haven't got an equal today -- these for better and for worse need XML in some capacity, and are much more extensible (no pun intended) than abominations like Jinja or what have you. XSLT was _designed_ while Jinja for one, appears to have been grown in a petri dish of sorts.

The hipster-like obsession with every new thing on the hill gave us HTML 5, with its weird context-sensitive parser rules where some tags can be closed, some must be closed and some must not be closed and so on. On top of it it mandates some forgiving behaviour on part of the parser, making best-effort assumptions that kind of get it to render the document but not the one you wanted -- add modern scripting and you are sitting there debugging subtly but by-design hidden errors -- instead of what was the case with XML that demanded you had the basic capacity to write the forward slash in the right places. But no, that was apparently too hard.

Also, really love the choice quotes in the article:

> They are the result of path dependence and fashion, not considered engineering judgment.

_Fashion_ is the word that comes to my mind every time I have to hear people half my age try to sell me JSON or YAML. Like, what basis do you have to argue on bare mention of something you haven't even worked on, just essentially repeating the person on your left? That's _cargo-cult programming_ again. The fact that mention of XML often draws use of that very term, "old-fashioned", speaks enough of the level of the conversation here -- we're apparently occupied by _fashion_ in choices of systems that by and large do the same thing their predecessors have done since the 60's.

> We value familiarity over rigor. We value the appearance of simplicity over actual simplicity, which is the simplicity that comes from clear rules and consistent structure.

Just the cherry on the cake, frankly. The entire "The Final Point" section really nails it for my part. I spend considerable amount of time at work trying to hammer into rookies essentially the equivalent of:

> Formality in data representation prevents entire classes of errors.

But it would appear history repeats itself as every generation has to learn too late the same mistakes that someone in the previous generation could have written a large book about, as a _warning_. Just the other day, for example, one of my let's say less rigorous colleagues said outright that "`null` is great in a [programming] language" (the exact wording was something along of "I love null!"), following up with the dubious clarification that this also includes SQL. I am not sure they even comprehend the size of the hole such statement makes.

imtringued•1w ago

JSON and YAML are almost as old as XML (20+ years) so saying "fashion" as a counter argument just makes you look out of touch. XML was fashionable when it came out and arguably, it relied more on its fashion status than JSON or YAML.

heliumtera•2w ago

We are past that. Absolutely. Thank god. Sucks to be wrong, I guess. But we are definitely past the point of XML relevance.

bytefish•2w ago

What I miss the most about the XML ecosystem is the tooling. And I think this is what most people are sentimental about. There was a time when it was so easy to generate contracts using XSDs, and that made it easy to validate the data. OpenAPI is slowly reaching parity with what I worked with in 2006.

But what I do not miss is the over-engineering that happened in the ecosystem, especially with everything SOAP. Yes, when it worked, it worked. But when it didn’t work, which was often the case when integrating different enterprise systems, then well… lord have mercy on me.

Sometimes I still use XSD to define a schema for clients, because in some areas there’s still better tooling for XML. And it gives me the safety of getting valid input data, if the XML couldn’t be validated.

And in the enterprise world, XML is far from being dead anyways.

fsckboy•2w ago

>What I miss the most about the XML ecosystem is the tooling.

yes, me too! when using XML it rendered all the flexibility and power of the unix tools pretty useless, and I missed them.

w10-1•2w ago

Comparing the XML ecosystem to JSON is like comparing railroads to bicycles.

The main difference is that enterprise companies and consultancies pushed complex XML solutions that differentiated them and created a moat (involving developer tools and compliance). JSON has always just been a way to sling data around, with a modicum of sanity. Hence the overbuilt/underbuilt split.

XML saved our axx. We had both internal and external API's with complex objects in JSON which failed constantly with mismatching implementations, causing friction with clients. Switching both to XML with schema solved that forever. But this was for complex B2B. We still used json for trivial web UI interactions.

dfabulich•2w ago

I think the industry settled on pretty good answers, using lots of XML-like syntax (HTML, JSX) but rarely using XML™.

1. Following Postel's law, don't reject "invalid" third-party input; instead, standardize how to interpret weird syntax. This is what we did with HTML.

2. Use declarative schema definitions sparingly, only for first-party testing and as reference documentation, never to automatically reject third-party input.

3. Use XML-like syntax (like JSX) in a Turing-complete language for defining nested UI components.

Think of UI components as if they're functions, accepting a number of named, optional arguments/parameters (attributes!) and an array of child components with their own nested children. (In many UI frameworks, components literally are functions with opaque return types, exactly like this.)

Closing tags like `</article>` make sense when you're going to nest components 10+ layers deep, and when the closing tag will appear hundreds of lines of code later.

Most code shouldn't look like that, but UI code almost always does, which is why JSX is popular.
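As a rough sketch of the "components are functions" idea, here it is in plain Python instead of JSX. The `element` and `article` helpers are hypothetical and only mimic the shape: named, optional arguments play the role of attributes, and positional arguments are the nested children.

```python
from html import escape

def element(tag, *children, **attrs):
    """Hypothetical helper: render a tag with named attributes and child strings."""
    attr_text = "".join(f' {k}="{escape(str(v))}"' for k, v in attrs.items())
    return f"<{tag}{attr_text}>{''.join(children)}</{tag}>"

def article(*children, title="Untitled"):
    """A 'component': a function with an optional named argument and children."""
    return element("article", element("h1", escape(title)), *children)

page = article(element("p", "Hello"), title="XML & JSX")
print(page)
```

With only two levels of nesting the closing tags add little, which fits the point above that they pay off mainly in deep, long UI trees.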

hirvi74•2w ago

My favorite quote about XML was something along the lines of:

"XML is a lot like violence. If it's not getting the job done, then you aren't using enough of it."

basetwojesus•2w ago

I def get it, but it just seems like a fight not worth fighting anymore, at least not in the areas I work in, I guess. I work in my language's type system first and rely on things like serde when some sort of conversion is necessary. If I suddenly got told to switch from JSON to XML for some upstream API it would be annoying but still firmly in the "solved problem" territory. I guess I'm saying use whatever interchange formats you have to and maintain the primacy of your internal type definitions.

I'm sure there are plenty of arenas where this doesn't make as much sense but I suspect it's common

rf15•2w ago

Schemas? Oh you mean the one where Double doesn't inherit from Decimal? Year numbers with attached timezones? Author has not looked at this insanity for more than five seconds.

amadeuspagel•2w ago

> And in that victory, we collectively agreed to pretend that a format designed for human readability in a REPL was suitable for machine-to-machine communication, for configuration, for anything requiring rigor. We relinquished the logical formalism for convenience with our tools.

This is an odd qualifier. Is human readability in a REPL different from human readability in an editor? What could be more important in a format -- as long as machines are able to parse it at all -- than human readability? Machines can parse both JSON and XML, so the only way to compare them is how well humans are able to read (and write) them.

The article admits that JSON has answers for many of the problems it points out, like schemas and comments (JSONC), but dismisses them as not widely used. Compared to what? Total JSON usage? Fair enough. But more people probably use JSONC than XML for config files at this point.

bazoom42•2w ago

The fundamental problem is that XML was designed for textual markup formats but ended up getting used mostly for structured data. Many of the features, like the element/attribute distinction and mixed content, are necessary for markup but are unnecessary complexity for structured data.

JSON is perhaps an accident of history rather than deliberately designed, but for structured data interchange it is better because it is simpler.

Just like XML, JSON is getting used outside of its area. Using JSON for configuration files is absurd, since it doesn’t allow comments.

neopointer•2w ago

It's ironic that he mentioned "comments" while we live in the sad age where "comments are evil" and "smells like bad code" are common refrains; in reality, the people screaming these sentences are exactly those who deliver garbage code.

xg15•2w ago

> JSON has no such mechanism built into the format. Yes, JSON Schema exists, but it is an afterthought, a third-party addition that never achieved universal adoption.

I don't see how this is in any way more of an afterthought than XML Schema was (except that it was designed by the same group as XML)

> Namespaces. XML allows you to compose documents from multiple schemas without collision.

It "allows" it in the extremely narrow sense that you can write a file with elements from different namespaces, parse it into a DOM with a schemaless parser and can still distinguish the elements.

It does not define any semantics about what an interaction between different namespaces means or which namespaces you can and which you can't combine.
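A minimal sketch of that narrow sense, using Python's standard `xml.etree.ElementTree` as the schemaless parser. The `urn:example:*` namespace URIs and the document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "id" but live in different namespaces.
doc = """
<order xmlns="urn:example:shop" xmlns:pay="urn:example:payments">
  <id>42</id>
  <pay:id>tx-9001</pay:id>
</order>
"""

root = ET.fromstring(doc)

# ElementTree exposes each tag as "{namespace-uri}localname".
tags = [child.tag for child in root]
payment_id = root.find("{urn:example:payments}id").text

print(tags)
print(payment_id)
```

The parser keeps the two `id` elements apart, but nothing here says whether a payments `id` is allowed inside a shop `order` or what it would mean; that still has to come from a schema or documentation.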

orian•2w ago

I once got a contract to write a tiny system that had to integrate with a 20-30 year old system, written in some version of C# / Microsoft framework, that only spoke XML. I had a problem with timestamps, because depending on some internal state they were returned in different ways. I had to go and read over 100 pages on date and time in the XML spec and implement it. Also, I found that a lib they were using had bugs, so my lib had to deal with them.

I hate xml.

AtlasBarfed•2w ago

Xml fundamentally does not communicate when information under a tag is a list or a map structure, a data structure indication that is critical for deserialization of information into a format usable by programs.
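The list-versus-map ambiguity shows up as soon as you convert XML without a schema. Below is a sketch with a hypothetical `naive_to_dict` converter, similar in spirit to many generic XML-to-dict tools; it is an illustration, not any particular library's behavior.

```python
import xml.etree.ElementTree as ET

def naive_to_dict(elem):
    """Hypothetical schemaless converter: leaf -> text, otherwise a dict of children."""
    if len(elem) == 0:
        return elem.text
    result = {}
    for child in elem:
        value = naive_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the earlier value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

one = naive_to_dict(ET.fromstring("<cart><item>apple</item></cart>"))
two = naive_to_dict(ET.fromstring("<cart><item>apple</item><item>pear</item></cart>"))

print(one)
print(two)
```

A cart with one item comes out as a plain string and a cart with two comes out as a list, so the consuming code has to handle both shapes unless a schema says that `item` is always repeated.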

In addition, it has tons and tons and tons of cruft, specification bloat, dogma.

The best parts of XML were probably XPath, and some aspects of document validation.. and that's it
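For readers who have not used XPath, a small sketch follows. It uses the limited XPath subset built into Python's `xml.etree.ElementTree` (not a full XPath engine), and the `library` document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two kinds of entries.
library = ET.fromstring("""
<library>
  <book year="1999"><title>Alpha</title></book>
  <book year="2006"><title>Beta</title></book>
  <magazine year="1999"><title>Gamma</title></magazine>
</library>
""")

# Titles of books (not magazines) whose year attribute is 1999.
titles_1999 = [t.text for t in library.findall("./book[@year='1999']/title")]

# Every title anywhere in the tree.
all_titles = [t.text for t in library.findall(".//title")]

print(titles_1999)
print(all_titles)
```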

erlkonig•1w ago

While XML was imperfect from overcomplication, JSON is imperfect by falling short of even basic database use, and somehow despite its alleged simplicity it manages to be unstandardized almost as badly as Markdown. JSON and YAML both fail to have comments that survive processing, something it's easy to regret since XML does have comments that appear in the parsed objects.
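A quick sketch of the comment point, using Python's standard `xml.dom.minidom` and `json` modules. The `config` document is made up; the JSON side assumes a strict parser, which is what the standard library provides.

```python
import json
from xml.dom import minidom

# XML: the comment is a node in the parsed tree.
dom = minidom.parseString(
    "<config><!-- raise before launch --><retries>3</retries></config>"
)
comments = [
    node.data
    for node in dom.documentElement.childNodes
    if node.nodeType == node.COMMENT_NODE
]
print(comments)

# Strict JSON: a comment is a syntax error, so there is nothing to preserve.
json_rejected = False
try:
    json.loads('{"retries": 3 /* raise before launch */}')
except json.JSONDecodeError:
    json_rejected = True
print(json_rejected)
```

Whether a given XML toolchain keeps comments through a full read-modify-write cycle still depends on the library, so this only shows that the data model has a place for them.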

A saner subset of XML, possibly run through some over-caffeinated developers to lighten its redundant syntactic feeling, would have given us something FAR better than JSON's failure and YAML's gratuitously hypercomplicated syntax.

Developers Are Stupid - developer.
