Microsoft is just dominant and exporting its 40-year-old legacy codebase as a spec. The LibreOffice team is frustrated that the for-profit model is beating the OSS model and is crying foul over mostly necessary complexity. If LibreOffice started from scratch they’d probably appreciate how much Microsoft serializes, because a sufficiently complicated document saved to .docx basically provides a reference implementation.
We do need for-profit alternatives to Word, and I’m working on one in legal.
[edit: I hope to put some real thoughts on this down soon, but most of the wonkiness emanates from evolving functionality and varying trends in best practices over the decades. I’ve implemented a fair bit of the spec here: https://tritium.legal, but most of the hard part is providing for bidi language support, fonts, real-time editing and re-rendering, UI and annotations like spellchecking and grammar, not conforming to the markup spec. Spec conformance is just polish and testing. A performant modern word processor of any spec, however, is a technological achievement on the order of a web browser.]
> Thus, the primary goal for this new format wasn’t to be elegant, universal, or easy to implement; it was to placate regulators while preserving Microsoft’s technological and commercial advantages.
That sounds quite anti-competitive to me
Google completely flipped the game and then cloud collaboration became everything.
Wow, big undertaking!
What we really need, though, is a for-profit alternative to Excel, that's not Google. I think Excel is more of the Killer App than Word has ever been.
But very little of this complexity is necessary for a standard interoperable document file format. The background was that the EU started pushing for a standardized document exchange format, and several governments started implementing regulations requiring the use of this format. Microsoft now had some very big customers who urgently needed a feature: a standard document file format. Microsoft _could_ have implemented and submitted a new format that doesn't slavishly reflect their in-memory object graph and legacy issues. Or they could even have just adopted ODF (shudder). But they chose the easy way because, frankly, they probably just didn't have the time. They took the accidental complexity, which was the hot mess of Microsoft Office internals (like a buggy date format), and serialized it to disk. It was never an ideal solution, but it was quick to implement.
That's just a classic case of technical debt: Microsoft needed to deliver a feature fast, and they were willing to make compromises. The crazy political shenanigans Microsoft had executed to standardize their technical debt are ironically just another form of accidental complexity.
No, it's just an ordinary conspiracy. Every place in the spec where you see shit that says "Do it like Word95 does" or "Do it like Word97 does" is an intentional aspect of the standard that makes it unreasonably difficult for anyone who wishes to faithfully read or write documents in this format to do so.
It is inappropriate for an open standard to define behavior in terms of an undocumented proprietary black box. The primary reason for an open standard to exist is to permit interoperability. Anyone who has read nontrivial portions of the standard would argue that ISO shouldn't have standardized OOXML as it was. It's a damn shame that Microsoft acted in bad faith to exploit ISO's rules [0] in order to ram a very poorly-specified standard through. It's always sad when people and organizations that should be acting pro-socially choose to do the opposite.
[0] By paying money to stack the organization with a bunch of entities whose only interest was to vote "yes" for the ratification of this standard, natch. IIRC, ISO had to modify their rules again after the OOXML vote because they couldn't get quorum due to those one-issue voters refusing to show up for future business.
Let's take a look at this "for-profit model" - is it just higher price outweighed by better product? lol:
Microsoft, after getting beat up in the press for making proprietary extensions to the Kerberos protocol, has released the specifications on the web -- but in order to get it, you have to run a Windows .exe file which forces you to agree to a click-through license agreement where you agree to treat it as a trade secret, before it will give you the .pdf file. Who would have thought that you could publish a trade secret on the web? - https://slashdot.org/story/00/05/02/158204/kerberos-pacs-and...
Back in 2001, Be, Inc. managed to get BeOS pre-installed on one computer model from Hitachi. Just one. On the entire PC market. Microsoft forced Hitachi to drop the bootloader entry to hide BeOS from customers buying it. They enforced their monopoly over the only possible niche BeOS could find on the PC market, crushing Be, Inc. in the process. - https://www.haiku-os.org/blog/mmu_man/2021-10-04_ok_lenovo_w...
So why aren't there any dual-boot computers for sale? The answer lies in the nature of the relationship Microsoft maintains with hardware vendors. More specifically, in the "Windows License" agreed to by hardware vendors who want to include Windows on the computers they sell. This is not the license you pretend to read and click "I Accept" to when installing Windows. This license is not available online. This is a confidential license, seen only by Microsoft and computer vendors. You and I can't read the license because Microsoft classifies it as a "trade secret." The license specifies that any machine which includes a Microsoft operating system must not also offer a non-Microsoft operating system as a boot option. In other words, a computer that offers to boot into Windows upon startup cannot also offer to boot into BeOS or Linux. The hardware vendor does not get to choose which OSes to install on the machines they sell -- Microsoft does. - https://birdhouse.org/beos/byte/30-bootloader/
What do you mean though? Libreoffice wrote their application from scratch, did they not? And they managed to implement a superior serialization format, did they not? And they managed to get that format standardized without bribing and cheating, did they not?
What you're saying is akin to "those residents of banana republics are just frustrated capitalism (and a little help from the CIA) is beating democracy"
> We do need for-profit alternatives to Word
Why does it have to be for profit?
For all the hate people gave CSS, it was/is fantastic at its job. Word documents are an example of how you don't design a document, and how when a for profit org designs a thing (instead of standards and market pressures), you get a technological monstrosity...
To be clear, I don't think LibreOffice is great. Part of their issue is that they were built as a way to "not pay" for Office, and it turns out that no, volunteers don't really do a better job at implementing 1000 pages of nonsense than the people who came up with that spaghetti code in the first place...
We don't need that software anymore, though. If you use it, know we are looking at you like you are pulling out a physical paper phonebook to store your numbers in, or, less hurtfully but just as topically, a record or CD player... it is dinosaur technology that pretty much has no place in today's world...
So, they have a point, I don't disagree with them, however it probably would be better just to "admit defeat", get MS to open source their code for compat reasons, and work on something new that's not trying to write viruses on your computer better than paragraphs...
The use of namespaces is also incredibly annoying: as far as I can tell, they really aren't well supported in any XML library I can find, which undercuts the "human-readable" part.
When you crack open the file it feels like you are going to be able to find everything you need with an XPath like //w:t, but none of the XML parsers I've found cope well with the namespaces.
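In Python, the `find`, `findall`, etc. methods take a namespace dictionary, e.g.:

```python
# ElementTree only accepts relative paths on an Element, hence ".//" rather than "//".
result = doc.findall(".//w:t", namespaces={"w": "..."})
```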
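A self-contained version of the Python approach, using only the standard library's `xml.etree.ElementTree`. The namespace URI is the Transitional WordprocessingML one that Word normally writes (an assumption about the file you open), and the XML is a hypothetical, trimmed-down stand-in for `word/document.xml`. ElementTree implements only a subset of XPath, so the path has to be relative (`.//w:t`).

```python
import xml.etree.ElementTree as ET

# WordprocessingML (Transitional) main namespace. The prefix "w" is arbitrary:
# matching is done on the URI, not on the prefix used in the file.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hypothetical, trimmed-down stand-in for word/document.xml.
DOCUMENT_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>To be</w:t></w:r>
      <w:r><w:t xml:space="preserve">, or not to be</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

root = ET.fromstring(DOCUMENT_XML)

# Prefixed query plus a prefix map; ".//" searches all descendants.
texts = [t.text for t in root.findall(".//w:t", NS)]

# Equivalent query in Clark notation ("{uri}local"), with no prefix map at all.
same = [t.text for t in root.iter("{%s}t" % NS["w"])]

print("".join(texts))
```

Using the `w:` prefix without supplying a prefix map makes ElementTree raise a `SyntaxError`, which is usually where the "namespaces don't work" impression comes from.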
In C# you can do:

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
var navigator = doc.Root!.CreateNavigator();
var nsManager = new XmlNamespaceManager(navigator.NameTable);
nsManager.AddNamespace("w", "...");
var results = doc.Root?.XPathSelectElements("//w:t", nsManager);
```
In Java you need to enable a namespace-aware flag in the settings to get namespaces to work. I can't recall off-hand how to do that.

But then their manager needs to sell this project to the higher-ups, who have read BillG's memo about how "One thing we have got to change in our strategy – allowing Office documents to be rendered very well by other people's browsers is one of the most destructive things we could do to the company. We have to stop putting any effort into this and make sure that Office documents very well depend on proprietary IE capabilities. Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows." and taken it to heart. So what does he do? Why, he spins a tale that since it's XML, they'll be able to standardize it, and everyone else will still be forced to interoperate with MS Office anyhow, because it will be the de facto reference implementation (by virtue of being there first, and widely deployed), and the spec is going to be an absolute PITA to implement decently. And that manager too will be absolutely correct!
I know when I had to deal with a LOT of excel in 2008-2013, somewhere in that range I gave up on trying to parse the XML (admittedly with the then-rudimentary tools, to say nothing of nascent state of nuget at the time) and just learned how to do VSTO (Visual Studio Tools for Office) as we all had excel installed anyway, and it led to less overall code for the tasks we had to do that involved Excel...
If you take the idea that it is "artificially complex, because they actively added complexity", then I can see how that isn't quite right. But "artificially complex" can also allow for "because they actively avoided the effort to remove complexity." In which case, we are back to the same spot? But in agreement this time?
OOXML is an extremely detailed spec that lists minute details of the Office documents, with uncountable features. While it could have used some "standard" features, there weren't that many usable standards when OOXML was being developed.
In comparison, the OASIS OpenDocument spec is horribly ambiguous and has all the same issues (like units not being used consistently). It got better over the years, but it's still not at all great. And its size is now comparable to OOXML's, once all the referenced specs are incorporated.
It's essentially a serialization of the binary format to XML.
ODF 1.4 is around 1,100 pages across all 4 parts whereas OOXML is over 6,000.
[1] https://stephesblog.blogs.com/my_weblog/2007/08/microsofts-f...
[2] https://ooxmlisdefectivebydesign.blogspot.com/2007/08/micros...
[3] https://www.robweir.com/blog/2007/01/how-to-hire-guillaume-p...
Yeah, sure, whatever. You'll never see these kinds of documents in real life. And the specified quirks were minor. If you don't implement them, you'll get subtle formatting issues in documents imported directly from Word97.
MS could have just put them into a "vendor-specific" extension and not documented them at all.
> ODF 1.4 is around 1,100 pages across all 4 parts whereas OOXML is over 6,000.
LOL, no. SVG spec alone is 800 pages. ODF formula spec is 200 pages alone, and is still underspecified.
You can see it for yourself here (in Part 4): https://ecma-international.org/publications-and-standards/st...
When I was a kid, making cool WordArt headers for school projects was like 50% of what we used Office for.
[0]: https://support.microsoft.com/en-us/office/insert-wordart-c5...
It might be a Swedish thing, but I always laugh when I see them. Not nearly as common today as ten years ago, but I see them a couple of times a year.
Very, very few people care about openness. Maybe a few hundred. Tens of millions care about docx capturing exactly what their doc files had.
Microsoft made the correct choice.
I do have strong memories of OOXML and the scandals that were with it when it became a standard through MS allegedly buying/stacking/influencing votes:
https://chatgpt.com/share/68bf5e11-4e10-8003-ac9d-d4d10f7951...
That doesn't count the various times where it behaved weird, inconsistently had fields/tables that were impossible to edit, etc. I've had to completely recreate everything a couple times over the years. That's just one document, for one guy that I don't really touch that often.
Say what you will about Firefox vs Chrome in terms of usability, but compared to MS Word, using LibreOffice is worse than early betas of Netscape Navigator 4.0. It's both impressive and upsetting. OnlyOffice at least looks nicer, even if it doesn't really function any better. MS's online version of Word in the browser operates more consistently than either.
But I have used Word for work.
The Shakespeare example is a good one: the sentence is split into multiple spans to apply style rules, yet the bare text content could be extracted by just removing all XML tags. The ODF variant, by contrast, is actually less appealing, as it relies on an unnecessarily complex formatting and text-addressing language on top of XML.
The article says
> Even at a glance [ODF's markup] is more intelligible. Strip the text: namespaces and it’s nearly valid HTML. The only thing that needs explaining is that ODF doesn’t wrap To be with a dedicated “bold” tag. Instead, it applies an auto-style named T1 to a <text:span>, an act of separating content and presentation that mirrors established web practices.
but this definitely makes things more complex for data exchange compared to OOXML.
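A sketch of the difference, using hypothetical snippets modeled on the article's Shakespeare example. Both are simplified: real documents also use named and inherited styles, and in OOXML `w:b` can carry `w:val="0"` to switch bold off, which this sketch ignores. Stripping tags yields the same text from both, but finding the bold words is a single lookup inside each OOXML run, whereas ODF needs a style-resolution pass first.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefixed: str) -> str:
    """Turn 'style:name' into ElementTree's '{uri}name' form (used for attributes)."""
    prefix, local = prefixed.split(":")
    return f"{{{NS[prefix]}}}{local}"


# Hypothetical OOXML paragraph: formatting sits inline, inside each run.
OOXML = (
    f'<w:p xmlns:w="{NS["w"]}">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>To be</w:t></w:r>"
    "<w:r><w:t>, or not to be</w:t></w:r>"
    "</w:p>"
)

# Hypothetical ODF content: the span only names an automatic style, "T1".
ODF = (
    f'<office:document-content xmlns:office="{NS["office"]}" xmlns:style="{NS["style"]}" '
    f'xmlns:text="{NS["text"]}" xmlns:fo="{NS["fo"]}">'
    '<office:automatic-styles><style:style style:name="T1" style:family="text">'
    '<style:text-properties fo:font-weight="bold"/></style:style></office:automatic-styles>'
    "<office:body><office:text><text:p>"
    '<text:span text:style-name="T1">To be</text:span>, or not to be'
    "</text:p></office:text></office:body></office:document-content>"
)


def plain_text(xml: str) -> str:
    """Concatenate every text node, i.e. 'strip all the tags'."""
    return "".join(ET.fromstring(xml).itertext())


def ooxml_bold_runs(xml: str) -> list[str]:
    # One hop: the bold flag is a child of the same run that holds the text.
    p = ET.fromstring(xml)
    return [r.findtext("w:t", namespaces=NS) for r in p.findall("w:r", NS)
            if r.find("w:rPr/w:b", NS) is not None]


def odf_bold_spans(xml: str) -> list[str]:
    root = ET.fromstring(xml)
    # Step 1: work out which automatic style names mean "bold".
    bold_styles = set()
    for st in root.findall(".//office:automatic-styles/style:style", NS):
        props = st.find("style:text-properties", NS)
        if props is not None and props.get(q("fo:font-weight")) == "bold":
            bold_styles.add(st.get(q("style:name")))
    # Step 2: only now can the spans in the body be classified.
    return [sp.text for sp in root.findall(".//text:span", NS)
            if sp.get(q("text:style-name")) in bold_styles]
```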
> zero consideration given to third-party apps and segment formats
The reality is the opposite. COM serialization was specifically built to allow for composing components (and serializations thereof) that didn't know about each other into a single document. That's why it leans so heavily on GUIDs for names: they avoid collisions without needing coordination. That's a laudable goal, not pointless bloat. And the COM people implemented it pretty efficiently too!
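To make the composition argument concrete, here is a minimal sketch of the idea, assuming a made-up registry rather than the real COM/OLE APIs: each component type is identified by a GUID its author picked independently, embedded objects are tagged with that GUID, and the host only needs a lookup table. The two UUIDs are arbitrary placeholders.

```python
import uuid
from dataclasses import dataclass
from typing import Callable

# Placeholder class IDs. Each component author generates one once and hard-codes
# it; no central authority has to hand out names to avoid collisions.
SPREADSHEET_CLSID = uuid.UUID("6b1f8c2e-2d4a-4f0e-9c1b-3a5d7e9f1a20")
CHART_CLSID = uuid.UUID("c0d3a9e4-71b2-4c5e-8f60-1a2b3c4d5e6f")


@dataclass
class EmbeddedObject:
    clsid: uuid.UUID  # which handler understands the payload
    payload: bytes    # opaque to the host document


# Hypothetical host-side registry: CLSID -> handler that renders the payload.
HANDLERS: dict[uuid.UUID, Callable[[bytes], str]] = {
    SPREADSHEET_CLSID: lambda data: f"spreadsheet ({len(data)} bytes)",
    CHART_CLSID: lambda data: f"chart ({len(data)} bytes)",
}


def render(obj: EmbeddedObject) -> str:
    handler = HANDLERS.get(obj.clsid)
    if handler is None:
        # No handler installed: the bytes can be kept and re-saved untouched,
        # but the host has no idea what they mean.
        return f"[unknown object {obj.clsid}]"
    return handler(obj.payload)


document = [
    EmbeddedObject(SPREADSHEET_CLSID, b"\x01\x02\x03"),
    EmbeddedObject(uuid.UUID(int=1), b"???"),
]
rendered = [render(o) for o in document]
```

The fallback branch is also where the objection about opaque streams comes from: without the registered handler, the host can preserve the bytes but cannot display or convert them.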
> C++ data structures
What gives you that idea? Yes, the OLE stream thing was a binary format, but so is DER for ASN.1. Every webpage you load goes over a binary tagged object format not too different from OLE/COM's.
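For comparison, a minimal tag-length-value codec in the style of DER (the ASN.1 encoding used for things like X.509 certificates). This is a sketch, assuming single-byte tags and definite lengths only; it is not a full DER implementation. What it illustrates is that a reader can walk past, and skip, items it doesn't understand, because every item states its own length.

```python
def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode one tag-length-value item with DER-style definite lengths."""
    n = len(value)
    if n < 0x80:
        length = bytes([n])  # short form: the length fits in one byte
    else:
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(body)]) + body  # long form: byte count, then length
    return bytes([tag]) + length + value


def decode_tlvs(buf: bytes):
    """Walk a flat sequence of TLV items, yielding (tag, value) pairs."""
    i = 0
    while i < len(buf):
        tag, first = buf[i], buf[i + 1]
        i += 2
        if first < 0x80:
            n = first
        else:
            count = first & 0x7F
            n = int.from_bytes(buf[i:i + count], "big")
            i += count
        yield tag, buf[i:i + n]
        i += n


# 0x0C is UTF8String and 0x04 is OCTET STRING in ASN.1's universal tag numbering.
message = encode_tlv(0x0C, "hello".encode()) + encode_tlv(0x04, bytes(200))
items = list(decode_tlvs(message))
```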
But due to a persistence of myths from the 90s, people still think of the Office binary format as "horrible" when it's actually quite elegant, especially considering the problems the authors had to solve and their constraints in doing so.
In many ways, we've regressed.
> Markup
The author of the article nails it when he says ODF is meant to be a markup language and OOXML is the serialization of an object graph. So what? Do people write ODF by hand? There are countless JSON formats just as inscrutable as MSO's legacy streams.
Anyway, the idea that the MSO binary format was crap because it was binary, lazy, and represented a "memory dump" is an old myth that just won't die. It wasn't a memory dump, it wasn't lazy, and it wasn't crap. Yes, there are real problems with some of the things people put inside the OLE container, but it's facile and wrong to blame the container or the OLE stream composition model for the problem.
A system managing opaque streams with handler apps registered via GUIDs is pretty much antithetical to open formats for data exchange.
Or is it that you just really hate UUIDs? Me too, man. Should have gone with reverse DNS. It's a technical and aesthetic quibble though.
I don't think anyone cares about debating the word "artificial"; I don't think that was anyone's point. It's just not a standard. It was, as is made clear here, a way to head off a standard that competitors could actually implement with a fake standard that Microsoft couldn't even implement.
I also don't think that it is "a counterproductive reflex that’s common in open-source circles: scolding users for accepting proprietary tech." I don't even know wtf that's supposed to mean. People are stuck with it because of corruption, they're not being scolded for using it.
edit: "LibreOffice itself, as ODF’s flagship, still suffers from rough edges in design, interaction, and performance. As a result, even as Office hobble itself with bloat, most people still find it easier."
Yeah, it'd be a lot easier if they didn't ever have to deal with OOXML and could just work on their own product.
The author only provides arguments for "self-interested negligence". He provides no counterarguments to the claim that OOXML complexity was "a plot to block third-party compatibility". Therefore, he cannot compare "negligence" and "a plot". Therefore, his claim that "negligence" is a better explanation for OOXML complexity than "a plot" cannot follow.
To restate:
> If we dig into the context of OOXML’s creation, it can be argued that harming competitors was not Microsoft’s primary aim.
The author provides no evidence to support this claim. At most, the evidence provided in this section supports the claim that "negligence" played a role in OOXML complexity. From this evidence alone, no conclusions can be drawn about the "primariness" of "negligence" vs "harming competitors".
The author is just implicitly appealing to Occam's razor here, as people often do in the face of accusations of a plot. They can show that Microsoft has backed the ANSI accreditation of ODF[1] and eventually implemented support for ODF import and export in Office, but that's not enough to prove there was no conspiracy.
Instead, the article just provides a very plausible explanation for the complexity in OOXML. Does this explanation thoroughly disprove the accusations of a plot? Clearly not. Is it more plausible than a great plot to crush a bunch of competitors that had no market share and kill a better standard document format that Microsoft did end up implementing in Office? Yes. This is probably as far as we can get.
[1] https://news.microsoft.com/source/2007/05/16/microsoft-votes...
I'm not saying they shouldn't do that as a company maximizing shareholder value. But we should all collectively groan every time the topic comes up, not applaud them.
https://en.wikipedia.org/wiki/Object_Linking_and_Embedding
where you could embed an Excel spreadsheet inside a Word document, or actually embed any of a large range of COM objects into a Word document. On one hand that is a really appealing vision, but on the other hand it means you have to have, and be able to run, all the binaries for all the objects that live in a document, which ties the whole thing to Windows.
PDF is a different sort of document format which privileges viewing over editing but it is also really about serializing an object graph when it comes down to it and then having various sorts of filters and transformations and a range of objects defined in the spec as opposed to open ended access to an object library.
This kind of system has a lot of overlap with the serdes problem you get with RPC frameworks, which used to be filed under "Sun RPC sucks", "DCOM sucks", "CORBA sucks" and "WS-* sucks". Those things are mostly forgotten these days because, well... they sucked. Now the usual complaint is "protobuf sucks", but you rarely hear "JSON sucks": it gave up on graphs for trees, and if you don't have a type system, people can't say the type system sucks. The only thing that really sucks about it is that people won't just use ISO 8601 dates, but you can always rise above that by using ISO 8601 dates without asking for permission. But we all agree YAML sucks.
That points to any flexible document format sucking, but OOXML also sucks because it has lots of poorly specified and obscure features that amount to "format this the same way Word 95 formatted it if you used a certain obscure option".
From a glass is half empty perspective it sucks because it's close to impossible to make a Microsoft Office replacement that renders 100% of documents 100% correctly.
From a glass is half full perspective it rules, because if you want to make a Python script that writes an Excel spreadsheet with formulas, it is easy. If you want to extract the images out of a Word document, it is easy because a Word document is just a ZIP file. If you want to do anything with an OOXML document short of writing an Office replacement, it's actually a pretty good situation.
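The "just a ZIP file" point in a few lines of standard-library Python. This is a sketch: it assumes Word's usual layout, where embedded images live under `word/media/`; a stricter reader would locate parts through the package's relationship parts rather than by path. The demo package is a hypothetical stand-in with only the ZIP shape, not a valid .docx.

```python
import io
import zipfile


def extract_media(docx_bytes: bytes) -> dict[str, bytes]:
    """Return {part name: contents} for every file under word/media/."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }


# Build a tiny stand-in package in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<placeholder/>")
    zf.writestr("word/media/image1.png", b"\x89PNG placeholder bytes")

media = extract_media(buf.getvalue())
```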
Except it also spawned a thousand custom formats that include $ref support of some type, so we are right back to having graphs. :-D
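A small illustration: resolving local `$ref` pointers (JSON Pointer syntax, RFC 6901) turns a parsed JSON tree back into a graph with shared and even cyclic nodes. This is a sketch, assuming only same-document refs of the form `#/a/b` and objects whose sole key is `$ref`; it is not a JSON Schema implementation.

```python
import json


def resolve_pointer(doc, ref: str):
    """Follow a same-document JSON Pointer such as '#/definitions/node'."""
    if not ref.startswith("#"):
        raise ValueError(f"only local refs are supported: {ref!r}")
    node = doc
    for raw in ref[1:].split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping order
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def link_refs(node, root):
    """Replace every {"$ref": ...} object with the object it points to, in place."""
    if isinstance(node, dict):
        entries = list(node.items())
    elif isinstance(node, list):
        entries = list(enumerate(node))
    else:
        return node
    for key, value in entries:
        if isinstance(value, dict) and set(value) == {"$ref"}:
            node[key] = resolve_pointer(root, value["$ref"])  # now a shared reference
        else:
            link_refs(value, root)
    return node


doc = json.loads("""
{"definitions": {"node": {"type": "object",
                          "properties": {"next": {"$ref": "#/definitions/node"}}}},
 "properties": {"head": {"$ref": "#/definitions/node"},
                "tail": {"$ref": "#/definitions/node"}}}
""")
link_refs(doc, doc)
node = doc["definitions"]["node"]
```

After linking, `head` and `tail` are the same object and `node["properties"]["next"]` points back at `node`, so a plain `json.dumps(doc)` now raises `ValueError` because of the cycle.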
It was a pretty big deal when OpenOffice.org's 2.0 release came with OpenDocument as the default file format. Very easy for someone to misread this MSOffice screen and click on OOXML expecting it to mean OO.o.
I have to wonder what sort of psychologists they employ who come up with ideas like aligning the “Word, Excel, PowerPoint” word column in the first selection with “Open” in the second selection so you read that word first and backtrack left to “Office”. Or maybe it's just a happy accident lol
That sounds exactly like an anti-competitive format.
Keeping one's own advantage pretty much sums up all anti-competitive behavior.
(I wonder what the specification-pages-to-man-years ratio is...)
ISO/IEC 29500 should be open to evolution, no? Just like all the open collaboration on it before it was confirmed as a standard.
A better format would have made us geeks a lot happier, but the average user just wants things to work the way they always have.
The XML version likely carries a lot of baggage having to be compatible with that.
The binary MS Office format is a phenomenal piece of engineering to achieve a goal that's no longer relevant: fast save/load on late-'80s hard drives. Other programs took minutes to save a spreadsheet; Excel took seconds. It did this by making sure its in-memory data structures for a document could be dumped straight to disk without transformation.
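A minimal sketch of the "dump the in-memory structures" technique, using Python's `ctypes` to stand in for C structs. The record layout is hypothetical and is not the real Excel/BIFF layout (other commenters dispute that the Office format was a plain memory dump at all); it only shows why saving and loading can skip per-field encoding and parsing.

```python
import ctypes


class CellRecord(ctypes.LittleEndianStructure):
    """Hypothetical fixed-size cell record (not the real BIFF layout)."""
    # 4 + 4 + 8 bytes: every field is naturally aligned, so there is no padding,
    # and the explicit little-endian base class pins the byte order.
    _fields_ = [
        ("row", ctypes.c_uint32),
        ("col", ctypes.c_uint32),
        ("value", ctypes.c_double),
    ]


CellArray = CellRecord * 3
cells = CellArray(CellRecord(0, 0, 1.5), CellRecord(0, 1, 2.5), CellRecord(1, 0, 4.0))

# "Save": the bytes of the in-memory array are the file contents.
blob = bytes(cells)

# "Load": reinterpret the same bytes as the same array type, with no per-field parsing.
loaded = CellArray.from_buffer_copy(blob)
```

The catch is the one that makes such formats age badly: the byte layout silently depends on field order, alignment padding and endianness, so the "format" is whatever the compiler laid out at the time.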
But yes, this approach carries a shitton of baggage. And that achievement is no longer relevant in a world where consumer hardware can parse XML documents on the fly.
I have heard it argued, though, that the "baggage" isn't the file format. It's actually the full historical featureset of Excel. Being backwards-compatible means being able to faithfully represent the features of old Excel, and the essential complexity of that far outweighs the incidental complexity of how those features were encoded.
I absolutely do not agree.
Not only is the standard overly complex, Microsoft also indulged in all sorts of unscrupulous activities to corrupt various National Standards Organisations to get it approved through the ISO <https://en.wikipedia.org/wiki/Standardization_of_Office_Open...>, which is clear evidence of malicious intent.
This is a quote from Richard Stallman:
> The specifications document was so long that it would be difficult for anyone else to implement it properly. When the proposed standard was submitted through the usual track, experienced evaluators rejected it for many good reasons. Microsoft responded using a special override procedure in which its money buy the support of many of the voting countries, thus bypassing proper evaluation and demonstrating that ISO can be bought.
OOXML is complex because it has to be. It has to losslessly round trip through an open format every single feature of Office. That's a lot of features.
Yes, it's complex. Should Microsoft have cut features of Office just to make OOXML simpler? That's ridiculous. What about users who relied on those cut features?
It was fair to ask Microsoft to open the file format. It wasn't fair to expect them to cut features and compatibility. The complaints about complexity from RMS and others represent outsiders seeing the sausage factory and realizing that the sausage making is complicated and needs a lot of moving parts. Maybe life wasn't as simple as the Slashdot "Micro$oft" narrative would suggest. Maybe the complexity of the product was downstream of the shit ton of complexity and sweat and thought that had gone into it.
But admitting that would have been hard. Easier to come up with conspiracy theories.
... which are either public, in which case people complain that the spec+extensions is too long instead of that the spec is too long, or
... which aren't public, in which case people complain that there's no interoperability.
You can't win.
> impossible for anyone else to implement
Except for all the people who did implement it?
It was never fully implemented. LibreOffice has been trying since then and there are always problems.
What it didn't have to be is sections upon sections of "this behaviour is as seen in Word 95", "this behaviour is as seen in Word 97" without any further specification or context.
The main struggle for independent implementors was reverse engineering all the implicit and explicit assumptions and inner workings of MS Office software.
> But admitting that would have been hard. Easier to come up with conspiracy theories.
I actually read through a lot of that spec at the time. A lot of it was just lip service to open standards at a time when MS was under a lot of regulatory pressure.
I expect most people posting on Hacker News would not be able to write a satisfactory specification for their own software if they were working on a large legacy code base.
They do. Or they did at the time. They literally had things like "save as Word 95" in their office suite.
> Given the huge effort that would have gone into producing this thousand plus page specification, is understandable why the spec writers would have given up at times.
Given the huge effort to produce it in the unreasonable timeline they forced themselves into due to regulatory pressure, sure.
The whole OOXML came about only because some large governments said "well, we don't want to be beholden to black box document formats, and we might want a selection of vendors in the future, so ODF looks like a nice proposition compared to Word, actually".
So it was literally rushed through Ecma. MS submitted 2000 pages in December 2005, the spec grew to 6000 pages over the course of the year, and it got standardised in December 2006. So, only a year to significantly increase the spec and standardize it.
And then it was rushed through the ISO standards track which included things like "Swedish vote declared invalid, accusing MS of manipulating votes" https://www.linux-magazine.com/Online/News/Swedish-OpenXML-V... or "Netherlands automatically abstains from voting due to Microsoft" https://archive.ph/20120711220944/http://isoc.nl/michiel/nod... or "near unanimous 'No with comments' turned into 'Abstain' from Malaysia" https://web.archive.org/web/20090726171905/http://www.openma... or...
Google said it best: https://www.csun.edu/~hcmth008/odf/google_ooxml.pdf
--- start quote ---
In developing standards, as in other engineering processes, it is a bad idea to reinvent the wheel. The OOXML standard document is 6546 pages long. The ODF standard, which achieves the same goal, is only 867 pages. The reason for this is that ODF references other existing ISO standards for such things as date specifications, math formula markup and many other needs of an office document format standard. OOXML invents its own versions of these existing standards, which is unnecessary and complicates the final standard.
If ISO were to give OOXML with its 6546 pages the same level of review that other standards have seen, it would take 18 years (6576 days for 6546 pages) to achieve comparable levels of review to the existing ODF standard (871 days for 867 pages) which achieves the same purpose and is thus a good comparison.
Considering that OOXML has only received about 5.5% of the review that comparable standards have undergone, reports about inconsistencies, contradictions and missing information are hardly surprising.
--- end quote ---
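The arithmetic behind the quoted 18-year figure, assuming review effort scales linearly with page count at the rate ODF received:

$$
\frac{871\ \text{days}}{867\ \text{pages}} \approx 1.0046\ \frac{\text{days}}{\text{page}},
\qquad
6546\ \text{pages} \times 1.0046\ \frac{\text{days}}{\text{page}} \approx 6576\ \text{days} \approx 18.0\ \text{years}
$$

Read against that same 6576-day baseline, "about 5.5%" corresponds to roughly $0.055 \times 6576 \approx 362$ days, i.e. about one year of review; that is an inference, since the quote doesn't state OOXML's actual review period.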
Do not for a second assume that anything about OOXML was done in good faith. Well, apart from the thankless work that people assembling the standard did.
And what do you think that setting did? Forked execution down an alternative no longer maintained codepath instead of the rewritten version that wasn't quite compatible.
And if that's the case, why was that specified in OOXML?
Office relies on behaviour in Windows itself "a lot". Even Office for Mac, or the web version of Office they made themselves, isn't a 1:1 replica of Office on Windows.
Let alone describe it as a standard.
"this behaviour is as seen in Word 95" sounds sloppy, but it is indeed the closest they can get.
Or what else can you do? You can't just also ship installation media for Word 95 and Windows inside the ISO standard, right?
That's what they almost literally did. The spec is littered with "behavior of this program that has no specification and to see it you need to install it and run it"
And that's on top of re-inventing a bunch of specs in MS-only and MS-specific manner (like dates, for example)
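To make the date point concrete: spreadsheet cells in OOXML typically store dates as serial day numbers in one of two date systems, and the default 1900 system keeps Lotus 1-2-3's bug of treating 1900 as a leap year. A minimal converter to ISO 8601, sketched for whole-day serials only (no time-of-day fraction); the function name is hypothetical.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-day spreadsheet serial number to a calendar date."""
    if date1904:
        # 1904 date system: serial 0 is 1904-01-01, with no phantom leap day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("the 1900 date system starts at serial 1 (1900-01-01)")
    if serial == 60:
        # Serial 60 is 1900-02-29, which never existed: 1900 was not a leap year.
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # Every serial after the phantom day is one higher than a true day count.
    return date(1899, 12, 30) + timedelta(days=serial)


print(excel_serial_to_date(36526).isoformat())  # 2000-01-01
```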
> First, OOXML was, in material part, a defensive posture under intensifying antitrust and “open standards” pressure. Microsoft announced OOXML in late 2005 while appealing an adverse European Commission judgment centered on interoperability disclosures. Thus, it was only a matter of time before Office file compatibility came under the regulatory microscope. (The Commission indeed opened a probe in 2008.)
> Meanwhile, the rival ODF matured and became an ISO standard in May 2006. Governments, especially in Europe, began to mandate open standards in public procurement. If Microsoft did nothing, Office risked exclusion from government deals.
So... maybe they weren't directly asked to open their file format, but what then? Adopt ODF which is surely incompatible with their feature set, and... just corrupt every .doc file when converting into the new format? And also have to reimplement all their apps?
Here's what they shouldn't have done: Undermine ISO's credibility by ramming a hastily-constructed, not-yet-implemented spec through a fast-track process intended for mature specs by stuffing national bodies. I see no reason to place Microsoft's short term profits over the integrity of international standards bodies, nor do I see one to excuse Microsoft for doing so.
Why on earth would they want to do that? Because they hate having money? Because they suddenly decided that opening the market to competition would be more important than the billions they stood to lose?
These standards determine the tools people use to communicate with tax offices and other government institutions. Thanks to their efforts (supported by as much corruption as necessary), Microsoft didn't have to invent a new file format and would let people just use the file format everyone was already using for official business.
Office allows saving as ODF already and has supported it for ages. It was never about supporting open standards. This is all about corporate interests.
I can't think of a single "open" format designed by a large corporation that isn't "open" as a way to make more money.
> because companies and governments around the world were going to prioritise that in their purchases.
Governments are the largest revenue stream of pretty much every large software company, from IBM/Xerox to OpenAI. MS is well known to indulge in all sorts of legally grey practices to win such contracts.
The Transitional variant which is entirely backwards compatible is not fully defined in a way that others can implement without reverse engineering how Microsoft Office does things.
The Strict variant isn't totally compatible with all older binary formats but is fully defined.
Guess which one is the standard file format?
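One practical consequence of having two variants is that a reader has to tell them apart, and the simplest signal is the namespace URI: Strict replaces the `schemas.openxmlformats.org` namespaces with `purl.oclc.org/ooxml` ones. A sketch, assuming the main part sits at `word/document.xml` (Word's convention; a stricter reader would find it via `_rels/.rels`); the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TRANSITIONAL_W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_W = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def ooxml_flavor(docx_bytes: bytes) -> str:
    """Classify a package by the namespace of the root element of word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # ElementTree reports tags in Clark notation: "{namespace-uri}localname".
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    return {STRICT_W: "strict", TRANSITIONAL_W: "transitional"}.get(namespace, "unknown")


def make_package(namespace: str) -> bytes:
    """Build a minimal stand-in package (not a complete .docx) for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", f'<w:document xmlns:w="{namespace}"/>')
    return buf.getvalue()
```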
But people got blindsided by the new Microsoft propaganda.
This is sadly true. I tried to warn many young folks about VSCode, Copilot and whatnot, and they all laughed at me.
Now, they're not laughing either.
They didn't want a standard other people could adapt easily nor do the work to make Word adhere to one and it had to happen fast. By doing it the way they did they got everything they wanted and only needed to buy ISO.
So the question is whether it was actually a loss.
OOXML was the other way around: Microsoft had a format and tried to enshrine it in a standard and force others to waste time and resources to be compatible.
That is why I explicitly made references to specific versions as turning points, as I expected the usual FOSS advocacy replies.
Quite often I find that if people stopped holding fundamentally broken dynamics together and just let the thing fail, and fail hard, the long-term outcome would be better. Quite the opposite of your suggestion.
It's just that, as it turns out, things being properly bodied or properly broken takes coordinated action. People deciding one by one, one way or the other, is what actually enables and sustains pathological dynamics like this.
But then how does one single out any specific decision? Well, nohow, not with any rigor for sure.
Why not both? You didn't provide any arguments against it.
Standardizing it as if it were an actual designed, open standard, was, however, very much an act of sabotage.
That's my read, anyway.
They completely rejected what standardisation processes _do_, which is to subject the format to scrutiny, criticism and change, to make it universally useful and implementable.
Microsoft absolutely did not do that. They rammed through their proprietary bullshit and slapped an "open standards!" label on it.
https://www.consortiuminfo.org/opendocument-and-ooxml/the-co...
> 2.15.3.26 footnoteLayoutLikeWW8 (Emulate Word 6.x/95/97 Footnote Placement)
> This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 6.x/95/97) when determining the placement of the contents of footnotes relative to the page on which the footnote reference occurs. This emulation typically involves some and/or all of the footnote being inappropriately placed on the page following the footnote reference.
> [Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]
> Typically, applications shall not perform this compatibility. This element, when present with a val attribute value of true (or equivalent), specifies that applications shall attempt to mimic that existing word processing application in this regard.
The format was _written_ to include specifics that only matter for one product - Microsoft Office - and don't even reveal in that format how those specifics should be interpreted faithfully. This is of ZERO use to anyone looking to make interoperable software that can make use of this standard. And that's the point - it's NOT an open standard, it's quite deliberately Microsoft's proprietary and closed bullshit with "open" shat on top of it, and a paid-for endorsement by a standards body that completely detonated its own credibility by approving it.
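To make the footnoteLayoutLikeWW8 point concrete: in a .docx these compatibility switches are on/off elements under w:compat in the settings part. The sketch below only lists which ones are switched on. It assumes the settings part lives at word/settings.xml (the usual location; a strict reader would resolve it through the package relationships) and treats an element with no w:val as "on", per the ST_OnOff convention. enabled_compat_flags is a hypothetical helper name. Detecting the flag is the easy part; as the quoted guidance admits, the spec can't tell you what to do about it.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# ST_OnOff spellings that mean "on"; a missing w:val also means "on".
ON_VALUES = {"true", "1", "on"}


def enabled_compat_flags(settings_xml: bytes) -> list[str]:
    """List the on/off children of <w:compat> that are switched on."""
    root = ET.fromstring(settings_xml)
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    flags = []
    for child in compat:
        name = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        if name == "compatSetting":
            # Newer name/uri/val entries (e.g. compatibilityMode), not on/off flags.
            continue
        val = child.get(f"{{{W_NS}}}val")
        if val is None or val.lower() in ON_VALUES:
            flags.append(name)
    return flags


# Usage with a real file (the path is just an example):
# import zipfile
# with zipfile.ZipFile("contract.docx") as z:
#     print(enabled_compat_flags(z.read("word/settings.xml")))
```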
We are agreeing, I think. I was saying that the format was not developed as an act of sabotage. Ramming that format through standardization (without, as you note, doing any of the things standardization *should do*) so it could plausibly be labeled an open standard was the act of sabotage, IMO.
If you can then confirm that I have no intention to stop (even though I know I'm hurting you) and all I have to say in my defense is, "Actually I do exactly whatever I want and just don't care about you at all," what is the difference to me at that point?
In practice, total indifference is even more toxic than hate, because it denies engagement. I owe no extra charity just because callous indifference is the root cause of the actions taken. Company or person, society reserves the right to judge you by the effects your force of will has on others. They used theirs to kill their competitors.
> An Ars Technica article sources Groklaw stating that at Portugal's national body TC meeting, "representatives from Microsoft attempted to argue that Sun Microsystems, the creators and supporters of the competing OpenDocument format (ODF), could not be given a seat at the conference table because there was a lack of chairs."[55]
Sure, yeah, that's not deliberate sabotage /s
> Google stated that "the ODF standard, which achieves the same goal, is only 867 pages" and that
> If ISO were to give OOXML with its 6546 pages the same level of review that other standards have seen, it would take 18 years (6576 days for 6546 pages) to achieve comparable levels of review to the existing ODF standard (871 days for 867 pages) which achieves the same purpose and is thus a good comparison.
> Considering that OOXML has only received about 5.5% of the review that comparable standards have undergone, reports about inconsistencies, contradictions and missing information are hardly surprising.[118]
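The 18-year figure is a straight linear extrapolation from ODF's review rate. A quick check of the arithmetic, assuming (as the quoted argument does) that review time scales linearly with page count:

```python
# Figures quoted above: ODF received 871 days of review for 867 pages.
ODF_PAGES = 867
ODF_REVIEW_DAYS = 871
OOXML_PAGES = 6546


def proportional_review_days(pages: int) -> float:
    """Scale ODF's review time linearly by page count."""
    return pages * ODF_REVIEW_DAYS / ODF_PAGES


days = proportional_review_days(OOXML_PAGES)
years = days / 365.25
print(f"{days:.0f} days, about {years:.1f} years")  # 6576 days, about 18.0 years
```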
Contrast that with the ODF specification, which is long, complex and written so tersely that it really only specifies what a valid ODF file is, not in any way what it means. Good luck implementing that without just copying whatever LibreOffice does.
Complexity alone would just make it laborious to implement, but the underspecification and subtle deviations of Microsoft's implementation make it virtually impossible to achieve full compatibility.
To be fair, we're talking about a product line with over 35 years of history here. Cruft in the format builds up but can never be removed, so long as you commit to strong backwards compatibility - which Microsoft has always done.
Fun trivia: many of the old binary formats use a meta-format called OLE2 (Object Linking and Embedding), also known as the Compound File Binary Format. The file format is essentially a FAT-like filesystem packed into a single file, with a file allocation table of sector chains, sectors of a fixed power-of-two size (512 or 4096 bytes), etc. This made saving files very fast, but raised the possibility of fragmentation (where individual sub-files are scattered over many non-contiguous sectors); hence, users were recommended to "Save As..." periodically for large/complex files to optimize the internal storage.
https://learn.microsoft.com/en-us/openspecs/windows_protocol...
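For the curious, the FAT-like structure is easy to poke at. A minimal sketch, going by the MS-CFB spec as I understand it: the 512-byte header starts with the signature D0 CF 11 E0 A1 B1 1A E1, the sector size is stored as a power-of-two shift at offset 0x1E (9 for 512-byte sectors, 12 for 4096), and each stream is a chain of sectors linked through the FAT and terminated by ENDOFCHAIN (0xFFFFFFFE). count_fragments measures the fragmentation that the "Save As..." advice worked around. All function names are hypothetical, and actually loading the FAT (via the DIFAT) is left out.

```python
import struct
from dataclasses import dataclass

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
ENDOFCHAIN = 0xFFFFFFFE  # marks the last sector of a stream's chain


@dataclass
class CfbHeader:
    major_version: int
    sector_size: int
    mini_sector_size: int
    first_directory_sector: int


def parse_header(data: bytes) -> CfbHeader:
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a Compound File Binary file")
    # Offsets 0x18..0x21: minor version, major version, byte order,
    # sector shift, mini sector shift (all little-endian uint16).
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    (first_dir,) = struct.unpack_from("<I", data, 0x30)
    return CfbHeader(major, 1 << sector_shift, 1 << mini_shift, first_dir)


def sector_offset(header: CfbHeader, sector: int) -> int:
    # The header occupies the first sector-sized slot, so sector 0 follows it.
    return (sector + 1) * header.sector_size


def follow_chain(fat: list[int], start: int) -> list[int]:
    """Walk a stream's sector chain through the FAT until ENDOFCHAIN."""
    chain, seen, sector = [], set(), start
    while sector != ENDOFCHAIN:
        if sector >= len(fat) or sector in seen:
            raise ValueError("corrupt or cyclic FAT chain")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


def count_fragments(chain: list[int]) -> int:
    """Number of contiguous runs; 1 means the stream is stored in one piece."""
    if not chain:
        return 0
    return 1 + sum(1 for a, b in zip(chain, chain[1:]) if b != a + 1)
```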
Wikipedia has an article on the file format [1]. It was quite nice. It works like an uncompressed zip file with transactional updates.
Earlier Word document formats were much worse. They were a dump of Word's memory contents. Saving and loading was very quick though!
[1]: https://en.wikipedia.org/wiki/Compound_File_Binary_Format
"OK we will standardize our serialization format"
It's... I guess malicious compliance, though also if you don't care about interop you're not going to try to abstract away your internal application structures, are you!
I appreciate the standard existing rather than it not existing. Trying to have the standard exist in this way has always felt like an uphill battle, and at least now there's _something_.
You'll just have a better time if you emulate how Office does things, but at least you have a bit more documentation to go along with it.
Another XML standard from MS that also seems relatively simple is XPS, a PDF alternative. But it uses the Open Packaging Conventions (OPC), and that is somewhat hard to read.
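For anyone wondering what "somewhat hard to read" means in practice: an OPC package is a ZIP file where [Content_Types].xml maps parts to MIME types (extension-based defaults plus per-part overrides) and _rels/.rels says which part is the starting point. OOXML files use the same packaging. A minimal sketch of reading those two pieces with the standard library; the helper names are hypothetical, and it ignores per-part relationships and the rest of the OPC rules beyond case-insensitive part-name matching.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def open_package(data: bytes) -> zipfile.ZipFile:
    """Open an in-memory OPC package (e.g. bytes received over HTTP)."""
    return zipfile.ZipFile(io.BytesIO(data))


def content_types(pkg: zipfile.ZipFile) -> dict[str, str]:
    """Map each part name ("/path/in/zip") to its declared content type."""
    root = ET.fromstring(pkg.read("[Content_Types].xml"))
    defaults = {d.get("Extension").lower(): d.get("ContentType")
                for d in root.findall(f"{{{CT_NS}}}Default")}
    # Part names compare case-insensitively, so key overrides in lower case.
    overrides = {o.get("PartName").lower(): o.get("ContentType")
                 for o in root.findall(f"{{{CT_NS}}}Override")}
    result = {}
    for name in pkg.namelist():
        if name == "[Content_Types].xml" or name.endswith("/"):
            continue  # the index itself and directory entries are not parts
        part_name = "/" + name
        base = name.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
        # An Override wins over the extension-based Default.
        result[part_name] = overrides.get(part_name.lower(), defaults.get(ext, ""))
    return result


def package_relationships(pkg: zipfile.ZipFile) -> list[tuple[str, str]]:
    """(Type, Target) pairs from the package-level _rels/.rels part."""
    root = ET.fromstring(pkg.read("_rels/.rels"))
    return [(r.get("Type"), r.get("Target"))
            for r in root.findall(f"{{{REL_NS}}}Relationship")]
```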
So did you somehow contribute to it in the end?
However, if your API ever interfaces with users in a corporate environment, parsing simple comma-separated UTF-8 CSV is suddenly quite beyond the reach of whoever is nibbling at your endpoint. So why not code up a simple, reusable bit of code that can write any simple tabular data (strings, numbers, and dates, in one or more sheets made up of rows and columns) and lets you choose the output format? A zip archive of CSV files (one per sheet), JSON, ODS, or XLSX; pick your poison.
I did just that, and while it is perfectly doable, any low-level, low-resources, low-dependency approach will mean actually touching the XML in LibreOffice's ODS (fine), and Microsoft's OOXML (…).
This is how you write a date in a cell in both.
ODS:
<table:table-row table:style-name="ro1">
  <table:table-cell office:value-type="date" office:date-value="2021-04-10T12:34:56" calcext:value-type="date">
    <text:p>10/4/2021, 12:34</text:p>
  </table:table-cell>
</table:table-row>
OK, a bit verbose, but trivial to implement. Format the date however you like; you'll probably use two different formatters on the same datetime instant.
XLSX (OOXML):
<row r="1" ht="12.8">
  <c r="A1" s="1" t="n">
    <v>39448.5</v>
  </c>
</row>
Obviously, as you can all plainly see, the date here is 2008-01-01T12:00:00… And of course it makes perfect sense to hardcode the cell coordinate there. It's not like you would dynamically generate a bunch of cells (…).
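For comparison, a sketch that writes the same kind of date cell in both formats. It assumes Excel's default 1900 date system with naive datetimes: counting days from 1899-12-30 gives the right serial for any date from 1900-03-01 on (earlier dates are off by one because Excel pretends 1900-02-29 existed, and workbooks using the 1904 date system are different altogether). It also turns a column index into letters, so the hardcoded r="A1" can be generated. Whole numbers are written without a trailing .0. Helper names are made up for illustration, namespace declarations and LibreOffice's optional calcext attribute are omitted, and the XLSX cell still needs a date number format in styles.xml (via the s attribute) or Excel shows a bare number.

```python
from datetime import datetime, timedelta
from xml.sax.saxutils import escape

# Excel's 1900 date system: counting days from 1899-12-30 yields the correct
# serial for dates from 1900-03-01 onward (it absorbs Excel's fake 1900-02-29).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial(dt: datetime) -> float:
    """Days since the epoch, with the time of day as a fraction."""
    return (dt - EXCEL_EPOCH) / timedelta(days=1)


def format_number(value: float) -> str:
    # Whole numbers are written as "39448", not "39448.0".
    return str(int(value)) if value.is_integer() else repr(value)


def column_letters(index: int) -> str:
    """1-based column index to Excel letters: 1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def xlsx_date_cell(row: int, col: int, dt: datetime, style_index: int) -> str:
    # style_index must reference a cellXfs entry with a date number format.
    ref = f"{column_letters(col)}{row}"
    return f'<c r="{ref}" s="{style_index}" t="n"><v>{format_number(excel_serial(dt))}</v></c>'


def ods_date_cell(dt: datetime, display: str) -> str:
    # ODS stores the instant as ISO 8601 and the display text separately.
    return ('<table:table-cell office:value-type="date" '
            f'office:date-value="{dt.isoformat()}">'
            f'<text:p>{escape(display)}</text:p></table:table-cell>')
```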
Excel can directly ingest a CSV file served over a URL as a data source, with the Accept header manually set to text/csv.
I wrote a backend once that supported this feature so that management could pull whatever data they wanted off an internal application without pestering me. They could literally take the URL of a page and pull it as a CSV file as-is.
Anybody who knows a bit of Excel can pull that data themselves by following a set of simple instructions.
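The backend side of that can be as small as checking the Accept header. A minimal sketch, assuming rows arrive as a list of dicts and that any client not asking for text/csv gets JSON; negotiate is a hypothetical name, and the Accept parsing is deliberately naive (it ignores q-values).

```python
import csv
import io
import json


def negotiate(rows: list[dict], accept: str) -> tuple[str, str]:
    """Return (content_type, body): CSV if the client asks for text/csv, else JSON."""
    # Naive Accept parsing: media types only, q-values ignored.
    media_types = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    if "text/csv" in media_types:
        buf = io.StringIO()
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames)  # CRLF line endings by default
        writer.writeheader()
        writer.writerows(rows)
        return "text/csv; charset=utf-8", buf.getvalue()
    return "application/json", json.dumps(rows)
```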
That is very much possible. It is also completely impossible when you live in a country where Microsoft decreed that, as far as Excel is concerned, the C in CSV stands for semicolon (no, seriously). Welcome to the Netherlands!
Now whether or not Excel can open a CSV file depends on the locale of the user, which will inevitably vary, and of course, whether they are using Excel at all.
So yes, you could offer just CSV, but not if your user is a spreadsheet jockey and you would like to stay on good terms with your support staff.
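If you're on the receiving end of such files, Python's csv.Sniffer can guess between comma and semicolon. A sketch, not a cure: sniffing is heuristic, semicolon files full of unquoted decimal commas can still fool it, and the comma fallback is an assumption you may want to flip for your users' locale. read_csv_any_delimiter is a hypothetical name.

```python
import csv
import io


def read_csv_any_delimiter(text: str, fallback: str = ",") -> list[list[str]]:
    """Parse CSV text whose separator may be ',' or ';'."""
    try:
        # Only look at the start of the file and only consider these two separators.
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",;").delimiter
    except csv.Error:
        delimiter = fallback  # the sniffer could not decide
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```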
During this I saw just about every variant of CSV and character encoding known to man, often inside the same file. Once I had a file that had UTF-8, MARC-8, Latin1, and (yes really) VT100 control codes. All in one file.
All in all, I'd prefer something that actually could be validated for some sort of correctness (this said, another time I got an XML export from some software that was invalid XML, so...)
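Short of a real fix, a per-line fallback at least gets files like that loaded. A minimal sketch, assuming each line is in a single encoding and that anything which isn't valid UTF-8 is Windows-1252; it strips common CSI-style VT100 escape sequences, can't untangle a line that mixes encodings, and does nothing for MARC-8, which needs a dedicated decoder. decode_mixed is a hypothetical name.

```python
import re

# CSI escape sequences such as ESC[1m or ESC[2J (VT100/ANSI style).
CSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def decode_mixed(raw: bytes, fallback: str = "cp1252") -> list[str]:
    """Decode line by line: UTF-8 first, then a single-byte fallback codec."""
    lines = []
    for line in raw.splitlines():
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            text = line.decode(fallback, errors="replace")
        lines.append(CSI_ESCAPE.sub("", text))
    return lines
```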
Other than that, the difference is pretty minor. ODS is very verbose and stores the content of the cell twice for some reason, but the XML trees are essentially the same.
The best way for corporate interaction is to export to whatever the hell Microsoft Excel accepts as an external data source, because .xlsx files can natively import remote data that way. Hope your customers' computers are all configured for en_US mode, though, because CSVs aren't as universal as people pretend they are.
Oh, and while 39448.5 is fine, 39448.0 makes Excel throw an error and refuse the whole document. Midnight January 1st 2008 is just 39448. The parser cannot handle 39448.0.
At a previous job I'd been tasked with developing support for importing Numbers files alongside our existing Excel and CSV support. After a couple of days we rightly gave up, as the tiny fraction of people who actually wanted to import Numbers files was outweighed by its massive complexity.
We ended up just adding instructions for Numbers users to export to CSV.
No one genuinely interested in document openness—be it for document workflows, publishing automation, content archiving, or future-proofed documents—would have done it that way.
Maybe it was as simple as dumping a simplistic re-encoding of its legacy binary format into XML and ramming it through standards organizations. But yes, there was malice aforethought and a classic Microsoft playbook in motion: embrace, extend, extinguish.
This says it all: https://cdn.imgpile.com/p/RppGj1l
Direct quote from the article.
I've probably got the details wrong, but that was the gist of it. I'd love to rediscover the analysis, but my searches have not yielded it.
This is Microsoft. Don't get distracted.
I worked on the MS Word core team for a little over three years, from 2010 to 2014, and de facto owned a significant part of implementing ODF / OOXML Strict support.
The binary format was a liability for Microsoft to begin with, because of decades of cruft lining up with actual memory alignment. During my tenure there I ran into code my GM had written as an intern that was still intact; he had 20+ years of tenure (mostly on Word) when I joined the team.
The translation of the file format to XML involved significant performance degradation if you weren't careful. Hundreds of millions of people use the app monthly, and MS still tries to maintain backwards compatibility. Open APIs were a relatively late development for the app, and given what boards of directors currently expect of the companies they oversee, I really don't think _anyone_ would take years to:
a) define a spec that maintained that backwards compatibility
b) reach whatever nebulous simplicity metric today's HN article wants
c) not get whoever greenlit the project fired for taking that many engineering hours for a and b