Erik Naggum's wonderful rant about XML

michael_dorfman · on June 20, 2009

A classic rant.

"In many ways, the current [2002] American presidency and XML have much in common. Both have clear lineages back to very intelligent people. Both demonstrate what happens when you give retards the tools of the intelligent."

Erik will be sorely missed.

CodeMage · on June 21, 2009

A brief summary, then: Remove the syntactic mess that is attributes. (You will then find that you do not need them at all.) Enclose the /element/ in matching delimiters, not the tag. These simple things make people think differently about how they use the language. Contrary to the foolish notion that syntax is immaterial, people optimize the way they express themselves, and so express themselves differently with different syntaxes. Next, introduce macros that look exactly like elements, but that are expanded in place between the reader and the "object model".

Maybe I'm weird, but to me (this part) sounds like LISP.
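Naggum's last point, macros that look exactly like elements and are expanded between the reader and the "object model", can be sketched in Python. This assumes the document has already been read into nested lists whose head is the element name; the warning macro and the MACROS table are hypothetical, invented only for illustration.

```python
# Documents are nested lists whose head is the element name, as in
# ["p", "Hello", ["em", "world"]]; plain strings are text.

def warning(*body):
    """Hypothetical macro: (warning ...) expands to (p (strong Warning:) ...)."""
    return ["p", ["strong", "Warning:"], *body]

# Table of macro names; any head not listed here is an ordinary element.
MACROS = {"warning": warning}

def expand(node):
    """Expand macro calls in place, before the tree reaches any object model."""
    if isinstance(node, str):
        return node  # text passes through unchanged
    head, *body = node
    if head in MACROS:
        # Replace the call with its expansion, then expand that result too,
        # so a macro may itself produce other macro calls.
        return expand(MACROS[head](*body))
    return [head, *(expand(child) for child in body)]
```

For example, expand(["section", ["warning", "Do", "not", ["em", "panic"]]]) returns ["section", ["p", ["strong", "Warning:"], "Do", "not", ["em", "panic"]]]. Because a macro call is syntactically just another element, nothing downstream of expand needs to know macros exist. A macro that expands to a call of itself would recurse forever; the sketch does not guard against that.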

wvenable · on June 21, 2009

"Enclose the /element/ in matching delimiters"

I don't understand this part. What is he proposing it would look like?

functional-tree · on June 21, 2009

I don't understand it either. You somehow have to differentiate the tag from the content; maybe he wanted to reduce the redundancy of the open/close tags. Something like CSS:

    h1 { this is a header }
    p  { this is the content. }

HTML5-style optional ending tags might also work:

    <h1>This is a header
    <p>This is some content.

(Though I remain confused about which closing tags are optional in the HTML5 spec. I think h1 tags have to be closed.)

blasdel · on June 21, 2009

That's not "HTML5-style"! HTML5 has nothing to do with this. HTML has always had optional closing tags when dealing with non-nesting elements. Purge the nonsense XML propaganda from your brain.

Your example doesn't work though, as heading elements can contain paragraphs -- it's identical to "<h1>This is a header <p>This is some content.</p></h1>"

A better example would be:

    <h1>This is a header</h1>
    <p>This is some content.
    <p>This is more content.
    <ul>
      <li>Red
      <li>Blue
    </ul>
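That fragment is legal HTML because an HTML parser infers the omitted </p> and </li> end tags. An XML parser makes no such inference. A short Python sketch using the standard library's xml.etree.ElementTree shows this; the wrapping <body> element is an assumption added only because XML requires a single root element.

```python
import xml.etree.ElementTree as ET

# The fragment above, wrapped in one root element as XML requires.
fragment = """<body>
<h1>This is a header</h1>
<p>This is some content.
<p>This is more content.
<ul> <li>Red <li>Blue </ul>
</body>"""

try:
    ET.fromstring(fragment)
    accepted = True
except ET.ParseError as err:
    # XML has no implied end tags: "</ul>" arrives while the last <li> is still open.
    accepted = False
    print("XML parser rejects it:", err)
```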

CodeMage · on June 21, 2009

Both of those are valid examples. A LISP-like syntax would also work:

    {h1 This is a header}

DougBTX · on June 21, 2009

"I think h1 tags have to be closed."

Assuming they are much the same as the HTML 4 tags:

http://www.w3.org/TR/html4/index/elements.html

olavk · on June 21, 2009

He is proposing turning XML into s-expressions. E.g., <p>Hello world!</p> turns into (p Hello world!).
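The mapping olavk describes can be sketched in Python. This is a deliberately simplified reader under stated assumptions: atoms are whitespace-separated, the head of each list is the element name, and text atoms are rejoined with single spaces, so the original spacing is lost and there is no way to write a literal parenthesis in text. The names read and to_xml are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# A token is "(", ")", or a run of characters that are neither whitespace nor parens.
TOKEN = re.compile(r"\(|\)|[^\s()]+")

def read(text):
    """Parse one s-expression into nested lists: "(p Hello world!)" -> ["p", "Hello", "world!"]."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        if tok != "(":
            return tok  # a text atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(parse())
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        pos += 1  # consume the closing ")"
        return items

    tree = parse()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree

def to_xml(node):
    """Emit XML: the head of each list is the element name, the rest is its content."""
    if isinstance(node, str):
        return escape(node)  # escape &, < and > in text
    if not node or not isinstance(node[0], str):
        raise ValueError("each list must start with an element name")
    tag, *body = node
    return f"<{tag}>" + " ".join(to_xml(child) for child in body) + f"</{tag}>"
```

For instance, to_xml(read("(p Hello (em world))")) produces <p>Hello <em>world</em></p>. The element is closed by a bare ")" rather than a named end tag, which is the "matching delimiters around the element" idea from upthread.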

olavk · on June 21, 2009

This is the worst kind of internet rant. It uses all kinds of elaborate similes to make the author (and presumably the sympathetic reader) feel smug and superior, but has very little technical content to justify it.

A reasoned criticism of XML could note that XML is a quite well-designed syntax for its intended purpose (domain-specific structured document formats for interchange on the internet), but that it has grown to be used outside of this niche for purposes like (non-document) data interchange, RPCs, configuration files and so on, where the advantages of the XML syntax for its intended domain turn into disadvantages.

For example, the distinction between elements and attributes is very useful when marking up documents, like in (X)HTML:

    <a href="http://harmful-cat">A <em>wonderful</em> rant</a>
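How a parser sees that split can be shown with Python's standard xml.etree.ElementTree: the attribute comes back as metadata about the element, while the readable text, including the nested <em>, stays in the element content. A minimal sketch; nothing in it is specific to HTML.

```python
import xml.etree.ElementTree as ET

link = ET.fromstring('<a href="http://harmful-cat">A <em>wonderful</em> rant</a>')

# The attribute carries metadata about the element...
target = link.get("href")
# ...while the element content is the readable text, including nested markup.
text = "".join(link.itertext())

print(target)  # http://harmful-cat
print(text)    # A wonderful rant
```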

However, if you want to mark up a data record, which is not intended as a readable document, like:

    name: Justin
    address: Copenhagen
    phone: (12)34-56

Then the distinction between elements, attributes and content just becomes superfluous, and the XML syntax is needlessly verbose. This kind of data is marked up much more clearly with YAML, JSON or s-expressions.

On the other hand, the link markup above would become pretty convoluted and error-prone to write in any of these formats.
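The data-record half of that contrast can be made concrete with Python's standard json and xml.etree.ElementTree modules. This is a sketch: the element names (record, name, address, phone) are chosen for illustration, and encoding each field as a child element rather than an attribute is one arbitrary choice among several, which is exactly the decision the comment calls superfluous.

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "Justin", "address": "Copenhagen", "phone": "(12)34-56"}

# XML: one child element per field (attributes would work just as well).
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# JSON: the record maps directly onto an object.
json_text = json.dumps(record)

print(xml_text)
print(json_text)
```

The JSON is a direct transcription of the record. Going the other way, the link's mixed content (text, an <em> child, more text) has no direct JSON counterpart and would need some invented convention such as nested arrays.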

The "verbose" end-tags like </p>, </body> are very helpful syntax when manually editing large documents (which may have deeply nested structures spanning several screenfuls). However, for simple and compact data structures a simple ")" (or even "}") is easier and clearer. Of course, if the content is never edited by hand anyway it doesn't make any difference, and you might as well choose the format that is simplest to parse (or, in the very rare circumstance where bandwidth is the bottleneck, the format with the highest content-to-markup ratio).

So if XML syntax is better for structured documents, YAML for configuration files, and s-expressions for data structures and code, which format is "best" in general? Should you always choose the optimal format, or does it make sense to choose the same format everywhere for consistency? For example, a Lisp-based system might choose to use s-expressions for documentation even if they are a pain to edit, and conversely an XML-based publishing system might choose XML for configuration too, even if YAML would be easier to edit. These are just trade-off decisions.

But reasoned trade-off decisions are not glamorous and don't make you an internet hero. If you want to be an internet hero, you should write rants that give the reader a conceptual framework that lets them feel smart and superior. In this case, technical details detract from the purpose, since an informed reader might disagree with them, which would undermine the ego boost the reader is supposed to feel.

But Erik goes beyond the common smugness and introduces the concept of the stupid, moronic (XML-using?) masses who somehow reign over and suppress the few intelligent (presumably s-expression-using) people. Thereby Erik taps into the deeply rooted insecurities (and consequent delusions of grandeur coupled with a persecution complex) of many socially challenged geeks.

jamesbritt · on June 21, 2009

'The "verbose" end-tags like </p>, </body> are very helpful syntax when manually editing large documents (which may have deeply nested structures spanning several screenfuls). However for simple and compact data structures a simple ")" (or even "}") is easier and clearer.'

The verbose end tags also make it easier to write consistent, robust parsers. One complaint about SGML was that it was hard to find a tool that correctly implemented the entire spec. The XML spec is 11 pages.

XML came from a desire to have SGML on the Web. As you've pointed out, people have used XML where it likely didn't belong.

To be fair, though, once the world had a choice of decent XML parsers and tools it made sense to use XML for many things, even where the syntax itself was less than ideal for the given task. The proliferation of JSON parsers will likely fix a lot of this abuse moving forward.

Still, berating XML for how people misuse it would be like saying Git is crap because some people use it as a general purpose database, and there are better ways to design relational databases.

fauigerzigerk · on June 21, 2009

An 11 page XML spec? That would be surprising. But you're right, XML itself is pretty simple and useful for document processing. Where things really went completely awry is XML Schema.

I have read (and implemented) a lot of weird specs in my life, but XML Schema has to be the worst. What makes it stand out is that it's incredibly convoluted and completely unfit for purpose at the same time.

jamesbritt · on June 22, 2009

"Where things really went completely awry is XML Schema."

I think XML worked out OK for its intended purpose because there was a lot of experience with SGML, HTML, and ad-hoc attempts at "re-purposing" HTML. Folks could say, well, we tried this and that, and this works and that is painful. And since it was not assured to be a success, there were fewer major vendors clamoring to get their fingerprints all over it.

But after XML caught on there was interest from tool vendors to beef things up, largely with abstractions that had yet to see real-world testing, and with things that just so happened to require massive IDE support.

The worst may have been the schema stuff, but there's a lot of competition.

BTW, this page http://www.w3.org/TR/REC-xml/ gives me 40 pages of print preview. A good chunk consists of appendices, but the main part runs more than 11 pages. I don't recall where I got that number from.

I'll just blame Tim Bray, for lack of a real excuse. :)

fauigerzigerk · on June 22, 2009

What really surprises me in XML Schema is not so much all the half-baked stuff they put in, and not even the crazy nesting of complex types, for instance. What surprises me is what XML Schema cannot do.

One thing it cannot do is express what is probably the most frequently used structure in all the structured documents I have seen: specifying that a particular set of quantified elements can occur in any order.

The reason they gave for not supporting this is that validators would have to be more than contextless state machines. That's insane. They have created a schema language that doesn't support the most important schema constraint of them all for performance reasons.
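The constraint described here, a fixed set of elements with per-element occurrence bounds appearing in any order, is easy to check by counting child names rather than walking the sequence with a state machine. A minimal Python sketch; the bounds format, the BOOK_BOUNDS example and the function name are hypothetical, and a real validator would also check each child's own content. (XML Schema 1.0's closest construct, xs:all, limits each element to at most one occurrence.)

```python
from collections import Counter
import xml.etree.ElementTree as ET

def matches_any_order(children, bounds):
    """Check child element names against per-name (min, max) counts, ignoring order.

    bounds maps element name -> (min_occurs, max_occurs); max_occurs=None means unbounded.
    Child names that do not appear in bounds are rejected.
    """
    counts = Counter(children)
    if any(name not in bounds for name in counts):
        return False
    for name, (lo, hi) in bounds.items():
        n = counts.get(name, 0)
        if n < lo or (hi is not None and n > hi):
            return False
    return True

# Hypothetical content model: exactly one title, one or more authors, any number of notes.
BOOK_BOUNDS = {"title": (1, 1), "author": (1, None), "note": (0, None)}

doc = ET.fromstring("<book><author>A</author><title>T</title><author>B</author></book>")
print(matches_any_order([child.tag for child in doc], BOOK_BOUNDS))  # True
```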

tybris · on June 21, 2009

People severely underestimate the damage XML does. Example:

You just added a data overhead of 500% over using a 4-byte integer, and an even bigger parsing overhead. Let me guess, now you need to "scale"?

jauco · on June 21, 2009

Serialising a single number that way is indeed worthy of a Daily WTF mention; however, none of that has anything to do with XML:

    JSON:
      {"number": 9012853}
    YAML:
      number: 9012853

It's bad code, no matter the language.
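The point that the overhead comes from using any text format for a lone number can be checked by measuring the encodings in Python. This is a sketch for this one value only; the YAML line is written by hand because Python's standard library has no YAML module, and the JSON uses compact separators.

```python
import json
import struct
import xml.etree.ElementTree as ET

value = 9012853

binary = struct.pack("<i", value)  # 4-byte little-endian signed integer
as_json = json.dumps({"number": value}, separators=(",", ":")).encode()
as_yaml = b"number: 9012853"  # hand-written; no YAML library involved
element = ET.Element("number")
element.text = str(value)
as_xml = ET.tostring(element)  # bytes, no XML declaration for the default encoding

for label, data in [("binary", binary), ("JSON", as_json), ("YAML", as_yaml), ("XML", as_xml)]:
    print(f"{label:6} {len(data):3} bytes")
```

It reports 4, 18, 15 and 24 bytes respectively: every text encoding costs several times the raw integer, so the choice of text format matters far less than the choice to use text at all.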

jamesbritt · on June 21, 2009

Pick your tools for the task. If time and space constraints are critical, use ASN.1 or something terser.

If you need Joe Tech to pop open a wire trace in Notepad and quickly check some value, maybe XML wins.

If you want to view the results as an expandable tree in some Web browser, perhaps applying a style sheet, maybe XML wins.

If you want free, robust, off-the-shelf parsers with a well-documented API, maybe XML wins there, too.

XML solves a certain set of problems. If those aren't your problems, maybe XML isn't the right choice.

antipax · on June 21, 2009

Did he compare XML to rape?

diN0bot · on June 21, 2009

Wish he'd get to the point a little faster. It was pretty good for the first 20 minutes.

trezor · on June 21, 2009

Am I the only one who finds it ironic that his main complaints about XML are (1) that it is verbose and (2) that it requires too many resources to process, and then he proceeds to write 10 pages trying to express this?

BerislavLopac · on June 21, 2009

Boring. All these problems have already been solved by JSON.