I would like to ask a question about XML and S-expressions(-ish) notation. S-expressions are pretty old; they are also really simple. We could consider two forms that are equal in meaning, different in syntax:
(xml code taken from Polish wikipedia)
<?xml version="1.0" encoding="UTF-8"?>
<ksiazka-telefoniczna kategoria="bohaterowie książek">
  <!-- komentarz -->
  <osoba charakter="dobry">
    <imie>Ambroży</imie>
    <nazwisko>Kleks</nazwisko>
    <telefon>123-456-789</telefon>
  </osoba>
  <osoba charakter="zły">
    <imie>Alojzy</imie>
    <nazwisko>Bąbel</nazwisko>
    <telefon/>
  </osoba>
</ksiazka-telefoniczna>
(:version "1.0" :encoding "utf-8")
(ksiazka-telefoniczna :category "bohaterowie książek"
  ; komentarz (a comment)
  (osoba :charakter "dobry"
    (imie Ambroży)
    (nazwisko Kleks)
    (telefon 123-456-789))
  (osoba :charakter "zły"
    (imie Alojzy)
    (nazwisko Bąbel)
    (telefon)))
The S-expression version is much more concise. We avoid redundancy by using simple list notation, yet we can still define syntax for the things we want to have (e.g. properties). Of course, this is just an example, and an actual standard could have been better or simply different; however, it's shorter and easier to parse. Why did XML win?
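To make the "easier to parse" claim concrete, here is a minimal sketch of a reader for the S-expression-ish notation above. It is only an illustration, not a real standard: the names `tokenize` and `parse` are made up for this example, comments are stripped naively (a `;` inside a quoted string would break it), and it throws away the difference between quoted strings and bare words. A real XML parser also handles encodings, entities and namespaces, so this is not a like-for-like comparison.

```python
import re

# Tokens: "(" or ")", a double-quoted string, or a run of other non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def tokenize(text):
    # Naive comment stripping: drop everything from ";" to the end of the line.
    no_comments = re.sub(r';[^\n]*', '', text)
    return TOKEN.findall(no_comments)

def parse(tokens):
    """Parse one form from the front of the token list (consumes tokens)."""
    token = tokens.pop(0)
    if token == '(':
        items = []
        while tokens[0] != ')':   # IndexError here means a ")" is missing somewhere
            items.append(parse(tokens))
        tokens.pop(0)             # drop the closing ")"
        return items
    if token == ')':
        raise SyntaxError('unexpected )')
    return token.strip('"')       # quoted strings and bare words both become str

doc = '''
(osoba :charakter "dobry"
  ; komentarz
  (imie Ambroży)
  (telefon 123-456-789))
'''
tree = parse(tokenize(doc))
print(tree)
```

The result is a plain nested list; telling properties (`:charakter`) apart from children is left to whatever consumes it.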
Personally, I think the best part about XML is the well-defined schema capabilities, rather than its syntax. The schema mechanism allows users to publish their document format to share what they consider a valid document. There are also automated validators. Plus, types and schemas created by one user can be extended by other users.
As far as I know, no one has put anywhere near that much effort into standardizing a general-purpose schema mechanism for S-expressions, except for the Lisp language itself (which the sample in the OP's question isn't using).
XML was initially designed to support markup languages like HTML, which are authored manually and contain mixed content (text intermingled with elements and metadata).
Markup text documents are often longer than a screenful. If you see a ) and you can't see the beginning of the structure, you are pretty lost: you don't know whether it was a chapter or a sidebar that just ended. The redundancy of repeating the tag name in end tags in XML, like </sidebar>, makes this much easier for the human writer. It also makes the format more robust: if you accidentally delete an end tag, you can often infer which end tag is missing.
SGML (the predecessor to XML) allowed you to optionally shorten the end tag to a single character, but this feature was left out of XML for simplicity.
So in short, XML is more verbose by design, because it is designed to support human-editable documents. Today XML is used for a wide variety of purposes, including pure machine-to-machine communication, where this redundancy is not needed.
Your suggested syntax would not support mixed content very well. Take this example in HTML:
<p>Hi! <a href="example.com">Click here</a>!</p>
How would you express this in your syntax? You would need some kind of additional delimiters to distinguish between attributes and text content. Suddenly it is not so concise anymore.
Angle brackets are much rarer in ordinary text than parentheses and colons.
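For comparison, here is a small sketch of how an XML parser exposes the mixed content in that example, using Python's standard `xml.etree.ElementTree`. The text before, inside and after the link comes out as separate pieces, and the attribute stays out of the text, without any quoting in the source.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring('<p>Hi! <a href="example.com">Click here</a>!</p>')
a = p[0]  # the first (and only) child element of <p>

print(repr(p.text))   # text before the first child element
print(a.attrib)       # attributes are kept separate from content
print(repr(a.text))   # text inside <a>
print(repr(a.tail))   # text after </a>, still inside <p>
```

One possible S-expression encoding (an assumption, since the question's notation doesn't define one) would be `(p "Hi! " (a :href "example.com" "Click here") "!")`, where every run of text has to be quoted so that it isn't read as a symbol or a property.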
"XML is not S-Expressions" says:
- XML treats random characters as text, not as markup, so you don't have to wrap everything in quotation marks.
- XML does not use standard human-punctuation characters as markup, reducing the need for escape sequences. The only characters that it cares about are angle brackets.
- XML is verbose, but it is also more robust in the face of errors. Because XML specifies start and end tags by name, it is easier to discover errors, and easier for editors to reason about.
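The robustness point can be sketched with Python's standard `xml.etree.ElementTree`: when a named end tag is missing, the parser complains at the first end tag that doesn't match, not at the end of the input. The helper name `first_error` and the sample document are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# The </sidebar> end tag has been deleted by accident.
broken = '<chapter><sidebar><p>text</p></chapter>'
print(first_error(broken))

# The intact version parses without complaint.
intact = '<chapter><sidebar><p>text</p></sidebar></chapter>'
print(first_error(intact))
```

The error is reported where `</chapter>` shows up while `<sidebar>` is still open, so a human or an editor can guess what went missing. With bare parentheses, deleting one `)` typically only shows up as an imbalance at the very end of the input.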