The Future: HTML or XHTML

The discussion of XHTML versus HTML has popped up again, and until now I’ve managed to resist the urge to throw in my 2¢. Well, no longer will I sit on the sidelines while the same arguments get rehashed again and again without getting us anywhere. In this article, which I originally published on my blog, I’ll attempt to answer this question: does the future of the Internet lie with HTML or XHTML?

Firstly, I’m just going to set a few ground rules. This is not going to be another “XHTML as text/html is considered harmful”, “there are no real benefits to using XHTML” or “XHTML isn’t even supported” kind of article. I’m going to get straight to the facts, so here goes…


HTML is all but dead. It’s been getting beaten to death ever since the early versions of Netscape and IE. It’s been on life support and holding on by a thread (albeit a particularly strong, yet very much frayed, thread) ever since IE5/Mac threw it a lifeline called DOCTYPE sniffing. Yet no attempt to revive it has been, or will ever be, successful in prolonging its life more than a few years past its use-by date, and it is almost time to let it rest in peace.

I know what you’re all thinking. I’m either insane or late for April Fools. How could, arguably, the most successful document format in the history of the web, and computing in general, have been so irreparably damaged as to be this close to death?

The answer, and the reason for my temporary insanity, which has led to these rather shocking and completely outrageous yet incredibly accurate claims, all comes down to the question of what HTML is supposed to be, compared with the mind-numbingly deformed representation we all know and love today, and how it can and cannot be improved in the future.

What is HTML Supposed to Be?

From its humble beginnings as a small, lightweight, non-proprietary, easy-to-use document format designed for the publication and distribution of scientific documents, HTML closely resembled the international standard ISO 8879, Standard Generalized Markup Language (SGML). It was created by Tim Berners-Lee, the mastermind aptly titled the inventor of the World Wide Web.

While HTML was not originally based on SGML, the similarities in syntax and the lack of formal parsing rules for HTML led to the decision to resolve the differences and formalise HTML 2.0 as an application of SGML. This was eventually published by the IETF as RFC 1866 in November 1995. Martin Bryan provides a relatively short summary of how HTML began, and the process to convert it into an application of SGML.

What is HTML Now?

Sadly, by the time HTML was formalised as an application of SGML, the irreparable damage to the language (which would eventually lead to the coining of the term tag soup by Dan Connolly) had already been done. None of the HTML browsers that were implemented prior to HTML 2.0 contained conforming SGML parsers, few have ever done so since, and no mainstream browser ever will.

As a result, browsers don’t read DTDs. Instead, they have all known elements, attributes and their content models essentially hard-coded, and basically ignore any element they have never heard of. For this reason it is widely believed that DTDs serve absolutely no purpose for anything other than a validator, and that DOCTYPEs are for nothing but triggering standards mode in modern browsers.
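To make the "hard-coded" point concrete, here is a minimal sketch in Python using the standard library's tolerant `html.parser`. The `KNOWN_ELEMENTS` set, the `HardCodedParser` class and the `sparkle` element are all made up for illustration; no real browser is implemented this way, but the behaviour is similar in spirit: no DTD is consulted, and a tag the parser has never heard of is skipped while its text survives.

```python
from html.parser import HTMLParser

# Hypothetical, tiny list of "known" elements. Real browsers hard-code far more,
# together with attributes and content models.
KNOWN_ELEMENTS = {"html", "body", "p", "em", "strong"}


class HardCodedParser(HTMLParser):
    """Records known start tags and silently skips unknown ones."""

    def __init__(self):
        super().__init__()
        self.recognised = []  # start tags found in KNOWN_ELEMENTS
        self.ignored = []     # start tags the parser has never heard of
        self.text = []        # character data is kept either way

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_ELEMENTS:
            self.recognised.append(tag)
        else:
            self.ignored.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def summarise(markup):
    """Return (recognised tags, ignored tags, text content) for the markup."""
    parser = HardCodedParser()
    parser.feed(markup)
    parser.close()
    return parser.recognised, parser.ignored, "".join(parser.text)


recognised, ignored, text = summarise("<p>Hello <sparkle>new</sparkle> world</p>")
print(recognised, ignored, text)
```

Note that no DOCTYPE or DTD appears anywhere in the input; the parser's idea of what is valid lives entirely in its own code.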

There are many intentionally broken features in existing HTML parsers that directly violate both the HTML recommendation and SGML standard that will never be fixed. The reason is the simple fact that to do so would break millions of legacy documents, which would only end up affecting the user’s ability to access them. See HTML 4.01 Appendix B for a brief, yet very incomplete, summary of unsupported SGML features.

How Can HTML Be Improved?

The simple answer is not much at all. The ability of HTML to progress and improve is severely limited by the aforementioned non-conforming parsers and millions of legacy documents that would break if any serious improvements were to be made. As Hixie put it: we can at best add new elements when it comes to the HTML parser.

The element content models for many existing elements cannot be changed much. For example, the p element cannot be updated to allow nested lists, tables or blockquotes, and the title element cannot be updated to contain any semantic inline markup. Much of the quirky non-conformant behaviour exhibited by existing browsers will have to be inherited by any future implementations. In fact, such behaviour is being retroactively standardised by Ian Hickson and the WHAT Working Group.

There is even speculation about whether or not HTML should retain the pretence of being an application of SGML. Other than the benefits of validation with SGML DTDs, and the triggering of standards mode with an SGML DOCTYPE, there is little reason to do so. However, the extensive conformance criteria expressed within the WHAT Working Group drafts that simply cannot be expressed within a DTD would make validation – as a quality assurance or conformance tool – limited, at best.

Not only that, but any serious attempt at retaining backwards compatibility with existing browsers is expected to require an extensive library of hacks (like Dean Edwards’s IE7) to make existing browsers do anything useful with the new extensions. Not even style sheets will have any effect on the new elements without this library of hacks, as the new elements will be essentially ignored.

The question is: do we really want to hold onto a dying language any longer than we need to, with any and all progressions and enhancements being so extremely limited; or should we really start pushing to move to a much more flexible and beneficial alternative?


Despite all prior claims of XHTML having no benefit whatsoever, when it comes to extending the language with new elements, attributes and content models, the benefits far outweigh the negatives. In fact, all claims that XHTML has no benefits over HTML only apply to XHTML 1.0, because the semantics of both document formats are identical.

What is XHTML Supposed to be?

XHTML is supposed to be an application of XML with very strict parsing rules. Do I really need to continue? I will assume we all know what XML and XHTML are, so no need for me to reiterate it all. For anyone that doesn’t, that’s what search engines are for. :-)

What is XHTML Now?

Unfortunately, most XHTML on the web is nothing more than tag soup, or is at least not well-formed, served as text/html. As previous surveys have shown, a majority of sites claiming to be XHTML don’t even validate, and most would end up with browsers choking on them if the correct MIME type were used.
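A quick way to see what "choking" means is to hand markup to a real XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the two sample documents are my own illustrations, not taken from any survey. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical tag soup: the p and br elements are never closed.
TAG_SOUP = "<html><body><p>Unclosed paragraph<br></body></html>"

# The same content written as well-formed XHTML.
CLEAN = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>OK</title></head>"
    "<body><p>Closed paragraph<br/></p></body></html>"
)

print(is_well_formed(TAG_SOUP))
print(is_well_formed(CLEAN))
```

A tolerant text/html parser would render both documents; an XML parser stops at the first error in the tag soup, which is roughly what a user would see if that page were served with an XML MIME type.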

There are other problems too: XHTML is not implemented by IE, incremental rendering for XHTML in Gecko doesn’t yet work, scripts written for tag soup often won’t work in real XHTML, style sheets need to be fixed, and so on. Most of this is discussed in Ian Hickson’s document Sending XHTML as text/html is Considered Harmful (which I’m sure everyone has read by now) and elsewhere on the web.

However, the major benefit of XHTML over HTML is that we do already have (mostly) very strictly conforming XML parsers. While these do still have a few bugs, they can be fixed without any detrimental effect on legacy content. This fact alone allows much greater room for enhancement than HTML ever will.

How Can XHTML Be Improved?

With a proper understanding of how to use XML and XHTML, there are really no limitations on how far XHTML can progress. We will not be held up by extreme browser bugs and limitations; there’s no non-conformant behaviour that will have to be replicated by future implementations, element content models can be changed for existing elements, and new elements can be added and supported very easily. And at least with full style sheet support they will not be rendered totally useless (as in HTML without a library of hacks) in existing XHTML UAs.

It is completely true that, if you are not using any of the XML-only features such as mixed namespace documents (e.g. XHTML+MathML), there are almost no benefits to be gained from using XHTML 1.0. However, there will be benefits in using either XHTML 2.0 or the WHAT Working Group’s (X)HTML Applications, including Web Forms 2.0, Web Apps 1.0 and Web Controls 1.0, which I think should be collectively known as HAppy 1.0 (for HTML Applications), not (X)HTML 5.0.
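For readers who haven’t seen a mixed namespace document, here is a small sketch of XHTML+MathML and of how a namespace-aware XML parser tells the two vocabularies apart. It uses Python's standard `xml.etree.ElementTree`; the sample document and the `elements_by_namespace` helper are illustrative assumptions, and the two namespace URIs are the ones defined by the W3C for XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A made-up XHTML page with an embedded MathML fragment.
DOCUMENT = f"""<html xmlns="{XHTML_NS}">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>A formula:
      <math xmlns="{MATHML_NS}">
        <mi>x</mi><mo>+</mo><mn>1</mn>
      </math>
    </p>
  </body>
</html>"""


def elements_by_namespace(markup):
    """Group local element names by the namespace URI they belong to."""
    groups = {}
    for element in ET.fromstring(markup).iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            uri, _, local = element.tag[1:].partition("}")
        else:
            uri, local = "", element.tag
        groups.setdefault(uri, []).append(local)
    return groups


for uri, names in elements_by_namespace(DOCUMENT).items():
    print(uri, names)
```

This kind of clean separation is something a text/html tag-soup parser has no notion of: there, `math` would simply be one more unknown element.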

By using the XHTML variant of HAppy 1.0 (if that’s what it gets called – with or without the uppercase A – let me know what you think) backwards compatibility with existing XHTML UAs will be much easier, because at least style sheets will work and the new elements will simply behave like divs and spans. Backwards compatibility with IE and other legacy UAs will require a bit more work, though: you will need to arrange for your XHTML document to be converted into HTML, as serving this new version of XHTML as text/html will be strictly forbidden.
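One plausible way to arrange that conversion is content negotiation on the server: send real XHTML to UAs that explicitly ask for it, and a converted HTML version to everyone else. The following is only a sketch under stated assumptions. The function names are hypothetical, `to_html` stands in for whatever XHTML-to-HTML converter you supply, and the Accept parsing is deliberately simplified (it does not rank types by q-value, and it treats a bare `*/*` as not accepting XHTML, since legacy UAs may advertise it without being able to handle XML).

```python
def accepts_xhtml(accept_header):
    """Return True if the Accept header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if media_type.lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # q defaults to 1 when it is not given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        return quality > 0
    return False


def choose_response(accept_header, xhtml_body, to_html):
    """Pick a (content_type, body) pair; to_html is a caller-supplied converter."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml", xhtml_body
    # Never label the XHTML itself as text/html: convert it first.
    return "text/html", to_html(xhtml_body)


# Hypothetical usage with a placeholder converter.
modern = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
legacy = "image/gif, image/jpeg, */*"
print(choose_response(modern, "<p/>", lambda body: "<p></p>")[0])
print(choose_response(legacy, "<p/>", lambda body: "<p></p>")[0])
```

The key design choice is that the XHTML source is never sent under the text/html label; legacy UAs only ever receive the converted HTML.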


Category: html Time: 2005-04-14 Views: 1
