A Brief History of HTML

From its simple start as an online subset of SGML to its current piecemeal—but growing—compatibility, hypertext markup language has weathered a storm of growth, abuse, and innovation.

The history of hypertext markup language is a strange and interesting tale. From its simple start as an online subset of SGML through political maneuverings of the huge browser companies to its current piecemeal—but growing—compatibility, the language has weathered a storm of growth, abuse, and innovation. Recently, the battle for control of the standard has focused on functionality. Microsoft and Netscape are both touting W3C compliance as a crucial marketing advantage. And the work being done on the latest HTML draft shows there may be light at the end of the tunnel.
But it wasn't always this rosy.

The idea behind HTML was a modest one. When Tim Berners-Lee was putting together his first elementary browsing and authoring system for the Web, he created a quick little hypertext language that would serve his purposes. He imagined dozens, or even hundreds, of hypertext formats in the future, and smart clients that could easily negotiate and translate documents from servers across the Net. It would be a system similar to Claris XTND on the Macintosh, but would work on any platform and browser.

The problem, however, turned out to be in the simplicity of Berners-Lee's language. Since it was text-based, you could use any editor or word processor to create or convert documents for the Web. And there was just a handful of tags—anyone could master HTML in an afternoon. The Web flourished. Everyone started publishing. The rest is history.

But as more and more content moved to the Web, those creating browsers realized the simple markup language needed much improvement. How should the innovation take place? Tim Berners-Lee certainly wasn't going to be the sole developer of HTML—he never intended to be. So the developers, in the long-held tradition of the Internet, implemented new features in their browsers and then shipped them. If the Web community liked them, they stayed. If not, they were removed.

Look, for example, at the addition of images to the Web. Early browsers were simply text-based, and there was an immediate desire to display figures and icons inline on a page. In 1993, a debate was exploding on the fledgling HTML mailing list, and finally a college student named Marc Andreessen added <img> to his Mosaic browser. People objected, saying it was too limited. They wanted <include> or <embed>, which would allow you to add any sort of media to a Web page with the much-touted content negotiation used on the client. That was too big a project, according to Marc, and he needed to ship ASAP. Mosaic went with <img>, and it would be years before including media in a page using <embed>, or <applet>, or <object> would come to the surface again.

Mosaic shipped with <img>, Tim went off to the nascent World Wide Web Consortium, and Marc left for California to start a little browser company called Netscape.

HTML continued to grow, with new, powerful, and exciting tags and attributes. We got background images on <body>, <frame>, <font>, and of course, <blink>. Microsoft jumped into the game, and <marquee>, <iframe>, and <bgsound> started competing for room in the spec. And all this time, the W3C furiously debated something called HTML3, a sprawling document outlining all sorts of neat new features that nobody supported (remember <banner> and <fig>?). It was now 1995, and things were an absolute mess.

Something needed to give. If things kept up the way they were going, Netscape and Microsoft would eventually have two completely proprietary versions of HTML, with no way of supporting the utopian vision of content negotiation. Instead, people would be forced to choose one browser or the other, and surf content specifically created for that platform. Content providers would have to either choose between vendors or spend more resources creating multiple versions of their pages.

Vestiges of this still linger on today's Web, but the nightmare scenario that was anticipated never fully arrived. The HTML arm of the W3C changed course and started collecting and recording current practice in shipping browsers, rather than designing a future, unattainable version of the language. HTML3 was dropped entirely, and work began on HTML3.2, which, ironically, was far less technologically advanced than its predecessor. But, more importantly, it was realistic in its goal to give content providers and browser developers a common, if dated, reference from which to work.

So there's one big happy family now, right? Netscape, Microsoft, and the W3C are all working hard together to create the brightest HTML future possible, right? Well, reality isn't always that clean. Even recently, the standards wars have flared up over things like <layer> vs. CSS positioning, or the two competing webcasting proposals. But for the moment, at least we have a process in place for dealing with these issues.

And the process is continuing; we should see HTML4 this year. Currently referred to by its code name, "Cougar," the draft includes a slew of technologies being added to current browsers: stylesheets, scripting, frames, and more. We'll take a look at this draft in detail next week.