11 - XML: The Second Internet Revolution


This subject isn't directly related to writers using the Internet, but it may be useful to those who build their own Web pages or who are curious about how Web pages actually work. Major changes are occurring. The first Internet revolution, unplanned and haphazard, added graphics and formatting to plain text. The current revolution is part of a grand design for controlled growth and flexibility that will eventually replace everything on the Web.

Warning: There are a lot of acronyms below. Fortunately, almost all of them end the same way: -ML, for Markup Language. Once you have Markup Language down, the rest is easy.

First we go back to 1986, an eon ago in Internet time. Large organizations like the DOD (Department of Defense) were having problems with their documentation. With technical manuals flowing in from thousands of different suppliers, all using different styles, hardware, software, etc., there was no good way to standardize. Enter SGML (Standard Generalized Markup Language). SGML uses something called a Document Type Definition (DTD) to define how text will be printed.

For example, if you want all chapter titles in Field Manual 32475 to be Arial, 14 points, bold, you define them as such in the DTD, where you also define all other styles for that manual. When suppliers send you the electronic documentation for the left Widget #1239, all they have to do is designate in their files which words are chapter titles, and the SGML software will check your DTD definition and print the marked text as Arial, 14 points, bold. It doesn’t matter what the original text looks like or where it came from. Once it’s tagged as a title, SGML does the rest.
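For the curious, the idea of "tag it by role, style it from one central definition" can be sketched in a few lines of Python. This is only an illustration, not real SGML software: the tag names (chaptertitle, para), the style table, and the sample supplier file are all made up for the example, and a simple XML parser from Python's standard library stands in for an SGML system.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style table: tag name -> (font, size in points, weight).
# It plays the role the article gives to the DTD's style definitions.
HOUSE_STYLE = {
    "chaptertitle": ("Arial", 14, "bold"),
    "para": ("Times", 11, "normal"),
}

# Hypothetical supplier file: text is tagged by role only, with no fonts given.
SUPPLIER_DOC = """
<manual>
  <chaptertitle>Left Widget #1239</chaptertitle>
  <para>Install the widget before use.</para>
</manual>
"""

def styled_lines(source, styles):
    """Pair each tagged piece of text with the style defined for its tag."""
    root = ET.fromstring(source)
    result = []
    for element in root:
        font, size, weight = styles[element.tag]
        result.append((element.text, font, size, weight))
    return result

for text, font, size, weight in styled_lines(SUPPLIER_DOC, HOUSE_STYLE):
    print(f"{text!r} -> {font}, {size} points, {weight}")
```

Changing the entry for chaptertitle in the style table changes every chapter title at once, and the supplier's file never has to be touched.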

If you’re already using HTML (Hypertext Markup Language) for your Web pages, this probably seems familiar. In fact, HTML is a baby-talk DTD of SGML. The main difference, aside from complexity, is that SGML is meant for printing and HTML is meant for displaying text on a monitor. To write basic HTML, you simply tag text with a definition. For example, <b>AWL</b> tells your interpreter software (Web browser) to begin boldface at the A and to stop boldface after the L. The tag <title>Austin Writers League</title> simply tells the browser to display the text in its default style for a title.
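To see roughly what a browser "reads" in tagged text, here is a small Python sketch that uses the standard library's html.parser module. The class name TagLogger and the function tag_events are invented for this example; a real browser does far more, but the start-tag, text, end-tag sequence is the same basic idea.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record each start tag, run of text, and end tag in the order seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

def tag_events(markup):
    """Return the list of events found in a piece of HTML."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events

for event in tag_events("<b>AWL</b>"):
    print(event)
```

The three events printed for <b>AWL</b> are the "begin boldface", the text itself, and the "stop boldface" described above.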

HTML displayed nothing but text in the early days of the Internet, and it worked fine, but Web developers soon began to demand more of their Web pages. They added tables, graphics, sound, colors, interactivity, and so forth. Everyone had to learn how to use JavaScript, VBScript, ActiveX controls, and many other non-HTML tools to create state-of-the-art Web pages. This mishmash is sometimes called DHTML (Dynamic HTML) because it makes things happen on the page.

Piling extra tools on top of HTML is not a good way to solve the problem. HTML was never designed to handle the unexpected demands being put on it. The World Wide Web Consortium (W3C), the people who set Web standards, decided that the only real answer for HTML’s limitations was to completely overhaul how Web pages are built. They created XML (Extensible Markup Language). XML is a lot more like SGML, providing much of SGML’s power and versatility, but XML is much, much easier to learn and to use. There's not enough space here to fully compare HTML and XML except to say that XML is a tool that Web developers will love once they learn how to use it.

Today's Internet standard is a transitional form called XHTML (Extensible Hypertext Markup Language), which has now officially replaced HTML. It’s still pretty much HTML in look, but the tagging rules are much stricter than before. Eventually, Web developers and Web browsers will switch completely to XML and its strict standards, and the older HTML-based Web pages will not work well any more unless they're upgraded.
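One way to get a feel for "stricter tagging rules" is to run a few fragments through an XML parser, which rejects anything that is not properly closed or quoted. The Python sketch below uses the standard library for that; the function name is_well_formed and the sample fragments are made up for the example. It checks only basic XML well-formedness, which is one part of what XHTML requires, not the full XHTML rule set.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Old-style HTML habits: an unclosed <br>, an unclosed <p>, an unquoted value.
loose_breaks = "<p>First line<br>Second line"
loose_attribute = "<p align=center>Centered text</p>"

# The same content written to the stricter rules.
strict_breaks = "<p>First line<br />Second line</p>"
strict_attribute = '<p align="center">Centered text</p>'

for sample in (loose_breaks, loose_attribute, strict_breaks, strict_attribute):
    print(is_well_formed(sample), sample)
```

Old-style browsers quietly forgive the first two fragments, which is exactly the kind of looseness the stricter rules are meant to end.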

Confused? Don’t worry. You won’t have to learn anything about XHTML, XML, or any other acronym unless you maintain a Web page. If you happen to notice some differences on your favorite sites, this may be what’s going on in the background, but that’s really all you need to know. Pay no attention to the man behind the curtain. Just be aware that he's pulling some levers and things are looking better.

Summary: HTML can no longer handle the Internet’s demands, so it has been replaced with a transitional markup language called XHTML. XHTML will in turn be replaced by XML, which will give Web developers a great deal more versatility. HTML will still be around for a while yet as one of many subsets of the new XML world.


First published November 2000
Copyright 2000
Fred Askew