XML:-Introduction

What is a:-markup; b:-a markup language?

Markup is anything we put on a document that carries a special meaning, and as such we use it all the time. Highlighted text is markup. A series of check marks on your bank statement is markup. (The coffee stain is NOT markup in this sense, unless it means you are a klutz :) !!)

If we want others to understand what our markup means, we need a set of rules that:
a:- declare what constitutes markup
b:- declare exactly what our markup means.

A markup language is just such a set of rules. For example, in SGML anything in angle brackets is considered markup: <This-is-mark-up>.

The makers of HTML used SGML to make a set of rules declaring what markup in HTML means. This set of rules is contained in a separate document called the HTML DTD (Document Type Definition, see later).

Among other things the HTML DTD says that when you come across <P> in a document, start a new paragraph.

All SGML documents must have a DTD, but XML documents do not always need one.

What is XML?

XML is a simplified version of SGML designed for use on the web, which immediately brings us to the question of what SGML is.

SGML is an international standard that contains the rules for writing a "markup language". You are almost certainly familiar with what a markup language is, because HTML is a markup language and is written according to the rules of SGML.

If you think of SGML as a primer containing the rules of grammar and syntax, and of HTML as the vocabulary of one language written to those rules, you will not be far wrong. There are several other languages written according to SGML rules; HTML just happens to be the most widespread and well known. In this way you can think of HTML as one dialect within a group of SGML languages.

This concept should present little difficulty. English is one of a group of languages called "Germanic". Hundreds of millions speak English. Tens of millions speak German. Millions speak Dutch. How many speak Frisian?

It so happens that the rules of SGML are very complicated, so if you wanted to design your own markup language using SGML you would have to do a lot of groundwork. Because there is a clear need for new languages (see next section), there was also a need for a simplified grammar. XML is that simplified grammar.

Who needs XML?

Everyone who needs to send documents over the Internet containing information that has to be manipulated in various ways. (You still make your cool display pages using HTML!!)

XML allows us to mark up a document with a set of tags of our own devising.
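To make "tags of our own devising" concrete, here is a minimal sketch in Python using the standard library module xml.etree.ElementTree. The tag names (recipe, name, ingredient) are invented for this illustration and come from no standard; the point is only that an XML parser accepts them without being told about them in advance.

```python
import xml.etree.ElementTree as ET

# Build a small document using tag names we made up ourselves.
# "recipe", "name" and "ingredient" are hypothetical, chosen for illustration.
recipe = ET.Element("recipe")
ET.SubElement(recipe, "name").text = "Pancakes"
ET.SubElement(recipe, "ingredient").text = "flour"
ET.SubElement(recipe, "ingredient").text = "milk"

# Serialize the tree to XML text.
xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)

# A generic XML parser reads the text back without knowing our tags beforehand.
parsed = ET.fromstring(xml_text)
ingredients = [item.text for item in parsed.findall("ingredient")]
print(ingredients)
```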

Markup can be of three sorts:-

Stylistic Markup:-

Tells how the document is to be styled. The <I>, <B>, and <U> tags are all stylistic markup in HTML.

Structural Markup:-

Tells how the document is to be structured. The <H*>, <P> and <DIV> tags are examples of structural markup.

Semantic Markup:-

Tells us something about the content of the text. <TITLE> and <CODE> are examples of semantic markup in HTML.

HTML has proven very adept at preparing documents for display over the web, but a document marked up in HTML tells us very little about its content, and for most documents to be useful in a business situation we need to know about the document's content.

As an example, if a patient's medical records were marked up in HTML and I, as a doctor, wanted to find out about the patient's allergies, at present I would have to download the whole record (several K) and then search through that document manually.

If, however, the patient's records were marked up in XML and one of the tags was <allergies>, I could just send a request to the server for that part of the document, and receive a few bytes of information instead of hundreds of kilobytes.
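The idea of fetching only the <allergies> part can be sketched in Python with the standard library module xml.etree.ElementTree. This is only an illustration of what a server might do with such a request: the sample record, every tag name other than <allergies>, and the helper extract_section are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record. Only <allergies> comes from the text above;
# the other tag names and all of the data are invented for illustration.
RECORD = """<patient>
  <name>Jane Doe</name>
  <allergies>
    <allergy>penicillin</allergy>
    <allergy>peanuts</allergy>
  </allergies>
  <history>Many pages of visit notes would normally go here.</history>
</patient>"""


def extract_section(xml_text, tag):
    """Return the first descendant element named `tag` as XML text, or None."""
    root = ET.fromstring(xml_text)
    # ".//tag" searches every level below the root element.
    section = root.find(f".//{tag}")
    if section is None:
        return None
    # strip() drops the whitespace that followed the element in the source.
    return ET.tostring(section, encoding="unicode").strip()


allergies_xml = extract_section(RECORD, "allergies")
print(allergies_xml)
```

Only the small <allergies> fragment is returned; the rest of the record never has to leave the server.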

Using the same example of patient records, what if we wanted someone to have access to some parts of the records but not others? (Would you really want everyone at the insurance office reading the notes that your shrink may have written about you?) You could then instruct the server to withhold certain parts of the document, e.g. in the above example anything marked up <psych.-note> or <confidential>.
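Withholding marked-up parts can be sketched the same way. The following Python example, again using the standard library module xml.etree.ElementTree, removes every element whose tag is on a restricted list before the document is handed out. The sample record, the helper withhold, and the tag names other than <psych.-note> and <confidential> are hypothetical, and a real server would also need to decide who is asking.

```python
import xml.etree.ElementTree as ET

# Tags named in the text above as examples of material to withhold.
RESTRICTED = {"psych.-note", "confidential"}

# Hypothetical record; the data and the remaining tag names are invented.
RECORD = """<patient>
  <name>Jane Doe</name>
  <allergies>penicillin</allergies>
  <psych.-note>Private session notes</psych.-note>
  <billing>
    <confidential>Internal remark</confidential>
    <amount>120</amount>
  </billing>
</patient>"""


def withhold(xml_text, restricted=RESTRICTED):
    """Return the document as XML text with all restricted elements removed."""
    root = ET.fromstring(xml_text)
    # Take snapshots with list() so the tree is not modified while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in restricted:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


public_view = withhold(RECORD)
print(public_view)
```

The restricted elements are dropped at any depth, while everything else in the record is passed through unchanged.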

Thus the ability of individuals, groups of individuals, and institutions to write their own markup languages will expedite information transfer and provide other benefits such as confidentiality.

Is XML difficult?

No! XML was designed to be easy. The official specification is a mere 40 pages (download it from http://www.w3.org/TR/PR-xml-971208) and is written in (almost) readable language.

Anyone with a basic understanding of HTML can be writing XML documents in no time at all.