“It may not be possible to organize the whole web... but it may be possible to develop an organizing map of the web”
 

WEB IS A FOREST ... SEMANTIC WEB A JAPANESE GARDEN?
 
Musings of a student of Ranganathan

F.J. DEVADASON

devadason_f_j@yahoo.com

TO THE MEMORY OF TWO OF MY TEACHERS OF LIBRARY AND INFORMATION SCIENCE

PROF. M.R. KUMBHAR
PROF. D.B. KRISHNA RAO


(Both were quite firm and strict, yet kind. The former accepted me as a research student against the odds, encouraged me, and supervised my research at Karnatak University, Dharwad; but for him I would never have become an academic. The latter instilled in me an interest in Colon Classification at the University of Madras, he being the only Ph.D. supervised by Dr. S.R. Ranganathan.)
 
IS THE PRESENT WEB NON-SEMANTIC?
 
This is a question that came to my mind as we are being bombarded with the concept of the Semantic Web, its glories lifted up to the heavens. While discussing the importance of correct naming of concepts and the use of correct and meaningful terms in designing a faceted classification, Dr. Ranganathan mentioned in a class that one method is to verify the meaning of the antonym and check the appropriateness / truthfulness. The truth is that the present web is as semantic (meaningful) as any other; otherwise no one could make any sense out of it, and you know what its fate would have been. It is not that the web is non-semantic and one has to create a semantic web to make it meaningful. True, the digital documents (objects) in it (text, image, animation, video, audio, or any combination of these) are not well structured like the documents (records) in a database, so they may not be as easily processable as database records. But at the same time, well structured databases, processable by specific software to generate the required answers to queries, are also embedded in the web as unique objects. The PRESENT WEB IS SEMANTIC and meaningful for the millions who access it, use it, and add to it.
 
THE PRESENT WEB IS ALMOST LIKE A FOREST
 
Because anyone can add to the web, it has grown enormously and uncontrollably, into a peculiar man-made forest or jungle. It is unorganized, and most of the information in it is ill structured. It is ill structured because it was not designed as a proper global cooperative information system in the first place. True to the nature of a forest, it also has dangerous animals living in it, lurking secretly to devour the unsuspecting victim. Will anyone attempt to organize a forest? Trim it down, make a Japanese Garden out of it, and call it a "Semantic Garden" or a "Semantic Forest"?

But now there are such attempts, called "Semantic Web" attempts. It is not possible, because the web is growing every second, and chaotically too. Freedom of expression and the democratic movement have taken root in every educated individual, irrespective of the ideology followed in individual societies. If it is forced, then the Japanese Garden type of web documents will equal the number of such gardens existing in the real world, for the rest would turn to blogs, MySpace, e-mail-to-web-document conversion and other such easy web publishing, leaving the Japanese Garden type of web documents to the "elite" and the rich. The web will be as chaotic as ever, and special search engines such as blog search engines, e-mail search engines, RSS feed search engines and the like would become the order of the day.

Semantic Webbers would like to keep the forest, but make it a processable one! They would like the unstructured or ill structured documents to be well structured. Is that achievable? OK, you use RDF, XML, and Ontologies (an ontology is essentially a faceted classification scheme). At present it has been proposed that the ontologies must be domain specific. This is again like the librarians' method of managing information. Librarians knew that it would be difficult for one library to collect and organize everything, and so they made subject-specific libraries and information systems: you have the National Library of Medicine, the National Agricultural Library, and so on. Now the Semantic Webbers want to follow this model and suggest that ontologies must be domain specific!

Alright, you have domain specific ontologies / faceted classification schemes. All the web documents are put up as RDF using OWL, ontologies and all that stuff; that is, you have all the metadata and all the agents. So what? They are processable; you can find out which medical doctor is available at a time suitable to you and all that, PROVIDED YOU FIRST IDENTIFY THE WEB DOCUMENT HAVING THAT TYPE OF DATA. But how are you going to identify which web document or documents have to be accessed for further processing by your agent? Is that also going to be identified by your agent? Are you going to have an agent of agents to select the appropriate agent? Is your agent going to traverse the web to find the documents to be processed? Or is it that, for a query, the super-agent to which the query is submitted selects the specific special agent or agents, the query is processed using ontologies, and then the agent enters the web? But where? Into the forest, to start processing everything processable? You are going to have trouble, for you may have to crawl at least a good part of the web, following the links provided in the web documents identified, to get the right data to process. What to do then? You have to have an index to the web which you can search first, select the best fit from, check whether the required type of data is available, and only then process it.

   
MAP FOR THE WEB FOREST

When you wish to traverse a desert you perhaps cannot have a map and rely on it, but you may be able to use a compass. Still, the early explorers did a great job of developing maps for the routes they undertook. Not only did they develop maps of the routes, they also kept diaries of the customs of the places they visited, the dangers, the wise things to do, and so on. They had maritime maps and complicated methods of navigating by the stars and planets. Just as the early explorers set out to explore the earth and prepared maps to indicate what is where, what precautions one must take to be there, and so on, a map of the web must be constructed.

The picture on the right hand side is not the appropriate one. I saw a cartoon, perhaps drawn by R.K. Laxman after he visited Los Angeles during the 1980s and got lost in the mesh of roads while venturing to explore the terrain on foot. That would have been the most appropriate one. It was perhaps published in Span or The Hindu; I am not sure. If anyone comes across a cartoon of a weary backpacker on foot, in front of a map posted on the side of a flyover, looking at the sign "You are here", please let me know.

You Are Here!

Nowadays, even when we want to visit a fairly medium sized and well laid out garden or park, or even a graveyard (cemetery), we need a map which will indicate where we are and where other things are. [I visited the Rabaul War Cemetery <http://www.roll-of-honour.org.uk/Cemeteries/Rabaul_War_Cemetery/> in Papua New Guinea.] Should we not have a map for the web? Of course, we have to have the map at different levels. We can even have maps for specific domains maintained at different locations, just like a national system of libraries with specific libraries assigned to specific areas or subjects. We can create a surrogate or summary record of each web document, like the one suggested in the paper "Faceted Indexing Based System for Organizing and Accessing Internet Resources" (the surrogate must be enriched with more information, as mentioned below). We cannot organize the entire web, but we can organize the surrogates / summary records! If we have the surrogates domain specific, so much the better. The surrogate index will have not only the summary of each of the sites indexed, but also information on the data available, its structure, how to submit a query to the specific database contained in the site, and so on. Then the agent has only to search the relevant segment of the surrogate file, select the required site / database, find out how to process it further, and go to the selected few sites to complete the job. How to further process the data contained in a site retrieved by searching the surrogates is present in the surrogate itself. We can have a master index and domain specific indexes / surrogates. Even if the data in the target site is presented as a table, it is enough if the surrogate record for it has the structure information, so that appropriate routines for processing can be selected. The target site need not follow the Semantic Web standards. This is the same idea as a catalog of media material: the surrogate record describes the media material and its characteristics, such as its resolution, the equipment required to play it, and so on.
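What might such a surrogate / summary record look like? Here is a minimal sketch in Python, assuming a record carries a summary, facet analyzed headings, and structure and query information; all the field and function names are my own illustration, not taken from the paper cited above.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SurrogateRecord:
    """Summary record standing in for one web document (fields are illustrative)."""
    url: str                                   # the document the surrogate describes
    summary: str                               # abstract of the site / document
    subject_headings: List[str]                # facet analyzed (POPSI / LSDE) headings
    data_description: Optional[str] = None     # what processable data the site holds
    data_structure: Optional[str] = None       # e.g. column names of an embedded table
    query_instructions: Optional[str] = None   # how to query the site's own database

def find_sites(index: List[SurrogateRecord], term: str) -> List[SurrogateRecord]:
    """Step one of the two-step search: pick candidate sites from the map."""
    return [r for r in index if any(term in h for h in r.subject_headings)]

The agent searches this index first, and visits only the few sites that the matching surrogates point to, instead of crawling the forest.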

The possible surrogate systems are systems like Google, Yahoo, etc. These have already built up the necessary tools. What is lacking is the inclusion of a facet analyzed subject heading, with appropriate superordinates for each of the components in the heading, enriched with synonyms. The required classification and indexing tool, called a Classaurus, can also be derived from this facet analyzed subject heading. The facet analyzed subject heading / POPSI heading (for style's sake it can be referred to as a Logico-Semantic Domain Expression (LSDE), which is nothing but a structured subject heading) may have to be assigned to each of the meaningful units / sections and subsections of the web document, as required.
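As a rough illustration, such a heading might be represented as below; the categories follow the description in point 1 of the recapitulation further down, but the example heading itself, and the representation, are invented.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Facet:
    """One component of an LSDE: a term "modulated" by its superordinate chain."""
    role: str                  # e.g. "Discipline", "Entity", "Property", "Action"
    chain: List[str]           # superordinates first, the term itself last
    synonyms: List[str] = field(default_factory=list)

# Invented LSDE for a document on the drug treatment of heart disease:
lsde = [
    Facet("Discipline", ["Medicine", "Cardiology"]),
    Facet("Entity",     ["Circulatory system", "Heart"], synonyms=["Cardiac"]),
    Facet("Property",   ["Disease", "Heart disease"]),
    Facet("Action",     ["Treatment", "Drug treatment"]),
]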

If, by the development of complicated standards, putting a page on the web becomes like touching the nose by winding the hand around the back of the neck, then the freedom loving, line-of-least-action takers would simply put up their web documents as blogs, e-mail (HTML), MySpace, discussion lists (HTML) and other such easy, quick methods, not caring for all the high flown standards of RDF, OWL, Ontology (faceted classification scheme), etc. Then the web would become even more chaotic than it is today. However, any processable data / database in an individual web document could be indicated in the surrogate with structure identification; if necessary, a sample of the data, perhaps one row from the table in the web document with datanames (metadata), could be put in the surrogate for easy identification of processing routines. Better still, any routine necessary for processing could be stored in the surrogate itself. But for solutions that require data from different documents to be merged, some generalized processing routines would be required.
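A sketch of what this might look like, with the surrogate reduced to a plain dictionary; the URL, the datanames, and the routine location are all invented for illustration.

from typing import Optional

# Hypothetical surrogate carrying structure information, one sample row with
# datanames, and a pointer to a processing routine stored alongside it.
surrogate = {
    "url": "http://example.org/clinic/schedule",
    "summary": "Weekly consultation schedule of a clinic",
    "data_structure": ["doctor_name", "speciality", "day", "hours"],
    "sample_row": {"doctor_name": "A. Smith", "speciality": "Cardiology",
                   "day": "Monday", "hours": "09:00-12:00"},
    "processing_routine": "http://example.org/agents/table_reader.py",
}

def routine_for(record: dict) -> Optional[str]:
    """The agent inspects the declared structure and picks the stored routine."""
    if {"day", "hours"} <= set(record["data_structure"]):
        return record["processing_routine"]   # a timetable-style table
    return None

The target site itself remains untouched; everything the agent needs to know is in the surrogate.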

LET ME RECAPITULATE

1) Because the facet analyzed POPSI (POstulate based Permuted Subject Index) headings produce an organizing effect when sorted, each web document, and each of its worthwhile sections and subsections (coherent text, image, audio, video, or any combination of these), must be fitted with such a heading, having a Base or Discipline with its divisions and subdivisions as applicable; the Main Object or Core Entity with its species, parts and constituents; any Property of, or Action on or by, the Object, with its types; followed by Common Modifiers, each of them "modulated" by their respective superordinates and enriched with synonyms. If these headings are sorted, they constitute the World Wide Web Map (WWWM), exhibiting an organizing effect (a small sketch after this list illustrates the sorting). [Please see Bhattacharyya's "POPSI: Its fundamentals and procedure based on a general theory of subject indexing languages", Library Science with a Slant to Documentation, Vol. 16 (1979), No. 1, pp. 1-34, which is not available on the web. However, you can access <http://drtc.isibang.ac.in/~guha/popsi/popsi-doc.pdf>; you may not have "International Federation for Documentation / Classification Research Report No. 21: Computerized Deep Structure Indexing System, Indeks Verlag, Frankfurt, 1986", but you can have a look at "Online Construction of Alphabetic Classaurus" <http://www.oocities.org/devadason.geo/OnlineClassaurus.htm>, and of course <http://us.share.oocities.com/devadason.geo/DSIS.pdf>.]

The most important classic document, from which all the Facet Analysis stalwarts -- those who have developed "facet analysis for dummies", "easy facet analysis" and "the true and simple facet analysis", and brought fame to themselves -- have copied, is "Prolegomena to Library Classification" by S.R. Ranganathan, available at <http://dlist.sir.arizona.edu/1151/>. This is the third edition, 1967. Somehow I have a fascination for the second edition, 1957.

2) Such headings (Logico-Semantic Domain Expressions, LSDEs) could be formed easily by the web document builders, as doing so is almost similar to forming an expressive title for the web document, which can be done by answering a set of questions and following a few guidelines. Or it can be done by the surrogate creators / search systems. The initial effort would become less time consuming as the web maps get built and are made available for reference while creating the subject headings.

3) These subject headings could be used to form the Classaurus (a faceted classification scheme with vocabulary control features), to enable easy translation of faceted headings from one language to another (there are some problems, due to certain concepts not existing in certain languages, the terms to denote them being unavailable, and so on), and to categorize the web documents.

4) It would be worthwhile exploring the possibility of utilizing existing library systems, such as the different national libraries having the necessary expertise in particular fields, to create and maintain specific WWW Maps for their respective areas of expertise. For instance, the National Agricultural Library could be the agency responsible for mapping any web document to be categorized as belonging to Agriculture; it has a well developed thesaurus and would be the most appropriate agency for developing the Agriculture WWWM. Such an agency could also be assigned a range of IP addresses (covering a specific geographical area or so) to monitor and keep the map up to date. In a similar way, the existing information handling expertise could be channelled to form national WWW Maps in different subject areas, to be merged later. The maps could be language specific, but there should be an English language LSDE for every subject heading in the summary record / surrogate. Switching between languages may be possible, but there are problems when such modulated subject headings are translated from one language to another; even the hierarchy would be a bit difficult to map correctly. I do not want to go into examples here. The allocation of work could be in the style of a cooperative global information system: subject specific, language specific and nationality specific, with full coverage of designated IP address ranges of web documents assigned to individual agencies to build and maintain these maps, avoiding any duplication of effort.

5) Any specific data or database included in a web document could be indicated in the surrogate for that web document, with a model or structure of it, or even an example of it consisting of one row of the data with attached datanames / metadata. This will help prepare the processing routines for further processing of the data available in the web document. It would be even better to store the processing agent in the surrogate itself, or provide a link to it, so that it can be loaded for processing.
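To make the "organizing effect" of point 1 concrete, here is the small sketch promised there. The headings are invented for illustration, not drawn from the POPSI literature; a plain lexicographic sort is all that is needed.

# Modulated headings, broadest superordinate first; sorting clusters
# related documents together, which is the "organizing effect".
headings = [
    "Medicine, Circulatory system, Heart, Disease, Treatment",
    "Agriculture, Crop, Rice, Disease, Control",
    "Medicine, Circulatory system, Heart, Disease, Diagnosis",
    "Agriculture, Crop, Wheat, Harvesting",
]

for h in sorted(headings):
    print(h)

# The Agriculture headings come out together, then the Medicine ones, with
# the two Heart Disease entries adjacent: a rudimentary World Wide Web Map.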

E-mail: devadason_f_j@yahoo.com

21 May, 2007.

