
Information Retrieval


Information Retrieval is the process of identifying which documents in a collection are relevant to a query presented by the user (Chinchor 2000).

Information Retrieval (IR), also called "Document Detection", generally performs two functions: document search and document routing. The TIPSTER Generic IR document (2000) defines these terms as follows:

  • document search - the selection of documents from an existing collection of documents.
  • document routing - the dissemination of incoming documents to appropriate users on the basis of user interest profiles.
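The two functions differ mainly in what is held fixed. Search runs one query against a stored collection, while routing runs each new document against a stored set of user profiles. The Python sketch below shows that contrast. It assumes a deliberately crude model in which documents and profiles are just sets of words. The function names `tokenize`, `search`, and `route` are hypothetical and do not come from TIPSTER.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case a text and return its set of alphanumeric word tokens (crude, for illustration)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query_terms: set[str], collection: dict[str, str]) -> list[str]:
    """Document search: select the IDs of stored documents that contain every query term."""
    return sorted(
        doc_id for doc_id, text in collection.items() if query_terms <= tokenize(text)
    )


def route(document: str, profiles: dict[str, set[str]]) -> list[str]:
    """Document routing: send one incoming document to every user whose profile shares a term with it."""
    words = tokenize(document)
    return sorted(user for user, interests in profiles.items() if interests & words)


if __name__ == "__main__":
    collection = {"d1": "Tea prices in China rose", "d2": "Coffee exports from Brazil"}
    profiles = {"alice": {"tea", "china"}, "bob": {"coffee"}}
    print(search({"tea", "china"}, collection))                # ['d1']
    print(route("New tea harvest reported in China", profiles))  # ['alice']
```

In this sketch the search query must match *all* of its terms, but a routing profile needs to match only *any* of its terms. That is one arbitrary choice among many, made only to keep the example short.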
Most commercial IR systems in use today seem to concentrate on the first of these two functions, document search.
One of the greatest limitations of IR is that most commercially available technologies currently rely heavily on keyword searches. This can be very frustrating, because the user has to work out which keywords will best suit a particular search. It is also difficult because people do not naturally think of searching for information in terms of "keywords".
Some systems try to alleviate this problem with a natural language interface that lets the user enter a normally worded question (such as "what is the price of tea in China?"). The system then processes this query and essentially converts it into a keyword search. It does this by removing question words, prepositions, articles, and other very common words such as "what", "the", "is", "of", and "in". The remaining words are treated as keywords, so the query actually submitted might look like "price tea China". Documents that contain these words are then retrieved for the user.
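The conversion described above can be sketched in Python as a stop-word filter followed by a simple keyword match. The stop list here is a small hypothetical one built around the example question, and the matching rule (a document must contain every keyword, ignoring case) is an assumption. Commercial systems of the time used their own, undocumented variations.

```python
import re

# Hypothetical, tiny stop list: question words, prepositions, articles and a
# few common verbs such as "is". Real systems use much longer lists.
STOP_WORDS = {"what", "who", "where", "when", "how", "is", "are",
              "the", "a", "an", "of", "in", "on", "for", "to"}


def query_to_keywords(question: str) -> list[str]:
    """Strip punctuation and stop words from a natural-language question."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def retrieve(keywords: list[str], collection: dict[str, str]) -> list[str]:
    """Return IDs of documents containing every keyword (case-insensitive exact word match)."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for doc_id, text in collection.items():
        words = {w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)}
        if wanted <= words:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    keywords = query_to_keywords("what is the price of tea in China?")
    print(" ".join(keywords))  # price tea China
    docs = {
        "d1": "The price of tea in China is rising.",
        "d2": "Tea prices climbed across China this year.",
        "d3": "Tea ceremonies in Japan.",
    }
    print(retrieve(keywords, docs))  # ['d1']
```

Document `d2` is plainly relevant, but the sketch misses it because it says "prices" rather than "price". Exact word matching of this kind is one reason the relevance of keyword results can be so uneven.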
Unfortunately, as anyone who has performed a keyword search on a commercial search engine such as AltaVista or InfoSeek can tell you, the documents returned for the user's original query are often of questionable relevance.
Current research in this area is primarily concerned with improving the accuracy of the process and the relevance of the documents returned. Formal evaluations, in the form of the Text REtrieval Conferences (TRECs 1 through 9), have demonstrated that "Retrieval system effectiveness has approximately doubled in the seven years since TREC-1" (NIST 2000). However, there is still much progress to be made.
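In TREC-style evaluations, effectiveness is judged against human relevance assessments. Two of the most basic measures are precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that were retrieved). The minimal Python sketch below computes both for a single query. It ignores ranking entirely, so it is only a simplified illustration. It is not TREC's official scoring, which also reports rank-sensitive measures such as mean average precision. The function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) of one result list against a set of relevant document IDs."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # retrieved documents judged relevant
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four documents returned, three judged relevant, two of them found.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d7"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The two measures pull against each other. Returning more documents tends to raise recall but lower precision, which is why evaluations report them together.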

(TREC-7, TREC-8, and TREC-9 evaluations) TREC-9 URL corrected, 21 February 2001. EJB

This page last modified November 13, 2006 by Erica Brown.
httpd://www.oocities.org/ejb_wd/IR-intro.html
© 2000-2006, Erica Jean Lindsey Brown, All rights reserved