Internet Search Tools & Techniques

INTRODUCTION

According to a study published in the July 8, 1999 issue of Nature, the World Wide Web contains an estimated 800 million pages of publicly accessible information. As if the Web's immense size weren't enough to strike fear into the heart of all but the most intrepid surfers, consider that the Web continues to grow at an exponential rate: by one estimate, it has tripled in size over the past two years.

Add to this the fact that the Web lacks the bibliographic control standards we take for granted in the print world: there is no equivalent to the ISBN to uniquely identify a document; no standard system of cataloguing or classification analogous to those developed by the Library of Congress; no central catalogue of the Web's holdings. In fact, many, if not most, Web documents lack even the name of the author and the date of publication.

Imagine you are searching for information in the world's largest library, where the books and journals (stripped of their covers and title pages) are shelved in no particular order, and without reference to a central catalogue. A researcher's nightmare? Without question. The World Wide Web defined? Not exactly. Instead of a central catalogue, the Web offers the choice of dozens of different search tools, each with its own database, command language, search capabilities, and method of displaying results.

Given the above, the need to familiarize yourself with a variety of search tools and to develop effective search techniques is clear, if you hope to take advantage of the resources offered by the Web without spending many fruitless hours flailing about, and eventually drowning, in a sea of irrelevant information.


SEARCH ENGINES AND SUBJECT DIRECTORIES

The two basic approaches to searching the Web are search engines and subject directories.

Search engines allow the user to enter keywords that are run against a database (most often created automatically, by "spiders" or "robots"). Based on a combination of criteria (established by the user and/or the search engine), the search engine retrieves WWW documents from its database that match the keywords entered by the searcher. It is important to note that when you are using a search engine you are not searching the Internet "live", as it exists at this very moment. Rather, you are searching a fixed database that was compiled some time before your search.
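The idea of running keywords against a previously compiled database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page texts and example.org URLs are invented, and the rule that every keyword must match is one possible criterion, not a description of how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical pages a spider might have collected earlier (invented data).
CRAWLED_PAGES = {
    "http://example.org/cricket": "cricket rules and cricket history",
    "http://example.org/insects": "the cricket is an insect",
    "http://example.org/baseball": "baseball rules and history",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keywords):
    """Return URLs containing every keyword (one possible matching rule)."""
    postings = [index.get(word.lower(), set()) for word in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)

# The database is compiled once, some time before any search is run.
INDEX = build_index(CRAWLED_PAGES)
results = search(INDEX, ["cricket", "rules"])

# A page published after the crawl stays invisible until the index is rebuilt.
CRAWLED_PAGES["http://example.org/new"] = "cricket scores"
stale = search(INDEX, ["scores"])
```

Here `results` holds only the page that contains both keywords, while `stale` is empty even though a matching page now exists, which is the "fixed database" effect described above.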

While all search engines are intended to perform the same task, each goes about this task in a different way, which leads to sometimes amazingly different results. Factors that influence results include the size of the database, the frequency of updating, and the search capabilities. Search engines also differ in their search speed, the design of the search interface, the way in which they display results, and the amount of help they offer.

In most cases, search engines are best used to locate a specific piece of information, such as a known document, an image, or a computer program, rather than a general subject.

Examples of search engines include:

The growth in the number of search engines has led to the creation of "meta" search tools, often referred to as multi-threaded search engines. These search engines allow the user to search multiple databases simultaneously, via a single interface. While they do not offer the same level of control over the search interface and search logic as do individual search engines, most of the multi-threaded engines are very fast. Recently, the capabilities of meta-tools have been improved to include such useful features as the ability to sort results by site, by type of resource, or by domain, the ability to select which search engines to include, and the ability to modify results. These modifications have greatly increased the effectiveness and utility of the meta-tools.
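A rough sketch of the multi-threaded idea follows. The two "engines" are stand-in functions returning invented URLs (a real meta-tool would send network requests to actual search services), and the merging and sorting rules are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Stand-ins for real search engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["http://example.org/cricket", "http://example.com/scores"]

def engine_b(query):
    return ["http://example.com/scores", "http://example.net/clubs"]

ENGINES = {"A": engine_a, "B": engine_b}

def meta_search(query, engine_names):
    """Query the selected engines in parallel and merge their results."""
    selected = [ENGINES[name] for name in engine_names]
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), selected))
    merged = []
    for results in result_lists:
        for url in results:
            if url not in merged:  # drop duplicates, keep the first occurrence
                merged.append(url)
    return merged

def sort_by_domain(urls):
    """Sort merged results by host name, one of the sorting options mentioned."""
    return sorted(urls, key=lambda url: urlparse(url).netloc)

hits = meta_search("cricket", ["A", "B"])
by_domain = sort_by_domain(hits)
```

Passing a shorter list such as `["B"]` corresponds to the ability to select which search engines to include.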

Popular multi-threaded search engines include:

Subject-specific search engines do not attempt to index the entire Web. Instead, they focus on searching for Web sites or pages within a defined subject area, geographical area, or type of resource. Because these specialized search engines aim for depth of coverage within a single area, rather than breadth of coverage across subjects, they are often able to index documents that are not included even in the largest search engine databases. For this reason, they offer a useful starting point for certain searches. The table below lists some of the subject-specific search engines by category. For a more comprehensive list of subject-specific search engines, see one of the following directories of search tools:

Table of selected subject-specific search engines

Regional (Canada) | Regional (Other) | Companies
People (E-mail addresses) | People (Postal addresses & telephone numbers)
Images | Jobs
Games | Software
Health/Medicine | Education/Children's Sites

Subject directories are hierarchically organized indexes of subject categories that allow the Web searcher to browse through lists of Web sites by subject in search of relevant information. They are compiled and maintained by humans and many include a search engine for searching their own database.

Subject directory databases tend to be smaller than those of the search engines, which means that result lists tend to be smaller as well. However, there are other differences between search engines and subject directories that can lead to the latter producing more relevant results. For example, while a search engine typically indexes every page of a given Web site, a subject directory is more likely to provide a link only to the site's home page. Furthermore, because their maintenance includes human intervention, subject directories greatly reduce the probability of retrieving results out of context.

Because subject directories are arranged by category and because they usually return links to the top level of a web site rather than to individual pages, they lend themselves best to searching for information about a general subject, rather than for a specific piece of information.
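The hierarchical arrangement of a subject directory can be pictured as a nested structure. The sketch below is an assumption-laden illustration: the category names and example URLs are invented, and real directories are far larger and maintained by human editors rather than hard-coded.

```python
# Hypothetical directory: categories nest, and leaves list site home pages.
DIRECTORY = {
    "Recreation": {
        "Sports": {
            "Cricket": ["http://example.org/"],
            "Baseball": ["http://example.com/"],
        },
    },
    "Science": {
        "Biology": {
            "Insects": ["http://example.net/"],
        },
    },
}

def browse(directory, path):
    """Follow a list of category names down the hierarchy."""
    node = directory
    for category in path:
        node = node[category]
    return node

def find_categories(directory, term, trail=()):
    """Search the directory's own category names for a term."""
    matches = []
    for name, child in directory.items():
        here = trail + (name,)
        if term.lower() in name.lower():
            matches.append(" > ".join(here))
        if isinstance(child, dict):
            matches.extend(find_categories(child, term, here))
    return matches

sites = browse(DIRECTORY, ["Recreation", "Sports", "Cricket"])
paths = find_categories(DIRECTORY, "cricket")
```

Browsing returns a short list of home pages for a general subject, while the directory's internal search matches category names rather than the full text of every page.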

Examples of subject directories include:

Specialized subject directories
Due to the Web's immense size and constant transformation, keeping up with important sites in all subject areas is humanly impossible. Therefore, a guide compiled by a subject specialist to important resources in his or her area of expertise is more likely than a general subject directory to produce relevant information and is usually more comprehensive than a general guide. Such guides exist for virtually every topic. For example, Voice of the Shuttle (http://vos.ucsb.edu) provides an excellent starting point for humanities research. Film buffs should consider starting their search with the Internet Movie Database (http://us.imdb.com).

Just as multi-threaded search engines attempt to provide simultaneous access to a number of different search engines, some web sites act as collections or clearinghouses of specialized subject directories. Many of these sites offer reviews and annotations of the subject directories included and most work on the principle of allowing subject experts to maintain the individual subject directories. Some clearinghouses maintain the specialized guides on their own web site while others link to guides located at various remote sites.

Examples of clearinghouses include:


SEARCH STRATEGY

Regardless of the search tool being used, the development of an effective search strategy is essential if you hope to obtain satisfactory results. A simplified, generic search strategy might consist of the following steps:
  1. Formulate the research question and its scope
  2. Identify the important concepts within the question
  3. Identify search terms to describe those concepts
  4. Consider synonyms and variations of those terms
  5. Prepare your search logic

This strategy should be applied to a search of any electronic information tool, including library catalogues and CD-ROM databases. However, a well-planned search strategy is especially important when the database under consideration is as large, amorphous and evolving as the World Wide Web.

Along with the characteristics already mentioned in the Introduction, another factor that underscores the need for an effective Web search strategy is that most search engines index every word of a document. This method of indexing tends to greatly increase the number of results retrieved while decreasing their relevance, because of the increased likelihood of words being found in an inappropriate context. When selecting a search engine, one factor to consider is whether it allows the searcher to specify which part(s) of the document to search (e.g. URL, title, first heading) or whether it simply defaults to searching the entire document.
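The difference between searching every word and searching a chosen part of a document can be illustrated with a small sketch. The documents, field names and URLs below are invented, and simple substring matching is assumed for brevity; actual search engines define their own fields and matching rules.

```python
# Hypothetical documents, each split into parts a search engine might index.
DOCUMENTS = [
    {"url": "http://example.org/cricket",
     "title": "Cricket rules",
     "body": "An introduction to the rules of cricket."},
    {"url": "http://example.com/garden",
     "title": "Garden wildlife",
     "body": "At dusk you may hear a cricket chirping."},
]

def search(documents, term, fields=("url", "title", "body")):
    """Return URLs of documents whose chosen fields contain the term."""
    term = term.lower()
    return [doc["url"] for doc in documents
            if any(term in doc[field].lower() for field in fields)]

everywhere = search(DOCUMENTS, "cricket")
title_only = search(DOCUMENTS, "cricket", fields=("title",))
```

The full-text search returns both documents, including the gardening page where the word appears in an unrelated context; restricting the search to the title returns only the page that is actually about cricket.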

Search logic refers to the way in which you, and the search engine you are using, combine your search terms. For example, the search I Love Cricket could be interpreted as a search for any of the three search terms, all of the search terms, or the exact phrase. Depending on the logic applied, the results of each of the three searches would differ greatly. All search engines have some default method of combining terms, but their documentation does not always make it easy to ascertain which method is in use. Reading online Help and experimenting with different combinations of words can both help in this regard. Most search engines also allow the searcher to modify the default search logic, either through pull-down menus or special operators, such as the + sign to require that a search term be present and the - sign to exclude a term from a search.
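The three interpretations of I Love Cricket, and the + and - operators, can be compared in a short sketch. The page texts are invented, and the matching rules (lower-casing, splitting on spaces, treating bare terms as optional in the +/- form) are simplifying assumptions rather than the behaviour of any particular search engine.

```python
def matches(text, query, logic="any"):
    """Apply one of three interpretations of a multi-word query to a text."""
    words = text.lower().split()
    terms = query.lower().split()
    if logic == "any":
        return any(term in words for term in terms)
    if logic == "all":
        return all(term in words for term in terms)
    if logic == "phrase":
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    raise ValueError("unknown logic: " + logic)

def matches_plus_minus(text, query):
    """Treat +term as required and -term as excluded; bare terms are optional."""
    words = text.lower().split()
    for token in query.lower().split():
        if token.startswith("+") and token[1:] not in words:
            return False
        if token.startswith("-") and token[1:] in words:
            return False
    return True

PAGES = [
    "i love cricket",
    "cricket is what i love",
    "i saw a cricket",
    "cricket fans drink tea",
]
any_hits = [p for p in PAGES if matches(p, "I Love Cricket", "any")]
all_hits = [p for p in PAGES if matches(p, "I Love Cricket", "all")]
phrase_hits = [p for p in PAGES if matches(p, "I Love Cricket", "phrase")]
filtered = [p for p in PAGES if matches_plus_minus(p, "+cricket -tea")]
```

With these sample pages, the "any" interpretation matches all four, "all" matches the first two, and the exact phrase matches only the first, which shows how much the default logic can change a result list.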

Boolean logic is the term used to describe certain logical operations that are used to combine search terms in many databases. The basic Boolean operators are represented by the words AND, OR and NOT. Variations on these operators, sometimes called proximity operators, that are supported by some search engines include ADJACENT, NEAR and FOLLOWED BY. Whether or not a search engine supports Boolean logic, and the way in which it implements it, is another important consideration when selecting a search tool. The following diagrams illustrate the basic Boolean operations.

[Diagrams illustrating the AND, OR and NOT operations]

Boolean operators are most useful for complex searches, while the + and - operators are often adequate for simple searches.
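The basic Boolean operations correspond to intersection, union and difference on sets of matching pages, which a short sketch can make concrete. The index contents and page numbers are invented, and the `near` function is only one plausible reading of a proximity operator, since implementations vary between search engines.

```python
# Hypothetical index: each term maps to the set of page IDs containing it.
INDEX = {
    "cricket": {1, 2, 3},
    "insect": {3, 4},
    "rules": {1, 5},
}

cricket_and_rules = INDEX["cricket"] & INDEX["rules"]    # AND: intersection
cricket_or_insect = INDEX["cricket"] | INDEX["insect"]   # OR: union
cricket_not_insect = INDEX["cricket"] - INDEX["insect"]  # NOT: difference

def near(text, first, second, distance):
    """True if the two words occur within `distance` words of each other."""
    words = text.lower().split()
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)

close = near("the rules of cricket are long", "rules", "cricket", 2)
too_far = near("the rules of cricket are long", "rules", "cricket", 1)
```

AND narrows a search, OR broadens it, and NOT removes pages that mention an unwanted term; the proximity check narrows further by requiring the terms to appear close together.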