How to Search the World Wide Web: A Tutorial and Guide for Beginners

David P. Habib and Robert L. Balliot
Modified by George Cottay

Finding material on the Web can be time-consuming and frustrating. This is not surprising given the enormous amount of information available on the Internet and the different ways it is stored and retrieved. The search process is made all the more difficult by the large number of search tools, their differing information content and search methods, and the overall lack of industry standards.

This tutorial was originally prompted by the present inherent difficulties in searching the Internet and the absence of a guide of particular help to the beginner. To keep this guide simple, we have eliminated unnecessary information and explanations and concentrated on the essential elements of searching the Web. Our aim is to add to your knowledge and understanding of the search process and to help improve your skills in conducting searches.

Tutorial and Guide Subjects

Appendix

Search Tools Reference. Information about the listed search tools.

Glossary. Definition of terms used in the search process.

We have excluded specialized search tools, such as those for news, medicine, libraries, government and law, to name only a few, as beyond the scope of this work. Instead, this work covers general search procedures applicable to all information. Advanced search methods have been included because they are effective in getting difficult-to-find information.

Due to the continuous rapid growth and changes in the search field, there are few hard and fast rules, particularly in regard to search tools. Expect that some search tools will expand and improve, others will peak and change little, still others will be discontinued, and new entries and mergers will be frequent. All in all, searching the WWW is not simple for beginners, but it is manageable.

For those just starting Web searches, we recommend that you first become familiar with the components of the Guide. Follow with hands-on experience, such as the Search Exercises at the end of Section A, to develop a rudimentary knowledge of the search process. The Guide will then be easier to understand. You will find that this Guide works best when it is used as a companion to your searches, especially with use of the glossary to clarify unfamiliar terms. As another aid in getting started, we recommend the early study of Planning and Conducting a Search in Section E.

Netscape Navigator was the browser used during the development of this Guide. The teachings also apply to Microsoft Explorer, though some of the terms used are different. For example, in MS Explorer, Bookmarks are called Favorite Places and links are called shortcuts.

A. Search Tools

Various Web sites will make finding information on the Web possible. From among the many Web search tools, we have chosen 12 that we believe to be among the most useful. This required the omission of some of the historically popular ones to keep the number manageable.

Table 1 lists preferred search tools by the search method each employs. As can be seen in its headings, the search methods are: Directory, Search Engine, combined Directory / Search Engine and Multi-Engine.

Table 1

Preferred Search Tools by Search Method
Directory     Search Engine   Directory/Search Engine   Multi-Engine
----------    -------------   -----------------------   ------------
LookSmart*    AltaVista       Excite                    All-In-One
-             OneKey**        Infoseek                  Mamma
-             HotBot          Magellan                  MetaCrawler
-             -               Yahoo                     SavvySearch

* Provides a keyword option independent of the subject search.
** Provides a subject option independent of the keyword search.

Search Methods

The following briefly describes each of the search methods employed and suggests exercises for achieving a familiarity with their use.

1. A Directory search tool searches databases by subject matter. It is a hierarchical search that starts with a general subject heading and follows with a succession of increasingly more specific sub-headings. It is also called a "subject search".

2. A Search Engine search tool accesses databases by using keywords. It responds to a specific item, or query, of interest with a list of references or hits. It is also called a "keyword search".

3. A combined Directory/Search Engine is a search tool that uses two allied search methods in concert. As a directory search, it follows a directory path through increasingly more specific sub-topics. At each stop along the path, a search engine option is provided to enable the searcher to convert to a keyword search. The subject search and keyword search are allied, because the keyword search can pose a query on a subject or topic along the Directory's path. The further down the path the keyword search is made, the narrower the search field.

4. A Multi-Engine, also called a meta-search engine, is a search tool that searches databases by employing several search engines in parallel. It then lists the hits either by search engine or by integrating the results into a single listing. The search is conducted via keyword using commonly used operators or plain language. All-In-One operates differently from the others in the Multi-Engine category: it provides a convenient way to search any of a large number of search tools one at a time.

Most Directory search tools today supply search capabilities by both Directory [i.e. subject search] and by Search Engine [i.e. keyword search]. Because of their greater complexity, keyword searches receive far greater coverage here than subject searches.
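For readers who are comfortable with a little programming, the Multi-Engine idea of "integrating the results into a single listing" can be sketched in Python. This is only an illustration under stated assumptions: the engine names, the example addresses and the scoring rule (each hit earns 1/rank from every engine that lists it) are hypothetical and are not taken from any of the search tools named in this Guide.

```python
def merge_hits(results_by_engine):
    """Integrate per-engine hit lists into a single listing.

    results_by_engine maps an engine name to its ranked list of addresses.
    Assumed scoring rule: a hit earns 1/rank from each engine that lists it,
    so hits returned by several engines rise toward the top.
    """
    scores = {}
    sources = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(engine)
    # Highest score first; ties are broken alphabetically by address.
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return [(url, sources[url]) for url in ordered]


# Hypothetical hit lists from two hypothetical engines.
sample = {
    "engine_a": ["site1.example", "site2.example", "site3.example"],
    "engine_b": ["site2.example", "site4.example"],
}
merged = merge_hits(sample)
for url, engines in merged:
    print(url, "found by", ", ".join(engines))
```

In this sketch the hit found by both engines is listed first, and each duplicate appears only once in the merged listing.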

Search Exercises

The following [...] through the references or hits and select those of interest to you for perusal. Click a hit that interests you to access or link to other references.

2. Keyword Search Type altavista.digital.com/ in the location box of your Internet Browser and press Enter. At the home page, type your query into the search box. Examine the hits of interest and click one to access other references.

3. Directory/Search Engine Search Follow the same procedure as in [1] above, except at one of the stops along the path switch to a keyword search. Type a query in the search box and examine the hits of most interest.

4. Multi-Engine Search Tool Type www.Savvy.com in the location box of your Internet Browser and press Enter. Type the same keyword query as used in [2] above. Compare the hits with those obtained in [2] above.

Go back and read this section again from the beginning to reinforce your understanding of the search methods.

B. Operators Used In Keyword Searches

Operators are the rules or specific instructions used in a keyword search for composing the question or query. You begin a keyword search by placing your query in the search box of the search engine's home page. To construct the query, use the appropriate operators for the selected search engine. While each search engine has its own operators, some are common to a number of search engines. The following describes the more frequently used operators, each of which is shown as a numbered heading.

1. Boolean Employs AND, OR, NEAR and NOT to connect the words and phrases [i.e. terms] in the query, wherein: AND requires that both terms are present somewhere within the document being sought. NEAR requires that one term is found within a specified number of words of the other. OR requires that at least one term is present. NOT excludes a term from the query.

2. Plus / Minus Signs Employs [+] before a term with no space in between to retrieve only the documents containing that term. It is similar to the Boolean AND. Employs [-] before a term with no space in between to exclude that term from the search. It is similar to the Boolean NOT.

3. Quote Marks Indicate that the words within the quote marks are to be treated as an exact phrase, or reasonably close to it. It is similar to the Boolean NEAR.

4. Brackets These are used much like Quote Marks but with the additional constraint that the words within the brackets will be considered and searched as a single entity.

5. As Per Example A technique employed to direct a search to that of the example by requesting "more like this".

6. Case Sensitive Adjacent capitalized words are treated as a single proper name. Commas separate proper names.
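The Boolean operators described above can be illustrated with a small Python sketch. It is not how any particular search engine works; the three sample documents and the NEAR distance are made up for the example, and each engine defines its own "specified number of words".

```python
import re


def words(text):
    """Lowercase a document and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has(doc, term):
    """True if the single-word term appears in the document."""
    return term.lower() in words(doc)


def near(doc, a, b, distance=10):
    """True if a and b occur within `distance` words of each other.
    The default distance is an assumption for illustration."""
    w = words(doc)
    pos_a = [i for i, x in enumerate(w) if x == a.lower()]
    pos_b = [i for i, x in enumerate(w) if x == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


# Hypothetical documents.
docs = {
    "d1": "Tile roof repair and roof maintenance",
    "d2": "A history of clay tile in Roman buildings",
    "d3": "Metal roof installation guide",
}

# tile AND roof: both terms must be present.
and_hits = [k for k, d in docs.items() if has(d, "tile") and has(d, "roof")]
# tile OR roof: at least one term must be present.
or_hits = [k for k, d in docs.items() if has(d, "tile") or has(d, "roof")]
# roof NOT tile: the second term is excluded.
not_hits = [k for k, d in docs.items() if has(d, "roof") and not has(d, "tile")]
# tile NEAR roof: the terms must be adjacent (distance of one word here).
near_hits = [k for k, d in docs.items() if near(d, "tile", "roof", distance=1)]

print(and_hits, or_hits, not_hits, near_hits)
```

Note how OR returns the most documents and AND, NOT and NEAR each narrow the list, which is the practical reason for using operators.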

For more detailed information on the use of operators go to Help Sections of the search tools. AltaVista provides the most detailed operator help section; Excite and Hotbot provide the most concise.

C. Keyword Searching

You will find that keyword searches are easy to use, but not easy to use well. Unfortunately, most search engines developed their systems of search independently. Therefore, there are no standards in nomenclature, database organization or retrieval systems. Thus for each search engine, you get the best results by composing the query for that particular engine.

Table 2 is organized to help you select the frequently used operators for the preferred keyword search tools. More specialized operators can normally be found in the search tool's help section. [See "Search Tools References" in the Appendix]. As can be seen, the table contents are somewhat incomplete. The authors have contacted the search tools' Web Masters both to verify and supplement the information provided, and now await responses for future inclusion in this table.

Table 2

Keyword Searching by Search Operators

Search Tool    Boolean    Plus/Minus   Quote Marks   Brackets   Case Sensitive
-----------    --------   ----------   -----------   --------   --------------
All-In-One     -          -            -             -          -
AltaVista      yes        yes          yes           yes        -
Excite         yes        yes          -             yes        yes
HotBot         yes        yes          yes           -          yes
Infoseek       excludes   yes          yes           -          yes
LookSmart      -          -            -             -          -
Magellan       -          -            -             -          -
Mamma          yes        yes          yes           -          yes
MetaCrawler    -          yes          yes           -          -
OneKey         -          -            -             -          -
SavvySearch    excludes   yes          yes           -          excludes
Yahoo          yes        yes          yes           -          yes

In addition to the search operators in the table, most search tools also have unique operators. The more complex searches benefit from broader and stricter adherence to the use of operators.

While each search engine has its own operators, some are common to a number of search engines as can be seen in the table. We designated a selected set of these as "Common Operators". This aspect provides a useful search technique which is illustrated later under Moderately Complex Searches in Section E.
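To make the plus/minus signs and quote marks more concrete, here is a hedged Python sketch of how a query using them might be interpreted. The function names and the matching rules are our own assumptions for illustration; real search engines differ in the details.

```python
import re

# A token is an optional + or - sign followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def contains(text, term):
    """True if the word or exact phrase occurs in the text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None


def matches(text, query):
    """Assumed rule: every + term present, no - term present, and if there
    are no + terms, at least one unsigned term must be present."""
    required, excluded, optional = parse_query(query)
    if any(not contains(text, t) for t in required):
        return False
    if any(contains(text, t) for t in excluded):
        return False
    return bool(required) or any(contains(text, t) for t in optional)


parsed = parse_query('+roof -metal "tile roof"')
loose = matches("American folk customs", "American customs")
exact = matches("American folk customs", '"American customs"')
print(parsed, loose, exact)
```

The last two lines show the effect of quote marks: without them the words are searched separately and the text matches; with them the words must appear together as a phrase and the text does not match.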

D. Hints

1. Bookmark your favorite search tools for convenient future use. At times you will also want to bookmark home pages for later reference. This is particularly useful when the address or URL is long and complicated. Also, bookmark interesting hit sites during a search, so that later you can find your way back to them.

2. Some search tools provide options in various search categories that help you narrow the focus of your search. Selecting one or more options helps improve the relevancy of the hits. These options are normally shown under the search box.

3. There are times when a search tool will not connect to a Web site for one of several reasons:

4. Use the Help Section of your frequently-used search tools to become informed and remain current as to their use.

5. For keyword searches, it is better to compose the query for the particular search engine you are using. This requires an understanding of its rules of composition. A well-composed query increases relevant responses and reduces the number of irrelevant hits.

6. An extraordinary number of hits often results because the query allows the search of words individually rather than as related words, as in a phrase or title. For example, if the query asks for American customs rather than "American customs", then the responses will be for the words American and customs separately, in addition to the coupled words. The quote marks are the operators that connect the two words and limit the search. Other operators act similarly in limiting searches to the intended meaning.

7. Because each search tool has its own method and criteria for seeking and compiling information, their database content and detail differ. Due to these factors, responses to a query will vary from search tool to search tool. For any one query, you will find it more productive to use several search tools to improve your chances of finding the most useful information.

8. During a search you will sometimes find long articles that you prefer not to read or print at the moment. You can defer action by selecting the text, copying it to the Clipboard and then pasting it into a word processing window. Later you can read the articles and decide which parts, if any, you wish to keep for future reference. One limitation of this technique is that tables do not reproduce intelligibly.

9. Some Web sites may not give you the option of deleting graphics. If your computer is slow to download, you may prefer to use search tools with the least amount of graphics. Among these are HotBot, Infoseek and Mamma. Those with the most graphics include LookSmart and OneKey.

10. A knowledge of how information is indexed can be helpful in selecting an appropriate search engine for a query. There are three methods used in the creation of a database.

Full text indexing: Every word on the web page is put into the database. AltaVista is an example. Its best use is where you want every reference to the specific word or term in the query. However, it is not very useful in a general subject search, because it will produce an enormous number of irrelevant hits.

Keyword Indexing: Words and phrases are indexed in a database based on their location and frequency. However, if a name or term is mentioned only once or twice in the web page, it may not be included in the database. Keyword indexing is the most used and fastest growing indexing method.

Person [Human] Indexing: Unlike the above two indexing methods which employ a robot, this indexing is done by individuals who examine the web pages and select the most appropriate words and phrases to describe them. This provides a directory which is high in relevance and is similar to the way libraries categorize their information.

11. When constructing a Query, avoid using common words, except where modified by a specific one. Otherwise, you will get an enormous number of hits. For example, roof alone is too broad, but "tile roof" as a phrase is acceptable.

12. Use the links in this Guide to provide an easy means of going to related information within the document.
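As a rough illustration of the full-text and keyword indexing methods described in Hint 10, the following Python sketch builds both kinds of index over two made-up pages. The frequency threshold is an assumption chosen for the example, and the sketch models frequency only, not the location of words on the page.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a page and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def full_text_index(pages):
    """Full text indexing: every word on every page goes into the database."""
    index = {}
    for name, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(name)
    return index


def keyword_index(pages, min_count=3):
    """Keyword indexing (simplified): keep only words that occur at least
    min_count times on a page. The threshold is an illustrative assumption."""
    index = {}
    for name, text in pages.items():
        counts = Counter(tokenize(text))
        for word, n in counts.items():
            if n >= min_count:
                index.setdefault(word, set()).add(name)
    return index


# Hypothetical pages.
pages = {
    "page1": "roof roof roof tile slate",
    "page2": "garden tile tile tile roof",
}
full = full_text_index(pages)
keyed = keyword_index(pages)

print(sorted(full["roof"]))   # pages listed for "roof" in the full-text index
print(sorted(keyed["roof"]))  # pages listed for "roof" in the keyword index
print("slate" in keyed)       # a word mentioned once is left out
```

The full-text index finds "roof" on both pages, while the keyword index lists only the page where the word is prominent. This is why a full-text engine returns every reference to a term and a keyword-indexed engine may miss a passing mention.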

E. Planning And Conducting A Search

Your search for a specific item in a world of information can be difficult, especially if the search is done randomly and without any planning. This section offers suggestions to the beginner on conducting a search in an orderly and informed way.

The following are suggestions for those just starting to learn searching the Web.

Searching By Keyword

There are various levels of complexity in conducting a keyword search. Begin with the simpler searches and work your way toward those that are more complex.

1. Simple Searches

For search queries that don't require operators, such as single terms or proper names, use a keyword search engine such as AltaVista or HotBot. These search engines rate particularly high for completeness and currency.

For queries using a phrase, use quotes to enclose the phrase. This will greatly reduce the number of hits and improve relevancy. Also, be sure to capitalize proper names.

2. Moderately Complex Searches

A convenient method of conducting a moderately complex search is to use search engines that have a common set of operators as found in Table 2. These "Common Operators" briefly recapped are:

For a quick and efficient search, begin with a Multi-Engine search tool such as Savvy, using the above "Common Operators" to compose your query. Savvy works well for this type of search, because it utilizes several search tools simultaneously to provide a relatively short hit list of high relevancy.

If Savvy does not provide adequate results, use one or more of the preferred search engines singly. This approach provides many more hits and therefore more opportunity to capture the information you want. Suitable examples include Infoseek and HotBot.

One good way to use many search tools efficiently is to utilize the All-In-One search tool. It works this way:

Once set up, the procedure works rapidly. The slow part is evaluating the hits. Be prepared for some hits to show up in the results of several search tools. However, some hits will be unique to a search tool, among which may be the reference of most interest to you.

The greater simplicity provided by the use of keyword searches has a trade-off. It will normally produce a very large number of hits of generally lower relevance. Because the hits will be ranked according to relevance, the first 30 hits or so are most likely to contain the most relevant references.

3. Highly Complex Searches

These searches are for obscure information or difficult-to-define queries and benefit from the use of a more sophisticated search engine. For these more difficult searches, try the advanced mode of the selected search engine and adhere to its instructions. This requires study of the help section of the search tool and diligent use of its operators. Used in this way, AltaVista is a powerful and effective search tool.

Directory And Directory/Keyword Searches

By comparison to Keyword searches, the procedure for a Directory search is rather simple. Such searches are for browsing, where the paths they take are from general subjects to increasingly more specific topics. Follow the search path to the desired topic and then examine the hits that are provided at each stop. The hits will normally contain links that will further your search.

Directories depend on persons to update their databases, and therefore the relevancy of the information they provide is high. However, it is achieved at the expense of completeness and currency of the information in the database. Conversely, search engines collect and update web sites automatically, and therefore are more current and complete, but at the expense of a much larger number of hits of generally lower relevance. Automatic updating of search engine databases occurs routinely, usually within days. Directory references take considerably longer, normally weeks and sometimes as long as months.

Today, some Directory search tools provide an option for switching to a keyword search at each Directory stop along the way. This allows you to narrow the search field to simplify your search. When choosing the keyword option, compose the query in the search box provided and follow keyword instructions. Excite and Yahoo are effective search tools having this allied subject/keyword capability.

Evaluating Hits

This is usually the hardest and most time-consuming part of a search. The number of hits you obtain can range from none to hundreds of thousands, and their relevance or usefulness can vary from considerable to negligible. There are some things you can do to help produce more relevant hits for the fewest total number.

Success in any particular search query is usually more a question of which search tool has the best database for the subject and how the information is organized for retrieval. This is why it is often necessary to try a number of different search tools when searching for obscure information.

Some search engines list the hits by titles, some by brief text and some give you a choice. When available choose the brief text, as it is easier to evaluate. Even so, it is often necessary to click the link to see the entire document before you can assess its content. Some sites may not be of apparent interest, but will contain links that have great relevancy. Some searches yield the desired information quickly, and some you may just have to plod through.

As you gain experience, you will learn which search tools are most appropriate for your particular interests and how best to evaluate the hits.

Summation

Learning to search the Web is an incremental process that builds with experience. You will find that your search skills will increase as you gain greater understanding of search terminology, search tools and their intricacies and the way information is stored and retrieved. The learning process is arduous; the reward is a world of information that is made available to you.

Appendix

SEARCH TOOLS REFERENCES

This section provides a convenient way to access help and background information on each of the search tools listed in Table 2. Because of rapid changes in the search field, you will want to keep abreast of the changes for the tools that you use most. The following explains terms used in this section and provides some helpful hints.

Address: is the Web address or URL. You can access an address by clicking it.

Automatic Document Scanning: This is the means of identifying, indexing and cataloguing Web sites. It employs robots or spiders for scanning virtually all web sites to augment and update the databases of search engines.

Bookmark: To access Home and Help Pages conveniently, create an address folder for each under Bookmarks. This is done by going to the Home or Help Pages via the links provided in this guide and adding them to the appropriate bookmark folder.

Common Operators: We use this term to describe a set of the most-used operators of the popular search engines. Common Operators are generally compatible with Multi-Engine search tools as well. [See Searching By Keyword in Section E for a description of their use.]

Default: The operating mode when no other is specified.

Frame-based Information: That which resides in a box within a Web page. Some search engines will not search within frames, and therefore the information there is neither indexed nor retrievable.

Full Text: Indicates every word in the text is scanned. The information recorded is therefore potentially accessible via keyword use.

Home and Help Pages: Visit these Web pages for the search tools that you use most to remain current. FAQ [Frequently Asked Questions] also contains help and other useful information.

Relevance Ranking: Each search engine has its own way of assigning relevance. Higher relevance is normally given to query terms in the title and first few words in the document. For some search engines, proximity and frequency of use are also factors. It is unusual that the best source ranks first, unless the query terms are optimally located in the document.
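The idea behind relevance ranking can be sketched in Python as follows. The weights (3 for a match in the title, 2 for a match among the first few words, 1 elsewhere) and the two sample documents are assumptions made up for this illustration; as noted above, each search engine has its own way of assigning relevance.

```python
import re


def tokenize(text):
    """Lowercase text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, title, body, lead_words=5):
    """Score a document against a query.

    Assumed weights: a query term in the title counts 3, in the first
    lead_words words of the body counts 2, and elsewhere counts 1.
    """
    terms = set(tokenize(query))
    title_words = tokenize(title)
    body_words = tokenize(body)
    score = 3 * sum(1 for w in title_words if w in terms)
    score += 2 * sum(1 for w in body_words[:lead_words] if w in terms)
    score += sum(1 for w in body_words[lead_words:] if w in terms)
    return score


# Hypothetical documents as (title, body) pairs.
docs = [
    ("Garden sheds", "Notes on sheds. One shed has a tile roof."),
    ("Tile roof repair", "How to repair a tile roof after a storm."),
]
ranked = sorted(docs, key=lambda d: relevance("tile roof", d[0], d[1]), reverse=True)
for title, body in ranked:
    print(relevance("tile roof", title, body), title)
```

Both documents mention the query terms, but the one with the terms in its title and opening words is ranked first.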

SEARCH TOOLS

We recommend the following search tools, because each has somewhat different capabilities and advantages. In this respect, they complement each other, making it possible to find and retrieve even obscure information. In time, and by trial and error, you will learn which are the best for your use and under what circumstances.

This Reference represents our understanding of present practices. Expect the contents to change as search tools expand their scope and improve their performance.

1. ALL-IN-ONE

2. ALTAVISTA

3. EXCITE

4. HOTBOT

5. INFOSEEK

6. LOOKSMART

7. MAMMA

8. MAGELLAN

9. METACRAWLER

10. ONEKEY

11. SAVVYSEARCH

12. YAHOO

GLOSSARY OF WEB SEARCH TERMS

This glossary contains terms used both in this work and other articles applicable to searching the WWW. For ease of use by the beginner, the definitions are brief and in simple language.

Boolean Search A keyword search that uses Boolean Operators for obtaining a precise definition of a query. [See "Operators Used In Keyword Searches" in Section B]

Browsing A Directory Search, which is a method of searching the Web by subject through linked documents. In popular use, browsing is accessing information from the Internet.

Browser A program used to connect to sites on the World Wide Web. More generally, a program that accesses information on the Internet. Examples of WWW browsers are Netscape Navigator and Microsoft Explorer.

Concept Search A query that implies a term's broader meaning, and not its literal meaning.

Database Stored information about a topic or subject organized for retrieval. A search engine database is kept current by means of an automated search engine procedure called a robot or by author-supplied information.

Directory Search A hierarchical search that proceeds through increasingly more specific headings or sub-topics.

False Drops Documents that are retrieved but are not relevant to the user's interest.

Full-Text Indexing An indexing method where every word in the Web page is put into the database with the exception of prepositions, conjunctions, and the like.

Hierarchical A ranking of subjects from the most general to the most specific.

Hits Documents or references to documents that are returned in response to a query, also called matches or matching queries.

Hypertext Link A highlighted word or image [shown in color] on a Web page that when clicked connects or links to another location with related information. [Links provide an easy way to move about the Internet]

Internet A worldwide collection of computers and computer networks that can communicate with each other. The Internet functions through Clients and Servers. Clients are used to access and obtain information from databases. Examples include on-line providers such as AOL and CompuServe. Servers are used to provide information; examples are search tools and electronic mail services.

Keyword Search A search that utilizes terms that define the user's interest.

Link In WWW parlance, a hypertext link.

Location Box A designated place within a browser for an address [URL]. It is the starting point for accessing a Web site.

Multi-Engine Search A search that uses several search engines in parallel to provide a single response to a query.

Operator A rule or specific instruction on an aspect of composing a query used to define the information sought.

Phrase Search One that states the words exactly as they are to be searched. [A phrase is a string of words that are adjacent and related.]

Precision A standard measure of information retrieval. It is defined as the number of relevant documents obtained divided by the total number of documents retrieved.

Proximity How closely words appear together within a document. "Adjacency" or "phrase" usually means that words must appear exactly in the order specified with no intervening words. "Near" usually means that the words must appear within a certain number of words of each other, although exact word order is not specified.

Query A search request. A combination of words and symbols that defines the information that the user is seeking. [Queries are used to direct the search tool to appropriate databases.]

Query By Example Use of an example to solicit more information like it.

Ranking A means of listing hits in the order of their relevancy. It is usually determined by how well the reference matches the query and by the number of occurrences of the term in the document being searched.

Relevance The usefulness of a response to a query.

Robot The software for adding or updating databases by scanning documents via a network of links. [A robot is also known as a spider, crawler and indexer].

Search Box The place within a search engine's home page to enter a query.

Search Engine A computer program that locates information through the use of keywords. The search engine usually resides in a host computer and provides information service to other computers on request.

Search Tool The software which conducts a search by one of several methods, namely Directory, Search Engine, Directory/Search Engine and Multi-Engine.

Site A location on the Internet. In WWW, it is called a Web site and identified by its URL.

Spider See Robot

Stemming The use of a stem [i.e. root] of a word to search words that are derived from it. For example, "child" would retrieve information on child, children, childhood, childless, etc.

Term A single word or combination of words used in a query.

Truncation See Stemming

Uniform Resource Locator [URL] A unique address on the World Wide Web.

Web Server A computer program that accepts requests for information, processes the requests, and provides files accordingly.

Web Site A specific address or URL in a computer network.
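The definitions of Precision and False Drops given in this glossary can be worked through with a short Python sketch. The hit list and the set of relevant documents are made up, and returning 0.0 when nothing is retrieved is our own convention, not part of the definition.

```python
def precision(retrieved, relevant):
    """Number of relevant documents obtained divided by the total number
    of documents retrieved. Returns 0.0 for an empty hit list (an assumed
    convention)."""
    if not retrieved:
        return 0.0
    relevant_retrieved = [doc for doc in retrieved if doc in relevant]
    return len(relevant_retrieved) / len(retrieved)


def false_drops(retrieved, relevant):
    """Documents that were retrieved but are not relevant."""
    return [doc for doc in retrieved if doc not in relevant]


# Hypothetical search: four hits, of which two are useful to the searcher.
hits = ["a", "b", "c", "d"]
useful = {"a", "c", "x"}

p = precision(hits, useful)
drops = false_drops(hits, useful)
print(p, drops)
```

Here two of the four hits are relevant, so the precision is 0.5 and the other two hits are false drops. A well-composed query aims to raise this figure.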

About The Authors

David Habib conceived and composed the tutorial. His main qualification is that he is a recent beginner in conducting searches and thus more aware of beginners' problems. In addition, he has amateur experience in researching and presenting complex technical subjects.

Robert Balliot is the Information Services Librarian of the East Greenwich Free Library in Rhode Island with broad experience in conducting computer searches. He served as an expert resource, ensured the accuracy of the tutorial's contents and produced the Web page.

Request for Comments As this work is published, the authors are still too close to the subject to see its flaws and omissions. We welcome your comments for use in future revisions. Send them to davehab@ids.net.