IW Online Special Report



The Right Search Engine
IW Labs Test 

By David Haskin 

All search engines can uncover a needle in a haystack. But how much work will you have to do?

Imagine being transported to a large city and being asked to find a small object that might be hidden anywhere. That's the challenge you face when you try to find useful information among the staggering number of pages on the World Wide Web.

Web search engines are the solution, but before you even start your search, you face a choice. Hyperbole is common among the search engines: Each claims to be the best.

To help Internet World magazine readers find the site that works best, IW Labs put the search engines through their paces by comparing the results using a long list of search terms.

We tested six of the leading search engines: AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.

We found that each can find an enormous amount of information, but a few are clearly superior in the way they home in on the most relevant information and in the interface they offer.

We did not review sites like Yahoo or Magellan because they don't search the entire Web. Rather, these are directories of the Web, closer in nature to gigantic phone books than to search engines. Directories are useful services, and they often include reviews to guide you, but they don't list as many sites as search engines. In addition, while search engines search through the actual contents of sites, querying a directory only examines descriptive words provided by the directory service. Interestingly, most directory services provide an option to launch full Web searches, but when they do, they actually use one of the six main search engines.

Directories are good if you are willing to sort through menu after menu, hunting for the best site. Search engines are much better if you just want to see what's available on a topic since the engine does the hard work. You type in a word or short phrase, and the search results are displayed in a long list you can review.

Search engines also have the advantage of finding information that the directories didn't include on a particular topic. For example, if you're looking for information about a product, you probably want to see not only information created by the company selling the product, but off-hand references from people who are voicing their opinion in a page that wouldn't show up in a directory. The technology behind search engines makes them more effective in uncovering these subtle references.

Robot Technology

Search engines rely on two tools to gather the information from the World Wide Web: spiders and indexes.

Spiders, sometimes called bots (as in robots), roam the Web, crawling from site to site. Some spiders move from one site to another indiscriminately while others prioritize and focus their attention on the most popular sites. Depending on your needs, one approach isn't necessarily better than another. For instance, it does little good to view a list of 50 irrelevant pages from the same irrelevant site, which can occur when a search engine indexes every page that its spider can reach. Picking popular sites provides more concise results.
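To make the crawling idea concrete, here is a minimal sketch of a spider that follows links breadth-first. It works on a small made-up link map (LINKS) rather than the live Web, and the page names and the max_pages limit are our own assumptions for illustration, not a description of how any of the tested engines crawl.

```python
from collections import deque

# Hypothetical link map: each page name lists the pages it links to.
LINKS = {
    "home": ["products", "about"],
    "products": ["widget", "home"],
    "about": ["home"],
    "widget": [],
    "orphan": ["home"],  # nothing links here, so the spider never finds it
}

def crawl(start, links, max_pages=100):
    """Visit pages breadth-first from start, following each link once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

print(crawl("home", LINKS))
```

Note that a page no other page links to, such as "orphan" here, is never reached. A spider that prioritizes popular sites would replace the simple queue with some ordering by popularity.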

Once the spider is at a site, it reports back to the search engine and indexing begins. Indexes have been used to speed retrieval since long before the World Wide Web--they're part of most database programs and even address book software uses them to help you find information faster.

An index is a list of every word found at every site with a pointer to its precise location (except for unimportant words like "the," "and," and "but"). When you search for the word "widget," the search engine submits that term to the index. The index then finds the word and displays a list of pages containing it.
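The index described above can be sketched in a few lines. This is a toy illustration under our own assumptions: the three sample pages, the example.com addresses, and the tiny list of skipped words are invented, and a real search engine's index is far larger and more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical list of "unimportant" words; each engine chooses its own.
STOP_WORDS = {"the", "and", "but"}

def build_index(pages):
    """Map each word to {url: [positions of the word on that page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(url, []).append(position)
    return index

def lookup(index, term):
    """Return the sorted list of pages that contain the term."""
    return sorted(index.get(term.lower(), {}))

PAGES = {
    "http://example.com/a": "The widget and the gadget",
    "http://example.com/b": "But this page sells only gadgets",
    "http://example.com/c": "Widget prices and widget reviews",
}

index = build_index(PAGES)
print(lookup(index, "widget"))
```

Because the index is built ahead of time, answering a query is a single lookup rather than a scan of every page.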

As with spiders, however, indexing varies among the search engines. Some engines index the entire contents of the page. Other engines index only specific parts, such as the top-level heading. And some search engines look at key words embedded in "meta tags" at the top of the page to categorize the content.
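As a rough sketch of the meta-tag approach, the following uses Python's standard html.parser module to pull key words out of a page's "keywords" meta tag. The sample page and the class name MetaKeywordParser are our own inventions; how any particular search engine weighs these tags is not something we tested.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated key words from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                word.strip().lower() for word in content.split(",") if word.strip()
            )

# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Widget World</title>
<meta name="keywords" content="widgets, gadgets, Hardware">
</head><body><h1>Welcome</h1></body></html>
"""

parser = MetaKeywordParser()
parser.feed(SAMPLE_PAGE)
print(parser.keywords)
```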

Interfaces Can Help or Hinder

Whichever technology a search engine uses, it eventually must serve the user--that's us. Differences in user interfaces can be just as pronounced as differences in the underlying technology.

The basic drill for all of these sites is the same. You visit the site, enter a term, click a button to start the search, and, a few seconds later, view a list of sites that meet your search request. As with any program, an important question when interacting with a search engine is: How easy is it to use?

Search engines should make it simple to frame complex searches. If you search for "automobiles," expect many responses that will take a long time to sort through. However, what if you only want to know about the Ford Taurus and Chevrolet Lumina? For such a narrow search, you shouldn't have to wade through a list of every document on the Web containing the word automobile.

Boolean (or logical) search expressions can narrow down your search. Here are some sample searches for a car shopper:

  • "Ford Taurus" or "Chevrolet Lumina" finds all pages that include either of the terms.
  • "Ford Taurus" and "Chevrolet Lumina" finds pages that include both phrases.
  • "Ford Taurus" and "Chevrolet Lumina" not "Honda Accord" finds pages including information about the Ford and the Chevy, but not the Honda.
The first search would be useful if you are researching both cars and don't care whether a site talks about one, the other, or both. The second search would be helpful if, say, you are looking for comparisons between the two autos. The third search would be useful if the previous one turns up a lot of reviews that include the Honda, which you don't want to read.
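The three sample searches map neatly onto set operations. In this sketch, the four sample pages are invented and phrases are matched by simple substring tests; it illustrates the logic of "or," "and," and "not," not any engine's actual query syntax.

```python
def pages_with(phrase, pages):
    """Return the set of page names whose text contains the phrase."""
    return {name for name, text in pages.items() if phrase.lower() in text.lower()}

# Hypothetical pages used only for illustration.
PAGES = {
    "taurus-review": "Road test of the Ford Taurus",
    "lumina-review": "Road test of the Chevrolet Lumina",
    "comparison": "Ford Taurus versus Chevrolet Lumina",
    "three-way": "Ford Taurus, Chevrolet Lumina and Honda Accord compared",
}

taurus = pages_with("Ford Taurus", PAGES)
lumina = pages_with("Chevrolet Lumina", PAGES)
accord = pages_with("Honda Accord", PAGES)

either = taurus | lumina        # "Ford Taurus" or "Chevrolet Lumina"
both = taurus & lumina          # "Ford Taurus" and "Chevrolet Lumina"
both_not_honda = both - accord  # ... not "Honda Accord"

print(sorted(either))
print(sorted(both))
print(sorted(both_not_honda))
```

Each added operator shrinks the list: "or" returns all four pages, "and" returns two, and "not" leaves only the head-to-head comparison.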

These Boolean searches can be complex to construct. The search engine should provide some help, either through interface refinements or through the help system.

Note, too, that we placed the full model name in quotes, such as "Ford Taurus." That instructs the search engine to look for "Ford Taurus" as a phrase, not as separate words. If you don't designate the phrase, the search engine will find documents that have the words "Ford" and "Taurus" anywhere within them. In other words, it will find a Web page about somebody named Arthur Ford whose zodiac sign is Taurus. The interface of the search engine should make it easy to designate phrases.
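The difference between a phrase search and a search for separate words can be shown with two small functions. The two sample sentences are made up, and the simple tokenizer is an assumption of this sketch.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_all_words(text, query):
    """True if every query word appears somewhere in the text."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(query))

def contains_phrase(text, phrase):
    """True if the query words appear side by side, in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    size = len(target)
    return any(words[i:i + size] == target for i in range(len(words) - size + 1))

car_page = "The 1997 Ford Taurus gets a new engine"
horoscope_page = "Arthur Ford is a Taurus, born in May"

print(contains_all_words(horoscope_page, "Ford Taurus"))
print(contains_phrase(horoscope_page, "Ford Taurus"))
print(contains_phrase(car_page, "Ford Taurus"))
```

The horoscope page passes the separate-words test but fails the phrase test, which is exactly the false hit that quoting the phrase avoids.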

Another interface issue is output--how does the search engine display the items it finds? We've all been frustrated by search results that list, say, the first 70 characters from each Web page. Sometimes that's useful because Web authors tend to offer site descriptions at the top of the page. Often, however, such a list is meaningless. The search engine should make sense of the results.

Web queries often turn up large numbers of documents, so the search engine should provide some idea of how relevant each page is to our query. The search engines provide a list with the most relevant sites at the top of the list. Relevancy ranking is another old technology that, typically, uses factors like how frequently the search term appears in the page. Some search engines combine word frequency with other factors, such as how often the Web page is visited and how close together in the page multiple search terms are.
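Here is a bare-bones sketch of ranking by word frequency alone: each page is scored by the share of its words that match the search terms. The sample pages and the scoring formula are our assumptions. The engines we tested do not publish their formulas, and this sketch leaves out the other factors mentioned above, such as page popularity and how close together the terms are.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, terms):
    """Fraction of the page's words that match any search term."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(words.count(term.lower()) for term in terms)
    return hits / len(words)

def rank(pages, terms):
    """Return matching page names, most relevant first."""
    scored = [(score(text, terms), name) for name, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for value, name in ordered if value > 0]

# Hypothetical pages used only for illustration.
PAGES = {
    "city-news": "The city council discussed bicycling lanes at length last night",
    "touring-guide": "Bicycling routes and bicycling maps for touring",
    "recipes": "Slow cooker recipes",
}

print(rank(PAGES, ["bicycling"]))
```

A formula this simple is easy to fool, since a page that merely repeats a word many times floats to the top.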

Relevancy ranking is, at best, an imprecise science. If the developers of the search engine implement it poorly, the rankings will be meaningless. It also is impossible to benchmark, so we did searches on topics with which we are very familiar to get a feel for the accuracy of the relevancy ranking at each site.

Our reviews, especially our selection of "Best of Test," were based on the accuracy of relevancy ranking, the help offered in constructing Boolean searches, and the comprehensiveness of the site. You may find that while a few are clearly better than the others, a single search engine isn't best for you. So while we selected a single "Best of Test," we're also printing a detailed report on each site, so you can be sure you're using the search engine that offers what you need.

Best of Test

We're still looking for the Holy Grail of the Web--the search engine that can find absolutely everything but is simple enough for even newcomers to use. Until that day comes, if we could use only one search engine, it would be HotBot.

In our tests, HotBot's search results were unmatched. It also provides what is arguably the simplest-to-use and most customizable user interface in the group. Its skills at refining searches also are the strongest--we particularly liked its ability to search based on when pages were last modified. Its search results are pleasingly displayed and we thought its relevancy rankings were reliable.

AltaVista, the co-winner of the Best of Test last time (Internet World, May 1996), still is a force to be reckoned with. It was only slightly behind HotBot in our retrieval tests, although it retains its busy interface and difficult-to-understand search results pages.

Slightly behind AltaVista was Infoseek (also a co-winner last time) which is a powerful search engine and provides many ways to refine searches. While several other search engines also provide Yahoo-like directories, Infoseek is easily the best combination of Web searching and directory. As a result, Infoseek is an excellent all-around choice.

The other search engines all have something to offer for occasional use. Lycos is comfortable to use and offers a lot of flexibility for finding additional information after you've searched. Excite makes it easy to search through a variety of sources, including news stories. It also combines its competent search engine with a directory service. WebCrawler is the least powerful search engine but offers niceties such as a listing of sites that are most popular among its users.

Here is a report on our tests of each of the six search engines.

AltaVista
Excite
HotBot
Infoseek
Lycos
WebCrawler

AltaVista

AltaVista is a powerful search tool. However, it's like a nitro-fueled dragster: very powerful but you wouldn't drive it to the grocery store. Similarly, AltaVista is a bit much for quick, simple searches.

AltaVista no longer came out on top in our search tests, although it was among the best on most searches. It did ace our tests for finding obscure references on obscure pages. For example, when we searched for a telephone number included on a back page of a law firm's Web site, it correctly found the page.

You can create extremely precise searches with AltaVista. Like some of its competitors, it can search through the source code of each page so that, for instance, you can find pages with specifically named image files. Or, you can execute a search based on the URLs to which a page links. In other words, you can find all pages with links to your home page.

AltaVista's advanced search syntax, however, is complicated and its help system doesn't help much because it's mired in technical jargon. Still, in a unique, techie-centric way, AltaVista is trying to help. It offers a new interface to help frame complex searches, although some still may find this new interface too complex.

It works like this: If you perform a simple search that finds many pages, AltaVista optionally displays a "topic map." This map summarizes the search results as a flow chart. Your search term is in the middle of the chart and boxes shoot off from it containing keywords located within the found documents. If you click on one of those boxes, another box appears with common terms found within documents containing both terms.

For instance, our search for "bicycling" found more than 31,000 pages. Radiating out from the box representing "bicycling" were boxes with bicycle-related keywords like "touring," "rides" and "helmet." We clicked on the "touring" box and one of the keywords that appeared within it was "racing," indicating that, within pages that included both the words "bicycling" and "touring," "racing" was a common word. We clicked on "racing" to add it to the search box, making our search term "bicycling" and "racing."

AltaVista also is known for its dense search results screens. You can choose between standard or detailed forms, but we found both results screens equally dense. It also can display results only with URLs and a few keywords from the top of the document. AltaVista's relevancy ranking was, in general, useful. However, it wasn't as consistently useful as the ranking found at the HotBot site.

In the last year, serious competitors have emerged. However, AltaVista still is an excellent site if you need to cast your net widely for specific information hidden somewhere on the Web.

Excite

Excite won't live up to its name if you're a hard-nosed Web researcher, but it does offer a lot to less demanding users. It has some interesting features for finding information and it provides a lot of flexibility when viewing the information you've found.

The most interesting of those features is ICE -- Intelligent Concept Extractions. This examines your search request and looks for synonymous and similar meanings and searches for them, too. For example, if you search for "youth" it also will search for "teenager."

Despite this technology, in all our simple and complex test searches, the number of pages Excite found was solidly in the middle of the pack. In addition, it found none of our obscure sites and had the worst performance in the group in those particular tests. Nor does Excite have the level of search options found at sites like HotBot or AltaVista. For instance, you can't search based on page modification date or the name of an embedded GIF file.

While these shortcomings mean that serious Web mavens probably won't be excited, average users will like its simple-to-use interface. For example, besides searching the Web, you can choose to search through news articles, city guides, or Excite's directory listings--you need only make your selection by clicking on a radio button.

Next to each found document is a "More Like This" button. If you click on it, Excite finds sites similar to the one found. This makes Excite an excellent choice for serendipitously searching for information by following links.

We also liked that you could sort the list by Web site. Typically, the search engines return multiple pages from the same site. Sorting the list by Web site is a good way to see which pages at individual sites answer your query. If you wish, you can jump from Excite's listing to the site's home page instead of to the specific page found by the search.
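Grouping results by site is simple once you split each URL into its parts. This sketch uses Python's standard urllib.parse module on a made-up result list; the example.com and example.org addresses and the function names are ours, and it does not describe how Excite implements the feature.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_site(urls):
    """Group result URLs under the host name of the site they belong to."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)

def home_page(url):
    """Return the address of the home page for the site that hosts url."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

# Hypothetical search results used only for illustration.
RESULTS = [
    "http://www.example.com/bikes/touring.html",
    "http://www.example.org/rides.html",
    "http://www.example.com/bikes/helmets.html",
]

for site, pages in group_by_site(RESULTS).items():
    print(site, pages)
print(home_page(RESULTS[0]))
```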

We found the relevancy rankings to be above average in reliability and the presentation of found sites is attractive and easy to read. At the bottom of each search results page are icons for applying the search term to other resources, including WebCrawler, which Excite now operates. On the downside, Excite extracts key words and phrases to summarize found sites; we often found those summaries difficult to understand.

In the last year, Excite has added extensive directory listings. That, combined with its easy-on-the-eyes and flexible search result screens, makes Excite an excellent choice for day-in, day-out Web browsers.

HotBot

HotBot is a relative newcomer to the search engine field and it is, indeed, a hot site. It was the most powerful searcher in our tests, it has a rich set of search capabilities that are easy to use, and it sports an attractive interface.

In the more-is-better world of the Web, HotBot claims to have indexed the full text of more than 50 million documents, which ties it for first place with Infoseek. But these are marketing claims; the proof is in the searching. When it got down to cases, HotBot found more documents in our searches than the other search engines. It also aced our test searches for obscure sites, finding, for instance, misspelled references in one obscure site to the name of another lightly trafficked site.

It isn't just its power that makes HotBot the best bet for searching. Its interface is a delight to use. It doesn't force you to learn Boolean syntax, for instance. Instead, you can create Boolean queries by selecting operators from drop-down lists and typing your terms.

Its advanced querying capabilities are quite strong. You can ask it to return only exact matches to your request or near matches. You can limit your search to specific domains (such as .com or .org) or to geographic locations, and you can search for pages containing embedded items like ActiveX controls, Java applets, images, or videos.

Our favorite search narrower, though, was HotBot's ability to search by the date the page last was modified. Searching for recently modified sites is a good way to avoid a long list of dead end pages. You also can search within the search results. Although the self-evident interface is unlikely to baffle you, HotBot's help system is well written and eschews technical jargon.
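Narrowing by modification date amounts to a simple filter over the result list. In this sketch the result records, their dates, and the cutoff are all invented for illustration; we are not describing HotBot's internals.

```python
from datetime import date

# Hypothetical search results with the date each page was last modified.
RESULTS = [
    {"url": "http://example.com/new-bikes.html", "modified": date(1997, 4, 12)},
    {"url": "http://example.com/old-catalog.html", "modified": date(1995, 11, 3)},
    {"url": "http://example.org/club-news.html", "modified": date(1997, 1, 20)},
]

def modified_since(results, cutoff):
    """Keep only the results last modified on or after the cutoff date."""
    return [result for result in results if result["modified"] >= cutoff]

recent = modified_since(RESULTS, date(1997, 1, 1))
print([result["url"] for result in recent])
```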

We also liked HotBot's on-screen layout. To the basic search screen, you can display modules for advanced capabilities, such as limiting searches to specific domains. After you create a search page that suits you, you can save it so it appears automatically the next time you check in. HotBot does this by saving a cookie to your hard drive.

HotBot's readable output and generally accurate relevancy ranking are another plus. The results are attractively placed on the page, with the page title at the top of the listing. HotBot does a good job of creating the summary without providing a gibberish explanation. You also can ask HotBot to display more terse descriptions.

The only thing missing from HotBot is directory-like services. It has a link to the Wired Source, developed by its corporate sibling, Wired magazine, which provides links to a handful of useful sites. Unlike multipurpose sites like Infoseek and Lycos, however, HotBot is solely for searching.

HotBot is attractive, powerful, and easy to use. That makes it an excellent choice for both experienced searchers and relative newcomers.

Infoseek

Infoseek tries to be the best in two worlds. It's a darned good search engine and it's a directory service that includes useful features such as personalized news and links to tools for finding phone numbers and businesses.

Whether it's the best search engine, as it claims, is questionable. However, in the last year, Infoseek underwent a significant remake and its search engine has greatly improved--it claims an index of 50 million pages.

The results for Infoseek Web searches were quite good overall. Our basic searches typically turned up about a third fewer sites than HotBot, but that still was ahead of most of the other search sites.

Plus, our tests confirmed Infoseek's claim that it does a better job than its competitors at removing defunct Web pages. For instance, most other search engines displayed a site that was updated three months before our test, then removed one month after that. Infoseek had removed that site from its index. This diligence undoubtedly explains, in part, why it didn't display as many found pages as other search engines.

Infoseek's advanced querying capabilities give you a running shot at finding specifically the pages you want. For instance, like HotBot, it lets you search within previous search results. Its advanced search capabilities enable you to search for words within URLs. For example, if you know the name of a company has "Johnson" in it, searching the full index for that name finds an overwhelming number of responses. Searching only for URLs containing the name Johnson provides a more manageable number of responses. In addition, as at several other sites, you can search for pages with specific links in them, such as pages with links to your home page.

Infoseek's searches are case sensitive and, notably, it offers plain language queries. For some queries like "what are the lyrics of 'My Funny Valentine'" it worked. But for other queries like "how many home runs did Ted Williams hit in 1955?," it didn't.

After searches, particularly after general searches that find a lot of pages, its Related Topics lists can be extremely handy. For instance, after searching for "bicycling," the related topics included "Cycling associations," "Bike racing," and "Street biking."

We were less satisfied, however, with the interface and output. We found most of Infoseek's screens to be over-busy and its search result screens are no exception--it often was hard to read the results with links to other Infoseek items crowding the screen. We also found its relevancy ranking often wasn't very useful. Nor does Infoseek offer many display options. You can show the summaries along with found documents or you can hide them and see only the URLs of found pages. Infoseek would benefit from an option to show brief summaries.

Strictly as a search engine, Infoseek is a strong contender. However, by combining its search engine with a thorough directory service, Infoseek makes a strong case for frequent visits as you navigate the Web.

Lycos

Like other Web search engines, Lycos claims it is "the most complete catalog of Web site addresses available." In reality, Lycos won't win any contests of Web searching strength. However, over the years Lycos has combined its competent search engine with a decent Web directory to make itself a helpful tool for finding what you want.

Lycos' test results consistently placed it toward the back of the pack. On our simple search for "bicycling," it turned up fewer than a quarter of the sites found by HotBot. Also, Lycos found only a handful of obscure sites.

While Lycos can't match the brute strength of HotBot, AltaVista, and Infoseek, it provides laudable searching finesse. Its flexible and easy-to-use custom search syntax enables you to find only exact matches to your search term or to use stemming, which would find "bicycles" and "bicycling" if you search for "bicycle." Uniquely, you also can have Lycos retrieve pages only if they contain the search term a specific number of times.
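Stemming and minimum-occurrence matching can be sketched together. The suffix-stripping rule below is a deliberately crude stand-in of our own devising, not the stemmer Lycos uses, and it will mishandle many English words. The sample sentence is also invented.

```python
import re

# Crude, hypothetical suffix list; tried in this order, longest first.
SUFFIXES = ("ing", "es", "s", "e")

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(word):
    """Strip one common suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def count_matches(text, term, use_stemming=True):
    """Count occurrences of the term, optionally matching on stems."""
    words = tokenize(text)
    if use_stemming:
        target = crude_stem(term)
        return sum(1 for word in words if crude_stem(word) == target)
    return words.count(term.lower())

def matches(text, term, min_count=1, use_stemming=True):
    """True if the term appears at least min_count times."""
    return count_matches(text, term, use_stemming) >= min_count

page = "Bicycling clubs rent bicycles; bring your own bicycle helmet"

print(count_matches(page, "bicycle"))                      # stems match
print(count_matches(page, "bicycle", use_stemming=False))  # exact only
print(matches(page, "bicycle", min_count=3))
```

With stemming on, "bicycle" also counts "bicycles" and "bicycling"; with it off, only the exact word counts.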

Another likable aspect to Lycos is its flexibility after you complete the search. You can click on a button in its "Get more on" box to have Lycos find images, audio clips or other multimedia items relevant to the search topic. Another button finds Lycos' "Top 5%" sites related to your search request. These site reviews by Lycos' editorial staff are included in the directory.

Lycos did a consistently good job of placing the most relevant pages near the top of the list. The help system is thorough, well written, and quite personable--it's arguably the handiest search engine help system. Besides being lucid, it provides many examples. On the downside, the search results screen provides too little information to accurately describe found sites.

Along the left side of the screen is a list of broad topics offered by Lycos--clicking on one of the topics provides a Yahoo-like listing of sites. It also has links for services such as stock prices and guides to many large cities.

If AltaVista is like a nitro-powered dragster, Lycos is more like a basic sedan, comfortable and appealing to a wide range of users. Put differently, HotBot, AltaVista and Infoseek are better tools for hardcore researchers but Lycos is a reasonable choice for those who simply want to get the most out of the Web.

WebCrawler

WebCrawler was the first large-scale Web search engine, a university project when the Web was in its infancy. Today, WebCrawler takes a less-is-more approach to searching the Web. The pages at this site, starting with the home page, have plenty of white space, making WebCrawler easy on the eyes.

WebCrawler also offers less when it comes to customization options. You can set it to display either the page titles of found sites or summaries, and you can determine whether it shows 10, 25 or 100 results at a time. You can determine whether it shows a little icon to show relevancy or whether it displays a percentage. Beyond that, though, there are few additional tweaking options.

After the jam-packed pages of sites like AltaVista, a little sparseness would be welcome--if WebCrawler provided more power. However, it consistently came in last in our search tests, often finding a fraction of the sites found by the high-end search engines like HotBot and AltaVista.

Among its more appealing features, however, is natural language querying. Like Infoseek's natural language querying tool, in our tests WebCrawler's worked well in some cases and not well in others.

One useful but uncommon querying capability that WebCrawler does support is proximity searching, which finds one word within a specified proximity of another. This enables you to, say, find Web pages in which "insurance" is located within three words of "fraud." This is a good way to find pages about specific topics that don't lend themselves to phrase searching.
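A proximity test is easy to express once a page is split into words. In this sketch, "within three words" is taken to mean the two word positions differ by three or less. That definition, the function name, and the sample sentences are our assumptions, and WebCrawler may count distance differently.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(text, first, second, max_distance):
    """True if first and second occur within max_distance words of each other."""
    words = tokenize(text)
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(a - b) <= max_distance
        for a in first_positions
        for b in second_positions
    )

close_page = "Investigators uncovered insurance and mail fraud last year"
far_page = ("Compare insurance quotes online. "
            "Separately, our newsletter covers wire fraud.")

print(within(close_page, "insurance", "fraud", 3))
print(within(far_page, "insurance", "fraud", 3))
```

Both pages contain both words, so a plain "and" search would return each of them; the proximity test keeps only the page where the words sit close together.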

WebCrawler's search result pages have some pleasant surprises. If you search for, say, "restaurants in Chicago," the first item in the list asks if you want to see a map of the city. Like Infoseek, WebCrawler provides a "Find Similar Pages" link for finding pages that are like the one you selected. However, we found it difficult to learn much about the contents of pages because the summaries often were garbled. Also, we didn't have a high level of confidence in WebCrawler's relevancy rankings.

Over the years, WebCrawler has added a modest directory to its offerings that some may find useful. A fun aspect of the directory service is its listing of the most popular sites that people jump to from WebCrawler so you can see what's popular.

Perhaps it shouldn't be surprising that WebCrawler hasn't kept up with the times. Once a university research project, it now is maintained by a competitor--Excite. WebCrawler is like a personable pioneer, rich in history. But it's a bit out of date.


URLs for the Search Engines

AltaVista - http://www.altavista.digital.com
Excite - http://www.excite.com
HotBot - http://www.hotbot.com
Infoseek - http://www.infoseek.com
Lycos - http://www.lycos.com
WebCrawler - http://www.webcrawler.com


David Haskin is a frequent contributor to Internet World.



Copyright 1997 Mecklermedia Corporation.
All Rights Reserved. Legal Notices.