22 - Search Engines Revisited


It's been almost two years since I wrote about Internet search engines. Cyberspace years are like dog years, and there have been many changes in the search engine world since then. In particular, several new search engines are battling to become the most-used search engine in the world. The old classics such as Google, AltaVista, HotBot, and so forth are still around and going strong, but the creators of the new challengers hope that their engines will completely replace the current standards.

Google is possibly the best search engine at this time and it seems to be the one to beat. Its index is the largest of any engine (about 1 billion Web pages) and it works perfectly well for most people and for most searches.

The basics of how search engines function are the same for both the old and new types. You enter keywords and the software checks millions of Web page titles, metatags, and pages of text looking for those keywords. The difference between engines is in how they organize the final results into a useful hierarchy. It does little good for a user to get thousands of hits back on a keyword because there's simply no practical way to check them all.
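To make the keyword-matching step concrete, here is a minimal sketch in Python. It is not how any real engine is built. The page records, the field names (title, meta, text), and the function names are all made up for illustration, and a real engine works from a prebuilt index of millions of pages rather than scanning each page at query time.

```python
import re

# Hypothetical page records (made up for this example).
PAGES = [
    {"url": "http://example.com/dogs",
     "title": "Dog Training Tips",
     "meta": "dogs, training, puppies",
     "text": "How to train a dog to sit and stay."},
    {"url": "http://example.com/cats",
     "title": "Cat Care",
     "meta": "cats, care",
     "text": "Feeding and grooming your cat."},
    {"url": "http://example.com/pets",
     "title": "Pet Store",
     "meta": "pets, dog, cat, supplies",
     "text": "We sell dog food and cat toys."},
]


def count_hits(page, keywords):
    """Count exact-word occurrences of the keywords in title, metatags, and text."""
    combined = " ".join([page["title"], page["meta"], page["text"]]).lower()
    words = re.findall(r"[a-z0-9]+", combined)
    return sum(words.count(k.lower()) for k in keywords)


def search(pages, keywords):
    """Return URLs of pages with at least one hit, most hits first."""
    scored = [(count_hits(p, keywords), p["url"]) for p in pages]
    hits = [(score, url) for score, url in scored if score > 0]
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [url for score, url in hits]


results = search(PAGES, ["dog", "training"])
```

In this toy version the ordering comes purely from counting keyword occurrences. The engines discussed in this column differ mainly in what they put in place of that simple count.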

So how does a search engine decide which sites to show you? Google ranks Web sites by popularity. That is, the more that people visit a particular page and the more links there are to that page from other Internet sites, the higher it is ranked in a Google search. The problem with this system is that links and hits may merely reflect how popular a site is rather than how useful it is. In contrast, many of the new engines use a ranking method based on something called "document clustering".

Here are some of the new engines and how they compare to Google in their internal workings.

Vivisimo, http://vivisimo.com/, is a meta search engine. It sends your keywords to other search engines and then runs an information clustering algorithm on the page summaries it receives from them. That is, Vivisimo only ranks the summaries given to it by those engines; it does not do any searching itself. Vivisimo's creators claim that this method makes searching faster than using regular engines.
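Vivisimo's actual clustering algorithm is not described here, so the following Python sketch is only a rough illustration of the general idea of grouping result summaries. It assumes a very crude rule: each summary goes into a group named after the word it shares with the most other summaries. The summaries, the stopword list, and the function names are all invented for this example.

```python
import re
from collections import Counter, defaultdict

# Small made-up stopword list; a real system would use a much longer one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is"}


def terms(summary, query_words):
    """Return the distinct words in a summary, minus stopwords and query words."""
    words = re.findall(r"[a-z]+", summary.lower())
    return {w for w in words if w not in STOPWORDS and w not in query_words}


def cluster_summaries(summaries, query):
    """Group summaries under the word each one shares with the most others."""
    query_words = set(query.lower().split())
    term_sets = [terms(s, query_words) for s in summaries]
    # How many summaries contain each word.
    doc_freq = Counter(t for ts in term_sets for t in ts)
    clusters = defaultdict(list)
    for summary, ts in zip(summaries, term_sets):
        shared = [t for t in ts if doc_freq[t] > 1]
        # Ties are broken alphabetically so the result is repeatable.
        label = max(shared, key=lambda t: (doc_freq[t], t)) if shared else "other"
        clusters[label].append(summary)
    return dict(clusters)


# Hypothetical summaries, as if returned by other engines for "jaguar".
SUMMARIES = [
    "Jaguar cars for sale: new and used",
    "Jaguar cars dealer locator",
    "The jaguar is a big cat of the Americas",
    "Jaguar cat habitat and diet",
    "Jaguar game console review",
]

groups = cluster_summaries(SUMMARIES, "jaguar")
```

With these made-up summaries, the car pages, the animal pages, and the leftover page end up in separate groups, which is the kind of result a clustering engine aims for when a keyword has several meanings.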

Lasoo, http://www.lasoo.com/, specializes in searching geographical locations. It's what's called a spatial search engine, and it's used to find businesses, services, or jobs in a particular place. You search for a particular business by entering a city and ZIP code or by using the "lasoo" drawing feature to draw a circle around a section of a map. Sites in that area are then ranked according to how physically close they are to the center of the search circle. Lasoo claims to cover areas all over the world, unlike many engines that only deal with US cities.
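Ranking by distance from the center of a circle is easy to sketch. The Python below is not Lasoo's code. It assumes each business is stored with a latitude and longitude, and it uses the standard haversine formula for distance over the Earth's surface. The business names and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_search(places, center, radius_km):
    """Return names of places inside the circle, nearest to the center first."""
    lat0, lon0 = center
    found = []
    for name, lat, lon in places:
        d = distance_km(lat0, lon0, lat, lon)
        if d <= radius_km:
            found.append((d, name))
    found.sort()
    return [name for d, name in found]


# Hypothetical businesses: (name, latitude, longitude).
PLACES = [
    ("Airport Diner", 40.50, -75.00),
    ("Main St Hardware", 40.00, -75.05),
    ("Corner Bakery", 40.01, -75.00),
]

nearby = spatial_search(PLACES, center=(40.0, -75.0), radius_km=10)
```

Here the diner falls outside the 10 km circle and is dropped, while the bakery is listed ahead of the hardware store because it is closer to the center.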

Wisenut, http://www.wisenut.com/, ranks sites based on the number of links that connect to them just as Google does, but it also uses a context-sensitive ranking algorithm that supposedly measures the relevancy of hits and then clusters the results into more useful groupings. Wisenut claims to have about 800 million sites indexed and is adding more daily.

CURE (Collaborative Use Research Engine), http://www.starpond.com/, is a subscription service that limits its searches to preselected academic subjects. You pay a fee to become a member and specify the field of knowledge that you're interested in, and CURE searches only those academic sources that contain such information. Sites are ranked by how often they are used by others in the same field.

Teoma, http://www.teoma.com/, ranks sites based on how many links they have from other related sites. This differs from Google, which only counts the total number of links from other sites and not whether those sites are similar in subject. Teoma also presents its results in three different groupings: normal results, which list the most authoritative sites; results by topic; and experts links. Experts links are gateways supposedly created for Teoma by experts in a particular subject. Unfortunately, Teoma is still building its database and only has about 100 million pages indexed at this time. As its database grows larger, its overall usefulness should increase proportionately.
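The difference between counting all links and counting only links from related sites can be shown with a small Python sketch. This is a simplified illustration of the two ideas as described above, not the real ranking code of either engine. The site names, the topic labels, and the link list are all hypothetical.

```python
from collections import Counter

# Hypothetical topic label for each site.
TOPICS = {
    "chess-club": "chess",
    "chess-news": "chess",
    "chess-openings": "chess",
    "chess-shop": "chess",
    "portal": "general",
    "jokes": "humor",
    "celebrity": "gossip",
}

# Hypothetical links: (linking site, linked-to site).
LINKS = [
    ("portal", "chess-shop"),
    ("jokes", "chess-shop"),
    ("celebrity", "chess-shop"),
    ("chess-club", "chess-openings"),
    ("chess-news", "chess-openings"),
]


def total_link_counts(links):
    """Count every incoming link, whatever the subject of the linking site."""
    return Counter(target for source, target in links)


def related_link_counts(links, topics):
    """Count only incoming links from sites with the same topic as the target."""
    return Counter(target for source, target in links
                   if topics[source] == topics[target])


def rank(counts, candidates):
    """Order candidate sites by link count, highest first."""
    return sorted(candidates, key=lambda site: (-counts[site], site))


candidates = ["chess-openings", "chess-shop"]
by_total = rank(total_link_counts(LINKS), candidates)
by_related = rank(related_link_counts(LINKS, TOPICS), candidates)
```

In this made-up example the shop wins on total links because three unrelated sites point to it, but the openings page wins once only links from other chess sites are counted.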

Are any of these new engines really better than the old ones? The only way to find out is to try them for yourself. Select a subject that you've had difficulty with in past searches and see if one of these can do better.

Regardless of which engines you prefer, your choice of keywords is the most important part of the search. Vague words or phrases can produce hits in the hundreds of thousands. On the other hand, well-chosen, precise keywords can place the site you need in the top ten of the results list.


First published October 2001
Copyright 2001
Fred Askew