How Search Engines Work


 

Search Engine Computers, Databases and Ranking Algorithms


We begin our intermediate education with a description of search engine servers, followed by search engine databases, and finally general information about search engine ranking algorithms.

Search Engine Servers

Search engines, like all other web sites, are hosted on high-speed computers called web servers. These servers are dedicated to providing search services 24 hours a day. They are connected to the Internet backbone (the high-speed infrastructure that carries WWW traffic) via extremely fast, expensive leased digital lines called T3 lines, each of which carries about 45 megabits per second. Most of Yahoo's servers, for example, are located in Santa Clara, California.

Search Engine Databases

Before search engines can function, they need to have a collection of information (a database, also called an index) to search. No search engine actually goes out onto the WWW to look for matches when a query is entered. Think about it: web sites sometimes go offline for maintenance, and connection speeds vary depending on how busy the web is at any given time. If a search engine were to initiate its search of the WWW when its visitor clicked the "Search" button, its search would take weeks, not seconds!

The solution to this problem is the creation and maintenance of an enormous database. When a surfer performs a search, the engine searches its database, not the WWW itself. Ideally, these databases are a perfect, complete reflection of the WWW. Because thousands of web pages are added, deleted or changed every minute of every day, no search engine database meets this lofty goal. If it did, it would simply be a copy of the whole WWW! Realistically, each database is at least a large and varied sampling of quality web sites. At most, these collections cover an impressive, frequently updated and detailed majority of the WWW.
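The idea of searching a prebuilt database instead of the live WWW can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page URLs and text are made up, and the names CRAWLED_PAGES, build_index and lookup are hypothetical, not taken from any real search engine.

```python
from collections import defaultdict

# Hypothetical pages collected ahead of time; the URLs and text are invented.
CRAWLED_PAGES = {
    "http://example.com/karate": "Karate classes in Phoenix for all ages",
    "http://example.com/pizza": "Best pizza delivery in Phoenix",
    "http://example.com/judo": "Judo and other martial arts schools",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose stored text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, word):
    """Answer a one-word query from the index alone, with no network access."""
    return sorted(index.get(word.lower(), set()))

# The index is built once, before any visitor arrives.
INDEX = build_index(CRAWLED_PAGES)
print(lookup(INDEX, "phoenix"))
```

Because the lookup touches only the stored index, it does not matter whether the original web sites are slow or offline at the moment of the search.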

Once a database is in place, search engines keep much of it in the memory (RAM) of their computers, not just on hard drives or other mechanical storage media. Searches in memory are much faster than searches on hard drives because memory is read electronically, while a hard drive must physically move a read head and wait for a spinning platter before it can return data. In this manner, a search engine can search through its database of millions of web site summaries within a few seconds, delivering very fast results. Most household PCs these days have around 32 megabytes (MB, millions of bytes) of memory. Computers used as web servers for search engines have gigabytes (GB, billions of bytes) of memory, which lets them keep much of their huge databases in quickly searchable electronic memory.

Search Engine Ranking Algorithms

After the database has been created and placed in the search engine computer's memory, the search engine is finally ready to perform searches and deliver results. Only now does another component come into play: the ranking algorithm. All search engines, including directories, score the relevancy of web pages with these mathematical procedures. Their purpose is to deliver links to the web pages most relevant to each search phrase. These automatic mechanisms are, rightfully, a source of great pride and revenue for their inventors.

When a surfer types in a search phrase on a search engine and hits the "Search" button, the algorithm jumps into action. Say, for example, that a surfer types in "martial arts in phoenix" as their search phrase. The algorithm then looks at the first database entry in its memory, searching for occurrences of the entire search phrase, or for occurrences of the individual key words "martial", "arts" or "phoenix" (extremely common words like "in" are usually ignored).
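As a rough sketch of this matching step, the Python below splits a search phrase into key words, drops very common words, and checks one database entry for the whole phrase and for each key word. The stop-word list and the function names parse_query and find_matches are assumptions made for illustration; real engines use their own lists and handle punctuation, which this sketch ignores.

```python
# Assumed stop-word list for illustration only.
STOP_WORDS = {"in", "the", "a", "an", "of", "and", "or", "to"}

def parse_query(phrase):
    """Return the normalized full phrase and its key words (stop words removed)."""
    words = phrase.lower().split()
    key_words = [w for w in words if w not in STOP_WORDS]
    return " ".join(words), key_words

def find_matches(text, phrase):
    """Report whether the whole phrase occurs and how often each key word occurs."""
    full_phrase, key_words = parse_query(phrase)
    text_words = text.lower().split()
    # Pad with spaces so the phrase only matches on whole-word boundaries.
    padded_text = " " + " ".join(text_words) + " "
    return {
        "phrase_found": (" " + full_phrase + " ") in padded_text,
        "key_word_counts": {w: text_words.count(w) for w in key_words},
    }

print(parse_query("martial arts in phoenix"))
print(find_matches("Martial arts in Phoenix and Tempe", "martial arts in phoenix"))
```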

Each ranking algorithm assigns different weights to different occurrences of the key words, depending on where and in what form these matches are found (more on this below). Taking all these factors into account, the algorithm generates a relevancy score for the first web page in its memory. It then does the same for the second web page, the third, and so on through the millionth. Finally, the relevancy scores are sorted from most relevant to least, and the corresponding web pages are listed in this order with informative summary information from the database. Voilà! The surfer (hopefully) gets the results he or she was looking for.
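The score-then-sort procedure can be sketched as follows. The weights here are invented purely for illustration (a title match counts three times as much as a body match), and the page records and the names FIELD_WEIGHTS, score_page and rank are hypothetical; actual ranking algorithms are proprietary and far more elaborate.

```python
# Hypothetical weights: a key word in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

# Invented database entries for the example.
PAGES = [
    {"url": "a.example", "title": "Phoenix pizza", "body": "pizza delivery in phoenix"},
    {"url": "b.example", "title": "Martial arts in Phoenix", "body": "martial arts classes for kids"},
    {"url": "c.example", "title": "Gardening tips", "body": "how to grow roses"},
]

def score_page(page, key_words):
    """Add up weighted key word occurrences across the page's fields."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        for key_word in key_words:
            score += weight * words.count(key_word)
    return score

def rank(pages, key_words):
    """Score every page, drop non-matches, and sort from most to least relevant."""
    scored = [(score_page(page, key_words), page["url"]) for page in pages]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by URL
    return scored

for score, url in rank(PAGES, ["martial", "arts", "phoenix"]):
    print(score, url)
```

Changing the numbers in FIELD_WEIGHTS changes the order of the results, which is one way to picture why different engines return different rankings for the same search phrase.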

Although all search engines incorporate the basic components described above, the boundaries among these components are not rigid. The designs of a search engine's database and of its ranking algorithm go hand in hand, and usually it's difficult to discern where one ends and the other begins. For example, some search engines might calculate and store ranking information for obvious web page themes during the creation of their databases, in order to speed up the job of the ranking algorithm. Major functional differences are also apparent between deep search engines and directories, beginning with their distinct approaches to building databases.
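One way to picture this blurring of database and algorithm is an index that stores a weighted score for every word on every page at build time, so that a search only has to add up stored numbers. The sketch below assumes the same invented title and body weights as a simple illustration; build_scored_index and search are hypothetical names, and this is not a description of how any particular engine works.

```python
from collections import defaultdict

def build_scored_index(pages, title_weight=3.0, body_weight=1.0):
    """While building the index, store a per-word score for every page."""
    index = defaultdict(dict)
    for url, page in pages.items():
        for field, weight in (("title", title_weight), ("body", body_weight)):
            for word in page[field].lower().split():
                index[word][url] = index[word].get(url, 0.0) + weight
    return index

def search(index, key_words):
    """Sum the stored scores; no page text is re-read at query time."""
    totals = defaultdict(float)
    for key_word in key_words:
        for url, stored_score in index.get(key_word, {}).items():
            totals[url] += stored_score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Invented pages for the example.
SAMPLE_PAGES = {
    "a.example": {"title": "Phoenix pizza", "body": "pizza in phoenix"},
    "b.example": {"title": "Martial arts in Phoenix", "body": "martial arts classes"},
}
SCORED_INDEX = build_scored_index(SAMPLE_PAGES)
print(search(SCORED_INDEX, ["martial", "arts", "phoenix"]))
```

The trade-off in this design is that the weights are fixed when the database is built, so changing them means rebuilding the stored scores.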
