Simultaneous search engines

Comments from the author of CUSI:

People have often suggested that we extend CUSI to send a query to multiple search engines simultaneously. This would save the user from having to try them one after the other.

While possible (indeed such services now exist), we have always rather disapproved of the idea, for the following reasons:

Load on the central service
The central service has to contact a number of services and request documents from each of them. This makes the central site quite busy, whereas with CUSI the central site does very little: it simply refers the user's browser to the relevant service.
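To make the contrast concrete, here is a minimal sketch of the referral idea: the central site only builds the engine's query URL and answers with an HTTP 302 redirect, so the browser, not the central site, fetches the results. The engine names and URL templates below are hypothetical placeholders, not CUSI's actual configuration.

```python
from typing import Dict, Tuple
from urllib.parse import quote_plus

# Hypothetical query-URL templates; real engines each define their own.
ENGINES = {
    "example-search": "https://search.example.com/find?q={query}",
    "other-search": "https://other.example.org/cgi-bin/search?terms={query}",
}


def refer(engine: str, query: str) -> Tuple[int, Dict[str, str]]:
    """Return an HTTP status and headers that send the browser to the engine.

    The central service never contacts the engine itself; it only tells
    the browser where to go, so almost no work happens centrally.
    """
    url = ENGINES[engine].format(query=quote_plus(query))
    return 302, {"Location": url}


status, headers = refer("example-search", "robots exclusion")
print(status, headers["Location"])
```

A simultaneous service would instead have to open a connection to every selected engine, download each result page, and merge them before replying.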
Load on the network
With CUSI, because the central site only refers browsers, the information is likely to travel less network distance. Take, for example, the case where a service in, say, Finland handles a request from a US user to a US database. With CUSI, the search results are sent from the service in the US to the client in the US, over the US network. With the simultaneous approach, the search results are first shipped to Finland and then shipped back to the US.
Load on the remote services
Chances are that one of the services contacted will return a relevant hit. With CUSI, this means the user can use that hit without querying further services. With simultaneous requests this is not the case: the remaining services are searched anyway. WWW search engine lookups tend not to be cheap (unlike, e.g., DNS lookups).
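The difference in load on the remote services can be sketched by counting engine queries. This is a simplified model under stated assumptions: each "engine" is a hypothetical function returning a list of hits, the sequential user stops at the first engine with any hit, and the simultaneous service queries every selected engine in parallel threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical engine model: takes a query, returns a list of hits.
Engine = Callable[[str], List[str]]


def sequential_search(engines: List[Engine], query: str) -> Tuple[List[str], int]:
    """CUSI-style use: try one engine at a time, stop at the first hit."""
    calls = 0
    for engine in engines:
        calls += 1
        hits = engine(query)
        if hits:
            return hits, calls
    return [], calls


def simultaneous_search(engines: List[Engine], query: str) -> Tuple[List[str], int]:
    """Meta-search style: query every engine, whether needed or not."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        results = list(pool.map(lambda engine: engine(query), engines))
    merged = [hit for hits in results for hit in hits]
    return merged, len(engines)


engines = [lambda q: [], lambda q: ["hit-from-b"], lambda q: ["hit-from-c"]]
print(sequential_search(engines, "cusi"))
print(simultaneous_search(engines, "cusi"))
```

In this toy setup the sequential approach stops after two engine queries, while the simultaneous one always issues all three, including the one whose results the user did not need.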
Relevance of selected services
The user can maximize the number of hits by selecting many services (which, with multi-threading, may not even incur a performance penalty). So it is likely that more services are selected than strictly required (this is similar to the previous point).
Presentation of Results
With CUSI, the search engine resolving the query determines exactly how its information is displayed. With the simultaneous approach, the central service determines how the results are shown. This is likely to lead to inconsistently displayed results, might lead to broken links and illegal HTML, and could in devious cases even be used to hide the original service, so that the central service gets the credit :-)
Log skewing
Because a simultaneous search engine relays queries on behalf of users, the access logs of the search engines it contacts are skewed. One doesn't normally know whether an access from the host in question came from users at that host or from the simultaneous search engine. At the same time, this hides useful information about the users actually issuing the queries, such as domain names, user names, Referer lines, etc.
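The log-skewing effect can be illustrated with a small model of what a remote engine records. This is a sketch, not real server code: the `Request` type, the relay's behaviour (forwarding only its own User-Agent, not the user's headers), and the addresses are all hypothetical, chosen from documentation address ranges.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Request:
    remote_addr: str          # address the engine sees the request coming from
    headers: Dict[str, str]   # HTTP headers that reach the engine


def direct_request(user_addr: str, user_headers: Dict[str, str]) -> Request:
    """CUSI referral: the user's own browser contacts the engine."""
    return Request(user_addr, dict(user_headers))


def relayed_request(central_addr: str, user_headers: Dict[str, str]) -> Request:
    """Hypothetical relay: the central host connects and sends its own headers."""
    return Request(central_addr, {"User-Agent": "hypothetical-meta-search/1.0"})


def log_line(req: Request) -> str:
    """A simplified access-log entry: client address plus Referer (or '-')."""
    referer = req.headers.get("Referer", "-")
    return f'{req.remote_addr} "{referer}"'


user_headers = {"Referer": "http://www.example.fi/links.html"}
print(log_line(direct_request("192.0.2.10", user_headers)))
print(log_line(relayed_request("198.51.100.5", user_headers)))
```

In the relayed case the engine's log shows only the central host, and the user's address and Referer never reach it.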

Martijn Koster

For questions or comments regarding this service, contact webmaster@emnet.co.uk