Comments from the author of CUSI:
People have often suggested that we extend CUSI to send a query to multiple search
engines simultaneously, which would save the user from trying one after the other.
While this is possible (indeed such services now exist),
we have always rather disapproved of the idea, for the following reasons:
- Load on the central service
The central service has to contact
a number of services and request documents from them. This keeps the
central site quite busy, whereas with CUSI the central site
does very little: it simply refers the user's browser to the
relevant service.
- Load on the network
Because CUSI only refers
browsers, the information is likely to travel less network distance.
Take, for example, the case where a Finnish service handles a
request for a US user, to a US database. With CUSI the search results are
sent from the service in the US, to the client in the US, over the US
network. With the simultaneous approach, the search results are first
shipped to Finland, and then shipped back to the US.
- Load on the remote services
Chances are that one of the
services contacted will return a relevant hit. With CUSI this means the user
can use that hit without querying further services. With simultaneous
requests this is not the case: the other services are searched regardless. WWW
search engine lookups tend not to be cheap (unlike, e.g., DNS lookups).
- Relevance of selected services
The user can guarantee the
maximum number of hits by selecting many services (which, with
multi-threading, may not even incur a performance penalty). So it is likely
that more services are selected than strictly required (this is similar to
the previous point).
- Presentation of results
With CUSI, the search engine resolving
the query determines exactly how its information is displayed. With the
simultaneous approach the central service determines how the result is
shown. This is likely to lead to inconsistently displayed results, might
lead to broken links and illegal HTML, and could in devious cases even be
used to hide the original service, so the central service gets the credit
:-)
- Log skewing
Because a simultaneous search engine chains
queries on behalf of users, the access logs of the search engines it
contacts are skewed. One doesn't normally know if the access from the host
in question was from users at that host, or from the simultaneous search
engine. At the same time this hides useful information about the users
actually issuing the queries, such as domain names, user names, Referer
lines etc.
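
The referral approach described in the points above can be sketched roughly as follows. This is a minimal illustration, assuming the referral is done with an HTTP redirect; the engine names, URLs and parameter names are hypothetical and are not taken from CUSI itself. The point is that the central site only builds a URL and returns it, and never fetches or reformats the results:

```python
from urllib.parse import urlencode

# Hypothetical engine table: names, URLs and parameter names are
# illustrative assumptions, not CUSI's actual configuration.
ENGINES = {
    "engine-a": ("http://search-a.example/query", "q"),
    "engine-b": ("http://search-b.example/find", "terms"),
}


def referral_response(engine, query):
    """Build a redirect that sends the browser straight to the chosen engine."""
    base_url, param = ENGINES[engine]
    location = f"{base_url}?{urlencode({param: query})}"
    # The central site emits only a status and a header. The results
    # travel directly from the engine to the user's browser.
    return 302, {"Location": location}


status, headers = referral_response("engine-a", "finnish weather")
print(status, headers["Location"])
# prints: 302 http://search-a.example/query?q=finnish+weather
```

Under this sketch the search engine sees the request coming from the user's own host and renders its own results, whereas a simultaneous service would have to fetch and re-present each engine's output itself.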