Drawbacks of Web Robots
- They consume bandwidth as they crawl the Web and retrieve pages
- WebBots can stress or crash servers with "rapid-fire" retrieval of pages
(requesting many pages from the same server in a short period of time)
- Only known pages are harvested: a WebBot works by starting from a known
server and "jumping" to the servers that server links to. This leaves an
estimated 5% of Web pages unreachable, because no known page links to them
- WebBots don't do their work in real time: one to several days can pass
between the moment a WebBot discovers a page and the moment that page can
be retrieved through a search engine working in concert with the WebBot.
- WebBots can be turned away from sites
(see the following page, The Standard for Robot Exclusion for details)
- Rogue bots are programmed to deliberately crash servers by continuously
requesting large numbers of pages; WebBots can also be programmed to copy
pages or entire sites so they can be republished elsewhere as the work
of someone other than the true author
- WebBots can be programmed to cast votes in online polls (one recently
threw off a Web poll run by People Magazine)
- Hits made by WebBots can inflate page hit statistics: when a WebBot
visits a site, the site's hit rate spikes, and a Webmaster may not
realize those hits came from an occasional WebBot visit rather than a
genuine trend. Advertisers, aware of this, are sometimes reluctant to use
page hit counts as the pricing criterion for Web ads.
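A well-behaved bot can avoid two of the drawbacks above at once: it can honor the Standard for Robot Exclusion and pause between requests so it never "rapid fires" a server. The sketch below uses Python's standard urllib.robotparser; the site URL, paths, robots.txt rules, and 2-second delay are all illustrative (the rules are parsed from an inline example rather than fetched over the network).

```python
# Minimal sketch of a "polite" WebBot, assuming only the Python standard
# library. A real bot would call rp.set_url(...) and rp.read() to fetch
# the live robots.txt; here the rules are supplied inline for illustration.
import time
import urllib.robotparser

ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT)  # load the exclusion rules

pages = ["https://example.com/index.html",
         "https://example.com/private/draft.html"]

allowed = []
for url in pages:
    if rp.can_fetch("MyBot", url):  # check the exclusion rules per URL
        allowed.append(url)          # real code would download the page here
        time.sleep(2)                # throttle: one request every 2 s
print(allowed)
```

Only /index.html survives the check; /private/draft.html is skipped because the example robots.txt disallows it, and the sleep keeps the bot from hammering one server.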
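The inflated-statistics point can be made concrete: if a Webmaster knows the User-Agent strings that robots send, robot hits can be filtered out of the access log before computing hit counts. The log entries and bot names below are made up for illustration.

```python
# Sketch of separating WebBot hits from human hits in a (hypothetical)
# access log, using User-Agent prefixes. Entries and bot names are made up.
LOG = [
    ("/index.html", "Mozilla/4.0"),      # a human browser
    ("/index.html", "WebCrawler/3.0"),   # a robot
    ("/index.html", "WebCrawler/3.0"),
    ("/index.html", "Mozilla/4.0"),
    ("/index.html", "Scooter/2.0"),      # another robot
]
KNOWN_BOTS = ("WebCrawler", "Scooter")   # User-Agent prefixes of known robots

# keep only hits whose User-Agent does not match a known robot
human_hits = [ua for _, ua in LOG if not ua.startswith(KNOWN_BOTS)]
print(len(LOG), "raw hits,", len(human_hits), "human hits")
```

Here 5 raw hits shrink to 2 human hits; quoting the raw figure to an advertiser would more than double the page's apparent traffic.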