Jef Poskanzer's original WebCopy Java program was wonderful, but it would not follow pages nested in FRAMESET FRAMEs, and it would not fetch images used as page BODY BACKGROUNDs. So I fixed it so it would, by extending his hand-written HTML-parsing finite state automaton. Along the way I simplified his HtmlObserver and HtmlEditObserver interfaces, making them easier to extend. The zipfile also contains all the Acme Java support classes necessary to compile and run WebCopy. Enjoy!
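To make the two added link types concrete, here is a minimal sketch that pulls FRAME SRC and BODY BACKGROUND values out of a page. It uses a regular expression rather than the hand-written finite state automaton that WebCopy actually uses, so treat it as an illustration only. The class name FrameAndBackgroundLinks and the method extract are hypothetical and are not part of the Acme classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; WebCopy itself uses a finite state automaton, not a regex.
final class FrameAndBackgroundLinks {
    // Matches <FRAME ... SRC=value> and <BODY ... BACKGROUND=value>, ignoring case.
    // The value may be double-quoted (group 1), single-quoted (group 2) or bare (group 3).
    private static final Pattern LINK = Pattern.compile(
        "<(?:frame\\b[^>]*?\\bsrc|body\\b[^>]*?\\bbackground)\\s*=\\s*"
            + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the attribute values in document order.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                    break;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "<FRAMESET COLS=\"20%,80%\">"
            + "<FRAME SRC=\"menu.html\"><FRAME NAME=main SRC=main.html>"
            + "</FRAMESET>";
        System.out.println(extract(page)); // prints [menu.html, main.html]
    }
}
```

A regex like this is easily fooled by comments, scripts, or attribute values that happen to contain "src=", which is one reason a real parser such as the automaton is the better tool.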
Given one or more URLs as arguments, enumerates the files reachable at or below those URLs and copies them to the local disk, creating subdirectories as necessary.
Options:
Sample run:
    % mkdir flow
    % cd flow
    % WebCopy -v http://www.acme.com/jef/flow/
    Copying http://www.acme.com/jef/flow/ to index.html
    Copying http://www.acme.com/jef/flow/troublemaker_small.jpg to troublemaker_small.jpg
    Copying http://www.acme.com/jef/flow/cdec.html to cdec.html
    Copying http://www.acme.com/jef/flow/snapshots/ to snapshots/index.html
    Copying http://www.acme.com/jef/flow/snapshots/16may96.html to snapshots/16may96.html
    Copying http://www.acme.com/jef/flow/snapshots/16may96_namerican.gif to snapshots/16may96_namerican.gif
    % ls -l
    -rw-r--r--  1 jef  39759 Jul  5 14:40 cdec.html
    -rw-r--r--  1 jef    993 Jul  5 14:40 index.html
    drwxr-x--x  2 jef    512 Jul  5 14:40 snapshots
    -rw-r--r--  1 jef   3107 Jul  5 14:40 troublemaker_small.jpg
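The URL-to-file mapping visible in the sample run can be sketched roughly as follows. This is my reading of the output above, not the actual WebCopy source: a URL counts as "at or below" the starting URL if it shares the scheme and host and its path starts with the starting URL's directory, and a URL that ends in a slash is saved as index.html. The class LocalPathSketch and its methods are hypothetical names, and the exact string comparison of scheme and host is a simplification.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

// Hypothetical sketch of the mapping seen in the sample run; not the WebCopy source.
final class LocalPathSketch {
    // True if target has the same scheme and authority as base and lies under base's directory.
    static boolean isAtOrBelow(URI base, URI target) {
        return Objects.equals(base.getScheme(), target.getScheme())
            && Objects.equals(base.getAuthority(), target.getAuthority())
            && target.getPath() != null
            && target.getPath().startsWith(directoryOf(base.getPath()));
    }

    // Path of target relative to base's directory; directory URLs become index.html.
    // Assumes isAtOrBelow(base, target) is true.
    static String localPath(URI base, URI target) {
        String rel = target.getPath().substring(directoryOf(base.getPath()).length());
        if (rel.isEmpty() || rel.endsWith("/")) {
            rel += "index.html";
        }
        return rel;
    }

    // Creates any missing subdirectories for rel under root and returns the file path.
    static Path prepare(Path root, String rel) throws IOException {
        Path file = root.resolve(rel);
        if (file.getParent() != null) {
            Files.createDirectories(file.getParent());
        }
        return file;
    }

    // Everything up to and including the last slash of a URL path.
    private static String directoryOf(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.acme.com/jef/flow/");
        System.out.println(localPath(base, base)); // prints index.html
        System.out.println(localPath(base,
            URI.create("http://www.acme.com/jef/flow/snapshots/"))); // prints snapshots/index.html
        System.out.println(localPath(base,
            URI.create("http://www.acme.com/jef/flow/snapshots/16may96.html"))); // prints snapshots/16may96.html
    }
}
```

Calling prepare(Paths.get("."), "snapshots/16may96.html") would create the snapshots subdirectory before the file is written, which corresponds to the "creating subdirectories as necessary" behavior.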