FTP Mirror

FTP Mirror is a program that mirrors the files of a development site onto an FTP site, using FTP for the transfer.

The distribution is in ftp_mirror-1x0.tgz.

The algorithm

We assume two sites: the development site, which contains the files of interest, and the FTP site, where these files are to be stored.

The top-level directory of the development site must contain the file ftp.conf. This file specifies which FTP site is to be used, which files are to be copied, and so on.

Run the command ftp_mirror. This command reads ftp.conf and updates the specified FTP site. The resulting state of the site is stored in a file called ftp_cache.bin. Whenever any change is made to the FTP site, the cache is updated as well, so it always describes the exact state of the site.

Configuration

Create a file called ftp.conf in the top-level directory of the development site. Here is an example:


{host, "ftp.oocities.com"}.
{user, "some_user_name"}.
{password, "the password"}.
{max_space_allowed, 10000000}.
{root, "dir1"}.
{include, [".html", ".jpg", ".gif", ".tgz"]}.
{exclude_dirs,["tmp", "dustbin"]}.

The parameters have the following meaning:

host               The host name of the FTP site.
user               The user account name on the FTP site.
password           The password for that account.
root               The root directory on the FTP site under which everything is stored.
max_space_allowed  The maximum space allowed on the FTP site, in bytes.
include            File extensions to be sent to the FTP site. Files with any other extension are not copied.
exclude_dirs       A list of directories that are not to be copied to the FTP site.
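
The effect of include and exclude_dirs can be illustrated with a short Python sketch. This is not ftp_mirror's own code. The function name select_files is hypothetical, and the sketch assumes two things the description above does not state: that a directory is excluded when its name matches at any depth, and that extensions are compared case-sensitively.

```python
import os


def select_files(top, include, exclude_dirs):
    """Return sorted relative paths under `top` that would be copied.

    Assumptions (not confirmed by the documentation): a directory is
    skipped if its name appears in exclude_dirs at any depth, and a file
    is kept only if its name ends with one of the include extensions.
    """
    selected = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune excluded directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(tuple(include)):
                rel = os.path.relpath(os.path.join(dirpath, name), top)
                # Use forward slashes, as on the FTP site.
                selected.append(rel.replace(os.sep, "/"))
    return sorted(selected)
```

With the example configuration, a call such as select_files(".", [".html", ".jpg", ".gif", ".tgz"], ["tmp", "dustbin"]) would list the files under the current directory that qualify for copying.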

Implementation notes

The algorithm involves computing the following items and then applying the changes:

  1. Old files to be deleted
    These are files on the ftp site which no longer exist on the development site.
  2. Directories to be deleted
    These are directories that exist on the ftp site but not on the development site.
  3. Directories to be added
    These are directories that exist on the development site but not on the ftp site.
  4. New files to be added
    These are completely new files, which have never been transferred to the remote site.
  5. Old files to be refreshed
    These are files which exist in the same place on the FTP site and on the development site, where the file on the development site has changed since it was last copied to the FTP site.
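
The five items above amount to set differences between the two site states. The following Python sketch shows one way to compute them. It is an illustration only, not the program's implementation: the function plan_changes, its argument names, and the representation of each site as sets of relative paths are all hypothetical. The ordering of directories (parents created first, children removed first) is likewise an assumption about what an FTP server typically requires.

```python
def plan_changes(local_files, local_dirs, remote_files, remote_dirs, changed):
    """Compute the five work items from the two site states.

    local_files / remote_files: sets of relative file paths.
    local_dirs / remote_dirs: sets of relative directory paths.
    changed: function(path) -> bool, True if the local file differs
             from the copy last sent to the FTP site.
    """
    return {
        # 1. Files on the FTP site that no longer exist locally.
        "delete_files": sorted(remote_files - local_files),
        # 2. Directories only on the FTP site, deepest first.
        "delete_dirs": sorted(remote_dirs - local_dirs,
                              key=lambda d: (-d.count("/"), d)),
        # 3. Directories only on the development site, shallowest first.
        "add_dirs": sorted(local_dirs - remote_dirs,
                           key=lambda d: (d.count("/"), d)),
        # 4. Files that have never been transferred.
        "add_files": sorted(local_files - remote_files),
        # 5. Files present on both sides whose local copy has changed.
        "refresh_files": sorted(p for p in local_files & remote_files
                                if changed(p)),
    }
```

In this sketch the remote sets would come from the cache rather than from listing the FTP site, since the cache is described as holding the exact state of the site.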

To decide whether a file has changed, we first check its last-modified time. If the last-modified time is the same as before, we perform no further checks and treat the file as unchanged. If it differs, we compute the MD5 checksum of the file, and only if the checksum has also changed do we assume the file has changed. We test the last-modified time for equality only, never for earlier or later, since we make no assumptions about the accuracy of the clock.
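
The two-step test can be sketched in Python as follows. The sketch assumes that the previous last-modified time and MD5 checksum of each file are available, for instance from the cache. The layout of ftp_cache.bin is not described here, so the parameters cached_mtime and cached_md5 and both function names are hypothetical.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, cached_mtime, cached_md5):
    """Apply the two-step test: modification time first, then MD5."""
    if os.path.getmtime(path) == cached_mtime:
        # Same timestamp: treated as unchanged, no checksum is computed.
        return False
    # Timestamp differs (earlier or later): fall back to the checksum.
    return md5_of(path) != cached_md5
```

Checking the timestamp first keeps the common case cheap, because the checksum is computed only for files whose timestamp differs. The cost is that an edit which leaves the timestamp unchanged goes unnoticed.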