Server-side programs: calculation

Which algorithm was used?

In his 2000 thesis on the same subject, Max Ludecke did a pretty good job of comparing Floyd’s and Dijkstra’s algorithm and came down heavily on Dijkstra’s side. Since the author already familiar with Dijkstra’s, an early decision was made to use it. OK, but how does it work? A fully worked example is given in appendix 8, but a quick precis goes something like this:-

Step 1: Generate a list of nodes and routes leading from each node.
Step 2: Mark your "from" node as "known" and make the distance to it zero.
Step 3: Add the distance to the "known" node to the lengths of the routes leading from it. Make the result the distance to the neighbour.
Step 4: Go through the list of towns. Find the one with the minimum distance that isn’t marked as "known"
Step 5: Mark that town as "known" and go to step 3.
Step 6: Iterate steps 3-5 until your destination is marked as "known".

Modifications to the algorithm for this thesis

Now, one of the ambitions for this thesis was to produce a routefinder that would work regardless of the type of route considered: ie one that could cope with the road and rail network combined, indeed one that would make the distinction irrelevant. Much to the author’s irritation, Max had independently arrived at the same conclusion as he had: that the algorithm shouldn’t be looking at distances, it should be looking at times: ie the distances should be measured in seconds, not miles. However, a further modification was needed: that routes should also have a date of birth and a date of death. The reason for this has to be explained carefully, and to do that requires a bit more explanation about another bottleneck: the detail generation phase.

Bottleneck: the detail generation phase.

To use Dijkstra’s, you first have to generate a list of nodes and the routes leading from them. Consider the situation where a train arrives at a station. When it arrives, a "route" from that station springs into existence and will remain in existence until the train leaves (once it’s left of course, the route ceases to exist because you can’t get on it). Now, if our list of routes doesn’t have birthdays & deathdays then that list will have to be edited twice to cover this situation: once to add a route when the train arrives at the station and once to delete the route when the train leaves. And that’s for just one train. Imagine the changes that will have to be made for the entire rail network. This isn’t plausible.

Getting round the detail generation phase bottleneck

The point is that if birthdays and deathdays are incorporated into the list of routes, then a continuous series of adjustments becomes unecessary. Details can just be loaded in (say) once a day and only occasional changes to cope with things like delays (move the birthday & deathday forward) or roadworks (increase the duration) need then be made. This helps get round the bottleneck but it also achieves the ultimate goal: the unification of the road/rail network.

The unification of the road/rail network

The introduction of birthdays, deathdays and durations removes the distinction between roads, trains, planes, etc since they can all be considered as routes with a duration, a birthday, and a deathday. A road was "born" when it was built, will "die" when it’s decomissioned, and will have a "duration" of how long it takes to travel it by car. A train journey is "born" when it arrives at the station (ie the first point at which you can board it), will "die" when it leaves the station (ie the last point when you can board it), and will have a "duration" of how long the journey is expected to last. Even better, foot, aircraft and ship journeys can all be considered the same way, so effectively the whole sphere of human transport has been unified. Not bad for a day’s work.

Was a new program written?

The original intent was to write a program to do this from scratch. However, Oege DeMoor had already written a program using Dijkstra’s for one of the course modules. Wouldn’t it be quicker to just use that instead? Now, this wasn’t as clear cut as it seemed: it was written in Oberon (which doesn’t cope with dates very well), and the calling procedure wasn’t as simple as it could have been. Howevere these were minor points and (as per the fifth design principle: don’t keep a dog and bark for yourself) it was decided to take the program, acknowledge the source and modify as necessary.

Incidentally this vitiated the decision to use Perl for the junction program: if another language had been used, then it couldn’t have coped with Oege’s Oberon program.

What changes were made to Oege’s program?

Oege’s program "planner" got the origin & destination from a GUI called "autoserve" and fed the optimum route back to it. It had to be changed to receive info from the "junction" program , feed the optimum route back in a form that the translation program could understand, cope with birthdays and deathdays, and lots of other little changes. Broadly speaking, the changes looked like this:-

Change 1: Add time details to the route table

"Duration", "birthday" and "deathday" information had to be added to the route data structure. Now this highlighted an interesting problem: the different ways languages handle time. Perl measures time in number of seconds since 1970 (currently a nine-digit number), but Oberon uses exponential maths to cope with numbers that big and can’t really cope when the numbers are only slightly different – for example it’ll think that 987654321 and 987654123 are the same number, since they’re both 1 x 1010 . This meant that Oege’s Oberon program couldn’t cope with the dates that the Perl program was feeding it. After some thought, it was decided to use "number of minutes since 2000" as the time standard: that won’t be a nine-digit number until the year 2190 so Oberon can cope with it, and it’s easy to derive from the Perl standard so Perl can cope with it. An example of a changed route table is given in appendix 7.

Change 2: Add time and backlink details to the node table.

Each node was changed to use "duration" instead of "distance", and more details were included for the backlink information. An example of a changed node table is given in appendix 7.

Change 3: Change "visit neighbours" subroutine to cope with time details

The guts of Dijkstra’s is in Step 3 above, which is usually referred to as the "visit neighbours" subroutine. This was expanded it to take account of the "birthday", "deathday" and "duration", thus:-

   If   time to known node > deathday
   then don’t consider that route
   else if time to known node > birthday
        then time to neighbour = duration from known node to neighbour + time to known node
        else time to neighbour = duration from known node to neighbour + birthday

Change 4: Change the output standard

Oege’s output was set to match the "autosnail" program, but it had to match the translation programs. This was done by surrounding the output with tags: the "from" part would be enclosed in "A" tags, the destination in "B" tags, and so on. This structure was chosen because it was easy to understand and fitted in nicely with the Perl "grep" construct and the translation programs. A full list of tags is given in appendix 10.

Change 5: Remove unnecessary code

The program had code set up to output tracking messsages for "autosnail": this was unnecessary and was removed. It also had an Euclidean check to see if the route distance was less than the straight-line distance between two points: since the program was now using duration instead of distance this check became positively dangerous and had to be removed.

Change 6: Change route and node tables to cope with the machine/human difference.

The next chapter will explain this in greater detail, but suffice it to say here that concepts like the "from" node were split into two parts: "machine values" and "human values". The route and node information had to be expanded to include these details.

Was it worth it?

There are very good reasons why one should not use somebody else’s program: you have to spend time working out what the original author meant, you run the risk of retaining now-useless or redundant code, and of course you can’t claim as much credit if you’d written your own program in the first place. Also there were so many changes that had to be made to Oege’s program to make it fit its new function then it may have been simpler to write a new one from scratch. However, remember the fifth design principle? Using a pre-existing program meant that a working version was in place from the start of July and could be progressively modified as time went on. If a new program had to be written from scratch then it would not have been ready until the end of July/beginning of August and inevitably something would have gone wrong with such a tight timetable. Despite all the problems, the use of Oege’s program shaved a full month off the development timetable and enabled the thesis to reach a successful conclusion. It was therefore very much worth it.