The traveling salesman problem (TSP) involves an individual who must leave a base location, visit the n-1 other locations (each once and only once), and then return to the base.
The objective is to schedule a minimum-cost itinerary. Since what matters is the circuit executed by the salesman, it is purely a matter of convenience which of the n locations is designated the base.
The study of this problem has attracted researchers from many different fields, for example mathematics, operations research, physics, biology, and artificial intelligence. This is because the TSP exhibits all aspects of combinatorial optimization and has served, and continues to serve, as a benchmark for new algorithmic ideas such as simulated annealing, tabu search, neural networks, and evolutionary methods.
The TSP has many variations.
Why it is NP-hard
Only the simplest symmetric traveling salesman problem is considered in this project. Although the problem definition is quite simple, the search space of the TSP is quite large: a tour is a permutation of the cities, so the number of candidate solutions grows as n!, and the TSP is NP-complete. Consider a case of 31 cities: if the computer can check 1 million permutations in one second, it would require about 4 × 10^18 years to finish checking all possible permutations. Therefore we have no choice but to seek heuristics.
As the TSP is NP-complete, generating an optimal solution is ruled out given the limitations of computing. All attempts to solve the TSP can be classified into one of the categories below:
An assignment problem may be associated with each traveling salesman problem as follows. Arbitrarily label the locations involved in the traveling salesman problem with the integers 1, 2, 3, …, n. Consider a set of n "workers" and a set of n "jobs". The cost of an assignment, cij, is the cost of traveling directly from location i to location j. It is clear that every feasible solution to the traveling salesman problem corresponds to a feasible solution to the associated assignment problem. However, the assignment problem will possess feasible solutions (corresponding to the non-cyclic permutations) which do not correspond to any feasible solution of the traveling salesman problem. The optimal solution to the assignment problem therefore serves as a first approximation to the solution of the traveling salesman problem. We apply the Hungarian method to the cost matrix of the assignment problem (which is the same as the cost matrix of the salesman problem).
Steps of the Hungarian method for the TSP
Step-1. Make a table as shown in Table-A (rows give the departure city, columns the arrival city):

cities → |  1  |  2  |  3  |  …  |  n
cities ↓ |
    1    | ….. | C12 | C13 |     | C1n
    2    | C21 | ….. | C23 |     | C2n
    3    | C31 | C32 | ….. |     | C3n
    …    |     |     |     |     |
    n    | Cn1 | Cn2 | Cn3 |     | …..

Table - A
Step-2. The cost of traveling from location i to itself is meaningless, so replace each dotted (diagonal) entry in the table by an exorbitantly large number, which prohibits those links from appearing in an optimal itinerary.
Step-3. In each row of Table-A, locate the smallest element and subtract it from every element of that row. Repeat this procedure for each column (the column minimum is determined after the row subtractions). The revised cost matrix will have at least one zero in every row and column.
Step-4. Determine whether there exists a feasible assignment involving only zero costs in the revised cost matrix; in other words, determine whether the revised matrix has n zero entries no two of which are in the same row or column. If such an assignment exists, it is optimal: stop. If no such assignment exists, go to Step-5.
Step-5. Cover all zeros in the revised cost matrix with as few horizontal and vertical lines as possible. Each horizontal line must pass through an entire row and each vertical line through an entire column; the total number of lines in this minimal covering will be smaller than n. Locate the smallest element in the cost matrix not covered by any line. Subtract this number from every element not covered by a line and add it to every element covered by two lines.
Step-6. Go to Step-4.
If the result corresponds to a feasible itinerary, that must be optimal. If not, a variant of branch and bound may be used to create two new assignment problems, which between them contain the optimal solution of the traveling salesman problem.
Branching is on a matrix element cpq, where p→q is any of the assignments in the current approximation (which, by hypothesis, does not correspond to a feasible itinerary). One new cost matrix is obtained by replacing cpq by a prohibitively large number; the other is obtained by replacing cqp (the transposed element), as well as all elements in the pth row and qth column except cpq itself, by a prohibitively large number.
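The relationship between the two problems can be sketched in code. The following is a minimal illustration (the function names are mine): for a small instance it brute-forces the assignment relaxation and checks whether the optimal assignment happens to be a single tour; a real implementation would use the Hungarian method instead of enumeration.

```python
from itertools import permutations

BIG = 10000  # prohibitive cost, as in the matrices below

def best_assignment(cost):
    """Brute-force the assignment relaxation of the TSP.

    Each city i is assigned a successor perm[i]; unlike a tour, the
    result may decompose into several subtours.  Feasible only for
    small n -- the Hungarian method solves the same problem efficiently.
    """
    n = len(cost)
    best, best_perm = None, None
    for perm in permutations(range(n)):
        if any(perm[i] == i for i in range(n)):   # no self-loops
            continue
        c = sum(cost[i][perm[i]] for i in range(n))
        if best is None or c < best:
            best, best_perm = c, perm
    return best, best_perm

def is_single_tour(perm):
    """True if the successor map perm forms one cycle through all cities."""
    n, seen, city = len(perm), 1, perm[0]
    while city != 0:
        city = perm[city]
        seen += 1
    return seen == n

# The 4-city example from the text (10000 marks forbidden links).
cost = [[BIG,  65,  53,  37],
        [ 65, BIG,  95, BIG],
        [ 53,  95, BIG,  81],
        [ 37, BIG,  81, BIG]]
value, perm = best_assignment(cost)
# The assignment optimum is the subtour pair 1-4-1 and 2-3-2
# (0-indexed successors (3, 2, 1, 0)), with cost 37+95+95+37 = 264:
# a lower bound on the tour length, but not a feasible itinerary.
```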
Example-
The total flight time between the cities is as follows (the dots mark the diagonal and the city pairs with no direct link):

Cities |  1 |  2 |  3 |  4
   1   |  … | 65 | 53 | 37
   2   | 65 |  … | 95 |  …
   3   | 53 | 95 |  … | 81
   4   | 37 |  … | 81 |  …

Original cost matrix

Replacing every dotted entry by the very large number 10000 and solving the assignment problem gives the starred entries below: 1→4, 2→3, 3→2, 4→1, with cost 37 + 95 + 95 + 37 = 264. This assignment consists of the two subtours 1-4-1 and 2-3-2, so it is not a feasible itinerary, and we branch on the element c14.

Cities |   1   |   2   |   3   |   4
   1   | 10000 |   65  |   53  |  37*
   2   |   65  | 10000 |  95*  | 10000
   3   |   53  |  95*  | 10000 |   81
   4   |  37*  | 10000 |   81  | 10000
Step2A - Problem-1 (obtained by replacing c14 by a very large number):

Cities |   1   |   2   |   3   |   4
   1   | 10000 |  65*  |   53  | 10000
   2   |   65  | 10000 |  95*  | 10000
   3   |   53  |   95  | 10000 |  81*
   4   |  37*  | 10000 |   81  | 10000

Solution: 1→2, 2→3, 3→4, 4→1.

Step2B - Problem-2 (obtained by replacing c41, and all elements in the 1st row and 4th column except c14, by a very large number):

Cities |   1   |   2   |   3   |   4
   1   | 10000 | 10000 | 10000 |  37*
   2   |  65*  | 10000 |   95  | 10000
   3   |   53  |  95*  | 10000 | 10000
   4   | 10000 | 10000 |  81*  | 10000

Solution: 1→4, 4→3, 3→2, 2→1.

The solution from Problem-1 is 1→2, 2→3, 3→4, 4→1 with a cost of 278 min, and the solution from Problem-2 is 1→4, 4→3, 3→2, 2→1, also with a cost of 278 min. Both are optimal.
This procedure is computationally impractical for large problems, so we try to obtain a near-optimal solution by other methods.
Nearest neighbor method
The nearest neighbor method is based on sequentially selecting the cheapest remaining link such that its inclusion does not complete a circuit too soon. The steps are as follows:
Step-0. Create a cost matrix by adapting Step-1 and Step-2 of the Hungarian method.
Step-1. Locate the smallest element in the cost matrix (break ties arbitrarily), circle it, and include the corresponding link in the itinerary.
Step-2. If the newly circled element is cpq, replace all other elements in the pth row and all other elements in the qth column, as well as the transposed element cqp, by a prohibitively large number.
Step-3. Locate the smallest uncircled element in the latest cost matrix. Tentatively adjoin its corresponding link to the (incomplete) itinerary. If the resulting itinerary is not infeasible, circle the designated cost and go to Step-5.
Step-4. If the resulting itinerary is infeasible, remove the latest link from the itinerary, replace its corresponding cost by a prohibitively large number, and go to Step-3.
Step-5. Determine whether the itinerary is complete. If so, accept it as the near-optimal solution. If not, go to Step-2.
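The steps above can be sketched as follows. This is a minimal illustration (the function name and the sorted-edge formulation are mine): scanning the links in increasing cost order and skipping blocked ones is equivalent to repeatedly circling the smallest remaining cell as in Steps 1-5.

```python
BIG = 10000  # prohibitive cost used throughout the text

def cheapest_link_tour(cost):
    """Greedy heuristic from the text: repeatedly take the cheapest
    remaining cell c[p][q], provided link p->q gives no city a second
    departure/arrival and does not close a circuit too soon."""
    n = len(cost)
    succ, pred = {}, {}                      # chosen links p -> q
    links = sorted((cost[p][q], p, q)
                   for p in range(n) for q in range(n) if p != q)
    for c, p, q in links:
        if p in succ or q in pred:
            continue                         # row/column already used (Step-2)
        # Would p->q close a subtour before all n cities are linked?
        end, length = q, 1
        while end in succ:
            end, length = succ[end], length + 1
        if end == p and length < n:
            continue                         # premature circuit: reject (Step-4)
        succ[p], pred[q] = q, p
    tour = [0]
    while succ[tour[-1]] != 0:
        tour.append(succ[tour[-1]])
    return tour

# 4-city example from the text; the heuristic is forced to use the
# non-existent link 4->2 (cost 10000).
cost = [[BIG,  65,  53,  37],
        [ 65, BIG,  95, BIG],
        [ 53,  95, BIG,  81],
        [ 37, BIG,  81, BIG]]
tour = cheapest_link_tour(cost)              # 0-indexed: 1 -> 4 -> 2 -> 3
total = sum(cost[tour[i]][tour[(i + 1) % 4]] for i in range(4))
```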
Example- [NB: the element chosen at each step is marked with an asterisk.]
Cities |   1   |   2   |   3   |   4
   1   | 10000 |   65  |   53  |  37*
   2   |   65  | 10000 |   95  | 10000
   3   |   53  |   95  | 10000 |   81
   4   |   37  | 10000 |   81  | 10000

We choose cell c14.
Cities |   1   |   2   |   3   |   4
   1   | 10000 | 10000 | 10000 |   37
   2   |   65  | 10000 |   95  | 10000
   3   |  53*  |   95  | 10000 | 10000
   4   | 10000 | 10000 |   81  | 10000

We accept 1→4 into the itinerary and choose cell c31.
Cities |   1   |   2   |   3   |   4
   1   | 10000 | 10000 | 10000 |   37
   2   | 10000 | 10000 |   95  | 10000
   3   |   53  | 10000 | 10000 | 10000
   4   | 10000 | 10000 |  81*  | 10000

We accept 3→1, 1→4 as itinerary and choose cell c43 for further probing.
Cities |   1   |   2   |   3   |   4
   1   | 10000 | 10000 | 10000 |   37
   2   | 10000 | 10000 |  95*  | 10000
   3   |   53  | 10000 | 10000 | 10000
   4   | 10000 | 10000 | 10000 | 10000

The link 4→3 would complete the circuit 1→4→3→1 prematurely (skipping city 2), so we reject 4→3, replace its cost by a large number, and choose cell c23.
Cities |   1    |   2    |   3    |   4
   1   | 10000  | 10000  | 10000  |   37
   2   | 10000  | 10000  |   95   | 10000
   3   |   53   | 10000  | 10000  | 10000
   4   | 10000  | 10000* | 10000  | 10000

We accept 2→3, 3→1, 1→4 as itinerary; the smallest remaining uncircled element is the prohibitive 10000* at c42.
Cities |   1   |   2   |   3   |   4
   1   | 10000 | 10000 | 10000 |   37
   2   | 10000 | 10000 |   95  | 10000
   3   |   53  | 10000 | 10000 | 10000
   4   | 10000 | 10000 | 10000 | 10000

We accept 4→2, 2→3, 3→1, 1→4 as the itinerary.
The cost of this itinerary is 37 + 10000 + 95 + 53 = 10185 (the link 4→2 does not actually exist, so its prohibitive cost is incurred), which is very far from the optimal solution of 278.
Genetic algorithm
What is a genetic algorithm
Genetic algorithms are a family of computational models inspired by evolution. These algorithms encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination (crossover) operators to these structures so as to preserve critical information. Genetic algorithms are often viewed as function optimizers, although the range of problems to which genetic algorithms have been applied is quite broad.
In other words, genetic algorithms are stochastic search techniques based on the mechanism of natural selection and natural genetics. The GA uses Survival of the Fittest with the different solutions in the population. The good solutions reproduce to form new and hopefully better solutions in the population, while the bad solutions are removed.
How does it work (the general structure of a genetic algorithm)
Genetic algorithms, differing from conventional search techniques, start with an initial set of random solutions called the population. Each individual in the population is called a chromosome and represents a solution to the problem at hand. A chromosome is a string of symbols; it is usually, but not necessarily, a binary bit string. The chromosomes evolve through successive iterations, called generations.
[Figure: a population is a collection of chromosomes]
During each generation, every chromosome is evaluated using some measure of fitness. To create the next generation, new chromosomes, called offspring, are formed by either
a. Merging two chromosomes from the current generation using a crossover operator or
b. Modifying a chromosome using a mutation operator.
A new generation is formed by
a. Selecting, according to the fitness value some of the parents and offspring and
b. Rejecting others to keep the population size constant.
Fitter chromosomes have a higher probability of being selected. After several generations the algorithm converges to the best chromosome, which hopefully represents the optimal or a sub-optimal solution to the problem.
The genetic algorithm lets us obtain solutions to problems that do not have a precisely defined solving method, or for which following the exact solving method would take far too much time. There are many such problems; actually, all still-open, interesting problems are like that.
Such problems are often characterized by multiple, complex, sometimes even contradictory constraints that must all be satisfied at the same time. Examples are crew and team planning, delivery itineraries, finding the most beneficial locations for stores or warehouses, building statistical models, etc.
Astonishingly, this crossover-then-selection process favours the emergence of better and better solutions. By encouraging the best solutions generated and throwing away the worst ones ("only the fittest survive"), the population keeps improving as a whole. This is called "selective pressure".
We are not actually calculating a solution to the problem being treated; we are merely selecting and encouraging the best emerging ones after certain random operations. This is why we do not need to know how to solve the problem; we just have to be able to evaluate the quality of the generated solutions coming our way. All we have to do to let a genetic algorithm solve our problem is write a "fitness function" - nothing else!
This very surprising mechanism has been mathematically shown to eventually "converge" to the best possible solution. Of course, "eventually" comes much faster with skillfully written implementations.
The evolution and selection process is problem-independent; only the fitness function and one that decodes the chromosomes into a readable form are problem-specific. Once again, these functions do not require us to know how to solve the problem.
This is a rather brutal approach, requiring large amounts of processing power, but with the immense advantage of supplying solutions to things we don't know how to solve, or don't know how to solve quickly. For instance, what is the shortest path linking a number of cities? The only exact solution is to try them all and compare - this would take geological time on any real-world problem. The genetic algorithm provides excellent, fast approximations to that.
No knowledge of how to solve the problem is needed: you only need to be able to assess the fitness of any given solution. This means implementation is easy and can rely on a problem-independent "engine", requiring little problem-related work.
Pseudo code for the genetic algorithm
Let P(t) and C(t) be the parents and offspring in the current generation t. Then the pseudocode is as follows:

begin
    t ← 0;
    initialize P(t);
    evaluate P(t);
    while (not termination condition) do
        recombine (i.e. crossover and mutation) P(t) to yield C(t);
        evaluate C(t);
        select P(t+1) from P(t) and C(t);
        t ← t + 1;
    end
end
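As a concrete (if toy) illustration of this pseudocode, here is a generic GA skeleton in Python. The function names, the uniform crossover, and the keep-the-fittest survivor rule are illustrative assumptions of this sketch, not prescribed by the pseudocode itself.

```python
import random

def genetic_algorithm(init, evaluate, recombine, pop_size=50, generations=200):
    """Generic GA skeleton mirroring the pseudocode above.

    init()            -> one random chromosome
    evaluate(ch)      -> cost (lower is better, e.g. tour length)
    recombine(p1, p2) -> one offspring chromosome
    """
    population = [init() for _ in range(pop_size)]          # initialize P(t)
    for _ in range(generations):                            # termination condition
        offspring = [recombine(*random.sample(population, 2))
                     for _ in range(pop_size)]              # crossover/mutation
        # select P(t+1) from P(t) and C(t): keep the pop_size fittest
        population = sorted(population + offspring, key=evaluate)[:pop_size]
    return min(population, key=evaluate)

# Toy usage: evolve a bit string toward all ones (cost = number of zeros),
# using uniform crossover with a 5% random-reset mutation per position.
best = genetic_algorithm(
    init=lambda: [random.randint(0, 1) for _ in range(20)],
    evaluate=lambda ch: ch.count(0),
    recombine=lambda a, b: [random.choice(pair) if random.random() > 0.05
                            else random.randint(0, 1)
                            for pair in zip(a, b)],
)
# best converges to (or very near) the all-ones string
```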
Application of genetic algorithms to the TSP
Studies of genetic algorithms for the TSP provide rich experience and a sound basis for combinatorial optimization problems. Roughly speaking, major efforts have been made to achieve the following:
1. Give a proper representation to encode a tour.
2. Devise applicable genetic operators (crossover and mutation) that keep building blocks and avoid illegality.
3. Prevent premature convergence.
Representation
1. Adjacency Representation
2. Ordinal Representation
3. Matrix Representation
4. Path Representation
5. Random Key Representation (used in this project)
A first classification of the representation functions existing for the TSP partitions them into binary and non-binary functions.
Binary encoding fits the most classical GA model. In it, each node is represented by a segment of n bits (n = ⌈log2 m⌉, where m is the number of nodes), and a solution by an ordered sequence of such segments. This encoding has many problems.
First, if the number of nodes is not a power of 2, there exist bit segments without a corresponding node and, as a result, non-valid binary strings. Second, any operator working on this encoding should treat the segment corresponding to each node as an atomic unit; otherwise many non-valid strings would be generated. Although some positive results have been reported using this representation, it is usually less efficient and performs worse than other encoding functions.
Non-binary encodings use an alphabet with n symbols (n is the problem size) to represent solutions for the problem. There are several options: path, adjacency, and ordinal encoding.
Path encoding is the most intuitive one: every node is
assigned a number (e.g. from 0 up to n-1) and solutions are represented
by the ordered sequence of visited nodes.
Adjacency encoding uses each position in the string to
represent an edge of the tour. So, if jth position of the chromosome
contains the value k, the edge (j,k) belongs to the tour.
Ordinal encoding is the third option. Each solution is represented by a list of n values such that the ith position cannot contain a value higher than n-i, because every gene points to a position within a stack from which visited nodes are progressively extracted. Initially, the stack contains all nodes in a fixed, predetermined, arbitrary order.
Random key representation
It was first introduced by Bean. The representation encodes a solution with random numbers from (0,1). These values are used as sort keys to decode the solution. For example, a chromosome for a 6-city problem may be

position: 1    2    3    4    5    6
key:      0.23 0.45 0.11 0.43 0.65 0.56

After sorting the keys in increasing order,

key:      0.11 0.23 0.43 0.45 0.56 0.65
position: 3    1    4    2    6    5

the decoded tour is 3 → 1 → 4 → 2 → 6 → 5.
My approach
For a six-city problem, generate 5 random numbers and index them by cities 2 to 6, then sort. Example:

city: 2    3    4    5    6
key:  0.22 0.31 0.95 0.11 0.21

After sorting,

key:  0.11 0.21 0.22 0.31 0.95
city: 5    6    2    3    4

and hence the chromosome is 1→5→6→2→3→4→1.
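This decoding is easy to express in code. The sketch below (function names are mine) follows the variant just described, with city 1 as the fixed base, and reproduces the example above.

```python
import random

def random_key_chromosome(n):
    """City 1 is the fixed base, so only cities 2..n receive
    random keys from (0, 1)."""
    return [random.random() for _ in range(n - 1)]

def decode(keys):
    """Sort the keys; the order of their original positions
    (labelled 2..n) gives the tour, which starts at city 1."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [1] + [i + 2 for i in order]

# The 6-city example from the text: keys 0.22 0.31 0.95 0.11 0.21
# for cities 2 3 4 5 6 decode to the tour 1 -> 5 -> 6 -> 2 -> 3 -> 4 (-> 1).
tour = decode([0.22, 0.31, 0.95, 0.11, 0.21])
```

A useful property of this representation is that any list of keys decodes to a valid tour, so crossover and mutation can never produce an illegal chromosome.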
Crossover operators
1. Partially-Mapped Operator (PMX): - proposed by Goldberg and Lingle - builds an offspring by choosing a subsequence of a tour from one parent and preserving the order and position of as many cities as possible from the other parent.
Example:
p1 = (1 2 3 | 4 5 6 7 | 8 9)
p2 = (4 5 2 | 1 8 7 6 | 9 3) => swap the selected segments
o1 = (x x x | 1 8 7 6 | x x)
o2 = (x x x | 4 5 6 7 | x x) => define the mapping (1-4, 8-5, 7-6, 6-7)
copy the remaining genes of p1 to o1, applying the mapping wherever a city is already used:
o1 = (1→4 2 3 | 1 8 7 6 | 8→5 9) = (4 2 3 | 1 8 7 6 | 5 9)
construct o2 in the same way:
o2 = (4→1 5→8 2 | 4 5 6 7 | 9 3) = (1 8 2 | 4 5 6 7 | 9 3)
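A compact PMX implementation (a sketch; the function name and 0-indexed cut points are my conventions) reproduces this example:

```python
def pmx(p1, p2, i, j):
    """Partially-mapped crossover: the offspring takes p2's segment
    p2[i:j] in place, and every gene of p1 outside the segment,
    re-mapped through the segment pairing whenever it would
    duplicate a city already in the segment."""
    o = p1[:i] + p2[i:j] + p1[j:]
    mapping = dict(zip(p2[i:j], p1[i:j]))   # segment city -> replacement
    for k in list(range(i)) + list(range(j, len(p1))):
        while o[k] in p2[i:j]:              # follow the mapping chain
            o[k] = mapping[o[k]]
    return o

# The example from the text (cut points after positions 3 and 7):
p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [4, 5, 2, 1, 8, 7, 6, 9, 3]
o1 = pmx(p1, p2, 3, 7)   # -> [4, 2, 3, 1, 8, 7, 6, 5, 9]
o2 = pmx(p2, p1, 3, 7)   # -> [1, 8, 2, 4, 5, 6, 7, 9, 3]
```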
2. Order Operator (OX) - proposed by Davis - builds offspring by choosing a subsequence of a tour from one parent and preserving the relative order of cities from the other parent.
Example:
p1 = (1 2 3 | 4 5 6 7 | 8 9)
p2 = (4 5 2 | 1 8 7 6 | 9 3) => copy the selected segments
o1 = (x x x | 4 5 6 7 | x x)
o2 = (x x x | 1 8 7 6 | x x) => the seq of p2 : 9-3-4-5-2-1-8-7-6
remove 4, 5, 6, 7 in the seq and put it in o1
the remaining seq : 9-3-2-1-8
o1 = (x x x | 4 5 6 7 | x x) = (2 1 8 | 4 5 6 7 | 9 3)
construct o2 in the same way:
o2 = (3 4 5 | 1 8 7 6 | 9 2)
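The OX construction can be sketched the same way (0-indexed cut points; the function name is mine):

```python
def order_crossover(p1, p2, i, j):
    """OX: copy p1's segment in place, then fill the remaining
    positions with the other cities in the order they appear in p2,
    starting after the second cut point and wrapping around."""
    segment = p1[i:j]
    # cities of p2 in sequence from the second cut, minus the segment
    rest = [c for c in p2[j:] + p2[:j] if c not in segment]
    n = len(p1)
    o = [None] * n
    o[i:j] = segment
    for k in range(n - (j - i)):        # fill positions j, j+1, ... wrapping
        o[(j + k) % n] = rest[k]
    return o

# The example from the text (cut points after positions 3 and 7):
p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [4, 5, 2, 1, 8, 7, 6, 9, 3]
o1 = order_crossover(p1, p2, 3, 7)   # -> [2, 1, 8, 4, 5, 6, 7, 9, 3]
o2 = order_crossover(p2, p1, 3, 7)   # -> [3, 4, 5, 1, 8, 7, 6, 9, 2]
```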
3. Cycle Operator (CX) - proposed by Oliver - builds offspring in such a way that each city (and its position) comes from one of the parents.
Example:
p1 = (1 2 3 4 5 6 7 8 9)
p2 = (4 1 2 8 7 6 9 3 5) => follow the (p1-p2) sequence
1(p1)-4(p2), 4(p1)-8(p2), 8(p1)-3(p2), 3(p1)-2(p2), 2(p1)-1(p2) -> cycle
o1 = (1 2 3 4 x x x 8 x) => fill the x using p2
o1 = (1 2 3 4 7 6 9 8 5)
construct o2 in the same way:
o2 = (4 1 2 8 5 6 7 3 9)
4. Edge Recombination Operator (ER) - proposed by Whitley, Starkweather, and Fuquay - builds an offspring exclusively from the edges present in the parents. This is done with the help of an edge list created from both parent tours.
Example:
p1 = (1 2 3 4 5 6 7 8 9)
p2 = (4 1 2 8 7 6 9 3 5) => build the edge list: (-) means listed twice
city 1 : edges to other cities : 9 -2 4
city 2 : edges to other cities : -1 3 8
city 3 : edges to other cities : 2 4 9 5
city 4 : edges to other cities : 3 -5 1
city 5 : edges to other cities : -4 6 3
city 6 : edges to other cities : 5 -7 9
city 7 : edges to other cities : -6 -8
city 8 : edges to other cities : -7 9 2
city 9 : edges to other cities : 8 1 6 3
The construction of the offspring starts with the selection of an initial city from one of the parents. The connected city with the smallest number of remaining edges is selected as the next one, with priority given to cities whose edge is marked negative (i.e., present in both parents). In a series of experiments, edge failure occurred at a very low rate (1% - 1.5%).
My approach (ordered crossover)

                left            right
Parent Gene1 | 3 | 4 | 2 | 5 | 7 | 6
Parent Gene2 | 2 | 7 | 3 | 4 | 5 | 6

Parent Gene1 | 3 |   |   |   | 7 | 6
Child Gene1  | 7 | 4 | 2 | 5 | 3 | 6

Parent Gene2 | 2 | 7 | 3 | 4 | 5 | 6
Parent Gene2 | 2 |   |   |   | 5 | 6
Child Gene2  | 2 | 7 | 3 | 4 | 5 | 6
Parent Gene1 | 3 | 4 | 2 | 5 | 7 | 6
Mutation (my approach)
1. Reciprocal mutation - selects two positions at random and swaps the cities at those positions.
Example:

Parent Gene | 3 | 4 | 2 | 5 | 7 | 6
Child Gene  | 3 | 6 | 2 | 5 | 7 | 4
2(a) Right displacement mutation - selects a city at random among the cities from the 3rd position to the end (say the element at position y), places it anywhere to the left of the chosen position y (say at position x), and shifts all cities from x to y-1 one position to the right of their original position.

Parent Gene | 3 | 6 | 2 | 5 | 7 | 4
                  x           y
Child Gene  | 3 | 7 | 6 | 2 | 5 | 4
2(b) Left displacement mutation - selects a city at random among the cities from the 2nd position up to one position before the end (say the element at position x), places it anywhere to the right of the chosen position x (say at position y), and shifts all cities from x+1 to y one position to the left of their original position.

Parent Gene | 3 | 6 | 2 | 5 | 7 | 4
              x           y
Child Gene  | 6 | 2 | 5 | 3 | 7 | 4
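The three mutation operators can be sketched as follows (a minimal illustration; positions are 0-indexed here, whereas the worked examples count from 1):

```python
import random

def reciprocal_mutation(tour):
    """Swap the cities at two random positions."""
    t = tour[:]
    i, j = random.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def right_displacement(tour, x, y):
    """Move the city at position y to position x < y, shifting the
    cities in between one place to the right."""
    t = tour[:]
    t.insert(x, t.pop(y))
    return t

def left_displacement(tour, x, y):
    """Move the city at position x to position y > x, shifting the
    cities in between one place to the left."""
    t = tour[:]
    t.insert(y, t.pop(x))
    return t

# The two displacement examples from the text (0-indexed positions):
r = right_displacement([3, 6, 2, 5, 7, 4], 1, 4)  # -> [3, 7, 6, 2, 5, 4]
l = left_displacement([3, 6, 2, 5, 7, 4], 0, 3)   # -> [6, 2, 5, 3, 7, 4]
```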
Other mutation approaches
Inversion: two random points within the string are selected and the segment between them is inverted. This operator introduces two new edges into the tour.
Insertion: a node is extracted from the tour and inserted at a random position of the string. This operator introduces three new edges into the offspring.
Neighbor exchange: exchanges the contents of two consecutive positions within the string; it may be considered a restricted version of the reciprocal exchange or inversion operators, in which the cut or exchange positions are always consecutive.
Selection (my approach)
1. Initialization
a. First, 30 chromosomes are generated using random key representation.
b. Then 12 chromosomes are generated using the OX operator and 8 by reciprocal exchange.
2. Next generation of chromosomes
a. Out of these 50, the 16 best solutions are selected.
b. 4 new chromosomes are generated using random key representation.
c. 16 chromosomes are created using the OX (ordered crossover) operator.
d. 5 chromosomes are created using reciprocal mutation.
e. 4 chromosomes are created using right displacement mutation.
f. 5 chromosomes are created using left displacement mutation.
Why 4 new chromosomes? There are two reasons:
1. To prevent super chromosomes from dominating the population by maintaining too many copies of themselves (which may cause the search to fall into local minima).
2. To keep the population diverse, so that the generation pool contains more information for the genetic search.
The selection is based on deterministic sampling and an enlarged sampling space: m parents and l offspring compete together, and the m best out of the combined offspring and old parents are selected as parents for the next generation.
Why deterministic sampling? This type of approach prohibits duplicate chromosomes from entering the population during selection, so it is a preferred method for combinatorial optimization problems.
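One possible reading of this selection scheme in code (a sketch; the duplicate-elimination detail is my interpretation of "deterministic sampling", and the function name is mine):

```python
def select_next_generation(parents, offspring, evaluate):
    """(m + l) selection over an enlarged sampling space: parents and
    offspring compete in one pool, duplicates are dropped so a super
    chromosome cannot flood the population, and the m best survive."""
    pool = []
    for ch in parents + offspring:
        if ch not in pool:          # deterministic: no duplicate entries
            pool.append(ch)
    return sorted(pool, key=evaluate)[:len(parents)]

# Toy usage with integer "chromosomes" and cost = the value itself:
survivors = select_next_generation([5, 3, 3, 9], [3, 1, 7, 1],
                                   evaluate=lambda x: x)
# pool without duplicates: [5, 3, 9, 1, 7]; the 4 best: [1, 3, 5, 7]
```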
Enlarged sampling space
[Figure: the m parents and l children are pooled into one enlarged sampling space; the m best of the pool become the m parents for the next generation.]
Modified Kruskal's algorithm for TSP
The original Kruskal's algorithm is for the MST (minimum spanning tree).
What is a spanning tree
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance, the complete graph on four vertices
o---o
|\ /|
| X |
|/ \|
o---o
has sixteen spanning trees:
[Diagrams of the sixteen spanning trees of K4 omitted; each consists of three of the six edges, joining all four vertices without a cycle.]
Minimum spanning trees
Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of the weights of its edges. Obviously, different trees have different lengths. The problem: how do we find the minimum length spanning tree?
This problem can be solved by many different algorithms. It is the topic of some very recent research. There are several "best" algorithms, depending on the assumptions you make.
Why minimum spanning trees?
The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money.
A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.
Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance in the example above, twelve of sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set.
On the other hand, if you draw a
path tracing around the minimum spanning tree, you trace each edge twice and
visit all points, so the TSP weight is less than twice the MST weight.
Therefore this tour is within a factor of two of optimal. There is a more
complicated way of using minimum spanning trees to find a tour within a factor
of 1.5 of optimal.
How to find the minimum spanning tree?
The naive method is to list all spanning trees and find the minimum of the list. We already know how to find minima. But there are far too many trees for this to be efficient. It is also not really an algorithm, because you would still need to know how to list all the trees.
A better idea is to find some key
property of the MST that lets us be sure that some edge is part of it, and use
this property to build up the MST one edge at a time.
For simplicity, we assume
that there is a unique minimum spanning tree. You can get ideas like this to
work without this assumption but it becomes harder to state your theorems or
write your algorithms precisely.
Lemma: Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.
Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e = (u, v), with u in X and v not in X. Because T is a spanning tree, it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than T since e has smaller weight than f. So T was not minimum, which is what we wanted to prove.
Kruskal's algorithm
Kruskal's algorithm is easiest to understand and probably the best
one for solving problems by hand.
Kruskal's algorithm:
    sort the edges of G in increasing order by length
    keep a subgraph S of G, initially empty
    for each edge e in sorted order
        if the endpoints of e are disconnected in S
            add e to S
    return S
Note that, whenever you add
an edge (u,v), it's always the smallest connecting the part of S reachable from
u with the rest of G, so by the lemma it must be part of the MST.
This algorithm is known as a greedy
algorithm, because it chooses at each step the cheapest edge to add to
S. You should be very careful when trying to use greedy algorithms to solve
other problems, since it usually doesn't work. E.g. if you want to find a
shortest path from a to b, it might be a bad idea to keep taking the shortest
edges. The greedy idea only works in Kruskal's algorithm because of the key
property we proved.
Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But there are data structures that let us perform each test in close to constant time; this is known as the union-find problem. The slowest part turns out to be the sorting step, which takes O(m log n) time.
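A compact implementation with a simple union-find (path halving, no union by rank) might look like this; the example graph is my own:

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm with union-find, as analysed above: sorting
    dominates at O(m log n); each connectivity test is nearly O(1).
    edges is a list of (weight, u, v) with vertices 0..n-1."""
    parent = list(range(n))

    def find(v):                       # root of v's component, path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):      # increasing order by length
        ru, rv = find(u), find(v)
        if ru != rv:                   # endpoints disconnected in S
            parent[ru] = rv            # union the two components
            mst.append((w, u, v))
    return mst

# A square with one diagonal: the MST keeps the three cheapest edges.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
mst = kruskal_mst(4, edges)
# -> [(1, 0, 1), (2, 1, 2), (3, 2, 3)], total weight 6
```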
Modified Kruskal's algorithm for TSP
a. Sort the edges in increasing order of weights.
b. Starting with the least-cost edge, examine the edges one by one and select an edge only if the edge, together with the already selected edges,
1. does not cause a vertex to have degree three or more, and
2. does not form a cycle, unless the number of selected edges equals the number of vertices in the graph.
Consider an example with six cities at the following coordinates:
a(0,0), b(4,3), c(1,7), d(15,7), e(15,4), f(18,0)

Step 1. The edges sorted by length:
{((d,e), 3), ((b,c), 5), ((a,b), 5), ((e,f), 5), ((a,c), 7.07), ((d,f), 7.615), ((b,e), √122), ((b,d), √137), ((c,d), 14), …, ((a,f), 18)}
Step 2.
Select (d, e)
Select (b, c)
Select (a, b)
Select (e, f)
Reject (a, c) since it forms a cycle with (a, b) and (b, c)
Reject (d, f) since it forms a cycle with (d, e) and (e, f)
Reject (b, e) since it would make the degree of b equal to 3
Reject (b, d) since it would make the degree of b equal to 3
Select (c, d)
…
Select (a, f)

The selected edges form the tour a → b → c → d → e → f → a.
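The whole procedure, applied to this example, can be sketched as follows (the function name is mine, and ties between equal-length edges are broken by city label in this sketch):

```python
from math import dist  # Euclidean distance, Python 3.8+

def modified_kruskal_tour(points):
    """Modified Kruskal's algorithm from the text: scan edges in
    increasing order of length; accept an edge unless it would give a
    vertex degree 3, or close a cycle before all n vertices are joined."""
    n = len(points)
    names = sorted(points)
    edges = sorted((dist(points[u], points[v]), u, v)
                   for u in names for v in names if u < v)
    degree = {u: 0 for u in names}
    comp = {u: u for u in names}          # tiny union-find

    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u

    chosen = []
    for w, u, v in edges:
        if degree[u] == 2 or degree[v] == 2:
            continue                      # would create a degree-3 vertex
        if find(u) == find(v) and len(chosen) != n - 1:
            continue                      # premature cycle
        comp[find(u)] = find(v)
        degree[u] += 1
        degree[v] += 1
        chosen.append((u, v))
    return chosen

# The 6-city example from the text:
points = {'a': (0, 0), 'b': (4, 3), 'c': (1, 7),
          'd': (15, 7), 'e': (15, 4), 'f': (18, 0)}
tour_edges = modified_kruskal_tour(points)
# -> selects (d,e), (a,b), (b,c), (e,f), then (c,d), and finally (a,f)
```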