The Traveling Salesman Problem

Want to skip directly to the map generator? Click here.

The Conundrum

A salesman is planning his next business trip. He has a list of cities he needs to make stops in, and he needs to pass through each city only once. To save time and money, he'd like to arrange his travel plans so that his journey covers as few miles as possible. How should he plan his trip?

The traveling salesman problem, often referred to as TSP, has been vexing mathematicians, computer scientists, and businesspeople alike for over 200 years. The origins of the TSP lie in the "Knight's Tour," a puzzle based on the movements of the knight in chess that was first analyzed mathematically by Leonhard Euler in 1759. Given that the knight can only move about the board in an "L" shape, is it possible for the knight to land on each of the 64 squares on a chessboard exactly once? It is, and it's also possible to finish on a square that would allow you to move back to your starting point in one move. This is referred to as a "re-entrant" path, and once that final move is made the path becomes a "Knight's Circuit."

The Knight's Circuit is a type of Hamiltonian cycle, so named for the 19th-century mathematician William Rowan Hamilton. In graph theory, a Hamiltonian cycle is a circuit that visits each vertex of the graph exactly once (apart from the vertex where the circuit starts and ends). Hamilton used this concept to design the Icosian Game, a puzzle in which the player attempts to find a Hamiltonian circuit in the edge graph of a dodecahedron. While the game itself was not a best seller, the concept was an important one.

The first known mention of the TSP in particular was in a book written in 1832 by - oddly enough - a traveling salesman. BF Voigt, a German businessman, was thinking of practical considerations rather than mathematical brainteasers when he wrote his guide for aspiring traveling salesmen. Voigt thought that the most important part of planning a journey was to try to visit as many places as possible without stopping in the same location twice. However, he didn't have any advice as to the best way to do this. Enter (again) the mathematicians.

In the 1920s and 30s, mathematicians such as Karl Menger, Merrill Flood, and A.W. Tucker brought the TSP to the attention of the academic community. From there, the difficulty of solving the TSP made it one of the most notorious problems in both graph theory and computer science. Salesmen aside, there are numerous practical instances of the TSP: planning truck routes, programming the movements of a robotic arm, designing fiber optic networks ... the list goes on.

Currently, the largest solved TSP provides the shortest tour visiting all 33,180 points on a circuit board. This solution was achieved using the Concorde TSP solver in March 2005. Previously, the record was a 72,500 km tour visiting every city in Sweden - 24,978 cities, to be precise. The final stages of calculation for this particular tour ran on a network of Linux workstations for an aggregate total of 8 years of computation time.

NP-Completeness

On the surface, the traveling salesman problem seems simple: find the shortest round-trip journey passing through every city on the list. In practice, this turns out to be a difficult endeavor. If there are only a few cities to visit, the TSP is easily solved with brute force tactics: try every combination and go with the shortest. The larger the problem gets, however, the more infeasible this approach becomes. As an example, say you're planning two trips, one that will pass through 4 cities and one that will pass through 10 cities. Planning the 4-city trip is relatively easy. There are a total of 4 * 3 * 2 * 1 = 4! = 24 different ways you could plan your trip, and it doesn't take an excessive amount of time to test them all. The 10-city trip, on the other hand, proves to be a bit more involved. In that case, there are 10! different permutations you'd have to calculate - that's 3,628,800 trips! Woe betide the intrepid soul who wants to plan a really long trip.
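
To get a feel for how quickly the number of possible trips grows, here is a quick sketch in Ruby (the language used for the map generator described later). The method name and the trip sizes are arbitrary choices for illustration.

  # The number of ways to order a trip through n cities is n factorial.
  def trip_count(n)
    (1..n).reduce(1, :*)
  end

  [4, 10, 15].each do |cities|
    puts "#{cities} cities: #{trip_count(cities)} possible orderings"
  end
  # 4 cities: 24, 10 cities: 3628800, 15 cities: 1307674368000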

As it turns out, the TSP is a textbook example of an NP-Complete problem. In complexity theory, "NP" stands for "Non-deterministic Polynomial time," and describes a class of problems in which a solution can be verified - but not necessarily computed - in polynomial time. "NP-Complete" refers to the hardest problems within the class NP: every problem in NP can be reduced to an NP-Complete problem in polynomial time. While it is not currently known whether every problem in NP can be solved in polynomial time, we do know that if any one NP-Complete problem can be solved in polynomial time, then all NP problems can be solved in polynomial time. As it stands, all known algorithms for NP-Complete problems run in exponential time in the worst case; that is, as the size of the input grows linearly, the time needed to solve the problem grows exponentially.

A proof that the TSP is NP-Complete can be found in Cormen, Leiserson, Rivest, and Stein's Introduction to Algorithms (second edition) on page 1013. It takes as a given the NP-Completeness of the Hamiltonian cycle problem, which they refer to as HAM-CYCLE:

We first show that TSP belongs to NP. Given an instance of the problem, we use as a certificate the sequence of n vertices in the tour. The verification algorithm checks that this sequence contains each vertex exactly once, sums up the edge costs, and checks whether the sum is at most k. This process can certainly be done in polynomial time.

To prove that TSP is NP-hard, we show that HAM-CYCLE is reducible to TSP. Let G = (V,E) be an instance of HAM-CYCLE. We construct an instance of TSP as follows. We form the complete graph G' = (V,E'), where E' = {(i,j): i, j belong to V and i != j}, and we define the cost function c by c(i,j) = 0 if (i,j) belongs to E, 1 if (i,j) does not belong to E. (Note that because G is undirected, it has no self-loops, and so c(v,v) = 1 for all vertices v in V.) The instance of TSP is then <G', c, 0>, which is easily formed in polynomial time.

We now show that graph G has a hamiltonian cycle if and only if graph G' has a tour of cost at most 0. Suppose that graph G has a hamiltonian cycle h. Each edge in h belongs to E and thus has cost 0 in G'. Thus, h is a tour in G' with cost 0. Conversely, suppose that graph G' has a tour h' of cost at most 0. Since the costs of the edges in E' are 0 and 1, the cost of tour h' is exactly 0 and each edge on the tour must have cost 0. Therefore, h' contains only edges in E. We conclude that h' is a hamiltonian cycle in graph G.
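
To make the verification step concrete, here is a rough Ruby sketch of such a check. The graph is represented as a cost matrix, and the method name and sample values are made up for illustration; the point is simply that the check runs in polynomial time.

  # Verify a proposed tour: every vertex must appear exactly once, and the
  # total cost of the closed tour must be at most k.
  def tour_within_budget?(tour, costs, k)
    n = costs.length
    return false unless tour.sort == (0...n).to_a   # each vertex exactly once

    total = 0
    tour.each_with_index do |city, i|
      total += costs[city][tour[(i + 1) % n]]       # wrap around to the start
    end
    total <= k
  end

  costs = [[0, 2, 9, 10],
           [2, 0, 6, 4],
           [9, 6, 0, 8],
           [10, 4, 8, 0]]
  puts tour_within_budget?([0, 1, 3, 2], costs, 25)   # 2 + 4 + 8 + 9 = 23 <= 25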

Solving the Problem

The traveling salesman problem can be stated as either a decision problem or an optimization problem. The decision problem asks, "Is there a tour of n cities that is no longer than k units in length?" The optimization problem, on the other hand, is to find the smallest value of k possible for a tour of n cities. There are variations on the TSP that require different approaches. For instance, the bottleneck traveling salesman problem doesn't attempt to minimize the total cost of the trip, but rather the largest distance between any two cities in the tour. In an asymmetric TSP, the cost of going from city x to city y might not be the same as the cost of going from city y to city x. This discussion assumes a symmetric TSP, in which the distance from x to y is the same as the distance from y to x.

There are several optimization algorithms to choose from to find the best possible tour. A simple brute force method is to calculate the cost of every possible tour. Even though this method is guaranteed to find the shortest trip, it's not an efficient approach by any means, as trying all of the possible combinations takes O(n!) time. This algorithm also spends a significant amount of time recalculating overlapping subproblems. If you're trying to find the shortest tour of cities v, w, x, y, and z, there are 6 permutations that include the sequence x-y-z (v-w-x-y-z, w-v-x-y-z, v-x-y-z-w, w-x-y-z-v, x-y-z-v-w, and x-y-z-w-v). The straightforward algorithm would calculate the cost of that sequence 6 times, once for each permutation. Using a dynamic programming approach, this unnecessary work can be avoided by saving the cost the first time it is calculated and retrieving it the next time it is needed. This process is called memoization, and it brings the cost of this particular problem down to O(n^2 * 2^n). This is an improvement over O(n!), but it is still exponential time.
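
Here is a rough Ruby sketch of that memoized approach. The method name, the lambda, and the small cost matrix are my own choices for illustration; the cache stores the cheapest way to finish a tour given the current city and the set of cities still to be visited.

  # Memoized dynamic programming: cache the cheapest way to finish the tour
  # from a given city with a given set of cities still unvisited.
  def shortest_tour(costs, start = 0)
    memo = {}

    finish = lambda do |current, remaining|
      return costs[current][start] if remaining.empty?   # close the loop
      key = [current, remaining]
      memo[key] ||= remaining.map do |nxt|
        costs[current][nxt] + finish.call(nxt, remaining - [nxt])
      end.min
    end

    finish.call(start, (0...costs.length).to_a - [start])
  end

  costs = [[0, 2, 9, 10],
           [2, 0, 6, 4],
           [9, 6, 0, 8],
           [10, 4, 8, 0]]
  puts shortest_tour(costs)   # => 23, the length of the best round trip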

Another way to save calculations is to use a branch and bound algorithm. It's not particularly useful to continue calculating the cost of a tour that's already proven to be longer than the best one found so far, so branch and bound algorithms abandon these possibilities as soon as it becomes apparent that they will be suboptimal. For example, if the current best tour is 450 miles long, any partial tour that has already reached 450 miles or more can't possibly lead to something better. While this method also eliminates a lot of unnecessary work, it suffers from the same exponential worst-case time requirements as the previous two methods.
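
A bare-bones version of that pruning idea, again sketched in Ruby with made-up names: the search abandons any partial tour whose cost already matches or exceeds the best complete tour found so far. (Real branch and bound solvers prune far more aggressively, using lower bounds on the cost of completing a partial tour.)

  # Depth-first search over partial tours, cutting off any branch whose
  # running cost already meets or exceeds the best complete tour so far.
  def branch_and_bound(costs, start = 0)
    n = costs.length
    best = Float::INFINITY

    explore = lambda do |current, visited, cost_so_far|
      return if cost_so_far >= best                 # bound: prune this branch
      if visited.length == n
        best = [best, cost_so_far + costs[current][start]].min
        return
      end
      (0...n).each do |nxt|                         # branch: try each next city
        next if visited.include?(nxt)
        explore.call(nxt, visited + [nxt], cost_so_far + costs[current][nxt])
      end
    end

    explore.call(start, [start], 0)
    best
  end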

Approximation

Since it is very difficult to find an optimal tour within a reasonable amount of time, several approximation algorithms have been developed to find tours that are not necessarily optimal, but are close enough. A few of them will be discussed below.

The most intuitive of these is known as the Nearest Neighbor algorithm, which works exactly as its name suggests (a short Ruby sketch follows the list):

  1. Start from a random city.
  2. Find the closest city you haven't yet visited, and go there.
  3. Repeat this process until you have visited all the cities.
  4. Return to the start.
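
A minimal Ruby sketch of those four steps; the method name and the cost-matrix representation are my own:

  # Nearest Neighbor: from the current city, always travel to the closest
  # city that hasn't been visited yet, then return to the starting city.
  def nearest_neighbor(costs, start = 0)
    unvisited = (0...costs.length).to_a - [start]
    tour = [start]

    until unvisited.empty?
      here = tour.last
      nearest = unvisited.min_by { |city| costs[here][city] }   # step 2
      tour << nearest
      unvisited.delete(nearest)
    end

    tour << start   # step 4: return to the start
  end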

A variation is the Repetitive Nearest Neighbor algorithm, in which trips starting from every city are calculated and the shortest of these is returned. The Nearest Neighbor algorithm is relatively efficient, running in O(n^2) time. It won't always produce the shortest trip, but generally the trip will be within 25% of the Held-Karp lower bound, an estimate of the optimal tour length that is often used as a benchmark for TSP algorithms. (The Held-Karp lower bound itself is often within 1% of the optimal tour length.)

A similar approach is the greedy heuristic (a short Ruby sketch follows the list):

  1. Calculate the cost of each edge and sort the edges from shortest to longest.
  2. Select the shortest remaining edge, provided it doesn't give any city a third tour edge and doesn't close a loop before every city has been included.
  3. Repeat until the selected edges form a complete tour.
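
Here is a rough Ruby sketch of that selection process; the method name is mine, and a simple component label stands in for a proper union-find structure:

  # Greedy heuristic: repeatedly take the cheapest remaining edge that keeps
  # every city's degree at most two and doesn't close a loop too early.
  def greedy_tour_length(costs)
    n = costs.length
    edges = []
    (0...n).each { |i| (i + 1...n).each { |j| edges << [costs[i][j], i, j] } }

    degree    = Array.new(n, 0)
    component = (0...n).to_a          # which partial path each city belongs to
    chosen    = []

    edges.sort.each do |cost, i, j|
      break if chosen.length == n
      next if degree[i] == 2 || degree[j] == 2                        # degree check
      next if component[i] == component[j] && chosen.length < n - 1   # premature loop
      chosen << cost
      degree[i] += 1
      degree[j] += 1
      new_label, old_label = component[i], component[j]
      component.map! { |c| c == old_label ? new_label : c }
    end

    chosen.sum
  end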

This algorithm can produce a better result than the standard Nearest Neighbor algorithm, usually within 15-20% of the Held-Karp lower bound. The tradeoff is that it takes longer, running in O(n^2 log n) time.

Some other techniques involve constructing a minimum spanning tree. A spanning tree is a tree containing all of the vertices and some of the edges of an undirected graph; a minimum spanning tree is a spanning tree of a weighted graph whose total weight is no greater than that of any other spanning tree of that graph. If we assume that the particular problem we are solving satisfies the triangle inequality (i.e., going directly from city x to city z never costs more than going from x to y and then from y to z), a minimum spanning tree can be used to produce a tour that costs no more than twice the cost of the optimal tour. This is accomplished by performing a pre-order walk of the tree, which lists each vertex before any of its children are visited. This approach runs in at best O(n^2) time when using an algorithm such as Prim's to construct the tree, and at worst O(n^2 log n).
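
A rough Ruby sketch of that tree-based approach, using a simple (if not especially efficient) version of Prim's algorithm followed by a pre-order walk; the names are again my own:

  # Build a minimum spanning tree with Prim's algorithm, then visit the
  # cities in pre-order. With the triangle inequality, the resulting tour
  # is at most twice the length of the optimal tour.
  def mst_tour(costs, root = 0)
    n = costs.length
    in_tree  = [root]
    children = Hash.new { |h, k| h[k] = [] }

    until in_tree.length == n
      best = nil                                  # cheapest edge leaving the tree
      in_tree.each do |u|
        (0...n).each do |v|
          next if in_tree.include?(v)
          best = [costs[u][v], u, v] if best.nil? || costs[u][v] < best[0]
        end
      end
      _, parent, child = best
      children[parent] << child
      in_tree << child
    end

    tour = []
    walk = lambda do |city|                       # pre-order: city before its children
      tour << city
      children[city].each { |kid| walk.call(kid) }
    end
    walk.call(root)
    tour << root                                  # close the loop
  end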

Nicos Christofides demonstrated that there is a way to improve this algorithm so that the tour it produces is no more than 1.5 times the length of the optimal tour. His method, known as Christofides' algorithm, adds a minimum-weight perfect matching on the vertices of the minimum spanning tree that have odd degree. Combined with the spanning tree, the matching yields a graph in which every vertex has even degree, so an Eulerian cycle can be created. An Eulerian cycle uses every edge in the graph exactly once, so by itself it is not a solution, since it may visit some vertices more than once. Instead, the cycle is traversed so that any time a previously visited vertex is encountered it is skipped in favor of the next unvisited one. This shortcutting takes advantage of the triangle inequality, since any shortcut is no longer than the path it replaces. Christofides' algorithm runs in O(n^3) time, but the tours it produces are generally around 10% over the Held-Karp lower bound.

Nearest Neighbor and Google Maps

As a neat way of visualizing the outcome of one of the many TSP approximation algorithms, I've written a relatively simple program in Ruby that calculates a TSP tour using the Nearest Neighbor algorithm and then displays the route using Google Maps. The user can select one starting city and an arbitrary number of destinations, and the algorithm will calculate the route accordingly.

In the future I'd like to implement a step-by-step feature that will calculate the route one city at a time, and I'd also like to add an algorithm or two for comparative purposes. A cleaner site design is also in the works. In the meantime, I've listed a few sites below that feature a more diverse set of algorithm animations.

To see the Google Maps version, click here.

Resources

Georgia Tech has an excellent TSP site that is definitely worth checking out. They also have the Concorde solver that was used to calculate the largest TSP tour to date available for download.

Sean Forman has created a TSP generator that provides tours generated by two different approximation algorithms, Repetitive Nearest Neighbor and Cheapest Link.

A few Java applets demonstrating TSP algorithms.

Another site featuring TSP applets.

The TSPBIB Home Page is a comprehensive list of papers, reports, software, and variations.

TSPLIB is a library of sample TSP and TSP-related instances.

An archived American Mathematical Society Feature Column on the TSP by Joe Malkevich is a nice overview of the history and basic ideas behind the TSP.

Nigel Cummings has written a Brief History of TSP.

Wikipedia has an entry on the TSP.

The Traveling Monkey is a brief article about a study claiming that both chimpanzees and vervet monkeys also try to minimize the distance they travel between various caches of food.

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.

Gutin, G. and A. P. Punnen. The Traveling Salesman Problem and Its Variations. Springer, 2002.

Nilsson, Christian. "Heuristics for the Traveling Salesman Problem."


Laurie Jones, May 2006 (last updated May 17, 2006)