In this repository the TSPLib-based[1] test instances proposed in [2] are considered. In addition to that, the benchmark is extended in two directions. First, a new generation is defined involving instances with α!=0.5, where α is the ratio between the cost limit and the cost of TSP solution. Second, the set of test instances is extended to larger size problems, that is, problems involving up to 7397 nodes.
Generation | Score for the i-th node | α | # instances (n<=400) | # instances (n>400) |
---|---|---|---|---|
1 | 1 | 0.5 | 45 | 41 |
2 | 1 + (7141 · (i-1) + 73) mod 100 | 0.5 | 45 | 41 |
3 | 1 + floor( 99 · d_{1,i} / max_{j∈{1,2,...,n}} d_{1,j} ) | 0.5 | 45 | 41 |
4 | 1 + (7141 · (i-1) + 73) mod 100 | α∗ | 45 | 41 |
α∗ : α of the hardest instance for the B&C in [2]. |
Below you will find the details of the keywords and values in the instance files. Except COST_LIMIT, TSP_SOLUTION and NODE_SCORE_SECTION, all the keywords are inherited from TSPLib.
[1]: G. Reinelt. TSPLIB - A Traveling Salesman Problem Library. ORSA Journal on Computing, 3(4):376–384, 1991.
[2]: M. Fischetti, J. J. Salazar-González, and P. Toth. Solving the orienteering problem through branch-and-cut. INFORMS Journal on Computing, 10:133–148, 1998.
TSPLIB is a library of sample instances for the TSP (and related problems) from various sources and of various types. Instances of the following problem classes are available.
Symmetric traveling salesman problem (TSP)
Given a set of n nodes and distances for each pair of nodes, find a roundtrip of minimal total length visiting each node exactly once. The distance from node i to node j is the same as from node j to node i.
Hamiltonian cycle problem (HCP)
Given a graph, test if the graph contains a Hamiltonian cycle or not.
Asymmetric traveling salesman problem (ATSP)
Given a set of n nodes and distances for each pair of nodes, find a roundtrip of minimal total length visiting each node exactly once. In this case, the distance from node i to node j and the distance from node j to node i may be different.
Sequential ordering problem (SOP)
This problem is an asymmetric traveling salesman problem with additional constraints. Given a set of n nodes and distances for each pair of nodes, find a Hamiltonian path from node 1 to node n of minimal length which takes given precedence constraints into account. Each precedence constraint requires that some node i has to be visited before some other node j.
Orienteering problem (OP)
We are given a set of n nodes - one of which is the depot node - and the distances between nodes. As there is a limitation on the total route distance, each node has been assigned a profit. The goal is to find the route that maximises the total profit subject to this total cost limitation constraint.
Capacitated vehicle routing problem (CVRP)
We are given n - 1 nodes, one depot and distances from the nodes to the depot, as well as between nodes. All nodes have demands which can be satisfied by the depot. For delivery to the nodes, trucks with identical capacities are available. The problem is to find tours for the trucks of minimal total length that satisfy the node demands without violating truck capacity constraint. The number of trucks is not specified. Each tour visits a subset of the nodes and starts and terminates at the depot. (Remark: In some data files a collection of alternate depots is given. A CVRP is then given by selecting one of these depots.)
Except, for the Hamiltonian cycle problems, all problems are defined on a complete graph and, at present, all distances are integer numbers. There is a possibility to require that certain edges appear in the solution of a problem.
Each file consists of a specification and of a data part. The specification part contains information on the file format and on its contents. The data part contains explicit data.
All entries in this section are of the form <keyword> : <value>, where <keyword> denotes an alphanumerical keyword and <value> denotes alphanumerical or numerical data. The terms <string>, <integer> and <real> denote character string, integer or real data, respectively. The order of specification of the keywords in the data file is arbitrary (in principle), but must be consistent, i.e., whenever a keyword is specified, all necessary information for the correct interpretation of the keyword has to be known. Below we give a list of all available keywords.
Identifies the data file.
Specifies the type of the data. Possible types are
TSP Data for a symmetric traveling salesman problem
ATSP Data for an asymmetric traveling salesman problem
SOP Data for a sequential ordering problem
HCP Hamiltonian cycle problem data
OP Data for a symmetric orienteering problem
CVRP Capacitated vehicle routing problem data
TOUR A collection of tours
Additional comments (usually the name of the contributor or creator of the problem instance is given here).
For a TSP, OP or ATSP, the dimension is the number of its nodes. For a CVRP, it is the total number of nodes and depots. For a TOUR file it is the dimension of the corresponding problem.
Maximum distance of the total route in a OP.
Specifies the truck capacity in a CVRP.
Specifies how the edge weights (or distances) are given. The values are:
EXPLICIT Weights are listed explicitly in the corresponding section
EUC_2D Weights are Euclidean distances in 2-D
EUC_3D Weights are Euclidean distances in 3-D
MAX_2D Weights are maximum distances in 2-D
MAX_3D Weights are maximum distances in 3-D
MAN_2D Weights are Manhattan distances in 2-D
MAN_3D Weights are Manhattan distances in 3-D
CEIL_2D Weights are Euclidean distances in 2-D rounded up
GEO Weights are geographical distances
ATT Special distance function for problems att48 and att532
XRAY1 Special distance function for crystallography problems (Version 1)
XRAY2 Special distance function for crystallography problems (Version 2)
SPECIAL There is a special distance function documented elsewhere
Describes the format of the edge weights if they are given explicitly. The values are:
FUNCTION Weights are given by a function (see above)
FULL_MATRIX Weights are given by a full matrix
UPPER_ROW Upper triangular matrix (row-wise without diagonal entries)
LOWER_ROW Lower triangular matrix (row-wise without diagonal entries)
UPPER_DIAG_ROW Upper triangular matrix (row-wise including diagonal entries)
LOWER_DIAG_ROW Lower triangular matrix (row-wise including diagonal entries)
UPPER_COL Upper triangular matrix (column-wise without diagonal entries)
LOWER_COL Lower triangular matrix (column-wise without diagonal entries)
UPPER_DIAG_COL Upper triangular matrix (column-wise including diagonal entries)
LOWER_DIAG_COL Lower triangular matrix (column-wise including diagonal entries)
Describes the format in which the edges of a graph are given, if the graph is not complete. The values are:
EDGE_LIST The graph is given by an edge list
ADJ_LIST The graph is given as an adjacency list
Specifies whether coordinates are associated with each node (which, for example may be used for either graphical display or distance computations). The values are
TWOD_COORDS Nodes are specified by coordinates in 2-D
THREED_COORDS Nodes are specified by coordinates in 3-D
NO_COORDS The nodes do not have associated coordinates
The default value is NO_COORDS.
Specifies how a graphical display of the nodes can be obtained. The values are:
COORD_DISPLAY Display is generated from the node coordinates
TWOD_DISPLAY Explicit coordinates in 2-D are given
NO_DISPLAY No graphical display is possible
The default value is COORD_DISPLAY if node coordinates are specified and NO_DISPLAY otherwise.
Terminates the input data. This entry is optional.
Depending on the choice of specifications some additional data may be required. These data are given in corresponding data sections following the specification part. Each data section begins with the corresponding keyword. The length of the section is either implicitly known from the format specification, or the section is terminated by an appropriate end-of-section identifier.
Node coordinates are given in this section. Each line is of the form
<integer> <real> <real>
if NODE_COORD_TYPE is TWOD_COORDS, or
<integer> <real> <real> <real>
if NODE_COORD TYPE is THREED_COORDS. The integers give the number of the respective nodes. The real numbers give the associated coordinates.
Contains a list of possible alternate depot nodes. This list is terminated by a −1.
The demands of all nodes of a CVRP are given in the form (per line)
<integer> <integer>
The first integer specifies a node number, the second its demand. The depot nodes must also occur in this section. Their demands are 0.
The scores of the nodes of a OP are given in the form (per line)
<integer> <integer>
Edges of a graph are specified in either of the two formats allowed in the EDGE DATA FORMAT entry. If the type is EDGE LIST, then the edges are given as a sequence of lines of the form
<integer> <integer>
each entry giving the terminal nodes of some edge. The list is terminated by a −1. If the type is ADJ_LIST, the section consists of a list of adjacency lists for nodes. The adjacency list of a node x is specified as
<integer> <integer> . . . <integer> -1
where the first integer gives the number of node x and the following integers (terminated by -1 ) the numbers of nodes adjacent to x. The list of adjacency lists is terminated by an additional −1.
In this section, edges that are required to appear in each solution to the problem are listed. The edges to be fixed are given in the form (per line)
<integer> <integer>
meaning that the edge (arc) from the first node to the second node has to be contained in a solution. This section is terminated by a -1.
If DISPLAY_DATA_TYPE is TWOD_DISPLAY, the 2-dimensional coordinates from which a display can be generated are given in the form (per line)
<integer> <real> <real>
The integers specify the respective nodes and the real numbers give the associated coordinates.
A collection of tours is specified in this section. Each tour is given by a list of integers giving the sequence in which the nodes are visited in this tour. Every such tour is terminated by a -1. An additional -1 terminates this section.
The edge weights are given in the format specified by the EDGE_WEIGHT_FORMAT entry. At present, all explicit data is integral and is given in one of the (self-explanatory) matrix formats. with implicitly known lengths.
For the various choices of EGDE_WEIGHT_TYPE, we now describe the computations of the respective distances. In each case we give a (simplified) C-implementation for computing the distances from the input coordinates. All computations involving floating-point numbers are carried out in double precision arithmetic. The integers are assumed to be represented in 32-bit words. Since distances are required to be integral, we round to the nearest integer (in most cases).
For edge weight type EUC_2D and EUC_3D, floating point coordinates must be specified for each node. Let x[i], y[i], and z[i] be the coordinates of node i.
In the 2-dimensional case the distance between two points i and j is computed as follows:
xd = x[i] - x[j];
yd = y[i] - y[j];
dij = (int) (sqrt( xd*xd + yd*yd) + 0.5);
In the 3-dimensional case we have:
xd = x[i] - x[j];
yd = y[i] - y[j];
zd = z[i] - z[j];
dij = (int) (sqrt( xd*xd + yd*yd + zd*zd) + 0.5);
where sqrt is the C square root function.
Distances are given as Manhattan distances if the edge weight type is MAN_2D or MAN_3D. They are computed as follows.
2-dimensional case:
xd = abs( x[i] - x[j] );
yd = abs( y[i] - y[j] );
dij = (int) ( xd + yd + 0.5 );
3-dimensional case:
xd = abs( x[i] - x[j] );
yd = abs( y[i] - y[j] );
zd = abs( z[i] - z[j] );
dij = (int) ( xd + yd + zd + 0.5 );
Maximum distances are computed if the edge weight type is MAX_2D or MAX_3D.
2-dimensional case:
xd = abs( x[i] - x[j] );
yd = abs( y[i] - y[j] );
dij = (int) ( max( xd , yd ) + 0.5 );
3-dimensional case:
xd = abs( x[i] - x[j] );
yd = abs( y[i] - y[j] );
zd = abs( z[i] - z[j] );
dij = (int) ( max( xd , yd , zd ) + 0.5 );
If the traveling salesman problem is a geographical problem, then the nodes correspond to points on the earth and the distance between two points is their distance on the idealized sphere with radius 6378.388 kilometers. The node coordinates give the geographical latitude and longitude of the corresponding point on the earth. Latitude and longitude are given in the form DDD.MM where DDD are the degrees and MM the minutes. A positive latitude is assumed to be “North”, negative latitude means “South”. Positive longitude means “East”, negative latitude is assumed to be “West”. For example, the input coordinates for Augsburg are 48.23 and 10.53, meaning 48º 23´ North and 10º 53´ East.
Let x[i] and y[i] be coordinates for city i in the above format. First the input is converted to geographical latitude and longitude given in radians.
PI = 3.141592;
deg = dtrunc( x[i] );
min = x[i] - deg;
latitude[i] = PI * (deg + 5.0 * min / 3.0 ) / 180.0;
deg = dtrunc( y[i] );
min = y[i] - deg;
longitude[i] = PI * (deg + 5.0 * min / 3.0 ) / 180.0;
The distance between two different nodes i and j in kilometers is then computed as follows:
RRR = 6378.388;
q1 = cos( longitude[i] - longitude[j] );
q2 = cos( latitude[i] - latitude[j] );
q3 = cos( latitude[i] + latitude[j] );
dij = (int) ( RRR * acos( 0.5 * ( (1.0 + q1) * q2 - (1.0 - q1 ) * q3 ) ) + 1.0);
The “acos” function is the inverse of the cosine function. The "dtrunc" function truncates the value and converts it to double.
k = (int) x;
x = (double) k;
The edge weight type ATT corresponds to a special “pseudo-Euclidean” distance function. Let x[i] and y[i] be the coordinates of node i. The distance between two points i and j is computed as follows:
xd = x[i] - x[j];
yd = y[i] - y[j];
rij = sqrt( (xd*xd + yd*yd) / 10.0 );
tij = dtrunc( rij );
if ( tij < rij )
dij = tij + 1;
else
dij = tij;
The edge weight type CEIL_2D requires that the 2-dimensional Euclidean distances is rounded up to the next integer.
We have included into TSPLIB the crystallography problems as described in [1]. These problems are not explicitly given but subroutines are provided to generate the 12 problems mentioned in this reference and subproblems thereof (see section 3.2).
To compute distances for these problems the movement of three motors has to be taken into consideration. There are two types of distance functions: one that assumes equal speed of the motors (XRAY1) and one that uses different speeds (XRAY2). The corresponding distance functions are given as FORTRAN implementations (files deq.f, resp. duneq.f) in the distribution file.
For obtaining integer distances, we propose to multiply the distances computed by the original subroutines by 100.0 and round to the nearest integer. We list our modified distance function for the case of equal motor speeds in the FORTRAN version below.
INTEGER FUNCTION ICOST(V,W)
INTEGER V,W
DOUBLE PRECISION DMIN1,DMAX1,DABS
DOUBLE PRECISION DISTP,DISTC,DISTT,COST
DISTP=DMIN1(DABS(PHI(V)-PHI(W)),DABS(DABS(PHI(V)-PHI(W))-360.0E+0))
DISTC=DABS(CHI(V)-CHI(W))
DISTT=DABS(TWOTH(V)-TWOTH(W))
COST=DMAX1(DISTP/1.00E+0,DISTC/1.0E+0,DISTT/1.00E+0)
C *** Make integral distances ***
ICOST=AINT(100.0E+0*COST+0.5E+0)
RETURN
END
The numbers PHI(), CHI(), and TWOTH() are the respective x-, y-, and *z-*coordinates of the points in the generated traveling salesman problems. Note, that TSPLIB95 contains only the original distance computation without the above modification.