1
Mathematics HL AI Internal Assessment:
A graphical analysis of university options
Topic: Voronoi Diagrams and Graph Theory
Table of Contents
Introduction: ..................................................................................................................... 2
Exploration: ...................................................................................................................... 3
Bibliography .................................................................................................................... 20
2
Introduction:
Rationale:
As a student in their final year of the IB diploma, the thrill and terror of applying to
university is a constant pressure looming over my head. There are several factors which play
into this, such as the desire to achieve your best at IB to meet certain required grades or
writing a personal statement. However, the first hurdle is deciding where you want to apply
to. Of course, this depends on many variables, such as one’s academic level and their career
goals. For me, location is a major factor. As my family will continue to live abroad when I
move to the UK, it will be beneficial to have many airports nearby so that I can visit them
frequently or vice versa. Moreover, before choosing where I want to study, it is important for
me to get a feel of each area and campus, and so I must visit every university campus. As a
student on a limited budget, I desire to find the cheapest possible way to visit each of these
campuses.
Aim:
The objective of this investigation is to determine the optimal university based on the number
of airports nearest to each of the 9 shortlisted universities using a Voronoi diagram.
Furthermore, to give myself an idea about the range of money I would need to visit each
campus once on a campus tour, this investigation also approximates the solution to the
travelling salesman problem by using a weighted graph showing the price of direct train
routes to each university. While a range of values can be determined by finding a lower
bound and lower bound for the Hamiltonian cycle of least weight, it must be established that
due to the nature of the travelling salesman problem, being NP-hard (which means that it has
no efficient algorithm), any solution for the travelling salesman problem is only approximate.
3
Exploration:
Construction of the Voronoi Diagram:1
In order to construct a Voronoi Diagram, I first overlayed a map of England over a squared
grid using GeoGebra. Figure 1 has a scale of 50km per 2 units and displays the 9 shortlisted
universities as different sites {𝑁, 𝐷, 𝐿𝑜, 𝑀, 𝐿, 𝑂, 𝑊, 𝐿𝑆, 𝐸 } with their co-ordinates recorded in
Table 1.
Figure 1: Map of Potential Universities
1
The light blue boxes with red trim are used to hide personal information and aren’t of mathematical relevance.
4
Table 1: Co-ordinates of Universities
Site Name Site Co-ordinate
𝑁 (12.0, 19.9)
𝐷 (12.1, 18.8)
𝐿 (12.2,14.5)
𝑀 (10.4,13.0)
𝐿𝑜 (13.0,9.9)
𝑂 (12.1,8.2)
𝑊 (12.9,5.6)
𝐿𝑆 (15.9,4.6)
𝐸 (6.9,1.3)
As all points on a perpendicular bisector are equidistant to the sites which they are bisecting,
perpendicular bisectors are useful in the construction of Voronoi Diagrams. Thus, the
perpendicular bisectors between every site must be found, these will act as the edges for the
Voronoi Diagram.
Firstly, the edge between sites 𝑁(12.0,19.9) and 𝐷(12.1,18.8) was constructed. As the
perpendicular bisector runs through the midpoint, that is what must be calculated first.
!! "!" $! "$"
Midpoint of [𝑁𝐷] = 8 #
, #
9
12.0 + 12.1 19.9 + 18.8
= ; , =
2 2
= (12.05, 19.35)
The gradient of the perpendicular bisector between two points is equal to the negative
reciprocal of the gradient (𝑚) of the line between these points.
$ %$
𝑚 of [𝑁𝐷] = !" %!!
" !
18.8 − 19.9
=
12.1 − 12.0
5
𝑚 of [𝑁𝐷 ] = −11
𝟏
∴ Gradient of Perpendicular Bisector of [ND] = 𝟏𝟏
The linear equation form is 𝑦 = 𝑚𝑥 + 𝑐 so we substitute in the values of
𝑥 𝑎𝑛𝑑 𝑦 from the midpoint, and 𝑚 to find 𝑐 and thus the equation of the perpendicular
bisector.
𝑦 = 𝑚𝑥 + 𝑐
1
19.35 = (12.05) + 𝑐
11
12.05
𝑐 = 19.35 −
11
WWWW
𝑐 = 18.254
1
∴𝑦= WWWW
𝑥 + 18.254
11
This linear form represents the edge between sites N and D.
This process is repeated for all other edges between sites. To make the process more efficient,
I created a spreadsheet in Excel to find the midpoints and the gradients of the other
perpendicular bisectors. For these values I chose not to round them to ensure accuracy in the
rest of my calculations.
6
Sites 𝒙𝟏 𝒚𝟏 𝒙𝟐 𝒚𝟐 𝒎 𝟏 Midpoint Midpoint 𝒄
−
𝒎 𝒙 𝒚
DL 12.1 18.8 12.2 14.5 -43 0.02325581 12.15 16.65 16.3674419
LM 12.2 14.5 10.4 13 0.83333333 -1.2 11.3 13.75 27.31
MLo 10.4 13 13 9.9 -1.1923077 0.83870968 11.7 11.45 1.63709677
LoO 13 9.9 12.1 8.2 1.88888889 -0.5294118 12.55 9.05 15.6941176
OW 12.1 8.2 12.9 5.6 -3.25 0.30769231 12.5 6.9 3.05384615
WLS 12.9 5.6 15.9 4.6 -0.3333333 3 14.4 5.1 -38.1
LSE 15.9 4.6 6.9 1.3 0.36666667 -2.7272727 11.4 2.95 34.0409091
NL 12 19.9 12.2 14.5 -27 0.03703704 12.1 17.2 16.7518519
NM 12 19.9 10.4 13 4.3125 -0.2318841 11.2 16.45 19.0471014
NLo 12 19.9 13 9.9 -10 0.1 12.5 14.9 13.65
NO 12 19.9 12.1 8.2 -117 0.00854701 12.05 14.05 13.9470085
NW 12 19.9 12.9 5.6 -15.888889 0.06293706 12.45 12.75 11.9664336
NLS 12 19.9 15.9 4.6 -3.9230769 0.25490196 13.95 12.25 8.69411765
NE 12 19.9 6.9 1.3 3.64705882 -0.2741935 9.45 10.6 13.191129
DM 12.1 18.8 10.4 13 3.41176471 -0.2931034 11.25 15.9 19.1974138
DLo 12.1 18.8 13 9.9 -9.8888889 0.1011236 12.55 14.35 13.0808989
DO 12.1 18.8 12.1 8.2 Undefined Undefined 12.1 13.5 Undefined
DW 12.1 18.8 12.9 5.6 -16.5 0.06060606 12.5 12.2 11.4424242
DLS 12.1 18.8 15.9 4.6 -3.7368421 0.26760563 14 11.7 7.95352113
DE 12.1 18.8 6.9 1.3 3.36538462 -0.2971429 9.5 10.05 12.8728571
LLo 12.2 14.5 13 9.9 -5.75 0.17391304 12.6 12.2 10.0086957
LO 12.2 14.5 12.1 8.2 63 -0.015873 12.15 11.35 11.5428571
LW 12.2 14.5 12.9 5.6 -12.714286 0.07865169 12.55 10.05 9.06292135
LLS 12.2 14.5 15.9 4.6 -2.6756757 0.37373737 14.05 9.55 4.2989899
LE 12.2 14.5 6.9 1.3 2.49056604 -0.4015152 9.55 7.9 11.7344697
MO 10.4 13 12.1 8.2 -2.8235294 0.35416667 11.25 10.6 6.615625
MW 10.4 13 12.9 5.6 -2.96 0.33783784 11.65 9.3 5.36418919
MLS 10.4 13 15.9 4.6 -1.5272727 0.6547619 13.15 8.8 0.18988095
ME 10.4 13 6.9 1.3 3.34285714 -0.2991453 8.65 7.15 9.73760684
LoW 13 9.9 12.9 5.6 43 -0.0232558 12.95 7.75 8.05116279
LoLS 13 9.9 15.9 4.6 -1.8275862 0.54716981 14.45 7.25 -0.6566038
LoE 13 9.9 6.9 1.3 1.40983607 -0.7093023 9.95 5.6 12.6575581
OLS 12.1 8.2 15.9 4.6 -0.9473684 1.05555556 14 6.4 -8.3777778
OE 12.1 8.2 6.9 1.3 1.32692308 -0.7536232 9.5 4.75 11.9094203
WE 12.9 5.6 6.9 1.3 0.71666667 -1.3953488 9.9 3.45 17.2639535
Table 2: Spreadsheet to find Perpendicular Bisectors
7
$ %$
For the perpendicular bisector of sites D and O, using the formula: !" %!! , there is a
" !
denominator of 0, which must mean that the line has an undefined gradient and thus the
perpendicular bisector between these sites is horizontal.
Figure 2: Voronoi Diagram
Using these values for 𝑥, 𝑦, 𝑐, 𝑚; I created Figure 3. However, it must be noted that only a
section of the perpendicular bisectors are included as edges in this Voronoi diagram. The
edges represent the locus of equidistance points between two sites, but they stop when a new
site becomes closer to edge than the points which it bisects.
8
Analysis of the Voronoi Diagram to determine which cell contains the most
incentives:
As explained in the introduction, the location of airports is extremely important to my choice
of university. To determine which university has the most airports nearby, we simply need to
add the location of the airports to the grid. Whichever cell the airport is in, it will be closest to
the corresponding university.
Figure 3: Voronoi diagram containing airports
9
Thus, I plotted the location of England’s major airports onto the Voronoi Diagram illustrated
by the red dots. The cell which the dot is in represents which university the airport must be
closest to. I recorded how many dots were in each cell, via inspection, and recorded my
results in table 3.
In the case of any airport which lay on the edge between two cells, I counted them as in both
cells due to the equidistance from any point on an edge to either site. This was the case for
the airport lying on the edge between sites E and W
Table 3: Number of Airports in each cell
Cell Numbers of Airports
𝑁 1
𝐷 1
𝐿 0
𝑀 2
𝐿𝑜 2
𝑂 0
𝑊 2
𝐿𝑆 4
𝐸 3
From these results, we can determine that cell LS contains the largest number of airports and
thus site LS is closest to the largest number of airports. However, this conclusion maybe
flawed as the diagram only accounts for the distance from the co-ordinates and doesn’t
consider the size of the airports, the availability of transport to the airports and whether the
airport hosts international flights.
Solving the Travelling Salesman Problem
In order to further determine which university suits me the most, a trip to each university
campus is necessary. However, as I will be on a tight student budget, finding the Hamiltonian
cycle of least weight starting and ending at LS (the site with the largest number of airports
10
closest to it) is important. This will be done by approximating a solution to the practical
travelling salesman problem (TSP). To discover the best route to travel, I created a weighted
graph showing the costs (in £) of direct trains from the train stations closest to the
universities. In this case, the cost of the trains represents the weight of the edges, and the
different train stations represent the vertices. It must be noted that the vertices on this graph
do not represent the geographical location of the train stations.
Figure 4: Weighted Graph of Direct Trains
To simplify this, I used the weight of the edges between each vertex to construct a weighted
adjacency table.
Table 4: Weighted Adjacency Table
Vertex E B O LS W LO M L D N
E 0 16.50 - 40.30 - - - 166.50 205.20 206.80
B 16.50 0 - 33.20 - - - - - -
O - - 0 10.40 - - 74.50 - - -
LS 40.30 33.20 10.40 0 16.00 66.50 49.20 46.00 50.50 59.00
11
W - - - 16.00 0 - - - - -
LO - - - 66.50 - 0 - - - -
M - - 74.50 49.20 - - 0 17.70 39.40 39.40
L 166.5 - - 46.00 - - 17.70 0 20.90 25.00
D 205.2 - - 50.50 - - 39.40 20.90 0 3.50
N 206.80 - - 59.00 - - 39.40 25.00 3.50 0
The symbol ‘-’ indicates that there are no direct train routes between these stations. To solve
the classical TSP, we must find a Hamiltonian cycle (a circuit which visits every vertex
exactly once before returning to the starting vertex) of least weight for a given weighted
complete graph. A complete graph is a simple, undirected graph in which every vertex is
connected by an edge to every other vertex. As indicated by Table 3, several vertices are not
connected by a single edge to every other vertex (for example, vertex W is only directly
connected to Vertex LS) and thus figure 4 is not a complete graph. Hence, the classical TSP
cannot be solved and instead, the practical TSP must be solved.
To solve the practical TSP for a given weighted graph, the route of least weight which starts
and finishes at the same vertex and visits each vertex at least once must be found. To
accomplish this, we can convert the practical TSP into a classical TSP by adding or replacing
edges to show the path of least weight between the vertices which weren’t connected to every
other vertex. For example, the route of least weight between vertex E and vertex O passes
through vertex LS.
E → LS → O = 40.30 + 10.40. Thus, the weight of the added edge between vertex E and
vertex O is 50.70. This process of adding a new edge connecting every vertex to every other
vertex enabled the creation of the complete weighted graph shown in figure 5, and its
corresponding adjacency table
12
Figure 5: Complete Weighted Graph
Table 5: Weighted Adjacency Table for Complete Graph
Vertex E B O LS W LO M L D N
E 0 16.50 50.70 40.30 57.30 106.80 89.50 166.50 205.20 206.80
B 16.50 0 43.60 33.20 49.20 99.70 82.40 79.20 83.70 92.20
O 50.70 43.60 0 10.40 26.40 76.90 74.50 56.40 60.90 69.40
LS 40.30 33.20 10.40 0 16.00 66.50 49.20 46.00 50.50 59.00
W 57.30 49.20 26.40 16.00 0 82.50 65.20 62.00 66.50 75.00
LO 106.80 99.70 76.90 66.50 82.50 0 115.70 112.50 117.00 125.00
M 89.50 82.40 74.50 49.20 65.20 115.70 0 17.70 39.40 39.40
L 166.5 79.20 56.40 46.00 62.00 112.50 17.70 0 20.90 25.00
D 205.2 83.70 60.90 50.50 66.50 117.00 39.40 20.90 0 3.50
N 206.80 92.20 69.40 59.00 75.00 125.00 39.40 25.00 3.50 0
Now that we have turned the graph into a complete graph, we can find an approximate for the
classical TSP. For a smaller graph with a fewer number of vertices, it may be practical to find
13
all the possible Hamiltonian cycles by inspection of the graph and determine the cycle of
least weight, however as this graph has many vertices of high order, this method is
impractical and thus an estimate of the minimum weight of the Hamiltonian cycle will be
found. To accomplish this, we must find an upper bound and a lower bound for the weight of
the Hamiltonian cycle. To find an upper bound, the nearest neighbour algorithm can be used.
This algorithm has 4 steps:
• Step 1: Choose a starting vertex (for the purpose of our graph, the starting vertex shall
be LS)
• Step 2: Follow the edge of least weight to an unvisited vertex.
• Step 3: Repeat step 2 until all vertices have been visited.
• Step 4: Return to the starting vertex by adding the corresponding edge.
Using table 5, we can follow this algorithm by creating another table (table 6) which displays
the edges of least weight, their weight and whether a cycle has been formed. If a cycle has
been formed, such as in the case of edge OLS, the weight of this edge is ignored and the edge
of second least weight is found instead. For the rest of the table, edges of least weight which
resulted in a cycle were ignored until every vertex had been visited once.
Table 6: Nearest Neighbour Algorithm
Edge of Least Weight Vertex Is there a cycle?
Weight
N/A N/A LS N/A
LSO 10.40 O No
OLS 10.40 LS Yes, so ignore
and find the
edge of second
least weight.2
OW 26.40 W No
OB 49.20 B No
BE 16.50 E No
2
This same assumption was made for all other edges which would make a cycle, but they are not shown.
14
EM 89.50 M No
ML 17.70 L No
LD 20.90 D No
DN 3.50 N No
NLO 125.00 LO No
NLS 59.00 LS Yes (cycle
complete)
This Hamiltonian cycle can be written as this:
LS → O → W→ B → E → M → L → D → N → LO → LS
However, because some of these edges are not direct, this isn’t the actual solution. Instead,
we have to include all the train routes which go through vertex LS.
LS → O → LS →W → LS → B → E → LS → M → L → D → N → LS → LO → LS
Let 𝑚 be minimum weight (of graph? Check one note) (re-write)
𝑚 ≤ 10.40 + 26.40 + 49.20 + 16.50 + 89.50 + 17.70 + 20.90 + 3.50 + 125.00 + 59.00
𝑚 ≤ £418.10
Now that we have our upper bound for 𝑚, we must use the deleted vertex algorithm to find a
lower bound for 𝑚.
• Step 1: Delete a vertex, and the edges connected to it, from the original graph.
• Step 2: Find the minimum spanning tree (MST) for the remaining graph.
• Step 3: Add the length of the two shortest deleted edges to the weight of the minimum
spanning tree.
This table below is what table 5 looks like once vertex L has been deleted from the table.
Table 7: Weighted Adjacency Table with Vertex L deleted
Vertex E B O LS W LO M D N
15
E 0 16.50 50.70 40.30 57.30 106.80 89.50 205.20 206.80
B 16.50 0 43.60 33.20 49.20 99.70 82.40 83.70 92.20
O 50.70 43.60 0 10.40 26.40 76.90 74.50 60.90 69.40
LS 40.30 33.20 10.40 0 16.00 66.50 49.20 50.50 59.00
W 57.30 49.20 26.40 16.00 0 82.50 65.20 66.50 75.00
LO 106.80 99.70 76.90 66.50 82.50 0 115.70 117.00 125.00
M 89.50 82.40 74.50 49.20 65.20 115.70 0 39.40 39.40
D 205.2 83.70 60.90 50.50 66.50 117.00 39.40 0 3.50
N 206.80 92.20 69.40 59.00 75.00 125.00 39.40 3.50 0
The shortest edges from the deleted vertex L were LM and LD, with weights of £17.70 and
£20.90 respectively. These are the weights which will be added to the weight of the minimum
spanning tree.
With Vertex L deleted, we can now find the minimum spanning tree using Prim’s algorithm.
Prim’s algorithm can be utilized using two different methods. In the first method, the closest
edges to each vertex are chosen via inspection of the graph. However, because the vertices in
the graph (illustrated including Vertex L in figure 4), all have an order of 9, I believe it was
too complex to use inspection and thus a different method must be used. In this method,
Prim’s algorithm is applied to a weighted adjacency table using these steps:
• Step 1: Select any vertex at random.
• Step 2: Delete the row of the chosen vertex
• Step 3: Number the column of the chosen vertex
• Step 4: Circle the lowest undeleted entry in any of the numbered columns. If there is
more than one possible choice, pick one at random. The row of the circled entry
decides the new chosen vertex.
• Step 5: Repeats steps 2-4 until all rows are deleted. The circled entries will give the
edges to the minimum spanning tree.
Working:
16
5 4 2 1 3 9 6 7 8
Vertex E B O LS W LO M D N
E 0 16.50 50.70 40.30 57.30 106.80 89.50 205.20 206.80
B 16.50 0 43.60 33.20 49.20 99.70 82.40 83.70 92.20
O 50.70 43.60 0 10.40 26.40 76.90 74.50 60.90 69.40
LS 40.30 33.20 10.40 0 16.00 66.50 49.20 50.50 59.00
W 57.30 49.20 26.40 16.00 0 82.50 65.20 66.50 75.00
LO 106.80 99.70 76.90 66.50 82.50 0 115.70 117.00 125.00
M 89.50 82.40 74.50 49.20 65.20 115.70 0 39.40 39.40
D 205.2 83.70 60.90 50.50 66.50 117.00 39.40 0 3.50
N 206.80 92.20 69.40 59.00 75.00 125.00 39.40 3.50 0
The minimum spanning tree has weight = 10.40 + 16.00 + 33.20 + 16.50 + 49.20 +
39.40 + 3.50 + 66.50 = £234.70
Now that the MST has been found, the length of two shortest deleted edges must be added to
£234.70 in order to complete the deleted vertex algorithm. These edges were the previously
mentioned LM and LD, which have a combined weight of £38.50. This answer to the deleted
vertex algorithm is equal to the lower bound for the minimum weight to solve the TSP.
£234.70 + £38.50 = £273.20
𝑚 ≥ £273.20
∴ £273.20 ≤ 𝑚 ≤ £418.10
However…
(MORE THINGS TO DO:
17
EVALUATION OF IT: Implications of orders of the verticies maybe; implications of
what im discovering (couple sentences)
Reflection: doesn’t neccesarily represent how easy it is to get to places as it only shows
direct trains. Also, prices fluxuate and that will not always be the minimum price.
Airports aren’t built the same – airports in Exeter may not fly the same as an airport in
London (however doesn’t affect the conclusion)
18
Why am I using the TSP, which method. (by inspection we can tell this path is not
Hamiltonian)
Our graph is not complete and not eucledian (explain why with example) thus must use
practical TSPxs
We must convert pracrtical into classical
Convert to table and then adjacency matrix
Turn from practical into classical TSP.
19
Then use classical TSP method to find upper and lower bounds
Graph:
http://graphonline.ru/en/?graph=NzBEdFqPOhtzPBGn
Conclusion and Evaluation:
In conclusion
This investigation was beneficial for me outside of mathematics. While researching the
locations of the universities, I ended up discovering certain features within the cells of the
universities which I didn’t know existed.
However, this investigation had some limitations to it. Firstly, due to the large scale of the
map, it was near impossible to make the location of the sites 100% accurate and so there will
be an error with calculations such as the radius. Therefore, I chose to round the radius to the
nearest kilometer, as I could not possibly know the answer to a higher level of precision.
20
Bibliography
www.geogebra.com. Date Accessed 5 Sep 2021.
https://mpra.ub.uni-muenchen.de/53026/1/MPRA_paper_53026.pdf. Date Accessed 6 Sep
2021.
http://graphonline.ru/en/?graph=gwVmUZRnolTiamri
21