Michael A. Kappler
Daylight CIS, Inc.
An algorithm has been created to generate all possible simple graphs (no atom or bond types). Topological indices are used to characterize frameworks and select "drug-like" scaffolds. As a result of processing a massive data set, Daylight tools have improved.
At EuroMUG '03, a challenge was offered ``to construct all possible compounds containing carbon, nitrogen, and oxygen with a molecular weight between 150 and 250.'' Such a challenge arises from the exploratory nature of drug discovery endeavors such as the "Screensaver Lifesaver" project, a community effort aimed at discovery of new drug candidates and involves over a 2.5 million computers, 3.5 billion molecules and 16 protein targets. Motivation stems from the implied fact that the challenge hasn't been done before, and the possibility to suggest new "drug-like" candidates. To rise to the challenge of the latter, a collaboration was initiated between people at Daylight and the University of New Mexico School of Medicine.
Counting
Enumerating
Sampling
Duplication
Applications
Goals
1 Routine: Main
2 G <- NewGraph(1,0)
3 Output(G)
4 Enumerate-Vertices(G,1)
5 done
6
7 Routine: EnumerateVertices(G,k)
8 set <- SelectVertices(G,k)
9 for i <- set[1], ... set[m]
10 if AcceptableVertex(G,i)
11 G <- AddVertex(G,i)
12 if GraphIsUnique(G)
13 Output(G)
14 EnumerateEdge(G,i)
15 EnumerateVertices(G,i)
16 end
17 G <- RemoveVertex(G,i)
18 end
19 end
20 return
21
22 Routine: EnumerateEdges(G,k)
23 setA <- SelectVertices(G,k)
24 setB <- SelectVertices2(G,k)
25 for i <- setA[1], ... setA[m]
26 for j <- setB[1], ... setB[n]
27 if AcceptableEdge(G,i,j)
28 G <- AddEdge(G,i,j)
29 if GraphIsUnique(G)
30 Output(G)
31 EnumerateEdges(G,i)
32 end
33 G <- RemoveEdge(G,i,j)
34 end
35 end
36 end
37 return
Algorithm Description
Graphs are enumerated by systematic assembly of acyclic graphs, and as they are constructed, each one is used to enumerate cyclic graphs. The algorithm begins by connecting a vertex to a graph in all possible ways, resulting in a set of acyclic graphs, one of which is G. As each acyclic graph is constructed, an edge is added in all possible ways resulting in set of monocyclic graphs. As each monocyclic graph is constructed, an edge is added in all possible ways resulting in a set of bicyclic graphs, and so on, until constraints are violated. Eventually, the algorithm enumerates all cyclic graphs based on G. Then, another vertex is connected in all possible ways, resulting in a new set of acyclic graphs, one of which is G'. As each G' is constructed, it is treated as G and the process recurs until constraints are violated. Eventually, the algorithm enumerates all acyclic graphs based on G. This implementation enumerates acyclic branched graphs before linear ones. For example, neopentane and 2,2-dimethylbutane occur early in the enumeration and n-pentane and n-hexane occur late.
Use of Symmetry
Order of Vertices and Edges
Use of Constraints
From Graph Theory to Chemistry
Table 1: Number of Simple Graphs with a Given Vertex and Cycle Count Exhaustive Enumeration |
---|
|
Table 2: Number of Simple Graphs with a Given Vertex and Cycle Count World Drug Index 2004.2 Database |
---|
|
Table 3: Number of Simple Graphs with a Given Vertex and Cycle Count MedChem 2003 Database |
---|
|
Table 4: Number of Simple Graphs with a Given Vertex and Cycle Count National Cancer Institute 2000 Database |
---|
|
Table 5: Number of Simple Graphs with a Given Vertex and Cycle Count Wombat 2004.1 Database |
---|
|
Table 6: Number of Simple Graphs with a Given Vertex and Cycle Count Spresi 1995 Database |
---|
|
Table 7: Number of Simple Graphs with a Given Vertex and Cycle Count ChemNavigator 2004 Database |
---|
|
Table 8: Characterization of Databases All Compounds |
---|
|
|
Table 9: Characterization of Databases Compounds With 20 Vertices and 8 Cycles or Less |
---|
|
|
Table 10: Number of Simple Graphs with a Given Vertex and Cycle Count Exhaustive Enumeration Within Constraints |
---|
|
|
Findings