Our goal is to compare a metabolic pathway of a certain organism against the same metabolic pathways in Each enzyme is characterized by the reaction it catalyzes, or compare a metabolic pathway against other metabolic pathways in the same organism.
We represent a metabolic pathway by a graph whose nodes correspond to enzymes that catalyze the pathway's reactions, and the edges connect two nodes if for the corresponding enzymes the product of one serves as the substrate of the other. Thus, the problem is: Given a pattern tree P and a Text Tree T, find a subtree of T which is isomorphic to P or decide that there is no such tree.
When looking for the best match, we take into account both label similarity and topology. We are permited to delete only degree 2 vertexes from the text tree. And we are NOT permited to delete vertexes from the pattern tree.
A previous algorithm to solve the problem ONLY for treelike graphs already exists today.
We used the Modified decision Tree algorithm in order to solve the given problem.
The idea is to perform a decision tree search over the space {u,v½uÎV1; vÎV2 È {f} }, when Each node in the search tree represents a vertex in V1 that has either been matched with a vertex in V2 or with f (deletion). Each pathway from root to leaves in the search tree, represents a match of all vertices in V1 to vertices in V2 (or f).
The search tree advances from root to leaves, trying to find a new G1, G2 match: A vertex (V1,V2) is added to the tree if it V1 can be matched to V2 , when all pathway matches have already been made. In order to allow deletions, we store all simple paths of length up to a predefined length in a preprocessing procedure.
A cost is assigned to the matching between every two vertices and the cost of a partial matching consists of the sum of the costs of the matched vertices minus the deletion penalty. The tree search is expanded only to states which have the potential to overcome the score of the final states already found in the search tree.
The Decision Tree advances in a DFS style, causing it to store only one tree path (only one match between the pattern and tree graphs) at each step. Thus the space complexity of the algorithm is O(m), when 'm' is the number of nodes in a graph.
Example of a decision tree:
When running the program on a variety of Pattern and Text graphs, which have from 1 to 20 nodes, some containing inner circles and some don't, we found out the actual time results are fast (faster than the existing algorithm) , although the time complexity of the algorithm is high (exponential). The reason for this is that most of the Decision tree paths are not executed at all, causing the actual tree to be sparse.
We tested the Decision tree algorithm against the existing algorithm on many treelike graphs and found the results to be the same (except some minor bugs found in the existing algorithm).
The final test was done by taking 80 E.Coli metabolic pathways (some of them include inner circles) and trying to match them against each other. The 6400 results include for each Pattern and Text graph the size of the graphs, the algorithm running time, and the 5 best matches found for them.
The full results file can be found here.
Here is a selection of the results, which have the worst timings:
Text File: 
Pattern File: 
Text Nodes Num 
Pattern Nodes Num 
Time [sec] 
score1 
score2 
score3 
score4 
score5 
yeast64.grp 
yeast57.grp 
20 
14 
0.344 





yeast64.grp 
yeast65.grp 
20 
17 
0.344 





yeast64.grp 
yeast64.grp 
20 
20 
0.297 
0 




yeast64.grp 
yeast17.grp 
20 
15 
0.281 





yeast64.grp 
yeast79.grp 
20 
10 
0.25 
4.143 
0 
6.917 
6.917 
6.917 
yeast65.grp 
yeast79.grp 
17 
10 
0.25 
63.017 
61.779 
63.779 
63.017 
65.017 
yeast64.grp 
yeast77.grp 
20 
9 
0.219 





yeast64.grp 
yeast31.grp 
20 
10 
0.203 
74.517 
74.116 
74.517 
74.116 
74.517 
yeast64.grp 
yeast80.grp 
20 
7 
0.203 





yeast65.grp 
yeast31.grp 
17 
10 
0.203 
75.66 
75.66 
75.66 
75.66 
75.66 
yeast65.grp 
yeast65.grp 
17 
17 
0.187 
0 




yeast57.grp 
yeast79.grp 
14 
10 
0.172 





yeast64.grp 
yeast9.grp 
20 
9 
0.172 
65.714 
65.714 
65.714 
65.714 
66.952 
yeast64.grp 
yeast56.grp 
20 
6 
0.172 





yeast65.grp 
yeast17.grp 
17 
15 
0.172 





yeast65.grp 
yeast64.grp 
17 
20 
0.172 





yeast65.grp 
yeast80.grp 
17 
7 
0.172 





yeast65.grp 
yeast77.grp 
17 
9 
0.171 





yeast65.grp 
yeast2.grp 
17 
9 
0.141 
62.1 
62.1 
62.1 
62.1 
62.1 
yeast65.grp 
yeast56.grp 
17 
6 
0.141 
46.11 
46.11 
46.11 
46.11 
46.11 
yeast17.grp 
yeast79.grp 
15 
10 
0.14 





yeast64.grp 
yeast1.grp 
20 
9 
0.14 
24.908 
67.613 
30.417 
29.176 
29.176 
yeast17.grp 
yeast65.grp 
15 
17 
0.125 





yeast42.grp 
yeast52.grp 
6 
3 
0.125 
24.652 
24.652 
24.652 
24.652 
24.652 
Full report (PDF, in Hebrew)
Source code (.zip)