LAB IN BIOINFORMATICS

Preprocessing Reduction of Weighted Graphs

under the guidance of  Dan Geiger

by Nickolay Dovgolevsky


Purpose

Reduction of a weighted graph without increasing its weighted treewidth by set of safe reduction rules defined in [1].

Description of the problem

We can preprocess a graph by applying reduction rules that reduce it. This reduction has a property that a tree decomposition for the reduced graph with minimum treewidth can easily be extended to a tree decomposition for the original graph that also has minimum treewidth. A rule that does not change the treewidth of the graph is called safe. To allow more reduction rules we define a variable low, that is a lower bound of the treewidth of the graph. A reduction rule is safe if and only if the maximum of low is not changed. Preprocessing applies a series of safe reduction rules, taken from a set, on a graph, until no more reduction rules can be applied. When the preprocessing results in an empty graph, we can trivially find a tree decomposition with minimum treewidth for the original graph. When the graph resulting after preprocessing is not empty, either the treewidth can be computed for the remaining graph by an exact, but possibly computational expensive algorithm (when the resulting graph is relatively small), or the treewidth can be approximated using heuristic algorithms.

Preprocessing a graph by applying safe reduction rules takes on additional significance when using stochastic version of an algorithm. Because stochastic algorithms are usually applied a large number of times (at least 100), applying them on the reduced graph for the same number of iterations saves time. Using the saved time reduces the elimination cost found.

Preprocessing a graph may be used not only for finding treewidth or its approximation. For example, in many applications the problem is  to find the sum of neighborhood weights of eliminated vertices, also called the total state size. Despite the fact that safe rules do not change the treewidth of a graph, they nevertheless may increase the total state size. However, in practice, also for this problem, reduction rules reduce the running time and improve the cost.

I implemented the following reduction rules, that were defined in [1]:

  1. Simplicial rule.
  2. Almost simplicial rule.
  3. Buddies rule (generalized).
  4. Cube rule.

The variable low is an updated lower bound on treewidth computed while applying the simplicial reduction rule. The other rules remove vertices, the elimination cost of which is less or equal to low.

 

Experimental Results

 

First Experiment

This experiment is conducted on 20 probabilistic networks. The Diabetes, Link, Munin networks are taken from medical applications, Barley is used for agricultural purposes, the Water network models a water purification, and Superlink networks were generated by Superlink - a genetic linkage analysis program. Table 1 shows some results of the preprocessing technique. The preprocessing consists of 3 phases; for each phase a set of reduction rules is employed. In the first phase the set Prepro-1 = {simplicial rule} is used, for the second we applied the set Prepro-2 = {simplicial rule}, and in the third - the set Prepro-3 = {simplicial rule, Almost simplicial rule, Buddies rule and Cube rule}. The Buddies rule is applied separately for "buddies set" size = 2, 3, and 4. For each instance the size of the graph is shown after moralization, and after first, second and third phases. After each phase we also provide the percentage of vertices deleted by each reduction rule in the set, the value of variable low, representing the lower bound currently known for the weighted treewidth, and the running time of the preprocessing. The Buddies rule  with "buddies set" size = 4, and the Cube rule did not result in elimination of vertices, so there is no need in generalization of the Cube rule, as well as is running Buddies rule with set size more then 4. A summary explaining all table entries is available here.

 

Instance

Undirected graph

Prepro-1

Prepro-2

Prepro-3

|V|

|E|

%V

#smp

low

t

%t

%V

#smp

#asm

low

t

%t

%V

#smp

#asm

#bud

#bud3

#bud4

#cub

low

t

%t

Barley

48

126

27.1

13

40320

0.00

0.00

39.6

14

5

40320

0.00

0.00

39.6

14

5

0

0

0

0

40320

0.00

0.00

Diabetes

   413

    819

18.8

78

7056

0.00

0.00

19.6

78

3

7056

0.00

0.00

19.6

78

3

0

0

0

0

7056

0.01

4.76

Link

   724

  1738

31.7

230

128

0.00

0.00

53.2

230

155

128

0.00

0.00

53.2

230

155

0

0

0

0

128

0.02

7.13

Munin1

   189

366

42.9

81

600

0.00

0.00

52.2

82

17

600

0.00

0.00

52.2

82

17

0

0

0

0

600

0.00

0.00

Munin2

 1003

1662

55.2

554

600

0.00

0.00

68.4

564

122

600

0.00

0.00

68.4

564

122

0

0

0

0

600

0.03

9.56

Munin3

 1044

1745

59.9

625

600

0.00

0.00

82.4

652

208

2000

0.00

0.00

83.5

652

208

12

0

0

0

2000

0.03

9.18

Munin4

 1041

1843

58.1

605

600

0.00

0.00

74.0

614

156

2000

0.00

0.00

74.0

614

156

0

0

0

0

2000

0.02

6.83

Water

     32

123

25.0

8

3072

0.00

0.00

31.3

8

2

3072

0.00

0.00

31.3

8

2

0

0

0

0

3072

0.00

0.00

Superlink0_0

 1676

  2987

19.9

333

2000

0.02

1.40

58.0

407

578

2000

0.03

5.54

58.0

407

578

0

0

0

0

2000

0.10

18.7

Superlink1_1

 2449

  4320

19.3

473

2000

0.00

0.00

57.7

598

817

2000

0.06

6.74

57.7

596

817

2

0

0

0

2000

0.34

33.2

Superlink2_2

 3224

  5641

19.9

643

2000

0.02

1.19

54.5

779

979

2000

0.08

7.14

54.7

779

981

2

0

0

0

2000

0.30

16.7

Superlink0_19

 3254

  5754

23.9

779

2000

0.02

1.59

59.0

906

1013

2000

0.06

4.41

59.0

906

1013

0

0

0

0

2000

0.30

21.9

Superlink1_20

 4720

  8211

24.2

1142

2000

0.01

0.75

57.8

1339

1391

2000

0.15

7.58

57.7

1332

1391

2

0

0

0

2000

0.75

29.8

Superlink2_21

 6242

10814

24.1

1503

2000

0.03

0.77

54.7

1722

1694

2000

0.16

4.98

55.2

1724

1717

2

3

0

0

2000

0.63

18.3

Superlink0_38

 4859

  8517

25.2

1226

2000

0.03

1.32

59.6

1428

1469

2000

0.12

6.85

59.6

1426

1469

2

0

0

0

2000

0.56

23.6

Superlink1_39

 7040

12104

25.0

1762

2000

0.05

1.17

60.0

2106

2123

2000

0.28

9.14

60.0

2090

2123

16

0

0

0

2000

1.32

31.7

Superlink2_40

 9276

15926

24.7

2293

2000

0.05

0.77

55.3

2654

2479

2000

0.30

7.47

55.6

2650

2502

8

3

0

0

2000

1.56

24.3

Superlink0_57

 6372

11198

26.0

1659

2000

0.05

1.76

57.8

1883

1798

2000

0.17

5.87

57.8

1881

1798

2

3

0

0

2000

0.78

26.9

Superlink1_58

 9292

16096

25.8

2401

2000

0.06

1.05

57.9

2784

2594

2000

0.37

7.73

57.9

2768

2594

16

3

0

0

2000

2.23

38.9

Superlink2_59

12278

21215

25.6

3140

2000

0.06

0.67

54.3

3560

3104

2000

0.54

7.78

54.5

3554

3127

10

6

0

0

2000

6.02

73.2

Average:

3759

6560

30.1

977

3849

0.02

0.54

55.4

1120

1035

3.989

0.12

4.06

55.5

1118

1039

3.7

0.9

0

0

3989

0.75

19.8

 

 

Second Experiment

The next experiment has been conducted on 100 graphs to check whether the time saved due to the reductions can be used by running more iterations to improve the cost. Each graph was processed in four different ways: using the minimum fill-in algorithm with and without reductions and using the weighted minimum fill-in algorithm with and without reductions where the reductions used are the simplicial vertex elimination, the almost simplicial vertex elimination, and the buddy rule (with buddy sets of 3 or 4). The first experiment clearly shows that this is an appropriate choice of reduction rules. Each of the four runs was given 1000 seconds.

Summary of Results

The results show that these reductions use a very small amount of time and increase dramatically the number of stochastic iterations that the two min-fill algorithms can run in this time framework of 1000 seconds. Furthermore, the resulting cost was also improved in most cases. In particular, Min-Fill with reductions (as preprocessing) is superior over Min-Fill without reductions in 68% of the graphs. In this case the average improvement in cost is ~4.2% which means an average reduction of ~60% in memory requirements. If reductions are applied on all graphs the average improvement is ~2.9% which means an average reduction of ~43% in memory requirements. In 11% of the graphs Min-Fill found the same cost with and without reduction.
The result for the weighted version are similar. The weighted Min-Fill algorithm with reductions is superior over the weighted Min-Fill algorithm without reductions in 67% of the graphs. In this case the average improvement in cost is ~4.4%. Assuming the reductions are applied on all graphs (namely, also when they do not help) the average improvement reduces to ~2.8%. In 12% of the graphs the algorithm weighted Min-Fill found the same cost with and without reduction.

Detailed Results

Each cell in the table below contains two numbers. The cost found, namely, log10 of the sum of table sizes produced during elimination, and the number of iterations that fit into 1000 seconds. In addition, for each algorithm the lowest cost, with versus without reductions, is underlined. If equal, both are underlined.

Weighted graphs

pedfile

datafile

Number of loci

Number of people

% of typed persons

MIN_FILL

WMIN_FILL

No prepr

With prepr

No prepr

prepr

graph00

pedfile00

datafile00

12

25

80

3.95 6676

3.95 38385

3.94 3634

3.95 20616

graph01

pedfile01

datafile01

14

27

63

14.10 495

14.46 519

13.63 264

13.64 282

graph02

pedfile02

datafile02

16

22

89

5.28 4415

5.18 13066

5.24 2259

5.08 6494

graph03

pedfile03

datafile03

20

31

71

10.22 1167

10.12 1597

10.09 593

9.92 810

graph04

pedfile04

datafile04

17

29

56

11.35 1077

11.16 1263

10.11 597

10.15 712

graph05

pedfile05

datafile05

19

33

95

5.51 2561

5.46 8495

5.49 1410

5.37 4674

graph06

pedfile06

datafile06

18

31

47

8.84 1628

7.81 3202

8.43 905

7.53 1879

graph07

pedfile07

datafile07

13

30

30

2.80 22253

2.81 1301257

2.80 12772

2.81 733295

graph08

pedfile08

datafile08

22

38

86

6.56 1473

6.42 4127

6.32 817

6.35 2215

graph09

pedfile09

datafile09

20

48

91

8.12 820

7.80 1590

8.00 463

7.92 900

graph10

pedfile10

datafile10

25

38

84

13.45 395

12.39 608

11.89 232

11.67 370

graph11

pedfile11

datafile11

23

45

78

12.66 393

12.87 529

13.42 215

12.90 295

graph12

pedfile12

datafile12

19

47

74

6.25 1325

6.09 3114

6.23 718

6.07 1677

graph13

pedfile13

datafile13

21

44

80

10.95 583

11.01 873

11.00 314

10.94 469

graph14

pedfile14

datafile14

27

42

90

17.17 315

16.80 436

16.52 166

16.53 234

graph15

pedfile15

datafile15

25

42

70

13.49 381

13.58 489

13.35 203

13.57 268

graph16

pedfile16

datafile16

21

37

87

7.24 1521

7.05 5795

7.11 829

6.77 3189

graph17

pedfile17

datafile17

24

43

93

6.03 1240

5.81 4764

5.95 691

5.84 2605

graph18

pedfile18

datafile18

30

40

80

10.14 480

10.27 847

10.08 264

9.92 481

graph19

pedfile19

datafile19

34

38

100

6.58 791

6.55 2434

6.43 444

6.38 1355

graph20

pedfile20

datafile20

32

40

85

15.56 193

13.13 293

15.69 103

14.11 162

graph21

pedfile21

datafile21

31

56

77

13.58 255

13.51 372

13.56 139

13.45 199

graph22

pedfile22

datafile22

24

38

69

12.04 540

11.53 722

12.77 284

11.56 384

graph23

pedfile23

datafile23

29

41

74

6.94 784

6.84 3707

7.01 429

6.73 2014

graph24

pedfile24

datafile24

26

34

88

9.60 726

8.99 1745

9.64 378

8.77 911

graph25

pedfile25

datafile25

28

37

64

8.81 669

8.26 1764

8.93 344

8.18 879

graph26

pedfile26

datafile26

20

50

80

13.97 367

13.60 628

14.40 193

12.52 338

graph27

pedfile27

datafile27

18

51

97

8.72 1200

8.43 2980

8.22 661

7.66 1729

graph28

pedfile28

datafile28

37

32

83

9.68 473

9.56 766

9.63 241

9.57 396

graph29

pedfile29

datafile29

20

36

57

12.32 660

11.77 877

12.60 304

11.88 418

graph30

pedfile30

datafile30

14

49

67

15.42 603

15.16 784

14.46 269

13.67 334

graph31

pedfile31

datafile31

17

26

73

4.39 4016

4.29 17593

4.43 2082

4.26 9062

graph32

pedfile32

datafile32

15

24

36

3.09 41589

3.10 1339162

3.09 22362

3.10 680463

graph33

pedfile33

datafile33

18

25

19, 36, 62, 84

11.17 1165

10.26 1758

10.33 590

10.26 877

graph34

pedfile34

datafile34

19

33

14, 31, 48, 66, 85

18.89 165

19.31 172

16.54 94

16.86 100

graph35

pedfile35

datafile35

15

38

22, 44, 66, 88

17.04 344

16.90 366

16.48 153

15.91 162

graph36

pedfile36

datafile36

16

23

13, 29, 40, 55, 74, 87

10.64 1728

10.34 2463

9.17 940

8.95 1357

graph37

pedfile37

datafile37

24

30

15, 29, 49, 66, 79

22.52 93

21.84 100

17.84 68

17.03 75

graph38

pedfile38

datafile38

21

35

22, 37, 62, 83

23.20 100

22.46 103

17.37 62

17.27 68

graph39

pedfile39

datafile39

27

28

15, 30, 49, 61, 80

13.59 380

13.35 421

13.52 194

12.97 220

graph40

pedfile40

datafile40

17

22

12, 26, 44, 56, 69, 82

17.53 491

16.96 547

13.49 283

13.32 315

graph41

pedfile41

datafile41

25

32

18, 34, 45, 67, 83

18.30 191

17.20 215

17.97 106

16.97 118

graph42

pedfile42

datafile42

20

27

18, 39, 62, 81

19.07 147

18.56 156

16.33 92

16.37 99

graph43

pedfile43

datafile43

25

22

17, 32, 49, 60, 85

11.17 600

11.37 695

11.50 315

11.22 370

graph44

pedfile44

datafile44

23

19

21, 37, 61, 78

13.63 516

13.98 554

10.63 325

10.73 341

graph45

pedfile45

datafile45

30

18

15, 32, 48, 62, 79

12.89 380

12.52 439

12.10 220

11.92 264

graph46

pedfile46

datafile46

26

22

23, 54, 78

12.09 621

10.81 839

11.28 398

10.87 539

graph47

pedfile47

datafile47

21

33

10, 23, 36, 47, 63, 72, 88

5.15 2572

4.99 9475

5.06 1513

4.85 5736

graph48

pedfile48

datafile48

34

20

10, 29, 54, 69, 90

8.48 1130

7.72 2264

7.99 678

6.95 1421

graph49

pedfile49

datafile49

19

26

18, 35, 65, 87

20.42 170

19.69 183

16.18 111

16.12 119

graph50

pedfile50

datafile50

24

24

19, 25, 54, 70, 91

15.38 239

15.59 251

15.26 141

14.76 148

graph51

pedfile51

datafile51

25

28

15, 50, 86

18.57 146

17.64 167

16.67 95

15.59 107

graph52

pedfile52

datafile52

15

18

9, 20, 33, 44, 53, 78, 98

8.27 3829

8.29 4877

6.59 2540

6.52 3293

graph53

pedfile53

datafile53

23

33

10, 20, 40, 80

17.05 235

16.22 245

14.22 154

14.55 164

graph54

pedfile54

datafile54

19

40

89

6.60 2820

6.23 14032

6.79 1661

6.34 7260

graph55

pedfile55

datafile55

24

21

74

9.53 960

9.25 1425

9.38 560

9.32 833

graph56

pedfile56

datafile56

24

40

50

11.87 332

11.76 400

11.83 199

11.65 232

graph57

pedfile57

datafile57

25

20

27

4.83 3233

4.81 12811

4.78 1862

4.78 7254

graph58

pedfile58

datafile58

24

33

64

14.20 341

13.78 394

13.41 182

14.16 211

graph59

pedfile59

datafile59

22

22

14

3.36 6089

3.36 1311527

3.36 3554

3.36 744369

graph60

pedfile60

datafile60

14

20

6

3.19 9009

3.19 1294078

3.19 5297

3.19 743450

graph61

pedfile61

datafile61

27

16

3

3.69 18845

3.82 145720

3.69 10809

3.81 82540

graph62

pedfile62

datafile62

27

22

84

12.13 891

12.33 1316

11.39 504

11.39 748

graph63

pedfile63

datafile63

24

31

7

3.63 2418

3.63 1306060

3.63 1435

3.63 749090

graph64

pedfile64

datafile64

28

25

51

10.49 794

10.15 1045

9.89 466

9.05 597

graph65

pedfile65

datafile65

25

31

41

12.05 332

11.49 405

11.39 184

11.79 222

graph66

pedfile66

datafile66

26

45

75

8.67 672

8.45 1967

8.54 383

8.47 1101

graph67

pedfile67

datafile67

23

36

73

12.06 468

12.46 639

11.75 271

11.50 369

graph68

pedfile68

datafile68

15

26

77

4.11 5428

4.12 38726

4.09 3072

4.11 21133

graph69

pedfile69

datafile69

22

42

10

3.31 16105

3.61 348323

3.31 9594

3.61 207446

graph70

pedfile70

datafile70

20

21

96

3.96 6290

3.97 75238

3.94 3868

3.95 45068

graph71

pedfile71

datafile71

27

15

79

6.29 2567

6.33 3748

6.35 1510

6.26 2201

graph72

pedfile72

datafile72

29

41

44

10.37 732

9.26 1645

10.13 426

9.14 962

graph73

pedfile73

datafile73

21

24

10

3.24 7858

3.24 1288974

3.24 4850

3.24 797134

graph74

pedfile74

datafile74

21

19

52

3.35 5958

3.35 1306149

3.35 3665

3.35 788481

graph75

pedfile75

datafile75

26

38

58

7.10 912

6.96 2588

7.03 545

6.96 1526

graph76

pedfile76

datafile76

17

41

98

13.18 452

12.48 752

13.03 255

11.91 440

graph77

pedfile77

datafile77

24

35

50

16.07 348

15.07 545

14.67 202

13.86 333

graph78

pedfile78

datafile78

28

43

41

11.58 888

11.55 1977

10.85 534

10.61 1005

graph79

pedfile79

datafile79

28

26

37

8.24 1182

7.59 3425

8.11 727

7.39 2057

graph80

pedfile80

datafile80

24

27

52

3.66 2816

3.66 223139

3.66 1692

3.66 133853

graph81

pedfile81

datafile81

11

22

68

5.99 5348

5.75 23217

5.87 2926

5.63 12470

graph82

pedfile82

datafile82

21

27

95

5.90 1925

5.78 3656

5.73 1071

5.74 2500

graph83

pedfile83

datafile83

14

47

79

4.98 2095

4.95 4987

4.96 1129

4.94 2685

graph84

pedfile84

datafile84

15

22

93

6.48 3610

6.51 5640

6.39 2008

6.40 3097

graph85

pedfile85

datafile85

19

15

82

6.46 3977

6.24 7187

6.26 2317

6.22 4159

graph86

pedfile86

datafile86

28

38

90

4.27 1365

4.15 39776

4.30 785

4.15 22725

graph87

pedfile87

datafile87

10

20

45

5.54 5998

5.48 14932

5.69 3462

5.36 8506

graph88

pedfile88

datafile88

15

45

22

2.09 162067

2.09 1266975

2.09 98110

2.09 757484

graph89

pedfile89

datafile89

25

32

31

15.29 314

15.08 344

15.11 139

14.72 156

graph90

pedfile90

datafile90

12

34

76

4.68 2762

4.55 10122

4.66 1872

4.53 6737

graph91

pedfile91

datafile91

12

15

80

2.85 14328

2.85 679530

2.85 15322

2.85 755786

graph92

pedfile92

datafile92

25

49

3

3.89 524

3.89 670371

3.89 580

3.89 751241

graph93

pedfile93

datafile93

21

24

11

3.50 2310

3.50 703264

3.50 2444

3.50 752771

graph94

pedfile94

datafile94

21

42

12

7.47 982

6.60 2495

7.52 970

6.91 2424

graph95

pedfile95

datafile95

13

28

16

4.40 3070

4.25 16803

4.40 2967

4.25 16196

graph96

pedfile96

datafile96

13

42

33

8.13 1349

7.32 3705

8.48 1205

7.45 3116

graph97

pedfile97

datafile97

23

33

43

15.50 125

14.46 144

14.27 122

14.69 142

graph98

pedfile98

datafile98

27

22

44

6.29 1873

6.13 5823

6.34 1796

6.15 5412

graph99

pedfile99

datafile99

27

31

8

3.72 7480

3.83 88146

3.72 7259

3.83 85595

 

Conclusions

Experiments were conducted on graphs of a set of probabilistic networks taken from real-life applications. These experiments revealed that a subset of the identified reduction rules were able to reduce the graphs significantly. Therefore, preprocessing by applying reduction rules is a useful technique for the problem of constructing tree decompositions with minimum weighted treewidth.
In addition, one of the goals of this project was to find a good upper bound for the "buddies set" size and to check how often the Cube rule is applied. The Buddies rule with "buddies set" size = 4, and the Cube rule are not very useful, since they were not applied even once in all the experiments and hence the time to check their prerequisites is likely to be higher than the time saved on performing the iterations.


Acknowledgements
:

I thank Dan Geiger for his help and guidance. I also thank Ma'ayan Fishelson and the Research Unit of Decision Support Systems Aalborg University for providing instances of probabilistic networks.

 

Definitions, algorithms and some of the wordings were taken from the following references:

References

1. Hans L. Bodlaender, Arie M. C. A. Koster, Frank van den Eijkof, Linda C. van der Gaag: Pre-processing for Triangulation of Probabilistic Networks (ZIB Report 01-39, appeared in Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, J. Breese and D. Koller Eds., (2001), pp. 32-39, Published by Morgan Kaufmann Publishers, San Francisco)

2. Frank van den Eijkhof, Hans L. Bodlaender, Arie M.C.A. Koster: Safe reduction rules for weighted treewidth (Konrad-Zuse-Zentrum für Informationstechnik Berlin, ZIB PaperWeb Reports / Preprints 2002 )