- Input Files: Datafile and Pedfile, in the same format as in the original SuperLink.
- -optimize: a flag that specifies whether optimization is activated in the current run. If the flag is provided, the program optimizes over the variable elimination order; otherwise, the elimination order is determined by the original greedy algorithm.
We determine the problem complexity coefficient by the following formula:

complexityCoefficient = log10(sum of the table sizes * total complexity of the break variables)

We strive to reduce this coefficient during the optimization process. We believe that this coefficient estimates the program run time better than the sum of the table sizes alone. The graphs below illustrate this conclusion.
Optimization Modes
Tests show that the sum of table sizes has a greater impact on the program run time than the maximal table size. There are cases where a certain elimination sequence slightly improves the former but drastically worsens the latter; experience shows that in most cases such a sequence is still preferable as far as run time is concerned. However, the maximal table size is a potential bottleneck on machines with little RAM.
Therefore, we define the following modes for the optimization process:
Loose mode - we accept any improvement in the sum of table sizes, regardless of its impact on the maximal table size.
Strict mode - in addition, we discard any sequence that enlarges the maximal table size by more than 10% relative to the current best sequence.
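The mode check might be sketched as follows (function and parameter names are illustrative; the 10% bound for strict mode is taken from the STRICT_OPTIMIZATION description elsewhere in this document):

```c
#include <stdbool.h>

/* Sketch, not SuperLink's actual code: decide whether a candidate
   elimination sequence beats the current champion.
   Loose mode accepts any reduction of the table-size sum; strict mode
   also rejects candidates that grow the maximal table by more than 10%. */
bool accept_sequence(double candSum, double candMax,
                     double bestSum, double bestMax, bool strict)
{
    if (candSum >= bestSum)
        return false;                      /* no improvement in the sum */
    if (strict && candMax > 1.1 * bestMax)
        return false;                      /* maximal table grew too much */
    return true;
}
```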
Determine Optimization Policy
We use the complexity of the greedy sequence to calculate the complexityCoefficient by the formula above. We then access the matching entry in the OptimizationPolicy array, where the optimization policy for this coefficient is defined.
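A sketch of this lookup, reusing the optimizationPolicyDef layout shown later in this document (the MAX_COMPLEXITY value and the clamping of out-of-range coefficients are our assumptions):

```c
#define MAX_COMPLEXITY 20   /* illustrative value, not the shipped one */

typedef struct optimizationPolicyDef {
    int numOfIterations;
    int maxNumChoosenVars;
    int skip;
    int herNum;
} optimizationPolicyDef;

optimizationPolicyDef optimizationPolicy[MAX_COMPLEXITY];

/* Map a complexity coefficient to its policy entry, clamping to the
   array bounds. */
optimizationPolicyDef *policy_for(double complexityCoefficient)
{
    int idx = (int)complexityCoefficient;
    if (idx < 0)
        idx = 0;
    if (idx >= MAX_COMPLEXITY)
        idx = MAX_COMPLEXITY - 1;
    return &optimizationPolicy[idx];
}
```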
Run the optimization process according to the found optimization policy
In each step of the algorithm we choose several variables with the least elimination complexity; they form a variables pool. The pool size is defined by the current OptimizationPolicy. Among them, we choose the next variable to be eliminated by tossing a weighted coin. The weight given to each variable is a function of its elimination complexity; we chose the following three functions: LOG10, SQRT, and a function that gives equal weight to all the variables. Once a variable is chosen, we update the total sum of the table sizes and, if needed, the maximal table size by its elimination complexity. Once the whole elimination sequence is determined, we compare its complexity to the current minimum. If we found a less complex sequence, we save it and discard the previous minimum; otherwise, we discard this sequence.
We repeat the search for an elimination sequence a number of times; the number of iterations is determined by the current OptimizationPolicy. During the optimization process we update the number of iterations needed in accordance with the reduction of the problem complexity. That is, if the problem complexity is reduced to some pre-defined minimum, the optimization process is stopped in order to reduce the optimization overhead in the total run time.
During the search for an elimination sequence we check that the intermediate results are better than the current record. If they are not, we stop the current iteration, since it has no chance of improving on the previous champion, and proceed to the next iteration. The check is performed in accordance with the currently defined Optimization Mode: under STRICT_OPTIMIZATION we discard any sequence that enlarges the maximal table size by more than 10% relative to the previous champion; otherwise, we only check the total sum of the table sizes.
Currently, we start the optimization using the SQRT-based weights for the coin (which generally yields better results). However, there are cases when one of the other functions wins, so we subsequently repeat the optimization with the two other functions, using the same number of iterations.
To further reduce the optimization overhead in the total run time on heavy files, we apply the following policy: we check the improvement rate every 200 iterations. If it is too low, we discard the currently used heuristic and progress to the next one.
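A sketch of this check (the threshold, the meaning of "improvement rate", and all names are our assumptions; only the 200-iteration interval comes from the text above):

```c
#include <stdbool.h>

#define CHECK_INTERVAL 200   /* from the policy described above */

/* Illustrative sketch: every 200 iterations, decide whether the current
   weight function should be abandoned in favor of the next one. */
bool switch_heuristic(int iteration, double improvementRate, double minRate)
{
    if (iteration == 0 || iteration % CHECK_INTERVAL != 0)
        return false;                    /* only check at the interval */
    return improvementRate < minRate;    /* too little progress: move on */
}
```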
typedef struct optimizationPolicyDef {
    int numOfIterations;    /* optimization iterations per heuristic */
    int maxNumChoosenVars;  /* largest variables-pool size */
    int skip;               /* step between successive pool sizes */
    int herNum;             /* total number of variable pools */
} optimizationPolicyDef;

optimizationPolicyDef optimizationPolicy[MAX_COMPLEXITY];
The data structure above defines the optimization policy. The optimizationPolicy array is indexed by the complexity coefficient; each entry specifies the policy for its index. The entries and size of the optimizationPolicy array may be configured according to a user-preferred policy. Our current configuration is based on multiple experiments in our environment and should be revised when running on more powerful machines.
numOfIterations specifies the number of optimization iterations to run for each heuristic.
The fields maxNumChoosenVars and skip define the sizes of the variables pools that will be used. The smallest pool size is hard-coded as 3 (which is usually efficient enough), but in heavy files with a large number of variables it is worth expanding the variables pool. The pool sizes that will be used sequentially start from 3 and increase in steps of skip up to maxNumChoosenVars.
herNum specifies the total number of variable pools to be used. This field is used to estimate the time that will be dedicated to the optimization process.
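To illustrate how the fields relate (the policy numbers below are made up for the example, not the shipped configuration):

```c
typedef struct optimizationPolicyDef {
    int numOfIterations;
    int maxNumChoosenVars;
    int skip;
    int herNum;
} optimizationPolicyDef;

/* Pool sizes run 3, 3+skip, 3+2*skip, ... up to maxNumChoosenVars,
   so herNum should equal the count this helper computes. */
int count_pools(const optimizationPolicyDef *p)
{
    int n = 0;
    for (int size = 3; size <= p->maxNumChoosenVars; size += p->skip)
        n++;
    return n;
}

/* Example entry: 500 iterations, pools of sizes 3, 6 and 9. */
int demo_pool_count(void)
{
    optimizationPolicyDef p = { 500, 9, 3, 3 };  /* illustrative values */
    return count_pools(&p);
}
```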
typedef struct Heuristics {
    int optimize;
    int func;
    int bestFunc;
    int numBestVars;
    int iteration;
    double origProblemComplexity;
    double newProblemComplexity;
    BestResults *bestResults[3];
} Heuristics;

The data structure above contains the details of the currently applied heuristic. The optimize flag indicates that the optimization process is activated. func indicates the weight function applied. The functions are defined as follows:
#define SQRT 0
#define LOG 1
#define EQ 2

bestFunc records the function that has achieved the best results so far. numBestVars is the size of the current variables pool. iteration is the current iteration number.
origProblemComplexity is the original problem complexity calculated after running the greedy algorithm. newProblemComplexity is the best complexity achieved so far.
BestResults *bestResults[3] points to the structures that hold the best results achieved by each heuristic. This information is needed only for gathering statistics and analyzing the performance of the different heuristics.

typedef struct BestResults {
double maxTable;
double sumTable;
double maxTableImprovement;
double sumTableImprovement;
int last_improvement_sum;
int last_improvement_max;
boolean used;
} BestResults;

maxTable holds the maximal table size of the best sequence found by the matching heuristic.
sumTable holds the sum of the table sizes of the best sequence found by the matching heuristic.
The used flag indicates that this heuristic is in use. The following fields are used for the dynamic evaluation of the heuristic's performance:
last_improvement_sum is the last iteration that improved the sum of table sizes for the matching heuristic.
last_improvement_max is the last iteration that improved the maximal table size for the matching heuristic.
maxTableImprovement is the improvement rate in the maximal table size.
sumTableImprovement is the improvement rate in the sum of table sizes.

During the search for the next variable to be eliminated we create the variables pool, implemented as a bi-directional sorted linked list with entries of type struct scoreData below. The size of the pool depends on the current optimization policy. We iterate over all the variables that are not yet eliminated and insert them into the pool until it reaches its bound. After that, each new variable's elimination complexity is compared against the variables in the pool; if it is better than that of one of the variables in the pool, the new variable is added to the pool and the worst variable in the pool is removed.
typedef struct scoreData {
    elimOrderVar *bestVar;
    double score;
    struct scoreData *prev, *next;
    double relProb;
} scoreData;

The data structure above holds the information for a single variable in the variables pool. bestVar is a pointer to the variable. score is its elimination complexity. relProb is the probability that the variable is chosen. prev and next are used for iterating over the variables pool.
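The bounded-insert policy described above can be sketched compactly with a sorted array instead of the doubly linked scoreData list SuperLink actually uses (all names here are illustrative): keep the cap variables with the smallest elimination complexity, dropping the current worst when a better candidate arrives.

```c
/* Illustrative stand-in for one pool entry. */
typedef struct {
    int varId;     /* stands in for the bestVar pointer */
    double score;  /* elimination complexity */
} poolEntry;

/* Insert e into a pool kept sorted by ascending score; returns the new
   length. When the pool is full, e replaces the worst entry only if its
   score is strictly better. */
int pool_insert(poolEntry *pool, int len, int cap, poolEntry e)
{
    if (len == cap && e.score >= pool[len - 1].score)
        return len;                /* not better than the current worst */
    if (len < cap)
        len++;                     /* grow; otherwise the worst is overwritten */
    int i = len - 1;
    while (i > 0 && pool[i - 1].score > e.score) {
        pool[i] = pool[i - 1];     /* shift larger scores right */
        i--;
    }
    pool[i] = e;
    return len;
}

/* Feed five scores through a pool of capacity 3; the three smallest
   (1, 2, 3) should survive, in ascending order. */
int demo_pool_run(void)
{
    poolEntry pool[3];
    int len = 0;
    double scores[5] = { 5.0, 2.0, 7.0, 1.0, 3.0 };
    for (int k = 0; k < 5; k++) {
        poolEntry e = { k, scores[k] };
        len = pool_insert(pool, len, 3, e);
    }
    return (len == 3 && pool[0].score == 1.0 &&
            pool[1].score == 2.0 && pool[2].score == 3.0);
}
```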
Specify probability functions to apply
Default: Apply all existing functions sequentially.
Configuration Switch: RUN macro in Optimization.h
Currently the following functions are available: LOG10, SQRT, and the function that gives equal weight to all the variables.
Use the RUN macro in Optimization.h to turn specific functions ON/OFF:
#define RUN(LOG) 1
#define RUN(SQRT) 1
#define RUN(EQ) 1
Specify Optimization Mode (see the explanation of the various modes above)
Default: Loose Mode.
Configuration Switch: STRICT_OPTIMIZATION constant in Optimization.h
To turn on Strict Mode, set STRICT_OPTIMIZATION to 1.
Specify Optimization Policy (based on current h/w parameters)
Default: Defined in function configureOptimizationPolicy() in EliminationOrder.c
Configuration Switch: OptimizationPolicy array
Turn Debug messages ON/OFF
Default: Print all Debug Messages
Configuration Switch: DEBUG_OPTIMIZATION in EliminationOrder.c
To turn debug messages OFF, set DEBUG_OPTIMIZATION to 0.
Modifications performed in the original code of SuperLink:
Note: all the modifications are marked by // NGMS
added 3 functions: printBestResults, configureOptimizationPolicy, getEliminateVarElimOrderProb to EliminationOrder.c
added the following fields to struct elimOrderPed in "EliminationOrderStructures.h":
double maxTableSize;
double sumTableSizes;
double problemComplexity;
The fields are used for storing the elimination complexity parameters for an elimination sequence found by the optimization procedure.
added struct scoreData to EliminationOrder.h
modified function main in main.c - added support for -optimize parameter and statistics gathering
modified function calcLikelihoodSMlink in Prog.c - added support for the -optimize parameter and statistics gathering.
modified function determineEliminationOrder in EliminationOrder.c - added implementation of the optimization algorithm.
modified function determineEliminationOrderPed in EliminationOrder.c - moved the elimOrderArr array allocation to determineEliminationOrder().
modified function createElimOrderArr - in EliminationOrder.c - added implementation of the probabilistic search for the elimination sequence.
BestResultsMax.csv - contains the best sum of table sizes found by each of the heuristics. This information is used to determine which heuristic is preferable for given pedigree characteristics.
TotalResults.csv - contains various data about the optimization process performance: the optimization policy that was applied to the current file, the time consumed by the optimization process, the complexity of the best elimination sequence found by the optimization procedure, and the total run time. We use this file to compare the performance of the optimized program with that of the original program. The file is created in the following way: first invoke the program on the chosen input files with the -optimize flag; after it finishes, invoke it on the same files without the -optimize flag (this runs the original greedy algorithm). The last run only adds its total run time to TotalResults.csv. Thus the two runs create one row in this file that contains all the data necessary for analyzing the optimization performance.