a. Calculate the expected number of palindromes to be found in a string of length n.
The equation uses the following parameters:
n – length of the string.
l – length of the palindrome (length of one arm).
G – maximum length of the gap between the two arms.
mis – maximum number of mismatches allowed.
p – expected number of palindromes in a string of length n.
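First, we choose x positions for the mismatches in the second arm of the palindrome (which contains l nucleotides); there are $\binom{l}{x}$ ways to do so.
Since there are four nucleotides, each of the remaining l - x positions matches its partner in the first arm with probability 1/4, which gives $(1/4)^{l-x}$.
Each mismatched position has probability 3/4 (any of the three other bases), which gives $(3/4)^{x}$.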
We sum over x = 0 to mis and multiply the result by the G + 1 possible gap sizes and by the n - 2l + 1 possible start positions of the palindrome.
The term (1 + G)G/2 is subtracted for palindromes near the end of the string, which cannot take every gap size: a palindrome with gap g occupies 2l + g positions, so it has only n - 2l - g + 1 possible start positions.
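Putting these steps together, equation a can be written as follows. This is a reconstruction from the description above, so the layout and notation are ours; the first line shows where the subtracted term comes from.

```latex
\sum_{g=0}^{G} (n - 2l - g + 1) = (n - 2l + 1)(G + 1) - \frac{G(G + 1)}{2}

p = \left[ (n - 2l + 1)(G + 1) - \frac{G(G + 1)}{2} \right]
    \sum_{x=0}^{\mathit{mis}} \binom{l}{x} \left(\frac{1}{4}\right)^{l - x} \left(\frac{3}{4}\right)^{x}
```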
b. Calculate the same expectation as in equation a, but now take into account the background distribution of the specific sequence: the probability of a match between two bases is not 1/4, as assumed in equation a, but depends on how often each base appears in the sequence.
For example, if a sequence is 4000 bases long and base A appears in it 500 times, we can estimate p(A) = 500/4000 = 1/8.
Computing this probability for each of the four DNA bases, the probability of a match between two bases is now:
$R = p(A)^2 + p(C)^2 + p(G)^2 + p(T)^2$ (and not 1/4 as in equation a).
(The probability of a mismatch is now 1 - R, not 3/4 as in equation a.)
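A minimal sketch of the background estimate, assuming base frequencies are taken as simple counts over the sequence. The function names `base_probabilities` and `match_probability` are hypothetical, and ignoring non-ACGT characters (such as N) is our assumption. The example sequence reproduces the text's 4000 bases with 500 A's; the counts for C, G and T are made up.

```python
from collections import Counter


def base_probabilities(seq: str) -> dict:
    """Estimate p(A), p(C), p(G), p(T) from base counts in seq.

    Assumption: only A, C, G, T are counted; other characters (e.g. N) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def match_probability(probs: dict) -> float:
    """R = p(A)^2 + p(C)^2 + p(G)^2 + p(T)^2; the mismatch probability is 1 - R."""
    return sum(probs[b] ** 2 for b in "ACGT")


# Example from the text: 4000 bases, 500 of them A (C, G, T counts are hypothetical)
seq = "A" * 500 + "C" * 1500 + "G" * 1000 + "T" * 1000
probs = base_probabilities(seq)
print(probs["A"])                # 0.125, i.e. 1/8
print(match_probability(probs))  # 0.28125
```

With equal base frequencies the function returns exactly 1/4, which recovers the assumption of equation a.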
The new expectation (the number of palindromes expected in a string of length n, taking the background distribution into account) is given by:
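Reading the description literally, equation b keeps the form of equation a with 1/4 replaced by R and 3/4 by 1 - R, i.e. $p = \left[(n - 2l + 1)(G + 1) - \frac{G(G+1)}{2}\right] \sum_{x=0}^{\mathit{mis}} \binom{l}{x} R^{l-x} (1 - R)^{x}$. The sketch below computes both equations with one hypothetical function, `expected_palindromes`, whose argument `r` is the per-position match probability (1/4 for equation a, R for equation b). It assumes n >= 2l + G, so that every gap size fits at least once.

```python
from math import comb


def expected_palindromes(n: int, l: int, G: int, mis: int, r: float = 0.25) -> float:
    """Expected number of palindromes (arm length l, gap 0..G, at most mis mismatches).

    r is the per-position match probability: 1/4 reproduces equation a,
    R (from the base composition) gives equation b. Names follow the text's symbols.
    """
    if n < 2 * l + G:
        raise ValueError("formula assumes n >= 2*l + G")
    # Start positions summed over gap sizes g = 0..G: sum of (n - 2l - g + 1)
    placements = (n - 2 * l + 1) * (G + 1) - G * (G + 1) // 2
    # Probability that the second arm differs from the first in at most mis positions
    p_arm = sum(comb(l, x) * r ** (l - x) * (1 - r) ** x for x in range(mis + 1))
    return placements * p_arm


print(expected_palindromes(100, 4, 0, 0))  # 93/256 = 0.36328125
```

For equation b, pass `r` equal to the R computed from the sequence's base frequencies.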
c. Calculate the probability of finding a specific palindrome of length l exactly k times in a string of length n.
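The text does not spell out the model for this probability, so the following is only one possible sketch. It assumes that a specific palindrome is a fixed string `pal` of 2l bases (no gap, no mismatches), that each of the N = n - 2l + 1 start positions holds it with probability q (the product of the base probabilities, $4^{-2l}$ for a uniform background), and that positions are independent. Under these assumptions the count is Binomial(N, q). All function names are hypothetical, and "GAATTC" is just an example string.

```python
from math import exp, lgamma, log, log1p, prod
from typing import Dict, Optional

UNIFORM = {b: 0.25 for b in "ACGT"}


def site_probability(pal: str, probs: Optional[Dict[str, float]] = None) -> float:
    """Probability q that one given start position holds exactly the string pal."""
    probs = probs or UNIFORM
    return prod(probs[b] for b in pal.upper())


def prob_k_occurrences(n: int, pal: str, k: int,
                       probs: Optional[Dict[str, float]] = None) -> float:
    """P(exactly k occurrences of pal) under Binomial(N, q), N = n - len(pal) + 1."""
    N = max(n - len(pal) + 1, 0)
    if not 0 <= k <= N:
        return 0.0
    q = site_probability(pal, probs)
    if q == 0.0:
        return 1.0 if k == 0 else 0.0
    if q >= 1.0:
        return 1.0 if k == N else 0.0
    # Work in log space so comb(N, k) never overflows a float for large N
    log_pmf = (lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
               + k * log(q) + (N - k) * log1p(-q))
    return exp(log_pmf)


def prob_at_least_k(n: int, pal: str, k: int,
                    probs: Optional[Dict[str, float]] = None) -> float:
    """Upper tail P(K >= k): how surprising it is to observe k or more copies."""
    lower = sum(prob_k_occurrences(n, pal, j, probs) for j in range(max(k, 0)))
    return max(0.0, 1.0 - lower)


print(prob_k_occurrences(1000, "GAATTC", 2))
print(prob_at_least_k(1000, "GAATTC", 3))
```

`probs` can be any base-probability map, for instance background estimates like those described for equation b. Overlapping windows are not truly independent, so the binomial is an approximation; when N·q is small, a Poisson distribution with mean N·q is a standard further approximation. Computing the tail as a complement loses precision for probabilities below about 1e-15.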
These methods help us evaluate the significance of the palindromes we have found.