Applying Ck Hybridization Arrays in Expression Profiling –
a Theoretical Study
In any living cell that undergoes a biological process, different subsets of the total set of genes encoded in the organism's genome are expressed at different stages of the process. The particular subset expressed at a given stage, and its quantitative composition, are of extreme importance. Being able to measure which subsets of genes are expressed in different stages, different cells, and different organisms is instrumental in understanding biological processes. Such information can help characterize sequence-to-function relationships and determine the effects (and side effects) of experimental treatments. The most successful and most widely used techniques for measuring expression profiles use specifically designed surface-bound probes in an assay based on hybridization arrays. One example of an existing generic method that does not require prior determination of the RNA to be measured is SAGE.
In this work we study theoretical and feasibility aspects of a generic microarray-based approach to expression profiling from a computational point of view.
We examine the following question: what is the quantitative effect of the noise variance on the hybridization array's performance? More specifically: how large can random (Gaussian) noise in the fluorescence pattern become and still be tolerated by a generic hybridization array? (Tolerated noise here means that the array still yields the right answer with high probability, measured according to some reasonable probability measure on the input space.)
Given a mixture of many different RNA strands (with known sequences), we want to determine the expression level of each sequence in the mixture, using a generic array-based hybridization assay and our knowledge of the hybridization signature of each component of the mixture.
Consider a mixture of known RNA sequences. We try to determine the expression levels of each RNA molecule in the mixture by performing the following:
This works if the hybridization signature of the mixture is linear in the relative concentrations of the different RNA molecules in the mixture. In reality this is not the case, but we assume that it is approximately linear. In an "ideal" system, under the linearity assumption and with a non-singular matrix A, the process described above would give us the exact and unique concentration vector b.
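The linear model can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: that column j of A is the hybridization signature of sequence j, that a signature is simply the count of each k-mer in the sequence (an idealized perfect-match model), that sequences are written over the alphabet ACGT, and that b is recovered by least squares. The function names `kmer_signature`, `signature_matrix` and `estimate_concentrations` are hypothetical, and the sequences are random stand-ins.

```python
import itertools
import numpy as np

ALPHABET = "ACGT"  # assumption: sequences are given over the DNA alphabet


def kmer_signature(seq, k):
    """Idealized signature: the count of every possible k-mer in seq."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}
    sig = np.zeros(len(ALPHABET) ** k)
    for i in range(len(seq) - k + 1):
        sig[index[seq[i:i + k]]] += 1
    return sig


def signature_matrix(sequences, k):
    """Assumed layout of A: one column per known sequence, one row per probe."""
    return np.column_stack([kmer_signature(s, k) for s in sequences])


def estimate_concentrations(A, mixture_signal):
    """Least-squares solution of A b = mixture_signal (assumed solver)."""
    b_hat, *_ = np.linalg.lstsq(A, mixture_signal, rcond=None)
    return b_hat


# Toy usage with random stand-in sequences and a noiseless, linear signal.
rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list(ALPHABET), size=300)) for _ in range(5)]
b_true = rng.uniform(0.1, 1.0, size=5)
A = signature_matrix(sequences, k=4)
b_hat = estimate_concentrations(A, A @ b_true)
print(np.max(np.abs(b_hat - b_true)))  # recovery error in the noiseless case
```

When A has full column rank and the signal is exactly linear and noise-free, the least-squares solution coincides with the true concentration vector up to floating-point error, which is the "ideal" case described above.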
Unfortunately we have some factors that can cause an error in our results:
We treat all these factors as noise, and want to find out how this noise affects the accuracy of our calculated concentration vector.
We performed simulations of the proposed method to determine the relationship between the noise STD and the accuracy of the resulting expression vector. We simulated the experiment on yeast gene sequences, with different values of noise STD and different numbers of sequences.
We found a linear relation between the STD of the noise and the average distance of the resulting expression vector from the original one.
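A linear relation of this kind is what one would expect if the concentrations are recovered by least squares: the estimation error is then the pseudo-inverse of A applied to the noise, so it scales with the noise STD. The sketch below illustrates this under stated assumptions. It uses a random Poisson-count matrix as a stand-in for the real signature matrix (not yeast data), Euclidean distance as the distance measure, and a hypothetical helper `mean_recovery_error`; the solver used in the actual simulations may differ.

```python
import numpy as np


def mean_recovery_error(A, b_true, sigma, trials=200, seed=0):
    """Average Euclidean distance between the least-squares estimate and b_true
    when i.i.d. Gaussian noise with standard deviation sigma is added to the signal."""
    rng = np.random.default_rng(seed)
    clean = A @ b_true
    pinv = np.linalg.pinv(A)  # least-squares solver for a full-column-rank A
    errors = []
    for _ in range(trials):
        noisy = clean + sigma * rng.standard_normal(clean.shape)
        errors.append(np.linalg.norm(pinv @ noisy - b_true))
    return float(np.mean(errors))


# Stand-in data (assumption): 256 probes, 20 sequences, random count signatures.
rng = np.random.default_rng(1)
A = rng.poisson(1.0, size=(256, 20)).astype(float)
b_true = rng.uniform(0.1, 1.0, size=20)

for sigma in (0.5, 1.0, 2.0):
    print(sigma, mean_recovery_error(A, b_true, sigma))
```

Because the same seed is reused for every value of sigma, the noise draws differ only by the scale factor, so doubling sigma doubles the reported mean error up to floating-point error.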
We also found that the accuracy of the results improves when using a larger Ck array (k = 7 instead of k = 6, i.e. 7-mers instead of 6-mers), or when the assay is performed on a smaller number of gene sequences.