Data Analysis and BioInformatics in real-time qPCR (2)
Posttranscriptional Expression Regulation -- What Determines Translation Rates? Regina Brockmann, Andreas Beyer, Juergen J. Heinisch, Thomas Wilhelm. Theoretical Systems Biology, Leibniz Institute for Age Research, Fritz Lipmann Institute, Jena, Germany; Fachbereich Biologie/Chemie, AG Genetik, Universitaet Osnabrueck, Osnabrueck, Germany; Department of Bioengineering, University of California San Diego, La Jolla, California, USA. Recent analyses
indicate that differences in protein concentrations
are only 20%–40% attributable to variable mRNA
levels, underlining the importance of
posttranscriptional regulation. Generally, protein
concentrations depend on the translation rate (which
is proportional to the translational activity, TA)
and the degradation rate. By integrating 12
publicly available large-scale datasets and
additional database information of the yeast
Saccharomyces cerevisiae, we systematically analyzed five factors contributing to TA: mRNA concentration, ribosome density, ribosome occupancy, the codon adaptation index, and a newly developed "tRNA adaptation index". Our analysis of the functional relationship between the TA and measured protein concentrations suggests that the TA follows Michaelis–Menten kinetics. The calculated TA, together with measured protein concentrations, allowed us to estimate degradation rates for 4,125 proteins under standard conditions. A significant correlation to recently published degradation rates supports our approach. Moreover, based on a newly developed scoring system, we identified and analyzed genes subjected to the posttranscriptional regulation mechanism "translation on demand". Next we applied these findings to publicly available data of protein and mRNA concentrations under four stress conditions. The integration of these measurements allowed us to compare the condition-specific responses at the posttranscriptional level. Our analysis of all 62 proteins that have been measured under all four conditions revealed proteins with very specific posttranscriptional stress responses, in contrast to more generic responders, which were nonspecifically regulated under several conditions.
The concept of specific and generic responders is
known for transcriptional regulation. Here we show
that it also holds true at the
posttranscriptional level.
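To make the quantitative logic concrete, the following short sketch (hypothetical numbers and a generic SciPy fit, not the authors' pipeline) fits a saturating Michaelis–Menten-type curve to protein concentration as a function of TA and derives relative degradation rates from the steady-state assumption dP/dt = k_syn*TA - k_deg*P = 0.

```python
# Illustrative sketch only: hypothetical values, not the authors' data or pipeline.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical translational activities (arbitrary units) and measured protein levels
ta = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
protein = np.array([120., 230., 410., 700., 1050., 1320., 1480.])

def michaelis_menten(x, vmax, km):
    """Saturating Michaelis-Menten-type relationship."""
    return vmax * x / (km + x)

params, _ = curve_fit(michaelis_menten, ta, protein, p0=[protein.max(), ta.mean()])
vmax, km = params
print(f"fitted Vmax = {vmax:.1f}, Km = {km:.2f}")

# Steady state dP/dt = k_syn*TA - k_deg*P = 0 gives k_deg proportional to TA/P
k_deg = ta / protein          # relative degradation rates (arbitrary units)
print("relative degradation rates:", np.round(k_deg, 4))
```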
How does gene expression clustering work?
Patrik D'haeseleer, Nature Biotechnology 23, 1499 - 1501 (2005) Patrik
D'haeseleer is in the Microbial Systems Division,
Biosciences Directorate, Lawrence Livermore National
Laboratory, PO Box 808, L-448, Livermore, California
94551, USA.
Clustering
is often one of the first steps in gene expression
analysis. How do clustering algorithms work, which
ones should we use and what can we expect from them?
Our ability to
gather genome-wide expression data has far
outstripped the ability of our puny human brains to
process the raw data. We can distill the data down
to a more comprehensible level by subdividing the
genes into a smaller number of categories and then
analyzing those. This is where clustering comes in.
The goal of clustering is to subdivide
a set of items (in our case, genes) in such a way
that similar items fall into the same cluster,
whereas dissimilar items fall in different clusters.
This brings up two questions: first, how do we
decide what is similar; and second, how do we use
this to cluster the items? The fact that these two
questions can often be answered independently
contributes to the bewildering variety of clustering
algorithms. Gene expression
clustering allows an open-ended exploration of the
data, without getting lost among the thousands of
individual genes. Beyond simple visualization, there
are also some important computational applications
for gene clusters. For example, Tavazoie et al.1
used clustering to identify cis-regulatory sequences
in the promoters of tightly coexpressed genes. Gene
expression clusters also tend to be significantly
enriched for specific functional categories—which
may be used to infer a functional role for unknown
genes in the same cluster. In this
primer, I focus specifically on clustering genes
that show similar expression patterns across a
number of samples, rather than clustering the
samples themselves (or both). I hope to leave you
with some understanding of clustering in general and
three of the more popular algorithms in particular.
Where possible, I also attempt to provide some
practical guidelines for applying cluster analysis
to your own gene expression data sets.
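The two ingredients described here, a similarity measure and a rule for grouping, can be illustrated with a minimal hierarchical-clustering sketch on a random expression matrix; the correlation distance and average linkage below are illustrative choices, not recommendations from the primer.

```python
# Minimal gene-expression clustering sketch: correlation distance + average-linkage hierarchy.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 12))      # 100 genes x 12 samples (placeholder data)

dist = pdist(expr, metric="correlation")   # 1 - Pearson correlation between gene profiles
tree = linkage(dist, method="average")     # agglomerative clustering on that distance
labels = fcluster(tree, t=6, criterion="maxclust")  # cut the tree into 6 clusters

for k in range(1, 7):
    print(f"cluster {k}: {np.sum(labels == k)} genes")
```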
Evaluation of gene-expression clustering
via mutual information distance measure.
BACKGROUND: The definition of a distance measure
plays a key role in the evaluation of different
clustering solutions of gene expression profiles. In
this empirical study we compare different clustering
solutions when using the Mutual Information (MI)
measure versus the use of the well known Euclidean
distance and Pearson correlation coefficient. RESULTS:
Relying on several public gene expression datasets, we
evaluate the homogeneity and separation scores of
different clustering solutions. It was found that the
use of the MI measure yields a more significant
differentiation among erroneous clustering solutions.
The proposed measure was also used to analyze the
performance of several known clustering algorithms. A
comparative study of these algorithms reveals that their best solutions are ranked almost oppositely under the different distance measures, despite the correspondence found between these measures when analysing the averaged scores of groups of solutions. CONCLUSIONS: In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data. Priness I, Maimon O, Ben-Gal I. BMC Bioinformatics. 2007 Mar 30;8(1):111.
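A minimal sketch of an MI-based dissimilarity between two expression profiles is shown below; the histogram binning and the conversion of MI into a distance are illustrative choices and not necessarily the exact normalization used by Priness et al.

```python
# Histogram-based mutual information between two expression profiles,
# turned into a dissimilarity; binning and scaling are illustrative choices.
import numpy as np

def mutual_information(x, y, bins=8):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_distance(x, y, bins=8):
    # Simple dissimilarity: high MI -> small distance (one possible normalization)
    return 1.0 / (1.0 + mutual_information(x, y, bins))

rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = a + rng.normal(scale=0.3, size=50)   # profile correlated with a
c = rng.normal(size=50)                  # unrelated profile
print(mi_distance(a, b), mi_distance(a, c))
```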
How to infer gene networks from expression profiles.
Mukesh Bansal, Vincenzo Belcastro, Alberto Ambesi-Impiombato & Diego di Bernardo Telethon Institute of Genetics and Medicine, Via P Castellino, Naples, Italy, European School of Molecular Medicine, Naples, Italy, Department of Natural Sciences, University of Naples ‘Federico II’, Naples, Italy and Department of Neuroscience, University of Naples ‘Federico II’, Naples, Italy These authors contributed equally to this work Systems Biology Lab, Telethon Institute of Genetics and Medicine, Via P Castellino 111, Naples 18131, Italy. Molecular Systems Biology 13 February 2007 Inferring, or
‘reverse-engineering’, gene networks can be defined
as the process of identifying gene interactions from
experimental data through computational analysis.
Gene expression data from microarrays are typically
used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes.
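As a deliberately naive baseline for what "identifying gene interactions from experimental data" can mean, the toy sketch below scores gene pairs by absolute correlation across perturbation experiments and keeps the strongest edges; it is not one of the reverse-engineering algorithms compared in the paper.

```python
# Toy illustration of expression-based network inference: score gene pairs by
# absolute correlation across perturbation experiments and keep the strongest edges.
import numpy as np

rng = np.random.default_rng(2)
expr = rng.normal(size=(20, 15))   # 20 genes x 15 perturbation experiments (placeholder)

corr = np.corrcoef(expr)           # gene-by-gene correlation matrix
np.fill_diagonal(corr, 0.0)
edges = np.argwhere(np.abs(corr) > 0.6)   # hypothetical cutoff

for i, j in edges:
    if i < j:
        print(f"putative interaction: gene{i} -- gene{j} (r = {corr[i, j]:+.2f})")
```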
Distribution-insensitive cluster analysis in SAS on real-time PCR gene expression data of steadily expressed genes. Tichopad A, Pecen L, Pfaffl MW. Comput Methods Programs Biomed. 2006 Apr;82(1):44-50. Epub 2006. Cluster analysis is a tool often employed in microarray techniques but used less in real-time PCR. Herein we present core SAS code that uses the correlation coefficient, rather than the Euclidean distance, as a dissimilarity measure. The dissimilarity measure is made robust by using a rank-order correlation coefficient rather than a parametric one. There is no need for an overall probability adjustment as in scoring methods based on repeated pair-wise comparisons. The rank-order correlation matrix gives a good basis for the clustering of gene expression data obtained by real-time RT-PCR as it disregards the different expression levels. Associated with each cluster is a linear combination of the variables in the cluster, which is the first principal component. A large set of variables can then be replaced by the set of cluster components with little loss of information. In this way, distinct clusters containing unregulated housekeeping genes along with other steadily expressed genes can be disclosed and utilized for standardization purposes. Simulated data, in parallel with data from a biological experiment, were used to validate the SAS macro. For both cases, good intuitive results were obtained and, although further improvements are needed, the approach has reached a level of performance that makes it practically useful.
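The idea of the macro can be re-sketched outside SAS; the Python fragment below (placeholder data and an arbitrary number of clusters) uses Spearman rank correlation as the robust similarity, clusters on the resulting dissimilarity, and summarizes each cluster by its first principal component.

```python
# Idea of the SAS macro re-sketched in Python: Spearman rank correlation as the
# robust similarity, hierarchical clustering, and the first principal component
# of each cluster as its summary variable. Data and cluster count are placeholders.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
expr = rng.normal(size=(12, 20))     # 12 genes x 20 samples (placeholder values)

rho, _ = spearmanr(expr, axis=1)     # rank-order correlation between gene profiles
dissim = 1.0 - rho                   # correlation-based dissimilarity (ignores expression level)
np.fill_diagonal(dissim, 0.0)
tree = linkage(squareform(dissim, checks=False), method="average")
labels = fcluster(tree, t=3, criterion="maxclust")

for k in np.unique(labels):
    block = expr[labels == k]
    centered = block - block.mean(axis=1, keepdims=True)
    # First principal component of the cluster = first right singular vector
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    print(f"cluster {k}: {block.shape[0]} genes, PC1 over samples:", np.round(vt[0], 2))
```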
A new molecular breast cancer subclass defined from
a large scale real-time quantitative RT-PCR study.
Maïa Chanrion, Hélène Fontaine, Carmen Rodriguez, Vincent Negre, Frédéric Bibeau, Charles Theillet, Alain Hénaut and Jean-Marie Darbon. INSERM U868, Cancer Research Centre, CRLC Val d'Aurelle-Paul Lamarque, Montpellier, France; UMS 2293 CNRS, University of Evry-Val d'Essonne, France. BMC Cancer 2007, 7:39. Background: Current histo-pathological prognostic factors are not very helpful in predicting the clinical outcome of breast cancer due to the disease's heterogeneity. Molecular profiling using a large panel of genes could help to classify breast tumours and to define signatures which are predictive of their clinical behaviour. Methods: To this aim, quantitative RT-PCR amplification was used to study the RNA expression levels of 47 genes in 199 primary breast tumours and 6 normal breast tissues. Genes were selected on the basis of their potential implication in hormonal sensitivity of breast tumours. Normalized RT-PCR data were analysed in an unsupervised manner by pairwise hierarchical clustering, and the statistical relevance of the defined subclasses was assessed by Chi2 analysis. The robustness of the selected subgroups was evaluated by classifying an external and independent set of tumours using these Chi2-defined molecular signatures. Results: Hierarchical clustering of gene expression data allowed us to define a series of tumour subgroups that were either reminiscent of previously reported classifications or represented putative new subtypes. The Chi2 analysis of these subgroups allowed us to define specific molecular signatures for some of them, whose reliability was further demonstrated by using the validation data set. A new breast cancer subclass, called subgroup 7, that we defined in this way was particularly interesting, as it gathered tumours with specific bioclinical features including a low rate of recurrence during a 5-year follow-up. Conclusion: The analysis of the expression of 47 genes in 199 primary breast tumours allowed us to classify them into a series of molecular subgroups. Subgroup 7, which was highlighted by our study, was remarkable as it gathered tumours with specific bioclinical features including a low rate of recurrence. Although this finding should be confirmed using a larger tumour cohort, it suggests that gene expression profiling with a minimal set of genes may allow the discovery of new subclasses of breast cancer that are characterized by specific molecular signatures and exhibit specific bioclinical features.
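The statistical relevance assessment mentioned above boils down to contingency-table testing; the fragment below shows a generic Chi2 test of whether a cluster-defined subgroup is associated with a clinical feature, using invented counts rather than the study's data.

```python
# Hypothetical contingency table: cluster-defined subgroup membership versus a
# clinical feature (e.g. recurrence within 5 years); numbers are made up for illustration.
from scipy.stats import chi2_contingency

#                 recurrence  no recurrence
table = [[ 3, 27],   # tumours in the candidate subgroup
         [45, 124]]  # all other tumours

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```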
Transcriptional regulatory network analysis of developing human erythroid progenitors reveals patterns of coregulation and potential transcriptional regulators.
Deciphering the molecular basis for human
erythropoiesis should yield information benefiting
studies of the hemoglobinopathies and other
erythroid disorders. We used an in vitro erythroid
differentiation system to study the developing red
blood cell transcriptome derived from adult CD34+
hematopoietic progenitor cells. mRNA expression
profiling was used to characterize developing
erythroid cells at six time points during
differentiation (days 1, 3, 5, 7, 9, and 11).
Eleven thousand seven hundred sixty-three genes
(20,963 Affymetrix probe sets) were expressed on
day 1, and 1,504 genes, represented by 1,953 probe
sets, were differentially expressed (DE) with 537
upregulated and 969 downregulated. A subset of the
DE genes was validated using real-time RT-PCR. The
DE probe sets were subjected to a cluster metric
and could be divided into two, three, four, five,
or six clusters of genes with different expression
patterns in each cluster. Genes in these clusters
were examined for shared transcription factor
binding sites (TFBS) in their promoters by
comparing enrichment of each TFBS relative to a
reference set using transcriptional regulatory
network analysis. The sets of TFBS enriched in
genes up- and downregulated during erythropoiesis
were distinct. This analysis identified
transcriptional regulators critical to erythroid
development, factors recently found to play a
role, as well as a new list of potential
candidates, including Evi-1, a potential silencer
of genes upregulated during erythropoiesis. Thus
this transcriptional regulatory network analysis
has yielded a focused set of factors and their
target genes whose role in differentiation of the
hematopoietic stem cell into distinct blood cell
lineages can be elucidated. Keller MA, Addya S, Vadigepalli R, Banini B, Delgrosso K, Huang H, Surrey S. Cardeza Foundation of Hematologic Research, Jefferson Medical College, Philadelphia, Pennsylvania 19107, USA. Physiol Genomics. 2006 Dec 13;28(1):114-128.
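Enrichment of a TFBS in a gene cluster relative to a reference set can be scored with a standard overrepresentation test; the sketch below uses Fisher's exact test on invented counts and is not the specific network-analysis pipeline of the study.

```python
# Enrichment of one transcription-factor binding site (TFBS) in a gene cluster
# relative to a reference set, scored with Fisher's exact test. Counts are invented.
from scipy.stats import fisher_exact

cluster_with_site, cluster_without = 40, 160        # promoters in the cluster
reference_with_site, reference_without = 300, 4500  # promoters in the reference set

odds, p = fisher_exact([[cluster_with_site, cluster_without],
                        [reference_with_site, reference_without]],
                       alternative="greater")
print(f"odds ratio = {odds:.2f}, enrichment p = {p:.2e}")
```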
An approach for clustering gene expression data with error information.
BACKGROUND:
Clustering of gene expression patterns is a
well-studied technique for elucidating trends
across large numbers of transcripts and for
identifying likely co-regulated genes. Even the
best clustering methods, however, are unlikely to
provide meaningful results if too much of the data
is unreliable. With the maturation of microarray
technology, a wealth of research on statistical
analysis of gene expression data has encouraged
researchers to consider error and uncertainty in
their microarray experiments, so that experiments
are being performed increasingly with repeat spots
per gene per chip and with repeat experiments. One
of the challenges is to incorporate the
measurement error information into downstream
analyses of gene expression data, such as
traditional clustering techniques. RESULTS: In this study, a clustering approach is presented which incorporates both gene expression values and error information about the expression measurements. Using repeat expression measurements, the error of each gene expression measurement in each experimental condition is estimated, and this measurement error information is incorporated directly into the clustering algorithm. The algorithm, CORE (Clustering Of Repeat Expression data), is presented and its performance is validated using statistical measures. By using error information about gene expression measurements, the clustering approach is less sensitive to noise in the underlying data and is able to achieve more accurate clusterings. Results are described for both synthetic expression data and real gene expression data from Escherichia coli and Saccharomyces cerevisiae. CONCLUSION: The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy. Tjaden B. Computer Science Department, Wellesley College, Wellesley, MA 02481, USA. BMC Bioinformatics. 2006 Jan 12;7:17.
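One simple way to let measurement error inform clustering (not the CORE algorithm itself) is to down-weight noisy conditions by their replicate-derived variance when comparing two profiles, as in the sketch below.

```python
# Not the CORE algorithm itself: a minimal illustration of letting replicate-derived
# measurement error down-weight noisy conditions when comparing two expression profiles.
import numpy as np

def error_weighted_distance(x, y, var_x, var_y):
    """Chi-square-like distance: squared differences scaled by summed measurement variance."""
    return float(np.sum((x - y) ** 2 / (var_x + var_y)))

rng = np.random.default_rng(4)
reps_a = rng.normal(loc=[1, 2, 3, 4], scale=0.2, size=(3, 4))  # 3 replicates x 4 conditions
reps_b = rng.normal(loc=[1, 2, 3, 4], scale=0.2, size=(3, 4))

dist = error_weighted_distance(reps_a.mean(0), reps_b.mean(0),
                               reps_a.var(0, ddof=1), reps_b.var(0, ddof=1))
print(f"error-weighted distance: {dist:.2f}")
```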
Expression profiles and biological function.
Expression arrays facilitate the monitoring
of changes in expression patterns of large
collections of genes. It is generally expected
that genes with similar expression patterns would
correspond to proteins of common biological
function. We assess this common assumption by
comparing levels of similarity of expression
patterns and statistical significance of
biological terms that describe the corresponding
protein functions. Terms are automatically
obtained by mining large collections of Medline
abstracts. We propose that the combined use of the
tools for expression profiles clustering and
automatic function retrieval, can be useful tools
for the detection of biologically relevant
associations between genes in complex gene
expression experiments. The results obtained using
publicly available experimental data show how, in
general, an increase in the similarity of the
expression patterns is accompanied by an
enhancement of the amount of specific functional
information or, in other words, how the selected
terms became more specific following an increase
in the specificity of the expression patterns.
Particularly interesting are the discrepancies
from this general trend, i.e. groups of genes with
similar expression patterns but very little in
common at the functional level. In these cases the
similarity of their expression profiles becomes
the first link between previously unrelated genes. Oliveros JC, Blaschke C, Herrero J, Dopazo J, Valencia A. Protein Design Group, Centro Nacional de Biotecnología (CNB-CSIC), Campus de Cantoblanco, 28049 Madrid, Spain. Genome Inform Ser Workshop Genome Inform. 2000;11:106-117. Smoking and cancer-related gene expression
in bronchial epithelium and non-small-cell lung cancers.
Tobacco smoking is the leading cause of lung
cancer worldwide. Gene expression in surgically
resected and microdissected samples of
non-small-cell lung cancers (18 squamous cell
carcinomas and nine adenocarcinomas), matched
normal bronchial epithelium, and peripheral lung
tissue from both smokers (n = 22) and non-smokers
(n = 5) was studied using the Affymetrix U133A
array. A subset of 15 differentially regulated
genes was validated by real-time PCR or
immunohistochemistry. Hierarchical cluster
analysis clearly distinguished between benign and
malignant tissue and between squamous cell
carcinomas and adenocarcinomas. The bronchial
epithelium and adenocarcinomas could be divided
into the two subgroups of smokers and non-smokers.
By comparison of the gene expression profiles in
the bronchial epithelium of non-smokers, smokers,
and matched cancer tissues, it was possible to
identify a signature of 23 differentially
expressed genes, which might reflect early
cigarette smoke-induced and cancer-relevant
molecular lesions in the central bronchial
epithelium of smokers. Ten of these genes are
involved in xenobiotic metabolism and redox stress
(eg AKR1B10, AKR1C1, and MT1K). One gene is a
tumour suppressor gene (HLF); two genes act as
oncogenes (FGFR3 and LMO3); two genes are involved
in matrix degradation (MMP12 and PTHLH); three
genes are related to cell differentiation (SPRR1B,
RTN1, and MUC7); and five genes have not been well
characterized to date. By comparison of the
tobacco-exposed peripheral alveolar lung tissue of
smokers with non-smokers and with adenocarcinomas
from smokers, it was possible to identify a
signature of 27 other differentially expressed
genes. These genes are involved in the metabolism
of xenobiotics (eg GPX2 and FMO3) and may
represent cigarette smoke-induced, cancer-related
molecular targets that may be utilized to identify
smokers with increased risk for lung cancer. Woenckhaus M, Klein-Hitpass L, Grepmeier U, Merk J, Pfeifer M, Wild P, Bettstetter M, Wuensch P, Blaszyk H, Hartmann A, Hofstaedter F, Dietmaier W. Department of Pathology, University of Regensburg, Franz-Josef-Strauss-Allee 11, D-93053 Regensburg, Germany. J Pathol. 2006;210(2):192-204.
Error propagation in relative real-time
reverse transcription polymerase chain reaction quantification
models: The balance between accuracy and precision.
Oddmund Nordgård, Jan Terje Kvaløy, Ragne Kristin Farmen, Reino Heikkilä. Department of Hematology and Oncology, Stavanger University Hospital, 4068 Stavanger, Norway; Department of Mathematics and Natural Science, University of Stavanger, 4036 Stavanger, Norway; Division of Research and Human Resources, Stavanger University Hospital, 4068 Stavanger, Norway. Real-time reverse transcription polymerase chain reaction (RT-PCR) has gained wide popularity as a sensitive and reliable technique for mRNA quantification. The development of new mathematical models for such quantifications has generally paid little attention to the aspect of error propagation. In this study we evaluate, both theoretically and experimentally, several recent models for relative real-time RT-PCR quantification of mRNA with respect to random error accumulation. We present error propagation expressions for the most common quantification models and discuss the influence of the various components on the total random error. Normalization against a calibrator sample to improve comparability between different runs is shown to increase the overall random error in our system. On the other hand, normalization against multiple reference genes, introduced to improve accuracy, does not increase error propagation compared to normalization against a single reference gene. Finally, we present evidence that sample-specific amplification efficiencies determined from individual amplification curves primarily increase the random error of real-time RT-PCR quantifications and should be avoided. Our data emphasize that the gain of accuracy associated with new quantification models should be validated against the corresponding loss of precision.
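For a generic efficiency-corrected ratio model, first-order (delta-method) error propagation can be written in a few lines; the sketch below uses hypothetical Ct values, efficiencies and standard deviations and is not a reproduction of the paper's exact expressions.

```python
# First-order (delta-method) error propagation for an efficiency-corrected ratio
# R = E_t**dCt_t / E_r**dCt_r. A generic sketch, not the paper's exact expressions;
# all Ct values, efficiencies and SDs below are hypothetical.
import math

E_t, E_r = 1.95, 1.90                 # amplification efficiencies (target, reference)
dCt_t, dCt_r = 3.2, 0.4               # Ct(control) - Ct(sample) for target and reference
sd_dCt_t, sd_dCt_r = 0.25, 0.20       # SDs of the Ct differences (from replicates)

ratio = E_t ** dCt_t / E_r ** dCt_r
# ln R = dCt_t*ln(E_t) - dCt_r*ln(E_r), so to first order:
var_lnR = (math.log(E_t) * sd_dCt_t) ** 2 + (math.log(E_r) * sd_dCt_r) ** 2
cv = math.sqrt(var_lnR)               # approximate relative (fractional) SD of R

print(f"ratio = {ratio:.2f}, approximate CV = {100*cv:.1f}%")
```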
A quantitative model of error
accumulation during PCR amplification.
E. Pienaar, M. Theron, M. Nelson, H.J. Viljoen. Department of Chemical Engineering, University of Nebraska, Lincoln, NE 68588, USA; Department of Human Genetics, University of the Free State, NHLS, Bloemfontein 9330, South Africa; Megabase Research Products, Lincoln, NE 68504, USA. The
amplification of target DNA by the polymerase
chain reaction (PCR) produces copies which may
contain errors. Two sources of errors are associated
with the PCR process: (1) editing errors that
occur during DNA polymerase-catalyzed
enzymatic copying and (2) errors due to DNA thermal
damage. In this study a quantitative model of
error frequencies is proposed and the role of
reaction conditions is investigated. The
errors which are ascribed to
the polymerase depend on the efficiency of its
editing function as well as the reaction
conditions; specifically the temperature and
the dNTP pool composition. Thermally induced
errors stem mostly from three sources: A+G
depurination, oxidative damage of guanine to 8-oxoG
and cytosine deamination to uracil. The
post-PCR modifications of sequences are
primarily due to exposure of nucleic acids to
elevated temperatures,
especially if the DNA is in a single-stranded
form. The proposed quantitative model predicts
the accumulation of errors over the course
of a PCR cycle. Thermal damage contributes
significantly to the total errors; therefore
consideration must be given to thermal
management of the PCR process.
Standardisation of data from real-time quantitative PCR methods – evaluation of outliers and comparison of calibration curves. Malcolm J Burns, Gavin J Nixon, Carole A Foy and Neil Harris. Bio-Molecular Innovation, LGC Limited, Queens Road, Teddington, Middlesex, TW11 0LY, UK. Background: As
real-time quantitative PCR (RT-QPCR) is increasingly
being relied upon for the enforcement of legislation
and regulations dependent upon the trace detection
of DNA, focus has increased on the quality issues
related to the technique. Recent work has focused on
the identification of factors that contribute
towards significant measurement uncertainty in the
real-time quantitative PCR technique, through
investigation of the experimental design and
operating procedure. However,
measurement uncertainty contributions made during
the data analysis procedure have not been studied in
detail. This paper presents two additional
approaches for standardising data analysis through
the novel application of statistical methods to
RT-QPCR, in order to minimise potential uncertainty
in results. Results: Experimental data was generated
in order to develop the two aspects of data handling
and analysis that can contribute towards measurement
uncertainty in results. This paper describes
preliminary aspects in standardising data through
the application of statistical techniques to the
area of RT-QPCR. The first aspect concerns the
statistical identification and subsequent handling
of outlying values arising from RT-QPCR, and
discusses the implementation of ISO guidelines in
relation to acceptance or rejection of outlying
values. The second aspect relates to the development
of an objective statistical test for the comparison
of calibration curves. Conclusion: The preliminary
statistical tests for outlying values and
comparisons between calibration curves can be
applied using basic functions found in standard
spreadsheet software. These two aspects emphasise
that the comparability of results arising from
RT-QPCR needs further refinement and development at
the data-handling phase. The implementation of
standardised approaches to data analysis should
further help minimise variation due to subjective
judgements. The aspects described in this paper will
help contribute towards the development of a set of
best practice guidelines regarding standardising
handling and interpretation of data arising from
RT-QPCR experiments.
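Both aspects can indeed be reproduced with basic statistics; the sketch below shows one common ISO-style outlier check (Grubbs' test) on Cq replicates and a simple t-test comparing the slopes of two calibration curves, using invented numbers and not necessarily the paper's exact procedures.

```python
# Illustrative spreadsheet-style checks (not necessarily the paper's exact procedures):
# (1) Grubbs' test for a single outlying Cq replicate, (2) a t-test comparing the
# slopes of two calibration curves. All numbers are hypothetical.
import numpy as np
from scipy import stats

def grubbs_statistic(x, alpha=0.05):
    x = np.asarray(x, float)
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t ** 2 / (n - 2 + t ** 2))
    return g, g_crit

cq = [24.1, 24.3, 24.2, 25.6]                  # one suspicious replicate
g, g_crit = grubbs_statistic(cq)
print(f"Grubbs G = {g:.2f} (critical {g_crit:.2f}) -> outlier: {g > g_crit}")

log_copies = np.array([1, 2, 3, 4, 5], float)  # calibration points (log10 copies)
cq_run1 = np.array([33.1, 29.8, 26.4, 23.1, 19.7])
cq_run2 = np.array([33.4, 30.2, 26.9, 23.8, 20.5])
fit1 = stats.linregress(log_copies, cq_run1)
fit2 = stats.linregress(log_copies, cq_run2)
t_stat = (fit1.slope - fit2.slope) / np.hypot(fit1.stderr, fit2.stderr)
df = len(cq_run1) + len(cq_run2) - 4
p = 2 * stats.t.sf(abs(t_stat), df)
print(f"slope run1 = {fit1.slope:.2f}, run2 = {fit2.slope:.2f}, p(slopes equal) = {p:.2f}")
```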
Accurate and statistically verified quantification of relative mRNA abundances using SYBR Green I and real-time RT-PCR. Julie H. Marino, Peyton Cook, Kenton S. Miller. Faculty of Biological Sciences, The University of Tulsa, 600 S. College Avenue, Tulsa, OK 74104-3189, USA; Kervin Bovaird Center for Studies in Molecular Biology and Biotechnology, The University of Tulsa, Tulsa, OK 74104-3189, USA; Department of Mathematical and Computer Sciences, The University of Tulsa, Tulsa, OK 74104-3189, USA. Journal of Immunological Methods 283 (2003) 291-306. Among the many methods currently available for quantifying mRNA transcript abundance, reverse transcription-polymerase chain reaction (RT-PCR) has proved to be the most sensitive. Recently, several protocols for real-time relative RT-PCR using the reporter dye SYBR Green I have appeared in the literature. In these methods, sample and control mRNA abundance is quantified relative to an internal reference RNA whose abundance is known not to change under the differing experimental conditions. We have developed new data analysis procedures for the two most promising of these methodologies and generated data appropriate to assess both the accuracy and precision of the two protocols. We demonstrate that while both methods produce results that are precise when 18S rRNA is used as an internal reference, only one of these methods produces consistently accurate results. We have used this latter system to show that mRNA abundances can be accurately measured and strongly correlate with cell surface protein and carbohydrate expression as assessed by flow cytometry under different conditions of B cell activation. Customized
Molecular Phenotyping by Quantitative Gene Expression
and Pattern Recognition Analysis.
Shreeram Akilesh, Daniel J. Shaffer, and Derry Roopenian. The Jackson Laboratory, Bar Harbor, Maine 04609, USA. Description of
the molecular phenotypes of pathobiological
processes in vivo is a pressing need in genomic
biology. We have implemented a high-throughput real-time PCR strategy to establish quantitative expression profiles of a customized set of target genes. It enables rapid, reproducible data acquisition from limited quantities of RNA, permitting serial sampling of mouse blood during disease progression. We developed an easy-to-use statistical algorithm—Global Pattern Recognition—to readily identify genes whose expression has changed significantly from healthy baseline profiles. This
approach provides unique molecular signatures for
rheumatoid arthritis, systemic lupus erythematosus,
and graft versus host disease, and can also be
applied to defining the molecular phenotype of a
variety of other normal and pathological processes.
Mathematical Model of Real-Time PCR Kinetics. Jana L. Gevertz, Stanley M. Dunn, Charles M. Roth. Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey; Department of Chemical & Biochemical Engineering, Rutgers University, 98 Brett Road, Piscataway, New Jersey 08854. Biotechnol Bioeng. 2005 Nov 5;92(3):346-55. Abstract:
Several real-time PCR (rtPCR) quantification
techniques are currently used to determine the
expression levels of individual genes from rtPCR
data in the form of fluorescence intensities. In
most of these quantification techniques, it is
assumed that the efficiency of rtPCR is constant.
Our analysis of rtPCR data shows, however, that even
during the exponential phase of rtPCR, the
efficiency of the reaction is not constant, but is
instead a function of cycle number. In order to
understand better the mechanisms underlying this
behavior, we have developed a mathematical model of
the annealing and extension phases of the PCR
process. Using the model, we can simulate the PCR
process over a series of reaction cycles. The model
thus allows us to predict the efficiency of rtPCR at
any cycle number, given a set of initial conditions
and parameter values, which can mostly be estimated
from biophysical data. The model predicts a
precipitous decrease in cycle efficiency when the
product concentration reaches a sufficient level for
template–template reannealing to compete with
primer-template annealing; this behavior is
consistent with available experimental data. The
quantitative understanding of rtPCR provided by this
model can allow us to develop more accurate methods
to quantify gene expression levels from rtPCR data.
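The qualitative behaviour predicted by the model (efficiency falling once product is abundant enough for template-template reannealing to compete with primer annealing) can be mimicked by a toy per-cycle simulation; the functional form and parameters below are arbitrary and are not the authors' biophysical model.

```python
# Toy per-cycle simulation of the qualitative behaviour described above: amplification
# efficiency drops as product accumulates. Parameter values are arbitrary placeholders.
import numpy as np

template = 1e3          # starting copies
half_level = 1e10       # product level at which efficiency has fallen to half (arbitrary)
copies, eff = [template], []

for cycle in range(40):
    e = 1.0 / (1.0 + copies[-1] / half_level)   # efficiency as a function of product level
    eff.append(e)
    copies.append(copies[-1] * (1.0 + e))

for c in (10, 20, 25, 30, 35):
    print(f"cycle {c:2d}: efficiency = {eff[c]:.2f}, copies = {copies[c]:.2e}")
```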
Evaluation of absolute quantitation by nonlinear regression in probe-based real-time PCR. Rasmus Goll, Trine Olsen, Guanglin Cui and Jon Florholmen. Institute of Clinical Medicine, University of Tromso, Tromso, Norway; Department of Gastroenterology, University Hospital of Northern Norway, Tromso, Norway. BMC Bioinformatics 2006, 7:107. doi:10.1186/1471-2105-7-107. In
real-time
PCR data analysis, the cycle threshold (CT) method is
currently the gold standard. This method is based on
an assumption of equal PCR efficiency in all
reactions, and precision may suffer if this condition
is not met. Nonlinear regression analysis (NLR) or
curve fitting has therefore been suggested as an
alternative to the cycle threshold method for absolute
quantitation. The advantages of NLR are that the
individual sample efficiency is simulated by the model
and that absolute quantitation is possible without a
standard curve, releasing reaction wells for unknown
samples. However, the calculation method has not been
evaluated systematically and has not previously been
applied to a TaqMan platform. Aim: To develop and
evaluate an automated NLR algorithm capable of
generating batch production regression analysis. Total
RNA samples extracted from human gastric mucosa were
reverse transcribed and analysed for TNFA, IL18 and
ACTB by TaqMan real-time PCR. Fluorescence data were
analysed by the regular CT method with a standard
curve, and by NLR with a positive control for
conversion of fluorescence intensity to copy number,
and for this purpose an automated algorithm was
written in SPSS syntax. Eleven separate regression
models were tested, and the output data was subjected
to Altman-Bland analysis. The Altman-Bland analysis
showed that the best regression model yielded
quantitative data with an intra-assay variation of 58%
vs. 24% for the CT derived copy numbers, and with a
mean inter-method deviation of ×0.8. NLR can be
automated for batch production analysis, but the CT
method is more precise for absolute quantitation in
the present setting. The observed inter-method
deviation is an indication that assessment of the
fluorescence conversion factor used in the regression
method can be improved. However, the versatility
depends on the level of precision required, and in
some settings the increased cost effectiveness of NLR
may justify the lower precision.
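The core idea of NLR, fitting each amplification curve individually, can be illustrated with a generic four-parameter logistic fit; the sketch below uses simulated fluorescence data and is not the paper's SPSS implementation or any of its eleven specific regression models.

```python
# Generic four-parameter logistic fit to one simulated amplification curve, illustrating
# per-sample curve fitting (NLR). Simulated data and starting values are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, fmax, x_half, slope, baseline):
    return baseline + fmax / (1.0 + np.exp(-(x - x_half) / slope))

cycles = np.arange(1, 41, dtype=float)
rng = np.random.default_rng(5)
true = sigmoid(cycles, fmax=1200, x_half=24, slope=1.6, baseline=50)
fluorescence = true + rng.normal(scale=10, size=cycles.size)     # simulated raw data

params, _ = curve_fit(sigmoid, cycles, fluorescence,
                      p0=[fluorescence.max(), 25, 2, fluorescence.min()])
fmax, x_half, slope, baseline = params
print(f"fitted midpoint cycle = {x_half:.2f}, per-sample slope = {slope:.2f}")
```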