PCR/qPCR Data Analysis

A Technical Guide to PCR Technologies

  • PCR/qPCR Qualitative Analysis
  • qPCR Data Analysis
  • Deriving Accurate Cq Values
  • Setting the Threshold
  • qPCR Quantification Strategies
  • Standard Curve Quantification
  • Relative/Comparative Quantification
  • Normalization
  • Reference Gene Selection
  • Analysis of Reference Gene Stability
  • Alternative Normalization Methods
  • Statistical Analysis and Data Visualization
  • Visualization Techniques for Univariate Analysis
  • Statistical Tests
  • Hierarchical Clustering
  • Principal Component Analysis
  • PCR/qPCR Qualitative Data Analysis

After a traditional PCR has been completed, the data are analyzed by resolution through an agarose gel or, more recently, through a capillary electrophoresis system. For some applications, such as SNP genotyping, a qPCR will be run and the end-point data used for analysis. In each case, end-point data provide a qualitative analysis after the PCR has reached the plateau phase. In some cases, it may be possible to analyze end-point data to make a semi-quantitative estimate of the PCR yield, but quantitative measurements are more often made using qPCR and analysis of quantification cycle (Cq) values 1.

Throughout this guide, the factors that contribute to variation in the measurement of nucleic acids using PCR or qPCR have been highlighted. Each of these factors should be optimized so that the assay provides the closest possible value to the actual quantity of the gene (target) in the reaction. The result of these processes is the generation of a set of Cq values for each target in each sample. The process of deriving and analyzing those Cq values to provide reliable data that represent the biological story is presented in this chapter.

Deriving Accurate Cq Values

Baseline Correction

A Cq value is determined for each target in each sample. The analysis packages associated with different instruments take alternative approaches to determining the Cq (and also use alternative names, e.g., Ct, Cp, take-off point). It is beyond the scope of this guide to delve into the fine details of all of these algorithms. However, qPCR measurements that are based on amplification curves are sensitive to background fluorescence. The background fluorescence may be caused by a range of factors, including the choice of plasticware, remaining probe fluorescence that is not quenched, light leaking into the sample well, and differences in the optical detection for a given microtiter plate well. In well-designed assays, the background is low compared to the amplified signal. However, variation in the background signal may hinder quantitative comparison of different samples. It is therefore important to correct for background fluorescence variations that cause differences in the baseline (Figure 10.1).


Figure 10.1 The components of amplification plots. This graph shows the increase of fluorescence with the number of cycles for different samples. The threshold is set above the detection limit but well below the plateau phase during which the amplification rate slows down.

A common approach is to use the fluorescence intensity during early cycles, such as between cycles 5 and 15, to identify a constant and linear component of the background fluorescence. This is then defined as the background, or baseline, for the amplification plot. Due to transient effects, it is advisable to avoid the first few cycles (e.g., cycles 1 to 5) when defining the baseline, because these often show reaction-stabilizing artefacts. The more cycles that are used for the baseline correction, the better the potential accuracy of the linear component of the baseline variations. Many instrument software packages allow manual setting of the cycles to be considered for baseline definition. These functions should be explored by the user and the temptation to accept default settings strongly resisted.
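
As a simple illustration of this strategy (a minimal sketch, not any instrument vendor's algorithm), the following R code fits a straight line to the readings of user-chosen baseline cycles and subtracts the fitted background from the whole plot; the input fluorescence vector is hypothetical.

```r
# Sketch: linear baseline correction of a single amplification plot.
# `raw` is a hypothetical vector of raw fluorescence readings, one per cycle.
correct_baseline <- function(raw, baseline_cycles = 5:15) {
  cycles <- seq_along(raw)
  # fit a straight line to the background-only cycles
  fit <- lm(raw[baseline_cycles] ~ baseline_cycles)
  background <- predict(fit, newdata = data.frame(baseline_cycles = cycles))
  raw - background  # baseline-corrected fluorescence
}
```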

An example of the effect of baseline setting is shown in Figure 10.1. As can be seen, Cq values and the apparent shape of the amplification plot are affected by the accuracy of the baseline setting. In the example, the baseline for the curve labeled C3 has been incorrectly adjusted manually so that the baseline was calculated from the data in cycles 5 to 31. This causes the curve to dip below the zero baseline level (Figure 10.2A), with a Cq of 28.80. To correct this, the raw data, R, are viewed and the last cycle of the linear background (the last cycle before amplification) is identified. In Figure 10.2B, this can be seen to be cycle 22. The baseline is correctly set to be zero between cycle 5 and cycle 22 (Figure 10.2C), and the amplification plot is then corrected (Figure 10.2D). The corrected Cq is 26.12. Note the substantial difference between the Cq values with the incorrect and correct baseline settings, demonstrating that setting the correct baseline is an important component of data analysis.


Figure 10.2A–B. A) Typical example of data dropping below the zero normalized fluorescence reading when the baseline setting is incorrect (blue amplification plot). B) Raw data of the same amplification plots showing the limit of the linear baseline and that the data are not at fault.


Figure 10.2C–D. C) The limits of the start and end of the baseline are defined using the appropriate software settings. D) Application of the corrected baseline setting results in good quality data.

Setting the Threshold

Although some researchers advocate mapping individual amplification plots to estimate amplification efficiency and target quantities in measured samples 2,3,4, the original and most common approach to deriving the Cq is to use a threshold. The wide adoption of this approach is likely due to the threshold method being a simple and effective quantification method.

The principle behind the threshold method is that, in order to visualize the associated fluorescent signal from the qPCR amplification, the signal must increase above the detection limit of the instrument (and therefore the baseline; Figure 10.1). The number of cycles required for this to occur depends on the initial copy number of the target in the sample: more cycles are required for the signal to rise above the baseline if the original copy number is low, and fewer cycles if the copy number is high. Since the baseline is set at the limit of detection for the system, measurements at the baseline would be very inaccurate. Therefore, rather than measuring at the minimum fluorescence intensity that the system can detect, a higher fluorescence is selected and an artificial threshold is introduced.

The selection of the threshold intensity requires adherence to some fundamental principles. It is important that the threshold is set at a fixed intensity for a given target and for all samples that are to be compared. If there are too many samples to fit on a single plate, then an inter-plate calibration scheme must be adopted, e.g., inclusion of a replicated control that serves as an inter-plate calibrator, or a standard curve serial dilution. In theory, the threshold can be set anywhere on the log-linear phase of the amplification curve. In practice, however, the log-linear phase may be disturbed by drift in the background fluorescence baseline, by the plateau phase, or by differences in assay efficiency and therefore in amplification plot gradient at higher cycles. It is recommended that the threshold is set as follows:

  • Sufficiently above the background fluorescence baseline to be confident of avoiding the amplification plot crossing the threshold prematurely due to background fluorescence.
  • In the log phase of the amplification plot where it is unaffected by the plateau phase (this is most easily seen by viewing the amplification plots on a log view, Figure 10.3A ).
  • At a position where the log phases of all amplification plots are parallel.

The process of threshold setting is demonstrated in Figure 10.3. In Figure 10.3A, the amplification plots are viewed on a Y-axis log scale, thus providing a visual expansion of the log phase of amplification and presenting this as a linear portion of the amplification plot. The threshold is set at the highest fluorescence intensity (refer to the Y axis) that is within this log phase and where all amplification plots are parallel. The scale is then returned to the linear view (Figure 10.3B), showing the highest setting that fulfils the threshold setting requirements. Alternatively, the threshold may be set at the lower end of this log phase (Figures 10.3C and 10.3D). As long as the log phases of the amplification plots are parallel, the ΔCq between samples is unaffected by the threshold setting.


Figure 10.3 The threshold setting influences the absolute Cq recorded and can influence ΔCq between samples. A) Using a log vs linear plot of the data, the threshold is set at the highest fluorescence intensity at which the amplification plots show parallel log phases. B) The threshold setting is maintained from A) and is displayed on the linear vs linear plot. C) Using a log vs linear plot of the data, the threshold is set at the lowest fluorescence intensity at which the amplification plots show parallel log phases. D) The threshold setting is maintained from C) and is displayed on the linear vs linear plot. In each case, the ΔCq values between samples are the same.
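
To make the threshold principle concrete, the following sketch (assuming baseline-corrected fluorescence and an externally chosen threshold value) derives the Cq as the fractional cycle at which the signal first crosses the threshold, using linear interpolation between the two flanking cycles.

```r
# Sketch: derive Cq as the interpolated cycle at which baseline-corrected
# fluorescence first exceeds a fixed threshold.
cq_from_threshold <- function(fluor, threshold) {
  above <- which(fluor >= threshold)
  if (length(above) == 0) return(NA_real_)  # never crosses: no Cq
  i <- above[1]
  if (i == 1) return(1)
  # linear interpolation between cycle i - 1 and cycle i
  (i - 1) + (threshold - fluor[i - 1]) / (fluor[i] - fluor[i - 1])
}
```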

The requirement to set the threshold at a position where the log-linear phases of the amplification plots are parallel becomes more pertinent when data at higher cycles are included in the analysis. The threshold setting procedure that was described for the data in Figure 10.3 was repeated on a data set of higher Cq and the results are presented in Figure 10.4. The resulting Cq data in Table 10.1 serve to illustrate the variability in the Cq, and more importantly the ΔCq, values for three amplification plots with three threshold settings (Figure 10.4). The ΔCq values, and therefore the estimates of the relative quantity of target in each sample, are highly dependent on the setting of the threshold (Figure 10.4) because the amplification plots are not parallel.


Figure 10.4. The analysis that was performed and demonstrated in Figure 10.3 was repeated using a different data set. In this case, the amplification plots are not parallel due to a difference in efficiency of the reaction at high Cq. The lowest settings, A) and B), result in different ΔCq values than the highest settings, C) and D) (summarized in Table 10.1).

qPCR Quantification Strategies

Accurate baseline and threshold setting is imperative for reliable quantification. After setting each of these, a Cq value is generated and this is used as the basis for quantification. The quantity of target in a given sample is then determined using either a standard curve or relative/comparative quantification.

Standard Curve Quantification

As the name implies, standard curve quantification requires the use of a standard curve to determine the quantities of targets in test samples. All quantities determined for samples are, therefore, relative to the quantity assigned to the standard curve. This requires running additional, external standards alongside every set of sample reactions. The choice of material for the standard curve is important for eliminating potential differences in quantification due to differences between assay efficiencies in the samples and in the standards. The external standards must have the same primer binding sites as the target, contain sequences that are the same as the target, have similar complexity, and be handled in as similar a manner as possible. Therefore, when measuring the concentration of a target in cDNA, it is preferable to measure the same cDNA in a serial dilution of a control sample. However, for some studies there are practical reasons that prevent this, so it is important to reproduce the sample conditions as closely as possible, e.g., by adding gDNA from a species unrelated to the test species to an artificial oligonucleotide standard or a linearized plasmid carrying the standard sequence. Once a suitable construct or amplicon is identified, a standard curve of serial dilutions is generated. The Cq for the target is determined for each of the standards and plotted against the concentration or relative concentration/dilution factor on a log scale (Figure 10.5). This results in a standard curve that is then used to determine the concentrations of test samples by comparison of the Cq values derived from amplification of the unknown samples. When using a standard curve for quantification, the threshold setting must be kept constant for determination of the Cq for the standards and for the samples on the same plate; the threshold can differ between plates.
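
The arithmetic behind this procedure can be sketched in a few lines of R; the dilution series and Cq values below are invented for illustration. The Cq values are regressed on the log10 relative concentration, the efficiency is derived from the slope, and unknowns are read off the fitted line.

```r
# Sketch of standard curve quantification (all values are illustrative)
dilution <- c(1, 0.1, 0.01, 0.001, 1e-4)     # relative concentrations
cq_std   <- c(17.1, 20.5, 23.9, 27.2, 30.6)  # hypothetical standard Cq values

fit   <- lm(cq_std ~ log10(dilution))
slope <- coef(fit)[2]
efficiency <- 10^(-1 / slope) - 1            # ~1.0 corresponds to 100%

# estimate the relative quantity of an unknown sample from its Cq
cq_unknown <- 22.3
quantity   <- 10^((cq_unknown - coef(fit)[1]) / slope)
```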

Relative/Comparative Quantification

Relative or comparative quantification uses the difference in Cq as a determinant of the differences in concentration of the target sequence in different samples. Rather than measuring quantities of target per sample as with the standard curve method, this leads to sets of data showing fold changes between samples.

In the original form of this approach 5, the efficiency of all of the assays was assumed to be 100%, leading to the assumption that a Cq difference of 1 (ΔCq = 1) was the result of a 2-fold difference in target. To determine a fold change in the target or gene of interest (GOI), the data must also be referenced to a loading control (reference gene, ref; see the following for a discussion regarding data normalization).


Figure 10.5. Construction of a Standard Curve. The Cq recorded for each sample of a dilution series is plotted on a log-linear scale against the relative concentration.

In Equation 1, the ratio of the GOI in two samples (A relative to B), after correction to the ref gene, is calculated as 2 (assuming 100% efficient reactions) raised to the power of the difference in the Cq values for the GOI, divided by 2 raised to the power of the difference in the Cq values for the ref gene:

\[
\mathrm{Ratio}_{A/B} = \frac{2^{\,C_{q,\mathrm{GOI}}(B) - C_{q,\mathrm{GOI}}(A)}}{2^{\,C_{q,\mathrm{ref}}(B) - C_{q,\mathrm{ref}}(A)}} = 2^{-\Delta\Delta C_q}
\]

where \(\Delta\Delta C_q = \Delta C_q(A) - \Delta C_q(B)\) and \(\Delta C_q = C_{q,\mathrm{GOI}} - C_{q,\mathrm{ref}}\).

Equation 1. Original (Livak) Relative Quantification Model.

However, as illustrated in Assay Optimization and Validation, the efficiencies of reactions vary considerably and this can have a large impact on data. Therefore, the assumptions in Equation 1 were addressed (Equation 2) 6, so that the differences in reaction efficiencies could be incorporated into the analyses. In this case, the amplification factor 2 is replaced by the actual efficiency of the PCR (as determined by a standard curve analysis; see Assay Optimization and Validation).

\[
\mathrm{Ratio}_{A/B} = \frac{E_{\mathrm{GOI}}^{\,C_{q,\mathrm{GOI}}(B) - C_{q,\mathrm{GOI}}(A)}}{E_{\mathrm{ref}}^{\,C_{q,\mathrm{ref}}(B) - C_{q,\mathrm{ref}}(A)}}
\]

where E is the amplification factor of each assay (E = 2 for a 100% efficient reaction).

Equation 2. Efficiency Adapted (Pfaffl) Relative Quantification Model.

As an example of using the efficiency-adapted (Equation 2) relative quantification model, a set of Cq values is presented in Table 10.2. The efficiency (amplification factor) for the GOI is 1.8 and for the ref gene, 1.94.

This is a very simple example of a study requiring measurement of the fold difference of one gene between two samples, after normalization to a single reference gene. The ratio shows the fold change of the GOI in sample 2 relative to sample 1, after correction to the single ref gene. However, it has become apparent that selection of a single, suitable reference gene is often impossible and, therefore, more sophisticated approaches for normalization have been suggested.
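
Because Table 10.2 is not reproduced here, the following sketch evaluates Equation 2 in R with invented Cq values and the stated amplification factors of 1.8 for the GOI and 1.94 for the ref gene.

```r
# Sketch of the efficiency-adapted (Pfaffl) ratio; Cq values are invented
E_goi <- 1.8   # amplification factor of the gene of interest
E_ref <- 1.94  # amplification factor of the reference gene

cq_goi <- c(sample1 = 25.2, sample2 = 23.0)  # hypothetical Cq values
cq_ref <- c(sample1 = 20.1, sample2 = 20.3)

ratio <- E_goi^(cq_goi["sample1"] - cq_goi["sample2"]) /
         E_ref^(cq_ref["sample1"] - cq_ref["sample2"])
ratio  # fold change of the GOI in sample 2 relative to sample 1
```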

The major objective of most PCR-based experiments is to address the basic question of whether the target is present in the sample (unknown, UNK). At the very simplest level, this is answered by running a gel and examining the fragments for the presence or absence of the desired GOI. When the fragment is present, the confirmation of fragment size gives reassurance of a positive result. However, when absent, there is the potential for a false negative result. Therefore, it is critical to repeat the test assay and also perform at least one additional PCR to serve as a loading and positive PCR control. The universal inhibition control assay, SPUD (see Sample Purification and Quality Assessment), can be used to support confidence in a negative result. An alternative approach is to run an assay that is specific to a reference gene or genes. Traditionally, PCR assays detecting the reference genes GAPDH, 18S ribosomal RNA, or β-actin were run alongside those for the GOI and the resulting fragments visualized on a gel. GAPDH, 18S ribosomal RNA, and β-actin are constitutively expressed and were therefore used as loading controls in semi-quantitative analyses. However, it soon became apparent that these genes are not ubiquitously expressed at the same concentration in all cells, regardless of experimental design. Therefore, the need arose for a stable reference when the objective is to measure relative nucleic acid concentrations, usually cDNA but also gDNA when, for example, examining the copy number variation of a gene.

Normalization

Normalization is the process of correcting technical measurements to a stable reference in order to examine true biological variation. There are many methods for normalizing technical differences, which means that the appropriate approach for the specific experiment must be selected and validated 7. It is critical to recognize that adoption of an inappropriate normalization technique may be more detrimental to the overall analytical process than not normalizing at all 8.

The Effect of Sample Quality on Assay Normalization

The effect of sample integrity and purity on target quantity measurements by qPCR and RT-qPCR was discussed at length in earlier chapters (Sample Purification and Quality Assessment; Sample Quality Control; Reverse Transcription). It was demonstrated that inhibitors in the sample and RNA degradation have a differential effect on the measurement of a given target 9. Inhibitors affect the measurement of any target, but to a different degree depending on the assay design. Degradation of total RNA affects the measurement of mRNA and miRNA 10, again being highly dependent on the overall experimental design. Therefore, it is critical to consider the effect of template concentration on the RT reaction and the effect of sample quality on data after normalization. Normalization will not counter the effect of low quality assays or samples (see Assay Optimization and Validation).

Normalization Approaches

Ideally, normalization methods counteract variability that may be introduced during the multi-step process that is required to perform a qPCR analysis (Figure 10.6). However, applying normalization at any one stage in the process may not control for technical error and/or bias that was, or will be, introduced at an earlier or later stage, respectively. Normalization methods are not mutually exclusive, and so adopting a combination of controls is recommended 11.


Figure 10.6. qPCR is a multistep process and each step must be controlled. Normalization must be considered within a series of controls.

The objective of normalization is to provide a stable reference point against which measurements can be compared; therefore, the choice of normalization factor must be a measurement that is stable throughout the experiment. This may be stable reference gene(s) or one of the alternatives, such as cell number, tissue mass, RNA/DNA concentration, an external spike 12, or a representative measure of the globally expressed genes.

Reference genes are targets whose quantity does not change as a result of the experiment. When quantifying DNA copy number variation, in which the number of copies of the sequence of interest may change, the measurement is simply normalized by targeting an alternative genomic region that is known not to change. An example of how this may be applied is the measurement of Human Epidermal Growth Factor Receptor 2 (HER-2) genomic amplification 13. HER-2 genomic instability is a prognostic indicator in breast cancer, and accurate measurement of HER-2 amplification status is important in patient management. HER-2 status can be measured by qPCR by comparing the copies of HER-2 with another genomic target that acts as a control.

When measuring gene expression, reference genes are targets with mRNA concentrations that do not change as a result of the experiment. An example study would be one in which the effect on the expression of gene X is being measured after addition of a mitogenic compound to a cell monolayer. A reference point is required in order to measure the change in gene X. Therefore, another gene (or genes) known not to be affected by the mitogen in question is also measured. This presents the researcher with the immediate challenge of finding an mRNA target that is not affected by the experimental procedure, before being able to study the GOI. This process of validating reference genes is fundamental to an accurate measurement of the GOI. The most widely used approach to normalization is to ignore this process and normalize the gene expression data to a single, unvalidated reference gene. This practice is not recommended and is in direct opposition to the MIQE guidelines 1. The quantification of mRNA by RT-qPCR has routinely been compromised by the incorrect choice of reference genes. It is not acceptable to follow the relatively common practices of using a reference gene because the primers are already in the freezer, because it was used historically on Northern blots, or because it is used by a colleague or by another laboratory for a different experiment. Reference genes must be validated under the specific experimental conditions to ensure that the reference gene in question is not affected by the experiment. If this validation is not carried out and the reference gene is affected by the experiment, the results could be incorrect and subsequent interpretations are likely to be meaningless 8.

There is a range of scientific literature describing different methods for normalization 7-14, as well as a plethora of publications describing the protocols required to identify the most appropriate normalizer genes for a given experimental scenario. While in the past a key question was whether to select single or multiple reference genes, lower running costs mean that current best practice has moved towards measuring multiple reference genes.

Selection of stable reference genes requires the analyst to evaluate the expression stability of a number of candidate mRNA targets (usually 10 to 20 genes) 7 by qPCR on a subset of samples that represent the test and control mRNAs. A full protocol is provided in Appendix A, Protocols, of this guide and may be used in combination with different analytical methods using programs such as REST 15, geNorm 14, BestKeeper 16, or NormFinder 17. This procedure is described in more detail in the following section, Analysis of Reference Gene Stability.

Analysis of Reference Gene Stability

The reference gene is the pivot point for qPCR relative quantification assays. It is therefore critical to the reliability of the entire assay that the reference gene is stable. If reference gene expression varies between samples, the variation will be directly transferred to the quantification results, and the added variability may obscure the desired observable biological effect or, even worse, may create an entirely artificial appearance of a biological effect, one that is unrelated to the actual gene of interest. For these reasons, it is strongly recommended that several safety measures are followed to render reference gene variability insignificant and to make measures of biological effects as significant as possible.

Arguably, the most important safety measure is to use not one, but two or more, reference genes. The expression of several reference genes can be averaged to reduce technical variability due to normalization. This can be useful for improving significance in measurements of small biological effects. More importantly, however, two or more reference genes provide mutual controls for maintained stability and control for unexpected occurrences that may influence the expression levels of one of the reference genes. With a single reference gene, there is a risk that unexpected influences on gene expression may go undetected in the assay.

Another safety measure is to use more than one method of identifying stable reference genes. The following is an example to illustrate several aspects of reference gene normalization, including a possible advantage of using both geNorm and NormFinder methods on the same data set.

Table 10.3 lists the reference gene candidates that were evaluated during a workshop we previously conducted with EMBL. Samples were collected from a human cell culture in two different treatment groups. This data set will be used to demonstrate aspects of reference gene validation.

Both the NormFinder and geNorm algorithms were developed on the assumption that testing a multitude of reference gene candidates can be used to rank the stability of the individual candidates. This assumption holds if, for example, all reference gene candidates vary stochastically around stable expression levels. However, this may not necessarily be true in reality. To avoid misleading results, it is therefore prudent to avoid regulated and, in particular, co-regulated reference gene candidates.

The list of reference gene candidates shown in Table 10.3 was specifically chosen to include genes that belong to different functional classes, reducing the chance that the genes may be co-regulated. A notable exception is GAPDH, which is present here in two versions. Although this does not affect this analysis, it is best practice to avoid multiple entries of genes that may be suspected of being co-regulated.

The first algorithm to be demonstrated is geNorm. This provides an evaluation of gene stability by calculating a gene stability measure called the M-value, which is based on pairwise comparisons between the analyzed reference gene candidate and all other reference gene candidates in the data set. It is performed in an iterative fashion, meaning that, in this example, the procedure is first performed on all 15 reference gene candidates, the least stable is removed, the process is repeated on the remaining 14, the second least stable candidate is removed, and so on until two reference genes remain.
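
The pairwise logic of the M-value can be sketched in R as follows; this is a simplified illustration of the published approach, not the geNorm implementation itself, and it assumes a matrix of log2-scale relative quantities with one row per candidate gene.

```r
# Simplified sketch of the geNorm stability measure (M-value).
# expr: matrix of log2 relative quantities, rows = genes, columns = samples
genorm_m <- function(expr) {
  n <- nrow(expr)
  sapply(seq_len(n), function(j) {
    # SD of the pairwise log-ratios of gene j with every other candidate
    v <- sapply(setdiff(seq_len(n), j),
                function(k) sd(expr[j, ] - expr[k, ]))
    mean(v)  # M-value of gene j: average pairwise variation
  })
}

# iterative use: repeatedly drop the candidate with the highest M-value
set.seed(1)
expr <- matrix(rnorm(60), nrow = 10)  # 10 candidates x 6 samples, simulated
order(genorm_m(expr))                 # candidates ranked, most stable first
```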

Identification of the most stable reference gene can sometimes be particularly challenging: in one case, all reference gene candidates may perform poorly; in another, all reference gene candidates may perform well. To distinguish between these two cases, a useful guideline is that reference genes with an M-value below 0.5 may be considered stably expressed.

The second algorithm to be demonstrated is NormFinder, which is a freely available reference gene analysis package (Appendix B, Additional Resources). The underlying algorithm takes an ANOVA-like approach to reference gene stability evaluation, in that variation is analyzed for the whole data set as well as within and between subgroups. One advantage of this is that the obtained measures are directly related to gene expression levels: a standard deviation of 0.20 Cq units, for example, represents about 15% variation in the copy number expression level of the particular reference gene candidate.

For convenience, in this demonstration, both of these analysis packages are accessed using GenEx (MultiD) data analysis software, but they are also available as independent packages (Appendix B, Additional Resources).

The bar diagrams shown in Figure 10.7 illustrate reference genes ranked according to their respective stability measures using both algorithms. In addition, a graph showing the accumulated standard deviation from NormFinder indicates that a combination of up to the three best reference genes may yield stability improvements.


Figure 10.7. Bar diagrams showing stability measures: M-values for geNorm and standard deviations for NormFinder. In addition, a graph showing the accumulated standard deviation from NormFinder indicates that a combination of up to the three best reference genes may yield stability improvements. The data set was generated from assays designed for the reference gene candidates shown in Table 10.3 and measured on a human cell culture in two different treatment groups. Notice that, in this instance, the reference gene stability algorithms geNorm and NormFinder do not agree about the best reference genes.


Figure 10.8. Mean centered expression profile of the reference gene candidates of the two samples in each treatment group. Samples 1 and 2 belong to the first treatment group and samples 3 and 4 belong to the second treatment group. Expression profiles of SDHA and CANX are indicated in red. Expression profile of UBC is indicated in yellow. The table lists the measured Cq values in the data set.

Due to their deviating expression profiles, it is possible that SDHA and CANX are regulated by the different treatment alternatives and are therefore not suitable as reference genes. Removing these from the data set and repeating the analysis results in agreement between the two algorithms that the best choice of reference genes is EIF4A2 and ATP53 (Figure 10.9). In the NormFinder calculation of accumulated standard deviations, it is also apparent that the addition of more reference genes does not improve stability.


Figure 10.9. Inspection of the expression profiles and measured Cq values (Figure 10.8) raised concern that SDHA and CANX may be co-regulated in the applied assay. The co-regulation may disrupt reference gene stability algorithms. Bar diagrams showing stability measures: A) M-values for geNorm and B) standard deviations for NormFinder. The data set is the same as the one used in Figure 10.8 except that the data for SDHA and CANX have been removed. Notice that with this reduced data set the reference gene stability algorithms geNorm and NormFinder do agree about the best reference genes.

The analysis of the data in this example serves to illustrate that using geNorm and NormFinder in parallel allows for identification of co-regulated reference gene candidates, and that removing these genes from further studies provides a final selection of reference genes that can be adopted with more confidence than after using a single analysis. Identification and selection of stable reference genes leads to more reliable data analysis.

Alternative Normalization Methods

While normalization to reference genes is the most common method for assay normalization, there are situations where this approach is not suitable, such as when a large number of genes in a heterogeneous group of samples is to be compared, or when profiling miRNA. In these scenarios, it is necessary to adopt an alternative strategy.

Normalization to Tissue Mass or Cell Number

Measurement of cell number or tissue mass for use as a normalization factor is not as simple as it may first appear. Cell culture experiments are relatively easy to normalize based on cell count. However, a treatment might alter cell morphology, complicating the relationship between cell number and total RNA/expressed genes when compared with a control culture. The experimental treatment may also result in the production of extracellular matrix, causing differences in nucleic acid extraction efficiencies.

Biological tissues can be highly heterogeneous within and between subjects, with more variation being apparent when healthy tissue is compared with diseased tissue. Even apparently less complex tissues, such as blood, can differ considerably in cell count and composition such that gene expression varies considerably between apparently healthy donors 18 .

Any delays in the processes used to purify nucleic acid will result in alterations in the measured RNA. For example, delays in processing peripheral blood mononuclear cells and extracting RNA from the cells result in considerable changes in gene expression 19. The methods underlying the extraction procedures are also major sources of technical variation. Even the isolation process selected for sampling blood-derived cells and for RNA purification results in differences in apparent gene expression profiles 20. Therefore, the first normalization consideration is to ensure that collection and processing are absolutely identical for all samples. It is then critical to perform sufficient quality control to be certain of the sample concentration, integrity, and purity (Sample Purification and Quality Assessment and associated protocols in Appendix A).

Normalization to RNA Concentration

As a minimum, an estimate of template concentration (DNA for qPCR or RNA for RT-qPCR) is important and, as mentioned in Sample Purification and Quality Assessment, it is critical to ensure that the same instrument is used for all measurements, because the determination of nucleic acid concentration is also variable and technique-dependent.

When measuring total RNA concentration, the vast majority of the sample is composed of rRNA, with only a small fraction consisting of the mRNA of interest when examining gene expression, or the sncRNA when examining gene expression regulation. This means that if the rRNA concentration increases by a small amount while the mRNA remains constant, the total RNA concentration will increase, whereas the mRNA concentration must change substantially before the total RNA concentration is measurably affected. Hence, total RNA (predominantly rRNA) concentration is an unreliable measure of the mRNA concentration; yet for many protocols, equal RNA concentrations are required to ensure accurate reverse transcription (see Reverse Transcription).

Normalization to Global Gene Expression

When measuring large numbers of targets, the analyst can estimate the global mean of the total gene expression and identify regulated RNA sequences that deviate from this mean. This approach is conventionally used for normalization of gene expression arrays. It is a valuable alternative to using reference genes and may be preferable where many targets are being measured.

Another recently explored approach is the measurement of endogenously expressed repeat elements (EREs) that are present within many mRNAs. Many species contain these repeat elements (ALU in primates, B elements in mice), which can provide an estimate of the mRNA fraction. Measurement of these target sequences has been shown to perform as well as conventional normalizing systems 9 (Le Bert, et al., in preparation) and may offer a universal solution, or an alternative for complex experiments where stable reference gene combinations are unavailable.

Normalization of miRNA Data

As yet, there have been no reports of a universal miRNA reference gene. Therefore, the selection of a normalization system is still rather empirical. When possible, stably expressed, invariant miRNAs may be identified using genome-wide approaches, i.e., microarrays. Small nucleolar RNAs (snoRNAs) have also been used as reference genes. Global gene expression is also a useful method for normalizing miRNA expression when a stable reference is unknown and several hundred targets have been analyzed 21,22,23. This method is most appropriate for approaches that capture all miRNAs as cDNA in a multiplexed form, e.g., the Exiqon and miQPCR systems (refer to Castoldi et al. in PCR Technologies, Current Innovations 24).

Biological and Technical Replicates

The purpose of normalization is to avoid systematic errors and to reduce data variability for the eventual statistical analysis. Another important aspect of setting up data for statistical analysis is the use of data replicates.

Biological replicates are absolutely necessary for statistical analysis. Statistical significance levels are often set at a 5% cut-off. For biological effects close to such a significance level, it may be necessary to have at least 20 biological replicates to determine the assay's significance level (1:20 corresponding to 5%). In fact, it has been suggested that at least 50 times this number of observations is required for an accurate estimate of significance 25, i.e., on the order of a thousand biological samples. Naturally, practical limitations seldom allow for biological replicates at these levels. Furthermore, accurate estimates of the number of biological replicates necessary to meet a given significance level also depend on the variability of the data. Nevertheless, it is important to realize that a common mistake is to underestimate the number of biological replicates necessary to arrive at reliable conclusions. It is recommended to perform an initial pilot study to evaluate the assay's inherent variability and the potential size of the observable biological effect, in order to have a good basis for estimating the necessary number of biological replicates 26.

Technical replicates are not used directly for the statistical analysis. Instead, technical replicates serve as backup samples (in case some samples are lost in the technical handling process) and improve the assessment of data accuracy. Technical replicates can improve data accuracy if the assumption holds true that they vary stochastically around the accurate measurement at each stage of the technical handling process; the average of the technical replicates is then closer to the accurate measurement. The effect of averaging technical replicates can be illustrated by noting the size of the confidence interval in a simulated data set with a predetermined variability, i.e., a standard deviation set at one. As seen in Table 10.4, the confidence interval becomes smaller with an increasing number of technical replicates (samples), indicating a more precise estimate of the accurate measurement. Furthermore, the narrowing of the confidence interval is most dramatic at low numbers of technical replicates. Increasing the replicate number from 2 to 3 decreases the confidence interval from 8.99 to 2.48, i.e., a more than 3-fold improvement in the precision of the estimate. While additional replicates continue to improve the estimate, the effect is of decreasing magnitude. Therefore, in cases where technical handling variability is an issue, it may be a great advantage to use triplicates rather than duplicates.
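
The narrowing described above can be reproduced directly in R: for a standard deviation fixed at one, the half-width of the 95% confidence interval of the mean of n technical replicates is a one-line calculation.

```r
# Half-width of a 95% CI of the mean for n technical replicates, SD = 1
ci_half_width <- function(n, sd = 1, conf = 0.95) {
  qt(1 - (1 - conf) / 2, df = n - 1) * sd / sqrt(n)
}
round(sapply(2:5, ci_half_width), 2)
# n = 2: ~8.98, n = 3: ~2.48 (cf. Table 10.4), n = 4: ~1.59, n = 5: ~1.24
```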

Technical replicates can be collected at several stages throughout the sample handling process, including RNA extraction, reverse transcription, and qPCR detection. If technical replicates are collected at several stages, a nested experimental design is generated. A pilot study that takes advantage of a nested experimental design may help to identify the sample handling stages that contribute the most to technical handling error, and an optimal sampling plan can be calculated based on this information 27.

Statistical Analysis and Data Visualization

Scientific analysis of biological data centers on the formulation and testing of hypotheses. The formulation of a hypothesis requires a detailed understanding of the conditions and variables of the assay. Successful testing of a hypothesis involves careful execution and an appropriate experimental design to maximize the desired observable signal while minimizing technical variability. In this context, it is useful to distinguish between exploratory and confirmatory studies (Figure 10.10).


Figure 10.10. Flowchart illustrating operations involved in exploratory and confirmatory statistical analyses. The left-hand side of the figure, before the dashed arrow, shows operations in an exploratory statistical study. The right-hand side of the figure, after the dashed arrow, shows operations in a confirmatory statistical study.

The purpose of the exploratory study is to analyze data with one or several different techniques in order to substantiate a hypothesis. The data set may be redefined and/or different analysis techniques may be employed repeatedly in order to support one or several hypotheses. The exploratory study is thus very flexible to the specifics of any scientific question. However, repeated probing of hypotheses on one data set may undermine statistical conclusions. This is due to multiple testing: a statistical analysis with several independent hypotheses is more likely to yield a positive significance by chance, and the likelihood of this increases as additional hypotheses are tested, even if the underlying probability distributions are identical. To avoid misleading statistical results, the exploratory study is therefore often combined with a confirmatory study.

The requirements for a confirmatory study are based on much stricter statistical criteria. First, the hypothesis of study, including criteria for significance, needs to be defined before the collection of data and before the analysis. In addition, the data set for analysis needs to have been collected exclusively for this purpose. It is statistically incorrect to reuse the data set from the exploratory study in the confirmatory study since that data set would inherently favor the proposed hypothesis. The end result of the confirmatory study is a rejected or accepted hypothesis according to the pre-stated criteria.

Statistical Tests

For statistical testing, the likelihood that an observed phenomenon occurred by random chance is analyzed under the Null hypothesis 28. If the observed phenomenon is rare according to the Null hypothesis, the conclusion is that the Null hypothesis is unlikely to be valid. The Null hypothesis is then rejected and the alternative hypothesis is accepted as significant.

The estimated likelihood that the observed phenomenon occurred by random chance is called the p-value. The p-value ranges from 0 to 1 or, equivalently, may be expressed in percentage units. The statistical criteria for a confirmatory study include an alpha cut-off below which calculated p-values indicate significance for the observed phenomenon. An alpha cut-off of 5% is commonly used, although this must be adjusted to fit the desired and necessary criteria that are specific to the subject of study.

Many algorithms have been developed for calculating p-values under various assumptions and for different purposes. A common algorithm is the Student's t-test, which calculates a p-value based on the difference in the mean values between two groups of data. The main assumptions of the Student's t-test are that the two groups of data are independent and conform to normal distributions. An advantage of the Student's t-test is that it is powerful compared to non-parametric statistical tests 29. Its non-parametric equivalent is one of the most well-known non-parametric statistical tests: the Wilcoxon rank-sum test (sometimes called the Mann-Whitney U test; not to be confused with the Wilcoxon signed-rank test, which is used to compare two paired groups). Non-parametric statistical tests, such as the Wilcoxon rank-sum test, have an advantage over parametric tests, such as the Student's t-test, in that they do not depend on prior assumptions about the data set distributions. A Kolmogorov-Smirnov test for normal distribution may be used to decide whether to apply the Student's t-test or one of the non-parametric tests.
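
As a sketch of this decision process in R (the expression values below are invented, and the Shapiro-Wilk test is used here as a convenient normality check in place of the Kolmogorov-Smirnov test mentioned above):

```r
# Choose between Student's t-test and the Wilcoxon rank-sum test.
# Hypothetical normalized expression values for two groups.
control <- c(1.2, 1.5, 0.9, 1.1, 1.4, 1.3)
treated <- c(2.1, 1.9, 2.4, 2.0, 2.6, 2.2)

normal <- shapiro.test(control)$p.value > 0.05 &&
          shapiro.test(treated)$p.value > 0.05

if (normal) {
  print(t.test(control, treated))       # parametric
} else {
  print(wilcox.test(control, treated))  # non-parametric
}
```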

In addition to the choice of algorithm for p-value calculation, the data sets that are fed into the p-value calculation algorithm may be manipulated to facilitate observation of desired properties in the data set. The combination of raw data manipulation steps and the choice of p-value calculation algorithm is part of building a hypothesis model.

There is a high level of freedom in building hypothesis models in the exploratory phase of a statistical analysis and this is an important part of scientific inquiry. However, a hypothesis is never proven using a scientific, statistical approach. A correct scientific approach is to formulate a Null hypothesis, use an independent (preferably a newly collected) data set, and accept or reject the Null hypothesis according to the confirmatory study flowchart ( Figure 10.10 ).

Visualization Techniques for Univariate Analysis

Just as there are many analysis methods available, there are also many data visualization techniques from which to choose. For univariate data analysis, a simple bar diagram with associated error bars is an appropriate visualization technique. Even though this is a common and simple technique, there are issues that are worth emphasizing. First, error bars may illustrate different sources of variability: the inherent variability of the data (the standard deviation, SD) or the precision with which the mean value has been determined. Second, the precision with which the mean value has been determined can be illustrated in different ways, but it ultimately depends on a combination of the inherent variability of the data and the number of samples (N); in its raw form, it is called the standard error of the mean (SEM, Equation 3):

\[
\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{N}}
\]

Equation 3. Standard error of the mean (SEM).

However, the SEM is not a very intuitive measure and it is not straightforward to compare SEMs from different experiments in a meaningful way. A more popular way of illustrating the precision of the estimated mean, and of indicating statistical significance graphically, is the confidence interval (CI, Equation 4):

\[
\mathrm{CI} = \bar{x} \pm t^{*}\,\frac{\mathrm{SD}}{\sqrt{N}}
\]

Equation 4. Confidence interval (CI) of the mean.

The SEM can be recognized in the equation for the confidence interval as the ratio between the standard deviation (SD) and the square root of the number of samples (N); the confidence interval is thus based upon the SEM. The lower limit of the confidence interval is constructed by subtracting the SEM multiplied by a percentile of a t-distribution from the mean; the upper limit is constructed by adding the same quantity to the mean. The confidence level of the confidence interval is set by the confidence level associated with the critical value t*, typically a 95% confidence level.

Figure 10.11 shows a bar graph with error bars denoting the 95% confidence interval within each experimental group, highlighting the uncertainty associated with the mean estimate for an example of gene expression in samples from different organs after treatment with several drug doses. In addition, the t-test p-values for the difference in gene expression between the control samples and each of the three drug dose responses are indicated by means of an asterisk notation. It is customary to have one asterisk correspond to a p-value below 0.05, two asterisks correspond to a p-value below 0.01, and three asterisks correspond to a p-value below 0.001.
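
This customary mapping is easy to express as a small helper in R (a convenience sketch; the "ns" label for non-significant values is our own choice):

```r
# Map p-values to the customary asterisk notation
p_to_asterisks <- function(p) {
  cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
      labels = c("***", "**", "*", "ns"))
}
p_to_asterisks(c(0.0004, 0.03, 0.2))  # "***" "*" "ns"
```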


Figure 10.11. Fold change (log2) expression of a gene of interest relative to a pair of reference genes, relative to the expression in the sample with lowest expression within each organ type. Bar heights indicate mean expression of the gene in several samples in groups of non-treated (Dose 0) samples or samples treated at one of three different drug doses (Dose 1, Dose 2, and Dose 3). Error bars indicate 95% confidence interval estimates of the mean expressions. One asterisk indicates statistically significant difference between the means of a treated sample set compared to the mean of the non-treated sample set to 5%; two asterisks indicate statistically significant difference to 1%; three asterisks indicate statistically significant difference to 0.1%.

Given that the asterisk notation hides the absolute value of p, it is often encouraged to include a table with the absolute p-values, as shown in the example in Table 10.5. One reason for this is that a p-value of, for example, 0.032 is only slightly more "significant" than a p-value of 0.055. Borderline cases like this can lead to some confusion when deciding precisely what cut-off to use when classifying data as significant. In realistic cases, a p-value of 0.051 could be just as meaningful as a p-value of 0.049, yet a strict (although fundamentally arbitrary) cut-off of 0.05 would classify one as significant and the other not.

However, there is a variant of the bar diagram visualization that takes advantage of the confidence interval of the difference between means to avoid many, if not all, of the disadvantages of traditional bar diagrams 24. With the confidence interval of the difference between means, it is possible to assess statistical significance directly from the error bars while at the same time highlighting the biological effect size and the data variability. Figure 10.12 shows this variant applied to the data used in Figure 10.11. Notice that confidence intervals that do not encompass a zero difference between means correspond to significant results at the confidence level corresponding to the p-value cut-off (5% in Figure 10.11 and Table 10.5).
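
In R, the confidence interval of the difference between means falls directly out of a t-test; the sketch below reuses the invented group values from the statistical-test example above.

```r
# Hypothetical group values (as in the statistical-test sketch above)
control <- c(1.2, 1.5, 0.9, 1.1, 1.4, 1.3)
treated <- c(2.1, 1.9, 2.4, 2.0, 2.6, 2.2)

# CI of the difference between means: an interval that excludes zero
# indicates significance at the corresponding level (here 5%)
tt <- t.test(treated, control, conf.level = 0.95)
tt$conf.int
tt$p.value
```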


Figure 10.12. Bar diagram showing the difference between means of the nontreated sample set (Dose 0) and one of the treated sample sets (Dose 1, Dose 2 or Dose 3) in the data set from Figure 10.11. Error bars show the confidence interval of the difference between means. Error bars that do not cross the x-axis indicate the corresponding means comparison is statistically significant to 5% in a t-test. PCR Technology, Current Innovations-3rd ed. by Taylor and Francis Group LLC Books. Reproduced with permission of Taylor and Francis Group LLC Books in the format reuse in a book/e-book via Copyright Clearance Center.

Multivariate data are data collected on several variables for each sampling unit. The data used in Figures 10.11 and 10.12 are multivariate in that they depend on variables such as dose and organ type. However, the statistical analyses in Figures 10.11 and 10.12 are nevertheless univariate, in that each representation (bar) only illustrates one variable, gene expression, relative to fixed values of the other variables. For multivariate data analysis, hierarchical clustering and principal component analysis are good options for data representation.

One of the easiest and most useful ways to characterize data is to plot them in a scatterplot (for example, plotting the measured Cq values of one gene against the corresponding Cq values of another gene for a set of biological samples in a 2D plot). Plots in one or two dimensions are conveniently visualized by the human eye. Plots in three dimensions may also be possible with appropriate tools, but higher-dimensional plots are significantly harder to visualize. However, for exploratory studies, the data set is inherently multidimensional, and scatterplots of whole data sets may thus become impractical. A qPCR data set may, for example, contain several genes and/or several types of biological samples.

Hierarchical Clustering

A popular alternative way of characterizing and visualizing data from exploratory studies is to analyze measures of distances between data points in the scatterplot. Different distance measures exist, including Euclidean, Manhattan, and Pearson correlation-based distances. With computational power, it is straightforward to calculate distances, even for multidimensional data of much higher dimensionality than three. For agglomerative hierarchical clustering, the following iterative process is performed: 1) find the two closest objects and merge them into a cluster; 2) define the new cluster as a new object through a clustering method; 3) repeat from 1) until all objects have been combined into clusters 30. Alternative clustering methods include Ward's method, single linkage, and average linkage 31. A dendrogram is often used to visualize results from hierarchical clustering.
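
A minimal R sketch of this procedure, using a simulated genes-by-samples matrix in place of real qPCR data:

```r
set.seed(1)
# simulated log-quantity matrix: 8 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(48), nrow = 8,
               dimnames = list(paste0("gene", 1:8), paste0("s", 1:6)))

d  <- dist(expr, method = "euclidean")  # distances between gene profiles
hc <- hclust(d, method = "ward.D2")     # Ward's clustering method
plot(hc, main = "Dendrogram of expression profiles")
```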

Interpretation of hierarchical clustering dendrograms of qPCR data often results in conclusions about gene expression profile similarities. In an exploratory study, these similarities may then be used to formulate hypotheses about gene expression coregulation, which may be accepted or rejected in subsequent confirmatory studies. The advantages of hierarchical clustering dendrograms include the clarity by which similarity relationships are visualized. On the other hand, the strong emphasis on similarity measures may be perceived as limiting with respect to formulating hypotheses, since similar expression profiles may be redundant attributes in hypotheses. It may be of higher value to identify sets of expression profiles that complement each other in a specific combination, to answer the desired hypothesis.

Principal Component Analysis

Another popular way to characterize and visualize data from exploratory studies is to take advantage of the information contained in the whole, multidimensional data set, select desired properties, and project them onto a lower-dimensional scatterplot, such as a 2D or 3D plot. This can be achieved using principal component analysis (PCA) 32,33,34,35. Here, the original coordinate system of the data set (i.e., the expression profiles measured by qPCR) is transformed into a new multidimensional space in which new variables (principal components: PCs or factors) are constructed. Each PC is a linear combination of the original variables in the data set. By mathematical definition, the PCs are extracted in successive order of importance: the first PC explains most of the information (variance) present in the data, the second less, and so forth. Therefore, the first two or three PC coordinates (termed scores) can be used to obtain a projection of the whole data set onto a conveniently small number of dimensions, suitable for visualization in a 2D or 3D plot. By using the first two or three PCs for representation, the projection that accounts for the most variability in the data set is obtained. Variance arising from the experimental design conditions is expected to be systematic, while confounding variance is expected to be random, so this representation may be desirable under appropriate conditions.
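
A corresponding R sketch of PCA, again on a simulated matrix, projects the samples onto the first two principal components:

```r
set.seed(1)
# simulated log-quantity matrix: 8 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(48), nrow = 8)

# transpose so that rows = samples and columns = genes (the variables)
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
summary(pca)  # proportion of variance explained per component
plot(pca$x[, 1], pca$x[, 2],
     xlab = "PC1", ylab = "PC2",
     main = "Scores on the first two principal components")
```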

As previously noted for hierarchical clustering, the interpretation of PCA of qPCR data often results in conclusions about gene expression profile similarities. Although PCA and hierarchical clustering may yield complementary insights into gene expression co-regulation patterns, both techniques focus on gene expression profile similarities. This places limitations on the types of hypotheses that can be generated in exploratory studies using these techniques alone. To expand the reach of the hypotheses generated in exploratory studies, a hypothesis-driven approach to multivariate analysis was recently proposed 24. Hypothesis-driven, custom-designed algorithms may identify biologically relevant hypotheses that would otherwise be missed by commonly used techniques for multivariate data analysis.


Analysis of real-time qPCR data

Mahmoud Ahmed

Quantitative real-time PCR is an important technique in medical and biomedical applications. The pcr package provides a unified interface for quality assessment, analysis, and statistical testing of qPCR data. The aim of this document is to describe the different methods and modes used for the relative quantification of gene expression by qPCR and their implementation in the pcr package.

The pcr package is available on CRAN. To install it, use:
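
```r
install.packages('pcr')
```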

The development version of the package can be obtained through:
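
```r
# the GitHub repository name is assumed; requires the devtools package
devtools::install_github('MahShaaban/pcr')
```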

The following chunk of code locates a dataset of \(C_T\) values of two genes from 12 different samples and performs a quick analysis to obtain the expression of a target gene, c-myc, normalized by a control, GAPDH, in the kidney samples relative to the brain samples. pcr_analyze provides different methods; the default used here, 'delta_delta_ct', applies the popular \(\Delta\Delta C_T\) method.
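
A sketch of that chunk (the file name ct1.csv, its location under the package's extdata, and the use of readr are assumptions consistent with the package vignette):

```r
library(pcr)

# locate and read the ct1 dataset shipped with the package
fl <- system.file('extdata', 'ct1.csv', package = 'pcr')
ct1 <- readr::read_csv(fl)

# grouping variable: 6 brain samples followed by 6 kidney samples
group_var <- rep(c('brain', 'kidney'), each = 6)

# relative expression of c-myc, normalized by GAPDH, kidney relative to brain
res <- pcr_analyze(ct1,
                   group_var = group_var,
                   reference_gene = 'GAPDH',
                   reference_group = 'brain',
                   method = 'delta_delta_ct')
res
```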

The output of pcr_analyze is explained in the documentation of the function ?pcr_analyze and the method it calls ?pcr_ddct , as well as in a later section of this document. Briefly, the output includes the \(C_T\) value of c-myc normalized to the control GAPDH, the calibrated value of c-myc in the kidney relative to the brain samples, and the final relative_expression of c-myc. In addition, an error term and lower and upper intervals are provided.

The previous analysis makes a few assumptions that will be explained later in this document. One of them is a perfect amplification efficiency of the PCR reaction. To assess the validity of this assumption, pcr_assess provides a method called efficiency . The input data.frame contains the \(C_T\) values of c-myc and GAPDH at different input amounts/dilutions.
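
A sketch, assuming ct3 ships with the package and that the dilution series follows the original paper (three replicates per amount; the exact values are an assumption):

```r
# locate and read the serial-dilution dataset
fl <- system.file('extdata', 'ct3.csv', package = 'pcr')
ct3 <- readr::read_csv(fl)

# input amounts/dilutions paired with the rows of ct3
amount <- rep(c(1, .5, .2, .1, .05, .02, .01), each = 3)

pcr_assess(ct3,
           amount = amount,
           reference_gene = 'GAPDH',
           method = 'efficiency')
```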

When using the \(\Delta\Delta C_T\) method, the assumption about the amplification efficiency is critical for the reliability of the model. It is checked through the slope and the \(R^2\) of the line between the log input amount and \(\Delta C_T\) , the difference between the \(C_T\) values of the target c-myc and GAPDH. Typically, the slope should be very small (less than 0.1); the slope here is approximately 0.026. A value of the amplification efficiency itself is given by \(10 ^ {-1/slope}\) , so the assumption holds true.

  • Amplification efficiency : The ability of the reaction to amplify a certain amount of input RNA in a sample
  • \(C_T\) : Cycle Threshold is the number of cycles required for the fluorescent signal to cross the threshold
  • \(\Delta C_T\) : Difference between two \(C_T\) values (e.g. \(C_{T, c-myc} - C_{T, GAPDH}\) )
  • \(\Delta \Delta C_T\) : Difference between two \(\Delta C_T\) values (e.g. \(\Delta C_{T, treatment} - \Delta C_{T, control}\) )
  • Reference gene : A gene known not to change its expression between the groups of interest, so any change in its signal should be due to technical variation in the reaction rather than biology. Used for normalization . (e.g. GAPDH or \(\beta\) -actin)
  • Reference group : An experimental group used to express mRNA levels in comparison to. Used for calibration . (e.g. control or time point 0)
  • Standard : A sample of known concentration

In contrast with the absolute quantification of the amount of mRNA in a sample, relative quantification uses an internal control (reference gene) and/or a control group (reference group) to quantify the mRNA of interest relative to these references. This relative quantification is sufficient to draw conclusions in most biomedical applications involving qPCR. A few methods have been developed to perform this relative quantification. These methods require different assumptions and models. The two most common methods are explained here.

Comparative \(C_T\) methods

The comparative \(C_T\) methods assume that the cDNA templates of the gene(s) of interest as well as the control/reference gene have similar, near-perfect amplification efficiencies. That is, at a certain threshold during the linear portion of the PCR reaction, the amounts of the gene of interest and of the control double each cycle. Another assumption is that the expression difference between two genes or two samples can be captured by subtracting one (gene or sample of interest) from the other (reference). This final assumption also requires that these references do not change with the treatment or the time course in question. The formal derivation of the double delta \(C_T\) model is described in (Livak and Schmittgen 2001) . Briefly, the \(\Delta\Delta C_T\) is given by:

\[ \Delta\Delta C_T = \Delta C_{T,q} - \Delta C_{T,cb} \]

And the relative expression by:

\[ 2^{-\Delta\Delta C_T} \]

  • \(\Delta C_{T,q}\) is the difference in the average \(C_T\) of a gene of interest and a reference gene in a group of interest
  • \(\Delta C_{T,cb}\) is the difference in the average \(C_T\) of a gene of interest and a reference gene in a reference group

And the error term is given by:

\[ s = \sqrt{s_1^2 + s_2^2} \]

  • \(s_1\) is the standard deviation of a gene of interest
  • \(s_2\) is the standard deviation of a reference gene

Standard curve methods

In comparison, this model doesn't assume perfect amplification but rather actively uses the measured amplification in calculating the relative expression. When the amplification efficiency of all genes is 100%, both methods should give similar results. The standard curve method is applied in two steps. First, serial dilutions of the mRNA from the samples of interest are used as input to the PCR reaction. The linear trend of the log input amount and the resulting \(C_T\) values for each gene is used to calculate an intercept and a slope. Second, these intercepts and slopes are used to calculate the amounts of mRNA of the genes of interest and the control/reference in the samples of interest and the control sample/reference. These amounts are finally used to calculate the relative expression in a manner similar to the previous method, just using division instead of subtraction. The formal derivation of the model is described in (Yuan et al. 2006) . Briefly, the amount of RNA in a sample is given by:

\[ \log amount = \dfrac{C_T - b}{m} \]

And the relative expression is given by:

\[ 10^{\log amount} \]

  • \(C_T\) is the cycle threshold of a gene
  • \(b\) is the intercept of the line \(C_T\) ~ log10 input amount
  • \(m\) is the slope of the line \(C_T\) ~ log10 input amount

And the error term is given by:

\[ s = (cv)(\bar X) \]

\[ cv = \sqrt{cv_1^2 + cv_2^2} \]

  • \(s\) is the standard deviation
  • \(\bar X\) is the average relative expression
  • \(cv\) is the coefficient of variation, or relative standard deviation
  • \(cv_1\) and \(cv_2\) are the coefficients of variation of the gene of interest and the reference gene, respectively

Fortunately, regardless of the method used in the analysis of qPCR data, the quality assessment is done in a similar way. It requires an experiment similar to that used for calculating the standard curve: serial dilutions of the genes of interest and controls are used as input to the reaction, and different calculations are made.

  • The amplification efficiency is approximated by the linear trend between the difference between the \(C_T\) value of a gene of interest and a control/reference ( \(\Delta C_T\) ) and the log input amount. This piece of information is required when using the \(\Delta\Delta C_T\) model. Typically, the slope of the curve should be very small and the \(R^2\) value should be very close to one. A value of the amplification efficiency itself is given by \(10 ^ {-1/slope}\) and should be close to 2. Other analysis methods are recommended when this is not the case.
  • Similar curves are required for each gene using the \(C_T\) value instead of the difference for applying the standard curve method. In this case, a separate slope and intercept are required for the calculation of the relative expression.

Using the two methods above and their assumptions, useful statistics such as p-values and confidence intervals can be calculated.

Two-group tests

Assuming that the assumptions of the first method hold true, a simple t-test can be used to test the significance of the difference between two conditions ( \(\Delta C_T\) ). The t-test assumes, in addition, that the input \(C_T\) values are normally distributed and that the variances between conditions are comparable. The Wilcoxon test can be used when the sample size is small and these last two assumptions are hard to verify.

Linear regression

To use linear regression here, a null hypothesis is formulated as follows:

\[ C_{T, target, treatment} - C_{T, control, treatment} = C_{T, target, control} - C_{T, control, control} \quad \textrm{i.e.} \quad \Delta\Delta C_T = 0 \]

This is exactly the \(\Delta\Delta C_T\) explained earlier. So the \(\Delta\Delta C_T\) is estimated by the regression, and the null hypothesis is rejected when the estimate differs significantly from zero ( \(\Delta\Delta C_T \ne 0\) ).

The pcr package

To illustrate the use of the pcr package in applying these methods to qPCR data, we use real qPCR datasets from two published papers. In addition, we compare the results obtained by the pcr package to those of the original papers to ensure reliability. First, Livak et al. (Livak and Schmittgen 2001) obtained total RNA from human tissues: brain and kidney. c-myc and GAPDH primers were then used for cDNA synthesis, and the cDNA was used as input in the PCR reaction. Six replicates for each tissue were run in separate tubes. This dataset is referred to as ct1 throughout this document and is shown along with the difference calculations in Tables 1 and 2. Another dataset was generated from a separate assay, this time running the samples in the same tube with pairs of primers carrying different reporter dyes. This is referred to as ct2 and is shown in Tables 3 and 4. Finally, \(C_T\) values were collected from a qPCR experiment using different input amounts of c-myc and GAPDH. That dataset is referred to as ct3 and is shown in Table 5. Second, Yuan et al. (Yuan et al. 2006) extracted total RNA from treated and control samples of the Arabidopsis thaliana plant, 24 samples each, and performed qPCR using MT7 and ubiquitin primers. This dataset is referred to as ct4, and Table 6 shows the results of the different testing methods that were applied in the original paper.

pcr_assess is a wrapper for the implemented quality assessment methods; pcr_efficiency and pcr_standard . Both methods can be called directly using the method names or through pcr_assess by passing a string to the argument method ; ‘efficiency’ or ‘standard_curve’ for calculating the amplification efficiency or the standard curve for each gene respectively.

Amplification efficiency pcr_efficiency

To calculate the amplification efficiency in a qPCR experiment, the main input is a data.frame with columns containing the \(C_T\) values for each gene and rows corresponding to the different input amounts/dilutions (Table 5).

The following code applies the calculation to the data.frame , ct3 . It has two columns, c_myc and GAPDH , and three rows for each of the input amounts coded in the variable amount . A reference gene is passed to the reference_gene argument; in this case, the column name GAPDH.
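
Reusing ct3 and amount from the earlier chunk, a sketch of the call:

```r
res <- pcr_assess(ct3,
                  amount = amount,
                  reference_gene = 'GAPDH',
                  method = 'efficiency')
res
```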

The output of pcr_assess is a data.frame of four columns, with one row per input gene excluding the reference. For each gene, an intercept , slope and \(R^2\) are calculated for the difference between it and the reference ( \(\Delta C_T\) ) (Table 7).

When the argument plot is TRUE , a graph is produced instead, showing the average and standard deviation of the \(\Delta C_T\) at different input amounts. In addition, a linear trend line is drawn (Fig 1).

Figure 1: Amplification efficiency of c-myc

The relevant summary in calculating efficiency is the slope . Typically, the slope should be very low (less than 0.1), meaning the \(\Delta C_T\) values are not changing much as a consequence of changing the input concentration. A value of the amplification efficiency itself is given by \(10 ^ {-1/slope}\) and should be close to 2.

Standard curve pcr_standard

To calculate the standard curve for individual genes, pcr_assess takes a data.frame similar to that described above as input, ct3 , and the same amount variable. The following code calculates the curves for the two columns/genes by fitting a line between their \(C_T\) values and the log input amounts.
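
A sketch, again reusing ct3 and amount:

```r
# fit a line of C_T values over log10 input amounts for each gene separately
res_curve <- pcr_assess(ct3,
                        amount = amount,
                        method = 'standard_curve')
res_curve
```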

The output is similar to that of the previous call, except that when 'standard_curve' is passed to method , curves are calculated for individual genes, indicated in the column gene (Table 8).

The information from the standard curves is required when using the standard curve method, so we retain the relevant values in the variables intercept and slope . Typically, the r_squared should be close to 1.
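
For example (the output column names follow Table 8 and are an assumption):

```r
intercept <- res_curve$intercept
slope <- res_curve$slope
```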

When the argument plot is TRUE , a graph is returned instead, with a panel for each gene showing the raw \(C_T\) values against the log input amounts (Fig. 2).

Figure 2: Standard curve of c-myc and GAPDH

Similarly, pcr_analyze is a wrapper and unified interface for the different analysis models: pcr_ddct , pcr_dct and pcr_curve . The models can be invoked by calling these functions directly or through the argument method of pcr_analyze . Possible inputs to the argument method are 'delta_delta_ct', 'delta_ct' and 'relative_curve' for calculating the double delta \(C_T\) , delta \(C_T\) and standard curve models, respectively.

Double delta \(C_T\) ( \(\Delta\Delta C_T\) ) pcr_ddct

To apply the double delta \(C_T\) model, the default method of pcr_analyze , the main input is a data.frame with columns containing the genes and the rows the \(C_T\) values from different samples. In addition, a group variable group_var corresponding to these rows/samples is required. Finally, a reference_gene and a reference_group are entered.

The following code chunk applies this method to a data.frame of 12 samples from 2 groups ct1 (Table 1).
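
A sketch, reusing ct1 and group_var from the quick-start chunk:

```r
# double delta C_T model; genes were run in separate tubes (default mode)
pcr_analyze(ct1,
            group_var = group_var,
            reference_gene = 'GAPDH',
            reference_group = 'brain',
            method = 'delta_delta_ct')
```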

The output of pcr_analyze has 8 columns containing the calculations for each gene in each group and the error terms (Table 9). This analysis uses the default mode , 'separate_tube', as the input dataset came from an experiment where the target c_myc and the control gene GAPDH were run in separate tubes.

In contrast, the ct2 dataset, also shown in Table 3, came from an identical experiment except the samples were run in the same tube. So the following analysis invokes a different mode , ‘same_tube’.
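
A sketch, assuming ct2 ships with the package like the other datasets:

```r
fl <- system.file('extdata', 'ct2.csv', package = 'pcr')
ct2 <- readr::read_csv(fl)

# same model, but the C_T values come from samples run in the same tube
pcr_analyze(ct2,
            group_var = group_var,
            reference_gene = 'GAPDH',
            reference_group = 'brain',
            mode = 'same_tube')
```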

The only difference here is that the average of the \(C_T\) values for the target gene c_myc is calculated after normalizing by the reference gene GAPDH. The rest of the calculations are expected to be slightly different than the previous case (Table 10).

Figure 3 shows the results of these two analyses. Bars represent the average relative expression of c-myc in the kidney, normalized by GAPDH and calibrated by the brain. The error bars are the standard deviations.

Figure 3: Relative expression of c-myc using double delta \(C_T\)

Delta \(C_T\) ( \(\Delta C_T\) ) pcr_dct

This method is a variation of the double delta \(C_T\) model. It can be used to calculate the fold change of expression in one sample relative to the others. For example, it can be used to compare and choose control/reference genes.

Here, we used the column GAPDH from the dataset ct1 to make a data.frame , pcr_hk , of two identical columns, GAPDH1 and GAPDH2 , to show how such a comparison can be done.

The input to pcr_analyze is identical to that of the previous call, only method is specified this time to ‘delta_ct’.
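
A sketch of both steps:

```r
# two identical copies of GAPDH, to illustrate comparing reference genes
pcr_hk <- data.frame(GAPDH1 = ct1$GAPDH,
                     GAPDH2 = ct1$GAPDH)

# delta C_T model: no normalization, only calibration to the reference group
pcr_analyze(pcr_hk,
            group_var = group_var,
            reference_group = 'brain',
            method = 'delta_ct')
```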

Similarly, the output contains the calculated model and the error terms (Table 11). The difference here will be skipping the normalization step and calibrating the \(C_T\) values of all genes to a reference_group .

Figure 4 shows the average relative fold change of the identical housekeeping genes and their error terms in the two tissue samples.

Figure 4: GAPDH relative fold change using delta \(C_T\)

Standard curve pcr_curve

The calculation of the standard curve method involves two steps, as explained earlier. First, a standard curve is calculated for each gene to find the intercept and the slope . Then the relative expression is calculated.

To apply this method to the ct1 dataset (Table 2), we use the variables slope and intercept that were calculated earlier using the ct3 dataset (Table 8).
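
A sketch, reusing the intercept and slope retained above:

```r
pcr_analyze(ct1,
            group_var = group_var,
            reference_gene = 'GAPDH',
            reference_group = 'brain',
            intercept = intercept,
            slope = slope,
            method = 'relative_curve')
```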

The output of pcr_analyze is the same as explained before. The calculated averages and error term of target genes in each group relative to the reference gene and group (Table 12).

The argument mode can be used to change the way the \(C_T\) values are averaged when samples from different genes were run in the same tube (Table 4).
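
A sketch with the same-tube dataset ct2:

```r
pcr_analyze(ct2,
            group_var = group_var,
            reference_gene = 'GAPDH',
            reference_group = 'brain',
            intercept = intercept,
            slope = slope,
            mode = 'same_tube',
            method = 'relative_curve')
```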

The output is similar to that described earlier (Table 13).

Figure 5 shows the output of the standard curve method. Relative expression values of c-myc in the kidney, normalized by GAPDH and calibrated to the brain, are shown as bars, averages \(\pm\) standard deviations.

Figure 5: Relative expression of c-myc using the standard curve

Testing for statistical significance between conditions is important to ensure validity and replicability of the analysis. Different statistical methods require different assumptions. So the choice of which test to use depends on many factors. Among these factors are the number of the conditions/groups, the sample and replicate sizes and the type of desired comparison.

pcr_test provides a unified interface to different testing methods, which is similar to that used before for analysis and quality assessment.

Here, we use a dataset, ct4 , of 48 samples and two gene columns, ref and target . The samples came from two groups, as indicated in the variable group . The arguments reference_gene and reference_group are used to construct the comparison the same way they were used to calculate the relative expression.

Figure 6: Relative expression of target gene using delta delta \(C_T\)

We start by analyzing the dataset ct4 using the default method of pcr_analyze , 'delta_delta_ct', and show a bar graph of the results (Fig. 6).
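
A sketch, assuming ct4 ships with the package and that the control samples come first:

```r
fl <- system.file('extdata', 'ct4.csv', package = 'pcr')
ct4 <- readr::read_csv(fl)

group <- rep(c('control', 'treatment'), each = 24)  # assumed sample order

# plot = TRUE is assumed to return the bar graph instead of the table
pcr_analyze(ct4,
            group_var = group,
            reference_gene = 'ref',
            reference_group = 'control',
            plot = TRUE)
```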

Finally, we use pcr_test to perform the different tests. The resulting output tables can be compared to the results from the original paper that provided the dataset (Table 6).

When the argument test is set to 't.test', a simple two-group t -test is carried out and an estimate for the difference between groups in the change of the target relative to a control gene is provided. In addition, a p_value and lower and upper 95% confidence intervals are provided as well (Table 14).
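
A sketch of the call:

```r
pcr_test(ct4,
         group_var = group,
         reference_gene = 'ref',
         reference_group = 'control',
         test = 't.test')
```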

When the argument test is set to 'wilcox.test', the Wilcoxon test method is used instead and similar output is provided (Table 15).
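
And for the Wilcoxon test:

```r
pcr_test(ct4,
         group_var = group,
         reference_gene = 'ref',
         reference_group = 'control',
         test = 'wilcox.test')
```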

Linear regression can be applied to more than two groups and to more advanced comparisons.

The output of the test 'lm' contains an extra column, term , to show the different comparison terms used to calculate the results (Table 16).
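
And for the linear regression:

```r
pcr_test(ct4,
         group_var = group,
         reference_gene = 'ref',
         reference_group = 'control',
         test = 'lm')
```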

Pabinger et al. surveyed the tools used to analyze qPCR data across different platforms (Pabinger et al. 2014) . They included 9 R packages which provide very useful analysis and visualization methods. Some of these packages focus on certain models, and some are designed to handle high-throughput qPCR data. Most of these packages are hosted on CRAN and a few on Bioconductor, where they adhere to Bioconductor methods and data containers. In comparison, pcr provides a unified interface for different quality assessment, analysis, and testing models. The input and the output are tidy data.frames, and the package source code follows tidyverse practices. This package targets small-scale qPCR experimental data and the R practitioner. The interface and documentation choices were made with such users in mind and require no deep knowledge of specific data structures or complex statistical models.

Miscellaneous

In this section, we briefly discuss some common issues that arise when using real-time qPCR data and some simplified solutions using the pcr package. Mainly, we use linear models to quantify the effect of variables external to the reaction conditions on the resulting data and analysis, either to identify such effects or to inform further experiments, which are more likely to yield better results once the sources of the problems are removed. The examples are applied to the ct4 dataset along with some artificial variables.

When testing multiple variables in an experiment, or their interactions, the argument model_matrix of pcr_test should be used. The model_matrix should be constructed first to reflect the hypothesis at hand. For example, when a treatment is given at multiple doses, a vector of numerical values is constructed first to indicate the dose used with each sample. Along with the main grouping variable, group , they are combined in a data.frame and the function model.matrix is used; the first argument to this function is the formula of the comparison.
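
A sketch with a hypothetical dose variable (the dose values, and passing reference_gene alongside model_matrix, are assumptions):

```r
# hypothetical doses paired with the 48 samples
dose <- rep(c(100, 80, 60), times = 16)

# encode the group effect and the group:dose interaction
mm <- model.matrix(~ group + group:dose,
                   data = data.frame(group = group, dose = dose))

pcr_test(ct4,
         reference_gene = 'ref',
         model_matrix = mm,
         test = 'lm')
```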

In this case, the estimate effect of the interaction term between dose and group is very small and the p_value is very large (Table 17).

The quality of the RNA is critical; it should be measured and should pass a minimum threshold. The scaled qualities, quality , can be added to an interaction term in a linear model to rule out their effect on the analysis if any is suspected.
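
A sketch with randomly generated, hypothetical quality scores:

```r
# hypothetical scaled RNA quality scores, one per sample
quality <- as.numeric(scale(rnorm(48, mean = 7, sd = 1)))

# a group:quality interaction probes the effect of RNA quality
mm <- model.matrix(~ group + group:quality,
                   data = data.frame(group = group, quality = quality))

pcr_test(ct4,
         reference_gene = 'ref',
         model_matrix = mm,
         test = 'lm')
```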

The randomly generated quality seems to influence the results, as indicated by the large estimate for the term 'model_matrixgroupcontrol:quality' and its p_value (Table 18).

The question of whether it is permissible to combine data from multiple runs depends on many factors. In practice, however, it might be the only available option. In that case, one way to ensure the reliability of the data is to consider carefully which samples to run each time. Randomization or blocking can be used to avoid a batch effect, or at least leave a chance to detect it if it exists.

Here, we constructed a variable run to simulate a situation in which a dataset was generated in three separate runs. By including run in a model_matrix , its effect can be estimated. In this case, the estimates are very small and the p_values are very large, so it can be ignored (Table 19).
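
A sketch of such a construction:

```r
# simulate three runs balanced across the samples
run <- factor(rep(c(1, 2, 3), times = 16))

# include run in the model matrix to estimate a batch effect
mm <- model.matrix(~ group + run,
                   data = data.frame(group = group, run = run))

pcr_test(ct4,
         reference_gene = 'ref',
         model_matrix = mm,
         test = 'lm')
```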

Contribution

I’d be glad to receive any comments or ideas to help the package forward.

  • To report a bug please use the issue page on github
  • Fork this repo to your github account
  • Clone the repo to your machine and make changes
  • Push to your account
  • Submit a pull request at this repo

My email is: [email protected]

  • It seems like there is a tiny mistake in the original table presented in (Livak and Schmittgen 2001) in calculating the average of \(C_T\) values of the GAPDH in the brain samples and the subsequent calculations. The tables shown here from this study are the corrected ones.
  • The original (Livak and Schmittgen 2001) serial dilution dataset provides only averages and standard deviations of \(C_T\) values. We used these summaries to regenerate a dataset of raw \(C_T\) values using the rnorm(n = 3, mean = average, sd = sd) to show how they could be used from the start in a typical analysis. So the subsequent calculations that involve this data set might be slightly different than the original tables in the paper.

Livak, Kenneth J, and Thomas D Schmittgen. 2001. “Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the Double Delta CT Method.” Methods 25 (4). ELSEVIER. https://doi.org/10.1006/meth.2001.1262 .

Pabinger, Stephan, Stefan Rodiger, Albert Kriegner, Klemens Vierlinger, and Andreas Weinhausel. 2014. “A Survey of Tools for the Analysis of Quantitative PCR (qPCR) Data.” Biomolecular Detection and Quantification . ELSEVIER. https://doi.org/10.1016/j.bdq.2014.08.002 .

Yuan, Joshua S, Ann Reed, Feng Chen, and Neal Stewart. 2006. “Statistical Analysis of Real-Time PCR Data.” BMC Bioinformatics 7 (85). BioMed Central. https://doi.org/10.1186/1471-2105-7-85 .

qPCR Analysis


Real-time quantitative polymerase chain reaction (qPCR) is a standard technique in most research laboratories performing gene expression analysis. qPCR data analysis is a crucial part of a gene expression experiment and has led to the development of several key methods. The sections which follow provide an overview of the key quantification strategies used in qPCR for gene expression.

The Importance of Data Analysis in Gene Expression

Real-time qPCR has enabled a routine and robust approach for measuring the expression of genes of interest and for monitoring biomarkers. This section covers the key features of qPCR data analysis and describes examples of common methods to analyze data from a qPCR assay.

How to Evaluate Amplification Curves from a qPCR Assay

The structure of an amplification curve can be defined in terms of the phases described below.

Amplification Curve

Figure 1. Amplification curve. The different phases of the qPCR assay reaction. Baseline occurs from cycles 0 through 10, where the initial concentration of template is low. At this level, the fluorescence intensity is too low to be detected and only background signal occurs. Exponential: once target yield attains the detection threshold, the reaction is followed through its exponential phase. Linear: template concentration increases, leading to a reduction in available DNA polymerase concentration, and the reaction rate decreases. Plateau: reaction is at maximal yield.

Introduction to Data Acquisition and Analysis: Basic Calculations using Cq

The quantitative analysis of RT-qPCR is obtained through analysis of quantification cycle (Cq) values. The Cq value has been given many different names, including:

  • Ct — threshold cycle
  • Cp — crossing point
  • TOP — take-off point

According to the MIQE guidelines, Cq is the correct terminology.

How to Identify the Cq Value for qPCR

The Quantification Cycle, Cq, is the cycle number at which the fluorescence first rises above the threshold level. In the diagram below, note where Cq is defined in relation to the baseline level and the beginning of the exponential phase of the reaction curve.


How to Analyze Data to Determine Cq

A set of procedures and data normalization is required before performing quantification analysis:

Baseline Correction

To avoid variation from background signal caused by external factors unrelated to the samples (e.g., plasticware, light leaking, fluorescent probe not quenched), it is recommended to select the fluorescence intensity from the first cycles, usually cycles 5-15, and identify a constant and linear component of the background fluorescence. This information is used to define the baseline for analysis.

Threshold Setting

A quantification zone should be selected which exceeds the detection limit of the PCR thermocycler being used. The number of cycles needed to reach it decreases as the sample copy number increases (Cq is inversely related to the log of the starting copy number). Instead of using the minimum fluorescence intensity detectable by the instrument, a quantification zone with higher fluorescence intensity is selected and a threshold defined for this area.

Using the following guidelines can help define a correct threshold:

  • Lies in the amplification log phase, and avoids the plateau phase
  • Lies significantly over the background baseline
  • Lies within a log phase range where sample amplification plots are parallel

To see a real-world example of analysis for gene expression, review the example listed below.

The most common application of real-time PCR is the study of gene expression. Measuring gene expression is of fundamental importance in many areas of contemporary biomedical research, ranging from basic science to practical applications in industry and medicine. In most gene expression studies, researchers are interested in determining the relative gene expression levels in test vs. control samples. For example, a researcher might be interested in the expression of a particular gene in cancerous vs. normal tissue. In this section, general guidelines and a sample gene expression experiment are presented to demonstrate how to perform gene expression analysis using real-time PCR.

Experimental Design

A sample gene expression analysis using a multiplex Taqman assay is presented in the following sections. In this example, we're interested in the relative expression of three genes in the polyamine biosynthesis pathway, ornithine decarboxylase (ODC), ODC antizyme (OAZ), and antizyme inhibitor (AZI), in two different samples, sample A and sample B. Due to possible pipetting errors and initial quantification errors of the input RNA, the amount of starting cDNA in the different real-time reactions may be different. In this example, the expression of a reference gene, β-actin, was chosen as the normalizer to control for any difference in cDNA input amount. The following steps were performed to determine the relative expression level of ODC, OAZ, and AZI in the two different samples:

  • RNA was isolated from sample A and sample B.
  • RNA was reverse transcribed into cDNA.
  • The amount of the target genes (ODC, OAZ, and AZI) and the reference gene (β-actin) was determined in each of the cDNA samples using a multiplex qPCR assay.
  • Data were analyzed and the relative expression of each of the target genes in the two samples was calculated, as sketched below.
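
As a minimal sketch of that final calculation step, the ΔΔCq arithmetic with hypothetical Cq values (~100% amplification efficiency is assumed; real values come from the instrument run):

```r
# hypothetical Cq values for one target (ODC) and the reference (beta-actin)
cq_odc_A <- 24.0; cq_actin_A <- 18.0   # sample A
cq_odc_B <- 26.0; cq_actin_B <- 18.5   # sample B

dcq_A <- cq_odc_A - cq_actin_A         # delta Cq in sample A
dcq_B <- cq_odc_B - cq_actin_B         # delta Cq in sample B
ddcq  <- dcq_B - dcq_A                 # delta delta Cq, B relative to A

2^(-ddcq)                              # fold change, ~0.35: ODC lower in B
```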

Experimentation

To determine the expression of the three target genes, ODC, OAZ, and AZI, and the reference gene, β-actin, in samples A and B, these four genes were assessed in a multiplex two-step RT-qPCR assay.

Reaction Components for Multiplex Assay

The multiplex assay contained the following components in a 50 µl reaction:

  • 1/10 of a reverse transcription reaction that used 100 ng of total RNA
  • 300 nM each primer
  • 200 nM each probe
  • 200 µM each dNTP
  • 5 mM MgCl 2
  • 3.75 U iTaq DNA polymerase

Cycling Protocol

The cycling program used is shown below:


CFX Opus Cloud Connectivity: BR.io


The CFX Opus System seamlessly integrates with Bio-Rad's new cloud platform, BR.io, enabling you to get the most out of your instrument and minimize time at the bench. BR.io can be accessed from any internet connection using a Safari or Chrome web browser, and there is no software installation required.

BR.io cloud connectivity eliminates the need for a dedicated computer connected to the instrument and provides new capabilities.

  • Remotely set up CFX Opus runs and start them directly from the instrument
  • Monitor instruments and run progress remotely
  • Automatically transfer and store data in the cloud
  • Access and analyze data from anywhere


  • Open access
  • Published: 27 August 2009

QPCR: Application for real-time PCR data management and analysis

Stephan Pabinger, Gerhard G Thallinger, René Snajder, Heiko Eichhorn, Robert Rader & Zlatko Trajanoski

BMC Bioinformatics volume 10, Article number: 268 (2009)

Since its introduction quantitative real-time polymerase chain reaction (qPCR) has become the standard method for quantification of gene expression. Its high sensitivity, large dynamic range, and accuracy led to the development of numerous applications with an increasing number of samples to be analyzed. Data analysis consists of a number of steps, which have to be carried out in several different applications. Currently, no single tool is available which incorporates storage, management, and multiple methods covering the complete analysis pipeline.

QPCR is a versatile web-based Java application that allows users to store, manage, and analyze data from relative quantification qPCR experiments. It comprises a parser to import data generated by qPCR instruments and includes a variety of analysis methods to calculate cycle-threshold and amplification efficiency values. The analysis pipeline includes technical and biological replicate handling, incorporation of sample- or gene-specific efficiency, normalization using single or multiple reference genes, inter-run calibration, and fold change calculation. Moreover, the application supports assessment of error propagation throughout all analysis steps and allows conducting statistical tests on biological replicates. Results can be visualized in customizable charts and exported for further investigation.

We have developed a web-based system designed to enhance and facilitate the analysis of qPCR experiments. It covers the complete analysis workflow combining parsing, analysis, and generation of charts into one single application. The system is freely available at http://genome.tugraz.at/QPCR

Amongst other high throughput techniques like DNA microarrays and mass spectrometry, qPCR has become important in many areas of basic and applied functional genomics research. Due to its high sequence-specificity, large dynamic range, and tremendous sensitivity it is one of the most widely used methods for quantification of gene expression. Moreover, due to the adoption of robotic pipetting stations and 384-well formats, laboratories generate a huge amount of qPCR data demanding a centralized storage, management, and analysis application.

Most software programs provided along with the qPCR instruments support only straightforward calculation of quantification cycle (Cq) values from the recorded fluorescence measurements. However, in order to get biologically meaningful results, these basic calculations need to undergo further analyses such as normalization, averaging, and statistical tests [ 1 ].

To this end, a variety of different methods have been published describing the normalization of Cq values. The simplest model (termed ΔΔ-Cq method) was developed by Livak and Schmittgen [ 2 ] which assumes perfect amplification efficiency by setting the base of the exponential function to 2 and uses only one reference gene for normalization. The model proposed by Pfaffl [ 3 ] considers PCR efficiency for both the gene of interest and a reference gene and is therefore an improvement over the classic ΔΔ-Cq method. Nevertheless, it still uses only one reference gene which may not be sufficient to obtain reliable results [ 4 ]. Hellemans et al. [ 5 ] proposed an advanced method which considers gene-specific amplification efficiencies and allows normalization of Cq values with multiple reference genes based on the method proposed by Vandesompele et al. [ 4 ]. It should be noted that these methods could differ substantially in their performance, because of the different assumptions they are based on.

Available software tools often cover only single steps in the analysis pipeline compelling researchers to use multiple tools for the analysis of qPCR experiments [ 5 – 8 ]. However, these tools do not share a common file format making it difficult to analyze the experimental data. Additionally, no standardization of methodology has been established that would be needed for relatable comparison between laboratories [ 9 ]. Recently, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines [ 10 ] were published which are intended to describe the minimum information necessary for evaluating and comparing qPCR experiments. Based on a subset of these guidelines the XML-based Real-Time PCR Data Markup Language (RDML) [ 11 ] was proposed which tries to facilitate the exchange of qPCR data and related information between qPCR instruments, analysis software, journals, and public repositories. These efforts could allow a more reliable interpretation of qPCR results if they were accepted in the qPCR community.

The lack of complete or partial assessment of error propagation throughout the whole analysis pipeline may result in an underestimated final error and could therefore lead to incorrect conclusions. Moreover, the analysis of experiments using tools that make invalid biological assumptions can cause significantly wrong results as reported in [ 8 ].

To the best of our knowledge, there is no single tool available which integrates storage, management, and analysis of qPCR experiments. Hence a system enabling comparison of results and providing a standardized way of analyzing data would be of great benefit to the community. We have therefore developed QPCR, a web-based application which supports: a) technical and biological replicate handling, b) the analysis of qPCR experiments with an unlimited number of samples and genes, c) normalization using an arbitrary number of reference genes, d) inter-plate normalization using calibrators, e) assessment of significant gene deregulation between sample groups, f) generation of customizable charts, and g) a plug-in mechanism for easy integration of new analysis methods.

Implementation

The QPCR system was implemented in Java, a platform independent and object-oriented programming language [ 12 ]. The application is based on the Java 2 Enterprise Edition (J2EE) three-tier architecture consisting of a presentation-, business-, and database-layer. A relational database (PostgreSQL or Oracle) is used as the persistence backend. The business layer consists of Enterprise Java Beans (EJB) and is deployed on a JBoss [ 13 ] application server. The presentation layer is based on the Model-View-Controller (MVC) framework Struts [ 14 ] and uses Java Servlets and Java Server Pages.

In order to enhance usability, current web technologies have been used extensively in this application. AJAX functionality has been incorporated into the application using the open-source library DWR [ 15 ]. This technology allows asynchronous loading of data without the need to reload the page, thus providing a desktop-like application behavior. Multiple JavaScript libraries (Prototype [ 16 ], JQuery [ 17 ]) have been used that allow executing functions on the client side and therefore remarkably improve the usability of the application. Charts are generated using the open-source Java library JFreeChart [ 18 ], and all charts are created either in the lossless PNG format or as a scalable vector graphic (SVG).

All algorithms, calculation methods, and data file parsers used by the application are integrated through a plug-in mechanism which allows simple extension with additional qPCR data formats and analysis approaches. For each class that uses the plug-in mechanism a specific interface needs to be implemented in order to support another vendor or implement an additional analysis method. The new Java classes are then automatically detected by the QPCR application.

Currently the data file parsers support files generated by Applied Biosystems (ABI 7000, ABI 7500, ABI 7900) and Roche LightCycler (LightCycler 2.0, LightCycler 480) [ 19 ] systems as well as a generic file format based on comma separated values (CSV). Since not all fluorescence measurements can be extracted from data files created by the qPCR instrument systems, additional export files are required to parse all relevant data.

Analysis methods that calculate Cq and amplification efficiency values are computationally expensive and are therefore executed asynchronously, so they do not interfere with the QPCR web interface. They are designed to operate on a per-well basis and report the current progress of the calculation. Normalization methods and statistical tests are not time-consuming processes and are therefore executed in real time.

The QPCR application has been designed using the Unified Modeling Language (UML) [ 20 ]. The use of a UML representation improves maintainability as the application architecture is outright visible and provides an important part of the system documentation. We used the AndroMDA framework [ 21 ] to create basic EJB and presentation tier source code as well as configuration files based on the UML model. AndroMDA minimizes repetitive coding tasks, allows to easily extend or edit the architecture of the application, and helps maintaining the consistency between design and implementation.

The stored data is secured by a user management system which allows the definition of several fine grained user access levels and offers data sharing and concurrent access in a multi-centric environment [ 22 ]. Moreover, the application provides two configurations which assign the ownership of objects either to the submitter or to the submitter's institute. The latter setup provides the possibility to edit and analyze experiments by all users of an institute without the need to explicitly share objects.

QPCR is an application which integrates storage, management, and analysis of qPCR experiments into one single tool. Implemented as a web application it can be accessed by a web browser from every network connected computer and therefore supports the often decentralized work of biologists. It parses files generated by qPCR instruments, stores data and results in a database, and performs analyses on the imported data. Moreover, it allows conducting of statistical tests and provides several ways to visualize and export the calculated results (Figure 1 ).

Figure 1. Analysis pipeline. This figure illustrates the analysis pipeline implemented in the QPCR application.

Parsing files and calculation of Cq/efficiency values

Data files are uploaded into the application using a single file upload dialog or an integrated Java applet which supports uploading of multiple files at once. An upload zone lists all available files and allows querying and downloading of data previously uploaded. All files are stored in a user defined directory facilitating the backup of project critical files.

After uploading the exported files into the QPCR application, a list of all files which have not yet been processed is shown. The user can select single or multiple files for parsing. Moreover, Cq and amplification efficiency values can be automatically calculated after the files have been parsed using one or several different methods.

During parsing, all relevant data is extracted, including plate setup, fluorescence measurements, and qPCR instrument specifications, and stored in the database. In contrast to many available analysis tools, the application is able to import qPCR data files without the need for additional file manipulations and therefore reduces error-prone and cumbersome manual work. In addition to the already existing data file parsers, the application can be easily extended to support other vendors due to the modularity of the platform and the plug-in mechanism used.

Once the data is parsed and stored in the database, Cq and amplification efficiency values are calculated based on the fluorescence measurements. Several published and widely used algorithms were implemented: two different algorithms that calculate Cq together with efficiency values, three that calculate solely the amplification efficiency, and one method that calculates only the Cq value (see Table 1 ).

The progress of all active parser or analyzer background tasks is displayed on a view that automatically updates the current status. As soon as a process has finished a message is shown at the top of the page. For each process a log file is created which informs the user about the outcome of the performed job. A color scheme helps to quickly identify the jobs that have not finished successfully.

During parsing of uploaded files a Run is created in the application which is a direct representation of the performed qPCR run. It stores information about the hardware, software, thermocycler profile, and category.

Each Run contains a plate which consists of multiple wells that store information about the sample, target, passive reference, task, and omitted status. The plate layout can be displayed in a list and each well can be edited to correct inconsistencies or to omit it from further analysis.

Additionally, QPCR provides a graphical representation of the plate layout by showing a grid which displays sample, target, and status information of each well. By selecting an arbitrary number of wells, charts of amplification (raw and background subtracted) and dissociation (raw and derivative) curves are displayed (Figure 2 ). This view is helpful to evaluate the performance of the PCR for each well and is useful to perform a quick quality check of the conducted qPCR run.

Figure 2. Graphical representation of the plate layout. The tabbed bar at the top is used to switch between different chart types. The chart itself features tool tips and provides a legend. Beneath the chart is a representation of the plate layout that is adapted to the plate size (96/384 wells, linear layout). Selected wells are colored in red, omitted and empty wells in blue.

Analysis of experiments

After Cq and efficiency values have been determined, experiments consisting of one or multiple runs are subjected to subsequent analysis steps. Several plates can be combined into one experiment. In order to support a flexible and adaptable analysis of experiments, the application allows selecting of specific samples and genes to be used in subsequent analysis steps. Moreover, the Cq calculation method, the efficiency method, and the reference genes can be defined.

Four different ways to consider amplification efficiencies in the analysis have been implemented: (1) setting a single efficiency value for all targets, (2) manually defining the efficiency for each target, (3) using efficiencies derived from dilution series for each target, and (4) using calculated efficiencies for each well. Several different efficiency values for a target, calculated from serial dilution series, can be stored in the database.

Normalization of experiments is based on a method proposed by Hellemans et al. [ 5 ] and includes averaging of technical replicates, normalization against reference genes, inter-run calibration, and calculation of quality control parameters. Technical replicates are averaged either within one plate or over all plates of the experiment depending on the analysis setting. In the next step, all samples of one gene are referenced to the arithmetic mean Cq value across all samples for this gene. Thereafter, the user-selected type of efficiency is considered for each target and the samples are normalized to the selected reference genes. If reaction-specific efficiency has been selected, the efficiency is averaged for each target. Depending on the analysis setting, the application supports spreading of reference genes across multiple runs or uses reference genes for each run independently. Finally, inter-run calibrators are automatically detected and used to normalize results between different qPCR runs.
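
As a rough illustration of this normalization scheme (the application itself implements it in Java with full error propagation), a minimal sketch in R under simplified assumptions: one efficiency per gene and two reference genes:

```r
# toy example: Cq values for one target and two reference genes across four
# samples, with one efficiency E per gene (E = 2 means perfect doubling)
cq_target <- c(24.1, 25.3, 23.8, 26.0)
cq_ref1   <- c(18.0, 18.2, 17.9, 18.1)
cq_ref2   <- c(20.5, 20.7, 20.4, 20.6)
E <- 2

# reference each gene to its mean Cq across samples, then convert the
# delta-Cq values to relative quantities
rq <- function(cq, eff) eff^(mean(cq) - cq)

# normalize the target by the geometric mean of the reference-gene quantities
norm_factor <- sqrt(rq(cq_ref1, E) * rq(cq_ref2, E))
rq(cq_target, E) / norm_factor
```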

Quality control parameters for reference genes are calculated based on a method described by Vandesompele et al. [ 4 ]. When multiple reference genes are selected the coefficient of variation and the gene stability value M are calculated. These parameters are helpful for selecting and evaluating reference genes. Additionally, QPCR performs outlier detection by calculating the difference in quantification cycle value between technical replicates and allows highlighting those that have a larger difference than a user defined threshold. Moreover, quality control checks are performed to test if a no template control (NTC) is present for each target.

Fold change ratios of the calculated normalized Cq values can be calculated by referencing them to one or multiple samples. All analysis setup parameters are automatically stored in the database and are loaded when the experiment is analyzed again. Additionally, each analysis setup can be stored under a user defined name. Throughout the whole analysis process proper error propagation is performed using methods described in [ 5 , 23 ].

During the development of the QPCR application, special attention was paid to the accurate and user-friendly visualization of calculated results. Therefore, the application allows displaying and exporting results of every important analysis step. The generated figures are highly customizable and are designed to be usable in publications without further manipulation. Among other parameters, QPCR allows defining the color, labeling, sort sequence, and data type to be used in histogram charts. Cq values normalized by reference genes and calibrators are presented as histograms displaying results of one gene or multiple genes at once (Figure 3 ). Every result throughout the analysis pipeline can be exported in tab-delimited or spreadsheet format (txt, csv, xls) to be used in external applications.

Figure 3. Visualization of normalized relative quantities. The tabbed bar is used to switch between views that display multiple targets at once, one target at a time (displayed), or quality control parameters. On the left side, the user can define various parameters including the displayed target, the specific result, the presented error, and the reference samples. The list of displayed samples can be reordered using drag and drop, samples may be excluded from the chart, and for each sample an alternative name and an individual color can be assigned.

Conducting statistical tests

The final step in the analysis pipeline is the comparison of samples using statistical tests (e.g., biological replicates, samples of a time series). The application allows grouping samples into an arbitrary number of classes which are tested for their significant difference against one defined reference class. QPCR includes several statistical tests to compute p-values, such as ANOVA, Student's t-test, and a permutation-based test which makes no assumption on the distribution of the data. Tests can be conducted on either untransformed or log2-transformed values. The application allows adjusting the calculated p-value by supporting several established correction methods for multiple testing [ 24 ].

Calculated test results are displayed for each class and can be exported for further analysis. Moreover, the fold changes of samples are displayed in histogram charts in which samples of each class are grouped together. Every class is assigned to a specific user defined color or shape that is used in different shades to group the samples of one class (Figure 4 ).

Figure 4. Visualization of a statistical test result. The statistical test was used to test two classes of biological replicates for their significant differences, with class "fasted m" used as the reference class. The table shows the calculated p-value and parameters. Samples of each class are grouped together and marked in different colors or shapes.

General data entry and query

The application provides views of every entity to (1) manually enter data and (2) list available items. Entry views consist of mandatory and optional fields and use drop down selection lists to specify references to other entities. Entered data is checked for validity and the user is informed about erroneous inputs. List views present the data in tabular form and support paging, sorting, and querying for any combination of the available attributes. Moreover, queries can be stored in the database for later use.

We have developed an integrated platform for the analysis and management of qPCR experiment data using state-of-the-art software technology. The uniqueness of the application is defined by the support of various qPCR instruments, multiple data analyzers, and statistical methods, as well as the coverage of the complete analysis pipeline including proper error propagation. Moreover, it provides a flexible plug-in mechanism to incorporate new parsers and methods and allows generation of highly customizable charts. A comparison of features between QPCR and several other popular qPCR analysis tools is provided in Table 2 .

The capability to import and parse data without the need for further file manipulations is an integral part of the application, which avoids errors during the analysis and reduces the time needed to analyze the experimental data. As most of the available qPCR software tools rely on specially formatted input files, it was a prerequisite of the platform to be able to directly parse files generated by the qPCR instruments' software suites. Moreover, the system is not confined to a specific manufacturer and can therefore be used in laboratories equipped with qPCR instruments from different vendors.

QPCR includes established and widely used methods for the calculation of Cq and amplification efficiency values and supports easy integration of new algorithms. This framework does not limit the researcher to one specific approach and allows the incorporation of newly developed analysis methods. This flexibility is of great value because different experimental situations need to be considered separately, and it remains up to individual researchers to identify the method most appropriate for their experimental conditions [ 25 ]. QPCR can store several different analysis settings for each experiment and calculates quality control parameters that help to evaluate the performed analysis. Incorporating several different methods for including the amplification efficiency enhances the flexibility of the application and allows the analysis to be adapted to the experimental conditions or laboratory practices. In particular, support for the widely used calculation of efficiency from serial dilution series increases acceptance in the qPCR community.

An often underestimated drawback of using multiple tools to analyze qPCR experiments is the lack of support for assessment of error propagation. The final error is then often based solely on the standard deviation of biological replicates, which can lead to false biological interpretations. The QPCR application addresses this problem and includes assessment of error propagation throughout the whole analysis pipeline, covering technical replicate handling, normalization, inter-run calibration, referencing against samples, and biological replicate handling. The implemented method is based on Taylor series expansion, which allows direct calculation of the full probability distribution and is, in contrast to Monte Carlo based methods, computationally inexpensive [ 26 ].
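
As a rough illustration of the Taylor series approach, the sketch below propagates the uncertainty of a ratio of two quantities (as occurs, for example, when normalizing a target against a reference) using the standard first-order formula. This is a minimal, self-contained example of the general technique, not the code used by QPCR.

```python
import math

def ratio_sd(x, sd_x, y, sd_y):
    """First-order (Taylor series) error propagation for q = x / y.

    Assumes independent errors, so the relative variances add:
    (sd_q / q)^2 ~ (sd_x / x)^2 + (sd_y / y)^2.
    """
    q = x / y
    rel_var = (sd_x / x) ** 2 + (sd_y / y) ** 2
    return q, abs(q) * math.sqrt(rel_var)

# Example: a target quantity normalized against a reference quantity
q, sd_q = ratio_sd(250.0, 12.0, 100.0, 4.0)
print(f"normalized quantity: {q:.2f} +/- {sd_q:.2f}")
```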

Special focus was placed on the presentation of analysis results. QPCR provides an interface that uses state-of-the-art software technologies to generate highly customizable, publication-ready charts. Since many available tools do not provide a suitable graphical representation of the calculated results, Microsoft Excel is often used to create figures, which requires manual import and/or conversion of data. QPCR combines the calculation and presentation of results in a single tool, which reduces analysis time and avoids additional, potentially error-prone steps. A flowchart displaying each analysis step and its suggested method is included in the user guide.

The recent development of a data exchange format (RDML) and of guidelines describing the minimum information about qPCR experiments (MIQE) could become an important part of standardizing qPCR experimental data. QPCR already integrates the suggested nomenclature, and RDML support will be implemented as soon as the relevant Java libraries are available. Once established in the qPCR community, these initiatives will allow a standardized exchange of data between software tools and facilitate the comparison of qPCR experiments.

Using a three-tier software architecture that separates the presentation, business, and database layers not only enables easy maintenance but also allows distribution of the computing load across several servers. As the amount of data to be analyzed grows, this design may prove very valuable in the future.

The use of a database allows easy querying and comparing of data and guarantees data integrity. The implemented plug-in framework, which is used for including data file parsers, analysis methods, and statistical algorithms, ensures that the application is adaptable to new developments and allows the effortless integration of innovative scientific methods.

We have developed QPCR, a system for the storage, management, and analysis of qPCR data. It integrates the complete analysis workflow, from Cq determination through normalization and statistical analysis to visualization, into a single application. The analysis time is significantly reduced, and complex analyses can now be compared within a single laboratory or across multiple laboratories. Optimal usability has been ensured by involving biologists throughout the entire development process and by extensive tests in a laboratory setting. Given the incorporation of several analysis methods and the flexibility afforded by the use of standard software technology and a plug-in mechanism, the developed application could be of great interest to the qPCR community.

Availability and requirements

Project name: QPCR

Project home page: http://genome.tugraz.at/QPCR

Operating system: Solaris, Linux, Windows, Mac OS X

Programming language: Java

Other requirements: Java JDK 1.6.x, Oracle™ 9i or PostgreSQL™ 8.0.x, a server with at least 1 GB of main memory (2 GB are recommended) available to the application

License: IGB-TUG Software License

Any restrictions to use by non-academics: IGB-TUG Software License

Installation of the application is provided through an installer and should be completed within one hour, provided the necessary database access rights are granted. We recommend that a system administrator install the application on a central server. Step-by-step instructions are provided at the project's web site together with the installer file. The reference installation of QPCR runs on a SUN Fire™ X4600 M2 6 × dual core Opteron server (Sun Microsystems Ges.m.b.H, Vienna, Austria) with 24 GB of memory running Solaris, using a dedicated Oracle 10g database server. Attached is a Storage Area Network (EVA 5000, Hewlett-Packard Ges.m.b.H., Vienna, Austria) with 9.5 TB net capacity.

Wong ML, Medrano JF: Real-time PCR for mRNA quantitation. Biotechniques 2005, 39: 75–85. 10.2144/05391RV01


Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25: 402–408. 10.1006/meth.2001.1262

Pfaffl MW: A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001, 29: E45. 10.1093/nar/29.9.e45


Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 2002, 3: RESEARCH0034. 10.1186/gb-2002-3-7-research0034


Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J: qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol 2007, 8: R19. 10.1186/gb-2007-8-2-r19

Jin N, He K, Liu L: qPCR-DAMS: a database tool to analyze, manage, and store both relative and absolute quantitative real-time PCR data. Physiol Genomics 2006, 25: 525–527. 10.1152/physiolgenomics.00233.2005

Simon P: Q-Gene: processing quantitative real-time RT-PCR data. Bioinformatics 2003, 19: 1439–1440. 10.1093/bioinformatics/btg157

Ramakers C, Ruijter JM, Deprez RH, Moorman AF: Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci Lett 2003, 339: 62–66. 10.1016/S0304-3940(02)01423-4

Bustin SA: Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol 2002, 29: 23–39. 10.1677/jme.0.0290023

Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, Mueller R, Nolan T, Pfaffl MW, Shipley GL, Vandesompele J, Wittwer CT: The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments. Clin Chem 2009, 55: 611–622. 10.1373/clinchem.2008.112797

Lefever S, Hellemans J, Pattyn F, Przybylski DR, Taylor C, Geurts R, Untergasser A, Vandesompele J: RDML: structured language and reporting guidelines for real-time quantitative PCR data. Nucleic Acids Res 2009, 37: 2065–2069. 10.1093/nar/gkp056

Gosling J, Joy B, Steele G, Bracha G: The Java(TM) Language Specification . 3rd edition. Boston: Addison-Wesley Professional; 2005.


JBoss Group: JBoss Application Server. 2008. [ http://www.jboss.org/jbossas/ ]

Apache Software Foundation: Apache Struts. 2006. [ http://struts.apache.org/ ]

Getahead: DWR: Easy AJAX for JAVA. 2008. [ http://directwebremoting.org ]

Prototype Core Team: Prototype: JavaScript Framework. 2009. [ http://www.prototypejs.org/ ]

John Resig and jQuery Team: jQuery. 2009. [ http://jquery.com/ ]

Gilbert D: The JFreeChart Class Library. 2008. [ http://www.jfree.org/jfreechart/ ]

Wittwer CT, Ririe KM, Andrew RV, David DA, Gundry RA, Balis UJ: The LightCycler: a microvolume multisample fluorimeter with rapid temperature control. Biotechniques 1997, 22: 176–181.


Booch G, Rumbaugh J, Jacobson I: The Unified Modeling Language User Guide . 2nd edition. Boston, MA, USA: Addison-Wesley Professional; 2005.

AndroMDA Core Team: AndroMDA. 2007. [ http://www.andromda.org/ ]

Maurer M, Molidor R, Sturn A, Hartler J, Hackl H, Stocker G, Prokesch A, Scheideler M, Trajanoski Z: MARS: microarray analysis, retrieval, and storage system. BMC Bioinformatics 2005, 6: 101. 10.1186/1471-2105-6-101

Larionov A, Krause A, Miller W: A standard curve based method for relative real time PCR data processing. BMC Bioinformatics 2005, 6: 62. 10.1186/1471-2105-6-62

Dudoit S, Shaffer JP, Boldrick J: Multiple Hypothesis Testing in Microarray Experiments. U C Berkeley Division of Biostatistics Working Paper Series Working Paper 110 2002. [ http://www.bepress.com/cgi/viewcontent.cgi?article=1014&context=ucbbiostat ]

Bustin SA, Benes V, Nolan T, Pfaffl MW: Quantitative real-time RT-PCR – a perspective. J Mol Endocrinol 2005, 34: 597–601. 10.1677/jme.1.01755

Heuvelink GBM: Error Propagation in Environmental Modelling with GIS . Bristol, PA, USA: Taylor & Francis; 1998.

Guescini M, Sisti D, Rocchi MB, Stocchi L, Stocchi V: A new real-time PCR method to overcome significant quantitative inaccuracy due to slight amplification inhibition. BMC Bioinformatics 2008, 9: 326. 10.1186/1471-2105-9-326

Zhao S, Fernald RD: Comprehensive algorithm for quantitative real-time polymerase chain reaction. J Comput Biol 2005, 12: 1047–1064. 10.1089/cmb.2005.12.1047

Rutledge RG: Sigmoidal curve-fitting redefines quantitative real-time PCR with the prospective of developing automated high-throughput applications. Nucleic Acids Res 2004, 32: e178. 10.1093/nar/gnh177

Wilhelm J, Pingoud A, Hahn M: SoFAR: software for fully automatic evaluation of real-time PCR data. Biotechniques 2003, 34: 324–332.

Ostermeier GC, Liu Z, Martins RP, Bharadwaj RR, Ellis J, Draghici S, Krawetz SA: Nuclear matrix association of the human beta-globin locus utilizing a novel approach to quantitative real-time PCR. Nucleic Acids Res 2003, 31: 3257–3266. 10.1093/nar/gkg424

Integromics: RealTime StatMiner. 2009. [ http://www.integromics.com/StatMiner.php ]

Biogazelle: qBasePlus. 2009. [ http://www.biogazelle.com/site/products/qbaseplus ]

MultiD: GenEx. 2009. [ http://www.multid.se/genex.html ]


Acknowledgements

This work was supported by the Austrian Ministry of Science and Research, GEN-AU program (project Bioinformatics Integration Network) and the Christian-Doppler Society. We thank Anne Krogsdam and Andreas Prokesch for valuable discussions and Roman Fiedler for implementing the initial file parser.

Author information

Authors and affiliations

Institute for Genomics and Bioinformatics, Graz University of Technology, Petersgasse 14, 8010, Graz, Austria

Stephan Pabinger, Gerhard G Thallinger, René Snajder & Zlatko Trajanoski

Christian Doppler Laboratory for Genomics and Bioinformatics, Petersgasse 14, 8010, Graz, Austria

Stephan Pabinger, Robert Rader & Zlatko Trajanoski

Development Anti-Infectives Microbiology, Sandoz GmbH, Biochemiestrasse 10, 6250, Kundl, Austria

Heiko Eichhorn


Corresponding author

Correspondence to Zlatko Trajanoski .

Additional information

Authors' contributions

SP designed the application and drafted the manuscript. He was responsible for the implementation of the database, the development of the data presentation, and many parts of the business logic. GGT contributed to the conception and design of the application and helped draft the manuscript. RS improved the data file parsers and analysis methods. HE gave valuable input regarding the usability of the platform. RR participated in the design and implementation of the application and helped draft the manuscript. ZT was responsible for the overall project coordination. All authors gave final approval of the version to be published.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite this article

Pabinger, S., Thallinger, G.G., Snajder, R. et al. QPCR: Application for real-time PCR data management and analysis. BMC Bioinformatics 10 , 268 (2009). https://doi.org/10.1186/1471-2105-10-268


Received: 02 March 2009

Accepted: 27 August 2009

Published: 27 August 2009

DOI: https://doi.org/10.1186/1471-2105-10-268



  • Open access
  • Published: 29 October 2021

Auto-qPCR; a python-based web app for automated and reproducible analysis of qPCR data

  • Gilles Maussion 1,2,
  • Rhalena A. Thomas 1,
  • Iveta Demirova 1,
  • Gracia Gu 1,
  • Eddie Cai 1,
  • Carol X.-Q. Chen 1,
  • Narges Abdian 1,
  • Theodore J. P. Strauss 3,
  • Sabah Kelaï 2,
  • Angela Nauleau-Javaudin 1,
  • Lenore K. Beitel 1,
  • Nicolas Ramoz 2,
  • Philip Gorwood 2 &
  • Thomas M. Durcan 1

Scientific Reports volume 11, Article number: 21293 (2021)


  • Bioinformatics
  • Data processing
  • PCR-based techniques
  • Pluripotent stem cells
  • Reverse transcription polymerase chain reaction
  • Stem-cell differentiation
  • Transcriptomics

Quantifying changes in DNA and RNA levels is essential in numerous molecular biology protocols. Quantitative real-time PCR (qPCR) techniques have evolved to become commonplace; however, data analysis includes many time-consuming and cumbersome steps, which can lead to mistakes and misinterpretation of data. To address these bottlenecks, we have developed an open-source Python software to automate the processing of result spreadsheets from qPCR machines, performing calculations usually done manually. Auto-qPCR is a tool that saves time when computing qPCR data and helps to ensure the reproducibility of qPCR experiment analyses. Our web-based app ( https://auto-q-pcr.com/ ) is easy to use and does not require programming knowledge or software installation. Using Auto-qPCR, we provide examples of data treatment, display and statistical analyses for four different data processing modes within one program: (1) DNA quantification to identify genomic deletion or duplication events; (2) assessment of gene expression levels using an absolute model, and relative quantification (3) with or (4) without a reference sample. Our open access Auto-qPCR software saves the time of manual data analysis, provides a more systematic workflow, and minimizes the risk of errors. Our program constitutes a new tool that can be incorporated into bioinformatic and molecular biology pipelines in clinical and research labs.


Introduction

Polymerase chain reaction (PCR) identifies a nucleic acid fragment of interest by increasing its proportion relative to others 1 . Initially the technique was primarily used to visualize DNA fragments for cloning 2 , 3 or genotyping 4 , 5 , 6 , but can now be used to investigate genetic polymorphisms and mutations 7 , 8 , copy number variants (CNVs) 9 , single nucleotide variants (SNVs), point mutations, and genetic deletion/duplication events 10 . With the development of fluorogenic probes and dyes capable of binding newly synthesized DNA, PCR became more quantitative, leading to innovative tools for quantifying relative transcript levels for one or more genes, now referred to as quantitative PCR (qPCR). With these technological advancements, qPCR is now used to quantify messenger RNA (mRNA) 11 , long non-coding RNA 12 , microRNAs 13 , 14 , DNA–protein interactions 15 and epigenetic modifications 16 , 17 . Thus, the advent of PCR has revolutionized our ability to analyze and quantify nucleic acids and has made qPCR a standard technique.

qPCR experiments are already automated at the data acquisition stage, with thermocycler software providing "by default" pre-processing procedures 18 . However, several steps required for full data interpretation (data exclusion, normalization, data display and differential analyses) are heterogeneous, and the data processing and display methods and options vary widely across available licenced qPCR programs. Commercially available software packages that provide data summaries and statistical output do not systematically allow for user selections and are not necessarily transparent about the processes and settings being used. Not knowing the conditions for data flagging or exclusion and for normalization can lead to misinterpretation of the results. Also, not all qPCR software provides a statistical output. Analysis of qPCR data is still highly time-consuming and error-prone, especially when processing large numbers of data points. The user must intervene to include or exclude replicates, which, without guidelines or standardized procedures, can introduce "user-dependent" variation and errors. To both simplify and accelerate this data analysis step for qPCR datasets, we have created a Python-based, open-source, user-friendly web application "Auto-qPCR" to process exported qPCR data and to provide summary tables, visual representations of the data, and statistical analysis. The program can be found at https://auto-q-pcr.com/ . Furthermore, the program can be installed locally and then run offline.

The program can work with the two commonly used molecular biology approaches: (i) absolute quantification, where all RNA estimations rely on orthogonal projection of the samples of interest onto a calibration curve 19 , and (ii) relative quantification, which relies on the difference of cycle threshold (CT) values between the gene of interest and endogenous controls 20 .

Here we use Auto-qPCR to analyze qPCR datasets and illustrate four distinct computational methods. Overall, Auto-qPCR provides an all-in-one solution for the user, going from datasets to graphs, within one web-based software package. Unlike other software, the intermediate and final results are output by the program, allowing a full review of the data and accurate statistical treatment based on the experimental design. Auto-qPCR was conceived to build logical links between the experimental design and required statistics for differential analyses of each mode, which is rarely found in other qPCR programs. While other open-source qPCR analysis software programs and web apps 21 , 22 , 23 are available, they are only able to normalize, compare and display qPCR data generated with one of the two quantification modes 19 , 20 . In contrast, Auto-qPCR provides a comprehensive data analysis package for a wide variety of qPCR experiments. Using the web app does not require prior programming knowledge, account creation or desktop installation. Additionally, the program has been designed to assist the user at each step of the analysis once the exported data files have been collected from the qPCR system.

Auto-qPCR can be used to analyse qPCR data in a reproducible manner, simplifying data analysis, avoiding potential human error, and saving time. In this manuscript, we describe some of the uses of the software and outline the steps required, from entering an individual dataset to complete statistical analysis and graphical presentation of the data.

Culture of iPSC lines

To illustrate the four different models of quantification managed by the Auto-qPCR program, we used 11 different iPSC cell lines whose properties are presented in Table S1 . Quality control profiling for the iPSCs used was outlined previously 24 .

The use of iPSCs in this research is approved by the McGill University Health Centre Research Ethics Board (DURCAN_IPSC/2019-5374).

For the cell lines GM25952, GM35953, GM25974, GM25975, fibroblasts were ordered from the Coriell Institute and reprogrammed at the Montreal Neurological Institute. The NCRM1 iPSC line was obtained from the NIH Center for Regenerative Medicine (NIH CRM, http://nimhstemcells.org/crm.html ). The KYOUDXR0109B iPSC line was ordered from ATCC. For the iPSC cell lines AiW001-2, AiW002-2, AJG001-C4, AJC001-5 and 522-2666-2, somatic cells were collected and reprogrammed at the Montreal Neurological Institute.

The iPSCs were seeded on Matrigel-coated dishes and expanded in mTESR1 (StemCell Technologies) or Essential 8 (ThermoFisher Scientific) media. Cells were seeded at 10–15% confluency and incubated at 37 °C in a 5% CO 2 environment. The media was changed daily until the cultures reached 70% confluency. Cells harbouring irregular borders or transparent centres were manually removed from the dish prior to dissociation with Gentle Cell Dissociation media (StemCell Technologies). The iPSCs were then seeded and differentiated into cortical or dopaminergic neuronal progenitors or neurons.

Generation of cortical and dopaminergic neurons

The induction of cortical progenitors was performed as described previously 25 . The media used for cortical differentiation are described in the standard operating procedure published on the Early Drug Discovery Unit (EDDU) website 24 . Once neural progenitor cells (NPCs) attained 100% confluency, they were passaged and seeded on poly-ornithine/laminin-coated dishes to be differentiated into neurons. Cells were switched for 24 h to 50% Neurobasal (NB) medium, and 24 h later placed in 100% NB medium with AraC (0.1 µM) (Sigma) to reduce the levels of dividing cells. After the third day of differentiation, cells were maintained in 100% NB medium without AraC for four days before being collected for RNA extraction. iPSCs were induced into dopaminergic NPCs (DA-NPCs) according to methods previously described 26 , modified according to methods used within the group 27 . DA-NPCs were subsequently differentiated into dopaminergic neurons (DANs), with immunostaining and qPCR analysis performed at four and six weeks of maturation from the NPC stage 28 .

DNA and RNA extraction

iPSCs were dissociated with Gentle Cell Dissociation Reagent (Stem Cell Technologies), while Accutase ® Cell Dissociation Reagent (Thermo Fisher Scientific) was used to dissociate NPCs and iPSC-derived neurons. After a 5 min incubation at 37 °C with the indicated dissociation agent, cells were collected and harvested by centrifugation for 3 min at 1200 rpm. Cell pellets were resuspended in lysis buffer and stored at − 80 °C before DNA or total RNA extraction with the Genomic DNA Mini (Blood/Culture Cell) (Genesis) or mRNAeasy (Qiagen) kits, respectively.

cDNA synthesis, quantitative PCR, and data export

Reverse transcription reactions were performed on 400 ng of total RNA extract to obtain cDNA in a 40 μl total volume containing 0.5 μg random primers, 0.5 mM dNTPs, 0.01 M DTT and 400 U M-MLV RT (Carlsbad, CA, USA). The qPCR reactions were conducted in singleplex, in a 10 µl total volume containing 2 × TaqMan Fast Advanced Master Mix, a 20 × TaqMan primers/probe set (Thermo Fisher Scientific), 1 µl of diluted cDNA and RNAse-free H 2 O. Real-time PCR (RT-PCR) runs were performed on QuantStudio 3 or QuantStudio 5 machines (Thermo Fisher Scientific). Primers/probe sets from Applied Biosystems were selected from the Thermo Fisher Scientific web site. Two endogenous controls (beta-actin and GAPDH) were used for normalization (Table S2 ).

Data generated from the QuantStudio machine were extracted using the QuantStudio design and analysis software, either (i) as Excel files (*.xls or *.xlsx extensions), with the results tab saved as a comma-delimited csv file, or (ii) as a txt file that contained only the results tab. Excel files should be used carefully, since gene names (notably those that can be recognized as potential dates) may be modified by automatic changes in cell formatting 29 . We suggest exporting data in txt or csv file format.

Collection of external data set

An external qPCR data set was provided from an earlier published study 30 , which quantified levels of Nrxn and Nlgn transcripts in subcortical areas of the brains of mice submitted to conditioned place preference (CPP) with cocaine. Briefly, subcortical areas (subthalamic nucleus, globus pallidus and substantia nigra) of sectioned mouse brains were isolated by laser capture microdissection. RNA was extracted with the Arcturus PicoPure kit and reverse transcription was performed as above. The qPCR experiments were performed according to an absolute quantification design on the Opticon 2 PCR machine (Bio-Rad). β2-microglobulin ( B2M ) was used as the endogenous control. Data were re-extracted from the Opticon Monitor 2 files as csv files and analyzed by Auto-qPCR.

Program development and structure

The program was written in Python 3 using Pandas and NumPy. A main script calls the selected model script (absolute.py, relative.py and stability.py), which processes the data and then calls the statistical functions script (if selected) and the plotting function script. The graphical user interface (GUI) was created using Flask, a package for integrating HTML and Python code. The GUI is written in JavaScript, CSS, HTML and Bootstrap4, a framework for building responsive websites. Our GitHub repository ( https://github.com/neuroeddu/Auto-qPCR ) includes all Python processing scripts and the scripts to build the GUI, which can be installed locally to run on a computer. A complete list of package dependencies and versions is in the GitHub repository (requirements.txt) and File S1 . The program was developed using git version control. The web app is hosted by the Brain Imaging Centre at the Montreal Neurological Institute-Hospital (The Neuro) and was installed in a virtual machine directly from the public GitHub repository. When updates are available, the changes will be applied to the web app using GitHub. The organization and function of the script files for the program are in Table S3 . The web app can be found at https://auto-q-pcr.com (Figure S1 ). The app can also be used locally; installation instructions for command line/Linux as well as executable files for Windows and Apple are on the Auto-qPCR website. Once launched, a web browser opens on the user's computer and the app appears in the web browser identical to the online version, but no internet connection is required.

Program function—input data processing and quantification

The Auto-qPCR program reads the raw data in the form of a results spreadsheet (via the user's file navigator) and reformats it into a data frame in Python. The user enters information into the web app that is read as arguments by the software. See Table S4 for a list of all the user inputs and Figure S2 for examples of the input files. The input spreadsheet needs to be organized such that samples are found in rows and values are found in columns; the required columns are Well, Sample Name, Target Name, Task, and CT (Figure S2 ), although the column names do not need to match exactly. The values for the reference genes/targets ( ACTB , GAPDH ) are calculated for each sample and technical replicate (cell line, time point, treatment condition) separately.
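
A minimal sketch of this kind of input handling in pandas is shown below. The alias table and function name are hypothetical and only illustrate the idea of normalizing vendor-specific headers onto the required columns; Auto-qPCR's own parsers are more complete.

```python
import pandas as pd

# Hypothetical mapping from vendor-specific headers to the required names
COLUMN_ALIASES = {"Ct": "CT", "Cq": "CT", "Sample": "Sample Name",
                  "Target": "Target Name"}
REQUIRED = ["Well", "Sample Name", "Target Name", "Task", "CT"]

def load_results(path):
    """Read a qPCR results export (csv) into a tidy data frame."""
    df = pd.read_csv(path)
    df = df.rename(columns={c: COLUMN_ALIASES[c] for c in df.columns
                            if c in COLUMN_ALIASES})
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Non-numeric entries such as 'Undetermined' become NaN
    df["CT"] = pd.to_numeric(df["CT"], errors="coerce")
    return df[REQUIRED]
```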

To detect outliers, the CT standard deviation (CT-SD) of the technical replicates for a given sample is calculated; if the CT-SD is greater than the cut-off (the default value is 0.3), then the technical replicate furthest from the sample mean is removed. The process occurs recursively until the CT-SD is less than the cut-off or the value of "max outliers" is reached. This is determined by the parameter 'Max Proportion'; the 0.5 default means that outliers will be removed until two technical replicates remain. With the 'preserve highly variable replicates' option, if the CT-SD is higher than 0.3 but the absolute value of (mean - median)/median is less than 0.1, the replicates are preserved. This helps to account for the lack of a clear outlier, where, for example, two of three replicates are close to equally distributed around the mean.
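
The sketch below restates these rules as a small Python function. It is an approximation for illustration (the recursion, cut-off, 'Max Proportion' floor, and preservation rule follow the description above), and the real implementation in Auto-qPCR's model scripts may differ in detail.

```python
import numpy as np

def remove_outliers(cts, cutoff=0.3, max_proportion=0.5,
                    preserve_variable=True):
    """Recursively drop outlier technical replicates for one sample."""
    cts = list(cts)
    # 'Max Proportion' = 0.5 keeps at least two of three replicates
    min_keep = max(2, int(np.ceil(len(cts) * max_proportion)))
    while len(cts) > min_keep and np.std(cts, ddof=1) > cutoff:
        mean, median = np.mean(cts), np.median(cts)
        if preserve_variable and abs(mean - median) / median < 0.1:
            break  # replicates evenly spread: no clear outlier, keep all
        # remove the replicate furthest from the sample mean
        cts.remove(max(cts, key=lambda c: abs(c - mean)))
    return cts

print(remove_outliers([24.1, 24.2, 25.6], preserve_variable=False))
# -> [24.1, 24.2]
```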

Model-dependent processing: the absolute model calculates the ratio between the gene of interest and each control. For each gene/target of interest, the normalized value is calculated against the mean of each control target separately, and the mean of these control-normalized values is then calculated. The relative ΔCT model, without a calibration sample, calculates a ΔCT for each endogenous control by subtracting the control CT value from the CT value of the target, then takes the mean of the resulting deltas. The relative ΔΔCT and genomic stability models individually calculate the ΔCT for the target in the test sample and in the reference/calibration sample(s), then calculate the ΔΔCT by subtracting the reference ΔCT from that of the test sample. For all models, the mean value of the technical replicates is calculated for each target.
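
In code, the two relative calculations reduce to a few lines. The sketch below is a minimal Python rendering of the formulas just described, with hypothetical CT values; it is not the Auto-qPCR source.

```python
import numpy as np

def delta_ct(target_ct, control_cts):
    """Relative dCT: mean over endogenous controls of (target - control)."""
    return float(np.mean([target_ct - c for c in control_cts]))

def delta_delta_ct(test_dct, reference_dct):
    """ddCT, as used by the relative ddCT and genomic stability models."""
    return test_dct - reference_dct

# Hypothetical CTs: target vs two controls, in a test and a reference sample
dct_test = delta_ct(25.0, [20.0, 21.0])   # 4.5
dct_ref = delta_ct(27.0, [20.5, 21.5])    # 6.0
ddct = delta_delta_ct(dct_test, dct_ref)  # -1.5
print(2 ** -dct_test, 2 ** -ddct)         # RQ(dCT) ~0.044, RQ(ddCT) ~2.83
```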

For the relative models, the values of the reference genes are calculated separately for each input file; the data from one input file will not be applied to another file. For the absolute model, the qPCR output for each gene is found in a separate file and the selected endogenous controls are applied to all the data input in one analysis. For all models, two spreadsheets are created that can be opened in Excel: (1) "clean_data.csv" contains the ΔCT calculated for each technical replicate, including outliers, indicated by "TRUE" in the column "Outlier"; and (2) "summary_data.csv" contains the mean, standard deviation (SD) and standard error (SE) for each sample calculated from the included technical replicates. This output can easily be analyzed in another statistical program (R, SAS, Prism). All the input and output data are cleared after processing and no user data is stored in the web app.

Program function—statistical analysis

For testing differential gene expression, the user selects the statistics option and fills in a form to indicate the conditions of the experiment. Either a paired test ( t test) or a multiple comparison (one-way ANOVA, or two-way ANOVA to investigate interaction effects) is selected. The names of the variables to be grouped by must be within either the 'sample names' column in the input file or within an additional column created during the qPCR setup. A column can also be added manually to the results input file(s), although this adds a risk of copy/paste errors and additional time to the analysis process. See Table S5 for the list of which analysis is applied for each setting. All default settings are maintained for the statistical functions (for details see the Pingouin documentation at https://pingouin-stat.org/ ); the output has been reformatted to be more easily read and interpreted by users and for consistency across statistical outputs.
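
Because Auto-qPCR relies on Pingouin with default settings, the underlying calls look roughly like the sketch below. The data frame here is hypothetical, and `pairwise_tests` is the name used in recent Pingouin releases (older versions call it `pairwise_ttests`).

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format summary data: one normalized value per sample
df = pd.DataFrame({
    "RQ": [1.0, 1.2, 0.9, 2.1, 2.4, 1.9],
    "stage": ["D0", "D0", "D0", "D7", "D7", "D7"],
})

# Two groups -> t test
ttest = pg.ttest(df.loc[df.stage == "D0", "RQ"],
                 df.loc[df.stage == "D7", "RQ"])

# More than two groups -> one-way ANOVA with FDR-corrected post hoc tests
anova = pg.anova(data=df, dv="RQ", between="stage")
posthoc = pg.pairwise_tests(data=df, dv="RQ", between="stage",
                            padjust="fdr_bh")
print(ttest["p-val"].iloc[0], anova["p-unc"].iloc[0])
```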

Program function—visualization

The plotting scripts were written using the Matplotlib bar chart function. The labels and axis settings are all adjusted directly within the script (plot.py). The user can dictate the gene/target order and the sample order (cell lines, treatments, time points) in the web app by entering the orders into the appropriate input box; these orderings are also used to group the summary plots. All the plots are automatically generated and saved as png files. If statistics are applied, two summary bar charts of the mean values are generated, grouped by the selected variable. For a two-way ANOVA analysis, the summary bar chart groups the first variable on the x-axis, while the second variable is visualized in different colours and indicated in the legend.
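
A stripped-down version of such a grouped bar chart is sketched below with made-up summary values; Auto-qPCR's plot.py handles ordering, colours, and labels more generally.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical mean RQ values per gene, grouped by time point
genes = ["PAX6", "CAMK2A", "GRIN1"]
means = {"D0": [1.8, 0.2, 0.3], "D7": [0.6, 1.1, 0.9]}
errors = {"D0": [0.2, 0.05, 0.1], "D7": [0.1, 0.3, 0.2]}

x = np.arange(len(genes))
width = 0.35
fig, ax = plt.subplots()
for i, (group, vals) in enumerate(means.items()):
    # offset each group's bars so they sit side by side
    ax.bar(x + i * width, vals, width, yerr=errors[group], label=group)
ax.set_xticks(x + width / 2)
ax.set_xticklabels(genes)
ax.set_ylabel("Relative quantification (RQ)")
ax.legend(title="Stage")
fig.savefig("summary_by_gene.png", dpi=300)  # saved as png, as in Auto-qPCR
```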

Data availability and reproducibility

All raw csv input data files and the output files used in the plots are available at https://github.com/neuroeddu/Auto-qPCR , along with a user guide. The example input (Input Data) and output files (Output Data) are all available and organized by figure name. The parameters used for each figure can be found in the document "Notes_on_Datasets.docx", and screenshots of the filled web app form for each figure are in the Supplementary Figures . The example output will be replicated identically if the same conditions are entered.

Illustrations

The schematic representation in Fig.  1 and simplified versions in Figs.  2 , 3 , 4 were created in Adobe Illustrator Creative Cloud 2020, with icons inserted from BioRender.

Figure 1

Workflow of a qPCR experiment. Schematic representation of common qPCR assays: a genomic stability assay to detect DNA deletion or duplication events (green line), and two methods to quantify RNA (cDNA) using either absolute (red line) or relative quantification designs (blue lines). qPCR experiments can be subdivided into two parts: the sample preparation and running of the PCR machine (Experimental Workflow), and the data analyses (Auto-qPCR Program). The preparation of the experiment includes nucleic acid extraction followed by a cDNA synthesis step (for RNA) and the in silico design of the PCR plate layout. Nucleic acid preparations are accurately diluted. For the absolute model, a standard curve must be created. The experimental design of the PCR plate, including the chemistry (fluorophore, primer mix), the status of the samples, and the transcripts or DNA regions that are going to be amplified, must be generated in silico . After the parameters of the qPCR reactions have been defined (number of PCR cycles, the length of the different steps (denaturation, hybridization and elongation), and the temperatures), the PCR is run. The exported data from the thermocycler, converted to csv, is entered into the Auto-qPCR software, and the model matching the experimental design and the parameters for analysis are selected. The software will reformat the data, quantify each sample normalized to controls, and create spreadsheets and graphs to visualize the data analyses, all of which are included in a zip file for the user to save.

Figure 2

Auto-qPCR can process PCR genomic stability data. ( A ) Screen capture of the Auto-qPCR web-app. ( B ) Simplified schematic of PCR workflow showing the genomic instability analysis in green. The DNA copy number is quantified with the same formula as the ΔΔ CT relative quantification model. ( C ) The calculations carried out for genomic instability testing (ΔΔ CT). Top, the general formula used where the CT values for each chromosome were normalized to a region of interest and then to a reference sample. Middle, the reference DNA region (CHR4) and the reference sample (Normal) used in this dataset. Bottom, the confidence interval for determining a genomic instability, insertion, or deletion event. ( D ) Bar chart showing the output from Auto-qPCR program running the genomic instability model. Four different iPSC cell lines are indicated and compared to the control sample. Normalized signals for all four cell lines are in the confidence interval defined by the control sample.

Figure 3

Auto-qPCR can process quantitative qPCR data using a standard curve and perform statistical analysis. Output of Auto-qPCR processing using the absolute model. ( A ) Illustration of a calibration curve displaying 8 serial dilution points of a four-fold dilution series, which covers cDNA quantities from 0.003053 to 50 ng and establishes the linear relationship between CT values (y-axis) and the log2[RNA]. ( B ) Schematic of the PCR workflow showing the pipeline for absolute quantification using a standard curve in red. ( C ) Formula used to process a real-time PCR experiment using an absolute quantification design. Top, general formula where the linear relation between the logarithm of RNA concentration and the CT value is provided by the calibration curve. The normalized quantification is expressed as a ratio between the concentrations for the gene of interest and the endogenous control(s) estimated from their respective calibration curves. Bottom, the variables specific to this dataset are shown in the general formula. ( D ) Bar chart showing the output from the Auto-qPCR program using the absolute model for the normalized expression of the gene KCNJ6 for six cell lines at four different developmental stages (iPSC, induced pluripotent stem cells; NPC, neural progenitor cells; DA4W, dopaminergic neurons at 4 weeks; DA6W, dopaminergic neurons at 6 weeks). ( E , G ) Bar charts showing the average expression levels obtained from the three technical replicates for each cell line and time point for the three genes ( SYP , KCNJ6 and GRIA1 ), normalized with two housekeeping genes ( ACTB : beta-actin , GAPDH ). ( E ) Mean RNA expression grouped by genes on the x-axis; cell lines and time points are indicated in the legend. ( G ) Mean RNA expression grouped by cell lines and time points; the gene transcripts quantified are indicated in the legend. ( F , H ) Bar charts showing the mean expression levels of SYP , KCNJ6 and GRIA1 for four developmental stages (n = 6 cell lines). ( F ) Grouped by genes (x-axis); time points are indicated in the legend. ( H ) Grouped by time points (x-axis); the genes are indicated in the legend. One-way ANOVAs across differentiation stages for KCNJ6 , SYP and GRIA1 (p < 0.001, p < 0.001, p = 0.002).

Figure 4

Auto-qPCR can process quantitative PCR data using two different relative models. Output of Auto-qPCR using relative quantification with both the ∆CT and ∆∆CT models. ( A ) Amplification curves illustrating a difference of cycle threshold values (∆CT) between a gene of interest and an endogenous control. ( B ) Schematic of the PCR workflow showing the two methods to calculate relative RNA quantity: ∆CT in dark blue and ∆∆CT in light blue. ( C ) Formulas used to perform a qPCR analysis using relative quantification models, according to the ∆CT (right) or the ∆∆CT (left) method. ( D – F ) Bar charts showing the output of the ∆CT model (RQ ∆CT ). ( G – I ) Bar charts showing the output from the ∆∆CT model (RQ ∆∆CT ). ( D , G ) Mean normalized gene expression values from technical replicates for the genes PAX6 , CAMK2A and GRIN1 indicated on the x-axis for 2 cell lines at two stages of differentiation (D0: neural progenitor cells, and D7: cortical neurons at 7 days of differentiation) as indicated. ( E , H ) Statistics output showing the mean gene expression from two cell lines at the two stages of differentiation indicated, for the three genes indicated on the x-axis. ( F , I ) Statistics output showing the mean expression values for two cell lines at two time points on the x-axis and the three genes indicated. Differential expression between D0 and D7 is not significant ( PAX6 p = 0.40, CAMK2A p = 0.18, GRIN1 p = 0.16), t tests, n = 2.

The Auto-qPCR program functions with the workflow of a qPCR experiment

A qPCR experiment includes multiple steps that can be divided into two categories: (1) sample preparation to conduct the qPCR reaction, and (2) data analysis, visually represented in the schematic in Fig.  1 . Nucleic acids are extracted from biological samples (RNA, which is converted to cDNA for quantifying gene expression levels, or genomic DNA). Prior to performing qPCR in vitro, the user must generate the in silico experimental layout using the software that monitors the biochemical reaction. The user defines the experimental design (absolute or relative quantification), the method for detecting DNA synthesis (TaqMan or SYBR Green) and the location of each sample within the plate. Finally, at the end of the qPCR run, the recorded data is exported and would normally be analyzed manually. In our workflow, the data is exported from the PCR machine and saved as a spreadsheet in the form of a txt or csv file (Supplementary Figure S2 ). The file is then uploaded into the Auto-qPCR web app and the user enters their experimental settings.

Auto-qPCR removes outlier technical replicates according to the selected criteria, normalizes to an endogenous control, and creates a clean data table, a summary data table, and graphs of all the results. If the user selects the statistical analysis, differential expression analyses are performed on the designated groups. The program was designed for the most common uses of qPCR: detecting DNA fragment duplications or deletions, and quantifying gene expression levels according to the absolute or relative quantification models.

Genomic instability

A relatively new application for qPCR detects small changes within the genome, from a deletion to a duplication of a DNA segment. DNA regions known to be highly susceptible to such events can be quantified using a genomic instability qPCR test. In induced pluripotent stem cell (iPSC) research, genomic instability tests are critical for quality control to screen for duplication/deletion events that can arise during reprogramming and prolonged cell passaging 31 , 32 . We performed a qPCR test for genomic stability, where for each cell line, the signal from each DNA region of interest was compared to the endogenous control region.

We uploaded the data into the Auto-qPCR web app and selected the genomic instability model (Fig.  2 B). The endogenous control used to normalize the data was an amplicon of a region on chromosome 4 (CHR4), a location in the genome known not to contain any instabilities. As a reference sample, we used DNA known not to have any instabilities as the calibrator (Normal) (Fig.  2 A). The genomic instability model has two steps of normalization in its general formula; this formula and the variables used in the example calculation are shown in Fig.  2 B,C. First, the CT values from the control region (i.e., CHR4) for each cell line are subtracted from those of each region of interest. Next, the ∆CT from the Normal DNA control is subtracted from the ∆CT calculated for each cell line sample. Finally, the mean is calculated from the average of the multiple technical replicates included in the plate design for each sample. The ∆∆CT values are expressed as "Relative Quantification" according to the following formula: RQ = 2 −∆∆CT . If the sample has no abnormalities (deletions or duplications), the values obtained should be equal or close to 1, except for targets on the X chromosome in a male individual, for which the ratio would be expected to be 0.5. As the DNA used for PCR amplification may come from a mixed population of cells, where only some cells carry a deletion or duplication, we set an acceptable range of variation of 0.3 above and below the expected value of 1; DNA regions with RQ values between 0.7 and 1.3 are considered normal. Values below 0.7 indicate a deletion and values above 1.3 indicate an insertion. For ease of analysis, we have included a column in the output file from the Auto-qPCR program that indicates normal, insertion or deletion (Supplementary Table S6 ). We found that all seven chromosomal regions in the four cell lines tested were between 0.7 and 1.3 and concluded that no duplications or deletions were present (Fig.  2 D and Supplementary Fig. S3 B). Overall, we demonstrated how Auto-qPCR can be used to analyse the data from a genomic instability qPCR assay, and that the app effectively processed the data, creating a summary table and graph of the data.
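
The decision rule behind that output column can be written compactly, as in the sketch below. The ±0.3 window around an expected RQ of 1 follows the description above; applying the same window around 0.5 for X-chromosome targets in a male sample is our extrapolation, not a rule stated by the authors.

```python
def classify_region(rq, expected=1.0, window=0.3):
    """Classify a DNA region from its RQ = 2**(-ddCT) value.

    expected: 1.0 for autosomal targets; 0.5 could be used for
    X-chromosome targets in a male sample (an assumption, see text).
    """
    if rq < expected - window:
        return "deletion"
    if rq > expected + window:
        return "insertion"
    return "normal"

# Hypothetical RQ values for three regions
for region, rq in {"CHR8": 0.98, "CHR12": 1.45, "CHR18": 0.55}.items():
    print(region, classify_region(rq))
# CHR8 normal, CHR12 insertion, CHR18 deletion
```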

Absolute quantification

For absolute quantification experiments, the quantities of RNA transcripts for a gene of interest and the endogenous controls are first estimated with a calibration curve (Fig.  3 A), which provides a mathematical relationship between the CT values and the RNA concentration or quantity. The relationship is described by the equation CT = a·log2[RNA] + b, where "a" is the slope and "b" is the y-intercept (Fig.  3 C) 33 . The expression levels of the RNA molecule of interest are then given by the ratio of the estimated amount of RNA for a selected transcript and the estimated amounts of the endogenous controls (Fig.  3 C). Consequently, the values given as "Normalized Expression Levels" depend on the levels of transcript within the biological material used to set the calibration curves. We used Auto-qPCR to compare the expression of three gene transcripts across six different cell lines at four different stages in the differentiation of neurons from iPSCs (Fig.  3 B and Supplementary Fig. S4 ). The calibration curve was made from a mix of the cDNAs generated from the reverse-transcribed RNA reactions from the four timepoints in the differentiation process, and consisted of eight four-fold serial dilutions to cover a linear relationship in a dynamic range from 1- to 16,384-fold dilution (Fig.  3 A). Raw data were normalized with two endogenous controls ( ACTB and GAPDH ) (Fig.  3 D–H and Supplementary Fig. S4 A). The Auto-qPCR app provides several graphical representations of the normalized expression values. The means of technical replicates are provided for each gene (Fig.  3 D). Bar charts were generated for all gene and sample observations plotted together (grouped by gene, Fig.  3 E, and by sample, Fig.  3 G), allowing for an overview of the data and visualization of the biological variation between cell lines at a given stage.
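
The calibration-curve arithmetic is a simple linear fit in log2 space, sketched below on a synthetic four-fold dilution series; the function names and numbers are illustrative only, not Auto-qPCR's code.

```python
import numpy as np

def fit_standard_curve(quantities_ng, ct_values):
    """Fit the calibration line CT = a*log2(quantity) + b."""
    slope, intercept = np.polyfit(np.log2(quantities_ng), ct_values, 1)
    return slope, intercept

def estimate_quantity(ct, slope, intercept):
    """Invert the curve: quantity = 2**((CT - b) / a)."""
    return 2.0 ** ((ct - intercept) / slope)

# Synthetic four-fold dilution series, 50 ng down to ~0.003 ng (8 points)
rng = np.random.default_rng(0)
dilutions = 50.0 / 4.0 ** np.arange(8)
cts = 30.0 - np.log2(dilutions) + rng.normal(0, 0.05, size=8)

a, b = fit_standard_curve(dilutions, cts)
# Normalized expression = estimated target quantity / control quantity
rq = estimate_quantity(24.0, a, b) / estimate_quantity(27.0, a, b)
print(f"slope {a:.2f}, intercept {b:.2f}, normalized expression {rq:.2f}")
```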

We used the statistical module in Auto-qPCR to test for changes in gene expression over the different stages of neuronal differentiation; the different cell lines were considered as biological replicates (Supplementary Fig. S5 ). As there are more than two groups, the Auto-qPCR software runs a one-way repeated-measures ANOVA for each gene. Two summary plots (Fig.  3 F, H ) and two statistical output tables were generated: one for the ANOVAs and one for the secondary measures (Supplementary Tables S7 and S8 ). There was a significant effect of the differentiation stage on the expression of synaptic markers. The t tests with false discovery rate (FDR) correction for pairwise comparisons of each stage showed that iPSCs have significantly lower expression of each synaptic marker than DANs differentiated for 4 and 6 weeks (Supplementary Table S8 ), indicating that the differentiation protocol is successful for all cell lines tested, with each iPSC line differentiating into progenitors and ultimately DANs (Supplementary Figure S5 ). We show that raw absolute qPCR data were effectively processed by Auto-qPCR, creating summary data, visualization and statistics for differential gene expression between conditions.

Relative quantification

In addition to absolute quantification, the Auto-qPCR software also enables the processing of qPCR data obtained according to a relative quantification design. Contrary to absolute quantification, relative quantification does not require a calibration curve, and quantification (of transcripts) is based on the CT difference between a transcript of interest and one or more endogenous controls (Fig.  4 A). Relative qPCR is optimal for two kinds of comparisons: (1) detecting a difference in gene expression between two different conditions, and (2) detecting a difference between two transcripts within the same condition. Relative quantification can be expressed either as RQ = 2 −∆CT , where samples are normalized to internal control(s), or RQ = 2 −∆∆CT , where a given sample is considered as a calibrator for the unknown samples (Fig.  4 B, C ).
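
The difference between the two expressions is easiest to see in a small worked example. In the hypothetical numbers below (made-up CT values for one gene and one endogenous control), RQ = 2 −∆CT keeps quantities on a scale comparable across genes, while RQ = 2 −∆∆CT re-expresses everything as fold change relative to the chosen calibrator sample.

```python
# Hypothetical CT values for one gene and one endogenous control
ct = {"GENE": {"D0": 26.0, "D7": 24.0}, "CONTROL": {"D0": 20.0, "D7": 20.0}}

# RQ(dCT): normalized to the control only; comparable between genes
rq_dct = {d: 2 ** -(ct["GENE"][d] - ct["CONTROL"][d]) for d in ("D0", "D7")}
# D0 = 2**-6 ~ 0.016, D7 = 2**-4 ~ 0.063

# RQ(ddCT): additionally normalized to a calibrator sample (here D0)
ddct_d7 = (ct["GENE"]["D7"] - ct["CONTROL"]["D7"]) \
        - (ct["GENE"]["D0"] - ct["CONTROL"]["D0"])
rq_ddct_d7 = 2 ** -ddct_d7  # 2**2 = 4, i.e. a four-fold increase at D7
print(rq_dct, rq_ddct_d7)
```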

To illustrate the functions of the program, we compared the expression levels of two different control cell lines at two developmental stages, indicated as D0 (neural precursor cells) and D7 (7 days of differentiation into cortical neurons). We measured the expression levels of the progenitor marker PAX6 , and two markers of neuronal differentiation ( GRIN1 and CAMK2A ) and normalized to the housekeeping genes ACTB and GAPDH.

We used the Auto-qPCR app to process the same data twice, for a direct comparison of the two distinct relative quantification options (Supplementary Fig. S6 ). Figure  4 D shows the mean expression from technical triplicates calculated by selecting the RQ = 2 −∆CT model. The ∆CT approach (not using a sample as calibrator) allows a comparison of the expression levels of the three different transcripts. We observed that, relative to the endogenous controls, the D0 expression values for each transcript varied widely between the two cell lines tested. However, as expected for both cell lines, PAX6 expression was higher at the D0 stage compared to D7. Conversely, both GRIN1 and CAMK2A exhibited higher expression at the D7 stage compared to D0. Using the statistics module in the Auto-qPCR app, we compared the mean levels of each gene transcript at D0 and D7 using paired t tests for each gene (Fig.  4 E, F ). We found that although there were clear differences in expression, they were not significant between D0 and D7, likely a result of there being only two samples for each time point (Supplementary Table S9 and Supplementary Figs. S6 A and S7 ). Interestingly, we found that at D7 the CAMK2A RQ ∆CT was twice that of GRIN1 (Fig.  4 F).

We next analysed this dataset with the RQ ∆∆CT model (indicated as ΔΔCT) in the web app (Supplementary Fig. S6 B), where transcript levels are compared to both the control gene expression (in this case ACTB and GAPDH ) and a calibration sample; here we set one sample, AIW002-02-D0, arbitrarily as the reference sample (Fig.  4 G). This makes it easy to compare expression in a test condition relative to a control condition by displaying the results as fold changes in expression: all decreases fall between 0 and 1 and all increases are above 1 (Fig.  4 C). With the double normalization (RQ ∆∆CT ), all values were expressed as a variation compared to the calibrator (AIW002-2-D0), as seen in Fig.  4 G–I. As in the RQ ∆CT model, the changes in gene expression from D0 to D7 were not significant (Supplementary Table S10 ). Although the ratio of expression for a given gene in each cell line between D0 and D7 remained unchanged, differential expression between genes can no longer be analysed. The RQ ∆∆CT results shown in Fig.  4 H showed that PAX6 expression was higher at D0 than D7 and that CAMK2A and GRIN1 expression were both higher at D7 than D0, as seen in Fig.  4 E using the RQ ∆CT model. However, with the double normalization, the increase in GRIN1 expression from D0 to D7 appears much larger than the increase in CAMK2A expression (Fig.  4 H, I ), which is the opposite result from the single normalization model (RQ ∆CT ) (Fig.  4 E, F ). Our findings highlight the need to analyze data with attention to the biological question: using only the RQ ∆∆CT analysis, one might mistakenly believe the increase in GRIN1 expression is greater than that of CAMK2A. With Auto-qPCR we provide a quick, easy option to process the exported qPCR data with two different relative models. We show the same gene expression ratios between the two time points, but different gene expression levels, using the two relative quantification models.

Auto-qPCR produces the same results as manual processing of a previously published dataset

One of our objectives was to provide a tool for analyzing data from qPCR experiments generated with different qPCR machines. We therefore reanalyzed a published dataset generated by the Gorwood lab 30 on a different machine (Opticon 2, Bio-Rad). The original study measured gene expression in three subcortical areas (subthalamic nucleus (STN), substantia nigra (SN) and globus pallidus (GP)) of mice submitted to a cocaine place preference paradigm 30 . Manual processing shows a significant increase in Nrxn3 expression in the cocaine-treated group compared to control, specifically in the GP (Fig.  5 A).

Figure 5

Auto-qPCR can process data from different thermocyclers and produce the same results as manual processing. ( A ) Bar chart showing the mean Nrxn3 expression level normalized to B2M levels, assessed with an absolute quantification design, manually processed and plotted in Prism, grouped by brain region (STN: subthalamic nucleus, GP: globus pallidus, SN: substantia nigra) on the x-axis, with and without cocaine treatment. ( B ) Output of Auto-qPCR processing the same dataset. Nrxn3 normalized expression levels from technical replicates for each biological sample. The treatment conditions are indicated below the x-axis. ( C ) Statistics output of the Auto-qPCR program comparing cocaine and control groups. Nrxn3 normalized expression levels in the combined brain regions. Expression is not significantly different, p = 0.113, t test, n = 13. ( D ) Auto-qPCR statistical output showing mean Nrxn3 expression combining treatments and comparing the three brain regions. One-way ANOVA shows a significant effect of brain region, FDR-adjusted p < 0.001, n = 9 for GP and SN, n = 10 for STN. ( E ) Bar chart of Nrxn3 expression shown as six groups distinguished by brain region and treatment, generated by the Auto-qPCR program after a one-way ANOVA, p < 0.001, n = 4 or 5. Post hoc analysis using multiple t tests with FDR correction comparing treatment at each brain region: STN p = 0.990, GP p = 0.033, SN p = 0.413. ( F ) Bar chart of Nrxn3 averages normalized by brain region (x-axis) and treatment, generated by the Auto-qPCR program after a two-way ANOVA, brain region p < 0.001, treatment p = 0.2265, n = 4 or 5. Post hoc analysis using multiple t tests with FDR correction comparing each brain region with and without cocaine: STN p = 0.998, GP p = 0.053 (p-unadjusted = 0.017), SN p = 0.619. ( G ) Bar chart of the average Nrxn3 normalized expression levels in the GP compared between the two groups with a t test (p = 0.0176).

We next processed the raw data using the Auto-qPCR web app absolute quantification pipeline and normalized to B2M (Fig.  5 B and Supplementary Figure S8 A). This summary data closely matched the manually calculated data (Supplementary Table S11 ). The standard method of removing outliers from technical replicates is to remove the replicate most different from the mean if the CT standard deviation (CT-SD) is above 0.3; under 'Options for removing technical replicates' in the Auto-qPCR software, this threshold can be adjusted. During manual analysis, each set of technical replicates is inspected when the CT-SD value is above 0.3; when one replicate is clearly different from the other two, the divergent value is removed. There are some instances in manual processing where no replicates are removed even though the CT-SD is greater than 0.3, because the triplicate values are evenly distributed. Auto-qPCR has an option to account for this type of data when the user selects 'preserve highly variable values': a replicate is only removed if the median is far from the mean. We processed the Nrxn3 expression data with a range of CT-SD cut-off values, with or without preserving highly variable replicates, to display the difference in outcomes (Supplementary Table S11 ). We compared the variances generated by the differences between the expression values from manual treatment and from Auto-qPCR using a CT-SD cut-off of 0.3, with or without preserving highly variable replicates. We found that the preserve-highly-variable option combined with a cut-off of 0.3 generated a 20% decrease in the variance between manual and automatic treatments (Supplementary Table S12 ) and preserved values falsely estimated as outliers by manual processing, which illustrates the subjectivity of the user with respect to the decision to retain or exclude a value based on criteria of divergence. Our analysis suggests that applying these two data filtering rules provides a more systematic data analysis method and minimizes interindividual bias. Here we applied the standard cut-off of 0.3 and preserved highly variable replicates, which is appropriate for the highly variable RNA-level experimental samples we are analyzing.

Auto-qPCR also permits statistical groups to be designated in the sample name or in a specific group column, which can be added to the qPCR data during the plate set-up or later in the results spreadsheet. To allow for statistical analysis of these data, we added a grouping column to the raw data files (Supplementary Table S13 ) and, using the Auto-qPCR statistics module, reanalyzed the effect of drug treatment and brain region on expression of Nrxn3 across several parameters. We first compared the overall effect of cocaine on expression after pooling the three brain regions and found that, although Nrxn3 expression was increased across brain regions with cocaine treatment, there was no overall significant effect of drug treatment (Fig.  5 C, Supplementary Fig. S9 A and Supplementary Table S14 ). Comparing the three brain regions while pooling control and cocaine treatment showed a significant difference in expression across brain regions; post hoc analysis revealed that Nrxn3 expression in the STN was significantly lower than in the GP and SN (Fig.  5 D, Supplementary Fig. S10 A and Supplementary Table S15 ). When we considered each brain region with and without treatment as independent conditions, with individual mice as biological replicates, and used a one-way ANOVA followed by post hoc multiple t tests with a correction for multiple comparisons, we found that cocaine significantly increased Nrxn3 expression specifically in the GP and not in the SN or STN (Fig.  5 E and Supplementary Table S16 ). To apply the identical statistical treatment as originally presented, we performed a two-way ANOVA followed by repeated-measures t tests with FDR correction on the interaction between treatment and brain region, using Auto-qPCR, and found the same results as the one-way ANOVA (Fig.  5 F, Supplementary Fig. S10 B and Supplementary Table S17 ), as did a t test of the GP alone (Fig.  5 G), all in agreement with the originally published results 30 . Together, these data show that Auto-qPCR can process data generated by another machine and produce results that match those processed manually.

This paper presents Auto-qPCR, a new web app for qPCR analysis, and provides examples of the functionalities of the software applied to qPCR experimental datasets generated from DNA (genomic instability assay), cDNA amplification, and RNA transcripts (absolute and relative quantification data). We have also summarized the computational bases of the relative and absolute quantifications performed by Auto-qPCR, which is important for users to understand during experimental design. The Auto-qPCR web app also provides a statistical module that will be applicable to the majority of qPCR experiments and provides a correction across multiple tests, when more than two samples are compared, to mitigate false positives. As not all experimental designs require differential analyses, Auto-qPCR can also be run without statistical analysis; it then calculates normalized RNA concentrations and generates a summary table and graphs. Furthermore, the web app can be used with no installation or login requirements. We have created an easy-to-use program that is completely free and open source, able to process data from different qPCR machines and all common experimental designs, that will be advantageous for any lab performing qPCR experiments.

Given the importance of qPCR in molecular biology, other programs are available to perform many steps of qPCR data treatment 18 , 21 , 22 , 23 , 34 . The Q-PCR and PIPE-T programs were designed to treat and display qPCR data generated according to a relative quantification model 23 , 34 . SATQPCR is a web app that treats qPCR data using the relative quantification model and performs differential analyses; however, it does not take the exported results files directly from the qPCR machine and requires manual preformatting of the data before analysis 22 . ELIMU-MDx is a web-based interface conceived to collect specific information regarding qPCR assays for diagnostic purposes. ELIMU-MDx functions as a data management system, processes qPCR data generated using the absolute quantification method, and requires an account and login information 21 . Finally, another web app, “Do my qPCR calculations”, requires no login and also provides relative quantification results, but needs manual preformatting of an Excel sheet to upload, or values to be entered directly 35 . The main specifications of these programs relative to ours are presented in Supplementary Table S18 for side-by-side comparison.

Reviewing different software published to serve similar purposes highlights the unique characteristics of Auto-qPCR, as no other web app combines all the features we have included in our software. First, as a web app, Auto-qPCR does not require installation or a user login and can be accessed from any device connected to the internet. Furthermore, for users who want to work on their analysis offline, we also provide the option to install the program locally, which entirely reproduces the environment of the web app. Second, Auto-qPCR does not require any manual preformatting of the results file. Instead, once the qPCR experiment is complete, our program takes the csv or txt export file directly from the thermocycler, so there is no copy/paste or formatting step for the user. Third, Auto-qPCR can manage the data from multiple separate absolute-quantification files at once, as well as batch process multiple results files from a relative quantification. The program creates a clean data set (with all technical replicates) and a summary data table. Fourth, unlike the other software mentioned above, Auto-qPCR includes three different models, conceived to support qPCR data generated from absolute quantification and from two methods of relative quantification; no other program provides the option of choosing between the two relative quantification methods. Fifth, we provide normalization to multiple reference genes and calculate the mean normalized value for each replicate, not the sample mean, an important feature implemented in relatively few other programs; this avoids the RNA quantity value being influenced by extreme values. Sixth, we extend the use of the program to suit qPCR data from DNA quantification. Finally, we provide an extensive statistics module for calculating differential gene expression that requires no additional input files. Options are included for experimental designs with two or more sample comparisons ( t test, one- and two-way ANOVA, and the equivalent non-parametric tests), and the module automatically generates bar charts for data visualization and summary tables with the statistical results. In summary, we have created a unique, easy-to-use qPCR analysis program that can benefit any researcher or lab that needs to analyze qPCR data on a regular basis, by saving time, avoiding errors, and generating reproducible, figure-ready plots.

Auto-qPCR provides users the option for relative quantification by two methods: expression relative to endogenous control genes only (∆CT method) or relative to endogenous genes and also normalized to a control condition (∆∆CT method). Although the ∆∆CT method is considered the gold standard to express, in one number, both the variation in gene expression between two conditions and the amplitude of that change 36 , it does not account for inter-gene expression variation within the control condition 37 . The differences between quantifying relative expression with or without a control condition used as a calibrator are clearly demonstrated above (Fig.  4 ). Expression levels of GRIN1 and CAMK2A calculated with either relative quantification model were increased at seven days of differentiation (D7) compared to day zero (D0). However, we also found that GRIN1 and CAMK2A had different levels in the baseline condition (∆CT); thus, information is lost when using a ∆∆CT normalization. For relative quantification using a ∆∆CT normalization, we measured a fold change relative to a control condition for a given gene 38 , but differences in expression between the two genes in the control condition were not observed (Fig.  4 F). We have provided both the gold standard method of relative quantification and a method to calculate gene expression without a reference sample, to allow users to quickly determine expression changes without losing information about the level of expression in control conditions.
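As a concrete illustration of the difference between the two models, here is a short Python sketch assuming 100% amplification efficiency (base 2) and using invented Ct values rather than the paper's dataset:

```python
def delta_ct(ct_target, ct_reference):
    """Relative expression against an endogenous control only (∆CT model)."""
    return 2 ** -(ct_target - ct_reference)

def delta_delta_ct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change against an endogenous control plus a calibrator
    condition (∆∆CT model)."""
    return 2 ** -((ct_target - ct_ref) - (ct_target_cal - ct_ref_cal))

# Invented Ct values: a gene of interest at D7 (sample) vs D0 (calibrator)
print(delta_ct(26.0, 20.0))                    # 2**-6 = 0.015625 of reference
print(delta_delta_ct(26.0, 20.0, 29.0, 20.0))  # 8.0-fold increase over D0
```

Note that the ∆∆CT output (here, 8-fold) carries no information about how the gene's baseline compares with the reference gene, which is exactly the information the ∆CT model retains.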

Reprocessing the external dataset highlighted two main advantages of treating qPCR datasets with a program. First, manual analysis of qPCR data is time consuming. Second, comparing the two data treatments (manual and program-assisted) showed that one important source of variation between results of manual analysis is the inconsistent rules used for data exclusion. Although removing one outlier from technical replicates, in the vast majority of cases, improves the CT standard deviation (CT-SD) by decreasing it below the commonly accepted threshold of 0.3, in many cases researchers decide to keep a technical replicate even if the CT-SD value is above 0.3. These judgement calls frequently occur when transcripts have low expression levels and the high variance between technical replicates does not permit a decision based on the CT-SD alone. To account for these situations, we incorporated a second rule for data inclusion/exclusion based on the distance between the arithmetic mean and the median value of the technical replicates to determine the most acceptable set of technical replicates. Applying such an algorithm in place of the user’s judgement removes variability and potential bias in the resulting normalized gene expression levels. We were able to reprocess external data using Auto-qPCR, acquired the same summary output, and reached the same conclusions as the initial study. We showed that Auto-qPCR can process data from different PCR machines and matched the expected outcome from manual processing without the risk of bias or errors. Using a double rule for data inclusion/exclusion for highly variable signal between technical replicates, the program provides a consistent treatment that will considerably reduce the risk of variability and mistakes generated by and between users during manual data processing.

The Auto-qPCR program does have some limitations, but it also has a number of other potential uses not included in this manuscript. Although the program is able to compute data from independent qPCR plates in singleplex (where each plate has a different amplicon), Auto-qPCR has not been adjusted at this stage to manage duplex qPCR (with one endogenous control and one transcript of interest quantified in the same well). Auto-qPCR is also not equipped to process an inter-plate calibrator, required when a sample set spans more than one plate in absolute quantification experimental designs. Finally, as most primer sets for gene expression are now predesigned and often pretested by companies with optimal amplification efficiencies in mind, correction factors for efficiencies have not been added into the Auto-qPCR algorithms. Despite these caveats, we propose that Auto-qPCR could be employed in a variety of molecular biology protocols, and many of these features could be added in future iterations. Auto-qPCR is capable of analyzing data from a chromatin immunoprecipitation experiment followed by specific DNA amplification 15 . The analyses could be performed using either the absolute or the relative quantification models. The absolute quantification method would permit testing primer efficiency through the calibration curve 39 , and the DNA target amplification would be normalized to an unbound DNA as previously described 40 , 41 . Alternatively, the level of DNA/protein interaction can be estimated using the relative quantification models, with one or several regions known to be unbound by the protein of interest as endogenous control(s) (∆CT mode) and with a biological condition as a calibrator (∆∆CT mode). Auto-qPCR is flexible enough to let the user choose the most appropriate model, based on the information available on the DNA regions to amplify and analyze.

The Auto-qPCR program was conceived to treat, analyze, and display qPCR data generated using either relative or absolute quantification designs, while limiting errors related to manual processing. Data processing tools cannot substitute for appropriate experimental design and statistical power; the choices made in the design and the interpretation of the results still remain in the user’s hands. We have provided a tool that enables easy, reproducible analysis, free of user error, for unlimited samples. Although we cannot computationally remove the need for replication and controls, analysis time will no longer be a limitation. Auto-qPCR permits researchers to conduct studies with larger experimental designs while minimizing the risk of mistakes during data analysis.

Abbreviations

CT: Cycle threshold

qPCR: Quantitative polymerase chain reaction

iPSCs: Induced pluripotent stem cells

CNVs: Copy number variants

SNVs: Single nucleotide variants

DA: Dopaminergic

NPCs: Neural precursor cells

Saiki, R. K. et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230 , 1350–1354. https://doi.org/10.1126/science.2999980 (1985).

Magnuson, V. L. et al. Substrate nucleotide-determined non-templated addition of adenine by Taq DNA polymerase: Implications for PCR-based genotyping and cloning. Biotechniques 21 , 700–709. https://doi.org/10.2144/96214rr03 (1996).

Scharf, S. J., Horn, G. T. & Erlich, H. A. Direct cloning and sequence analysis of enzymatically amplified genomic sequences. Science 233 , 1076–1078. https://doi.org/10.1126/science.3461561 (1986).

Beggs, A. H., Koenig, M., Boyce, F. M. & Kunkel, L. M. Detection of 98% of DMD/BMD gene deletions by polymerase chain reaction. Hum. Genet. 86 , 45–48 (1990).

Mullis, K. B. & Faloona, F. A. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 155 , 335–350 (1987).

Saiki, R. K., Bugawan, T. L., Horn, G. T., Mullis, K. B. & Erlich, H. A. Analysis of enzymatically amplified beta-globin and HLA-DQ alpha DNA with allele-specific oligonucleotide probes. Nature 324 , 163–166. https://doi.org/10.1038/324163a0 (1986).

De la Vega, F. M., Lazaruk, K. D., Rhodes, M. D. & Wenz, M. H. Assessment of two flexible and compatible SNP genotyping platforms: TaqMan SNP Genotyping Assays and the SNPlex Genotyping System. Mutat. Res. 573 , 111–135. https://doi.org/10.1016/j.mrfmmm.2005.01.008 (2005).

Ye, S., Dhillon, S., Ke, X., Collins, A. R. & Day, I. N. An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acids Res. 29 , E88–E88. https://doi.org/10.1093/nar/29.17.e88 (2001).

D’Haene, B., Vandesompele, J. & Hellemans, J. Accurate and objective copy number profiling using real-time quantitative PCR. Methods 50 , 262–270. https://doi.org/10.1016/j.ymeth.2009.12.007 (2010).

Charbonnier, F. et al. Detection of exon deletions and duplications of the mismatch repair genes in hereditary nonpolyposis colorectal cancer families using multiplex polymerase chain reaction of short fluorescent fragments. Cancer Res. 60 , 2760–2763 (2000).

Wong, M. L. & Medrano, J. F. Real-time PCR for mRNA quantitation. Biotechniques 39 , 75–85. https://doi.org/10.2144/05391RV01 (2005).

Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464 , 1071–1076. https://doi.org/10.1038/nature08975 (2010).

Shi, R. & Chiang, V. L. Facile means for quantifying microRNA expression by real-time PCR. Biotechniques 39 , 519–525. https://doi.org/10.2144/000112010 (2005).

Varkonyi-Gasic, E., Wu, R., Wood, M., Walton, E. F. & Hellens, R. P. Protocol: A highly sensitive RT-PCR method for detection and quantification of microRNAs. Plant Methods 3 , 12. https://doi.org/10.1186/1746-4811-3-12 (2007).

Mukhopadhyay, A., Deplancke, B., Walhout, A. J. & Tissenbaum, H. A. Chromatin immunoprecipitation (ChIP) coupled to detection by quantitative real-time PCR to study transcription factor binding to DNA in Caenorhabditis elegans . Nat. Protoc. 3 , 698–709. https://doi.org/10.1038/nprot.2008.38 (2008).

Dahl, J. A. & Collas, P. Q2ChIP, a quick and quantitative chromatin immunoprecipitation assay, unravels epigenetic dynamics of developmentally regulated genes in human carcinoma cells. Stem Cells 25 , 1037–1046. https://doi.org/10.1634/stemcells.2006-0430 (2007).

Milne, T. A., Zhao, K. & Hess, J. L. Chromatin immunoprecipitation (ChIP) for analysis of histone modifications and chromatin-associated proteins. Methods Mol. Biol. 538 , 409–423. https://doi.org/10.1007/978-1-59745-418-6_21 (2009).

Pabinger, S., Rodiger, S., Kriegner, A., Vierlinger, K. & Weinhausel, A. A survey of tools for the analysis of quantitative PCR (qPCR) data. Biomol. Detect. Quantif. 1 , 23–33. https://doi.org/10.1016/j.bdq.2014.08.002 (2014).

Bustin, S. A. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J. Mol. Endocrinol. 25 , 169–193 (2000).

Pfaffl, M. W. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29 , e45. https://doi.org/10.1093/nar/29.9.e45 (2001).

Krahenbuhl, S. et al. ELIMU-MDx: A web-based, open-source platform for storage, management and analysis of diagnostic qPCR data. Biotechniques 68 , 22–27. https://doi.org/10.2144/btn-2019-0064 (2020).

Rancurel, C., van Tran, T., Elie, C. & Hilliou, F. SATQPCR: Website for statistical analysis of real-time quantitative PCR data. Mol. Cell Probes 46 , 101418. https://doi.org/10.1016/j.mcp.2019.07.001 (2019).

Zanardi, N. et al. PIPE-T: A new Galaxy tool for the analysis of RT-qPCR expression data. Sci. Rep. 9 , 17550. https://doi.org/10.1038/s41598-019-53155-9 (2019).

Chen, C. X. Q. et al. Standardized quality control workflow to evaluate the reproducibility and differentiation potential of human iPSCs into neurons. Methods Protoc. 4 , https://doi.org/10.3390/mps4030050 (2021).

Bell, S. et al. A rapid pipeline to model rare neurodevelopmental disorders with simultaneous CRISPR/Cas9 gene editing. Stem Cells Transl. Med. 6 , 886–896. https://doi.org/10.1002/sctm.16-0158 (2017).

Kriks, S. et al. Dopamine neurons derived from human ES cells efficiently engraft in animal models of Parkinson’s disease. Nature 480 , 547–551. https://doi.org/10.1038/nature10648 (2011).

Chen, E. S. et al. Induction of Dopaminergic or Cortical neuronal progenitors from iPSCs. Zenodo. https://doi.org/10.5281/zenodo.3364831 (2019).

Chen, E. S., Lauinger, N., Rocha, C., Rao, T. & Durcan, T. M. Generation of dopaminergic or cortical neurons from neuronal progenitors. Zenodo. https://doi.org/10.5281/zenodo.3361005 (2019).

Abeysooriya, M., Soria, M., Kasu, M. S. & Ziemann, M. Gene name errors: Lessons not learned. PLoS Comput. Biol. 17 , e1008984. https://doi.org/10.1371/journal.pcbi.1008984 (2021).

Kelai, S. et al. Nrxn3 upregulation in the globus pallidus of mice developing cocaine addiction. NeuroReport 19 , 751–755. https://doi.org/10.1097/WNR.0b013e3282fda231 (2008).

Tosca, L. et al. Genomic instability of human embryonic stem cell lines using different passaging culture methods. Mol. Cytogenet. 8 , 30. https://doi.org/10.1186/s13039-015-0133-8 (2015).

Yoshihara, M., Hayashizaki, Y. & Murakawa, Y. Genomic instability of iPSCs: Challenges towards their clinical applications. Stem Cell Rev. 13 , 7–16. https://doi.org/10.1007/s12015-016-9680-6 (2017).

Ovstebo, R., Haug, K. B., Lande, K. & Kierulf, P. PCR-based calibration curves for studies of quantitative gene expression in human monocytes: Development and evaluation. Clin. Chem. 49 , 425–432. https://doi.org/10.1373/49.3.425 (2003).

Pabinger, S. et al. QPCR: Application for real-time PCR data management and analysis. BMC Bioinform. 10 , 268. https://doi.org/10.1186/1471-2105-10-268 (2009).

Tournayre, J., Reichstadt, M., Parry, L., Fafournoux, P. & Jousse, C. “Do my qPCR calculation”, a web tool. Bioinformation 15 , 369–372. https://doi.org/10.6026/97320630015369 (2019).

Schmittgen, T. D. & Livak, K. J. Analyzing real-time PCR data by the comparative C(T) method. Nat. Protoc. 3 , 1101–1108. https://doi.org/10.1038/nprot.2008.73 (2008).

Yuan, J. S., Reed, A., Chen, F. & Stewart, C. N. Jr. Statistical analysis of real-time PCR data. BMC Bioinform. 7 , 85. https://doi.org/10.1186/1471-2105-7-85 (2006).

Rao, X., Huang, X., Zhou, Z. & Lin, X. An improvement of the 2(−delta delta CT) method for quantitative real-time polymerase chain reaction data analysis. Biostat. Bioinform. Biomath. 3 , 71–85 (2013).

Brankatschk, R., Bodenhausen, N., Zeyer, J. & Burgmann, H. Simple absolute quantification method correcting for quantitative PCR efficiency variations for microbial community samples. Appl. Environ. Microbiol. 78 , 4481–4489. https://doi.org/10.1128/AEM.07878-11 (2012).

Mathieu, O., Probst, A. V. & Paszkowski, J. Distinct regulation of histone H3 methylation at lysines 27 and 9 by CpG methylation in Arabidopsis. EMBO J. 24 , 2783–2791. https://doi.org/10.1038/sj.emboj.7600743 (2005).

Maussion, G. et al. Investigation of genes important in neurodevelopment disorders in adult human brain. Hum. Genet. 134 , 1037–1053. https://doi.org/10.1007/s00439-015-1584-z (2015).

Acknowledgements

T.M.D. received funding through the McGill Healthy Brains for Healthy Lives (HBHL) initiative, the CQDM FACS program, the Alain and Sandra Bouchard Foundation, the Ellen Foundation and the Mowafaghian Foundation. T.M.D is supported by a project grant from CIHR (PJT-169095). R.A.T was funded by a Healthy Brains for Healthy Lives Fellowship. Thanks to Ivan Castanon Niconoff for helping create and set up the virtual machine used to host the Auto-qPCR web app. Thanks to Maria José Castellanos Montiel, Vincent Soubannier and Nguyen-Vi Mohamed, for testing the web app.

Author information

These authors contributed equally: Gilles Maussion and Rhalena A. Thomas.

Authors and Affiliations

The Neuro’s Early Drug Discovery Unit (EDDU), McGill University, 3801 University Street, Montreal, QC, H3A 2B4, Canada

Gilles Maussion, Rhalena A. Thomas, Iveta Demirova, Gracia Gu, Eddie Cai, Carol X.-Q. Chen, Narges Abdian, Angela Nauleau-Javaudin, Lenore K. Beitel & Thomas M. Durcan

INSERM U1266, Institute of Psychiatry and Neuroscience of Paris, Paris, France

Gilles Maussion, Sabah Kelaï, Nicolas Ramoz & Philip Gorwood

McConnell MNI Brain Imaging Center, McGill University, 3801 University Street, Montreal, QC, H3A 2B4, Canada

Theodore J. P. Strauss

Contributions

G.M. and R.A.T. conceptualized the program. I.D., G.G., E.C. and R.A.T. wrote and tested the program. R.A.T. managed the program development and GitHub repository and ran all the analysis using the webapp. G.G. built the graphical user interface and website. T.J.P.S. transferred the website to run online through a virtual machine. G.M. generated the qPCR data used to test the absolute and relative quantification models of Auto-qPCR program. C.X.Q.C., N.A. and A.N.J. extracted DNA and performed the PCR used to improve the pipeline related to the genomic instability model of Auto-qPCR program. S.K., N.R. and P.G. generated the external data set used for Fig.  5 . R.A.T., I.D. and G.M. made the figures. G.M., R.A.T., L.K.B. and T.M.D wrote the manuscript.

Corresponding author

Correspondence to Thomas M. Durcan .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Maussion, G., Thomas, R.A., Demirova, I. et al. Auto-qPCR; a python-based web app for automated and reproducible analysis of qPCR data. Sci Rep 11 , 21293 (2021). https://doi.org/10.1038/s41598-021-99727-6

Received: 09 June 2021

Accepted: 27 September 2021

Published: 29 October 2021

DOI: https://doi.org/10.1038/s41598-021-99727-6

4 Easy Steps to Analyze Your qPCR Data Using Double Delta Ct Analysis

You are at the airport burning away time with a report due tomorrow morning for your professor. You have your data. Why not take advantage of the time and calculate the expression fold change for the genes you have tested in that first qPCR experiment you did last week?

It’s easy – I’ll show you how.

Check Your Method

There are two main ways to analyze qPCR data: double delta Ct analysis and the relative standard curve method (Pfaffl method). Both methods make assumptions and have their limitations, so the method you should use for your analysis will depend on your experimental design.

The double delta Ct analysis assumes that:

  • there is equal primer efficiency between primer sets (i.e., within 5%);
  • there is near 100% amplification efficiency for both the reference and the target genes;
  • the internal control genes are stably expressed and aren’t affected by the treatment.

The method generally caters to experiments with a large number of DNA samples and a low number of genes to be tested.

The relative standard curve method assumes that:

  • there are equal efficiencies between the control and the treated samples.

This method works better if you have fewer DNA samples but a larger number of genes to test.

What You Need for Double Delta Ct Analysis

  • Ct values for the housekeeping gene: control and experimental conditions;
  • Ct values for the gene of interest: control and experimental conditions;
  • an Excel spreadsheet.

And that’s it! No expensive software required.

Here is a quick summary of the key steps in the double delta Ct analysis (for a detailed explanation, read the Livak and Schmittgen paper listed under Further Reading).

4 Steps for Double Delta Ct Analysis

1.  Take the average of the Ct values for the housekeeping gene and the gene being tested in the experimental and control conditions, returning 4 values. The 4 values are Gene being Tested Experimental (TE), Gene being Tested Control (TC), Housekeeping Gene Experimental (HE), and Housekeeping Gene Control (HC).

2.  Calculate the differences between the experimental values (TE – HE) and the control values (TC – HC). These are your ΔCt values for the experimental (ΔCtE) and control (ΔCtC) conditions, respectively.

3.  Then, calculate the difference between the ΔCt values for the experimental and the control conditions (ΔCtE – ΔCtC) to arrive at the double delta Ct value (ΔΔCt).

4.  Since all calculations are in logarithm base 2 (the PCR product roughly doubles each cycle), twice as much starting DNA lowers the Ct value by 1 rather than halving it. You therefore need to calculate 2^–ΔΔCt to get the expression fold change.
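If you prefer to sanity-check your spreadsheet, the four steps fit in a few lines of Python. The triplicate Ct values below are invented placeholders for your own data:

```python
from statistics import mean

# Ct values from triplicate wells (illustrative numbers, not real data)
target_exp  = [22.1, 22.3, 22.2]   # gene of interest, experimental (TE)
target_ctrl = [24.0, 24.1, 23.9]   # gene of interest, control (TC)
house_exp   = [18.0, 18.1, 17.9]   # housekeeping gene, experimental (HE)
house_ctrl  = [18.1, 18.0, 18.2]   # housekeeping gene, control (HC)

# Step 1: average each group of replicates
te, tc, he, hc = (mean(x) for x in (target_exp, target_ctrl, house_exp, house_ctrl))

# Step 2: delta Ct for each condition
d_ct_exp  = te - he        # ΔCtE
d_ct_ctrl = tc - hc        # ΔCtC

# Step 3: double delta Ct
dd_ct = d_ct_exp - d_ct_ctrl

# Step 4: fold change
fold_change = 2 ** -dd_ct
print(round(fold_change, 2))   # ≈ 3.25-fold upregulation
```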

What Does the Value Mean?

Now that you have your value for fold change, what does it actually mean? This value is the fold change of your gene of interest in the test condition, relative to the control condition, which has all been normalized to your housekeeping gene.

To make it a little clearer, you can think about it as a percentage. A fold change of 1 means there is 100% as much gene expression in your test condition as in your control condition, i.e., no change between the experimental and control groups. A fold-change value above 1 indicates upregulation of the gene of interest relative to the control (a 1.2-fold change = 120% gene expression relative to control, 5 = 500%, 10 = 1,000%, etc.). Values below 1 indicate downregulation relative to the control (a fold change of 0.5 is 50% gene expression relative to control, i.e., half as much expression as in the control, etc.).

You can present these data as fold-change bar charts, graphing the control conditions equal to 1. You can also use statistical analyses to check the significance of the changes, e.g. using an analysis of variance (ANOVA) or t -tests, whatever is appropriate for your experimental set-up!

Using these steps you can conduct your qPCR analysis wherever you are, even if you’re on a road trip. To make things even easier, you can create an Excel template to use each time. Then you will only have to input your data and you will astonish others with your alacrity in conducting analyses!

Originally published July 9, 2016. Reviewed and updated on February 8, 2021.

Further Reading

Livak KJ, Schmittgen TD. Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2^–ΔΔCT Method. Methods. 2001;25:402–8.

51 Comments

It's a good explanation and easy to apply, but there is no fixed rule. For example, some say a fold change of less than one means downregulation and vice versa, with no difference in expression when the fold change equals one. Sometimes people divide one by the 2^–ΔΔCt value, and I think this confuses many researchers. Also, the Livak method applies only when the efficiencies of the GOI and HKG are similar (within 5%), while the ΔCt method can be applied to individual samples and is useful for cell-line applications; the Pfaffl method is used when the efficiencies of the GOI and HKG are not equal or nearly equal. Best regards. I built a separate Excel sheet that works on individual values rather than the averages of experiment and control.

I want to ask why you add the minus sign in 2^–ΔΔCt, i.e., when you calculate =2^(-O4) in Excel. Should you really put the negative there? That's why your fold change is 17.5, which I think is wrong, because if the ΔΔCt is positive the gene of interest is upregulated (the fold change will be larger than 1), and if the ΔΔCt is negative the gene is downregulated (fold change < 1). If you delete the minus sign you get a 0.1 fold change, but I really don't know how to say that a gene was downregulated at 0.1 fold change?

You need to keep the negative sign in 2^–ΔΔCt. If expression is lower than in the control, that equation will return a value below 1.

If the value of the “Expression Fold Change” or “RQ” is below 1, that means you have a negative fold change. To calculate the negative value, you will need to transform the RQ data with this equation in Excel:

=IF(X>=1,X,(1/X)*(-1))

Change “X” to the cell containing your RQ data. In the example Excel sheet it will be cell “P4”, therefore:

=IF(P4>=1,P4,(1/P4)*(-1))

That way, if the number is 1 or >1 nothing changes, but if the number is <1 you will get the negative fold change. In your example, if the value is 0.1 you will get a −10 fold change, and you will be able to say: “My RQ is 0.1, which means we have 10 times lower expression than in our control population.”

For me it is easier to transform the RQ data this way to get a better, more intuitive graphical representation of the data. Hope this clarifies your concerns!

Hi, please can you tell me why a lot of people graph fold change with negative values? For example, in the graph they show a gene downregulated at −10, but when they discuss it they say the gene was upregulated 10-fold. Can you explain? In my opinion, according to your Excel sheet, I can only say the gene was downregulated (−4.13) at a fold change of 17.5; is that the correct way to explain it?

That is not the case with fold change. Fold change goes down toward zero, like 0.1, 0.01, 0.002, 0.000000007, etc.; what goes negative is ΔΔCt, and ΔΔCt graphs arguably make more sense scientifically than fold change. However, because we start the calculation with the subtraction Ct(gene of interest) − Ct(housekeeping gene), the ΔCt value is inversely proportional to the amount of DNA or RNA: the lower the Ct, the greater the amount, so a negative value means upregulation. If instead you subtract Ct(reference gene) − Ct(gene of interest), the ΔCt is directly proportional to the amount of starting material, which makes more sense.

Real-Time Quantitative RT-PCR: Design, Calculations, and Statistics

Two recent letters to the editor of The Plant Cell ( Gutierrez et al., 2008 ; Udvardi et al., 2008 ) highlighted the importance of following correct experimental protocol in quantitative RT-PCR (qRT-PCR). In these letters, the authors outlined measures to allow precise estimation of gene expression by ensuring the quality of material, refining laboratory practice, and using a normalization of relative quantities of transcripts of genes of interest (GOI; also called target genes) where multiple reference genes have been analyzed appropriately. In this letter, we build on the issues raised by considering the statistical design of qRT-PCR experiments, the calculation of normalized gene expression, and the statistical analysis of the subsequent data. This letter comprises advice for taking account of, in particular, the first and the last of these three vital issues. We concentrate on the situation of comparing transcript levels in different sample types (treatments) using relative quantification, but many of the concerns, particularly those with respect to design, are equally applicable to absolute quantification.

STATISTICAL DESIGN

As mentioned by Udvardi et al. (2008) , an experiment ideally should encompass at least three independent biological replicates of each treatment. For each biological replicate, it is common to run at least two technical replicates of each PCR reaction. Each sample provides material for both GOI and reference gene reactions, so these are paired for each biological replicate. Ideally, a full experiment (i.e., all primer pairs for the GOI and reference genes on all samples) would be analyzed on a single (typically 96-well) plate. However, an experiment with many treatments and/or GOIs and reference genes requires a design strategy for multiple plates. Such a design was investigated by Hellemans et al. (2007) where they compared gene maximization to sample maximization; but to enable effective statistical comparison of treatments, a strategy that may be termed “treatment maximization” is required. As a plate can be viewed as a statistical block, the best option would be to separate complete biological replicates (i.e., one biological sample of each treatment) on each plate, so that the design is then a randomized block (see, for example, Mead, 1988 ). Larger experiments may necessitate an unbalanced design (i.e., without a complete replicate of all the treatments on a given plate), which must be constructed so that treatment comparisons of greatest interest are seen most frequently together on the same plate. It is then beneficial to use inter-run calibrators (IRCs) on each plate to improve the assessment of plate-to-plate variation (described in Hellemans et al., 2007 ). When many GOIs and reference genes are analyzed using the same samples, it will be more economical to analyze different genes on different plates (sample maximization). Although this is common practice and even supported by qPCR software packages, from a statistical point of view, it is not correct to separate the paired reactions of the samples. Whichever of the above setups is chosen, it is, in principle, advisable to use a full randomization of the reactions within each plate to counteract the effects of systematic variations occurring within a plate as introduced during the PCR setup or PCR run. However, because within-plate variation introduced during the PCR run should be minimal in the current generation of real-time PCR cyclers, it may now be less of a problem to use a more practical, nonrandom plate setup.
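As a sketch of what randomization within a randomized block design might look like in practice, the following Python snippet lays out one complete biological replicate of every treatment for every primer pair per plate (one block per plate). The treatments, gene names, and toy 12-well plate size are invented for readability; they are not from the letter.

```python
import random

treatments = ["control", "drought", "heat"]   # hypothetical treatments
genes = ["GOI1", "GOI2", "REF1", "REF2"]      # hypothetical targets + references
random.seed(1)                                # reproducible layout

for plate in range(1, 4):                     # 3 plates = 3 biological replicates
    # one sample of each treatment for every primer pair on this plate (block)
    reactions = [(t, g) for t in treatments for g in genes]
    random.shuffle(reactions)                 # randomize positions within the plate
    wells = [f"{row}{col}" for row in "AB" for col in range(1, 7)]
    for well, (treat, gene) in zip(wells, reactions):
        print(f"plate {plate}  {well}: {treat} / {gene}")
```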

CALCULATIONS

The reaction (amplification) curve formed during the PCR run is exponential in its early phase, the progress of this curve being determined by the amplification efficiency, E . The basic formula applying to qRT-PCR aims to convert the number of cycles at a threshold level of fluorescence (more generally termed the quantification cycle or Cq ; Hellemans et al., 2007 ) into a relative quantity of input template present at the start of the PCR. If we take the relative amount of fragments at the Cq as 1, then the relative quantity of template in the original sample ( RQ ) can be calculated as follows:

$$RQ = \frac{1}{E^{Cq}} = E^{-Cq}$$

In the optimal situation, E equals 2, and, in many studies, E is taken arbitrarily as 2. However, in reality, E may vary considerably between primer pairs and between plates. Hence, it is more accurate to estimate E for each primer pair through analysis of a dilution curve or, more commonly, by analyzing the amplification curves of all reactions (e.g., Ramakers et al., 2003 ; Ruijter et al., 2009 ). Usually the error in E estimation from a single reaction will be greater than the real difference in E values between samples for a given primer pair on the same plate. Accordingly, it has been shown that the most precise results are obtained by assuming the same E for all reactions with the same primer pair on the same plate ( Cook et al., 2004 ), calculated as the mean of E values. Nevertheless, variation in E values should be inspected to check for obvious and systematic outliers. Currently, it is advisable to discard such outliers, although progress is being made to improve E estimation for individual reactions (e.g., Alvarez et al., 2007 ; Durtschi et al., 2007 ; Spiess et al., 2008 ). The variation in Cq for technical replicates also should be assessed, and the mean Cq of technical replicates then can be used in subsequent calculations. After calculating the RQ of a GOI, this needs to be normalized for the total amount of cDNA that was used in the reaction, as discussed by Gutierrez et al. (2008) . This provides the normalized RQ ( NRQ ) of the GOI for each biological replicate.
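A minimal numeric sketch of these calculations (plate-mean efficiency per primer pair, mean Cq of technical replicates, and normalization to a single reference gene; all values invented):

```python
from statistics import mean

def rq(cq, efficiency):
    """Relative quantity of starting template: RQ = E**(-Cq)."""
    return efficiency ** -cq

e_goi, e_ref = 1.92, 1.98          # plate-mean E per primer pair (illustrative)
cq_goi = mean([24.1, 24.3])        # mean Cq of GOI technical replicates
cq_ref = mean([18.0, 18.2])        # mean Cq of reference-gene replicates

# normalized relative quantity (NRQ) of the GOI for this biological replicate
nrq = rq(cq_goi, e_goi) / rq(cq_ref, e_ref)
print(f"NRQ = {nrq:.3e}")
```

With several reference genes, a common choice (as in qBase-style frameworks such as Hellemans et al., 2007) is to use the geometric mean of the reference-gene RQs as the normalization factor.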

Before meaningful statistical analysis can be performed, the NRQ data need transformation. Specifically, on the RQ and NRQ scale, qPCR data are nonlinear and typically suffer from heterogeneity of variance across biological replicates within treatments and across treatments. This usually can be accounted for by applying a log transformation ( Gomez and Gomez, 1984 ) to the NRQ data, the result of which may be termed Cq′ (as it brings the data back to the Cq scale):

$$Cq' = -\log_2(NRQ)$$

Following a single-plate experiment or a balanced-design experiment across a number of plates, analysis of variance (ANOVA) can be used to compare treatments using the Cq′ values calculated above. This reduces to a t test if there are only two treatments run on a single plate. One benefit of applying ANOVA is that if the treatment structure consists of two or more treatment factors (for example, three genotypes as one factor by two environmental conditions as the other factor), the method can assess the variation due to each of these (as main effects) and then the interaction between them. Also, ANOVA automatically accounts for block effects such as interplate variation. Performed on the Cq′ values of the biological replicates, the mean for each treatment is output and used to make a statistical comparison of the treatments based on the standard error of the difference (SED) between means using the estimate of random variation at the level of biological replication. Hence, from the SED, the least significant difference (LSD) can be calculated at a particular level of significance (e.g., 5%) and used to compare treatments of a priori biological importance from the full set of possible comparisons.
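For a balanced single-plate experiment, the transformation and test can be reproduced with standard tools. Here is a sketch using scipy with invented NRQ values; note that this simple one-way ANOVA ignores block (plate) effects, which a full randomized-block analysis would include.

```python
import math
from scipy import stats

# NRQ per biological replicate (invented values), three treatments
nrq = {
    "control":    [1.00, 1.20, 0.85],
    "treatment1": [2.10, 2.60, 1.90],
    "treatment2": [0.45, 0.50, 0.60],
}

# log transformation back to the Cq scale: Cq' = -log2(NRQ)
cq_prime = {t: [-math.log2(v) for v in vals] for t, vals in nrq.items()}

# one-way ANOVA on the transformed values
f_stat, p_value = stats.f_oneway(*cq_prime.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# stats.kruskal(*cq_prime.values()) would give the non-parametric equivalent
```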

Data ( Cq′ values) from an unbalanced design where not all treatments may occur together on plates precludes the use of ANOVA and requires modeling to estimate the means one would have expected if the design had been balanced. Such modeling employs the method of residual maximum likelihood ( Patterson and Thompson, 1971 ) to provide the means and appropriate SEDs for comparison of any pair of treatments. The relative size of these SEDs reflects degree of imbalance, so treatments never seen on the same plate will be compared with the greatest SED/LSD. Repeating an identical sample on all plates (i.e., using IRCs) will help in the assessment of plate-to-plate variation and hence in controlling the size of SEDs in the analysis of unbalanced designs. In some cases, variances may still be heterogeneous after log transformation (e.g., in experiments that include samples with a very low transcript level [high Cq ], which inherently have a higher error). The influence of such samples could be inspected by inclusion/omission from the statistical analysis. Alternatively, nonparametric tests, such as Friedman's ANOVA (which accounts for block effects), Kruskal-Wallis ANOVA (which does not account for block effects), or the Mann-Whitney test in the situation of only two treatments (in one block) can be applied (note that all these tests require a balanced design).

It is possible to process and analyze qRT-PCR data using standard database/spreadsheet and statistics software such as R (freely available), SAS, and GenStat. Alternatively, dedicated qRT-PCR analysis software packages are available (both commercially and as freeware), although careful checking is required to determine whether they are tailored for the experimental setup (e.g., whether they can handle multiple levels of replication [i.e., biological and technical] and whether they can perform an appropriate statistical analysis of the data). An up-to-date overview of qPCR software can be found at http://www.gene-quantification.info or in Pfaffl et al. (2009) .

PRESENTATION

Although NRQ data should not be used for inferential statistics (the analyses/comparisons are done on a different scale; see above), the mean NRQ and corresponding standard error for each treatment, as calculated from the replicate NRQ observations, are commonly used (graphically) to represent qRT-PCR results. When using a randomized block design, the NRQ s may first be corrected to account for block effects (e.g., plate-to-plate variation). To do this, it is important to note that the block effects are not additive on the NRQ scale (hence, the transformation to Cq′ that was required). For a balanced randomized block design, the block effects can be taken out manually by subtracting the mean Cq′ of a given block from each individual Cq′ in that same block. In the case of an unbalanced design, it is not possible to take out the block effects manually using block means, but the Cq′ values can be corrected at least partially by subtracting from each individual Cq′ in a block the Cq′ of an IRC in the block. NRQ data for presentation then may be calculated by back-transforming the corrected Cq′ values to the NRQ scale. Usually the mean (corrected) NRQ s and corresponding standard errors are presented all rescaled by the same quantity for convenient display (e.g., such that the rescaled mean NRQ of one of the treatments equals 1). In cases where all treatments are only compared with a common control (or calibrator) sample, it may be appropriate to calculate and present only the ratios between the mean NRQ s of the treatments and the control. If this is done, the standard error of the ratios, using the standard errors of the mean NRQs, should be correctly calculated as given below:

$$SE_{ratio} = \frac{\overline{NRQ}_{treatment}}{\overline{NRQ}_{control}} \times \sqrt{\left(\frac{SE_{treatment}}{\overline{NRQ}_{treatment}}\right)^2 + \left(\frac{SE_{control}}{\overline{NRQ}_{control}}\right)^2}$$
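As a quick worked example of this error propagation (all numbers invented): a treatment mean NRQ of 2.4 ± 0.3 against a control of 1.0 ± 0.1 gives a ratio of 2.4 with a standard error of about 0.38.

```python
import math

# Illustrative means and standard errors (not from the letter)
nrq_trt, se_trt = 2.4, 0.3
nrq_ctl, se_ctl = 1.0, 0.1

ratio = nrq_trt / nrq_ctl
se_ratio = ratio * math.sqrt((se_trt / nrq_trt) ** 2 + (se_ctl / nrq_ctl) ** 2)
print(ratio, round(se_ratio, 3))   # 2.4 ± ~0.384
```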

To make a valid comparison of treatments in qRT-PCR experiments, it is essential to begin with a statistical design that incorporates the concepts of randomization, blocking, and adequate (biological) replication. Subsequently, data analysis will benefit from appropriate data transformation and proper accounting of sources of variation due to the experimental design prior to making a statistical assessment of differences between treatments. Finally, it should be kept in mind that the strategy of relative quantification, as dealt with here, is only suitable for comparing results from a given primer pair between treatments and not for comparing results obtained with different primer pairs to each other.

www.plantcell.org/cgi/doi/10.1105/tpc.109.066001

  • Alvarez, M., Vila-Ortiz, G., Salibe, M., Podhajcer, O., and Pitossi, F. (2007). Model based analysis of real-time PCR data from DNA binding dye protocols. BMC Bioinformatics 8 85.
  • Cook, P., Fu, C., Hickey, M., Han, E.-S., and Miller, K. (2004). SAS programs for real-time RT-PCR having multiple independent samples. Bioinformatics 37 990–995.
  • Durtschi, J., Stevenson, J., Hymas, W., and Voelkerding, K. (2007). Evaluation of quantification methods for real-time PCR minor groove binding hybridization probe assays. Anal. Biochem. 361 55–64.
  • Gomez, K.A., and Gomez, A.A. (1984). Statistical Procedures for Agricultural Research, 2nd ed. (Chichester, UK: John Wiley and Sons).
  • Gutierrez, L., Mauriat, M., Pelloux, J., Bellini, C., and Van Wuytswinkel, O. (2008). Towards a systematic validation of references in real-time RT-PCR. Plant Cell 20 1734–1735.
  • Hellemans, J., Mortier, G., De Paepe, A., Speleman, F., and Vandesompele, J. (2007). qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol. 8 R19.
  • Mead, R. (1988). The Design of Experiments: Statistical Principles for Practical Application. (Cambridge, UK: Cambridge University Press).
  • Patterson, H., and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58 545–554.
  • Pfaffl, M.W., Vandesompele, J., and Kubista, M. (2009). Data analysis software. In Real-Time PCR: Current Technology and Applications, J. Logan, K. Edwards, and N. Saunders, eds (Norwich, UK: Caister Academic Press), pp. 65–83.
  • Ramakers, C., Ruijter, J., Lekanne-Deprez, R., and Moorman, A. (2003). Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci. Lett. 339 62–69.
  • Ruijter, J.M., Ramakers, C., Hoogaars, W.M.H., Karlen, Y., Bakker, O., van den Hoff, M.J.B., and Moorman, A.F.M. (February 22, 2009). Amplification efficiency: Linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. http://dx.doi.org/10.1093/nar/gkp045
  • Spiess, A.-N., Feig, C., and Ritz, C. (2008). Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry. BMC Bioinformatics 9 221.
  • Udvardi, M., Czechowski, T., and Scheible, W.-R. (2008). Eleven golden rules of quantitative RT-PCR. Plant Cell 20 1736–1737.
