A suite of MATLAB-based computational tools for automated analysis of COPAS Biosort data
 
Elizabeth Morton and Todd Lamitina









Since ReFLX files offer analysis challenges and opportunities not present in single-sample data collection modes, we implemented several additional features for ReFLX file analysis that are common to high-throughput multiwell-based RNAi screening. First, the mean of a user-selected parameter from each well is plotted in an 8 × 12 matrix heat map that is color-coded by well value (Figure 3B). This visualization strategy is a useful way to compare the data across a plate and often helps in the identification of plate edge effects, a common confounder in high-throughput RNAi screening (11). Second, instead of normalizing to a single negative control sample (as we do for single-sample data analysis), COPAmulti takes advantage of the large number of samples and uses the plate mean (calculated from the middle 80% of nonzero sample values to remove effects of outliers) as the negative control value. This approach is a well-accepted data normalization strategy for multiwell plate assays that can be uniformly applied across all plates (11). In addition to this normalization strategy, we also implemented a second approach (COPAmulti V2) that allows users to define the well(s) that contain negative control data through the COPAmulti GUI (Figure 4B). Using these calculated negative control reference values, we implement three common statistical tests for hit identification that have been previously utilized in RNAi screening formats: (i) mean ± k SD; (ii) median ± k MAD; and (iii) the multiple-comparison t-test with Bonferroni correction. The specific significance test and the threshold for each test are set within the user-adjustable GUI. Each test has specific strengths and weaknesses and in some cases may not represent the best statistical approach for data analysis. 
Nonetheless, these methods are among the most commonly used approaches for analysis of high-throughput RNAi screening data (11), and the best approach is usually to compare results obtained with each statistical method. In general, the mean ± k SD test is the most commonly used hit identification technique for RNAi screening, due to its ease of calculation (12,13). Most screeners utilize a 3-SD cutoff with this approach. However, this method is sensitive to outlier data and frequently misses weaker positives. Decreasing the SD cutoff usually increases false positives to an unacceptably high rate. An alternative approach is the median ± k MAD test. Like the mean ± k SD test, the MAD test is relatively easy to calculate, but it is much less sensitive to outlier data. MAD also does a good job of identifying weak hits while controlling false positives (14). A shortcoming of MAD is that it is not easily linked to probability distributions and P values. Despite this shortcoming, others have recommended MAD as the method of choice for hit selection in high-throughput RNAi screens (14). MAD cutoffs of k ≥ 2 are commonly used for hit identification in genome-wide RNAi screens (14). A final common statistical test for RNAi screening is the multiple-comparison t-test. This statistic is easy to calculate (due to the large number of events in each well), but is extremely sensitive to outliers and requires multiple-comparison correction (11). For multiple-comparison t-tests, the simplest form of correction is the Bonferroni correction, which divides the desired P value threshold by the number of samples compared to obtain an equivalent multiple-comparison P value threshold. A table of Bonferroni-corrected P values for common thresholds is listed in Table 1. In general, users should analyze their data with each statistical approach and utilize the method or combination of methods that most frequently identifies known positive controls. 
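To make the trade-offs among these tests concrete, the following is a minimal sketch in Python (not the authors' MATLAB code) of the mean ± k SD rule, the median ± k MAD rule, and the Bonferroni-corrected threshold. The function names, the use of the unscaled MAD, and the example well values are all assumptions made for illustration only.

```python
import statistics


def hits_mean_sd(values, k=3.0):
    """Flag each value lying more than k sample SDs from the mean."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [abs(v - center) > k * spread for v in values]


def hits_median_mad(values, k=2.0):
    """Flag each value lying more than k MADs from the median.

    Assumption: the raw (unscaled) MAD is used; some analyses multiply
    the MAD by 1.4826 to make it comparable to an SD.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return [abs(v - center) > k * mad for v in values]


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison P-value cutoff: the desired alpha divided by the
    number of comparisons."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Hypothetical well values: nine similar wells and one extreme well.
wells = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
sd_hits = hits_mean_sd(wells, k=3.0)
mad_hits = hits_median_mad(wells, k=2.0)
cutoff_96 = bonferroni_threshold(0.05, 96)  # one comparison per well
```

In this toy data set the single extreme well inflates the SD enough that the 3-SD rule flags nothing, whereas the 2-MAD rule flags only that well, which is the outlier sensitivity described above.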
A major advantage of our software is that it allows users to rapidly adjust and test each of these statistical methods for hit identification through the simple GUI. For users who wish to perform statistical analysis of their data using other approaches, COPAmulti automatically exports both summarized and raw data to delimited text files for further analysis.
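For readers reanalyzing exported well summaries outside the GUI, the plate-mean normalization described earlier in this section might be approximated as follows. This is a sketch under stated assumptions, not the COPAmulti implementation: it assumes that the central 80% of nonzero values means discarding the lowest 10% and highest 10% of nonzero wells, and the function names and example values are hypothetical.

```python
import statistics


def plate_reference(well_means, trim_percent_per_tail=10):
    """Mean of the central nonzero well values, used as the plate-level
    negative control reference.

    Assumption: zero-valued wells are treated as empty and excluded, then
    trim_percent_per_tail percent of the remaining wells is dropped from
    each end of the sorted values (10% per tail keeps the central 80%).
    """
    nonzero = sorted(v for v in well_means if v != 0)
    if not nonzero:
        raise ValueError("plate contains no nonzero wells")
    n_trim = len(nonzero) * trim_percent_per_tail // 100
    core = nonzero[n_trim:len(nonzero) - n_trim]
    return statistics.mean(core)


def normalize_plate(well_means, trim_percent_per_tail=10):
    """Express every well relative to the plate reference value."""
    reference = plate_reference(well_means, trim_percent_per_tail)
    return [v / reference for v in well_means]


# Hypothetical plate fragment: one empty well (0) and one extreme well (30).
wells = [0, 10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
reference = plate_reference(wells)
normalized = normalize_plate(wells)
```

Integer arithmetic is used for the trim count so that, for example, ten nonzero wells always lose exactly one value from each tail rather than depending on floating-point rounding.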

Following statistical analysis, hits meeting user-determined thresholds are binarized in an 8 × 12 matrix, with hits plotted in white and non-hits plotted in black (Figure 3C). We also visualize all data from all plates using a well index plot (Figure 3D). Such plots are useful indicators of screen phenotypic behavior among plates and can help identify plates with phenotypic drift or substantial variance. For example, data in Figure 3 demonstrate lower values toward the end of the plate as compared with the beginning of the plate. Finally, since some users may screen in duplicate, we implemented a separate algorithm, COPAcompare, that allows users to compare results between two plates (Figure 5). COPAcompare plots a user-selected parameter for each well between two user-selected plates. The degree of overall plate-to-plate correlation is determined by calculating the Pearson correlation coefficient (R), where an R value of 1 indicates perfect correlation among all wells and −1 indicates perfect opposite correlation among all wells.
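The plate-to-plate comparison can be illustrated with a short Python sketch of the Pearson correlation coefficient computed over matched wells. This is not the COPAcompare code; the function name and the example values are hypothetical, and the two input lists are assumed to hold the same parameter for the same wells in the same order.

```python
import math


def pearson_r(plate_a, plate_b):
    """Pearson correlation coefficient between matched wells of two plates."""
    if len(plate_a) != len(plate_b):
        raise ValueError("plates must contain the same number of wells")
    if len(plate_a) < 2:
        raise ValueError("at least two wells are required")
    mean_a = sum(plate_a) / len(plate_a)
    mean_b = sum(plate_b) / len(plate_b)
    # Sums of cross-products and squared deviations from each plate mean.
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(plate_a, plate_b))
    s_aa = sum((a - mean_a) ** 2 for a in plate_a)
    s_bb = sum((b - mean_b) ** 2 for b in plate_b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("correlation is undefined when a plate has no variance")
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical replicate plates (four wells each, for brevity).
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.0, 4.0, 6.0, 8.0]
r_same = pearson_r(replicate_1, replicate_2)
r_opposite = pearson_r(replicate_1, list(reversed(replicate_2)))
```

Because R measures linear association rather than agreement in absolute value, two replicate plates that differ by a constant scale factor still give R close to 1.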





We developed a suite of MATLAB-based programs to process large COPAS data sets such as those associated with C. elegans RNAi screens. We implemented one program, COPAquant, for comparisons among data collected in the single-sample format, which is useful for small-scale screens with larger populations. We also implemented two additional programs, COPAmulti and COPAcompare, that provide more advanced filtering, normalization, and statistical analysis of data from 96-well plates obtained using the COPAS ReFLX system. Both programs allow users to rapidly move from raw COPAS data to graphical data representation, replicate plate comparison, and hit identification without extensive knowledge of or experience with the programming environment. Our software greatly simplifies the analysis of COPAS data and addresses a major need for data analysis tools for high-throughput screening using this platform. While we used this program in the validation steps of an RNAi screen for regulators of a heat shock–inducible reporter in C. elegans, the program is customized to the standard data format output by COPAS Biosort instruments and thus can be used in any type of COPAS application, including data obtained from other organisms.

Acknowledgments

This paper was supported by a grant from the National Institutes of Health (NIH; grant no. 1R01AA017580 to T.L.) and an NIH training grant in Cell and Molecular Biology [no. T32 GM-07229 to E.M. (principal investigators Richard Schultz and Marisa Bartolomei)]. The authors wish to thank Hernan Garcia and Michael Springer for helpful training in MATLAB-based analysis, Weon Bae and Rock Pulak for providing sample ReFLX data, and four anonymous reviewers for suggestions that greatly improved the manuscript. This paper is subject to the NIH Public Access Policy.

Competing interests

The authors declare no competing interests.

Correspondence
Address correspondence to Todd Lamitina, University of Pennsylvania, Department of Physiology, Richards Research Building A700, 3700 Hamilton Walk, Philadelphia, PA 19104, USA. e-mail: [email protected]

References
1.) Smith, M.V., W.A. Boyd, G.E. Kissling, J.R. Rice, D.W. Snyder, C.J. Portier, and J.H. Freedman. 2009. A discrete time model for the analysis of medium-throughput C. elegans growth data. PLoS One 4:e7018.

2.) Boyd, W.A., M.V. Smith, G.E. Kissling, J.R. Rice, D.W. Snyder, C.J. Portier, and J.H. Freedman. 2009. Application of a mathematical model to describe the effects of chlorpyrifos on Caenorhabditis elegans development. PLoS One 4:e7024.

3.) Boyd, W.A., M.V. Smith, G.E. Kissling, and J.H. Freedman. 2009. Medium- and high-throughput screening of neurotoxicants using C. elegans. Neurotoxicol. Teratol. 32:68-73.

4.) Sprando, R.L., N. Olejnik, H.N. Cinar, and M. Ferguson. 2009. A method to rank order water soluble compounds according to their toxicity using Caenorhabditis elegans, a Complex Object Parametric Analyzer and Sorter, and axenic liquid media. Food Chem. Toxicol. 47:722-728.

5.) Doitsidou, M., N. Flames, A.C. Lee, A. Boyanov, and O. Hobert. 2008. Automated screening for mutants affecting dopaminergic-neuron specification in C. elegans. Nat. Methods 5:869-872.

6.) Burns, A.R., T.C.Y. Kwok, A. Howard, E. Houston, K. Johanson, A. Chan, S.R. Cutler, P. McCourt, and P.J. Roy. 2006. High-throughput screening of small molecules for bioactivity and target identification in Caenorhabditis elegans. Nat. Protocols 1:1906-1914.

7.) Lamitina, T., C.G. Huang, and K. Strange. 2006. Genome-wide RNAi screening identifies protein damage as a regulator of osmoprotective gene expression. Proc. Natl. Acad. Sci. USA 103:12173-12178.

8.) Pujol, N., O. Zugasti, D. Wong, C. Couillault, C.L. Kurz, H. Schulenburg, and J.J. Ewbank. 2008. Antifungal innate immunity in C. elegans is enhanced by evolutionary diversification of antimicrobial peptides. PLoS Pathog. 4:e1000105.

9.) Rea, S.L., D. Wu, J.R. Cypser, J.W. Vaupel, and T.E. Johnson. 2005. A stress-sensitive reporter predicts longevity in isogenic populations of Caenorhabditis elegans. Nat. Genet. 37:894-898.

10.) Link, C.D., J.R. Cypser, C.J. Johnson, and T.E. Johnson. 1999. Direct observation of stress response in Caenorhabditis elegans using a reporter transgene. Cell Stress Chaperones 4:235-242.

11.) Birmingham, A., L.M. Selfors, T. Forster, D. Wrobel, C.J. Kennedy, E. Shanks, J. Santoyo-Lopez, D.J. Dunican. 2009. Statistical methods for analysis of high-throughput RNA interference screens. Nat. Methods 6:569-575.

12.) Bard, F., L. Casano, A. Mallabiabarrena, E. Wallace, K. Saito, H. Kitayama, G. Guizzunti, Y. Hu. 2006. Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature 439:604-607.

13.) DasGupta, R., A. Kaykas, R.T. Moon, and N. Perrimon. 2005. Functional genomic analysis of the Wnt-wingless signaling pathway. Science 308:826-833.

14.) Chung, N., X.D. Zhang, A. Kreamer, L. Locco, P.F. Kuan, S. Bartz, P.S. Linsley, M. Ferrer, and B. Strulovici. 2008. Median absolute deviation to improve hit selection for genome-scale RNAi screens. J. Biomol. Screen. 13:149-158.
