A suite of MATLAB-based computational tools for automated analysis of COPAS Biosort data
 
Elizabeth Morton and Todd Lamitina
Department of Physiology, University of Pennsylvania, Philadelphia, PA, USA
BioTechniques, Vol. 48, No. 6, June 2010, pp. xxv–xxx
Abstract

Complex Object Parametric Analyzer and Sorter (COPAS) devices are large-object, fluorescence-capable flow cytometers used for high-throughput analysis of live model organisms, including Drosophila melanogaster, Caenorhabditis elegans, and zebrafish. The COPAS is especially useful in C. elegans high-throughput genome-wide RNA interference (RNAi) screens that utilize fluorescent reporters. However, analysis of data from such screens is relatively labor-intensive and time-consuming. Currently, there are no computational tools available to facilitate high-throughput analysis of COPAS data. We used MATLAB to develop algorithms (COPAquant, COPAmulti, and COPAcompare) to analyze different types of COPAS data. COPAquant reads single-sample files, filters and extracts values and value ratios for each file, and then returns a summary of the data. COPAmulti reads 96-well autosampling files generated with the ReFLX adapter, performs sample filtering, graphs features across both wells and plates, computes common statistical measures for hit identification, and outputs results in graphical formats. COPAcompare performs a correlation analysis between replicate 96-well plates. For many parameters, thresholds may be defined through a simple graphical user interface (GUI), allowing our algorithms to be adapted to a variety of screening applications. In a screen for regulators of stress-inducible GFP expression, COPAquant dramatically accelerated data analysis and allowed us to rapidly move from raw data to hit identification. Because the COPAS file structure is standardized, our algorithms should be useful for analysis of COPAS data from multiple platforms and organisms. The MATLAB code is freely available at our web site (www.med.upenn.edu/lamitinalab/downloads.shtml).

Introduction

Automation has been a great boon to the field of high-throughput screening. The Complex Object Parametric Analyzer and Sorter (COPAS) platform (Union Biometrica, Holliston, MA, USA) is a tool that allows for rapid quantification of the fluorescence, size, and optical density of small biological specimens, such as Caenorhabditis elegans, Drosophila, and zebrafish. The COPAS utilizes microfluidic approaches to draw intact live organisms through a fluorescence-compatible flow cell at extremely high rates (~50 animals per second) and quantifies the size [measured as object time-of-flight (TOF)], object optical density (EXT), and fluorescence emissions from up to three separate fluorescent channels for each animal. Because of its complete optical transparency, rapid growth rates, and amenability to forward and reverse genetic approaches, C. elegans is an excellent model system for COPAS-based high-throughput phenotypic and genetic studies (1,2,3,4,5,6,7). In many cases, these studies are enabled by the expression of fluorescent reporter transgenes (5,7,8), which often exhibit significant animal-to-animal variability. Because of this inherent variability in reporter expression, quantification of fluorescence by the COPAS within a population of animals is a more accurate phenotypic assessment than subjective visual inspection of individual animals (8). While the COPAS excels at the rapid collection of population-based data, the number of individual samples analyzed during a large-scale screen can easily reach into the thousands. Efficient analysis of such large COPAS data sets requires the use of automated computational tools, which have so far not been developed.

Currently, the COPAS can collect data in two modes: a single-sample mode and an autosampler 96-well mode. The single-sample mode permits very large sample sizes to be analyzed, which is a tremendous advantage for assaying highly variable or subtle phenotypes. However, because samples must be loaded one at a time into the sample chamber, this mode is slow, labor-intensive, and best suited to small-scale screens. The autosampler mode, enabled by the ReFLX adapter system, allows rapid analysis of liquid-based samples from 96-well plates, which provides tremendous sample throughput. However, the small volumes of 96-well assays limit the number of events per well to sample sizes much smaller than those obtained in the single-sample mode, making the autosampler mode well suited to large-scale genome-wide RNA interference (RNAi) or drug screens that utilize phenotypes of low variability. In the single-sample mode, each file contains the data from one sample. In the autosampler 96-well mode, each file contains the data from every well within a 96-well plate, classified according to well address. In both cases, the time required to filter, extract, and normalize the data; graph the summary results of the screen; compare results among plates; and statistically identify hits is a major rate-limiting step in the screening pipeline. Tools that facilitate the analysis of such large-scale data sets would tremendously advance the throughput capability of COPAS-based assays. Such tools are currently unavailable.
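
The per-well workflow described above (classify events by well address, filter them, and normalize the fluorescence) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' MATLAB code. The event layout (well address, TOF, green fluorescence), the TOF window, and the function name `summarize_wells` are all hypothetical assumptions, and the values are made up.

```python
from collections import defaultdict
from statistics import mean


def summarize_wells(events, tof_min, tof_max):
    """Group events by well, keep those inside the TOF window,
    and return the event count and mean fluorescence/TOF ratio per well."""
    ratios_by_well = defaultdict(list)
    for well, tof, green in events:
        # Size filter: discard debris or clumps outside the assumed TOF window.
        if tof_min <= tof <= tof_max:
            # Normalize fluorescence to object size (TOF).
            ratios_by_well[well].append(green / tof)
    return {
        well: {"n": len(ratios), "mean_ratio": mean(ratios)}
        for well, ratios in ratios_by_well.items()
    }


# Hypothetical events: (well address, TOF, green fluorescence).
events = [
    ("A1", 200.0, 50.0),
    ("A1", 400.0, 200.0),
    ("A1", 20.0, 90.0),   # below the TOF window, filtered out
    ("A2", 300.0, 30.0),
]

summary = summarize_wells(events, tof_min=100.0, tof_max=800.0)
for well, stats in sorted(summary.items()):
    print(well, stats["n"], round(stats["mean_ratio"], 3))
```

In a real screen, the thresholds would be chosen for the developmental stage and reporter in use; the window here is arbitrary.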

Many software environments are suitable for the analysis of large-scale COPAS data sets, including R, SAS, Microsoft Excel, and Visual Basic. Another suitable environment is MATLAB (MathWorks, Natick, MA, USA). MATLAB is a numerical computing environment and programming language designed for analysis of matrix-based data sets, and it is typically applied to the automation and standardization of image analysis routines. However, MATLAB can just as easily be applied to any type of numerical data presented in a matrix format. Since the COPAS data file structure is a standardized 26 × n matrix worksheet (where n is the number of events sorted), we reasoned that COPAS-generated data could be analyzed in the MATLAB environment. While analysis of COPAS data is possible in the other environments listed above, MATLAB offers several significant advantages. First, MATLAB is an interpreted language, making it very easy to learn, use, and modify. It is compatible with many different operating systems (Windows, Linux, Macintosh, etc.) and is therefore accessible to almost all users, regardless of platform. Second, MATLAB can receive user input through custom graphical user interfaces (GUIs); end-users need not have any experience with MATLAB to execute prewritten MATLAB functions. Third, MATLAB provides a library of common data handling methods and statistical tools, and results can be visualized in highly flexible ways using the plotting and imaging commands integrated within the program. Such commands must often be written de novo in other programming languages. Because MATLAB is designed for science and engineering applications, this library is tailored for analysis of scientific data. Finally, MATLAB is widely used throughout the biomedical research community, providing access to a strong user base for teaching, implementation, and code sharing. These advantages strongly support the use of MATLAB as the software of choice for analysis of COPAS data sets.
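
To show why a standardized matrix worksheet lends itself to automated analysis, the sketch below reads a small tab-delimited table into named columns and summarizes each one. It is written in Python for illustration and is not the authors' MATLAB implementation. The three column names (`TOF`, `EXT`, `Green`), the tab delimiter, and the values are assumptions; a real COPAS file contains 26 fields whose layout is not given here.

```python
import csv
import io
from statistics import mean, median

# Hypothetical excerpt of an event table: one row per sorted object.
SAMPLE = (
    "TOF\tEXT\tGreen\n"
    "200\t150\t50\n"
    "400\t310\t200\n"
    "300\t240\t30\n"
)


def load_columns(text):
    """Parse tab-delimited text into a dict mapping column name to a list of floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


cols = load_columns(SAMPLE)

# Because every file shares the same layout, the same summary applies to any file.
stats = {name: (mean(values), median(values)) for name, values in cols.items()}
for name, (avg, med) in stats.items():
    print(f"{name}: mean={avg:.1f} median={med:.1f}")
```

Addressing columns by name rather than by position keeps the sketch independent of the actual field order in the file.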
