A suite of MATLAB-based computational tools for automated analysis of COPAS Biosort data
 
Elizabeth Morton and Todd Lamitina

To facilitate analysis of the numerous COPAS data files generated by our RNAi screen, we wrote an algorithm in the MATLAB programming environment that automatically extracts the desired values from COPAS *.txt data files, one file per RNAi condition (Table 2). The COPAS exports data in a 26-column format in which each row represents a single worm. The core function of our COPAquant algorithm, COPASFun, imports the numerical values from a COPAS data file. After import, COPAquant asks the user whether the data should be filtered on gating criteria, which are user-defined combinations of COPAS parameters (TOF, EXT, Ch1, Ch2, and Ch3) set during data acquisition. COPAquant can be instructed to analyze gated data only, nongated data only, or all data. Using our hsp-16p::GFP screen data as an example, we chose to extract gated values for TOF, EXT, and each of the three fluorescence channels. Because COPAS-measured GFP fluorescence is related to object size (unpublished data), COPASFun can correct for this bias by normalizing each fluorescence value to the object's TOF, a direct measure of object size. These ratios (Ch1/TOF, Ch2/TOF, and Ch3/TOF) are entered into new columns. The columns for the values of interest (TOF, EXT, Ch1, Ch2, and Ch3, together with their associated ratios) are then summarized by their mean and SD. In the current screen for hsp-16p::GFP regulators, meaningful yellow (Ch2) and red (Ch3) data were not obtained, because this strain does not express reporters in either of these channels. These statistics, together with the number of events in the sample (n), are then exported to the function COPASImp (Figure 2A).
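The per-file steps performed by COPASFun (import, optional gating, size normalization, and mean/SD summary) can be sketched as follows. The authors' implementation is in MATLAB; this Python version is only an illustration, and the column positions, the gate-flag column, and the function name summarize_copas are hypothetical assumptions, not the actual COPAS export layout.

```python
import statistics

# Hypothetical 0-based column positions in the 26-column COPAS export;
# check them against the header of your own files before use.
COLUMNS = {"TOF": 1, "EXT": 2, "Ch1": 3, "Ch2": 4, "Ch3": 5}
GATE_COL = 6  # hypothetical flag: 1 = event inside the gate, 0 = outside


def summarize_copas(lines, mode="gated"):
    """Return n plus (mean, SD) for each parameter and each Ch/TOF ratio."""
    rows = []
    for line in lines:
        try:
            rows.append([float(x) for x in line.rstrip("\n").split("\t")])
        except ValueError:
            continue  # skip header or other non-numeric lines
    if mode == "gated":
        rows = [r for r in rows if r[GATE_COL] == 1]
    elif mode == "nongated":
        rows = [r for r in rows if r[GATE_COL] == 0]
    elif mode != "all":
        raise ValueError("mode must be 'gated', 'nongated', or 'all'")
    # Drop events with no size reading so the ratios below are defined.
    rows = [r for r in rows if r[COLUMNS["TOF"]] > 0]
    if not rows:
        return {"n": 0}
    values = {name: [r[i] for r in rows] for name, i in COLUMNS.items()}
    for ch in ("Ch1", "Ch2", "Ch3"):
        # Size correction: divide each event's fluorescence by its TOF.
        values[ch + "/TOF"] = [c / t for c, t in zip(values[ch], values["TOF"])]
    summary = {"n": len(rows)}
    for name, vals in values.items():
        sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
        summary[name] = (statistics.mean(vals), sd)
    return summary
```

Passing mode="all" or mode="nongated" corresponds to the other two analysis choices offered by COPAquant; the SD of a single event is reported as 0 rather than raising an error.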

The COPASImp function sends multiple COPAS *.txt files to COPASFun for analysis. Once the MATLAB working directory is set to the appropriate folder, COPASImp recognizes and reads every *.txt file in that folder (Figure 2A). After all files have been analyzed, the results are presented in a table titled Results (which is automatically saved as the tab-delimited text file Results.txt for analysis outside MATLAB) and in a structure named ImStruc, in which each cell contains the results for one sample. COPASImp then asks the user which parameter should be represented graphically, and the selected parameter is plotted and displayed (Figure 2B).
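The batch step performed by COPASImp (read every *.txt file in a folder, summarize each one, and save a tab-delimited Results.txt) might look roughly like the Python sketch below. It is not the authors' MATLAB code: analyze_folder is a hypothetical name, and the per-file summarizer is passed in as a callable that is assumed to return a dict of scalar results.

```python
import csv
from pathlib import Path


def analyze_folder(folder, summarize, out_name="Results.txt"):
    """Run `summarize` on every *.txt file in `folder` and write one
    tab-delimited row per file; returns {sample name: result dict}."""
    folder = Path(folder)
    results = {}
    for path in sorted(folder.glob("*.txt")):
        if path.name == out_name:
            continue  # don't re-read our own output on a second run
        with path.open() as fh:
            results[path.stem] = summarize(fh.readlines())
    if results:
        # Union of all result keys becomes the column set of the table.
        fields = sorted({k for r in results.values() for k in r})
        with (folder / out_name).open("w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["sample"] + fields)
            for sample, r in results.items():
                writer.writerow([sample] + [r.get(k, "") for k in fields])
    return results
```

Here sorted() orders file names lexicographically, so "10.txt" comes before "2.txt"; zero-padded file names keep the order intuitive if file order matters.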

In addition to the size normalization described above, COPAquant V2 also normalizes all samples to a negative control sample to produce a relative fold-change value (Table 2). The program presents data both in raw form (Figure 2, B and C) and in various normalized forms (Figure 2, D and E), using the lowest-numbered file as the negative control reference. For each parameter, the mean of the reference sample is calculated, and each event in the subsequent samples is divided by this value, creating a new column of normalized values. The means and SDs of the normalized values are exported back to COPASImp (Figure 2, D and E).
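The fold-change normalization reduces to dividing every event by the control sample's mean and then summarizing the normalized events. The Python sketch below illustrates this for one parameter; the function name normalize_to_control is hypothetical, and sample names are assumed to be integers so that "lowest-numbered" means numerically lowest.

```python
import statistics


def normalize_to_control(samples, control=None):
    """Fold-change normalization against a negative-control sample.

    `samples` maps sample number -> per-event values for one parameter.
    By default the control is the lowest-numbered sample."""
    if control is None:
        control = min(samples)
    ref_mean = statistics.mean(samples[control])
    out = {}
    for name, events in samples.items():
        normalized = [v / ref_mean for v in events]  # per-event fold change
        sd = statistics.stdev(normalized) if len(normalized) > 1 else 0.0
        out[name] = (statistics.mean(normalized), sd)
    return out
```

Because each event (not just the sample mean) is divided by the reference mean, the SD is also expressed on the fold-change scale, and the control sample itself always has a mean of 1.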

Using COPAquant, we dramatically increased the rate of data analysis in our screen for regulators of hsp-16p::GFP expression, which used the single-sample mode of COPAS screening. By analyzing both normalized GFP and normalized TOF values (i.e., values normalized to the negative control sample, here empty-vector RNAi), we were able to rapidly identify hits that affect GFP expression but not worm growth. Before COPAquant was implemented, manual analysis of a single day's worth of COPAS data obtained in single-sample acquisition mode frequently took more than 8 h. With COPAquant, data from 1 day of sorting are now analyzed, normalized, and graphed within 10 s, a ~3000-fold increase in data analysis efficiency (more than 28,800 s reduced to 10 s).
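The hit criterion described here (GFP changed relative to empty-vector RNAi, worm size unchanged) can be expressed as a simple two-part test on the fold-change values. In this Python sketch, classify_hits and both cutoffs are hypothetical placeholders chosen for illustration, not the thresholds used in the screen.

```python
def classify_hits(fold_changes, gfp_cutoff=2.0, tof_tolerance=0.2):
    """Flag RNAi conditions that change GFP but not worm size.

    `fold_changes` maps condition -> (GFP fold change, TOF fold change),
    both relative to the empty-vector control. Cutoffs are illustrative."""
    hits = []
    for condition, (gfp_fc, tof_fc) in fold_changes.items():
        # GFP counts as changed if it rises or falls by at least gfp_cutoff.
        gfp_changed = gfp_fc >= gfp_cutoff or gfp_fc <= 1.0 / gfp_cutoff
        # TOF near 1 means size (a proxy for growth) matches the control.
        size_normal = abs(tof_fc - 1.0) <= tof_tolerance
        if gfp_changed and size_normal:
            hits.append(condition)
    return hits
```

Requiring a near-normal TOF filters out RNAi conditions whose GFP change is simply a side effect of smaller or larger worms.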

In addition to the single-sample sorting mode described above, some labs also use an optional autosampling device, the ReFLX sampler, with which ReFLX-equipped COPAS systems sort and quantify events from the individual wells of 96-well plates. Data from all wells are stored in a single file in the same 26-column format, with each event recorded according to the row and column address of its well. To make our MATLAB program applicable to ReFLX screening platforms, we modified our existing single-sample code to read ReFLX files. The modified programs, COPAmulti and COPAcompare (Figure 3 and Table 2), read the raw *.txt files generated by the ReFLX, filter and extract a data matrix for each well, and summarize useful parameters. Data from one or more 96-well files (COPAmulti) or from a replicate pair of 96-well files (COPAcompare) are analyzed, and the data for each plate are stored in a separate cell of a Results Structure within MATLAB. For each plate analyzed, the raw data (n and the well mean ± SD for each of eight parameters in every well) are exported to the Results Structure, from which they can be exported to other programs. To make COPAmulti as user-friendly as possible, we implemented a graphical user interface (GUI) within MATLAB that lets users define several analysis criteria, including filtering cutoffs, the parameter used for analysis, and the statistical criteria and thresholds used to identify hits (Figure 4A). Because these criteria can be adjusted through the GUI and the data are rapidly reanalyzed, the effects of altered filtering and statistical criteria are easy to assess.
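The per-well extraction performed by COPAmulti amounts to grouping events by well address, applying filtering cutoffs, and summarizing each group. The Python sketch below shows this for one parameter and a TOF cutoff; the function name summarize_plate, the event keys "row" and "col", and the choice of TOF as the filtering parameter are hypothetical assumptions about the data layout, not the authors' implementation.

```python
import statistics
from collections import defaultdict


def summarize_plate(events, parameter="Ch1", tof_min=0.0, tof_max=float("inf")):
    """Per-well n, mean, and SD for one ReFLX 96-well plate.

    `events` is an iterable of dicts, one per event, carrying the well
    address under "row" ("A" to "H") and "col" (1 to 12) plus measured
    values such as "TOF". Events outside [tof_min, tof_max] are dropped."""
    by_well = defaultdict(list)
    for ev in events:
        if tof_min <= ev["TOF"] <= tof_max:
            by_well[(ev["row"], ev["col"])].append(ev[parameter])
    summary = {}
    for well, vals in sorted(by_well.items()):  # A1, A2, ..., H12 order
        sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
        summary[well] = {"n": len(vals), "mean": statistics.mean(vals), "sd": sd}
    return summary
```

Wells with no events passing the cutoffs are simply absent from the result; calling the function once per parameter would give the multi-parameter per-well table described above, and rerunning it with new cutoffs mirrors the rapid reanalysis offered by the GUI.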
