2Department of Obstetrics and Fetal Medicine, Universitaetsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
3Max Planck Institute for Molecular Genetics, Berlin-Dahlem, Germany
The Bisulfite Sequencing Data Presentation and Compilation (BDPC) web interface for compilation and presentation of data from bisulfite sequencing DNA methylation studies has been improved by adding a new module. This module allows visualization of the whole data set in form of a heatmap of DNA methylation levels and clustering and comparison of tissue methylation patterns. It can also be used for data not processed by BDPC. In addition, several functions of the existing BDPC compilation module have been improved.
DNA methylation of CpG dinucleotides is a major epigenetic process with important medical implications (1). Because DNA methylation is lost during DNA amplification by cloning or PCR, a wide range of technologies has been developed to determine the methylation state of genomic DNA (2). The most commonly used standard technique for the analysis of DNA methylation covering a relatively large region (~300–500 bp) is bisulfite genomic sequencing, which utilizes the selective deamination of unmethylated cytosines with sodium bisulfite, subsequent amplification of a region of interest with primers specific for the converted DNA, and sequencing of the PCR product (3). The most detailed qualitative and quantitative information about the methylation status of individual DNA molecules is obtained when this method is combined with sequencing of subcloned PCR products (4). In general, the aim of DNA methylation analyses is to determine the methylation state of a region of interest, which is amplified with specific primers and is called an amplicon herein. In the first step of the analysis, the resulting sequencing data, which usually contain sequences of >20–50 subcloned DNA molecules, need to be aligned against the in silico–converted genomic reference sequence. The bisulfite sequencing DNA methylation analysis (BISMA) tool (http://biochem.jacobs-university.de/BDPC/BISMA) or the BiQ Analyzer assists the user during this step with implemented quality control features (5). This analysis has to be performed for each PCR product obtained in every sample analyzed, giving rise to multiple output files. The next step of the analysis is to compile the derived information and compare the results from each amplicon in the different samples. To assist with this task and implement automated data visualization, we recently introduced the BDPC web application, which is freely available at http://biochem.jacobs-university.de/BDPC (6).
Here we describe new features of this application which considerably improve data analysis and visualization.
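The in silico conversion of the reference sequence mentioned above can be illustrated with a short Python sketch. It is not taken from BISMA, BiQ Analyzer, or BDPC; the function name `bisulfite_convert` and its `methylated_cpg` parameter are hypothetical, and the sketch assumes that only cytosines in CpG context can be methylated and that only the top strand is of interest.

```python
def bisulfite_convert(sequence, methylated_cpg=True):
    """Return the top-strand sequence expected after bisulfite treatment and PCR.

    Unmethylated cytosines are deaminated and read as T. Assumption: only
    cytosines followed by G (CpG context) can be methylated. With
    methylated_cpg=True those cytosines are kept as C; with False every C
    is converted.
    """
    seq = sequence.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 8-bp reference containing two CpG sites.
reference = "ACGTCCGA"
fully_methylated = bisulfite_convert(reference)                      # CpG sites kept
fully_unmethylated = bisulfite_convert(reference, methylated_cpg=False)
```

Comparing a sequenced clone with these two converted references at each CpG position indicates whether that site was methylated (C) or unmethylated (T) in the original molecule.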
A regular task of methylome projects is to visualize the complete set of methylation data, for example by using a heatmap and performing a hierarchical clustering of the tissues to compare the similarity of the tissue methylation patterns (7,8,9,10,11,12). Since all the information required for such analyses is available in the BDPC output file, we implemented an additional clustering module. Data derived from other analysis pipelines can also be submitted here, provided they are supplied in the simple BDPC output format. For illustration purposes, we used a subset of the DNA methylation data described previously (12), to which we added results from a trisomic umbilical cord sample. These data were analyzed with BiQ Analyzer and processed by BDPC. Additionally, to show the usability of the new BDPC module, we processed published DNA methylation data determined using the Sequenom Mass Array (13) and data generated by deep sequencing techniques (14).
The functions of the new BDPC clustering module are exemplified in Figure 1 and are described as follows. The user may consult the detailed BDPC online manual and the example data files for further information and instructions on how to use BDPC.
1. Methylation heatmap. The new module creates a methylation heatmap, showing the average methylation of all amplicons in all tissues, which displays the whole methylation data set in comprehensive form (Figure 1C). BDPC offers automatic sorting of the amplicons. Alternatively, the results can be plotted using the initial sorting provided by the user. Here, one can, for example, sort by the chromosome position, the distance to the transcriptional start site, presence of certain histone modifications, or by CpG content of the sequence analyzed (as shown in Figure 1D). If more than 1000 amplicons are uploaded, the preparation of the methylation map is disabled, but the tissue clustering (see sections 2 and 3) will still be performed and the dendrogram plotted.
2. Comparison of tissue methylation patterns. The new BDPC clustering module compares the methylation patterns of different tissues. The underlying principle is to calculate the pairwise correlation of the methylation results obtained from each sample. Here, matching pairs of data (results from the same amplicon) are compared between the tissues. The tissue similarity is calculated using either Pearson's correlation coefficient, which assumes a normal distribution of the values, or Spearman's rank correlation coefficient, which is a hypothesis-free method (15). A correlation coefficient of 1 indicates a perfect positive correlation between the two data sets, a value of 0 indicates no correlation, and a value of -1 indicates a perfect negative correlation. As shown in Figure 1B, the methylation patterns on human chromosome 21 are most similar when comparing normal and trisomic cells of the same cell type.
3. Tissue clustering. To visualize the differences between the methylation patterns of several tissues, BDPC converts the similarity matrix of the correlation coefficients into a distance matrix, which is then used for hierarchical clustering with the unweighted pair-group average method (UPGMA) (15). For example, in the data set mentioned above, the normal and trisomic cells of the same cell type form close clusters (Figure 1A). As shown in Figure 1, E–F, astrocytes and neural precursor cells from passage 18 cluster together (14), while in the data set of Igarashi et al. (2008), the methylation patterns of liver and testis differ most from brain, colon, heart, kidney, and muscle (13).
4. Standalone tissue clustering. We also set up a standalone option for tissue clustering, using a tissue similarity matrix provided by the user. To illustrate this function, we used the results of global meDIP experiments, which provided correlation coefficients for each pair of tissues (11). The BDPC clustering function (Figure 1G) nicely illustrates that there are two groups of tissues which significantly differ (pMEFs and TS cells versus ES11.5, EG12.5, and sperm). In the original publication, this conclusion was derived on the basis of a visual inspection of the tissue similarity plot (11).
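The correlation and clustering steps described in sections 2 to 4 can be sketched in plain Python. This is a minimal illustration and not the BDPC implementation: the tissue names and methylation values are invented, all function names are hypothetical, and the conversion of a correlation coefficient r into a distance as 1 - r is an assumption, since the text does not state which transformation BDPC applies.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def distance_matrix(data, correlation=pearson):
    """Pairwise distances between tissues, assumed here to be 1 - r."""
    return {
        frozenset((a, b)): 1 - correlation(data[a], data[b])
        for a, b in combinations(data, 2)
    }


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of tissues in it
    dist = dict(dist)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for c in sizes:
            # Size-weighted mean keeps every original tissue equally weighted.
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Invented average methylation levels; amplicons are in the same order per tissue.
methylation = {
    "fibroblast_normal": [0.10, 0.80, 0.55, 0.20, 0.95],
    "fibroblast_trisomic": [0.12, 0.78, 0.60, 0.25, 0.90],
    "blood_normal": [0.70, 0.30, 0.10, 0.90, 0.40],
    "blood_trisomic": [0.65, 0.35, 0.15, 0.85, 0.45],
}

tree = upgma(list(methylation), distance_matrix(methylation))
print(tree)
```

For the standalone option of section 4, a user-supplied tissue similarity matrix would be converted to distances in the same way and passed directly to `upgma`, skipping the correlation step. Passing `correlation=spearman` to `distance_matrix` switches to the rank-based coefficient.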
Additional improvements of the existing BDPC compilation module (6) include:
1. Improved amplicon overview. BDPC now generates for each amplicon a publication-grade image showing the CpG site methylation averages of each PCR product (Figure 2A). By default, the PCR products will be sorted by their average methylation levels. This enables the identification of CpG site–specific methylation differences between tissues. In the example shown in Figure 2A, CpG site 5 tends to have higher methylation than the other sites in three of the tissues.
2. Online data display. BDPC now works as a fully interactive platform for temporary visualization of the compiled data (Figure 2). In addition to being downloadable as result files, the data are now available online on the BDPC server. If the additional amplicon information file is uploaded together with the data, links to the University of California, Santa Cruz (UCSC) genome browser for automatic uploading of a custom track are provided in the amplicon summary.
3. Clone sorting. Clones can be sorted by their average methylation level in the PCR product methylation picture (Figure 2B).
4. Continuous color. BDPC pictures can now be drawn in a continuous color scale (Figure 1, B–D; Figure 2).
5. Reference sequence checking. As an additional quality control measure, BDPC now checks if the reference sequence used for all PCR products of each amplicon is identical.
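Two of the improvements listed above, the default sorting of PCR products by average methylation and the reference sequence check, are simple enough to sketch in Python. The data layout, the function names, and the case-insensitive comparison of sequences are assumptions made for illustration; the actual BDPC file formats and checks may differ.

```python
from statistics import mean


def sort_by_average_methylation(products):
    """Order PCR product names by their mean CpG site methylation, lowest first.

    products maps a PCR product (tissue) name to its per-CpG-site averages.
    """
    return sorted(products, key=lambda name: mean(products[name]))


def amplicons_with_conflicting_references(references):
    """Return the amplicons whose PCR products do not share one reference.

    references maps amplicon -> {PCR product name: reference sequence}.
    Sequences are compared case-insensitively (an assumption).
    """
    return sorted(
        amplicon
        for amplicon, per_product in references.items()
        if len({seq.upper() for seq in per_product.values()}) > 1
    )


# Invented example data.
site_averages = {
    "liver": [0.90, 0.80],
    "brain": [0.10, 0.20],
    "heart": [0.50, 0.50],
}
reference_sequences = {
    "amplicon_1": {"liver": "ACGTTCGA", "brain": "acgttcga"},
    "amplicon_2": {"liver": "ACGTTCGA", "brain": "ACGTTTGA"},
}

print(sort_by_average_methylation(site_averages))
print(amplicons_with_conflicting_references(reference_sequences))
```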
The technical assistance of Sandra Becker is gratefully acknowledged. We thank Marc-Thorsten Hütt for valuable advice and comments. This work was supported by the Nationales Genomforschungsnetz (NGFN2) program of the German Federal Ministry of Education and Research (BMBF).
The authors declare no competing interests.
Address correspondence to Albert Jeltsch, School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28725 Bremen, Germany. Email: a.jeltsch@jacobs-university.de

