LASAGNA-Search: an integrated web tool for transcription factor binding site search and visualization
 
Chih Lee and Chun-Hsi Huang


Figure 3.  LASAGNA-Search input page user interface.




For TF model input, LASAGNA-Search accepts variable-length TFBSs for model building. Users may input TFBSs in the FASTA format. Clicking the “Start Searching” button aligns the TFBSs. The PWM and sequence logo (36) of the automatically trimmed alignment are then displayed. Users may choose to trim the alignment further or to recover previously trimmed columns. Figure 3C shows the user interface for TFBS alignment trimming. Instead of TFBSs, users may also input a PWM for model building. LASAGNA-Search recognizes the formats used by JASPAR, TRANSFAC, and UniPROBE.
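To make the relationship between an alignment and its PWM concrete, the following is a minimal sketch of how a PWM could be derived from TFBSs that have already been aligned to equal length without gaps. It is not the LASAGNA alignment algorithm, which handles variable-length sites. The function names, the pseudocount of 1, and the uniform background of 0.25 are illustrative assumptions.

```python
import math

def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count each base per column of equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must already be aligned to equal length")
    counts = [{base: 0 for base in alphabet} for _ in range(length)]
    for site in sites:
        for column, base in enumerate(site.upper()):
            counts[column][base] += 1  # raises KeyError on non-ACGT symbols
    return counts

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2((n + pseudocount) / total / background)
            for base, n in column.items()
        })
    return pwm

def score(pwm, sequence):
    """Sum the per-column log-odds of a candidate site of the same length."""
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))

# Made-up example sites, for illustration only.
example_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
example_pwm = log_odds_pwm(position_frequency_matrix(example_sites))
```

A candidate site that matches the consensus, such as ACGT here, receives a higher score than one that does not.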

LASAGNA-Search currently offers two ways of selecting models in the TFBS-based and PWM-based collections. One is to browse each model collection; the other is to search all the collections by keyword. To browse a collection, users may click the radio button to browse models by species or species group. A model can then be added to the “shopping cart” by marking the model with a tick. To search for models, users may enter one or more keywords and click the “Search” button. The models found are displayed in a list and can be selected or removed (see Figure 3B for an example). The number of selected models is displayed on the input page. Users may click the “Show” button to view these models or remove the unwanted ones.

Promoter sequences may be input in the FASTA format. Users may also retrieve promoter sequences by NCBI Gene IDs, gene symbols, or mRNA accession numbers. When users click the “Search” button, LASAGNA-Search displays the matching promoters. Figure 3D shows the promoters found using the keywords CCND1 and MYB. Users may choose to examine only promoters of a particular organism. In Figure 3D, only the matching human promoters are listed after applying the filter. Promoters are selected in a manner similar to selecting TF models. Finally, users may also select from a list of randomly sampled promoters from a chosen organism.

Results page

The results page is organized into five tabs. The first tab displays hits on all the promoter sequences; the second tab displays hits pertaining to one promoter sequence at a time; the third tab shows the GRN inferred from the search results; the fourth tab allows for importing previous search results to be merged with the current search results; the last tab contains the inputs, including the selected TF models, the selected promoters, and the search parameters. Figure 4 shows an example results page with the second tab, named “Promoter view,” showing.




Figure 4.  Result page of LASAGNA-Search.




Only hits meeting the specified criterion are reported in the first and second tabs. For each hit, the model name, sequence, zero-based position, strand, score, p-value, and E-value are reported. Hits found in the same promoter sequence can be sorted by model name, sequence, position, strand, p-value, or E-value by clicking the respective column header. By default, the hits are displayed in an HTML table. Users may click a button on the results page to obtain the hits in a tab-delimited format. These hits can be easily imported into a new search session. This is particularly useful when additional TFs of interest are identified after an initial search.
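For readers who want to post-process exported hits outside the web tool, the following is a minimal sketch of reading tab-delimited hits and merging two result sets. The column order is an assumption that simply follows the order in which the fields are listed above; the actual export layout may differ. The names `Hit`, `parse_hits`, and `merge_hits` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Hit:
    model: str
    sequence: str
    position: int   # zero-based offset within the promoter sequence
    strand: str
    score: float
    p_value: float
    e_value: float

def parse_hits(text):
    """Parse tab-delimited hits; the column order is assumed, not documented."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        model, seq, pos, strand, score, p_value, e_value = row
        hits.append(Hit(model, seq, int(pos), strand,
                        float(score), float(p_value), float(e_value)))
    return hits

def merge_hits(previous, current):
    """Combine two hit lists, drop duplicates, and sort by ascending p-value."""
    seen = set()
    merged = []
    for hit in previous + current:
        key = (hit.model, hit.position, hit.strand)
        if key not in seen:
            seen.add(key)
            merged.append(hit)
    return sorted(merged, key=lambda hit: hit.p_value)
```

Sorting by another field, such as position, only requires changing the `key` function passed to `sorted`.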

Users may display the hits along the promoter sequence, where the -log p-value of each hit is used as the height to plot a box. This allows easy visualization of the binding sites predicted by one model in the context of those predicted by other models. Finally, the hits can be saved in GFF (general feature format) or bedGraph format for visualization in the UCSC Genome Browser (29). Links are provided for each promoter sequence to automatically create a custom track and redirect users to the UCSC Genome Browser. Figure 5 shows a custom track of putative binding sites predicted by LASAGNA-Search in the context of four other relevant tracks.
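As an illustration of how a hit might map onto a GFF record, the sketch below converts a zero-based hit position within a promoter to 1-based, inclusive genomic coordinates and stores the -log p-value as the score. The logarithm base (10 here), the feature type string, the `promoter_start0` parameter, and the assumption that the promoter sequence lies on the plus strand of the genome are all illustrative choices, not details taken from LASAGNA-Search.

```python
import math

def hit_height(p_value):
    """Box height for a hit: -log10 of its p-value (base 10 is an assumption)."""
    return -math.log10(p_value)

def hit_to_gff(chrom, promoter_start0, model, site_seq, position, strand,
               p_value, source="LASAGNA-Search"):
    """Format one hit as a nine-column, tab-separated GFF line.

    promoter_start0 is the zero-based genomic coordinate of the first
    promoter base (hypothetical parameter); position is the zero-based
    offset of the hit within the promoter sequence.
    """
    start = promoter_start0 + position + 1            # GFF is 1-based
    end = promoter_start0 + position + len(site_seq)  # GFF end is inclusive
    fields = [chrom, source, "TF_binding_site", str(start), str(end),
              f"{hit_height(p_value):.3f}", strand, ".", model]
    return "\t".join(fields)

# Made-up coordinates, for illustration only.
line = hit_to_gff("chr11", 1000, "MYB", "CAACTG", 10, "+", 0.001)
```

The off-by-one handling is the main point: a six-base hit at zero-based offset 10 in a promoter starting at zero-based coordinate 1000 spans GFF positions 1011 to 1016.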




Figure 5.  Visualization of hits in the UCSC Genome Browser.




The automatically inferred GRN can be displayed and manipulated by clicking the tab named “Gene regulatory network.” To produce a sparser network, users may set a more stringent p-value than the one used to filter hits. Users may show only nodes belonging to one or more species listed under “Filter by species.” Figure 2A shows the network after restricting the species to Homo sapiens. Users may choose to display the TF coding genes by checking “Map TFs to coding genes.” Figure 2B shows the resulting network. While six nodes are present in the GRN in Figure 2B, there are essentially only two genes and their products in the network. When a GRN involves more genes, it may be desirable to simplify the GRN by replacing the TF models with their respective coding genes. Figure 2C displays the simplified two-node GRN generated by checking “Simple network.” We note that a GRN can be simplified only after the TF models are mapped to coding genes.

Comparison of features to existing web tools

LASAGNA-Search allows users to scan promoters for TFBSs without leaving the LASAGNA-Search page. Many features of LASAGNA-Search were designed to be user-friendly. Hence, even without the knowledge of PWM or TFBS databases and promoter sequence retrieval tools, users can search for binding sites in a promoter sequence and visualize the hits in the UCSC Genome Browser immediately. There are several integrative TFBS search web tools available. By comparing it to existing web tools, we can better understand the advantages and disadvantages of LASAGNA-Search and suggest future improvements. Table 2 summarizes the comparison of LASAGNA-Search to matrix-scan and the search engine of MAPPER2 database for identifying TFBSs.
