2Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Mendoza, Argentina
3USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin, Madison, WI, USA
Calculation of theoretical restriction fragment size in a completely random genome.
This script will generate simulated, or "in silico" BAC end sequences from a supplied complete genome sequence.
Large-insert DNA libraries, such as bacterial artificial chromosome (BAC) libraries, have played a pivotal role in a number of genetic and genomic applications, including map-based cloning, construction of physical maps, and whole-genome sequencing. BAC end sequences (BESs) were originally proposed as the primary scaffold for sequencing the human genome, providing the main means of selecting minimally overlapping clones for sequencing (1). In a BAC-by-BAC sequencing approach, BESs aid in the construction of physical maps and minimal tile paths (MTPs), in anchoring the physical and genetic maps, and in the assembly of sequences. The BAC-by-BAC strategy has been used to sequence the genomes of several model eukaryotic organisms, including human (2), mouse (3), Arabidopsis thaliana (4), and rice (5).
In addition, BES data provide a first glimpse at the composition of unsequenced genomes (6). In this approach, the BESs are usually queried against DNA and protein sequence databases using BLAST algorithms (7), and identities are assigned to the BESs on the basis of sequence similarity with the reference sequences. A match to a database entry is considered significant when the 'expect (E) value' falls below, or the 'BLAST score' rises above, a specified cutoff value; both parameters reflect the degree of sequence similarity between the queried and reference sequences.
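The cutoff-based assignment described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in this work: it assumes BLAST results saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`, where columns 11 and 12 hold the E value and bit score), and the function name `significant_hits` and the default cutoff are hypothetical.

```python
def significant_hits(tabular_lines, max_evalue=1e-10):
    """Keep, for each query BES, the best hit whose E value passes the cutoff.

    Assumes 12-column tabular BLAST output: query id, subject id, ...,
    E value (column 11) and bit score (column 12). The cutoff is illustrative.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # a lower E value means a more significant match
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = [
        "bes1\tsubjA\t98.0\t500\t10\t0\t1\t500\t1\t500\t2e-35\t150.0",
        "bes1\tsubjB\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-12\t80.0",
        "bes2\tsubjC\t85.0\t60\t9\t0\t1\t60\t1\t60\t0.5\t30.0",
    ]
    print(significant_hits(example))
```

In this sketch a BES with no hit at or below the cutoff (here `bes2`) receives no assignment.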
An important quality parameter in BES data is the actual sequence length, which can significantly affect the BLAST search results and, consequently, the informativeness of the analysis. Presumably, for a given E-value cutoff, a longer BES is more likely than a shorter one to find a significant match in a database.
BES data are also a source of molecular markers useful for mapping and breeding (8). Potential markers in the BESs, such as simple sequence repeats (SSRs), are usually identified with dedicated software. Useful SSR markers are less likely to be detected in short BESs than in longer (>500 bp) BESs, because non-repetitive sequences flanking the SSR are needed for designing primers.
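The dependence of SSR marker discovery on read length can be illustrated with a minimal sketch. It is not the software referred to above: the function name `find_ssrs`, the motif range (2 to 6 bp), the minimum repeat number, and the minimum flank length are all assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=5, min_flank=50):
    """Return (start, end, motif, n_repeats, primer_ok) for each SSR found.

    primer_ok is True only when at least min_flank bases lie on both
    sides of the repeat, i.e. when there is room to place primers.
    All thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    # A 2-6 bp motif followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # ignore homopolymer runs such as AAAAAAAAAA
        n_repeats = len(m.group(0)) // len(motif)
        left_flank = m.start()
        right_flank = len(seq) - m.end()
        primer_ok = left_flank >= min_flank and right_flank >= min_flank
        hits.append((m.start(), m.end(), motif, n_repeats, primer_ok))
    return hits


if __name__ == "__main__":
    read = "GATTACA" + "AG" * 8 + "CTTGACC"
    # The same repeat is found either way, but it is only usable as a
    # marker when the flanks are long enough for primer design.
    print(find_ssrs(read, min_flank=5))
    print(find_ssrs(read, min_flank=50))
```

With a short read the repeat is still detected, but the flank check fails, which is the situation that longer BESs are meant to avoid.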
For these reasons, it is important to develop a procedure for generating longer BESs. To date, all large-scale BAC end sequencing projects have used direct sequencing from purified recombinant plasmids. This technology has typically yielded BESs of 400–650 nucleotides in BAC libraries from species as diverse as human (9), mouse (10), A. thaliana (11), rice (12), and papaya (6).
Splinkerettes (13) enable PCR amplification of DNA sequences that lie between a single known primer and a nearby restriction site. The advantage of the splinkerette adaptor is that it forms a hairpin loop in one of the strands, which prevents amplification in the absence of the known primer, thus minimizing the possibility of non-specific amplifications. This feature has made splinkerettes a robust tool for characterizing unknown DNA regions that are adjacent to known sequences (14,15).
Here we report the development of a novel splinkerette-based method, suitable for high-throughput application, for generating longer end sequences from large-insert library clones, using a carrot (Daucus carota L.) BAC library as a model.
Materials and methods
BAC library, vector and splinkerette components
A BamHI BAC library of carrot (16) was used as the source of large-insert DNA clones. The BamHI cloning site of the vector pIndigoBAC-5 (Epicentre Technologies, Madison, WI, USA) is flanked 5' and 3' by the pIndigoBAC-5 forward and reverse sequencing primers described by the manufacturer, which we termed VL1 (for vector left primer 1) and VR1 (for vector right primer 1), respectively. Two additional nested primers, VL2 (5'-GCCAGTGAATTGTAATACGACTCACTAT-3') and VR2 (5'-CACACAGGAAACAGCTATGACCATGATT-3'), were designed -57 and +96 bases from the cloning site, respectively, and used for sequencing of the insert ends. The splinkerettes were made by duplexing the top strand 5'-CGAATCGTAACCGTTCGTACGAGAATTCGTACGAGAATCGCTGTCCTCTCCAACGAGCCAAGG-3' and the bottom strand 5'-XXXXCCTTGGCTCGTTTTTTTTTGCAAAAA-3', where XXXX is the overhang compatible with the cohesive ends left by the restriction enzyme used (e.g., AATT for MunI, GATC for BglII, or TCGA for SalI). The annealing reaction contained both oligonucleotides (40 µM each), 10 mM Tris-HCl (pH 7.4), and 5 mM MgCl2, and was performed in a thermocycler by heating the solution to 65°C for 10 min and then cooling at 1°C/min to 4°C. The external and internal (nested) splinkerette primers, Splk1 (5'-CGAATCGTAACCGTTCGTACGAGAA-3') and Splk2 (5'-TTCGTACGAGAATCGCTGTCCTCTCC-3'), used in PCR and sequencing reactions, respectively, were the same as originally described (13).
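The relationship between the splinkerette oligonucleotides given above can be checked with a short sketch. The sequences are copied from the text; the helper names (`bottom_strand`, `paired_stem_length`) are hypothetical and the script is an illustrative consistency check, not part of the laboratory protocol.

```python
# Sequences copied from the text; all function names are illustrative.
TOP = "CGAATCGTAACCGTTCGTACGAGAATTCGTACGAGAATCGCTGTCCTCTCCAACGAGCCAAGG"
BOTTOM_CORE = "CCTTGGCTCGTTTTTTTTTGCAAAAA"  # bottom strand without XXXX
OVERHANGS = {"MunI": "AATT", "BglII": "GATC", "SalI": "TCGA"}
SPLK1 = "CGAATCGTAACCGTTCGTACGAGAA"
SPLK2 = "TTCGTACGAGAATCGCTGTCCTCTCC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bottom_strand(enzyme):
    """Bottom strand with the 5' overhang matching the chosen enzyme."""
    return OVERHANGS[enzyme] + BOTTOM_CORE


def paired_stem_length(top, bottom_core):
    """Count consecutive base pairs formed between the 3' end of the
    top strand and the 5' end of the bottom strand core."""
    n = 0
    for top_base, bottom_base in zip(reversed(top), bottom_core):
        if top_base.translate(COMPLEMENT) != bottom_base:
            break
        n += 1
    return n


if __name__ == "__main__":
    for enzyme in OVERHANGS:
        print(enzyme, "5'-" + bottom_strand(enzyme) + "-3'")
    print("Paired stem (bp):", paired_stem_length(TOP, BOTTOM_CORE))
    # Both primers have the same sequence as part of the top strand,
    # with Splk2 lying downstream (nested) of Splk1.
    print("Splk1 at top-strand position:", TOP.find(SPLK1))
    print("Splk2 at top-strand position:", TOP.find(SPLK2))
```

The check shows that only the 3' portion of the top strand pairs with the bottom strand, and that Splk1 and Splk2 correspond to the unpaired 5' portion of the top strand, consistent with the nested primer design described above.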