2School of Engineering and Science, Jacobs University Bremen, Bremen, Germany
3BASF SE, Fine Chemicals and Biocatalysis Research, Ludwigshafen, Germany
††Present address: Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
Protein consensus-based surface engineering (ProCoS) is a simple and efficient method for directed protein evolution combining computational analysis and molecular biology tools to engineer protein surfaces. ProCoS is based on the hypothesis that conserved residues originated from a common ancestor and that these residues are crucial for the function of a protein, whereas highly variable regions (situated on the surface of a protein) can be targeted for surface engineering to maximize performance. ProCoS comprises four main steps: (i) identification of conserved and highly variable regions; (ii) protein sequence design by substituting residues in the highly variable regions, and gene synthesis; (iii) in vitro DNA recombination of synthetic genes; and (iv) screening for active variants. ProCoS is a simple method for surface mutagenesis in which multiple sequence alignment is used for selection of surface residues based on a structural model. To demonstrate the technique’s utility for directed evolution, the surface of a phytase enzyme from Yersinia mollaretii (Ymphytase) was subjected to ProCoS. Screening just 1050 clones from ProCoS engineering–guided mutant libraries yielded an enzyme with 34 amino acid substitutions. The surface-engineered Ymphytase exhibited 3.8-fold higher pH stability (at pH 2.8 for 3 h) and retained 40% of the enzyme’s specific activity (400 U/mg) compared with the wild-type Ymphytase. The pH stability might be attributed to a significantly increased (20 percentage points; from 9% to 29%) number of negatively charged amino acids on the surface of the engineered phytase.
Directed protein evolution is a well-established, versatile, and successful algorithm for tailoring protein properties to industrial demands and advancing our understanding of structure–function relationships in biocatalysts (1,2). Directed evolution entails the accumulation of beneficial mutations in iterative cycles of mutagenesis and screening or selection for improved enzyme variants. These accumulations of mutations mostly result in a downhill path on the fitness landscape (fitness versus sequence) plot. A downhill mutational path eventually leads to an unfolded or inactive enzyme variant (3). It is becoming clear that there exist two pathways for directed evolution: (i) a widely known pathway of accumulating single amino acid changes at each cycle of mutagenesis and screening to select an improved enzyme (4) and (ii) a pathway in which cooperative effects between a combination of mutations lead to synergistic or additive improvements (5–8).
Protein consensus-based surface engineering (ProCoS) is a simple method in which multiple sequence alignment is used to select surface residues based on a structural model. Synthetic gene variants are designed for surface mutagenesis and synthesized commercially. A mutant library generated by PCR-based in vitro recombination of synthetic genes is screened for active variants.
In practice, substitutions resulting from cooperative effects are rarely reported or studied. Recently, we showed that knowledge-based combinations of key residue substitutions identified in directed evolution yielded an improved enzyme (6). Traditional directed evolution approaches for improving enzyme properties are laborious and involve multiple steps of mutagenesis and screening. For example, screening more than 9000 clones of random mutant libraries identified 5 key positions in a phytase (9), and the subsequent multi-site saturation mutagenesis of these sites and screening (1100 clones) yielded a pH-stable phytase variant (7). Similarly, a hydrolase was evolved for higher pH stability and thermostability using error-prone PCR and DNA shuffling by screening >45,000 clones (10). Knowledge-based methods using sequence alignments and structural information are therefore becoming an attractive alternative.
Here, we present a computer-assisted method for surface engineering of proteins employing sequence alignment and structural analysis to design and screen mutant libraries. Our goal was to determine if a large number of mutations could be incorporated into a protein to engineer its surface while also maintaining its functionality. This presumes that incorporation of a higher number of mutations in a protein will increase the probability of cooperative effects between the substitutions in the mutant library and could yield a functional protein due to increased functional diversity of the libraries.
Re-engineering the protein surface has become an interesting tool due to its wide applications in therapeutic protein delivery (11,12), immobilization (13–16), solubilization (17,18), stabilization of proteins in aqueous (19) or organic solvents (20), and preservation of enzyme activity in ionic liquids (21). Over the past few years, it has become apparent that modification of protein surface charge is a viable strategy for enhancing protein stability (22). Several computationally designed proteins with altered surface charge–charge interactions showed improvements in thermostability (23–25). A rational modification of 18 surface residues of carbonic anhydrase yielded a variant with extreme halotolerance that is active at >3 M NaCl (26). A replacement of charged residues on the surface with hydrophobic residues improved stability of a protease in organic solvents (20). Other approaches for protein surface engineering, including chemical modifications (amination or succinylation) (13), coupling of polymers (PEGylation) (11), and fusion of polypeptides (PASylation) (27), have been developed to alter the surface properties of proteins.
Here, we present protein consensus-based surface engineering (ProCoS), a method that can be used to incorporate >30 amino acid substitutions simultaneously in an enzyme to produce surface modifications. A computational analysis based on conservation of amino acids was used to identify functionally important regions and highly variable regions in a model protein, phytase from Yersinia mollaretii (Ymphytase). A combination of computational and molecular biology tools was used for surface engineering of Ymphytase. The Ymphytase variant retained ∼40% of the wild-type’s phytase activity after incorporation of 34 amino acid changes located on the surface of the protein. Interestingly, the pH stability of Ymphytase was improved 3.8-fold (pH 2.8) compared with the wild-type, which might be due to the significant increase (20 percentage points; from 9% to 29%) in negatively charged surface residues in the identified Ymphytase variant.
Materials and methods
Identification of conserved residues
As a first step toward identification of functionally important residues, available phytase amino acid sequences from the Enterobacteriaceae family of bacteria were retrieved from the ExPASy proteomics server (www.expasy.org). A total of 25 phytase enzyme sequences were obtained from different genera of the Enterobacteriaceae family, including Escherichia (6), Yersinia (7), Klebsiella (5), Pectobacterium (1), Shigella (1), Obesumbacterium (2), and Citrobacter (3). Sequence alignment of all 25 sequences was performed with VectorNTI suite 10 software (Invitrogen, Darmstadt, Germany) using the blosum62mt2 matrix with a gap-opening penalty of 10 and a gap-extension penalty of 0.05. A phylogenetic tree was built by applying the neighbor-joining method implemented in VectorNTI to the sequence alignment of the 25 Enterobacteriaceae phytases. The consensus sequence was generated by the AlignX module of VectorNTI (used for multiple sequence alignment). Amino acid residues that are conserved (blue), similar (green), or identical (yellow) in the multiple sequence alignment are shown in the consensus sequence (Supplementary Figures S1 and S2). The consensus sequence was used to identify conserved residues in the Enterobacteriaceae family. These sets of amino acids (blue, green, and yellow) were considered to be functionally important residues, whereas the residues colored in white (non-conserved) were considered to be variable regions.
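The column-by-column classification described above can be illustrated with a minimal Python sketch. It is not the AlignX algorithm: the similarity groups, the 50% threshold for "conserved", and the toy alignment are all assumptions made for illustration, since the text does not state the rules AlignX applies.

```python
from collections import Counter

# Hypothetical similarity groups; the grouping rules used by AlignX are not given in the text.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]


def classify_column(column, conserved_fraction=0.5):
    """Label one alignment column as identical, similar, conserved, or variable.

    conserved_fraction is an assumed threshold, not a value from the paper.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return "variable"
    top_count = Counter(residues).most_common(1)[0][1]
    if top_count == len(column):
        return "identical"
    # "similar": no gaps, and every residue falls into one similarity group
    if len(residues) == len(column) and any(
        set(residues) <= group for group in SIMILAR_GROUPS
    ):
        return "similar"
    if top_count / len(column) > conserved_fraction:
        return "conserved"
    return "variable"


def classify_alignment(aligned_sequences):
    """Classify every column of a list of equal-length aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    return [
        classify_column([seq[i] for seq in aligned_sequences])
        for i in range(length)
    ]


# Toy alignment (invented, not phytase data)
toy_alignment = ["MKDLA", "MRELG", "MKDLS", "MKEQT"]
labels = classify_alignment(toy_alignment)
variable_positions = [i for i, label in enumerate(labels) if label == "variable"]
```

In this toy case only the last column is labeled variable and would be a candidate for surface substitution; the remaining columns would be treated as functionally important.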
Selection and substitution of sites
Amino acid sites in Ymphytase were selected based on the conservation of each residue within the Enterobacteriaceae family of bacteria. The protein sequence was divided into functionally important regions (conserved residues) and variable regions (non-conserved). Each residue in the variable region of the sequence alignment was analyzed for the frequency of occurrence in all of the Enterobacteriaceae species included in this study. The positions of these residues were visualized in a homology model of Ymphytase (9) using VMD software (28). Residues belonging to loops and surface regions were selected. Selected residues were substituted with other amino acids based on three criteria: (i) frequently occurring residues in the sequence alignment were identified and selected for substitution; (ii) chemically similar amino acids were preferred, with swapping of the charged residues performed to avoid charge accumulation in a few areas; and (iii) sterically favorable residues were favored.
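Criterion (i) can be sketched in a few lines of Python. The function names and the example column below are hypothetical, and the sketch covers only the frequency criterion; the chemical-similarity and steric criteria (ii and iii) required inspection of the structural model and are not modeled here.

```python
from collections import Counter


def residue_frequencies(column):
    """Fraction of each non-gap residue in one alignment column."""
    residues = [r for r in column if r != "-"]
    total = len(residues)
    if total == 0:
        return {}
    return {res: n / total for res, n in Counter(residues).items()}


def propose_substitution(target_residue, column):
    """Most frequent non-gap residue that differs from the target, or None."""
    counts = Counter(r for r in column if r != "-" and r != target_residue)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented column at one variable position: the target protein carries T here.
example_column = list("TDDETD")
frequencies = residue_frequencies(example_column)
candidate = propose_substitution("T", example_column)
```

Here the candidate is aspartate (D), which occurs in half of the sequences at this position. A candidate chosen this way would still need to pass the charge-distribution and steric checks before being written into a synthetic gene design.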
Gene synthesis and subcloning
Three synthetic genes were designed based on the above criteria, and codon optimization was performed using the GeneDesign server for E. coli expression (29). Ymphytase gene (YmappA) variants with restriction sites (NdeI at the 5´ end and NotI at the 3´ end) were synthesized commercially at GENEART (Regensburg, Germany). All synthetic constructs were digested with NdeI and NotI restriction enzymes (New England BioLabs). Digested synthetic genes were cloned into the pET-22b(+) vector (Merck Chemicals GmbH, Darmstadt, Germany) and transformed into the E. coli BL21-Gold(DE3) strain (Agilent Technologies Deutschland GmbH, Waldbronn, Germany) for protein expression.
In vitro DNA recombination
A PCR-based DNA recombination protocol was used to recombine the three synthetic genes of the YmappA variants. The template for recombination was generated from each pET-22b(+)-YmappA synthetic gene construct (1 ng/µL) by PCR (50 µL reaction volume) (98°C for 3 min; 25 cycles of 98°C for 10 s, 58°C for 15 s, 72°C for 25 s; and then 72°C for 3 min) using the pET-22b(+) vector-specific primers F1 (5´-CGA CTC ACT ATA GGG GAA TTG TGA GCG GA-3´) and R3 (5´-CGG GCT TTG TTA GCA GCC GGA TCT CAG-3´) (0.4 µM each), Pfu DNA polymerase (0.025 U/µL), and dNTP mix (0.2 mM each) in thin-walled PCR tubes. All PCR products were methylated using 8 U dam methyltransferase (New England BioLabs, Frankfurt, Germany) and column purified (NucleoSpin Extract II Kit; MACHEREY-NAGEL, Düren, Germany).
Vector-specific primers F1 and R3 were used for in vitro recombination. The generated templates were mixed in equimolar amounts. Three PCRs were performed using different annealing/extension times at 55°C. Each PCR was performed using the amplified template (20 ng), 0.15 µM of each primer (F1 and R3), 1× Taq buffer, dNTP (0.2 mM each), and 2.5 U Taq polymerase. The PCR program consisted of 94°C for 30 s (denaturation), and 55°C for 1 s, 5 s or 10 s (annealing/extension), performed on a Mastercycler gradient (Eppendorf AG, Hamburg, Germany). PCR products (∼1.4 kb) were gel-extracted and purified with the NucleoSpin Extract II Kit. PCR products obtained with 5 s and 10 s annealing/extension times were mixed together (Supplementary Figure S3). Following the PCR, 20 U DpnI (New England BioLabs) was added, and the mixture was incubated overnight at 37°C and then column purified. Both purified PCR products (1 s annealing/extension time or combined 5/10 s annealing/extension times) were digested separately with the NdeI and NotI restriction enzymes. Digested PCR products were cloned into the E. coli expression vector pET-22b(+) and transformed into E. coli BL21-Gold(DE3) for expression. We will refer to the mutant library obtained by 1 s annealing/extension as the ProCoS mutant library–A and the library obtained by 5 s and 10 s annealing/extension as the ProCoS mutant library–B.
Screening of ProCoS variants
ProCoS mutant libraries were expressed in 96-well microtiter plates (37°C, 900 rpm, 70% relative humidity), and a 96-well microtiter plate-based AMol (ammonium molybdate) screening system was used as reported previously (9). Briefly, 10 µL cell lysate was incubated with 140 µL substrate solution (0.6% phytic acid in 250 mM acetate buffer, 0.01% Tween-20, pH 5.5) for 1 h at 37°C. The reaction was stopped by the addition of 150 µL 15% trichloroacetic acid. Inorganic phosphate release was quantified by addition of 20 µL stopped reaction mixture to 280 µL color mix solution (0.27% w/v ammonium molybdate and 1.08% w/v ascorbic acid in 0.32 M H2SO4), and the absorption was measured at 820 nm using a Tecan Infinite M1000 microtiter plate reader (Tecan Group AG, Männedorf, Switzerland).
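The volumes quoted above fix the working concentrations in each well. As a quick check, the following Python sketch computes the phytic acid concentration in the reaction and the dilution at the color-development step. The helper function and variable names are illustrative only and are not part of the published protocol.

```python
def diluted_concentration(stock_concentration, stock_volume, total_volume):
    """Concentration after diluting stock_volume of stock into total_volume."""
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return stock_concentration * stock_volume / total_volume


# Enzyme reaction: 10 µL lysate + 140 µL substrate solution (0.6% phytic acid)
lysate_ul = 10.0
substrate_ul = 140.0
reaction_ul = lysate_ul + substrate_ul
phytic_acid_percent = diluted_concentration(0.6, substrate_ul, reaction_ul)

# Color development: 20 µL stopped reaction mixture + 280 µL color mix
sample_ul = 20.0
color_mix_ul = 280.0
color_step_dilution = (sample_ul + color_mix_ul) / sample_ul
```

Under these volumes the substrate is present at about 0.56% in the reaction, and the stopped mixture is diluted 15-fold before the absorbance reading.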
Purification and characterization of wild-type and mutant Ymphytase
Purification and kinetic parameter determination for Ymphytase wild-type and the ProCoS-2 variant were performed as described previously (9).
The pH stability of the wild-type and mutant phytases was determined at 37°C using 4 different buffers: 0.25 M glycine-HCl buffer for pH 2.0–3.2; 0.25 M sodium acetate buffer for pH 3.6–5.6; 0.25 M imidazole-HCl buffer for pH 6.0–7.0; and 0.25 M Tris-HCl for pH 7.4–9.0. Purified enzymes were diluted in the specified buffers to 40 ng/mL and incubated at 37°C for 3 h. The activity assay for Ymphytase was carried out with 1 mM phytate at pH 4.5 and at 37°C using the 96-well-plate format colorimetric AMol screening assay as reported previously (9). The activity of Ymphytase that had not been pH-treated (the initial activity measured at pH 4.5 before the incubation) was considered to be 100% activity, and residual (relative) activity was calculated relative to this value.
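The residual-activity calculation amounts to a simple ratio, sketched below in Python. All activity values are invented placeholders chosen for easy arithmetic and are not measurements from this study; the function names are likewise illustrative.

```python
def residual_activity_percent(activity_after_incubation, initial_activity):
    """Residual activity as a percentage of the untreated (initial) activity."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * activity_after_incubation / initial_activity


def fold_improvement(variant_residual_percent, wild_type_residual_percent):
    """Ratio of variant to wild-type residual activity at the same pH."""
    if wild_type_residual_percent <= 0:
        raise ValueError("wild-type residual activity must be positive")
    return variant_residual_percent / wild_type_residual_percent


# Placeholder activities in arbitrary units (NOT data from this study)
wild_type_residual = residual_activity_percent(50.0, 200.0)
variant_residual = residual_activity_percent(90.0, 120.0)
improvement = fold_improvement(variant_residual, wild_type_residual)
```

Because each enzyme is normalized to its own initial activity, a variant with lower specific activity than the wild-type can still show a higher residual activity, and therefore a higher fold improvement in pH stability.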
Results and discussion
ProCoS method concept
Homologous enzymes harbor a common feature for catalysis in their primary sequences, the sequence motifs, which reflect their functionality. These functional sequence motifs are conserved throughout different species. Amino acid residues buried in the protein interior (especially the hydrophobic residues) are important for correct folding. The underlying concept for ProCoS engineering is the hypothesis that the highly conserved residues in a protein sequence originate from the common ancestor of a family of species and, therefore, these residues (non-variable) are important for protein function. On the other hand, the highly variable regions that are not conserved during protein evolution can be targeted for mutagenesis to alter the surface properties of enzymes.
In our ProCoS method (Figure 1), a target protein sequence is first divided into functional and variable (sequence variable or non-conserved) regions based on a multiple sequence alignment of the homologous proteins from a species family. The sequences of these homologous proteins are retrieved from a protein database, and a phylogenetic tree is then constructed based on the multiple sequence alignment to study the relatedness between the homologous sequences. The sequence alignment is used to identify the most variable residue positions. The locations of these residues are then visualized using a protein structure or homology model. Residues situated on the surface of the protein and in the loop regions are targeted for substitution with another residue based on three criteria: (i) frequency, where residues that occur frequently in the sequence alignment are preferred; (ii) chemical similarity, where charged amino acids are introduced either by swapping two charged amino acids that might be involved in salt bridge interactions or by analyzing the surrounding area to avoid the accumulation of positive or negative charge; and (iii) steric favorability, where the size of the amino acid is considered to avoid steric clashes after substitution. Synthetic genes are designed based on the above criteria and synthesized commercially. All of the synthetic gene constructs are recombined using in vitro PCR recombination, and the resulting mutant library is screened for active variants.
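The charge bookkeeping implied by criterion (ii) can be sketched as follows. This Python example tallies charged residues among a set of surface positions before and after substitution. The sequences and positions are toy values, and counting only D/E as negative and K/R as positive (histidine treated as neutral) is a simplifying assumption rather than the procedure used in this work.

```python
NEGATIVE = set("DE")  # assumption: aspartate and glutamate
POSITIVE = set("KR")  # assumption: lysine and arginine; histidine counted as neutral


def surface_charge_composition(sequence, surface_positions):
    """Percentages of charged residues and net charge over the given 0-based positions."""
    surface = [sequence[i] for i in surface_positions]
    if not surface:
        raise ValueError("at least one surface position is required")
    negative = sum(res in NEGATIVE for res in surface)
    positive = sum(res in POSITIVE for res in surface)
    return {
        "negative_percent": 100.0 * negative / len(surface),
        "positive_percent": 100.0 * positive / len(surface),
        "net_charge": positive - negative,
    }


# Toy sequences and surface positions (invented, not Ymphytase data)
wild_type = "MKTLSDAGRQ"
variant = "MKDLEDAGEQ"
surface_positions = [1, 2, 4, 5, 8, 9]

before = surface_charge_composition(wild_type, surface_positions)
after = surface_charge_composition(variant, surface_positions)
# Change expressed in percentage points, i.e. a difference of two percentages
negative_shift_points = after["negative_percent"] - before["negative_percent"]
```

Reporting the shift as a difference of two percentages matches the "percentage points" convention used for the surface charge comparison in this article. A per-region version of the same tally could be used to flag local accumulation of like charges.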