Large-scale discovery, analysis and design of protein energy landscapes

Library design

The initial set of 15,715 domain sequences was organized into five batches and further divided into 18 libraries (mix 1–4, libraries 1 and 4; libraries 7–15; and mutants 2–4): (1) mix 1–4: de novo designed ααα, βαββ and ββαββ sequences21; (2) libraries 1 and 4: de novo designed αββα proteins11; (3) libraries 7–14: natural domains from the Pfam database, including LysM, PASTA, WW, SH3, pyrin and cold-shock; (4) library 15: PDB-derived monomeric proteins devoid of cysteine residues and metal cofactors; (5) mutants 2–4: single and double mutants of the low-cooperativity proteins EEHEE_rd4_0871 and HHH_rd4_0518. Sequences were randomly assigned to libraries within each batch, ensuring a minimum mass difference of 50 ppm between nearest-neighbour sequences for mass spectrometry compatibility (except in library 15, where two sequences are 36 ppm apart). After SUMO cleavage (see below), all proteins begin with the dipeptide HM (the scar from the NdeI ligation). Some sequences were modified with C-terminal padding (G, S, GG or GS) to optimize mass spacing. All sequences were reverse-translated and codon-optimized for E. coli using DNAworks (v.2.0)68. To standardize amplification efficiency, a ‘GGS’ sequence was appended after the stop codon. Oligo libraries encoding the original 15,715 sequences were purchased from Agilent Technologies, while the 280 designed mutations were sourced from Twist Bioscience.
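The 50 ppm spacing criterion can be checked with a few lines of code. The sketch below is illustrative only and is not the authors' pipeline: it assumes that an intact mass (in Da) has already been computed for every sequence in a library, and that the ppm gap is measured relative to the lighter of two neighbouring masses. The function names are hypothetical.

```python
def ppm_gaps(masses):
    """Return the ppm difference between each pair of neighbouring masses."""
    ordered = sorted(masses)
    # Gap relative to the lighter neighbour, in parts per million (assumption).
    return [(hi - lo) / lo * 1e6 for lo, hi in zip(ordered, ordered[1:])]


def min_ppm_gap(masses):
    """Smallest nearest-neighbour gap in ppm; needs at least two masses."""
    gaps = ppm_gaps(masses)
    if not gaps:
        raise ValueError("need at least two masses")
    return min(gaps)


def passes_spacing(masses, threshold_ppm=50.0):
    """True if every nearest-neighbour gap meets the threshold."""
    return min_ppm_gap(masses) >= threshold_ppm
```

For example, two masses of 10,000.00 Da and 10,000.36 Da are 36 ppm apart and would fail the 50 ppm threshold, which is the situation reported for the one exception in library 15.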

Cloning of Twist oligo libraries into the pGR02 plasmid

Oligo libraries were resuspended and amplified by quantitative PCR (qPCR) for restriction enzyme cloning. A preliminary qPCR run determined the optimal number of amplification cycles; to prevent overamplification, reactions were terminated at around 50% of the maximum fluorescence intensity. Purified qPCR products were digested with XhoI and NdeI and ligated into the pGR02 plasmid, which encodes an N-terminal 10×His-SUMO tag. Ligated constructs were electroporated into 10-β electrocompetent E. coli (New England Biolabs) and recovered in SOC medium at 37 °C for 1 h before plating onto selective MDAG-11 + B1 + kanamycin agar plates69. Transformation efficiency was determined from serial dilutions, and all colonies were pooled to maximize sequence diversity. Plasmid DNA was extracted from pooled cultures using the QIAprep Spin Miniprep Kit (Qiagen).
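Two of the quantities in this step are simple arithmetic and can be sketched as follows. This is a hedged illustration, not the authors' procedure: `cycle_at_fraction` assumes the preliminary run is available as one fluorescence reading per cycle and picks the first cycle at or above the chosen fraction of the run maximum; `total_transformants` assumes colonies are counted on a single dilution plate and scaled back to the full recovery volume. Both function names and all example numbers are hypothetical.

```python
def cycle_at_fraction(fluorescence, fraction=0.5):
    """First cycle (1-indexed) whose reading reaches fraction * run maximum."""
    if not fluorescence:
        raise ValueError("empty fluorescence trace")
    threshold = fraction * max(fluorescence)
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle
    raise ValueError("threshold never reached")


def total_transformants(colonies, dilution_factor, plated_ul, recovery_ul):
    """Estimate transformants in the whole recovery culture from one plate."""
    # Colonies on the plate, corrected for dilution and for the plated fraction.
    return colonies * dilution_factor * recovery_ul / plated_ul
```

With a made-up trace of `[0.0, 0.1, 0.3, 1.0, 2.5, 4.0, 4.8, 5.0]`, half of the maximum (2.5) is first reached at cycle 5, so the preparative reaction would be stopped there.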

Library expression and purification

Each library’s plasmid pool (5 μl) was electroporated into 25 μl BL21(DE3) electrocompetent E. coli (Sigma-Aldrich), recovered in SOC medium (1 ml, 37 °C, 1 h), and plated onto selective MDAG-11 + B1 + kanamycin agar plates. Colonies were pooled and used to inoculate 2–4 l of LB broth with 50 μg ml−1 kanamycin. Cultures were grown at 37 °C until an optical density at 600 nm (OD600) of 0.6, then induced with 1 mM IPTG and incubated at 16 °C overnight (~16 h). Cells were collected by centrifugation and resuspended in lysis buffer (20 mM Tris, 500 mM NaCl, 30 mM imidazole, 0.25% CHAPS, 1 mg ml−1 lysozyme, 10 U ml−1 Benzonase, 1× Pierce protease inhibitor cocktail, pH 8.0). Sonication (QSonica, 5 min total, 60% amplitude, 1 min on/off cycles) was followed by centrifugation (12,500g, 30 min, 4 °C; repeated at 14,000g for clarification). The soluble fraction was purified through Ni-NTA agarose gravity columns (Qiagen). After washing with buffer (20 mM Tris, 500 mM NaCl, 30 mM imidazole, 0.25% CHAPS, 5% glycerol, pH 8.0), proteins were eluted (20 mM Tris, 300 mM NaCl, 500 mM imidazole, 5% glycerol, pH 8.0). Eluted proteins were dialysed overnight into PBS, and SUMO tags were cleaved using a 1:100 molar ratio of ULP1 (4 °C, ~20 h). A second Ni-NTA purification removed SUMO and ULP1, collecting cleaved proteins in the flow-through. Proteins were concentrated (3 kDa Amicon Ultra filters) and further purified by Superdex 75 10/300 GL size-exclusion chromatography (Cytiva) on the NGC FPLC system (Bio-Rad). The monomeric fractions were pooled, reconcentrated, filtered (0.22 μm Millex-GP filter), flash-frozen in liquid nitrogen and stored at −80 °C until use.
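The 1:100 molar ratio used for SUMO cleavage translates into a protease mass once molecular weights are fixed. The sketch below assumes the ratio is protease:substrate and that a single average molecular weight is an acceptable stand-in for a pooled library of fusion proteins; neither assumption is stated in the text. The function name and the example molecular weights are placeholders, not measured values.

```python
def protease_mass_mg(substrate_mg, substrate_mw_da, protease_mw_da,
                     molar_ratio=1 / 100):
    """Mass of protease (mg) for a given protease:substrate molar ratio."""
    substrate_mmol = substrate_mg / substrate_mw_da   # mg / (g/mol) = mmol
    protease_mmol = substrate_mmol * molar_ratio      # assumed protease:substrate
    return protease_mmol * protease_mw_da             # mmol * (g/mol) = mg


# Placeholder numbers: 10 mg of fusion protein averaging 20 kDa,
# cleaved with a protease of 25 kDa at 1:100.
example_mg = protease_mass_mg(10.0, 20_000.0, 25_000.0)
```

With these placeholder inputs the calculation gives 0.125 mg of protease; real values depend on the actual molecular weights of the protease preparation and of the library members.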

Labelled protein expression and purification for NMR analysis

We selected 13 proteins for individual expression, purification and NMR analysis. The DNA sequences were codon-optimized for E. coli and cloned into pET-28a(+) (thrombin cleavage site) from Twist Bioscience or pET-28a(+)-TEV from GenScript. The plasmids were transformed into chemically competent BL21(DE3) cells. A small starter culture (5 ml) was inoculated in LB Miller broth with 50 μg ml−1 kanamycin and grown overnight at 37 °C, 220 rpm. The starter culture (25 μl) was then diluted into 50 ml of labelled M9 medium (42 mM Na2HPO4, 22 mM KH2PO4, 8.6 mM NaCl, 8.6 mM 15NH4Cl (Cambridge Isotope), 11 mM d-glucose (13C, Cambridge Isotope), 1 mM MgSO4, 0.2 mM CaCl2, 0.15 mM thiamine, 1% (v/v) trace elements (3 mM FeCl3, 0.37 mM ZnCl2, 0.074 mM CuCl2, 0.042 mM CoCl2·H2O, 0.162 mM H3BO3, 6.84 mM MnCl2·H2O)) with 50 μg ml−1 kanamycin and grown overnight at 37 °C, 220 rpm. Larger cultures of M9 medium were inoculated with the overnight M9 starter culture (50 ml per 1 l) and grown at 37 °C, 220 rpm to an OD600 of around 0.6. Expression was induced with 0.5 mM IPTG, and cells were incubated at 16 °C overnight (around 16–18 h). Cells were collected, resuspended in lysis buffer (20 mM Tris, 500 mM NaCl, 30 mM imidazole, 0.25% CHAPS, pH 8.0, 1 mg ml−1 lysozyme, 10 U ml−1 Benzonase, 1× Pierce protease inhibitor EDTA-free) and lysed by sonication. The lysates were clarified by centrifugation (13,000g, 30 min). Proteins were purified by immobilized metal affinity chromatography (IMAC) using Ni-NTA agarose. The column was washed with buffer (20 mM Tris, 500 mM NaCl, 30 mM imidazole, 0.25% CHAPS, 5% glycerol, pH 8.0), and proteins were eluted in elution buffer (20 mM Tris, 300 mM NaCl, 500 mM imidazole, 5% glycerol, pH 8.0). Eluted proteins were dialysed into buffer (50 mM Tris, 200 mM NaCl, 5% glycerol, pH 8.0) using Pur-A-Lyzer dialysis tubes (Sigma-Aldrich).
His-tags were cleaved using either TEV protease (produced in-house, pRK793 plasmid; Addgene, 8827) or thrombin CleanCleave kit (Sigma-Aldrich), depending on the construct. TEV protease was added at a protease:target protein ratio of 1:10 with 0.5 mM DTT and incubated overnight at room temperature. Thrombin cleavage followed the manufacturer’s protocol, incubating overnight at room temperature. A second IMAC Ni-NTA purification was performed to remove the tag and uncleaved protein. Proteins were further purified by size-exclusion chromatography using a Superdex 75 10/300 column in phosphate-buffered saline. Monomeric fractions were identified based on elution profiles of a standard mixture (BSA, ovalbumin, ribonuclease A, aprotinin and vitamin B12), pooled, and concentrated using Amicon Ultra-4 centrifugal filters. The protein concentration was determined using the Pierce BCA assay (Thermo Fisher Scientific).
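When preparing the labelled M9 medium described above, the millimolar concentrations have to be converted into weighed masses. The following sketch shows that conversion for the three unlabelled salts only. It assumes anhydrous forms and standard molecular weights (Na2HPO4 141.96, KH2PO4 136.09, NaCl 58.44 g/mol); the hydration state of the reagents actually used is not given in the text, and the function and dictionary names are hypothetical.

```python
# Molecular weights in g/mol, assuming anhydrous salts (an assumption).
MW_G_PER_MOL = {
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
    "NaCl": 58.44,
}

# Concentrations in mM, taken from the recipe in the text.
M9_SALTS_MM = {
    "Na2HPO4": 42.0,
    "KH2PO4": 22.0,
    "NaCl": 8.6,
}


def grams_needed(conc_mm, mw_g_per_mol, volume_l):
    """Mass in grams for a given concentration (mM) and volume (l)."""
    # mM * l = mmol; mmol * g/mol = mg; divide by 1000 for grams.
    return conc_mm * volume_l * mw_g_per_mol / 1000.0


def salt_masses(volume_l):
    """Grams of each salt for the requested medium volume."""
    return {
        name: grams_needed(conc, MW_G_PER_MOL[name], volume_l)
        for name, conc in M9_SALTS_MM.items()
    }
```

For 1 l this gives roughly 5.96 g Na2HPO4, 2.99 g KH2PO4 and 0.50 g NaCl. The isotopically labelled components are left out on purpose, because their molecular weights depend on the degree of enrichment.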

NMR structure determination

NMR spectra for the structure calculations of HHH_rd4_0518, EEHEE_rd4_0871 and EEHEE_rd4_0642 were acquired at 288 K on Bruker spectrometers operating at 600 and 800 MHz and equipped with TCI cryoprobes. Proteins were buffered in 20 mM sodium phosphate (pH 7.5, 150 mM NaCl) at concentrations of 0.5 to 1 mM. Resonance assignments for 15N/13C-labelled proteins were determined using FMCGUI70 based on a standard suite of 3D triple- and double-resonance NMR experiments collected as described previously71. All 3D spectra were acquired with non-uniform sampling in the indirect dimensions and were reconstructed by the multi-dimensional decomposition software qMDD72, interfaced with NMRPipe73. Peak picking was performed manually using NMRFAM-Sparky74. Torsion angle restraints were derived from TALOS+75. Automated NOE assignments and structure calculations were conducted using CYANA (v.2.1)76. The best 20 out of 100 CYANA-generated structures were refined with CNSSOLVE77 by performing a short restrained molecular dynamics simulation in explicit solvent78. The final 20 refined structures comprise the NMR ensemble. Structure quality was assessed using Procheck analysis79 and the PSVS server80.
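Selecting the best 20 of 100 calculated structures is a ranking step. The sketch below assumes that "best" means lowest CYANA target function value, which is the usual convention but is not spelled out in the text, and that the values have already been parsed into a list with one entry per structure. The function name is hypothetical, and no CYANA file format is implied.

```python
def select_best(target_functions, n_keep=20):
    """Indices of the n_keep structures with the lowest target function."""
    if n_keep > len(target_functions):
        raise ValueError("n_keep exceeds the number of structures")
    # Rank structure indices by ascending target function value.
    ranked = sorted(range(len(target_functions)),
                    key=lambda i: target_functions[i])
    return ranked[:n_keep]
```

The returned indices identify which structures would be passed on to refinement; ties are broken by original order because Python's sort is stable.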
