Atomically accurate de novo design of antibodies with RFdiffusion

Given the success of RFdiffusion at designing VHHs with three de novo CDRs, we next tested its ability to design both heavy and light chains in scFv format. RFdiffusion was used to generate scFvs targeting specific epitope sites, following a strategy similar to the VHH design approach. However, unlike VHH design, in which only three CDRs were built de novo, scFv design involved constructing all six CDRs across the heavy and light chains, in addition to the docking mode.

The gene synthesis problem is more formidable for scFvs than for VHHs: scFvs are too long to be assembled simply from pairs of conventional oligonucleotides synthesized on oligonucleotide arrays, and their heavy and light chains are challenging to pair uniquely because of the high sequence homology between scFvs. We developed stepwise assembly protocols that enable the construction of libraries with heavy and light chains either specifically paired as in the design models (Supplementary Figs. 10 and 11) or combinatorially mixed within subsets of designs that share similar target-binding modes (Supplementary Fig. 12). The latter approach helps to overcome the greater challenge of accurately designing six CDRs de novo, which offers more opportunities for error than the VHH problem because a single suboptimal CDR can compromise binding. We found that, given sets of nearly superimposable designs targeting the same site with the same binding mode, new scFvs generated by combining heavy and light chains from different designs were confidently predicted to bind to the target site in the designed binding mode at frequencies similar to those of the original designs (Extended Data Fig. 7a). By contrast, random, structure-agnostic pairing rarely led to predicted binders (Extended Data Fig. 7a). Hence, by mixing CDRs from different designs that bind in the same orientation, we can effectively overcome failures due to single imperfectly designed CDRs, thereby offering a combinatorial solution to a combinatorially more complex problem (two-chain scFv design versus one-chain VHH design). This strategy highlights a key advantage of structure-based design: because a structural model is available for every antibody, heavy and light chains can be paired 'intelligently', which allows de novo-designed antibody libraries to reach scales attainable by traditional library assembly methods, despite current limits in gene synthesis.
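The structure-aware pairing idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Design` record, the `cluster` labels and the placeholder sequence strings are all hypothetical, and the sketch assumes that designs have already been grouped into sets of nearly superimposable binding modes by some upstream step. Within each group, every heavy chain is paired with every light chain; chains are never paired across groups.

```python
from collections import defaultdict
from itertools import product
from typing import NamedTuple


class Design(NamedTuple):
    name: str     # hypothetical design identifier
    cluster: str  # hypothetical label for a set of near-superimposable binding modes
    heavy: str    # heavy-chain variable sequence (placeholder string here)
    light: str    # light-chain variable sequence (placeholder string here)


def structure_aware_pairs(designs):
    """Pair each heavy chain with each light chain from the same cluster."""
    by_cluster = defaultdict(list)
    for d in designs:
        by_cluster[d.cluster].append(d)

    pairs = []
    for cluster, members in by_cluster.items():
        # The full cross product also contains the original (as-designed) pairings.
        for h, l in product(members, members):
            pairs.append((cluster, h.name, l.name, h.heavy, l.light))
    return pairs


# Toy input: three designs in cluster "A" and two in cluster "B" (illustrative only).
designs = [
    Design("d1", "A", "VH_d1", "VL_d1"),
    Design("d2", "A", "VH_d2", "VL_d2"),
    Design("d3", "A", "VH_d3", "VL_d3"),
    Design("d4", "B", "VH_d4", "VL_d4"),
    Design("d5", "B", "VH_d5", "VL_d5"),
]

pairs = structure_aware_pairs(designs)
print(len(pairs))  # 3*3 + 2*2 = 13 within-cluster pairings
```

In this toy case, structure-agnostic pairing would give 5 × 5 = 25 combinations, 12 of which mix chains from different binding modes. The library size grows with the square of the cluster size, which is how a modest number of synthesized chains can reach a large theoretical complexity.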

We succeeded in identifying epitope-specific scFvs from the heavy–light combinatorial libraries (theoretical complexity of approximately 10 million; Extended Data Figs. 7b,c and 8a–c) but not from the fixed-pairing libraries (Supplementary Fig. 13). Following expression and purification, SPR analysis of six distinct scFvs originating from two unique docks targeting the Frizzled epitope of TcdB revealed a range of affinities (Fig. 4d–h): the highest-affinity binder, scFv6, had a Kd of 72 nM (Fig. 4g). Conversion of the scFv to a full-length IgG1 yielded an antibody that binds with comparable affinity (68 nM), demonstrating that our design method can be used to generate full-length antibodies (Fig. 4i). No antibodies that bind to this epitope are present in the PDB; hence, this success cannot be attributed to memorization. Subsequent prediction of the structure of the scFv with AlphaFold3 showed a binding mode identical to that of the two nearly superimposable parent designs that contributed the light and heavy chains (Supplementary Fig. 16c,d). Competition with Frizzled-7, a known receptor that binds to this epitope, confirmed that scFv5 binding was on target (Fig. 4j). By contrast, no competition was seen in the presence of CSPG4, an alternative receptor that interacts with an epitope at the toxin core. Thus, scFvs targeting user-specified epitopes can be identified from structure-aware designed combinatorial libraries.
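To make "comparable affinity" concrete, the short Python sketch below computes the ratio of scFv Kd to IgG1 Kd from the values reported in Fig. 4 (panels d, f, g and i). The dictionary layout and the `fold_change` helper are illustrative choices and not part of the original analysis.

```python
# Kd values in nM, taken from the Fig. 4 caption (panels d, f, g, i).
kd_nm = {
    "scFv5": {"scFv": 460.0, "IgG1": 380.0},
    "scFv6": {"scFv": 72.0, "IgG1": 68.0},
}


def fold_change(scfv_kd, igg_kd):
    """Ratio of scFv Kd to IgG1 Kd; a value near 1 means comparable affinity."""
    return scfv_kd / igg_kd


ratios = {name: fold_change(v["scFv"], v["IgG1"]) for name, v in kd_nm.items()}
for name, ratio in ratios.items():
    print(f"{name}: scFv/IgG1 Kd ratio = {ratio:.2f}")
```

Both ratios fall between 1.0 and 1.3, so reformatting as IgG1 changed the measured affinity by well under twofold for both binders.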

Fig. 4: Biochemical characterization of combinatorially assembled scFvs with six designed CDRs. a, Multiple sequence alignment of six scFvs that bind to TcdB. scFvs 1–5 originate from the same structural cluster, whereas scFv6 originates from a distinct cluster. b,c, AlphaFold3 predictions of scFv5 (b) and scFv6 (c) in complex with TcdB. scFv5 and scFv6 are predicted to bind to a similar but not identical epitope. The predicted orientation of scFv6 relative to TcdB is rotated compared with scFv5. d, Affinity of scFv5 to TcdB was 460 nM by SPR. e, Computational prediction of the scFv5–TcdB interface for VH (variable heavy-chain fragment; left) and VL (variable light-chain fragment; right). f, scFv5, when expressed as a full-length IgG1, shows a binding affinity of 380 nM to TcdB by SPR. g, Affinity of scFv6 to TcdB was 72 nM by SPR. h, Computational prediction of the scFv6–TcdB interface for VH (left) and VL (right). i, scFv6, when expressed as a full-length IgG1, shows a binding affinity of 68 nM to TcdB by SPR. j, scFv5 competes with Frizzled-7 and does not compete with CSPG4, indicating on-target binding. scFv5 was conjugated to a CM5 chip and TcdB RBD was flowed over at 50 nM either alone or mixed with 1 μM of Frizzled-7, CSPG4 or scFv5. k,l, SPR comparative analysis of B1.2.1 binding to C*07:02–PHOX2B versus C*07:02–PHOX2B(R6A). scFv was immobilized and then on-target and off-target binding was measured across an eight-step, twofold titration with an upper concentration of 5 μM. Steady-state kinetic analysis (k) and raw SPR trace (l) of on-target and off-target binding indicate specific binding to the intended target. m, AlphaFold3 predictions of HLA-C*07:02 with peptide PHOX2B (left) and PHOX2B(R6A) (right). R6 of PHOX2B is predicted to be solvent exposed. n, AlphaFold3 prediction of scFv B1.2.1 in complex with C*07:02–PHOX2B (left). Predicted polar contacts with R6 of the PHOX2B peptide (right), mediated by CDRH3, CDRL1 and CDRL2, are also shown. 
Figure was created using BioRender (http://biorender.com).
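The analyte concentrations implied by the eight-step, twofold titration in panels k and l can be reconstructed with a few lines of Python. This sketch assumes that the 5 μM upper concentration counts as the first of the eight steps; the caption does not state this explicitly, and the function name is hypothetical.

```python
def twofold_titration(top_um, steps):
    """Concentrations (in μM) for a twofold serial dilution starting at top_um."""
    return [top_um / (2 ** i) for i in range(steps)]


# Assumption: the 5 μM top concentration is counted as step 1 of 8.
series_um = twofold_titration(5.0, 8)
for conc in series_um:
    print(f"{conc:.4f} μM  ({conc * 1000:.1f} nM)")
```

Under this assumption the series runs from 5 μM down to about 39 nM, a 128-fold concentration range.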