Tech News
Transferable enantioselectivity models from sparse data

Identifying a catalyst class that optimizes the enantioselectivity of a new reaction, whether it involves a different combination of known substrate types or an entirely unfamiliar class of compounds, is a formidable challenge. Statistical models trained on a reported set of reactions can help predict out-of-sample transformations [1–5] but often face two obstacles: (1) only sparse data are available, i.e., there is limited information on catalyst–substrate interactions, and (2) simple stereoelectronic parameters may fail to describe mechanistically complex transformations [6,7]. Here we report a descriptor generation strategy that accounts for changes in the enantiodetermining step with catalyst or substrate identity, allowing us to model reactions involving distinct ligand and substrate types. As validating case studies, we collected data on enantioselective nickel-catalyzed C(sp³)-couplings [8] and trained statistical models with features extracted from the transition states and intermediates proposed to be involved in asymmetric induction. These models allow poorly performing examples reported in a substrate scope to be optimized, and they are applicable to unseen ligands and reaction partners. This approach offers the opportunity to streamline catalyst and reaction development by quantitatively transferring knowledge learned from sparse data to novel chemical spaces.
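
To make the idea of a statistical enantioselectivity model concrete, the following is a minimal sketch and not the authors' workflow. It assumes a single numeric descriptor per reaction and an ordinary least-squares line, neither of which is specified in the abstract. It uses the standard relation between enantiomeric excess and the free-energy difference of the competing transition states, ΔΔG‡ = RT ln(er). All names (`ee_to_ddg`, `ddg_to_ee`, `fit_line`, `descriptor`) are hypothetical, and the data are synthetic values generated from an arbitrary line, not measurements.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ee_to_ddg(ee_percent, temperature_k=298.15):
    """Convert enantiomeric excess (%) to ddG (kcal/mol) via ddG = RT ln(er)."""
    if not -100.0 < ee_percent < 100.0:
        raise ValueError("ee must lie strictly between -100 and 100 %")
    er = (100.0 + ee_percent) / (100.0 - ee_percent)  # enantiomeric ratio
    return R_KCAL * temperature_k * math.log(er)


def ddg_to_ee(ddg_kcal, temperature_k=298.15):
    """Inverse conversion: ee = 100 * tanh(ddG / (2RT))."""
    return 100.0 * math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature_k))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic placeholder data (NOT from the paper): one hypothetical descriptor
# per reaction, with ee values generated from an arbitrary linear ddG trend.
descriptor = [0.0, 0.5, 1.0, 1.5, 2.0]
observed_ee = [ddg_to_ee(0.4 + 0.8 * x) for x in descriptor]

# Model ddG rather than ee, because ddG is the quantity expected to scale
# linearly with energetic descriptors.
train_ddg = [ee_to_ddg(ee) for ee in observed_ee]
slope, intercept = fit_line(descriptor, train_ddg)

# Predict the ee for an unseen descriptor value.
predicted_ee = ddg_to_ee(slope * 2.5 + intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted ee={predicted_ee:.1f}%")
```

Fitting in ΔΔG‡ space rather than directly on ee keeps the target unbounded and additive in energy terms. A real model of the kind described above would presumably replace the single placeholder descriptor with several features computed from the proposed transition states and intermediates.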