
Move over, AlphaFold: open source model predicts shape of 1 billion proteins

Why This Matters

The release of the ESM Atlas, an open-source database of more than one billion predicted protein structures, marks a significant advance in bioinformatics, expanding the mapped protein universe and giving researchers a new resource for biological discovery. Biohub’s claim that the underlying model outperforms existing systems such as AlphaFold3 underscores the rapid progress in AI-driven protein structure prediction, with potential applications in medicine and biotechnology.


The AI tool designed binders against Cytotoxic T-lymphocyte-associated protein 4 (CTLA-4). Credit: Molekuul/Science Photo Library

The known protein universe just got a lot bigger. A newly released artificial-intelligence tool has generated an atlas of more than one billion predicted protein structures and billions more protein sequences.

The database, known as the ESM Atlas, was unveiled today by researchers at the Chan Zuckerberg Initiative’s Biohub, a biomedical institute created in San Francisco, California, by Facebook founder Mark Zuckerberg and his wife, physician and educator Priscilla Chan.

The atlas eclipses the AlphaFold Database of predicted protein structures by more than 800 million entries, and a previous ESM Atlas by some 300 million.

The predictions were made using ESMFold2, an AI model that Biohub says surpasses the performance of AlphaFold3, the latest version of Google DeepMind’s system, and of other protein-structure prediction AIs. The atlas is described in a preprint released today (ref. 1).

“What this atlas does is it shows the totality of protein biology and especially the parts that are most unknown,” says Biohub science head Alex Rives, who led the effort. “We think it’s going to be a really powerful substrate for the discovery of new biology.”

Other scientists are impressed with the results, and especially with the fact that ESMFold2 is fully open source. But the Biohub model enters an increasingly crowded field, in which competing open-source and proprietary protein models are making gains at breakneck speed.

Antibody predictions

ESMFold2 is based on a ‘protein language’ model that Rives’s team unveiled in 2024, which was trained on billions of proteins from across the tree of life. It includes ‘metagenomic’ sequences from soil, ocean and other environments, which are absent from the AlphaFold database of predicted protein structures.

Rives’s team says ESMFold2 outperforms existing methods, including AlphaFold3, at determining the correct structure of complexes of interacting proteins – including antibody molecules binding to their antigen molecular targets.
