
OpenAI starts offering a biology-tuned LLM

Why This Matters

OpenAI's new biology-tuned LLM, GPT-Rosalind, is a specialized AI model for biological research. It targets two obstacles researchers face: the sheer size of biological datasets and the number of highly specialized subfields. The model was trained on common research workflows and on how to use public databases, and it aims to speed up discovery and streamline routine research tasks. The release reflects a broader shift toward AI models tailored to a specific scientific domain rather than general-purpose ones.

Key Takeaways

On Thursday, OpenAI announced it had developed a large language model specifically trained on common biology workflows. Called GPT-Rosalind, after Rosalind Franklin, the model appears to differ from most science-focused models from major tech companies, which have generally taken a more generic approach meant to work across many fields.

In a press briefing, Yunyun Wang, OpenAI’s Life Sciences Product Lead, said the system was designed to tackle two major roadblocks faced by current biology researchers. The first is the massive datasets created by decades of genome sequencing and protein biochemistry, which can be too much for any one researcher to take in. The second is that biology has many highly specialized subfields, each with its own techniques and jargon. For example, a geneticist who finds themselves working on a gene that’s active in brain cells might struggle to make sense of the immense neurobiological literature.

Wang said the company had taken an LLM and trained it on 50 of the most common biological workflows, as well as on how to access the major public databases of biological information. Further training has resulted in a system that can suggest likely biological pathways and prioritize potential drug targets. “We’re connecting genotype to phenotype through known pathways and regulatory mechanisms, infer likely structural or functional properties of proteins, and really leveraging this mechanistic understanding,” Wang said.