An agentic system for rare disease diagnosis with traceable reasoning

We introduce DeepRare, an agentic framework for rare disease diagnosis built on a modular, multi-tiered architecture. The system comprises three core components: (1) a central host agent, equipped with a memory bank, that integrates and synthesizes diagnostic information while coordinating system-wide operations; (2) specialized local agent servers, each interfacing with a specific diagnostic resource environment through a tailored toolset; and (3) heterogeneous data sources that provide critical diagnostic evidence, including structured knowledge bases (for example, research literature and clinical guidelines) and real-world patient data. We describe the architecture top-down, beginning with the central host’s core workflow and proceeding through the agent servers to the underlying data sources.

Problem formulation

In this paper, we focus on rare disease diagnosis, where a patient’s case typically consists of two components, phenotype and genotype, denoted as ${\mathcal{I}}=\{{\mathcal{P}},{\mathcal{G}}\}$. Either ${\mathcal{P}}$ or ${\mathcal{G}}$ (but not both) may be the empty set $\varnothing$, indicating the absence of the corresponding input. The phenotype input may consist of free-text descriptions ${\mathcal{T}}$, structured Human Phenotype Ontology (HPO) terms ${\mathcal{H}}$ or both. Formally, we define ${\mathcal{P}}=({\mathcal{T}},{\mathcal{H}})$, where either ${\mathcal{T}}$ or ${\mathcal{H}}$ may be empty (that is, $\varnothing$), indicating the absence of that input modality. The genotype input ${\mathcal{G}}$ denotes the raw variant call format (VCF) file generated from whole-exome sequencing (WES).
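As a minimal illustration, the input definition above could be represented as follows. The class and field names are hypothetical and are not part of DeepRare; the sketch only encodes the stated constraints that each modality may be absent but that phenotype and genotype cannot both be empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phenotype:
    """P = (T, H): free-text description and structured HPO terms."""
    free_text: str = ""                                  # T; "" stands for the empty set
    hpo_terms: List[str] = field(default_factory=list)   # H; [] stands for the empty set

    def is_empty(self) -> bool:
        # P is empty only when both T and H are absent.
        return not self.free_text.strip() and not self.hpo_terms


@dataclass
class PatientCase:
    """I = {P, G}: phenotype plus an optional path to the raw VCF file."""
    phenotype: Phenotype = field(default_factory=Phenotype)
    vcf_path: Optional[str] = None                       # G; None stands for the empty set

    def __post_init__(self) -> None:
        # Either P or G may be empty, but not both.
        if self.phenotype.is_empty() and self.vcf_path is None:
            raise ValueError("at least one of phenotype or genotype must be provided")
```

For example, `PatientCase(phenotype=Phenotype(hpo_terms=["HP:0001250"]))` describes a phenotype-only case with a single HPO term and no genotype input, whereas `PatientCase()` is rejected.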

Given ${\mathcal{I}}$, the goal of the system is to produce a ranked list of the top $K$ most probable rare diseases, ${\mathcal{D}}=\{{d}_{1},{d}_{2},\ldots ,{d}_{K}\}$, and a corresponding rationale ${\mathcal{R}}$ consisting of evidence-grounded explanations traceable to medical sources such as peer-reviewed literature, clinical guidelines and similar patient cases. This can be formalized as:

$$\{{\mathcal{D}},\,{\mathcal{R}}\}={\mathcal{A}}({\mathcal{I}}),$$ (1)

where ${\mathcal{A}}(\cdot )$ denotes the diagnostic model.
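The output pair in equation (1) can be pictured with a small container type. This is a sketch under our own assumptions: `Evidence`, `DiagnosisResult` and `untraceable` are hypothetical names, and keying the rationale by disease is one possible layout rather than the representation DeepRare uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Evidence:
    """One evidence-grounded explanation and the source it traces back to."""
    source_type: str   # e.g. "literature", "guideline" or "similar_case"
    reference: str     # identifier of the supporting source
    explanation: str


@dataclass
class DiagnosisResult:
    diseases: List[str]                    # D, ordered from most to least probable
    rationale: Dict[str, List[Evidence]]   # R, keyed by disease name (assumed layout)

    def untraceable(self) -> List[str]:
        """Return the diseases in D that have no supporting evidence in R."""
        return [d for d in self.diseases if not self.rationale.get(d)]
```

A check such as `untraceable()` makes the traceability requirement explicit: every entry of the ranked list should be backed by at least one cited source.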

As shown in Extended Data Fig. 1b, our multi-agent system comprises three main components:

(1) A central host with a memory bank serves as the coordinating brain of the system. The memory bank is initialized as empty and updated incrementally with information gathered by the agent servers. Powered by an LLM, the central host integrates historical context from the memory bank to determine the system’s next actions. (2) Several agent servers execute specialized tasks such as phenotype extraction and knowledge retrieval, enabling dynamic interaction with external data sources. (3) Diverse data sources serve as the external environment, providing crucial diagnostic evidence from PubMed articles, clinical guidelines, publicly available case reports and other relevant resources.
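The relationship between the host, its memory bank and the agent servers can be sketched as below. All names are hypothetical, an agent server is reduced to a plain function from a query to a text finding, and the LLM step that decides the next action is deliberately left out; the sketch only shows a memory bank that starts empty and grows as servers return information.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an agent server takes a query and returns a finding.
AgentServer = Callable[[str], str]


class CentralHost:
    def __init__(self, servers: Dict[str, AgentServer]) -> None:
        self.servers = servers
        self.memory_bank: List[str] = []   # initialized as empty

    def call(self, server_name: str, query: str) -> str:
        """Dispatch a query to one agent server and record what it returns."""
        finding = self.servers[server_name](query)
        self.memory_bank.append(f"[{server_name}] {finding}")
        return finding

    def context(self) -> str:
        """Historical context that would be passed to the LLM for its next decision."""
        return "\n".join(self.memory_bank)
```

Keeping the memory bank as an append-only log tagged with the originating server is one simple way to preserve the provenance of each finding.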

Main workflow

The system operates in two primary stages, orchestrated by the central host: information collection and self-reflective diagnosis, as illustrated in Extended Data Fig. 1c. For clarity, the specific functionalities of the agent servers involved in each stage are detailed in the following section.
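The two stages can be outlined as a short control flow. This is a simplified sketch, not the DeepRare implementation: `run_workflow` and its callable parameters are hypothetical, and the reflection step is reduced to a yes/no check of each candidate against the collected information, which is an assumption made only for illustration.

```python
from typing import Callable, List

Collector = Callable[[str], List[str]]        # case -> collected findings
Proposer = Callable[[List[str]], List[str]]   # findings -> ranked candidate diseases
Reviewer = Callable[[str, List[str]], bool]   # (candidate, findings) -> keep?


def run_workflow(
    case: str,
    collectors: List[Collector],
    propose: Proposer,
    is_supported: Reviewer,
) -> List[str]:
    # Stage 1: information collection. Each collector stands in for an agent server.
    memory: List[str] = []
    for collect in collectors:
        memory.extend(collect(case))

    # Stage 2: self-reflective diagnosis. Propose candidates, then re-examine each
    # one against the collected findings and keep only those that pass the check.
    candidates = propose(memory)
    return [d for d in candidates if is_supported(d, memory)]
```

The order of the proposed candidates is preserved, so the returned list remains a ranking.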
