Advancing regulatory variant effect prediction with AlphaGenome

Beyond impacting isoform composition through splicing modulation, non-coding variants can influence traits and cause diseases by altering gene expression [23,24]. We evaluated the ability of AlphaGenome to predict the impact of variants on gene expression across a range of regulatory mechanisms, including eQTLs, enhancer–gene interactions and alternative polyadenylation (APA).

Improved prediction of eQTL effects

We first evaluated the ability of AlphaGenome to predict the impact of eQTLs, which are variants associated with gene expression variation. We developed a variant scoring strategy to quantify a variant’s predicted effect on a gene’s expression level (Fig. 4a and Methods). An illustrative example of a known eQTL/sQTL locus (rs9610445; chr. 22: 36201698: A>C) is shown in Fig. 4b. The alternative ‘C’ allele is associated with lower expression of APOL4 in GTEx data (sum of single effects (SuSiE) ‘β posterior’ = −0.709; posterior inclusion probability (PIP) > 0.9), and both the RNA-seq coverage and direction of effect are recapitulated by the predictions of AlphaGenome (variant score = −1.52; quantile score = −1.00; Methods). Furthermore, ISM around the variant indicates a possible nucleotide splice donor sequence motif, which is disrupted by the variant, thereby leading to an aberrant transcript and reduced expression (Fig. 4b; inset). More examples of eQTL variant prediction and mechanistic interpretation by ISM are shown in Supplementary Fig. 8.

Using fine-mapped GTEx eQTLs as ground truth, we benchmarked AlphaGenome against the state-of-the-art (SOTA) models Borzoi [2] and Enformer [1]. AlphaGenome demonstrated improved prediction of both the magnitude (‘coefficient’; Spearman ρ with SuSiE [25] β posterior) and direction (‘sign’; area under the receiver operating characteristic (auROC) curve) of eQTL effects compared with the previous SOTA (Borzoi) (Fig. 4c–f). AlphaGenome improved the tissue-weighted mean Spearman ρ from 0.39 to 0.49 and the mean sign auROC from 0.75 to 0.80. These improvements in coefficient and sign prediction were observed broadly across most GTEx tissues, variant-to-transcription start site (TSS) distance bins and variant functional annotation classes (Extended Data Fig. 5a–c).
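The two metrics named above can be illustrated with a minimal, standard-library sketch. This is not the authors' evaluation code: the function names and the six toy variants are invented for illustration, and the tissue weighting described in the text is not reproduced. The 'coefficient' metric is taken to be the Spearman ρ between predicted scores and observed effect sizes, and the 'sign' metric the auROC for separating positive-effect from negative-effect variants.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(scores: Sequence[float], betas: Sequence[float]) -> float:
    """'Coefficient' metric: Spearman rho is the Pearson correlation of ranks."""
    return pearson(average_ranks(scores), average_ranks(betas))


def sign_auroc(scores: Sequence[float], betas: Sequence[float]) -> float:
    """'Sign' metric: probability that a positive-effect variant outscores a
    negative-effect one (Mann-Whitney form of auROC, ties count one half)."""
    pos = [s for s, b in zip(scores, betas) if b > 0]
    neg = [s for s, b in zip(scores, betas) if b < 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): predicted variant scores and observed effect sizes.
toy_scores = [-1.5, -0.2, 0.3, 0.9, 0.4, 1.2]
toy_betas = [-0.7, 0.1, 0.2, 0.5, -0.3, 0.8]

print(round(spearman_rho(toy_scores, toy_betas), 3))  # 0.829
print(sign_auroc(toy_scores, toy_betas))              # 0.75
```

The rank-based form means that neither metric depends on the scale of the predicted scores, only on how they order the variants.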

As previously reported [26], performance decays with distance to the target gene across all eQTL tasks (Extended Data Fig. 5f). However, AlphaGenome exhibited a mild improvement in sign and coefficient prediction for distal variants (greater than 35 kb; Fig. 4f and Extended Data Fig. 5b), and top-scoring predictions exhibited high sign accuracy across distance categories (Extended Data Fig. 5g). Additionally, AlphaGenome outperformed Borzoi on coefficient and sign prediction for indel (insertion or deletion) eQTLs (Fig. 4c,e and Extended Data Fig. 5d). Notably, the effect size predictions of AlphaGenome for high-confidence eQTLs (scores greater than the 99th percentile of common variant effects) correlated strongly with observed effects (Spearman ρ = 0.73; Extended Data Fig. 5h) and consistently surpassed Borzoi’s performance across various quantile score thresholds (such as Spearman ρ 0.73 versus 0.61 at approximately the 99th percentile threshold; Extended Data Fig. 5i).

The overall improvement in prediction accuracy, particularly for the sign of a variant’s effect, translates to substantial gains in practical applications. At a score threshold yielding 90% sign prediction accuracy, AlphaGenome recovered over twice as many GTEx eQTLs (41%) as Borzoi (19%; Fig. 4g). Applying this improved sign prediction capability to genome-wide association study (GWAS) interpretation, we evaluated the ability of AlphaGenome to assign a direction of effect to candidate target genes for 18,537 GWAS credible sets (Methods). Using a threshold calibrated to 80% accuracy on eQTLs (Fig. 4g), AlphaGenome assigned a confident sign prediction for at least one variant in 49% of GWAS credible sets (11% using a conservative PIP-weighted scoring approach; Fig. 4h). AlphaGenome and a widely used co-localization method for sign prediction (COLOC H4 > 0.95) [27] resolved the direction of effect for largely non-overlapping sets of loci (Fig. 4h). This indicates their complementary utility, collectively increasing the total yield of loci with determined effect directions. Furthermore, AlphaGenome resolved approximately 4-fold more credible sets in the lowest minor allele frequency quintile compared with COLOC, probably reflecting its reduced dependence on population genetics parameters that affect power to detect associations (Fig. 4h, stratified bars). Thus, AlphaGenome expands our ability to generate functional hypotheses about the direction of GWAS signals, particularly for low-frequency variants.
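The idea of a score threshold "yielding 90% sign prediction accuracy" can be sketched as follows. This is an assumed reading, not the calibration procedure from the Methods: variants are ranked by absolute score, and the most permissive cutoff is chosen at which the retained variants still reach the target sign accuracy. The function name and the ten toy variants are hypothetical.

```python
from typing import Sequence, Tuple


def recall_at_sign_accuracy(
    scores: Sequence[float],
    betas: Sequence[float],
    target_accuracy: float,
) -> Tuple[float, float]:
    """Return (threshold, fraction of variants recovered) for the lowest
    |score| cutoff whose retained variants reach the target sign accuracy.

    Returns (inf, 0.0) if no cutoff reaches the target.
    """
    ranked = sorted(zip(scores, betas), key=lambda sb: abs(sb[0]), reverse=True)
    best_threshold, best_recall = float("inf"), 0.0
    correct = 0
    for k, (score, beta) in enumerate(ranked, start=1):
        # A prediction is correct when predicted and observed signs agree.
        correct += (score > 0) == (beta > 0)
        # Only evaluate at boundaries between distinct |score| values, so
        # tied variants are always kept or dropped together.
        if k < len(ranked) and abs(ranked[k][0]) == abs(score):
            continue
        if correct / k >= target_accuracy:
            best_threshold, best_recall = abs(score), k / len(ranked)
    return best_threshold, best_recall


# Toy data (made up): predicted scores and observed eQTL effect sizes.
toy_scores = [2.0, -1.8, 1.5, -1.1, 0.9, -0.6, 0.4, 0.2, -0.1, 0.05]
toy_betas = [0.9, -0.7, 0.4, -0.5, -0.2, -0.3, -0.1, 0.2, 0.1, -0.3]

print(recall_at_sign_accuracy(toy_scores, toy_betas, 0.9))  # (1.1, 0.4)
print(recall_at_sign_accuracy(toy_scores, toy_betas, 0.8))  # (0.6, 0.6)
```

Lowering the required accuracy from 90% to 80% admits more variants, which is the trade-off behind using the looser 80% threshold for the GWAS credible-set analysis.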

The performance of AlphaGenome on distinguishing fine-mapped eQTLs from distance-matched variants (‘causality’; auROC) was comparable with Borzoi (Fig. 4i). However, leveraging the predictions of AlphaGenome within a supervised framework boosted performance on the causality task; training a random forest model using the scores of AlphaGenome from several modalities increased the mean auROC from 0.68 to 0.75, also surpassing previous SOTA performance (mean auROC 0.71 for Borzoi; Fig. 4i). Notably, using features derived from variant scores across all predicted modalities provided a performance uplift in this supervised setting compared with using RNA-seq-derived scores alone (Extended Data Fig. 5e), highlighting the practical benefit of the multimodal predictions of AlphaGenome for identifying causal expression-modulating variants.

Competitive enhancer–gene linking

We then assessed whether AlphaGenome can link enhancer elements to their target genes, given that tissue-specific gene expression is modulated by enhancer–promoter interactions, often involving enhancers in distal genomic regions. For this task, we leveraged an independent CRISPRi perturbation dataset from the ENCODE–rE2G study [12]. Evaluated zero-shot, AlphaGenome outperformed Borzoi in identifying validated enhancer–promoter links, particularly for enhancers located beyond 10 kb from the TSS of the target gene (Fig. 4j, Extended Data Fig. 7a and Methods), although both models still underestimate the impact of very distal enhancers (Extended Data Fig. 7d). Furthermore, the zero-shot performance of AlphaGenome was comparable (within 1% auPRC) with the ENCODE–rE2G (extended) model, which was explicitly trained on this task and cell line data (Fig. 4j). It also strongly outperformed the simpler DNase-based ENCODE–rE2G model and a distance-to-TSS baseline. Beyond their stand-alone predictive power, features derived from AlphaGenome improved the supervised enhancer–promoter linking models. Incorporating AlphaGenome predictions into the ENCODE–rE2G (extended) model yielded a new SOTA performance across all distance-to-TSS categories (Fig. 4j, Extended Data Fig. 7b,c and Methods).
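To make the auPRC comparison against a distance-to-TSS baseline concrete, here is a minimal sketch using only the standard library. The candidate pairs, distances and model scores are invented, and the area is computed as step-wise average precision, which is one common estimator and may differ from the one used in the benchmark. The baseline simply ranks candidate enhancers by proximity to the TSS.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve.

    Candidates are ranked by descending score; precision is averaged at the
    rank of each validated (label == 1) pair. Ties are broken by input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    total_positives = sum(labels)
    hits = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / k
    return precision_sum / total_positives


# Toy candidate enhancer-gene pairs (all values made up):
# (distance to TSS in kb, hypothetical model score, validated by perturbation?)
toy_pairs = [
    (5, 0.90, 1),
    (12, 0.20, 0),
    (40, 0.70, 1),
    (80, 0.10, 0),
    (150, 0.25, 1),
    (8, 0.30, 0),
]
distances = [d for d, _, _ in toy_pairs]
model_scores = [s for _, s, _ in toy_pairs]
validated = [v for _, _, v in toy_pairs]

# Distance baseline: closer enhancers get higher scores.
baseline_scores = [-d for d in distances]

print(round(average_precision(model_scores, validated), 3))     # 0.917
print(round(average_precision(baseline_scores, validated), 3))  # 0.667
```

In this toy setting the baseline loses precision on the two validated distal enhancers, which it ranks below nearby non-targets.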

Altogether, these results demonstrate the improved capacity of AlphaGenome to capture long-range functional regulatory connections directly from sequence, which is vital for interpreting distal genetic variants.
