Insights into DNA repeat expansions among 900,000 biobank participants

These results show how biobank WGS datasets contain abundant information about unstable DNA repeats. We observed that somatic expansion (in blood) of some repeats is under strong genetic control. Different repeats appear to be affected by a largely shared set of common alleles at DNA-repair genes, but the relative influence of these alleles varied across repeat loci, and even their effect directions varied: common modifier haplotypes at several loci appeared to have opposite effects on blood instability of the TCF4 CAG repeat compared with other repeats, including the HTT CAG repeat. These results, along with the repeat-locus specificity that we observed in the relative mutation rates of repeats in blood versus the germline, reinforce recent evidence of tissue specificity of genetic modifiers of HTT expansion in blood versus brain [18], pointing to highly complex regulation of somatic repeat expansion that varies across repeats and cell types. The modulation of genetic effects by locus-specific factors may suggest roles for locus-specific chromatinization or transcriptional dynamics and will be an interesting area for mechanistic studies.

The strong differences in genetic effects on expansion across different repeats and different tissues suggest a need for caution in efforts to use DNA repeats in clinically accessible tissues (such as blood) to inform on the status of somatic expansion in disease-relevant tissues (such as brain). However, our results also suggest the potential for repeats that are unstable in blood to be used as biomarkers of target engagement for future expansion-slowing therapies. We identified several repeat loci at which common alleles expand in blood as humans age, at rates that are strongly influenced by genetic modifiers (for example, at MSH3). Future analyses using long-read sequencing to identify hypermutable loci with longer alleles [10] may detect even better candidates.

The deep phenotype data available in biobank datasets also enabled us to observe evidence suggestive of a dominant DNA-repeat disorder involving highly expanded 5′-UTR repeat alleles in GLS, which was associated with a severalfold higher risk of kidney and liver diseases. Large WGS cohorts provide an opportunity to identify pathogenic rare alleles that, despite their strong effects on disease risk, have not been identified to date owing to their low penetrance in families. Analyses of the phenotypic effects of common repeat variation, which we did not undertake here, may reveal subclinical phenotypes and may also resolve the question of whether intermediate-length alleles of pathogenic repeats have any beneficial effects (that could in principle cause them to persist in human populations); association analyses conducted to date [6,7] have not detected evidence of such effects.

Analysis of repeat instability in population biobanks does have several limitations. Although we could study germline mutation rates by analysing IBD among unrelated individuals, we could not assess effects of genetic variation, parental age or parent-of-origin on germline mutability, as this requires ascertaining de novo mutations [8,9,10]. Moreover, the short-read WGS data that we analysed provided only glimpses of somatic mutation, through observations of one or a few reads spanning shorter mutated alleles and through read-count-based evidence of expanded alleles of unknown lengths. Nonetheless, the analytical tools that we have developed here for biobank-scale WGS analysis provide a useful complement to studying repeat instability in families [8,9,10] and in patient cohorts using targeted sequencing techniques [18,62], and combining these approaches should provide opportunities for further discovery.