The diverse species that make up the human gut microbiome evolve throughout the lifetimes of individual hosts and over longer timescales across many host colonization cycles. Recent work has shown that rapid evolution within hosts is common among commensal gut bacteria, with new mutations often arising and sweeping to high frequency in healthy adults over the course of days to months, even in the absence of obvious perturbations such as antibiotics1,2,3,4,5. However, far less is known about the longer-term dynamics of adaptations as they spread across hosts.
A new adaptation first appearing in one host’s microbiome may potentially spread to many hosts through strain transmission and subsequent horizontal gene transfer (HGT). The human gut microbiome is known to be a hotspot for HGT6,7,8, allowing adaptive alleles to be recombined easily onto new genetic backgrounds. HGT has been shown to play a crucial role in the transmission of some genes, such as antibiotic resistance genes9, especially across species boundaries. However, the extent to which HGT facilitates the spread of adaptive alleles among strains of the same species of commensal gut microbiota, especially by homologous recombination, is unclear.
When an adaptive allele spreads in a population by means of a ‘gene-specific’ selective sweep, other nearby ‘hitchhiking’ variants, which may be neutral or even deleterious, will be transferred together with the adaptive variant. As a result, the same genomic sequence bearing both the adaptive allele and the hitchhikers will appear in otherwise distantly related strains present in different host microbiomes6,10,11. This local sequence sharing will result in a distinct signature of elevated linkage disequilibrium (LD)—a measure of the correlation between alleles at different positions—in the vicinity of the adaptive allele relative to the genomic background.
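LD between two biallelic sites is commonly summarized by \(r^{2}\), the squared correlation between allele indicators across sampled haplotypes: \(r^{2} = D^{2}/[p_{A}(1-p_{A})\,p_{B}(1-p_{B})]\), with \(D = p_{AB} - p_{A}p_{B}\). The minimal Python sketch below computes this textbook estimator from 0/1 allele calls, assuming one resolved haplotype per host. The function name `r_squared` is illustrative, and the estimator actually applied to metagenomic data may differ (for example, in how it corrects for finite sampling).

```python
from typing import Sequence


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Squared allelic correlation (r^2) between two biallelic sites.

    Each site is a sequence of 0/1 allele calls, one per haplotype
    (illustrative assumption: one resolved haplotype per host).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and cover the same haplotypes")
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # frequency of the (1, 1) haplotype
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom
```

Perfectly linked sites give \(r^{2} = 1\) regardless of which allele is coded as 1, while sites whose alleles are distributed independently across haplotypes give \(r^{2} = 0\).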
Whereas local elevations in LD have long been leveraged as a signature of selection in sexual eukaryotes12,13,14,15,16,17, LD-based scans for selection in bacteria have so far been limited18. One reason may be that the pervasiveness and dynamics of recombination in many bacterial species, particularly gut commensal bacteria1,6,19, are only now beginning to be understood. Moreover, LD-based statistics can be confounded by non-selective evolutionary forces, such as demographic contractions, which can also elevate LD20, and our understanding of how such forces operate among gut bacteria is still nascent21.
We suggest that one way to detect recombination-mediated selective sweeps in bacteria while controlling for non-selective forces is to compare LD between non-synonymous versus synonymous variants. Specifically, we expect common non-synonymous variants to have higher LD than synonymous variants in the vicinity of adaptive loci that have swept to high frequency (Fig. 1a). Although both types of variant are subject to the same non-selective forces, synonymous variants are far more likely to be neutral. The vast majority of non-synonymous mutations, by contrast, are deleterious in any population22, and are thus generally rare23,24. However, initially rare non-synonymous variants in the vicinity of a new adaptive mutation may hitchhike to high frequency during a sweep25 and will therefore be found predominantly on haplotypes bearing the adaptive mutation. Synonymous variants, meanwhile, can reach high frequency through neutral drift alone, in addition to hitchhiking during a sweep, and may be found on both sweeping and non-sweeping haplotypes. Thus, in the vicinity of a selective sweep, we expect that common, high-frequency non-synonymous variants will exhibit high LD with one another, while equally common synonymous variants will exhibit lower LD.
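The expected contrast between common non-synonymous and synonymous variants can be illustrated on a toy haplotype matrix. The sketch below assumes a hypothetical minor-allele-frequency cutoff of 0.2 to define "common" variants and averages \(r^{2}\) over all pairs within each class. It contrasts non-synonymous hitchhikers that all sit on a sweeping haplotype with synonymous variants spread across backgrounds. The function names, cutoff, and data are illustrative assumptions, not the study's pipeline.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    # Squared allelic correlation between two biallelic sites (0/1 calls per haplotype).
    n = len(x)
    pa, pb = sum(x) / n, sum(y) / n
    pab = sum(a * b for a, b in zip(x, y)) / n
    return (pab - pa * pb) ** 2 / (pa * (1 - pa) * pb * (1 - pb))


def is_common(site, min_maf=0.2):
    # Hypothetical cutoff: keep sites whose minor allele frequency is at least min_maf.
    p = sum(site) / len(site)
    return min(p, 1 - p) >= min_maf


def mean_pairwise_r2(sites, min_maf=0.2):
    # Average r^2 over all pairs of common sites; NaN if fewer than two are common.
    common = [s for s in sites if is_common(s, min_maf)]
    pairs = list(combinations(common, 2))
    return mean(r_squared(a, b) for a, b in pairs) if pairs else float("nan")


# Toy data: 8 haplotypes (one per host); haplotypes 0-3 carry the swept fragment.
nonsyn = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # hitchhikers found only on the sweeping haplotype
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
syn = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # one synonymous hitchhiker
    [1, 1, 0, 0, 1, 1, 0, 0],  # common synonymous variants spread across backgrounds
    [1, 0, 1, 0, 1, 0, 1, 0],
]

r2_n = mean_pairwise_r2(nonsyn)
r2_s = mean_pairwise_r2(syn)
print(r2_n, r2_s, r2_n - r2_s)
```

In this deliberately extreme toy case the non-synonymous variants are perfectly linked while the common synonymous variants are uncorrelated with one another, so the difference \(r_{\mathrm{N}}^{2}-r_{\mathrm{S}}^{2}\) is positive; real data would show a far noisier contrast.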
Fig. 1: LD among common non-synonymous versus synonymous variants during a gene-specific selective sweep. a, During a gene-specific selective sweep, a genomic fragment bearing an adaptive variant transfers between strains. Here, each horizontal line represents a bacterial haplotype from a different host’s microbiome. The yellow region of each haplotype represents the fragment bearing the adaptive allele that has recombined onto different strains. b, \(r_{\mathrm{N}}^{2}\) and \(r_{\mathrm{S}}^{2}\) among common variants under neutrality. c, AUC\((r_{\mathrm{N}}^{2}-r_{\mathrm{S}}^{2})\) among common variants for which purifying selection is of strength \(s_{\mathrm{D}}=-10^{-3}\) and beneficial selection is of strength \(s_{\mathrm{B}}=10^{-2}\). See Supplementary Figs. 1–3 for \(r_{\mathrm{N}}^{2}\) and \(r_{\mathrm{S}}^{2}\) measured across a comprehensive set of simulated evolutionary scenarios. d, AUC\((r_{\mathrm{N}}^{2}-r_{\mathrm{S}}^{2})\) is expected to be greater than zero when \(|s_{\mathrm{B}}|>|s_{\mathrm{D}}|\) and both \(|s_{\mathrm{D}}|\) and \(|s_{\mathrm{B}}|\) exceed the strength of drift (\(1/N_{e}\), dashed lines). In this schematic and in all simulations (prior to a demographic contraction), \(N_{e}=10^{4}\). See Supplementary Figs. 1–6 for \(r_{\mathrm{N}}^{2}\) and \(r_{\mathrm{S}}^{2}\) measured across a comprehensive set of simulated evolutionary scenarios.
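Panels c and d summarize the LD contrast as an area under the curve (AUC) of \(r_{\mathrm{N}}^{2}-r_{\mathrm{S}}^{2}\). The minimal sketch below assumes, as an illustration only, that this curve is given as binned values at strictly increasing inter-site distances and is integrated with the trapezoidal rule. The function name `auc_ld_difference` and the binning are hypothetical, and this is not presented as the exact definition or normalization of iLDS used in the study.

```python
from typing import Sequence


def auc_ld_difference(distances: Sequence[float],
                      r2_n: Sequence[float],
                      r2_s: Sequence[float]) -> float:
    """Trapezoidal area under (r_N^2 - r_S^2) as a function of inter-site distance.

    Assumption for illustration: each index i is one distance bin, with
    r2_n[i] and r2_s[i] the mean r^2 among non-synonymous and synonymous pairs.
    """
    if not (len(distances) == len(r2_n) == len(r2_s)) or len(distances) < 2:
        raise ValueError("need at least two aligned distance bins")
    diff = [n - s for n, s in zip(r2_n, r2_s)]  # LD excess of non-synonymous pairs
    area = 0.0
    for i in range(1, len(distances)):
        width = distances[i] - distances[i - 1]
        if width <= 0:
            raise ValueError("distances must be strictly increasing")
        area += width * (diff[i] + diff[i - 1]) / 2  # trapezoid between adjacent bins
    return area
```

A positive area indicates that common non-synonymous variants are, on average, more tightly linked than equally common synonymous variants over the distance range considered, which is the signature expected near a sweep; a value near zero is what panel b suggests under neutrality.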
Here we confirm these hypotheses in simulations, and then show through application of our new statistic—the integrated LD score (iLDS)—to metagenomic data that recombination-mediated selective sweeps are pervasive in gut microbiota.