Previous analyses of pre-LECA gene duplications have delivered contradictory conclusions, suggesting that genes of either archaeal2 or bacterial3 ancestry accumulated more duplications along their respective eukaryotic stems. Our analysis focuses on pre-LECA duplications involved in eukaryotic apomorphies for which we were able to obtain alignments and gene family trees of sufficient quality for molecular clock analyses. We found many more genes of archaeal than bacterial origin with those qualities, consistent with previous work showing that eukaryotic genes of archaeal origin are in general more highly conserved36, although this pattern may simply reflect a greater number of gene duplications on the archaeal versus alphaproteobacterial eukaryotic stem lineages.
Comparing molecular clocks to other methods
Previous studies have used gene tree branch lengths, expressed in number of substitutions per site and normalized by relative evolutionary rates post-LECA, to infer the relative timing of events during eukaryogenesis, finding support for a relatively late mitochondrial endosymbiosis2,12. Relaxed clock methods provide crucial additional information for investigating these questions. First, clock methods link sequence divergence to the geological record, constraining the timing of key steps in eukaryogenesis to absolute time and, therefore, environmental context. Second, clock models provide a more flexible way to model variation in evolutionary rate through time, based on all of the available calibrations and sequence data—although we acknowledge that the real pattern of rate heterogeneity during eukaryogenesis and its associated HGT and duplications is likely to be more complex than captured by any current methodology. Finally, the Bayesian relaxed clock framework provides a natural way to propagate time information from the dated species tree—estimated using more calibrations and sequence data—to the individual gene trees, greatly ameliorating the difficulties resulting from the limited signal of short single-gene alignments. We note that this hierarchical framework implies that the inferred ages of gene duplications are informed by the ages of nodes on the dated species tree. For example, the inference that mFECA is younger than nFECA (Fig. 1) supports younger ages among duplications of alphaproteobacterial origin than those of Asgard origin, although sensitivity analyses demonstrate that this conclusion is robust to substantial variation in species tree ages (Supplementary Note 3).
Timing eukaryogenesis
The timescale of lineage divergence in which we estimated the timing of gene duplications is broadly consistent with previous molecular dating analyses (Fig. 5a). Our estimates for the age of nFECA at 3.05–2.79 Ga and mFECA at 2.37–2.13 Ga are among the oldest, compared to a range of 2.90–2.09 Ga and 2.70–0.91 Ga, respectively, from previous studies19,37,38,39,40. Our 1.80–1.67 Ga estimate for LECA falls within the 2.39–0.95 Ga range of previous divergence time estimates19,37,38,39,40,41,42,43. These timescales are all based on relaxed molecular clock methods, but their differences reflect their underpinning sequence data, assumptions (clock model) and the nature of the calibrations used to disambiguate rates and times. Our estimates are closest to timescales that are based, as here, on analyses that used topology-based calibrations19,38,40. These enforce across-tree relative age constraints that reflect donor–recipient HGT and endosymbiosis events, spreading the limited temporal information from traditional node calibrations across the tree. As such, these timescales might be considered among the most realistic and, among them, our study draws on some of the largest sets of calibration constraints and sequence data. Our timescale infers a rapid radiation of the eukaryotic supergroups within about 300 million years (Myr) of LECA, consistent with the ‘Big Bang’ hypothesis of crown eukaryote diversification44.
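The comparison between our age estimates and the ranges from previous studies can be illustrated with a minimal sketch. It uses only the intervals quoted in the paragraph above; the `AgeRange` type and the `compare` helper are hypothetical names introduced here for illustration and are not part of the analysis pipeline.

```python
from typing import NamedTuple


class AgeRange(NamedTuple):
    """An age interval in Ga, written older bound first, as in the text."""
    older: float
    younger: float


def compare(estimate: AgeRange, previous: AgeRange) -> str:
    """Describe where an estimate sits relative to a range of earlier estimates."""
    if estimate.older <= previous.older and estimate.younger >= previous.younger:
        return "within previous range"
    if estimate.younger > previous.older:
        return "entirely older than previous range"
    if estimate.older < previous.younger:
        return "entirely younger than previous range"
    if estimate.older > previous.older:
        return "overlaps, extending older than previous range"
    return "overlaps, extending younger than previous range"


# Intervals quoted in the text (Ga).
estimates = {
    "nFECA": AgeRange(3.05, 2.79),
    "mFECA": AgeRange(2.37, 2.13),
    "LECA": AgeRange(1.80, 1.67),
}
previous_ranges = {
    "nFECA": AgeRange(2.90, 2.09),
    "mFECA": AgeRange(2.70, 0.91),
    "LECA": AgeRange(2.39, 0.95),
}

results = {node: compare(estimates[node], previous_ranges[node]) for node in estimates}

for node, verdict in results.items():
    print(f"{node}: {verdict}")
```

Under these assumptions, only the nFECA interval extends beyond the older bound of the previously reported range, whereas the mFECA and LECA intervals lie inside their respective ranges.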
Fig. 5: Timeline of development for eukaryotic key apomorphies. a, Our time-resolved species tree enables us to set a timeline for eukaryogenesis. Compared with other studies, our dates for nFECA and mFECA are among the oldest, whereas our date for LECA is intermediate19,37,38,39,40,42,43,69,70,71. b, Based on duplications in specific eukaryotic systems, we suggest a timeline for the emergence of these features. Vertical lines are suggested minimum limits for the emergence of features, and dashed horizontal lines denote the period of time for possible development and emergence. c, A tentative model that considers the interdependency of these features (arrowheads imply dependency; lines without arrowheads imply co-emergence but with as yet undetermined order). Data in a,b are aligned to the time axis, whereas in c, the nodes are grouped in relation to the nFECA and mitochondrial endosymbiosis boundaries. EGT, endosymbiotic gene transfer; ER, endoplasmic reticulum; MTOC, microtubule-organizing centre.
Given that our analysis is constrained by geologic evidence, it is pertinent to reflect on what aspect of eukaryote diversification the geologic record represents, not least because so little of it is used in calibration. Sterane records from the Neoarchaean Fortescue Supergroup of Australia45 have been attributed to contamination46 and, indeed, pre-Proterozoic biomarker records are generally considered questionable47. Large vesicles, compatible with (but not deterministic of) eukaryote affinity, are known from the Mesoarchean Moodies Group of South Africa48. Otherwise, the oldest widely accepted fossil eukaryotes are Dictyosphaera, Shuiyousphaeridium, Tappania and Valeria from the late Palaeoproterozoic (around 1.78 Ga) McDermott Formation of the Northern Territory, Australia49 and the (approximately 1.64 Ga) Changcheng and Ruyang groups of North China50,51,52,53,54,55, though their eukaryote affinity is based largely on inference of an actin cytoskeleton which evolved among archaeal ancestors22. Qingshania, also from the Chuanlinggou Formation, has a greater claim on eukaryote affinity, interpreted as a multicellular archaeplastid and, therefore, a late Palaeoproterozoic (approximately 1.63 Ga) crown eukaryote56. Otherwise, there is clear evidence of archaeplastids from the latest Mesoproterozoic57 and earliest Neoproterozoic58, among others59, and possible Amorphea from the latest Mesoproterozoic60. Given the sparse nature of the fossil record through the Archaean to much of the Mesoproterozoic, we should not anticipate that these oldest records approximate clade age, but there is good fossil evidence for archaeplastids and, therefore, crown eukaryotes having diverged deep in the Proterozoic, as our timescale suggests. 
Although some of these records may be of metabolically active cells61, the majority are cysts compatible with the prevalence of encystment among crown eukaryotes, the challenging environmental redox conditions that prevailed during eukaryogenesis and, evidently, crown eukaryote diversification.
There has been considerable debate about the environmental context of eukaryogenesis and of eukaryote diversification. It has long been argued that oxygenation of the biosphere, the Great Oxidation Event62 (GOE; 2.43–2.22 Ga), was an environmental driver underpinning mitochondrial endosymbiosis and the origin of eukaryotes and, indeed, our evolutionary timeline precludes a syntrophic association between nFECA and mFECA prior to the GOE. However, the formative stages of eukaryogenesis are likely to have taken place under anoxia or hypoxia, as oxic conditions were limited to the surface waters of the Earth’s oceans for much of the Proterozoic8,9. Specifically, our analyses point to an origin among archaea in the late Archaean, mitochondrial endosymbiosis almost coincident with the GOE, and diversification of crown eukaryotes in the late Palaeoproterozoic.
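The temporal ordering summarized above can be checked against the age intervals given in the text. The following is a minimal sketch under stated assumptions: the GOE bounds are those quoted above, the node ages are those reported in the section on timing eukaryogenesis, and the function names `relation_to_goe` and `overlap_with_goe` are hypothetical.

```python
# GOE bounds quoted in the text (Ga), older bound first.
GOE_OLDER, GOE_YOUNGER = 2.43, 2.22

# Node age intervals reported in this study (Ga), older bound first.
NODE_AGES = {
    "nFECA": (3.05, 2.79),
    "mFECA": (2.37, 2.13),
    "LECA": (1.80, 1.67),
}


def relation_to_goe(older: float, younger: float) -> str:
    """Classify an age interval as predating, overlapping or postdating the GOE."""
    if younger > GOE_OLDER:
        return "predates GOE"
    if older < GOE_YOUNGER:
        return "postdates GOE"
    return "overlaps GOE"


def overlap_with_goe(older: float, younger: float) -> float:
    """Length of the shared interval with the GOE in Gyr (0.0 if none)."""
    shared = min(older, GOE_OLDER) - max(younger, GOE_YOUNGER)
    return round(max(shared, 0.0), 2)


relations = {name: relation_to_goe(*ages) for name, ages in NODE_AGES.items()}
overlaps = {name: overlap_with_goe(*ages) for name, ages in NODE_AGES.items()}

for name in NODE_AGES:
    print(f"{name}: {relations[name]} (shared span {overlaps[name]} Gyr)")
```

With these intervals, the nFECA estimate lies wholly before the GOE, the mFECA estimate shares part of its interval with the GOE, and the LECA estimate lies wholly after it, matching the ordering described in the text.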
Testing hypotheses of eukaryogenesis
From dated duplications in genes underpinning eukaryotic features, we suggest a model (Fig. 5b) in which the archaeal host progressively complexified through the sequential evolution of various characters before mitochondrial endosymbiosis. Our data suggest that duplications in archaeal membrane and cytoskeletal protein families first gave rise to internal membrane compartments with a role in membrane biogenesis, and later did so in combination with cytoskeletal development and endocytosis. Although this does not imply that a host cell of crown eukaryote size and system-complexity existed before mitochondrial endosymbiosis, our results suggest that the host had the prerequisites for nuclear compartmentalization (development of the nuclear localization system), possessed an endomembrane system (from the diversification of compartment-specific vesicle trafficking proteins) and had an evolved cytoskeleton (with branched actin-specific paralogues) capable of endocytosis before mitochondrial endosymbiosis. Bacterial genes from the mitochondrial endosymbiont and other sources seem to have driven the membrane transition and further development of the endolysosomal system and to have added to existing nuclear processes such as DNA repair and gene regulation and new ones such as meiosis (Supplementary Discussion 1 and Supplementary Fig. 4).