Computational prediction of T3SS effectors
To determine which Buchnera proteins are most likely to be secreted by a T3SS, the complete proteome of Buchnera-Ap (UniProt: UP000001806) was run in BastionX, an online tool that predicts the likelihood that a given protein is a substrate of a bacterial secretion system (https://bastionx.erc.monash.edu/server.jsp)16. The benchmarking parameter for BastionX v.2.0 was selected for the run (see Supplementary Dataset 1 at Figshare17). An earlier version, Bastion3, had a reported false positive rate of 4.1%53, and this rate is expected to be lower in BastionX.
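To make the practical meaning of a false positive rate concrete, the following Python sketch ranks proteins by a prediction score and estimates how many non-substrates would be expected among the positive calls by chance. The score dictionary, protein identifiers, the 0.5 cut-off and the function names are hypothetical; the sketch does not reproduce BastionX's output format or decision threshold, and it assumes the false positive rate applies uniformly and independently to every non-substrate protein.

```python
def rank_candidates(scores, threshold=0.5):
    """Proteins scoring at or above a (hypothetical) threshold, best first."""
    hits = [(pid, s) for pid, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def expected_false_positives(n_non_substrates, fpr):
    """Expected false positive calls if each true non-substrate is
    misclassified independently with probability `fpr`."""
    return n_non_substrates * fpr


scores = {"prot_A": 0.91, "prot_B": 0.42, "prot_C": 0.67}  # toy values, not BastionX output
print(rank_candidates(scores))  # [('prot_A', 0.91), ('prot_C', 0.67)]
print(round(expected_false_positives(500, 0.041), 1))  # 20.5
```

Under these assumptions, 500 non-substrate proteins scored at a 4.1% false positive rate would be expected to yield about 20 false positive calls.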
To estimate the likelihood that SyeA orthologues from diverse Buchnera strains are T3SS substrates, a total of 103 SyeA orthologues were run in BastionX with the common use parameter for v.2.0 selected (see Supplementary Dataset 2 at Figshare17).
syeA, syeB and flagellar genes in Buchnera
To survey the presence and absence of syeA, syeB and flagellar basal body genes in a wide range of Buchnera strains, Buchnera genomes deposited in NCBI GenBank were selected for comparative orthology analysis (see Supplementary Dataset 2 at Figshare17). For each available host aphid species, the most recently uploaded Buchnera genome as of 22 April 2025 was chosen. Contig- and scaffold-level assemblies were excluded. Where available, the RefSeq annotation was used in preference to the GenBank annotation. A total of 113 proteomes were collected and analysed with OrthoFinder v.3.0 (default parameters)19 to generate orthogroups. Orthogroups corresponding to the 26 flagellar basal body genes present in Buchnera-Ap (12 flg, 2 flh and 12 fli genes), along with syeA and syeB, were identified using the known accession numbers of the Buchnera-Ap protein sequences (see Supplementary Dataset 3 at Figshare17). The percentage of the total flagellar basal body genes present was then calculated for each species (see Supplementary Dataset 4 at Figshare17).
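The per-species percentage of flagellar basal body genes can be computed from an orthogroup gene-count table. The Python sketch below assumes the layout of OrthoFinder's Orthogroups.GeneCount.tsv (an 'Orthogroup' column, one column per proteome and a final 'Total' column); the function name, the toy table and the strain and orthogroup names are hypothetical, and target orthogroups missing from the table are counted as absent in every species.

```python
import csv
import io


def percent_present(gene_count_tsv, target_orthogroups):
    """Percentage of target orthogroups with at least one gene per species.

    Assumes an OrthoFinder-style Orthogroups.GeneCount.tsv layout: an
    'Orthogroup' column, one column per proteome, then a 'Total' column."""
    reader = csv.reader(io.StringIO(gene_count_tsv), delimiter="\t")
    header = next(reader)
    species = header[1:-1]  # drop the 'Orthogroup' and 'Total' columns
    present = {sp: 0 for sp in species}
    for row in reader:
        if not row or row[0] not in target_orthogroups:
            continue
        for sp, count in zip(species, row[1:-1]):
            if int(count) > 0:
                present[sp] += 1
    # Targets missing from the table contribute to the denominator only.
    n_targets = len(target_orthogroups)
    return {sp: 100 * k / n_targets for sp, k in present.items()}


# Toy table with hypothetical strain and orthogroup names.
table = (
    "Orthogroup\tBuchnera_X\tBuchnera_Y\tTotal\n"
    "OG1\t1\t0\t1\n"
    "OG2\t1\t1\t2\n"
    "OG3\t0\t0\t0\n"
)
print(percent_present(table, {"OG1", "OG2", "OG3", "OG4"}))
# {'Buchnera_X': 50.0, 'Buchnera_Y': 25.0}
```

In an analysis like the one described, the target set would hold the 26 orthogroups matched to the Buchnera-Ap basal body proteins, and the table text would be read from the OrthoFinder results directory.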
Genomes in which OrthoFinder did not detect syeA were examined manually for divergent syeA homologues. Because genomic rearrangements are rare in Buchnera45, we used conserved genes that flank syeA and are retained in all lineages (hslV, hslU and rho) to locate the syeA region in each strain (Fig. 2b). We downloaded these regions from NCBI as FASTA files, re-annotated them with Prokka (v.1.14.6)54 and used Clinker55 and Gene Graphics56 to calculate pairwise sequence similarity and to visualize the regions. If a gene was present at the syeA position, its protein sequence was searched with NCBI BLASTp for hits to SyeA orthologues from other Buchnera strains. In two cases (Buchnera of the related species Cavariella theobaldi and Pterocomma populeum), the syeA region contained a short gene encoding a hypothetical protein on the strand that normally encodes syeA, flanked by expanded stretches of non-coding DNA; because these hypothetical proteins showed no detectable sequence or structural similarity to SyeA, syeA was scored as absent in both genomes. Likewise, Buchnera of the tribe Macrosiphini encoded SyeB with recognizable sequence homology across species, whereas Buchnera of the tribe Aphidini encoded a short protein in the same position that lacked sequence similarity to SyeB. For the 10 genomes lacking syeA, we also manually examined all unannotated open reading frames (encoding hypothetical proteins) longer than 50 amino acids for sequence homology to SyeA.
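The search for unannotated open reading frames can be approximated programmatically before candidates are compared with SyeA. The Python sketch below scans both strands of a region for ORFs that run from a start codon to the first in-frame stop and encode more than 50 amino acids. Restricting starts to ATG is an assumption (bacterial translation table 11, which Buchnera uses, also allows alternative starts such as GTG and TTG), and the function names are hypothetical; any hits would still need a BLASTp comparison.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50, starts=("ATG",)):
    """List ORFs from a start codon to the first in-frame stop that encode
    more than `min_aa` amino acids.

    Returns (strand, start, end, length_aa) tuples; coordinates are 0-based,
    end-exclusive, include the stop codon and refer to the scanned strand
    (for '-' ORFs, positions are on the reverse-complemented sequence)."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in starts:
                    start = i  # outermost start gives the longest ORF
                elif start is not None and codon in STOPS:
                    length_aa = (i - start) // 3  # stop codon not counted
                    if length_aa > min_aa:
                        orfs.append((strand, start, i + 3, length_aa))
                    start = None
    return orfs


region = "ATG" + "GCT" * 60 + "TAA"  # toy sequence: 61 codons before the stop
print(find_orfs(region))  # [('+', 0, 186, 61)]
```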
To visualize the presence and absence of syeA, syeB and flagellar genes across Buchnera, a published Buchnera phylogeny57 was used to build a cladogram of selected Buchnera genomes from host aphid species representing all major aphid subfamilies, with less well-supported nodes collapsed (Fig. 2a).
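Collapsing weakly supported nodes turns them into polytomies. The Python sketch below shows the operation on a minimal tree structure; the Node class, the support values and the cut-off of 50 are hypothetical examples rather than the values used for the cladogram, and real trees would normally be handled with a phylogenetics library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # branch support; None for leaves/root
    children: List["Node"] = field(default_factory=list)


def collapse_weak(node, min_support):
    """Collapse internal nodes with support below `min_support` by attaching
    their children directly to the parent node (creating a polytomy)."""
    kept = []
    for child in node.children:
        collapse_weak(child, min_support)  # process the subtree first
        weak = (bool(child.children) and child.support is not None
                and child.support < min_support)
        if weak:
            kept.extend(child.children)
        else:
            kept.append(child)
    node.children = kept
    return node


def to_newick(node):
    """Topology-only Newick string (no branch lengths or support values)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


tree = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=40, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse_weak(tree, 50)) + ";")  # ((A,B),C,D);
```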
Rate of SyeA sequence evolution
To compare rates of protein evolution in Buchnera, coding sequences for Buchnera-Ap (GCF_000009605.1), Buchnera-Ak (GCF_000225445) and Buchnera-Sg (GCF_000007365.1) were downloaded from NCBI. Calculations were performed in R (v.4.3.0) with the package Orthologr58. Only orthologous genes shared by all three strains were included. The dNdS function was used to estimate dN (the number of nonsynonymous substitutions per nonsynonymous site) for each Buchnera-Ap gene paired with its orthologue in each of the other two strains.
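For readers unfamiliar with how dN is obtained, the following Python sketch implements one classical estimator, the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, for a single pair of aligned, in-frame coding sequences. It is an illustration only, not the code inside Orthologr's dNdS function, whose estimator and options may differ. Counting mutations to stop codons as nonsynonymous, skipping gapped or stop codons, and the fallback used when every mutational pathway passes through a stop codon are assumptions of this sketch.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code in TCAG order (amino acids identical in table 11).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def nonsyn_sites(codon):
    """Nonsynonymous sites of one codon: each of its 9 possible point
    mutations adds 1/3 of a site if it changes the encoded amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] != CODE[codon]:
                    sites += 1 / 3
    return sites


def nonsyn_differences(c1, c2):
    """Nonsynonymous differences between two codons, averaged over every
    order in which the differing positions could have mutated; pathways
    through an intermediate stop codon are skipped."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    totals = []
    for order in itertools.permutations(diff):
        current, n_diff, valid = c1, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if step != c2 and CODE[step] == "*":
                valid = False
                break
            n_diff += CODE[step] != CODE[current]
            current = step
        if valid:
            totals.append(n_diff)
    if not totals:  # assumption: all pathways hit a stop, count all as nonsynonymous
        return float(len(diff))
    return sum(totals) / len(totals)


def dn_nei_gojobori(seq1, seq2):
    """Jukes-Cantor corrected dN for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites = diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        sites += (nonsyn_sites(c1) + nonsyn_sites(c2)) / 2
        diffs += nonsyn_differences(c1, c2)
    p_n = diffs / sites
    if p_n >= 0.75:
        return math.inf  # saturated: Jukes-Cantor correction undefined
    return -0.75 * math.log(1 - 4 * p_n / 3)


print(dn_nei_gojobori("ATGAAA", "ATGAAG"))  # synonymous change (Lys -> Lys): dN is zero
print(dn_nei_gojobori("ATGAAA", "ATGGAA"))  # Lys -> Glu: dN > 0
```

Applied gene by gene, a function like this would give one dN value for each Ap/Ak and each Ap/Sg orthologue pair.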
Structural similarity searches