Dataset collection and processing
Information on case detections in North America
In this study, a detection is defined as a positive PCR test from a collected sample. In Canada, year-round surveillance in wild and domestic populations is coordinated by the Canadian Food Inspection Agency, Environment Canada, the Public Health Agency of Canada and the Canadian Wildlife Health Centre64. In the USA, the United States Department of Agriculture (USDA) Animal and Plant Health Inspection Service (APHIS) manages HPAI surveillance and testing in wild birds through investigation of reported morbidity and mortality events, hunter-collected game birds/waterfowl, sentinel species/live bird collection, and environmental sampling of water bodies and surfaces43,65. USDA APHIS also surveils domestic birds using several reporting methods: mandatory testing through the National Poultry Improvement Plan, coordination with state agricultural agencies, routine testing in high-risk areas and backyard flock surveillance66.
Data on detections of HPAI in the USA used in analyses for this study were collected from USDA APHIS. Reports for mammals, wild birds and domestic poultry were all downloaded on 25 November 2023 (ref. 40). During the time period analysed in this study (November 2021 to September 2023), most HPAI detections in the USA were reported in wild birds (Supplementary Fig. 1a). Data on domestic bird detections are reported with information on poultry type (such as duck, chicken) and by whether the farm is classified as a commercial operation or backyard flock. Backyard flocks are categorized by the USDA as operations with fewer than 1,000 birds47,67 and by the World Organisation for Animal Health (WOAH) as any birds kept in captivity for reasons other than for commercial production68. Among domestic birds, detections (1,177 total) came predominantly from commercial chickens (9.3%), commercial turkeys (28.5%), commercial breeding operations (species unspecified) (15.3%) and birds designated WOAH non-poultry, which refers to backyard birds (42.3%) (Supplementary Fig. 1b). Other domestic bird detections occurred in game bird raising operations (2.5%) and commercial ducks (2.0%). The North American epizootic has impacted a broad range of mammalian hosts, with detections (399 total) reported in red foxes (24.3%), mice (24.1%), skunks (12.2%) and domestic cats (13.2%). Other mammalian hosts (26.2%) represent a wide range of species including harbour seals, bobcats, fishers and bears (Supplementary Fig. 1c).
Genomic data processing and initial phylogenetics
We downloaded all available nucleotide sequencing data and associated metadata for the haemagglutinin protein of all HPAI clade 2.3.4.4b H5Nx viruses from the GISAID database on 25 November 2023 (ref. 69). For each subset of the data described for further phylodynamic modelling, the following process was followed. We first aligned sequences using MAFFT v.7.5.20. Sequence alignments were then visually inspected using Geneious, sequences causing significant gaps were removed, and nucleotides before the start codon and after the stop codon were trimmed70,71. We deduplicated identical sequences collected on the same day (retaining identical sequences that occurred on different days). For all genomic datasets, we performed initial phylogenetic reconstruction in a maximum-likelihood framework using IQ-TREE v.1.6.12, and used TreeTime v.0.11.2 to identify and remove temporal outliers and to assess the clockliness of the dataset before Bayesian phylogenetic reconstruction72,73. This resulted in a dataset of 1,824 sequences that were used in further analyses (Supplementary Fig. 17).
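The same-day deduplication rule can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record layout (name, collection date, sequence), the function name and the toy sequences are all assumptions made for the example.

```python
def deduplicate_same_day(records):
    """Keep the first record for each (sequence, collection date) pair.

    `records` is an iterable of (name, date, sequence) tuples. This layout
    is hypothetical and chosen only for illustration.
    """
    seen = set()
    kept = []
    for name, date, seq in records:
        key = (seq.upper(), date)
        if key in seen:
            continue  # identical sequence already retained for this day
        seen.add(key)
        kept.append((name, date, seq))
    return kept


# Toy records (invented for the example, not real isolates).
records = [
    ("A/mallard/1", "2022-03-01", "ATGGAG"),
    ("A/mallard/2", "2022-03-01", "ATGGAG"),  # same sequence, same day: dropped
    ("A/mallard/3", "2022-03-02", "ATGGAG"),  # same sequence, other day: kept
    ("A/turkey/1", "2022-03-01", "ATGGAA"),   # different sequence: kept
]
deduped = deduplicate_same_day(records)
```

Keying on the pair of sequence and date, rather than on the sequence alone, is what preserves identical sequences sampled on different days.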
Biases in genomic data and Ne inference
Sequencing data sampled in North America are heavily skewed toward sequences from the USA (USA, 1,590; Canada, 224; Central America, 8), and toward the first 6 months of the outbreak, with 74% of all available sequences sampled from January to July 2022 (Supplementary Fig. 2). To evaluate whether sequencing data reflect case detections, we inferred the viral effective population size (Ne), a measure of viral genetic diversity shown to be mathematically related to disease prevalence and the disease transmission rate26. We inferred Ne using a nonparametric population model (Skygrid), which captures relative changes in genetic diversity and the variability of growth rate in the virus population over time, providing a proxy for epidemic dynamics as previously described. Ne is modestly correlated with detections (highest Spearman rank correlation: 0.65, P = 4.4 × 10^−11) (Fig. 1c and Supplementary Figs. 3 and 4), with peaks in Ne preceding peaks in detections by about 1 week (Supplementary Fig. 5), probably reflecting the lag between viral transmission and case detection. We interpret these results to suggest that, despite uneven sequence acquisition across time, the diversity of sampled sequences roughly reflects the amplitude of H5N1 cases. Given these results, we opted to use sequencing data for the entire sampling period for broad inferences on introductions and geographical spread, but supplemented these analyses with controls for sampling differences between groups. For more-intensive reconstructions of transmission patterns between wild birds, commercial poultry and backyard birds, we focus on the initial 6-month period with the most densely sampled data, coupled with experiments to assess the impacts of sampling on results. Finally, although we retained data from Canada and Central America for all subsequent analyses, our results are probably most informative about transmission within the USA due to the heavy skewing of data towards the USA.
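A lagged rank correlation of the kind described above can be sketched as follows. This is an illustrative sketch only: the function names, the weekly binning and the toy series are assumptions, and the numbers are invented so that detections trail Ne by exactly one step. It does not reproduce the study's analysis or its reported values.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(ne, detections, max_lag):
    """Correlate ne[t] with detections[t + lag] for lag = 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        x = ne[: len(ne) - lag]
        y = detections[lag:]
        out[lag] = spearman(x, y)
    return out


# Toy weekly series (invented): detections follow Ne one week later.
ne = [1, 3, 7, 12, 9, 5, 3, 2, 2, 1]
detections = [0, 2, 4, 8, 13, 10, 6, 4, 3, 3]
by_lag = lagged_spearman(ne, detections, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
```

A positive `best_lag` here means that changes in Ne lead changes in detections by that many time steps, which is the direction of the lag reported in the text.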
AVONET database
We downloaded the AVONET database of avian ecology data and merged it with the host metadata available from GISAID for each sequence74. Where GISAID provided a species name, we used it to match the corresponding species in the AVONET database. Where GISAID host metadata gave only a common name for a bird, we determined the taxonomic species name for the given region and used that to match the species to its ecological data in AVONET (for example, ‘mallard’ was replaced with Anas platyrhynchos). Domesticity status (whether a sequence was isolated from a wild host or a domestic host) was determined from the ‘Note’ and ‘Domestic_Status’ fields in the sequence-associated metadata downloaded from GISAID. Moreover, if a sequence's strain name (in the field ‘Isolate_Name’) indicated domestic status (for example, A/domestic_duck/2022), the sequence was labelled as belonging to a domestic host.
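The host harmonization steps above can be sketched roughly as follows. Everything beyond the three GISAID field names quoted in the text is an assumption: the lookup table, the function names, the accepted values of ‘Domestic_Status’ and the keyword test on the strain name are hypothetical stand-ins, and the study's actual matching rules may differ.

```python
# Hypothetical lookup from common names to binomials; a real table would be
# curated per region, as described in the text.
COMMON_TO_SPECIES = {
    "mallard": "Anas platyrhynchos",
    "bald eagle": "Haliaeetus leucocephalus",
}


def resolve_species(host):
    """Return a binomial name for a host string, or None if unresolved."""
    stripped = host.strip()
    cleaned = stripped.lower().replace("_", " ")
    if cleaned in COMMON_TO_SPECIES:
        return COMMON_TO_SPECIES[cleaned]
    # Assumption: a capitalized word followed by a lower-case word is
    # already a "Genus species" binomial and is passed through unchanged.
    parts = stripped.split()
    if len(parts) == 2 and parts[0][0].isupper() and parts[1].islower():
        return stripped
    return None


def domestic_status(record):
    """Label a record 'domestic', 'wild' or 'unknown' from its metadata."""
    # Assumption: 'Domestic_Status', when filled in, holds 'domestic' or 'wild'.
    status = (record.get("Domestic_Status") or "").strip().lower()
    if status in ("domestic", "wild"):
        return status
    note = (record.get("Note") or "").lower()
    name = (record.get("Isolate_Name") or "").lower()
    if "domestic" in note or "domestic" in name:
        return "domestic"
    return "unknown"


species = resolve_species("Mallard")
status = domestic_status({"Isolate_Name": "A/domestic_duck/2022"})
```

Returning `None` or `'unknown'` for unresolved hosts, instead of guessing, keeps ambiguous records visible so they can be reviewed by hand before merging with the ecological data.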