ZOE PREDICT cohorts definition
The ZOE PREDICT programme comprises several distinct studies that together constitute one of the largest multi-omic health initiatives, linking diet, person-specific metabolic responses to foods, and the gut microbiome. In this work, we considered and harmonized five ZOE PREDICT cohorts: PREDICT 1, PREDICT 2, PREDICT 3 US21, PREDICT 3 US22A, and PREDICT 3 UK22A. The PREDICT 1 cohort (NCT03479866) was described previously9,51. In brief, PREDICT 1 enrolled 1,098 participants (n = 1,001 from the UK and n = 97 from the USA) who underwent a clinical visit to collect anthropometric information and blood samples, followed by an at-home phase during which postprandial responses to both standardized tests and ad libitum meals were recorded. Stool samples were collected at home before the in-person clinical visit. The PREDICT 2 study (NCT03983733) had a similar collection protocol to PREDICT 1 but was conducted entirely remotely and included data from 975 people from 48 US states (including the federal District of Columbia and without participants from North Dakota and Hawaii). The PREDICT 3 cohorts (US21, US22A and UK22A) are research cohorts (NCT04735835) embedded within the ZOE commercial product. Participants provided informed written consent for their data to be used for scientific research purposes. In total, 32,621 samples (n = 11,798 for US21, n = 8,470 for US22A and n = 12,353 for UK22A) were collected and retrieved. The studies were fully remote: participants completed health and food questionnaires at baseline, and self-collected and shipped stool samples. Cardiometabolic markers were collected as described below. Furthermore, we considered and analysed two registered clinical nutritional intervention studies, namely METHOD36 (NCT05273268) and BIOME37 (NCT06231706), focusing on the microbiome changes and their links with the two derived SGB-level rankings (ZOE MB health-ranks and diet-ranks).
All study protocols are registered and available at clinicaltrials.gov under the clinical trial number listed for each trial.
Sample collection, DNA extraction and sequencing
For the PREDICT 1 cohort, sample collection, DNA extraction and sequencing were described previously9. The PREDICT 2 samples were collected in Zymo buffer, DNA extraction was performed at QIAGEN Genomic Services using DNeasy 96 PowerSoil Pro, and sequencing was performed on the Illumina NovaSeq 6000 platform using the S4 flow cell and targeting 7.5 Gb per sample. The PREDICT 3 samples were self-collected into tubes containing the DNA-Shield Zymo buffer. Sample processing was performed by Zymo and Prebiomics. In brief, DNA extraction by Zymo used the ZymoBIOMICS-96 MagBead DNA kit, whereas Prebiomics used the DNeasy 96 PowerSoil Pro kits. Sequencing libraries were prepared using the Illumina DNA Prep Tagmentation kit, following the manufacturer’s guidelines. Whole-genome shotgun metagenomic sequencing on the Illumina NovaSeq 6000 platform used the S4 flow cell and targeted 3.75 Gb per sample.
All raw sequencing data were quality controlled using the preprocessing pipeline available at https://github.com/SegataLab/preprocessing, which comprises three steps: (1) removal of reads that were of low quality (Q < 20), too short (length under 75 nt) or contained more than two ambiguous bases; (2) removal of host and contaminant DNA (Illumina’s spike-in phiX 174 and the human genome, hg19); and (3) synchronization of paired-end and unpaired reads.
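The read-level filter in step (1) and the pair synchronization in step (3) can be illustrated with a minimal Python sketch. This is not the code of the pipeline linked above. It assumes that Q refers to the mean Phred score of a read (Phred+33 encoding), that ambiguous bases are encoded as N, and that mates share the same read name; the Read class and the function names are hypothetical. Step (2) requires alignment against reference genomes and is omitted.

```python
from dataclasses import dataclass

# Thresholds taken from the text; how Q is computed is an assumption (mean Phred score).
MIN_MEAN_QUALITY = 20
MIN_LENGTH = 75
MAX_AMBIGUOUS = 2


@dataclass
class Read:
    """Hypothetical container for one FASTQ record."""
    name: str
    sequence: str
    quality: str  # Phred+33 encoded quality string (assumption)


def mean_phred(quality: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


def passes_qc(read: Read) -> bool:
    """Return True if the read survives the three step-(1) criteria."""
    if len(read.sequence) < MIN_LENGTH:
        return False  # too short (also guards against empty reads)
    if read.sequence.upper().count("N") > MAX_AMBIGUOUS:
        return False  # more than two ambiguous bases
    return mean_phred(read.quality) >= MIN_MEAN_QUALITY


def synchronize(forward, reverse):
    """Split surviving reads into intact pairs and unpaired reads (step 3)."""
    fwd = {read.name: read for read in forward if passes_qc(read)}
    rev = {read.name: read for read in reverse if passes_qc(read)}
    paired = [(fwd[name], rev[name]) for name in fwd if name in rev]
    unpaired = [read for name, read in fwd.items() if name not in rev]
    unpaired += [read for name, read in rev.items() if name not in fwd]
    return paired, unpaired
```

Checking length first means the quality average is never computed on an empty read. A read whose mate fails the filter is kept as unpaired rather than discarded, which matches the distinction between paired-end and unpaired reads in step (3).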
Dietary data processing
In the PREDICT cohorts, we assessed long-term food intakes using FFQs, which were largely consistent across cohorts. Specifically, for PREDICT 1 participants (UK), we used a modified 131-item European Prospective Investigation into Cancer and Nutrition (EPIC) FFQ52. Participants in PREDICT 2 (USA) were surveyed using a similarly validated Diet History Questionnaire-III FFQ, including 135 items about food and beverages, as well as 26 questions about dietary supplements53. In PREDICT 3 UK22A and US22A, we developed and used a 264-item FFQ adapted from the EPIC-Norfolk Study FFQ and the Diet History Questionnaire-III. Consequently, there is a large overlap between the food items collected across the FFQs; for example, 90% of questions in the EPIC FFQ are included in the PREDICT 3 FFQ. This FFQ also includes additional food items to accurately capture modern eating habits—a limitation of older FFQ versions54. In the PREDICT 3 US21 cohort, FFQs were not collected, and only short-term logged dietary data collected using the ZOE mobile phone app were used instead.
Starting from both long- and short-term dietary data, we computed three versions of the PDI55, namely, the overall PDI, the healthful PDI (measuring adherence to a diet rich in healthful plant-based foods) and the unhealthy PDI (measuring the intake of unhealthful plant-based foods), as well as the healthy eating index23 (measuring how consumed foods align with dietary guidelines), the alternative Mediterranean diet score (measuring the adherence to a Mediterranean diet)56 and the Healthy Food Diversity (HFD) index (measuring the number, distribution and health value of consumed foods)57. Specifically, to calculate PDIs and the healthy eating index, food items were first assembled into food groups by mapping them onto a ‘food tree’ consisting of a database of nutrient information arranged according to a hierarchical tree structure: level 1 (9 food groups), level 2 (52 food groups) and level 3 (195 food groups). UK foods were mapped onto the Composition of Foods Integrated Dataset (CoFID)58 using food categories or sub-group codes, whereas US foods were similarly mapped onto the US Department of Agriculture Food and Nutrient Database for Dietary Studies. Level 3 foods were aggregated and harmonized by nutrition scientists to allow for comparisons across cohorts. The Mediterranean diet and HFD scores were calculated as described previously9.
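The roll-up of food items into food groups along a three-level tree can be sketched as follows. This is an illustrative sketch only: the tree fragment, the food items, the group names and the function aggregate_intake are hypothetical and do not reproduce the food tree, CoFID codes or harmonization used in the study.

```python
from collections import defaultdict

# Hypothetical fragment of a food tree: item -> (level 1, level 2, level 3) groups.
FOOD_TREE = {
    "wholemeal bread": ("Grains", "Bread", "Wholegrain bread"),
    "white bread": ("Grains", "Bread", "Refined bread"),
    "apple": ("Fruit", "Fresh fruit", "Pome fruit"),
    "orange juice": ("Fruit", "Fruit juice", "Citrus juice"),
}


def aggregate_intake(intakes, level):
    """Sum intakes (for example, g per day) per food group at tree level 1, 2 or 3."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    totals = defaultdict(float)
    for item, amount in intakes.items():
        path = FOOD_TREE.get(item)
        if path is None:
            continue  # unmapped items are skipped here; a real pipeline would flag them
        # The group key is the path from the root down to the requested level.
        totals[path[:level]] += amount
    return dict(totals)
```

For example, aggregate_intake({"wholemeal bread": 80, "white bread": 40}, level=2) sums both items into the single group ("Grains", "Bread"), whereas level=3 keeps them as separate groups. Keying each group by its full path avoids collisions between identically named groups on different branches.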
Host health and anthropometric marker collection
In PREDICT 1, sex and age were self-reported, whereas height, weight and blood pressure were measured at a clinic visit (day 0). At the clinic visit, participants were also fitted with wearable continuous glucose monitor (CGM) devices (Abbott Freestyle Libre Pro (FSL)), visceral fat mass was measured using dual-energy X-ray absorptiometry scans following standard manufacturer’s recommendations (DXA; Hologic QDR 4500 plus) and fasting GlycA was measured using a high-throughput NMR metabolomics (Nightingale Health) 2016 panel. Fasting and postprandial venous blood samples were also collected at the clinic; plasma glucose and serum total cholesterol, HDL-C and triglycerides were measured using Affinity 1.0, and whole blood HbA1c% was measured using Viapath. The ten-year ASCVD risk was calculated as per the 2019 American College of Cardiology (ACC) and American Heart Association (AHA) clinical guidelines59. Additional data were collected over the subsequent 13-day period at home; postprandial responses to eight standardized meals (seven in duplicate) of differing macronutrient (fat, carbohydrate, protein and fibre) content were measured using CGMs and dried-blood-spot analysis as described previously13. T2D and hyperlipidemia were self-reported via health questionnaires. The PREDICT 2 and PREDICT 3 studies were fully remote. Sex, age, height, weight and blood pressure were self-reported, and fasting and postprandial responses for total cholesterol, HDL-C, triglycerides and HbA1c were assessed using whole blood finger-prick samples collected at home and analysed by dried-blood-spot analysis at commercial laboratories (CRL, Eurofins Biomnis). CGMs were fitted at home by participants. A smaller selection of standardized meals than in PREDICT 1 was tested in PREDICT 2 and PREDICT 3 (a metabolic challenge meal, and medium-fat and carbohydrate breakfast and lunch meals).
Some of the considered markers represent the same metabolic function over time and showed positive correlations between their fasting and postprandial measurements, whereas others represent opposite types of the same biomolecular pathway and showed negative correlations among them (Supplementary Table 4).