Biobank Japan
The BBJ is a prospective hospital-based biobank with 267,289 participants, all of whom were diagnosed with at least one of the BBJ target diseases by physicians at the cooperating hospitals (refs. 50–52). All participants provided written informed consent, as approved by the ethics committees of the Institute of Medical Science, the University of Tokyo and the RIKEN Center for Integrative Medical Sciences. The BBJ comprises two cohorts that were genotyped separately: the first (BBJ1, N = 182,536) and the second (BBJ2, N = 68,534). The participants in BBJ1 were genotyped with the Illumina HumanOmniExpressExome BeadChip or a combination of the Illumina HumanOmniExpress and HumanExome BeadChips, whereas the participants in BBJ2 were genotyped with the Illumina Asian Screening Array. All BBJ1 participants and 17% of the BBJ2 participants (N = 11,716) were recruited from 2003 to 2008. The remaining BBJ2 participants (N = 56,818) were recruited from 2013 to 2017.
Definition of the discovery and replication cohorts
We used BBJ1 as the discovery cohort and BBJ2 as the replication cohort.
Quality control of genotype data
In BBJ1, we performed quality control of the participants and genotypes and excluded related samples using the approach described previously (ref. 53). The genotype data were imputed using Minimac3 (ref. 54) with a reference panel combining 1000 Genomes Project Phase 3 (N = 2,504) and Japanese whole-genome sequencing data (N = 1,037). We excluded variants with an imputation quality of Rsq < 0.7 or a minor allele frequency (MAF) of <0.01, resulting in 7,444,735 autosomal variants analysed in total. We analysed 166,757 participants who were assigned to the Japanese population by visual inspection of principal component analysis (PCA).
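The post-imputation variant filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, its field names and the function names are hypothetical, and the sketch assumes that each variant carries an imputation quality (Rsq) and an alternate allele frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant record; field names are assumptions."""
    variant_id: str
    rsq: float       # imputation quality reported by the imputation software
    alt_freq: float  # alternate allele frequency in the imputed cohort


def minor_allele_frequency(alt_freq: float) -> float:
    """MAF is the frequency of the rarer of the two alleles."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_imputation_qc(v: Variant, min_rsq: float = 0.7,
                         min_maf: float = 0.01) -> bool:
    """Keep a variant only if Rsq >= 0.7 and MAF >= 0.01."""
    return v.rsq >= min_rsq and minor_allele_frequency(v.alt_freq) >= min_maf


def filter_variants(variants):
    """Return the variants that survive both thresholds."""
    return [v for v in variants if passes_imputation_qc(v)]
```

A variant with an alternate allele frequency of 0.995 is treated as rare (MAF = 0.005) and removed, because the filter is applied to the minor allele rather than to the alternate allele.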
In BBJ2, we excluded participants with a low call rate (<0.98) and outliers from the Japanese Hondo (that is, the main islands) cluster estimated on the basis of PCA. We excluded variants meeting any of the following criteria: (1) a low call rate (<0.99); (2) a low minor allele count (<5); or (3) a Hardy–Weinberg equilibrium test P value of <1.0 × 10⁻¹⁰. We performed statistical phasing of the genotype data using Shapeit4 (ref. 55) and imputation using Minimac4 (ref. 56) with the same reference panel as used in the discovery cohort. After imputation, we excluded variants with an imputation quality of <0.7 or a MAF of <0.01. We used King (ref. 57) to exclude relatives up to the second degree, resulting in 65,373 participants being analysed.
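The pre-imputation thresholds for BBJ2 can be illustrated with a short sketch. It assumes that genotypes are coded as alternate allele counts (0, 1 or 2) with `None` for a missing call, and all function names are hypothetical. The text does not state which Hardy–Weinberg test was used; the sketch substitutes a one-degree-of-freedom chi-square goodness-of-fit test, whereas standard tools often use an exact test.

```python
import math


def call_rate(calls):
    """Fraction of non-missing calls (None marks a missing genotype)."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_count(genotypes):
    """Count of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return min(alt, 2 * len(called) - alt)


def hwe_chi2_pvalue(genotypes):
    """Chi-square (1 d.f.) HWE P value; an assumed stand-in for the test used."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)  # alternate allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no departure can be tested
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def sample_passes_qc(sample_calls, min_call_rate=0.98):
    """Participant-level filter: call rate of at least 0.98."""
    return call_rate(sample_calls) >= min_call_rate


def variant_passes_qc(genotypes, min_call_rate=0.99, min_mac=5,
                      min_hwe_p=1.0e-10):
    """Variant-level filter: call rate, minor allele count and HWE P value."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_count(genotypes) >= min_mac
            and hwe_chi2_pvalue(genotypes) >= min_hwe_p)
```

The PCA-based outlier exclusion and the relatedness filter are not covered here, because they depend on cohort-wide computations rather than per-variant or per-sample thresholds.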
UK Biobank
The UKB is a population-based biobank with approximately 500,000 participants aged 40–69 years who were recruited between 2006 and 2010 (ref. 58). Participants were genotyped using either the UK BiLEVE Axiom Array or the UK Biobank Axiom Array. The genotypes were then imputed with IMPUTE4 using a combined reference panel of the Haplotype Reference Consortium, UK10K and 1000 Genomes Project Phase 3. We accessed the UKB data under project number 47821.
Definition of the discovery and replication cohorts