
Specificity, length and luck drive gene rankings in association studies


GWAS summary statistics

GWAS summary statistics for 305 continuous traits were downloaded from the Neale Lab (http://www.nealelab.is/uk-biobank/; v3). These regressions were run on inverse rank normal-transformed phenotypes in a subset of the UK Biobank consisting of approximately 360,000 individuals and included age, age², inferred sex, age × inferred sex, age² × inferred sex and principal components 1–20 as covariates. We used \(5\times {10}^{-8}\) as the threshold for genome-wide significance unless otherwise stated.
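As an illustration of the phenotype transformation mentioned above, the following is a minimal sketch of a rank-based inverse normal transform. The function name `inverse_rank_normal`, the Blom offset of 3/8 and the average-rank handling of ties are assumptions made for this example; the text does not state which variant of the transform was used.

```python
from statistics import NormalDist


def inverse_rank_normal(values, offset=3.0 / 8.0):
    """Map values to standard-normal quantiles by rank.

    Tied values receive their average rank. The offset (3/8, Blom) is an
    assumption of this sketch, not a detail taken from the text.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    denom = n - 2.0 * offset + 1.0
    return [std_normal.inv_cdf((r - offset) / denom) for r in ranks]


# Toy phenotype values (made up for illustration).
transformed = inverse_rank_normal([3.2, 1.0, 2.5, 10.0, 2.5])
```

The transform depends only on the rank order of the values, so the outlying value 10.0 ends up no further from the centre than the smallest value does in the opposite direction.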

LoF burden test summary statistics

Summary statistics for 292 LoF burden tests were downloaded from Backman et al. (ref. 4). Two-hundred and nine traits overlapped with traits for which we had GWAS summary data (Supplementary Table 1). Burden genotypes were calculated by calling individuals homozygous for the non-LoF variant at all sites as being homozygous non-LoF, calling individuals homozygous for the LoF allele at any site as being homozygous LoF, and calling all other individuals heterozygotes. Burden tests were run using REGENIE (ref. 59) on inverse rank normal-transformed phenotypes. For our primary analyses, we used the result of the burden test with mask M1, which only includes variants that are predicted as being LoFs using the most stringent filtering criteria and an allele frequency upper bound of 1%. For analyses including missense variants, we used mask M3, which also includes ‘likely damaging’ missense variants, again upper bounding the frequency of included variants at 1% (see ref. 4 for more details). We used a per-trait genome-wide significance threshold of \(2.7\times {10}^{-6}\), derived by applying a Bonferroni correction to a significance threshold of 0.05 for testing approximately 18,000 genes per trait.
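The burden genotype rule and the Bonferroni threshold described above can be sketched as follows. The function names and the 0/1/2 encoding of per-site LoF allele counts are assumptions made for this example, and missing genotypes are not handled because the text does not say how they were treated.

```python
def burden_genotype(lof_allele_counts):
    """Collapse per-site LoF allele counts (0, 1 or 2) into one gene-level call.

    Returns 2 (homozygous LoF), 0 (homozygous non-LoF) or 1 (heterozygote).
    """
    if any(count == 2 for count in lof_allele_counts):
        return 2  # homozygous for the LoF allele at any site
    if all(count == 0 for count in lof_allele_counts):
        return 0  # homozygous non-LoF at every site
    return 1  # all other individuals are called heterozygotes


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Toy individuals, each with genotypes at three qualifying sites in one gene.
calls = [burden_genotype(sites) for sites in ([0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 2, 1])]
threshold = bonferroni_threshold(0.05, 18_000)
```

Under this rule an individual carrying single LoF alleles at two different sites is still called a heterozygote. For exactly 18,000 tests the quotient is about \(2.78\times {10}^{-6}\), which is close to the reported threshold; the gene count in the text is approximate.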

A subset of genetically uncorrelated traits

The set of 209 quantitative traits included some that were highly correlated, such as sitting height and standing height. For certain analyses, we selected a subset of 27 traits that were not highly correlated by intersecting the 209 traits with those analysed by Mostafavi et al. (ref. 45) (Supplementary Table 1). In brief, the trait list was pruned to ensure that all pairwise genetic correlations, as reported by the Neale laboratory, were below 0.5, prioritizing traits with higher heritability. Biomarkers were excluded from this subset because their genetic correlations with other traits were not provided by the Neale laboratory. Genetic and phenotypic correlations between these 27 traits as reported by the Neale laboratory are listed in Supplementary Table 2. Genetic correlations ranged between −0.3096 and 0.2742. Phenotypic correlations for eight trait pairs were missing from the Neale laboratory (all including the trait ‘heel quantitative ultrasound index, direct entry’). The remaining phenotypic correlations ranged between −0.2117 and 0.1972.

We used this subset of traits to ensure that our results (Figs. 3b–d and 4b,c and Extended Data Figs. 1a–c and 3a–c) were not driven by many correlated phenotypes all sharing the same underlying biology. As such, slight correlations between these phenotypes should not substantively affect our results or interpretations.
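One way to picture the pruning step described in this section is a greedy pass over traits in order of decreasing heritability. This is only a sketch under stated assumptions: the greedy order, the use of the absolute value of the genetic correlation, the function name `prune_correlated_traits` and all numbers in the toy data are illustrative and are not taken from the original pipeline.

```python
def prune_correlated_traits(heritability, genetic_corr, max_abs_rg=0.5):
    """Greedily keep traits whose genetic correlation with every kept trait is small.

    heritability: dict mapping trait name -> heritability estimate
    genetic_corr: dict mapping frozenset({trait_a, trait_b}) -> genetic correlation
    """
    kept = []
    # Visit traits from highest to lowest heritability so that, within a
    # correlated group, the most heritable trait is the one retained.
    for trait in sorted(heritability, key=heritability.get, reverse=True):
        if all(abs(genetic_corr[frozenset((trait, other))]) < max_abs_rg for other in kept):
            kept.append(trait)
    return kept


# Made-up values for three hypothetical traits.
h2 = {"standing_height": 0.50, "sitting_height": 0.40, "body_mass_index": 0.25}
rg = {
    frozenset(("standing_height", "sitting_height")): 0.90,
    frozenset(("standing_height", "body_mass_index")): -0.10,
    frozenset(("sitting_height", "body_mass_index")): -0.05,
}
subset = prune_correlated_traits(h2, rg)
```

In this toy input the two height traits are strongly correlated, so only the more heritable one survives alongside the third trait.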

Defining GWAS loci

For a systematic comparison of discoveries between GWAS and burden tests (shown in Fig. 1c,d), we grouped GWAS variants into large, non-overlapping genomic loci. This approach avoids multiple counting of the same GWAS genes, as nearby hits within a locus may map to the same gene, and it provides a conservative estimate of the overlap between GWAS and burden test results as described below.

We focused on 151 quantitative traits with at least one burden test hit and one GWAS hit. For each trait, we analysed the set of LD-clumped hits (\(P < 5\times {10}^{-8}\), clumping \({r}^{2} < 0.1\)) from 8,136,100 filtered SNPs provided by Mostafavi et al. (ref. 45). A secondary analysis (Supplementary Figs. 11–13) used the same LD-clumping pipeline but with a stricter threshold of \({r}^{2} < 0.01\).
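For readers unfamiliar with LD clumping, the following minimal sketch shows the general greedy idea with the two thresholds quoted above. It is not the pipeline used to produce the clumped hits: the function name `ld_clump`, the variant identifiers, the P values and the \({r}^{2}\) values are all hypothetical, pairs without a listed \({r}^{2}\) are assumed to be unlinked, and no physical-distance window is applied.

```python
def ld_clump(p_values, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Return index variants left after greedy LD clumping.

    p_values: dict mapping variant id -> association P value
    r2: dict mapping frozenset({variant_a, variant_b}) -> LD r^2
        (pairs that are absent are treated as r^2 = 0, an assumption of this sketch)
    """
    # Only genome-wide significant variants take part, most significant first.
    candidates = sorted(
        (variant for variant in p_values if p_values[variant] < p_threshold),
        key=p_values.get,
    )
    index_variants = []
    clumped = set()
    for variant in candidates:
        if variant in clumped:
            continue  # already absorbed by a more significant index variant
        index_variants.append(variant)
        for other in candidates:
            if other == variant or other in clumped:
                continue
            if r2.get(frozenset((variant, other)), 0.0) >= r2_threshold:
                clumped.add(other)
    return index_variants


# Hypothetical variants and values.
p = {"rs_a": 1e-12, "rs_b": 1e-9, "rs_c": 3e-8, "rs_d": 1e-4}
ld = {
    frozenset(("rs_a", "rs_b")): 0.80,
    frozenset(("rs_a", "rs_c")): 0.02,
    frozenset(("rs_b", "rs_c")): 0.05,
}
primary = ld_clump(p, ld, r2_threshold=0.1)
strict = ld_clump(p, ld, r2_threshold=0.01)
```

With the toy input, the primary threshold keeps two index variants, whereas the stricter threshold of 0.01 also absorbs the weakly linked third variant and leaves one.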
