The registry of cCREs
Anchoring cCREs
We anchored cCREs on two classes of elements: rDHSs and representative transcription factor clusters (TF rClusters). To annotate rDHSs, we refined and filtered DNase hypersensitivity site calls across biosamples and then iteratively clustered and selected the strongest DHS as previously described9. To annotate TF rClusters, we downloaded peak BED files from the ENCODE portal for all transcription factor ChIP–seq experiments with a FRiP score >0.003. For each experiment, we used the ‘preferred default’ peak file. To match the cCRE size distribution, we resized peaks to 150–350 bp. Using the same iterative clustering and selection process that we used for rDHSs, we identified a representative transcription factor peak for each cluster. We then selected all TF rClusters that represented at least five experiments and did not overlap an rDHS. For each anchor (rDHS or TF rCluster), we calculated the z-scores of the log-transformed signal within each DNase experiment to normalize across datasets. cCREs were defined as anchors with a maximum DNase z-score >1.64 across experiments or overlapping a TF rCluster.
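The normalization and thresholding step can be illustrated with a minimal Python sketch. It is not the pipeline used here: the function names, the input layout (a dictionary of per-experiment signals keyed by anchor ID) and the pseudocount are assumptions introduced for illustration, and only the 1.64 threshold and the "maximum z-score or TF rCluster" rule come from the text. The z-score does not depend on the base of the logarithm, so log10 is used arbitrarily.

```python
import math
from statistics import mean, pstdev

Z_THRESHOLD = 1.64  # from the text: maximum DNase z-score > 1.64


def log_zscores(signals, pseudocount=0.01):
    """Z-scores of log-transformed signals within a single experiment.

    The pseudocount (an assumption, not stated in the text) avoids log(0).
    """
    logs = [math.log10(s + pseudocount) for s in signals]
    mu = mean(logs)
    sd = pstdev(logs)
    if sd == 0:
        # All anchors have the same signal; no anchor stands out.
        return [0.0] * len(logs)
    return [(x - mu) / sd for x in logs]


def call_ccres(signal_by_experiment, tf_rclusters):
    """Return (selected anchor IDs, maximum z-score per anchor).

    signal_by_experiment: {experiment_id: {anchor_id: DNase signal}}
    tf_rclusters: IDs of anchors that are (or overlap) a TF rCluster;
                  these are kept regardless of their DNase z-score.
    """
    max_z = {}
    for signals in signal_by_experiment.values():
        ids = list(signals)
        zs = log_zscores([signals[i] for i in ids])
        for anchor_id, z in zip(ids, zs):
            max_z[anchor_id] = max(max_z.get(anchor_id, float("-inf")), z)
    by_dnase = {i for i, z in max_z.items() if z > Z_THRESHOLD}
    return by_dnase | set(tf_rclusters), max_z
```

Normalizing within each experiment before taking the maximum means that an anchor is selected when it stands out in at least one experiment, irrespective of differences in sequencing depth or signal scale between experiments.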
Classifying cCREs
We classified cCREs into eight classes by analysing DNase-seq, H3K4me3, H3K27ac and CTCF ChIP–seq signals, and transcription factor binding across biosamples. For each cCRE, we calculated z-scores of the log-transformed signal for each mark within each biosample. The maximum z-score for each mark was determined across all biosamples. Based on combinations of high or low signals for these marks, genomic distance to annotated TSSs and overlap with TF rClusters, cCREs were categorized into promoter-like, enhancer-like (TSS-proximal or distal), CA-H3K4me3, CA-CTCF, CA-TF, CA or TF classes. Classifications were performed both agnostic of cell type and in specific biosamples, following previously published methods.
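The first part of this procedure, taking the maximum z-score per mark across biosamples and reducing it to a high or low call, can be sketched as follows. The data layout and function names are hypothetical, and the 1.64 cut-off is an assumption carried over from the anchoring step, as no classification threshold is stated here. The rules that map these calls, TSS distance and TF rCluster overlap to the eight classes follow the previously published methods and are not reproduced.

```python
MARKS = ("DNase", "H3K4me3", "H3K27ac", "CTCF")
HIGH_Z = 1.64  # assumption: same cut-off as in the anchoring step


def max_z_per_mark(z_by_biosample):
    """Maximum z-score of each mark across biosamples for one cCRE.

    z_by_biosample: {biosample: {mark: z-score}}; a biosample may lack
    some marks, in which case it does not contribute to that mark.
    """
    best = {}
    for marks in z_by_biosample.values():
        for mark, z in marks.items():
            if mark not in best or z > best[mark]:
                best[mark] = z
    return best


def signal_flags(max_z, threshold=HIGH_Z):
    """High (True) or low (False) call per mark; unmeasured marks are low."""
    return {m: max_z.get(m, float("-inf")) > threshold for m in MARKS}
```

For the biosample-specific classification, the same high or low calls would be made from the z-scores of a single biosample instead of the maxima.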
Gene annotations
Unless otherwise stated, all analyses used GENCODE 40 basic gene annotations for human and GENCODE M25 basic annotations for mouse. ENCODE RNA-seq data were uniformly processed by the ENCODE DCC using GENCODE 29 comprehensive for human and GENCODE M21 comprehensive for mouse.
Overlap with repeat elements
We downloaded repeat element annotations from the UCSC Genome Browser RepeatMasker track51 (https://www.repeatmasker.org; accessed November 2021; n = 5,633,664 elements) and used BEDTools (v.2.19.1)52 to calculate the fraction of cCREs overlapping repeat elements by major class (for example, LINEs, SINEs and LTRs). For cCRE sets that showed significant enrichment, we further analysed overlaps with specific repeat families within each class (for example, L1 and L2 for LINEs). Statistical significance was assessed using a two-sided Fisher’s exact test with FDR correction. This analysis was performed for multiple cCRE subsets, including multi-mapping cCREs and silencer cCREs.
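The statistical test can be illustrated with a self-contained Python sketch. Two points are assumptions, not statements from the text: the 2 × 2 table compares a cCRE subset with a background set (overlapping versus not overlapping a given repeat class), and the FDR correction is taken to be the Benjamini–Hochberg procedure. The exact enumeration below is practical only for modest counts; a statistics library would be preferable at genome scale.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout: a, b = subset cCREs overlapping / not overlapping
    the repeat class; c, d = the same counts for the background set.
    The P value sums the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one P value would be computed per repeat class (or per family within an enriched class), and the resulting list passed to `benjamini_hochberg` so that the correction accounts for all classes tested for a given cCRE subset.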
Mammalian conservation