TSS selection
Relevant TSSs were selected as previously described (ref. 60). In brief, we selected GENCODE-defined TSSs (ref. 62), with the additional requirement that they are active in at least one cell type or tissue according to the FANTOM5 database (ref. 63). This process resulted in a curated set of 30,607 TSSs.
Genome-wide MPRA dataset
The initial PARM models for HepG2 and K562 cells were trained using publicly available genome-wide MPRA data (GEO accession GSE128325). We combined the fragments from all the libraries and retrieved those overlapping a window of −300 bp to +100 bp relative to the 30,607 TSSs.
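The fragment retrieval step can be illustrated with a minimal sketch. It is not the code used in this study: the names `Interval`, `tss_window` and `fragments_in_windows` are hypothetical, coordinates are assumed to be 0-based and half-open, and "overlapping" is assumed to mean any overlap of at least 1 bp rather than full containment. The window is oriented by strand, so that "−300 bp" always lies upstream of the TSS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def tss_window(chrom, tss, strand, upstream=300, downstream=100):
    """Return the strand-aware window around a TSS (assumed 0-based position)."""
    if strand == "+":
        # Upstream lies at lower coordinates on the plus strand.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Mirror the window for the minus strand, keeping the same width.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return Interval(chrom, max(start, 0), end)


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fragments_in_windows(fragments, windows):
    """Keep fragments overlapping any window (simple quadratic scan)."""
    return [f for f in fragments if any(overlaps(f, w) for w in windows)]


# Toy example with one plus-strand and one minus-strand TSS.
windows = [tss_window("chr1", 1000, "+"), tss_window("chr1", 5000, "-")]
fragments = [
    Interval("chr1", 500, 700),    # ends exactly where the first window starts
    Interval("chr1", 650, 900),    # overlaps the first window
    Interval("chr1", 5250, 5500),  # overlaps the upstream part of the second window
    Interval("chr2", 650, 900),    # different chromosome
]
kept = fragments_in_windows(fragments, windows)
```

The quadratic scan is only suitable for small inputs. For a genome-wide fragment set, an interval index or a dedicated tool such as `bedtools intersect` would be the more practical choice.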
Construction of MPRA libraries
Focused library using DNA fragmentation and capture-based methods
To generate the focused library, 100 µg of DNA was isolated from a human cell line (HG02601) using an Isolate II Genomic DNA kit (Bioline, BIO-52066). The isolated DNA was then fragmented using dsDNA Fragmentase (NEBNext, M0348L) for 30 min and subsequently size-selected by gel extraction for fragment sizes ranging from around 200 to 400 bp. DNA was end-repaired using an End-IT DNA End-Repair kit (Lucigen, ER81050) and subsequently A-tailed using Klenow Fragment (3′→5′ exo−; NEB, M0212M). For cloning purposes, two custom 31 bp dsDNA adapters (oNK46 and oNK47), each containing a T-overhang, were ligated to the 5′ and 3′ ends of the fragments by TA-ligation with a Quick Ligation kit (NEB, M2200L; see Supplementary Table 1 for oligonucleotide sequences). These adapters contain overlaps with the 3′ and 5′ ends of the linear barcoded p101 vector (see ref. 22) to allow Gibson assembly of the fragments after hybridization. PCR amplification was performed for eight cycles with the primers oNK51 and oNK52 and Equinox polymerase (Twist, 104176). To capture promoter regions, we took the −300 bp to +100 bp windows around the 30,607 selected TSSs and ordered a high-stringency hybridization capture library from Twist for these custom regions, consisting of 127,575 probes. To prevent nonspecific binding of probes to our fragments, we designed custom blockers complementary to the custom Gibson adapters (oNK57 and oNK58). Fragments were captured using the custom hybridization panel according to the manufacturer’s protocols. In brief, 1 µg of fragments was hybridized to the custom hybridization panel for 16 h in the presence of specific (custom-designed) and nonspecific (COT-1 DNA) blocker solutions. Subsequently, hybridized fragments were enriched using streptavidin binding and amplified by PCR for nine cycles using the primers oNK51 and oNK52.
Captured fragments were purified and cloned into the linear barcoded p101 vector with Gibson assembly using HiFi DNA assembly master mix (NEB, E2621L) for 60 min at 50 °C. The Gibson assembly mix was then purified and subsequently transformed into MegaX DH10B ultracompetent bacteria (Thermo, C640003) according to the manufacturer’s protocols. Transformed bacteria were grown in LB overnight and the plasmid library was isolated using a Gigaprep Isolation kit (Thermo, K210009XP).
Oligonucleotide-based libraries
All three synthetic libraries (synthetic promoters, ISM and motif insertions) were generated using synthetic oligonucleotides that were ordered from Twist. Each oligonucleotide library was amplified using KAPA HiFi Hotstart Readymix (Roche, KK2601) for 14 cycles with the primers oNK69 and oNK71. The library was bead-purified, end-repaired using an End-IT DNA End-Repair kit (Lucigen, ER81050), bead-purified again and digested with EcoRI-HF (NEB, R3101S) and NheI-HF (NEB, R3131S) for 1 h at 37 °C. Fragments were ligated into the NheI-HF/EcoRI-HF double-digested vector (an adaptation of the p101 vector; ref. 64) using T4 DNA ligase (Roche, 10799009001). The ligation mix was purified and transformed as described above.
Cell culture and transfection