Identification of noncoding regulatory variants impacting the pharmacogenomics of ALL treatment
Single-nucleotide variants (SNVs) impacting diverse pharmacological traits in ALL were identified for functional interrogation. We chose SNVs associated with relapse or persistence of MRD after induction chemotherapy in childhood ALL patients to investigate the role of inherited noncoding regulatory variants impacting clinical phenotypes (i.e., treatment outcome). These SNVs were identified from published GWAS of ALL patients enrolled in St. Jude Childrens Research Hospital and the Childrens Oncology Group clinical protocols3,4,5 (see Methods for variant selection criteria). Variant selection also included prioritization for treatment outcome SNVs associated with drug resistance phenotypes in primary ALL cells to enrich for variation impacting ALL cell biology (see Methods for variant selection criteria). These treatment outcome-associated variants, as well as all variants in high LD (r2>0.8) with the sentinel GWAS variants, were further evaluated (Fig.1a, b).
a SNVs of interest from GWAS were pursued based on association with ex vivo chemotherapeutic drug resistance in primary ALL cells from patients and/or treatment outcome. Dex dexamethasone, Pred prednisolone, VCR vincristine, 6MP 6-mercaptopurine, 6TG 6-thioguanine, LASP L-asparaginase. b GWAS SNVs were combined with ALL disease susceptibly control GWAS SNVs and SNVs in high LD (R2>0.8) and c mapped to accessible chromatin sites in ALL cell lines, ALL PDXs, and primary ALL cells from patients. Of the 1696 SNVs mapped to accessible chromatin sites, 35 are control SNVs. Source data are provided in the Source Data file.
We also identified variants directly associated with ex vivo chemotherapeutic drug resistance in primary ALL cells from patients by performing GWAS analyses using SNV genotype information and ex vivo drug resistance assay results for six antileukemic agents (prednisolone, dexamethasone, vincristine, L-asparaginase, 6-mercaptopurine [6MP] and 6-thioguanine [6TG]) in primary ALL cells from 312344 patients (not all patients were tested for all drugs) enrolled in the Total Therapy XVI clinical protocol at St. Jude Childrens Research Hospital (see Methods). We further prioritized functional ex vivo drug resistance SNVs by determining if they were eQTLs in primary ALL cells or related cell types (i.e., whole blood and EBV-transformed lymphocytes) from the Genotype-Tissue Expression (GTEx) consortium37 (see Methods for variant selection criteria). Ex vivo drug resistance variants that were also identified as eQTLs, as well as variants in high LD (r2>0.8) with these sentinel GWAS variants, were further evaluated (Fig.1a, b).
GWAS have also been performed for childhood ALL disease susceptibility and identified several GWAS loci harboring variants with genome-wide significance44,45,46,47,48,49,50. Several follow-up studies of these GWAS loci have identified candidate causal noncoding variants and mechanisms involving gene regulatory disruptions51,52,53. As a result, we used ALL disease susceptibility variants (n=11), as well as variants in high LD (r2>0.8) with them, for further analysis as positive controls in our study (Fig.1a, b).
Because most of these variants map to noncoding portions of the human genome, these data point to disruptions in gene regulation as the underlying mechanism of how these variants impact ALL cell biology. We therefore utilized assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq)54 chromatin accessibility data in 161 ALL cell models, comprised of primary ALL cells (cryopreserved, n=2455; fresh, n=12056), ALL cell lines (n=14) and ALL patient-derived xenografts (PDXs, n=3), to uncover which variants map to putative CREs in ALL cells57 (i.e., regulatory variants; Fig.1c). Although we detected variation in ATAC-seq TSS enrichment scores and peak counts that is to be expected from such a large, mixed cohort of ALL cell models, the peaks called were largely reproducible (found in >3 samples) within each group (Supplementary Fig.1ac). ATAC-seq data from primary ALL cells, ALL cell lines, and PDXs were combined and identified 1696 regulatory variants at accessible chromatin sites in ALL cells for functional investigation (Fig.1c and Supplementary Data1).
To examine the functional effects of these 1696 regulatory variants on transcriptional output in a high-throughput manner we utilized a barcode-based MPRA platform29,32 to measure differences in allele-specific transcriptional output (Fig.2a). Oligonucleotides containing 175-bp of genomic sequence centered on each reference (ref) or alternative (alt) variant allele, a restriction site, and a unique 10-bp barcode sequence were cloned into plasmids. An open reading frame containing a minimal promoter driving GFP was then inserted at the restriction site between the alleles of interest and their unique barcodes (Fig.2a). We utilized 28 unique 3UTR DNA barcodes per variant allele (56 barcodes per regulatory variant), and variants near bidirectional promoters (47 total variants) were tested using both sequence orientations. In total, 97,608 variant-harboring oligonucleotides were evaluated for allele-specific differences in gene regulatory activity (Fig.2a).
a Diagram describing design of MPRA (also see Methods). bd Significant MPRA hits were identified by BenjaminiHochberg FDR corrected two-tailed Students T tests. b Distribution of significant changes in allele-specific transcriptional activity across all SNVs. c Number of MPRA SNVs showing significant (Adj. p<0.05) changes in allele-specific transcriptional activity in each ALL cell line. d Pairwise linear correlation between changes in allele-specific transcriptional activity for all significant (Adj. p<0.05) changes across all cell lines. R2 correlation and p value are provided. All source data and statistical parameters are provided in the Source Data file.
Following transfection into 7 different B-cell precursor ALL (B-ALL; 697, BALL1, Nalm6, REH, RS411, SEM, SUPB15) and 3 T-cell ALL (T-ALL; CEM, Jurkat, P12-Ichikawa) human cell lines (n=4 transfections per cell line; 40 total), the transcriptional activity of each allele variant was measured by high-throughput sequencing to determine the barcode representation in reporter mRNA and compared to DNA counts obtained from high-throughput sequencing of the MPRA plasmid pool (Fig.2a). In the 10 cell lines MPRA detected 4633 instances of significant differential activity between alleles across 91% (1538/1696) of regulatory variants tested (Fig.2b, c, Supplementary Data2). The 10 ALL cell lines showed substantial differences in the total number of regulatory variants harboring significant allele-specific activity, which we suspect largely stems from differences in transfection efficiency (Fig.2c). Importantly, when comparing changes in allele-specific MPRA activity for each regulatory variant we found that significant changes in activity (adj. p<0.05) were highly correlated between ALL cell lines, with 87% concordance in allelic-specific activity, suggesting that significant MPRA hits were likely to be robust and reproducible between cell lines (Fig.2d). Allele-specific MPRA activities were also correlated using all pairwise cell line comparisons for each regulatory variant, irrespective of significance (Supplementary Fig.2a). Importantly, 31 of the 35 positive control variants (i.e., ALL disease susceptibility-associated variants and variants in high LD) showed significant allelic effects in at least 1 cell line, and 10 showed significant and concordant allelic effects in at least three ALL cell lines, including two variants (rs3824662 at GATA3 locus and rs75777619 at 8q24.21) directly associated with ALL susceptibility44,49,52 (Supplementary Data2). The risk A allele at rs3824662 was associated with higher GATA3 expression and chromatin accessibility and demonstrated significantly higher allele-specific activity in our MPRA44,52, thereby demonstrating that the MPRA could detect allelic effects previously identified by others.
To further validate MPRA hits in an ex vivo model, we performed MPRA using two B-ALL PDX samples that were freshly harvested from mice. These samples detected 26 and 67 significant gene regulatory variants, respectively, and showed significant correlation with the cell line MPRA data (Supplementary Fig.2b, c, Supplementary Data3). We attribute the detection of relatively lower numbers of variants in PDXs to technical effects stemming from poor transfection efficiency and limited cell survival ex vivo. Overall, our data suggest that the cohort of SNVs tested contained functional regulatory variants with the potential to impact gene regulation.
To further focus on regulatory variants most likely to broadly impact gene regulation in ALL cells, we prioritized 556 variants with significant (adj. p<0.05) and concordant allele-specific activities in at least three ALL cell lines (i.e., functional regulatory variants; Fig.3ad, Supplementary Data4). Most of these functional regulatory variants (318/556) mapped to accessible chromatin found only in primary ALL cell samples, underscoring the importance of incorporating chromatin architecture from primary ALL cells, and 54 functional regulatory variants mapped to transcription factor footprints in primary ALL cells (Supplementary Fig.3). Additionally, we used Genomic Regions Enrichment of Annotations Tool (GREAT) to associate these SNVs with their nearby genes and search for enrichment in gene ontology biological processes pathways58. Although GREAT identified gene associations for nearly all SNVs, we found no significant pathway associations (Supplementary Data4 and 5). Because further functional investigation of variants in primary ALL cells or PDXs ex vivo is largely intractable, we focused on 210 functional regulatory variants that were detected in open chromatin in one of the 14 ALL cell lines that we had generated ATAC-seq data (Fig.3d). Most of these variants (159/210; 76%) were also found in accessible chromatin in PDXs and/or in primary ALL cells from patients (Fig.3d).
a 556 of the 1696 SNVs assayed are functional regulatory variants with reproducible (FDR<0.05 in >2 cell lines) and concordant (same directionality in >2 cell lines) changes in allele-specific activity. b Frequency distribution plot showing the number of cell samples showing concordant and significant MPRA activity of variants. c Plot showing the distribution of log2-adjusted activity between alternative (Alt) and reference (Ref) alleles across 556 functional regulatory variants. 210 SNVs (in blue) mapped to accessible chromatin sites in ALL cell lines and 346 SNVs (in black) mapped only to accessible chromatin sites identified in primary ALL cells and/or PDXs. d Upset plot shows how many functional regulatory variants map to open chromatin in diverse ALL cell models. 210 of the 556 functional regulatory variants are found in accessible chromatin sites that were identified in an ALL cell line. Source data are provided in the Source Data file.
For additional validation using traditional luciferase reporter assays, we prioritized these 210 functional regulatory variants based on allele-specific effect size and selected high-ranking SNVs. Dual-luciferase reporter assays showed similar allele-specific changes in activity to that which was detected by MPRA for 7 SNVs tested (Supplementary Fig.4ak). In fact, a significant positive correlation (p=0.0017) was observed between the allelic effects detected by MPRA and luciferase reporter assays (Supplementary Fig.4l). Together, these analyses assessed the robustness of our MPRA screen of functional regulatory variants and identified 556 SNVs with reproducible and concordant allele-specific effects on gene regulation. Importantly, 210 of the 556 significant hits that were concordant in at least three cell lines were found in open chromatin sites in ALL cell lines and, therefore, warranted further exploration.
To better understand how these variants impact cellular phenotypes, we first determined if the 210 functional regulatory variants found in accessible chromatin sites in ALL cell lines could be directly associated with a target gene. While 35 functional regulatory variants were localized close (2.5kb) to nearby promoters (Fig.4a, Supplementary Data4 and 6), 175 variants were promoter-distal (>2.5kb), and therefore likely to map to CREs with unclear gene targets (Fig.4a). While CREs are often associated with the nearest genes, 3D chromatin looping methods are a more reliable method to associate a CRE with its target gene promoter. In pursuit of evidence-based association of promoters and specific CREs, we performed two related chromatin looping methods, H3K27Ac HiChIP59 and promoter capture HiC (CHiC)39, in 8 of 10 ALL cell lines used in MPRA and determined that 19 of the 175 non-promoter functional regulatory variants showed connectivity to distal promoters in the same cell line where allele-specific MPRA activity and chromatin accessibility were detected (Fig.4a, Supplementary Data6). Interestingly, H3K27Ac HiChIP and promoter CHiC called similar numbers of loops across all 8 cell lines (690,579 versus 660,313, respectively), but promoter CHiC loop calling was more consistent per cell line (Supplementary Fig.5, Supplementary Data7). HiChIP detected no looping at any of the 556 reproducible and concordant SNVs from the MPRA, and the 19 SNVs showing connectivity to a promoter were solely detected by promoter CHiC, further highlighting the utility of this method in GWAS-oriented studies41,60,61,62,63.
a Data show the number of functional regulatory variants mapping to open chromatin in cell lines that associate directly with promoters (within 2.5kb) or that are distally promoter-connected via promoter CHiC. b MPRA data show distal regulatory variants in accessible chromatin (some promoter-connected by promoter CHiC data) exhibit stronger effects on allele-specific activity than promoter-associated functional regulatory variants. ANOVA with KruskalWallis test was performed with Dunns correction for multiple comparisons. c Amongst distally promoter-connected functional regulatory, variants that map to intronic and distal intergenic sequences showed greater activity than those in UTRs. ANOVA with KruskalWallis test was performed with Dunns correction for multiple comparisons. d, e Data show the ranked allele-specific activity distribution of MPRA data for d promoter-associated functional regulatory variants and e distally promoter-connected functional regulatory variants. All source data and statistical parameters are provided in the Source Data file.
In prioritizing functional regulatory variants, we were interested in the gene regulatory impact of variants at TSS-proximal promoter-associated versus TSS-distal promoter-connected CREs as measured by MPRA. Interestingly, we found that SNVs found at TSS-distal open chromatin sites, promoter-associated or not, showed higher allele-specific changes in MPRA activity than those at promoters (Fig.4b). While we acknowledge that many of the 156 variants for which we did not detect a relationship with a promoter are likely to have meaningful gene targets, we focused on CREs containing variants with known gene targets in ALL cells for functional validation. Amongst the TSS-distal promoter-connected functional regulatory variants, we found that distal intergenic and intronic SNVs showed significantly higher allele-specific activity than those in UTRs (Fig.4c). These data suggest that the most robust allelic effects attributable to these regulatory variants are likely to occur at distal intergenic and intronic sites >2.5kb from the TSS of the target gene.
Next, we ranked TSS-proximal promoter-associated and TSS-distal promoter-connected functional regulatory variants by the geometric mean of their significant MPRA data to account for the magnitude of allele-specific activity and the reproducibility of a significant change across ALL cell lines (Fig.4d, e). This analysis identified rs1247117 as the most robust functional regulatory variants, which we then pursued for mechanistic understanding (Fig.4e).
We pursued functional validation of rs1247117 based on its highest-ranking geometric mean of MPRA allelic effect. rs1247117 is in high LD with two GWAS sentinel variants (rs1312895, r2=0.99; rs1247118, r2=1) that are associated with persistence of MRD after induction chemotherapy3. This functional regulatory variant maps to a distal intergenic region harboring chromatin accessibility downstream of the CACUL1 gene, for which it is an eQTL in EBV-transformed lymphocytes37. However, we found that rs1247117 loops to the EIF3A promoter in Nalm6 B-ALL cells (Fig.5a). We, therefore, explored how this accessible chromatin site might recruit transcriptional regulators that would depend on the allele present at rs1247117. For this, we first performed ChIP-seq for RNA pol II and H3K27Ac, which confirmed RNA Pol II occupancy and H3K27Ac enrichment in Nalm6 cells, indicating that rs1247117 is associated with an active CRE (Fig.5a). Through an examination of the underlying DNA sequence spanning rs1247117, we found that the reference guanine (G) risk allele at rs1247117 resides in a PU.1 transcription factor binding motif that is disrupted by the alternative adenine (A) allele (Fig.5b). Although the risk G allele is the reference allele, the alternative A allele is more common in human populations. Supporting PU.1 binding at this location, accessible chromatin profiling in primary ALL cells identified an accessible chromatin site and PU.1 footprint spanning rs1247117 in diverse ALL samples (Supplementary Fig.6a, b). Significantly greater chromatin accessibility at rs1247117 was also observed in heterozygous (GA) patient samples compared to patient samples homozygous for the alternative A allele (Supplementary Fig.6c), and the G allele at rs1247117 harbored significantly greater ATAC-seq read count compared to the A allele (Supplementary Fig.6d). Importantly, we determined that PU.1 was bound at this site in Nalm6 cells using CUT and RUN64 (Fig.5a).
a IGV genome browser image in Nalm6 cells showing the genomic context, chromatin accessibility, and EIF3A promoter connectivity using promoter CHiC of the top functional regulatory variant, rs1247117, with the highest allele-specific MPRA activity. Genomic binding profiles are also shown for RNA polymerase II (RNA Pol2), histone H3 lysine 27 acetylation (H3K27Ac), and PU.1. b rs1247117 lies in a PU.1 binding motif. The human genome reference sequence, Nalm6 genome sequence, location of rs1247117, and PU.1 position weight matrix are shown. c Design of biotinylated DNA probes for in vitro rs1247117 pulldown. d Biotinylated DNA pulldown shows rs1247117 allele-dependent enrichment of PU.1 binding. Blot shown is representative of two independent experiments. Densitometric quantification of two blots is shown. e CRISPR/Cas9 was used to change the allele at rs1247117 from A>G in Nalm6 cells. Data show the location of gRNA and ssODN, as well as NGS reads obtained from clone 1 and 2 at rs1247117. f PU.1 ChIP-PCR shows increased PU.1 binding at the rs1247117 locus using two A>G modified clones and 3 primer sets. Data shown are meanSD of three independent experiments for each primer set. Two-way ANOVA with Dunnetts multiple comparisons correction, n=3. g ATAC-seq data normalized for frequency of reads in peaks (FRIP) show a significantly higher count of G nucleotides in two clones of A>G modified Nalm6 cells compared to the count of A nucleotides detected in parental Nalm6 cells. Data shown are the meanSD. Bonferroni corrected, two-tailed Students T tests, n=3. h Western blots and quantification showing decreased EIF3A expression in A>G modified Nalm6 cells. Blots shown are representative of three independent experiments. Quantification data shown are the meanSD. Two-tailed Students T tests compare parental Nalm6 to combined data from A>G clones, n=3. All source data and statistical parameters are provided in the Source Data file.
Nalm6 cells contain the alternative A allele that disrupts the PU.1 motif at rs1247117, yet our data suggests that this site still binds PU.1 (Fig.5a, b). This led us to hypothesize that PU.1 binding affinity for the PU.1 motif surrounding rs1247117 would be strengthened by the risk G allele. Therefore, we designed biotinylated DNA probes containing two tandem 25-bp regions centered on reference G or alternative A allele-containing rs1247117 to test this hypothesis (Fig.5c). Using biotinylated probes, we performed an in vitro DNA-affinity pulldown from Nalm6 nuclear lysate and found that while PU.1 was indeed bound to the alternative A allele, PU.1 was more robustly bound to the reference G allele at rs1247117 (Fig.5d). To further assess the impact of the rs1247117 allele on PU.1 binding, we changed the Nalm6 allele from A to G using CRISPR/Cas9 (Fig.5e; AA = parental genotype, GG = mutated genotype). We used ChIP-PCR to determine that PU.1 binding was increased with the G allele relative to the A allele at the CRE containing rs1247117 in two A>G Nalm6 clones across 3 unique primer sets within the PU.1 peak at rs1247117 that was detected in Nalm6 cells (Fig.5f). We then asked if transposase accessibility was also increased at the CRE containing rs1247117 when the G allele was present. Using ATAC-seq, we found that accessibility was indeed increased at rs1247117 in mutated Nalm6 cells with the G allele when compared to the parental Nalm6 cells containing the A allele (Fig.5g). These data suggest that the risk G allele increases genomic accessibility and the affinity of PU.1 binding at rs1247117 relative to the alternative A allele.
We were next interested in how allele-specific PU.1 binding at rs1247117 was related to the expression of the protein encoded by the connected gene, EIF3A. We found that the G allele, which increased recruitment of PU.1, resulted in decreased expression of EIF3A when compared to Nalm6 cells with the A allele (Fig.5h). These data suggest that PU.1 recruitment to the CRE containing rs1247117 results in a net-repressive effect on EIF3A protein levels, and that less PU.1 recruitment with the A allele results in greater EIF3A expression.
Clonal selection can lead to the accumulation of random SNVs and even larger structural variations65 that can confound functional interpretation of more complex trans phenotypic effects. Therefore, to examine the connection between rs1247117 and the persistence of MRD after induction chemotherapy, we decided to use CRISPR/Cas9 to delete the CRE containing rs1247117 in heterogeneous cell pools of Nalm6 and SUPB15 cells (rs1247117 del) to avoid clonal selection (Fig.6a, b, Supplementary Fig.7a). Given that loss of the CRE containing rs1247117 would abolish PU.1 recruitment at this region, we hypothesized that rs1247117 del would result in increased EIF3A expression. Accordingly, we found that EIF3A expression was elevated in rs1247117 del cells relative to parental Nalm6 and SUPB15 cells, respectively (Fig.6c, d, Supplementary Fig.7b), further supporting an inverse relationship between PU.1 binding at rs1247117 and EIF3A expression.
a Diagram on the left showing the genomic context of the rs1247117 CRE deletion in Nalm6 cells in relation to chromatin accessibility, PU.1 binding and rs1247117. Black bar represents ATAC-seq peak, green par represents PU.1 peak, and red bar represents region deleted using CRISPR/Cas9 genome editing. b Gel shows validation of deletion using primers flanking deleted region. Arrow points to PCR fragment with deletion in heterogeneous Nalm6 cell pools harboring deletion compared to wild-type parental Nalm6 cells. c EIF3A gene expression is upregulated upon deletion of the CRE containing rs1247117. RT-qPCR data show the meanSD of three independent experiments. Two-tailed Students T test. d Western blots and quantification showing increased EIF3A expression in rs1247117 del Nalm6 cells. Blots shown are representative of four independent experiments. Quantification data show the meanSD. Two-tailed Students T tests, n=4. eg Drug sensitivity data comparing viability relative to vehicle treatment of wild-type parental Nalm6 cells and Nalm6 cells with rs1247117 CRE deletion after vincristine (VCR) treatment for 24 (n=3), 48 (n=3) and 72 (n=3) hours at the indicated concentrations. Non-linear regression and F test analysis indicate that these dose-response curves are significantly different. h Caspase 3/7 activity assays comparing Caspase activity relative to vehicle treatment of wild-type parental Nalm6 cells and Nalm6 cells with rs1247117 CRE deletion after vincristine (VCR) treatment for 72hours at the indicated concentrations (n=3). Dose-response curves of non-linear regression indicate that these curves are significantly different. Non-linear regression and F test analysis indicate that these dose-response curves are significantly different. All source data are provided in the Source Data file.
Because the risk G allele at rs1247117 was also associated with vincristine resistance in primary ALL cells from patients, we additionally sought to determine the impact of the CRE deletion containing rs1247117 on cellular response to vincristine treatment. We hypothesized that because the risk G allele is associated with enhanced PU.1 binding and resistance to vincristine, complete disruption of PU.1 binding in Nalm6 cells harboring the CRE deletion would show increased sensitivity to vincristine relative to parental Nalm6 cells. As predicted, Nalm6 cells with the CRE deletion exhibited significantly increased sensitivity to vincristine across a range of concentrations after 24, 48, and 72hours of treatment (Fig.6eg), and we found consistent effects on cell viability in SUPB15 cells (Supplementary Fig.7c). Consistent with enhanced sensitivity to vincristine, we also found increased caspase 3/7 activity in rs1247117 del Nalm6 cells relative to parental Nalm6 cells after 72hrs and across a range of vincristine concentrations (Fig.6h). These data suggest that a functional regulatory variant alters the binding affinity of a key transcription factor, PU.1, and disruption of this locus impacts EIF3A expression and vincristine sensitivity in ALL cells. To further validate our methodology utilizing CRISPR/Cas9 to delete CREs, we deleted CREs spanning two additional top variants, rs7426865 and rs12660691 (see Fig.4e), that was associated with the ex vivo resistance to 6-mercaptopurine and dexamethasone, respectively, in primary ALL cells. Deletion of these CREs also impacted protein expression and sensitivity to the associated chemotherapeutic agent, thereby supporting our functional approach (Supplementary Figs.8 and 9).
We next wanted to connect EIF3A directly to vincristine resistance. Given that EIF3A is an essential gene per the Broad Institutes DepMap, we opted to test the hypothesis EIF3A overexpression alone was sufficient to impact the Nalm6 cell response to vincristine. We, therefore, used lentiviral transduction to overexpress EIF3A in Nalm6 cells and compared EIF3A overexpression (EIF3A OE) cells to control infected cells (Nalm6 WT, Supplementary Fig.10a). Using two independent infections of EIF3A OE, we found that at 48hr and 72hr, EIF3A OE cells were more sensitive to vincristine than Nalm6 WT cells (Supplementary Fig.10b). These data suggest that EIF3A expression impacts the ALL cell response to vincristine, with higher expression sensitizing cells to the drug, and further establishes this gene as the likely target of the association.
See the article here:
Investigation of inherited noncoding genetic variation impacting the pharmacogenomics of childhood acute ... - Nature.com