Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, identifying individual causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 naturally occurring DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have a low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models that integrate these characteristics provided a modest but statistically significant ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
Individual genomes carry thousands of sequence differences in gene regulatory elements. Together, these variants contribute to variation in many phenotypic traits by altering the expression of one or more genes (Albert and Kruglyak, 2015). The presence of DNA variants that alter gene expression can be detected by mapping genomic regions called "expression quantitative trait loci" (eQTLs). Among these, "local" eQTLs are located in or near the gene whose expression they influence. Eukaryotic species ranging from yeast to humans harbor large amounts of local regulatory variation (Brem et al., 2002; Hasin-Brumshtein et al., 2016; Heyne et al., 2014; Rockman et al., 2010; Stranger et al., 2005; West et al., 2007). In human populations, most genes are affected by one or more local eQTLs (GTEx Consortium et al., 2017). Similarly, in a cross between two genetically different yeast isolates, 74% of genes are influenced by a local eQTL (Albert et al., 2018). Most of these local eQTLs arise from DNA variants that perturb cis-acting regulatory mechanisms (Albert et al., 2018; Ronald et al., 2005). When such cis-acting variants are found in the transcribed region of a gene, they can alter the stability, splicing, or polyadenylation of the mRNA, or its regulation by RNA-binding proteins. Cis-acting variants in promoters or enhancers can alter the transcription of their target genes.
As a consequence of genetic linkage in experimental crosses (Albert et al., 2018) and linkage disequilibrium in outbred populations (GTEx Consortium et al., 2017; Kita et al., 2017), regions mapped as eQTLs almost always contain multiple sequence variants. It is generally assumed that most of these variants have no effect, and their presence obscures the identity of the one or few causal variants in each eQTL (Figure 1). Although causal variants have been identified in several local eQTLs (Chang et al., 2013; Claussnitzer et al., 2015; Lutz et al., 2019; Musunuru et al., 2010; Ronald et al., 2005), the majority of causal variants remain unknown. Due to this lack of systematic information, many questions about local eQTLs remain open. These include whether local eQTLs are typically caused by one or multiple variants, whether multiple variants interact non-additively, which evolutionary forces act on causal variants, which molecular mechanisms causal variants perturb, and whether it may ultimately be possible to combine the answers to these questions to build models that predict the consequences of regulatory variants from genome sequence.