Welcome to SuREVizHeart, your go-to tool for exploring the functional impacts of genetic variants assessed by SuRE. This user manual will guide you through the platform’s features, functionalities, and capabilities.
Key Features:
SuRE, or Survey of Regulatory Elements, is a massively parallel reporter assay to study gene regulation. In this approach, genomic DNA from an individual is randomly fragmented into small pieces and cloned into a plasmid vector. Each fragment is placed just upstream of a unique 20-base-pair (bp) barcode, followed by a GFP coding sequence and a polyA signal. By mapping these barcoded DNA fragments back to the human reference genome and measuring their expression levels in a specific cell line, we can directly associate allele-specific gene expression with their genomic locations.
To investigate regulatory variants potentially involved in human heart development and function, we applied SuRE to samples from six patients or their close relatives, all from families with a history of multiple heart defects. From this work, we created six highly complex plasmid libraries, each containing approximately 600 million unique clones. These libraries, named SuREX38, SuREX57, SuREX59, SuREX86, SuREX67, and SuREX68, represent a valuable resource for identifying regulatory variants associated with heart disease.
The browser includes a comprehensive dataset of 4.7 million genetic variants evaluated in a myocardial cell line, categorized into reporter assay QTLs (raQTLs) and non-raQTLs. Variants are classified as raQTLs (18,201 variants) if they exhibit a significant difference in expression between the reference and alternate alleles, indicating their potential to impact gene expression. Variants that do not show such differential expression are labeled as non-raQTLs.
SuRE Assay: An overview of the SuRE strategy to functionally assess genomic variants (SNPs and InDels) for their regulatory activity in a heart-derived cell line. Among 4.7 million variants analysed, 18,201 reporter assay quantitative trait loci (raQTLs) were identified. gDNA, genomic DNA fragment; ORF, open reading frame; PAS, polyadenylation signal; REF, reference; ALT, alternate; FDR, false discovery rate.
You can access the SuREVizHeart app at: SuREVizHeart
Adjust the Flanking Region Slider to explore regions ranging from 1 kb to 100 kb around your query. This allows for a flexible search area.
Use the Browse button to upload your own custom MPRA data, BigWig, and BED files. These files will be visualized in the Uploaded Track Viewer tab.
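As a rough illustration of what the flanking region setting means, the sketch below computes the genomic window around a queried position. It assumes the flank is applied on each side of the query and that coordinates are 1-based; neither detail is stated in this manual, and `flanking_window` is a hypothetical helper, not part of SuREVizHeart.

```python
def flanking_window(position, flank_bp):
    """Return (start, end) of the window around a 1-based genomic position.

    Assumption: the slider value is applied to each side of the query.
    """
    # The slider range described in this manual is 1 kb to 100 kb.
    if not 1_000 <= flank_bp <= 100_000:
        raise ValueError("flank_bp must be between 1,000 and 100,000")
    start = max(1, position - flank_bp)  # never run past the chromosome start
    end = position + flank_bp
    return start, end


if __name__ == "__main__":
    # Position taken from the example variant NC_000012.12:g.128797635T>C
    print(flanking_window(128797635, 10_000))
```

Under these assumptions, a 10 kb setting around the example variant covers positions 128787635 to 128807635 on chromosome 12.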
Once your search results are rendered, you have the option to download the visualized data as a zipped file. The zip file contains the following:
JASPAR2022_info.csv
This file contains information about the transcription factors (TFs) that are affected by the variant you are querying. It is only available when downloading variant view data; if you are downloading gene view data, this file will not be included. The contents of the file are similar to what you see in the Transcription Factor Binding Site Impact (TFBSi) sub-tab within the highlighted variant tab, but with additional data. The columns include:
CHROM, POS, REF, ALT
These columns define the chromosome, position, reference allele, and alternate allele of the variant.
motif_alt_id, motif_id
The motif name and ID of the TF as specified in the JASPAR 2022 database.
start, end, strand, pos
The start and end positions of where the TF binding site/motif aligns to the human sequence, along with the strand and position.
refs.score, alts.score, refs.pval, alts.pval
The TFBSi scores and p-values as calculated by the algorithm. The scores indicate the affinity of the TF for the reference and alternate sequences, and the p-values reflect the significance of this binding, that is, how much better the TF binds to the predicted binding site than to a random sequence.
absdiff, max.score, effect.JASPAR
tax_group, TF_family, TF_class, pubmed_ids, uniprot_ids, data_type, Gene_Name
Additional information about the TF.
SNP_info.csv
A CSV file containing tabular data as presented in the Variant Data Overview tab. This includes detailed information about the SNPs, such as their chromosomal positions and related information.
SuREX_.bedGraph
A BEDGraph file containing the SuRE Profiles for the locus you queried. This file is specifically generated for the variant locus and provides the SuRE data relevant to that region.
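The downloaded files can be inspected with a few lines of Python. The sketch below is only an illustration: it relies on the column names listed above for JASPAR2022_info.csv (`motif_alt_id`, `refs.score`, `alts.score`) and on the standard four-column bedGraph layout (chromosome, start, end, value). The helper names are hypothetical, and the score difference is computed here directly rather than read from the file's own `absdiff` column, whose exact definition is not given in this manual.

```python
import csv
import io


def rank_tf_changes(csv_text):
    """Rank TFs by the size of the ALT minus REF score difference."""
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        delta = float(row["alts.score"]) - float(row["refs.score"])
        scored.append((row["motif_alt_id"], delta))
    # Largest absolute change first; the sign shows gain (+) or loss (-).
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored


def parse_bedgraph(text):
    """Parse bedGraph text into (chrom, start, end, value) tuples."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom, start, end, value = fields[:4]
        intervals.append((chrom, int(start), int(end), float(value)))
    return intervals


def mean_signal(intervals):
    """Length-weighted mean of the signal over all intervals."""
    total_bp = sum(end - start for _, start, end, _ in intervals)
    if total_bp == 0:
        return 0.0
    weighted = sum((end - start) * value for _, start, end, value in intervals)
    return weighted / total_bp
```

To use it, read each file from the unzipped download into a string (for example with `open(path).read()`) and pass that string to the matching function.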
Search Functionality: This screenshot highlights the search bar, query options for variants and genes, the flanking region slider for customizing your search, and the upload functionality for custom data. Use the Browse button to upload your own MPRA, BigWig, or BED files and visualize them in the Uploaded Track Viewer tab.
Visualization Modes: This screenshot highlights the two visualization modes: one targets a variant of interest that has been functionally assessed by the SuRE assay, and the other targets a gene of interest.
Tips:
Functional Impact Assessment Tab: This screenshot highlights the SuRE impact and gene plot in the functional impact assessment tab for variant NC_000012.12:g.128797635T>C.
Provides a detailed table of all variants in the query region:
Column | Description |
---|---|
Chromosome | Indicates the chromosome where the variant is located. |
Position (hg38) | Specifies the genomic position of the variant according to the human hg38 reference genome. |
Reference Allele | Indicates the reference allele at the variant’s position. |
Alternate Allele | Indicates the alternate allele observed at the variant’s position. |
rsID | The rsID, if available in dbSNP150, assigned to the variant. |
Population Allele Frequency | Provides the allele frequency of the variant in the gnomADv3.1.2 database across various populations. |
SuRE Impact Score | SuRE MPRA impact score indicating the variant’s effect on transcription. |
Genotype in SuREXX | Indicates the genotype of the variant in each SuREXX sample (e.g., SuREX38, SuREX57, etc.). |
Alternate Allele Coverage | Specifies the coverage (number of fragments) for the alternate allele. |
Reference Allele Coverage | Specifies the coverage (number of fragments) for the reference allele. |
Reference Allele Mean Expression | Indicates the mean expression level of the reference allele across samples. |
Alternate Allele Mean Expression | Indicates the mean expression level of the alternate allele across samples. |
p-value | Indicates the p-value calculated based on the Wilcoxon rank sum test between the reference and alternate alleles for each variant. |
Description | Indicates if the variant is an raQTL or not. |
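To make the p-value column more concrete, here is a minimal sketch of a two-sided Wilcoxon rank sum test between the expression values of reference and alternate fragments. It uses only the standard library and a normal approximation with tie correction and no continuity correction. This is an assumption made for illustration; the exact procedure behind the values in the table may differ, and `rank_sum_test` is a hypothetical helper.

```python
import math


def rank_sum_test(ref_values, alt_values):
    """Return (U statistic for ref, two-sided p-value) via normal approximation."""
    n1, n2 = len(ref_values), len(alt_values)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    # Pool the values, remembering the group (0 = REF, 1 = ALT).
    pooled = sorted([(v, 0) for v in ref_values] + [(v, 1) for v in alt_values])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_ref = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_ref = rank_sum_ref - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u_ref, 1.0
    z = (u_ref - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_ref, p_value
```

The normal approximation is reasonable when each allele is covered by many fragments; for very small groups an exact test would be the safer choice.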
In this tab, we visualize gene expression of genes within the window shown in the Functional Impact Assessment tab. The data shown below is a collection of transcriptomic data from various sources, such as ENCODE, GEO, and ArrayExpress. The data encompasses gene expression in the following contexts:
We specifically highlight gene expression for TBX18 and GATA4, which are known cardiac stem cell marker genes highly expressed during development, as well as HNF4A, a liver-specific gene, to show contrast in tissue-specific expression.
Gene Expression Overview Tab: This screenshot highlights the expression of genes within the locus of interest, as shown in the Functional Impact Assessment tab.
This section explores the SuRE profiles obtained for patients analyzed in this study, complemented by insights from the AC16 ATAC-seq dataset and conservation scores across multiple species. Together, these datasets provide a comprehensive understanding of regulatory mechanisms and evolutionary conservation within the loci of interest.
SuRE Profile of Patients:
The analysis begins with SuRE profiles derived from congenital heart disease patients. These profiles offer valuable insights into the functional behavior of genomic regions within individuals.
AC16 ATAC-seq Data:
Since SuRE libraries were tested in the AC16 Human Cardiomyocyte Cell Line, we incorporate AC16 ATAC-seq data. This dataset highlights regions of open chromatin within the endogenous AC16 genome, providing context for chromatin accessibility and regulatory potential.
PhastCons Score:
These scores represent conservation scores across 30 mammalian species. This dataset highlights evolutionary conservation patterns within the locus of interest, offering insights into its functional and evolutionary significance.
SuRE Profiles Tab: This screenshot highlights the SuRE expression profile of patients (subset) centered around variant NC_000012.12:g.128797635T>C. The dotted line indicates the location of the selected variant.
The Highlighted Variants section displays the selected variant’s potential impact on gene regulation, transcription factor binding, and disease association.
When exploring a variant, this feature evaluates disruptions in transcription factor (TF) interactions caused by sequence changes. It is available only for raQTLs (please refer to the cited manuscript for the definition of raQTL). Key details include:
Predicted Alignment:
Alignment of the TF motif with both REF and ALT sequences, offering insights into binding behavior.
Predicted Affinity Change:
Metrics showing how TF affinity is altered from REF to ALT, indicating whether binding strength is enhanced or diminished.
Transcription Factor Binding Site Impact (TFBSi) Tab: This screenshot highlights the alignment of the TFBS affected by NC_000012.12:g.128797635T>C to the reference and alternate sequence as shown in the TFBSi tab.
The ClinVar tab provides access to disease-related annotations for selected variants, including classifications such as Pathogenic or Benign and detailed reports on variant-disease associations, including drug response. Additional information can be found at ClinVar.
The gnomAD tab offers population-level data on allele frequencies across diverse populations, along with tools for functional impact evaluation, including VEP, CADD scores, and phyloP conservation metrics. Explore more at gnomAD.
SuREVizHeart provides robust support for users to upload and visualize their own data alongside the existing SuRE MPRA results. Whether you’re working with MPRA-specific data, genomic annotations, or coverage tracks, SuREVizHeart allows you to integrate and analyze your custom datasets.
MPRA Data: MPRA data files (TSV/CSV format) must include columns like chromosome (prefixed with “chr_”), position, reference/alternate alleles, expression signals, and p-values to enable visualization of variant effects on transcription through markers on plots.
BigWig Files: BigWig files, used for genomic coverage data, are processed with R’s rtracklayer package, ensuring file integrity through partial reads and error detection for invalid files.
BED Files: BED files annotate genomic regions of interest and should contain at least three columns (chromosome, start position, and end position) to integrate with the visualization tracks.
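Before uploading, it can help to check a BED file against the minimum layout described above (chromosome, start position, and end position). The following is a minimal sketch of such a check, not the validation that SuREVizHeart itself performs; `validate_bed` is a hypothetical helper.

```python
def validate_bed(text):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            problems.append((line_no, "fewer than three columns"))
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append((line_no, "start or end is not a non-negative integer"))
            continue
        if int(start) > int(end):
            problems.append((line_no, "start is greater than end"))
    return problems


if __name__ == "__main__":
    example = "chr12\t128787635\t128807635\nchr12\t100\n"
    for line_no, problem in validate_bed(example):
        print(f"line {line_no}: {problem}")
```

An empty result means every data line has at least three columns with integer coordinates in the right order.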
Tips:
bcftools or bedtools for chromosome-specific extractions.
bigWigToBedGraph.
For any questions or feedback, please contact us at reachout.vartika@gmail.com or visit our website.