1 Introduction to SuREVizHeart

Welcome to SuREVizHeart, your go-to tool for exploring the functional impacts of genetic variants assessed by SuRE. This user manual will guide you through the platform’s features, functionalities, and capabilities.

Key Features:

Interactive visualization of over 4.7 million variants.
Integration of functional, genomic, and clinical datasets.
Tools for uploading and comparing your own bigwig, bed and MPRA data.

2 What is SuRE?

SuRE, or Survey of Regulatory Elements, is a massively parallel reporter assay to study gene regulation. In this approach, genomic DNA from an individual is randomly fragmented into small pieces and cloned into a plasmid vector. Each fragment is placed just upstream of a unique 20-base-pair (bp) barcode, followed by a GFP coding sequence and a polyA signal. By mapping these barcoded DNA fragments back to the human reference genome and measuring their expression levels in a specific cell line, we can directly associate allele-specific gene expression with their genomic locations.

To investigate regulatory variants potentially involved in human heart development and function, we applied SuRE to samples from six patients or their close relatives, all from families with a history of multiple heart defects. From this work, we created six highly complex plasmid libraries, each containing approximately 600 million unique clones. These libraries, named SuREX38, SuREX57, SuREX59, SuREX86, SuREX67, and SuREX68, represent a valuable resource for identifying regulatory variants associated with heart disease.

The browser includes a comprehensive dataset of 4.7 million genetic variants evaluated in a myocardial cell line, categorized into reporter assay QTLs (raQTLs) and non-raQTLs. Variants are classified as raQTLs (18,201 variants) if they exhibit a significant difference in expression between the reference and alternate alleles, indicating their potential to impact gene expression. Variants that do not show such differential expression are labeled as non-raQTLs.

SuRE Assay: An overview of the SuRE strategy to functionally assess genomic variants (SNPs and InDels) for their regulatory activity in a heart-derived cell line. Among 4.7 million variants analysed, 18,201 reporter assay quantitative trait loci (raQTLs) were identified. gDNA, genomic DNA fragment; ORF, open reading frame; PAS, polyadenylation signal; REF, reference; ALT, alternate; FDR, false discovery rate.

You can access the SuREVizHeart app at: SuREVizHeart

3 App Features and Layout

3.1 Query Box

3.1.1 Search Functionality

Variants: Enter the chr:pos format (e.g., chr12:128797635).
- The variant must be functionally assessed by SuRE MPRA. If the variant is not available, the app will display an error message.
Genes: Enter the gene name (case-insensitive, e.g., TBP or tbP).
- Gene names should follow the specifications of the GENCODE GTF GRCh38.p14 Human Release 46.
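
As an illustration of the two query forms described above, the following Python sketch separates a chr:pos variant query from a gene name. It is not the app's own code: the function name parse_query, the regular expression, and the choice to upper-case gene names as a comparison key are all assumptions made for this example.

```python
import re

# Hypothetical pattern for the chr:pos form (e.g. chr12:128797635).
# The accepted chromosome names are an assumption of this sketch.
VARIANT_RE = re.compile(r"^chr([0-9]{1,2}|[XYM]):([0-9]+)$", re.IGNORECASE)


def parse_query(query):
    """Classify a search string as a variant (chr:pos) or a gene name."""
    text = query.strip()
    match = VARIANT_RE.match(text)
    if match:
        chrom = "chr" + match.group(1).upper()
        return {"type": "variant", "chrom": chrom, "pos": int(match.group(2))}
    # Gene names are case-insensitive, so use an upper-cased key for lookup.
    return {"type": "gene", "name": text.upper()}
```

With this sketch, parse_query("chr12:128797635") is treated as a variant query, while parse_query("tbP") is treated as a gene query with the lookup key TBP.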

3.1.2 Customizing Search

Adjust the Flanking Region Slider to explore regions ranging from 1 kb to 100 kb around your query. This allows for a flexible search area.
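
The effect of the slider can be pictured as a window computed around the query position. The sketch below assumes that the flanking distance is applied on both sides of the position and that coordinates are 1-based; both points are assumptions of this example, and flanking_window is a hypothetical helper, not a function of the app.

```python
def flanking_window(pos, flank_kb):
    """Return the (start, end) window around a 1-based genomic position.

    Assumes the flank (in kb) extends on both sides of the position.
    """
    if not 1 <= flank_kb <= 100:
        raise ValueError("flank_kb must be between 1 and 100")
    flank_bp = int(flank_kb * 1000)
    start = max(1, pos - flank_bp)  # do not run past the chromosome start
    end = pos + flank_bp
    return start, end
```

For example, a 10 kb flank around position 128797635 gives the window 128787635 to 128807635 under these assumptions.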

3.1.3 Upload Functionality

Use the Browse button to upload your own custom MPRA data, BigWig, and BED files. These files will be visualized in the Uploaded Track Viewer tab.

3.1.4 Download Functionality

Once your search results are rendered, you have the option to download the visualized data as a zipped file. The zip file contains the following:

JASPAR2022_info.csv
This file contains information about Transcription Factors (TFs) that are affected by the variant you are querying. This is only available when downloading variant view data. If you’re downloading gene view data, this file will not be included. The contents of the file are similar to what you see in the Transcription Factor Binding Site Impact (TFBSi) sub-tab within the highlighted variant tab, but with additional data. The columns include:
- CHROM, POS, REF, ALT
  These columns define the chromosome, position, reference allele, and alternate allele of the variant.
- motif_alt_id, motif_id
  The motif name and ID of the TF as specified in the JASPAR 2022 database.
- start, end, strand, pos
  The start and end positions of where the TF binding site/motif aligns to the human sequence, along with the strand and position.
- refs.score, alts.score, refs.pval, alts.pval
  The TFBSi scores and p-values as calculated by the algorithm. The scores indicate the affinity of the TF for the reference and alternate sequences. The p-values indicate how significantly the TF binds to the predicted binding site compared with a random sequence.
- absdiff, max.score, effect.JASPAR
  - absdiff: The absolute difference between the reference and alternate scores.
  - max.score: The higher of the reference or alternate score.
  - effect.JASPAR: Quantifies the change in binding using the formula: -log10(p-value of REF / p-value of ALT), indicating the effect of the variant on TF binding.
- tax_group, TF_family, TF_class, pubmed_ids, uniprot_ids, data_type, Gene_Name
  Additional information about the TF:
  - tax_group: The taxonomic group of the TF.
  - TF_family: The family of the TF.
  - TF_class: The class of the TF.
  - pubmed_ids: PubMed IDs of relevant papers where the TF is mentioned.
  - uniprot_ids: UniProt ID of the gene linked to the TF.
  - data_type: The experimental data type used to define this TF, e.g., HT-SELEX.
  - Gene_Name: The gene name associated with the TF according to the UniProt database.
SNP_info.csv
A CSV file containing tabular data as presented in the Variant Data Overview tab. This includes detailed information about the SNPs, such as their chromosomal positions and related information.
SuREX_.bedGraph
A BEDGraph file containing the SuRE Profiles for the locus you queried. This file is specifically generated for the variant locus and provides the SuRE data relevant to that region.
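
The derived columns absdiff, max.score and effect.JASPAR in JASPAR2022_info.csv can be recomputed from the score and p-value columns. The Python sketch below follows the formula quoted above; the function name tfbs_effect and its argument names are hypothetical, and the example is meant only for checking values in a downloaded file.

```python
import math


def tfbs_effect(ref_score, alt_score, ref_pval, alt_pval):
    """Recompute the derived TFBSi columns from scores and p-values."""
    return {
        "absdiff": abs(ref_score - alt_score),
        "max.score": max(ref_score, alt_score),
        # -log10(p-value of REF / p-value of ALT), as described above.
        "effect.JASPAR": -math.log10(ref_pval / alt_pval),
    }


# Made-up numbers for illustration only (not taken from the database).
example = tfbs_effect(ref_score=12.5, alt_score=8.0, ref_pval=1e-5, alt_pval=1e-3)
```

With the formula as written, a positive effect.JASPAR means the REF p-value is smaller than the ALT p-value, and a negative value means the opposite.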

Search Functionality: This screenshot highlights the search bar, query options for variants and genes, the flanking region slider for customizing your search, and the upload functionality for custom data. Use the Browse button to upload your own MPRA, BigWig, or BED files and visualize them in the Uploaded Track Viewer tab.

3.2 Visualization Modes

3.2.1 Variant View

Focused analysis of a specific variant. This view is used when you enter a query in chr:pos format in the search bar.
Displays allele expression levels with interactive elements.

3.2.2 Region View

Gene-centric exploration of surrounding genomic features. This view is used when you enter a gene name in the search bar.
If a variant is not found in the SuRE database, SuREVizHeart will automatically navigate to the corresponding genomic region, allowing you to explore nearby variants and associated genomic features within that area.

Visualization Modes: This screenshot highlights the two visualization modes: Variant View, which targets a variant of interest that has been functionally assessed by the SuRE assay, and Region View, which targets a gene of interest.

3.3 Tabs Overview

3.3.1 Functional Impact Assessment

SuRE Impact Plot:
Displays the expression levels of REF and ALT alleles.
- Height: Represents SuRE expression levels.
- Width/Opacity: Indicates statistical significance of differences.
Gene Plot:
Highlights genes in the query region, with interactive tools for exploration.

Tips:

Zoom: To zoom in or out, right-click and drag to create a rectangular area of interest.
Reset: To return to the original view, double-click anywhere on the plot or click the Home icon in the top-right corner.
Save: To save the plot as a PNG, click the Camera icon next to the Home icon.

Functional Impact Assessment Tab: This screenshot highlights the SuRE impact and gene plot in the functional impact assessment tab for variant NC_000012.12:g.128797635T>C.

3.3.2 Variant Data Overview

Provides a detailed table of all variants in the query region:

Column	Description
Chromosome	Indicates the chromosome where the variant is located.
Position (hg38)	Specifies the genomic position of the variant according to the human hg38 reference genome.
Reference Allele	Indicates the reference allele at the variant’s position.
Alternate Allele	Indicates the alternate allele observed at the variant’s position.
rsID	The rsID, if available in dbSNP150, assigned to the variant.
Population Allele Frequency	Provides the allele frequency of the variant in the gnomADv3.1.2 database across various populations.
SuRE Impact Score	SuRE MPRA impact score indicating the variant’s effect on transcription.
Genotype in SuREXX	Indicates the genotype of the variant in each SuREXX sample (e.g., SuREX38, SuREX57, etc.).
Alternate Allele Coverage	Specifies the coverage (number of fragments) for the alternate allele.
Reference Allele Coverage	Specifies the coverage (number of fragments) for the reference allele.
Reference Allele Mean Expression	Indicates the mean expression level of the reference allele across samples.
Alternate Allele Mean Expression	Indicates the mean expression level of the alternate allele across samples.
p-value	Indicates the p-value calculated based on the Wilcoxon rank sum test between the reference and alternate alleles for each variant.
Description	Indicates if the variant is an raQTL or not.
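
To give a feel for the p-value column, the Python sketch below computes a two-sided Wilcoxon rank sum p-value for two lists of expression values using the normal approximation. It is a simplified illustration: it uses average ranks for ties but applies no tie or continuity correction, so the values reported by the app may differ. The function name rank_sum_pvalue is hypothetical.

```python
import math


def rank_sum_pvalue(ref, alt):
    """Two-sided Wilcoxon rank sum p-value (normal approximation)."""
    if not ref or not alt:
        raise ValueError("both groups need at least one value")
    # Pool the values, remembering the group (0 = REF, 1 = ALT).
    pooled = sorted([(v, 0) for v in ref] + [(v, 1) for v in alt])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = average
        i = j + 1
    n1, n2 = len(ref), len(alt)
    ref_rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = ref_rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

Two groups with the same values give a p-value of 1, while two clearly separated groups give a small p-value.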

3.3.3 Gene Expression Overview

In this tab, we visualize gene expression of genes within the window shown in the Functional Impact Assessment tab. The data shown below is a collection of transcriptomic data from various sources, such as ENCODE, GEO, and ArrayExpress. The data encompasses gene expression in the following contexts:

Fetal human heart at developmental stages
Fetal tissues during development
Healthy (non-failing) adult human heart
In vitro differentiated cardiomyocytes and undifferentiated human embryonic stem cells (H1)
AC16 cell line used for SuRE experiments
Human neural crest cells (migrating from the neural tube and entering the heart from pharyngeal arches)

We specifically highlight gene expression for TBX18 and GATA4, which are known cardiac stem cell marker genes highly expressed during development, as well as HNF4A, a liver-specific gene, to show contrast in tissue-specific expression.

Gene Expression Overview Tab: This screenshot highlights the expression of genes within the locus of interest as shown in Functional Impact Assessment Tab.

3.3.4 SuRE Profiles

This section explores the SuRE profiles obtained for patients analyzed in this study, complemented by insights from the AC16 ATAC-seq dataset and conservation scores across multiple species. Together, these datasets provide a comprehensive understanding of regulatory mechanisms and evolutionary conservation within the loci of interest.

SuRE Profile of Patients:
The analysis begins with SuRE profiles derived from congenital heart disease patients. These profiles offer valuable insights into the functional behavior of genomic regions within individuals.
AC16 ATAC-seq Data:
Since SuRE libraries were tested in the AC16 Human Cardiomyocyte Cell Line, we incorporate AC16 ATAC-seq data. This dataset highlights regions of open chromatin within the endogenous AC16 genome, providing context for chromatin accessibility and regulatory potential.
PhastCons Score:
These scores represent conservation scores across 30 mammalian species. This dataset highlights evolutionary conservation patterns within the locus of interest, offering insights into its functional and evolutionary significance.

SuRE Profiles Tab: This screenshot highlights the SuRE expression profile of patients (subset) centered around variant NC_000012.12:g.128797635T>C. The dotted line indicates the location of the selected variant.

3.3.5 Highlighted Variants

The Highlighted Variants section displays the selected variant’s potential impact on gene regulation, transcription factor binding, and disease association.

3.3.5.1 Transcription Factor Binding Site Impact (TFBSi)

When exploring a variant, this feature evaluates disruptions in transcription factor (TF) interactions caused by sequence changes. It is available only for raQTLs (please refer to the cited manuscript for the definition of raQTL). Key details include:

Predicted Alignment:
Alignment of the TF motif with both REF and ALT sequences, offering insights into binding behavior.
Predicted Affinity Change:
Metrics showing how TF affinity is altered from REF to ALT, indicating whether binding strength is enhanced or diminished.

Transcription Factor Binding Site Impact (TFBSi) Tab: This screenshot highlights the alignment of the TFBS affected by NC_000012.12:g.128797635T>C to the reference and alternate sequence as shown in the TFBSi tab.

3.3.5.2 ClinVar Database

The ClinVar tab provides access to disease-related annotations for selected variants, including classifications such as Pathogenic or Benign and detailed reports on variant-disease associations, including drug response. Additional information can be found at ClinVar.

3.3.5.3 Genome Aggregation Database (gnomAD)

The gnomAD tab offers population-level data on allele frequencies across diverse populations, along with tools for functional impact evaluation, including VEP, CADD scores, and phyloP conservation metrics. Explore more at gnomAD.

3.3.6 Uploading Your Data

SuREVizHeart provides robust support for users to upload and visualize their own data alongside the existing SuRE MPRA results. Whether you’re working with MPRA-specific data, genomic annotations, or coverage tracks, SuREVizHeart allows you to integrate and analyze your custom datasets.

MPRA Data: MPRA data files (TSV/CSV format) must include columns like chromosome (prefixed with “chr_”), position, reference/alternate alleles, expression signals, and p-values to enable visualization of variant effects on transcription through markers on plots.
BigWig Files: BigWig files, used for genomic coverage data, are processed with R’s rtracklayer package, ensuring file integrity through partial reads and error detection for invalid files.
BED Files: BED files annotate genomic regions of interest and should contain at least three columns (chromosome, start position, and end position) to integrate with the visualization tracks.
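
Before uploading a BED file, it can help to check that every record has the three required columns. The Python sketch below is a hypothetical pre-upload check, not part of SuREVizHeart: the function name validate_bed_lines and the rule that start must be smaller than end are assumptions of this example.

```python
def validate_bed_lines(lines):
    """Return a list of (line_number, problem) for malformed BED records."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append((number, "fewer than three columns"))
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append((number, "start or end is not an integer"))
            continue
        if int(start) >= int(end):
            problems.append((number, "start is not less than end"))
    return problems


# Example with made-up records: the second and third lines are malformed.
sample = ["chr12\t100\t200\n", "chr12\t300\n", "chr1\t50\t40\n"]
issues = validate_bed_lines(sample)
```

To check a file on disk, pass an open file handle, for example validate_bed_lines(open("my_regions.bed")), where my_regions.bed is a placeholder file name. An empty result means that no problems were found by this check.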