MutSpliceDB: a Database of Splice Sites Variants
Developed by: Computational and Systems Biology Branch (Biometric Research Program, DCTD/NCI)
About MutSpliceDB
Splice site mutations are a well-known class of genetic alterations that play an important
role in biology.
Splice site mutations in cancer are most frequently observed as inactivating alterations in tumor
suppressor genes (for example, TP53 or RB1) and to a lesser degree as activating alterations in
oncogenes (for example, MET).
Splice site mutations may lead to alterations in mRNA transcripts, causing, for example, exon
inclusion/exclusion or intron retention. Interpreting the consequences of a specific splice site
mutation is not straightforward, especially if the mutation is located outside of the canonical splice
sites.
Accurate interpretation of the impact of a splice site mutation can further our understanding of
biology and influence patient treatment; in the case of germline splice site mutations, it may also be
relevant to familial disease predisposition.
To facilitate the interpretation of splice site mutation effects, we developed MutSpliceDB: a database
of splice sites variants, documenting mutation effect(s) on splicing based on RNA-seq BAM files from
sample(s) with particular splice site mutations.
For each splice site mutation, the
resource contains the following information:
- gene symbol;
- Entrez gene ID;
- HGVS-compliant transcript-based variant notation;
- allele registry ID;
- description of the splicing effect;
- sample name;
- sample source;
- name of RNA-seq BAM file;
- splicing effect image snapshot;
- mini BAM file with reads only for the relevant gene (if there are no restrictions on nucleotide-level data distribution);
- if the RNA-seq BAM file does not contain reads with the splice site mutation (e.g., due to exon skipping), the name of the BAM file with DNA sequencing data.
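The fields listed above can be pictured as a single record per splice site mutation. The following is a minimal sketch only: the class name `SpliceEntry` and all field names are illustrative assumptions, not the actual MutSpliceDB schema. The example values reuse the TP53 / PK-45H case described later on this page, and the splicing effect is left as a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpliceEntry:
    """One record per splice site mutation (hypothetical field names)."""
    gene_symbol: str
    entrez_gene_id: int
    hgvs_variant: str                 # HGVS-compliant transcript-based notation
    allele_registry_id: str
    splicing_effect: str
    sample_name: str
    sample_source: str
    rnaseq_bam: str
    snapshot_image: str
    # Optional: absent when nucleotide-level data cannot be distributed.
    mini_bam: Optional[str] = None
    # Optional: only needed when the RNA-seq reads do not show the mutation.
    dna_bam: Optional[str] = None


example = SpliceEntry(
    gene_symbol="TP53",
    entrez_gene_id=7157,
    hgvs_variant="NM_000546.5:c.375+5G>A",
    allele_registry_id="CA645589233",
    splicing_effect="<description of the splicing effect>",  # placeholder
    sample_name="PK-45H",
    sample_source="CCLE",
    rnaseq_bam="G27478.PK-45H.2.bam",
    snapshot_image="PK-45H_TP53_CA645589233.jpeg",
    mini_bam="G27478.PK-45H.2_TP53_mini.bam",
)
```

Making the two file fields optional mirrors the conditional items at the end of the list.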
All entries in MutSpliceDB are based on publicly available RNA-seq BAM files. The initial release of
MutSpliceDB (2019) contains detailed information for a subset of splice site mutations derived from
publicly available RNA-seq data from the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas
(TCGA). We anticipate adding information for more splice site mutations as soon as the necessary
evidence becomes available.
How to explore the MutSpliceDB Database
Figure 1: How to explore the MutSpliceDB database:
- On the MutSpliceDB main web page, click the "Access MutSpliceDB" button to open the resource database (we recommend using the Google Chrome or Mozilla Firefox web browser to access MutSpliceDB);
- The database table lists the genetic alterations included in the resource. The table has
dynamic controls for sorting (by clicking on the column headers) and for filtering by search terms. The
table can be exported as CSV or Excel, and active web links connect each entry to the corresponding
GeneCards, NCBI, and ClinGen Allele Registry entries.
To further explore the details of a specific entry, click the "BAM file(s) page" link to open the page with supporting evidence.
- Each entry contains the supporting evidence regarding the splice variants. The table can contain multiple entries, each with links to external resources (for example, GDC or CCLE) and files (image and mini BAM) that researchers can download. Users who do not want to download the data can visualize the mini BAM file in the web version of IGV by clicking the provided link.
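Once the table has been exported as CSV, it can also be filtered offline. The sketch below shows one possible way to keep only the rows for a given gene. The column header `Gene Symbol` and the toy CSV content are assumptions made for illustration; the real export may use different headers, so the column name is passed as a parameter.

```python
import csv
import io
from typing import Dict, List


def rows_for_gene(csv_text: str, gene_symbol: str,
                  gene_column: str = "Gene Symbol") -> List[Dict[str, str]]:
    """Return the rows of an exported table whose gene column matches gene_symbol.

    gene_column is an assumed header name; adjust it to the actual export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = gene_symbol.strip().upper()
    return [
        row for row in reader
        if (row.get(gene_column) or "").strip().upper() == wanted
    ]


# Toy export with made-up headers and placeholder IDs, for illustration only.
toy_export = (
    "Gene Symbol,Allele Registry ID\n"
    "TP53,CA645589233\n"
    "MET,PLACEHOLDER_ID\n"
)

tp53_rows = rows_for_gene(toy_export, "tp53")
print(len(tp53_rows))
```

The comparison is case-insensitive so that `tp53` and `TP53` select the same rows.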
How to submit an entry
MutSpliceDB is open for submissions from the molecular genetics community. Requests to add an entry to
MutSpliceDB should be addressed to Dr. Dmitriy Sonkin (dmitriy.sonkin@nih.gov) and should contain all the splice
site mutation information mentioned above, image snapshots, and mini BAM files (if there are no
restrictions on nucleotide-level data distribution) obtained as explained below.
Image snapshot files showing the splicing effect of the mutations should contain the gene symbol, relevant
exon numbers, and HGVS-compliant transcript-based variant notation. A MANE Select/Plus
transcript ID should be used if possible.
Image snapshot file names should have the following structure:
SampleName_GeneSymbol_AlleleRegistryID.jpeg.
For example, an image showing the splicing effects of the TP53 mutation (NM_000546.5:c.375+5G>A) with
Allele Registry ID CA645589233 in cell line PK-45H should have the following name:
PK-45H_TP53_CA645589233.jpeg.
The Allele Registry ID for a variant can be found or generated using the ClinGen Allele Registry (https://reg.clinicalgenome.org/site/cg-registry).
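The image naming rule above can be expressed as a small helper. This is a sketch under the stated convention only; the function name `snapshot_file_name` is hypothetical and not part of any MutSpliceDB tooling.

```python
def snapshot_file_name(sample_name: str, gene_symbol: str,
                       allele_registry_id: str) -> str:
    """Build SampleName_GeneSymbol_AlleleRegistryID.jpeg from its three parts."""
    parts = (sample_name, gene_symbol, allele_registry_id)
    if any(not part for part in parts):
        raise ValueError("sample name, gene symbol, and Allele Registry ID are all required")
    return "_".join(parts) + ".jpeg"


print(snapshot_file_name("PK-45H", "TP53", "CA645589233"))
# prints: PK-45H_TP53_CA645589233.jpeg
```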
Mini RNA-seq BAM file names should have the following structure:
RNAseqBamFileName_GeneSymbol_mini.bam. For example, the mini BAM file for cell line PK-45H with a TP53 mutation
should have the following name: G27478.PK-45H.2_TP53_mini.bam. In this case, G27478.PK-45H.2 is taken
from the CCLE RNA-seq BAM file name G27478.PK-45H.2.bam.
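The mini BAM naming rule can be sketched in the same way: strip the `.bam` extension from the source RNA-seq BAM file name, then append the gene symbol and the `_mini.bam` suffix. The function name `mini_bam_file_name` is a hypothetical helper introduced for illustration.

```python
from pathlib import PurePosixPath


def mini_bam_file_name(rnaseq_bam: str, gene_symbol: str) -> str:
    """Derive RNAseqBamFileName_GeneSymbol_mini.bam from the source BAM name."""
    name = PurePosixPath(rnaseq_bam).name      # drop any directory part
    if not name.endswith(".bam"):
        raise ValueError(f"expected a .bam file name, got {name!r}")
    stem = name[: -len(".bam")]                # keep inner dots such as '.2'
    if not stem or not gene_symbol:
        raise ValueError("BAM file name and gene symbol must be non-empty")
    return f"{stem}_{gene_symbol}_mini.bam"


print(mini_bam_file_name("G27478.PK-45H.2.bam", "TP53"))
# prints: G27478.PK-45H.2_TP53_mini.bam
```

Only the final `.bam` is removed, so dots inside the original file name are preserved.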
To create the mini BAM files using Samtools (https://www.htslib.org/), follow the steps below:
- samtools view -b RNA-seq.bam chr:start-end > mini.bam
- samtools index mini.bam
The RNA-seq.bam file should be sorted and indexed. The instructions above create a sorted mini BAM file
(mini.bam) and the corresponding index file (mini.bam.bai). In the samtools view command, 'chr' should be
replaced with the chromosome name as it appears in the BAM file (for example, 17 or chr17), 'start' should
be replaced with the genomic position 100 bp before the start of the first coding exon, and 'end' should be
replaced with the genomic position 100 bp after the end of the last coding exon. Select the first and last
coding exons in a way that covers all existing gene isoforms.
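The region arithmetic described above can be sketched as follows. This is an illustration under assumptions, not a MutSpliceDB tool: the helper names are hypothetical, the coordinates in the usage example are made up, and the two positions are expected in genomic order (leftmost and rightmost coding positions across all isoforms), which matters for genes on the minus strand. The code only builds the two Samtools command lines and does not run them.

```python
import shlex
from typing import List


def mini_bam_region(chrom: str, leftmost_coding_pos: int,
                    rightmost_coding_pos: int, padding: int = 100) -> str:
    """Return a 'chr:start-end' region padded by `padding` bp on each side."""
    if leftmost_coding_pos > rightmost_coding_pos:
        raise ValueError("positions must be given in genomic (left to right) order")
    start = max(1, leftmost_coding_pos - padding)   # positions are 1-based
    end = rightmost_coding_pos + padding
    return f"{chrom}:{start}-{end}"


def mini_bam_commands(rnaseq_bam: str, region: str, mini_bam: str) -> List[List[str]]:
    """Build the samtools view and samtools index commands as argument lists."""
    view = ["samtools", "view", "-b", "-o", mini_bam, rnaseq_bam, region]
    index = ["samtools", "index", mini_bam]
    return [view, index]


# Made-up coordinates, for illustration only.
region = mini_bam_region("chr1", 1000, 5000)
for command in mini_bam_commands("RNA-seq.bam", region, "mini.bam"):
    print(shlex.join(command))
```

The argument lists can be passed to `subprocess.run(command, check=True)` on a system where Samtools is installed; `-o mini.bam` is used here instead of shell redirection so that no shell is needed.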
*Disclaimer*: MutSpliceDB is a free resource intended for research purposes only. It should NOT be used for emergencies or medical or professional advice.
Getting Started
For inquiries or to submit evidence, please contact Dr. Dmitriy Sonkin (dmitriy.sonkin@nih.gov).
Palmisano, A., Vural, S., Zhao, Y., & Sonkin, D. (2021).
MutSpliceDB: A database of splice sites variants with RNA-seq based evidence on effects on splicing.
Human Mutation.
https://doi.org/10.1002/humu.24185