Biometric Research Program



MutSpliceDB: a Database of Splice Sites Variants

Developed by: Computational and Systems Biology Branch (Biometric Research Program, DCTD/NCI)

About MutSpliceDB

Splice site mutations are a well-known class of genetic alterations that play an important role in biology. In cancer, splice site mutations are most frequently observed as inactivating alterations in tumor suppressor genes (for example, TP53 or RB1) and, to a lesser degree, as activating alterations in oncogenes (for example, MET). Splice site mutations may alter mRNA transcripts, causing, for example, exon inclusion or exclusion, or intron retention. Interpreting the consequences of a specific splice site mutation is not straightforward, especially if the mutation is located outside of the canonical splice sites. Accurate interpretation of the impact of a splice site mutation can further our understanding of biology and influence patient treatment; in the case of germline splice site mutations, it may also have relevance to familial disease predisposition.

To facilitate the interpretation of splice site mutation effects, we developed MutSpliceDB: a database of splice sites variants, documenting mutation effect(s) on splicing based on RNA-seq BAM files from sample(s) with particular splice site mutations.

For each splice site mutation, the resource contains the following information:

  • gene symbol;
  • Entrez gene ID;
  • HGVS-compliant, transcript-based variant notation;
  • Allele Registry ID;
  • description of the splicing effect;
  • sample name;
  • sample source;
  • name of RNA-seq BAM file;
  • splicing effect image snapshot;
  • mini BAM file with reads only for the relevant gene (if there are no restrictions on nucleotide-level data distribution);
  • if the RNA-seq BAM file does not contain reads with the splice site mutation (e.g., due to exon skipping), the name of the BAM file with DNA sequencing data.
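
As an illustration only, the fields listed above could be held in a small record type like the one below. This is a sketch, not the database's actual schema: the class name `SpliceEntry` and all attribute names are hypothetical, and the example values are taken from the TP53 / PK-45H example on this page, plus the Entrez gene ID for TP53. The splicing effect description is left as a placeholder because it is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpliceEntry:
    """Hypothetical container mirroring the per-mutation fields listed above."""
    gene_symbol: str
    entrez_gene_id: int
    hgvs_notation: str            # HGVS-compliant, transcript-based notation
    allele_registry_id: str
    splicing_effect: str          # free-text description of the effect
    sample_name: str
    sample_source: str
    rnaseq_bam_name: str
    snapshot_image: str
    mini_bam: Optional[str] = None      # absent if data distribution is restricted
    dna_bam_name: Optional[str] = None  # only when RNA-seq reads lack the mutation


# Example values come from the TP53 / PK-45H example; the effect text is a placeholder.
entry = SpliceEntry(
    gene_symbol="TP53",
    entrez_gene_id=7157,
    hgvs_notation="NM_000546.5:c.375+5G>A",
    allele_registry_id="CA645589233",
    splicing_effect="(description of the observed splicing effect)",
    sample_name="PK-45H",
    sample_source="CCLE",
    rnaseq_bam_name="G27478.PK-45H.2.bam",
    snapshot_image="PK-45H_TP53_CA645589233.jpeg",
)
```

The two optional fields default to `None` because, as the list notes, the mini BAM file and the DNA sequencing BAM name are only present under certain conditions.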

All entries in MutSpliceDB are based on publicly available RNA-seq BAM files. The initial release of MutSpliceDB (2019) contains detailed information for a subset of splice site mutations derived from publicly available RNA-seq data from the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). We anticipate adding information for more splice site mutations as soon as the necessary evidence becomes available.

How to explore the MutSpliceDB Database


Figure 1: How to explore the MutSpliceDB database:

  1. On the MutSpliceDB main web page, click the "Access MutSpliceDB" button to open the resource database (we recommend using the Google Chrome or Mozilla Firefox web browser to access MutSpliceDB);
  2. The database table lists the genetic alterations included in the resource. The table has dynamic controls for sorting (by clicking on the column headers) and for filtering by search terms. The table can be exported as CSV or Excel, and active web links connect each entry to the corresponding GeneCards, NCBI, and ClinGen Allele Registry entries.
    To further explore the details of a specific entry, click the "BAM file(s) page" link to open the page with supporting evidence.
  3. Each entry contains the supporting evidence regarding the splice variants. The table can contain multiple entries, each with links to external resources (for example, GDC or CCLE) and files (image and mini BAM) that researchers can download. If the user does not want to download the data, the mini BAM file can be visualized in the web version of IGV by clicking the provided link.

How to submit an entry

MutSpliceDB is open for submissions from the molecular genetics community. Requests to add an entry to MutSpliceDB should be addressed to Dr. Dmitriy Sonkin and should contain all the splice site mutation information mentioned above, together with image snapshots and mini BAM files (if there are no restrictions on nucleotide-level data distribution) obtained as explained below.

Image snapshot files showing the splicing effect of the mutation should contain the gene symbol, the relevant exon numbers, and the HGVS-compliant, transcript-based variant notation. A MANE Select or MANE Plus Clinical transcript ID should be used if possible. Image snapshot file names should have the following structure: SampleName_GeneSymbol_AlleleRegistryID.jpeg. For example, an image showing the splicing effects of the TP53 mutation (NM_000546.5:c.375+5G>A) with Allele Registry ID CA645589233 in cell line PK-45H should have the following name: PK-45H_TP53_CA645589233.jpeg. The Allele Registry ID for a variant can be found or generated using the ClinGen Allele Registry.

Mini RNA-seq BAM file names should have the following structure: RNAseqBamFileName_GeneSymbol_mini.bam. For example, the mini BAM file for cell line PK-45H with the TP53 mutation should have the following name: G27478.PK-45H.2_TP53_mini.bam. In this case, G27478.PK-45H.2 is taken from the CCLE RNA-seq BAM file name G27478.PK-45H.2.bam.
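
The two file naming conventions described above can be sketched as a pair of small helpers. This is only an illustration: the function names `snapshot_file_name` and `mini_bam_file_name` are hypothetical and not part of any MutSpliceDB tooling.

```python
def snapshot_file_name(sample_name: str, gene_symbol: str,
                       allele_registry_id: str) -> str:
    """Build SampleName_GeneSymbol_AlleleRegistryID.jpeg."""
    return f"{sample_name}_{gene_symbol}_{allele_registry_id}.jpeg"


def mini_bam_file_name(rnaseq_bam_file: str, gene_symbol: str) -> str:
    """Build RNAseqBamFileName_GeneSymbol_mini.bam from the source BAM file name."""
    # Drop a trailing ".bam" so that only the base file name is reused.
    if rnaseq_bam_file.endswith(".bam"):
        base = rnaseq_bam_file[: -len(".bam")]
    else:
        base = rnaseq_bam_file
    return f"{base}_{gene_symbol}_mini.bam"


# Reproduces the PK-45H / TP53 examples given in the text.
print(snapshot_file_name("PK-45H", "TP53", "CA645589233"))
print(mini_bam_file_name("G27478.PK-45H.2.bam", "TP53"))
```

Only the final `.bam` extension is stripped, so dots inside the CCLE file name (as in G27478.PK-45H.2) are preserved.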
To create the mini BAM files using Samtools, follow the steps below:
- samtools view -b RNA-seq.bam chr:start-end > mini.bam
- samtools index mini.bam

The RNA-seq.bam file should be coordinate-sorted and indexed. The instructions above create a sorted mini BAM file (mini.bam) and the corresponding index file (mini.bam.bai). In the samtools view command, 'chr' should be replaced with the chromosome name as it appears in the BAM file, 'start' with the genomic position 100 bp before the start of the first coding exon, and 'end' with the genomic position 100 bp after the end of the last coding exon. Select the first and last coding exons so that the region covers all existing gene isoforms.
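
As a rough sketch of the region arithmetic described above, the helpers below compute the padded region string and assemble the two Samtools command lines without running them. The function names and the example coordinates are hypothetical; real exon coordinates must be looked up for the gene and genome build in question.

```python
import shlex


def padded_region(chrom: str, first_exon_start: int, last_exon_end: int,
                  pad: int = 100) -> str:
    """Return 'chr:start-end' extended by `pad` bp on each side."""
    start = max(1, first_exon_start - pad)  # positions are 1-based, never below 1
    end = last_exon_end + pad
    return f"{chrom}:{start}-{end}"


def mini_bam_commands(rnaseq_bam: str, region: str, mini_bam: str) -> list:
    """Build the `samtools view` and `samtools index` argument lists."""
    return [
        ["samtools", "view", "-b", "-o", mini_bam, rnaseq_bam, region],
        ["samtools", "index", mini_bam],
    ]


# Made-up coordinates, for illustration only.
region = padded_region("17", 1000, 5000)
for cmd in mini_bam_commands("RNA-seq.bam", region, "mini.bam"):
    print(shlex.join(cmd))
```

Writing the output with `-o mini.bam` is equivalent to redirecting standard output to the file. The argument lists could be passed to `subprocess.run(cmd, check=True)` on a machine where Samtools is installed.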

*Disclaimer*: MutSpliceDB is a free resource intended for research purposes only. It should NOT be used for emergencies or medical or professional advice.




For inquiries or to submit evidence, please contact Dr. Dmitriy Sonkin.


Palmisano, A., Vural, S., Zhao, Y., & Sonkin, D. (2021). MutSpliceDB: A database of splice sites variants with RNA-seq based evidence on effects on splicing. Human Mutation.