Detailed Information for Output Files from Somatic Mutation Annotators

This page describes in detail the files output by the somatic mutation annotators that use ANNOVAR and SnpEff.

  • *_annoTable.txt from the annotator via SnpEff
  • Column Names Description
    CHROM Chromosome number
    POS Position
    ID Semicolon-separated list of unique identifiers, where available. For a dbSNP variant, the rs number(s) should be used.
    REF Reference base(s)
    ALT Alternate non-reference alleles called on at least one of the samples
    EFFECT Functional consequence of the variant, e.g., missense_variant, synonymous_variant.
    REGION Region (e.g., exonic, intronic) in which the variant falls
    IMPACT Putative impact of the variant (e.g. HIGH, MODERATE or LOW impact).
    GENE Gene name (usually HUGO)
    GENEID Gene ID
    FEATURE Type of the feature given in the next field (e.g., transcript, motif, miRNA).
    FEATUREID Transcript ID (preferably with version number), motif ID, miRNA, ChIP-Seq peak, histone mark, etc., depending on the annotation.
    BIOTYPE Whether the transcript is “Coding” or “Noncoding”. ENSEMBL biotypes are used whenever possible.
    HGVS_C Variant in HGVS notation (DNA level). For example, c.352A>G denotes an A-to-G substitution at nucleotide 352.
    HGVS_P Coding variant in HGVS notation (protein level). For example, p.Ile118Val denotes a substitution of isoleucine at position 118 by valine. p.Ile118Val can also be written as p.I118V using the 1-letter amino acid symbols.
    SIFT_score SIFT score. See the dbNSFP information table for details.
    SIFT_pred SIFT prediction. See the dbNSFP information table for details.
    Polyphen2_HDIV_score Polyphen2 score based on HDIV. See the dbNSFP information table for details.
    Polyphen2_HDIV_pred Polyphen2 prediction based on HDIV. See the dbNSFP information table for details.
    Polyphen2_HVAR_score Polyphen2 score based on HVAR. See the dbNSFP information table for details.
    Polyphen2_HVAR_pred Polyphen2 prediction based on HVAR. See the dbNSFP information table for details.
    LRT_score LRT score. See the dbNSFP information table for details.
    LRT_pred LRT prediction. See the dbNSFP information table for details.
    MutationTaster_score MutationTaster score. See the dbNSFP information table for details.
    MutationTaster_pred MutationTaster prediction. See the dbNSFP information table for details.
    MutationAssessor_score MutationAssessor score. See the dbNSFP information table for details.
    MutationAssessor_pred MutationAssessor prediction. See the dbNSFP information table for details.
    FATHMM_score FATHMM score. See the dbNSFP information table for details.
    FATHMM_pred FATHMM prediction. See the dbNSFP information table for details.
    PROVEAN_score PROVEAN score. See the dbNSFP information table for details.
    PROVEAN_pred PROVEAN prediction. See the dbNSFP information table for details.
    VEST3_score VEST V3 score. See the dbNSFP information table for details.
    CADD_raw CADD raw score. See the dbNSFP information table for details.
    CADD_phred CADD phred-like score. See the dbNSFP information table for details.
    MetaSVM_score MetaSVM score. See the dbNSFP information table for details.
    MetaSVM_pred MetaSVM prediction. See the dbNSFP information table for details.
    MetaLR_score MetaLR score. See the dbNSFP information table for details.
    MetaLR_pred MetaLR prediction. See the dbNSFP information table for details.
    GERP++_NR GERP++ neutral rate (NR). See the dbNSFP information table for details.
    GERP++_RS GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details.
    phyloP100way_vertebrate Phylogenetic p-values for 100 vertebrate species. See the dbNSFP information table for details.
    phastCons100way_vertebrate PhastCons score for 100 vertebrate species. See the dbNSFP information table for details.
    SiPhy_29way_logOdds SiPhy log odds score for 29 species. See the dbNSFP information table for details.
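
As a rough illustration of how the columns above might be consumed, the Python sketch below reads rows by column name and keeps the variants whose IMPACT is HIGH or MODERATE. It assumes the file is tab-delimited with a header row holding the column names, which this page does not state. The sample rows, the gene names, and the helper names `read_anno_table`, `select_by_impact`, and `codon_number` are made up for the example. `codon_number` reproduces the arithmetic behind the HGVS examples: nucleotide 352 of the coding sequence falls in codon 118.

```python
import csv
import io

# Hypothetical two-row excerpt with only a few of the columns; the layout
# (tab-delimited, header row) is an assumption, not something the page states.
SAMPLE = (
    "CHROM\tPOS\tID\tREF\tALT\tEFFECT\tIMPACT\tGENE\tHGVS_C\tHGVS_P\n"
    "1\t1000\t.\tA\tG\tmissense_variant\tMODERATE\tGENE1\tc.352A>G\tp.Ile118Val\n"
    "1\t2000\t.\tC\tT\tsynonymous_variant\tLOW\tGENE2\tc.99C>T\tp.Gly33Gly\n"
)


def read_anno_table(handle):
    """Return an iterator of dicts, one per variant row, keyed by column name."""
    return csv.DictReader(handle, delimiter="\t")


def select_by_impact(rows, wanted=("HIGH", "MODERATE")):
    """Keep only the rows whose IMPACT value is one of `wanted`."""
    return [row for row in rows if row["IMPACT"] in wanted]


def codon_number(cdna_pos):
    """Codon that contains a 1-based coding-DNA position (three bases per codon)."""
    return (cdna_pos + 2) // 3


rows = list(read_anno_table(io.StringIO(SAMPLE)))
kept = select_by_impact(rows)
for row in kept:
    print(row["GENE"], row["EFFECT"], row["HGVS_C"], row["HGVS_P"])
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, for example `open(path, newline="")`.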
  • *_genelist.txt from the annotators via ANNOVAR and SnpEff
  • Column Names Description
    Gene Gene name associated with each variant; one gene name may correspond to several variants.
    Mutations Amino acid change information. For example, SAMD11:NM_152486:exon10:c.T1027C:p.W343R gives, in order, the gene name, the known RefSeq accession, the region, the cDNA-level change, and the protein-level change.
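
The colon-separated layout of a Mutations entry can be split into named fields. The sketch below is one possible way to do that in Python. The `Mutation` type and the `parse_mutation` helper are hypothetical names introduced for the example, and the code handles a single entry only, because this page does not say how several entries for the same gene are separated.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    """Fields of one Mutations entry, in the order described above."""
    gene: str            # gene name, e.g. SAMD11
    transcript: str      # known RefSeq accession, e.g. NM_152486
    region: str          # region, e.g. exon10
    cdna_change: str     # cDNA-level change, e.g. c.T1027C
    protein_change: str  # protein-level change, e.g. p.W343R


def parse_mutation(entry: str) -> Mutation:
    """Split one colon-separated entry into its five fields."""
    parts = entry.strip().split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {entry!r}")
    return Mutation(*parts)


example = parse_mutation("SAMD11:NM_152486:exon10:c.T1027C:p.W343R")
print(example.gene, example.region, example.protein_change)
```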
  • dbNSFP Information
  • Each entry gives the Columns of Annotations from dbNSFP Database, followed by: Algorithm (Prediction Algorithm/Conservation Score); Description; Method; Prediction (Categorical Prediction and score direction); Author(s). Fields not given in the original table are omitted.
    SIFT_pred, SIFT_score
      Algorithm: SIFT
      Description: Sorting Intolerant From Tolerant
      Method: P(an amino acid at a position is tolerated | the most frequent amino acid is tolerated)
      Prediction: D: Deleterious (sift<=0.05); T: Tolerated (sift>0.05)
      Author(s): Pauline Ng, Fred Hutchinson Cancer Research Center, Seattle, Washington
    Polyphen2_HDIV_pred, Polyphen2_HDIV_score
      Algorithm: Polyphen v2
      Description: Polymorphism phenotyping v2
      Method: Probabilistic classifier; training set: HumDiv
      Prediction: D: Probably damaging (>=0.957); P: Possibly damaging (0.453<=pp2_hdiv<=0.956); B: Benign (pp2_hdiv<=0.452)
      Author(s): Harvard Medical School
    Polyphen2_HVAR_pred, Polyphen2_HVAR_score
      Algorithm: Polyphen v2
      Description: Polymorphism phenotyping v2
      Method: Machine learning; training set: HumVar
      Prediction: D: Probably damaging (>=0.909); P: Possibly damaging (0.447<=pp2_hvar<=0.908); B: Benign (pp2_hvar<=0.446)
      Author(s): Shamil Sunyaev, Harvard Medical School
    LRT_pred, LRT_score
      Algorithm: LRT
      Description: Likelihood ratio test
      Method: LRT of H0: each codon evolves neutrally vs. H1: the codon evolves under negative selection
      Prediction: D: Deleterious; N: Neutral; U: Unknown. Lower scores are more deleterious.
      Author(s): Sung Chung, Justin Fay, Washington University
    MutationTaster_pred, MutationTaster_score
      Algorithm: MutationTaster
      Method: Bayes classifier
      Prediction: A: "disease_causing_automatic"; D: "disease_causing"; N: "polymorphism" (probably harmless); P: "polymorphism_automatic" (known to be harmless). Higher values are more deleterious.
      Author(s): Markus Schuelke, Charité - Universitätsmedizin Berlin
    MutationAssessor_pred, MutationAssessor_score
      Algorithm: MutationAssessor
      Method: Entropy of multiple sequence alignment
      Prediction: H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional. Higher values are more deleterious.
      Author(s): Reva Boris, Computational Biology Center, Memorial Sloan Kettering Cancer Center
    FATHMM_pred, FATHMM_score
      Algorithm: FATHMM
      Description: Functional analysis through hidden Markov models
      Method: HMM
      Prediction: D: Deleterious; T: Tolerated. Lower values are more deleterious.
      Author(s): Shihab Hashem, University of Bristol, UK
    PROVEAN_pred, PROVEAN_score
      Algorithm: PROVEAN
      Description: Protein Variation Effect Analyzer
      Method: Clustering of homologous sequences
      Prediction: D: Deleterious; N: Neutral. Lower values are more deleterious.
      Author(s): Choi Y, J. Craig Venter Institute
    VEST3_score
      Algorithm: VEST V3
      Description: Variant effect scoring tool
      Method: Random forest classifier
      Prediction: Higher values are more deleterious.
      Author(s): Rachel Karchin, Johns Hopkins University
    CADD_raw, CADD_phred
      Algorithm: CADD
      Description: Combined annotation dependent depletion
      Method: Linear kernel SVM
      Prediction: Higher values are more deleterious.
      Author(s): Jay Shendure, Xiaohui Xie, University of California - Irvine
    DANN_score
      Algorithm: DANN
      Description: Deleterious Annotation of genetic variants using Neural Networks
      Method: Neural network
      Prediction: Higher values are more deleterious.
      Author(s): Jay Shendure, Xiaohui Xie, University of California - Irvine
    fathmm-MKL_coding_pred
      Algorithm: FATHMM-MKL
      Description: Predicting the effects of both coding and non-coding variants using nucleotide-based HMMs
      Method: Classifier based on multiple kernel learning
      Prediction: D: Deleterious (score >= 0.5); T: Tolerated (score < 0.5)
      Author(s): Shihab Hashem, University of Bristol, UK
    MetaSVM_pred, MetaSVM_score
      Algorithm: MetaSVM
      Method: Support vector machine
      Prediction: D: Deleterious; T: Tolerated. Higher scores are more deleterious.
      Author(s): Coco Dong, USC Biostatistics Department
    MetaLR_pred, MetaLR_score
      Algorithm: MetaLR
      Method: Logistic regression
      Prediction: D: Deleterious; T: Tolerated. Higher scores are more deleterious.
      Author(s): Coco Dong, USC Biostatistics Department
    integrated_fitCons_score, integrated_confidence_value
      Algorithm: FitCons
      Description: Fitness consequences of functional annotation
      Method: Integrates functional assays such as ChIP-Seq with a conservation measure of transcription factor binding sites
      Prediction: Higher scores are more deleterious.
      Author(s): Abriza, Cold Spring Harbor Lab
    GERP++_RS, GERP++_NR
      Algorithm: GERP++
      Description: Genomic Evolutionary Rate Profiling ++
      Method: Maximum likelihood estimation procedure
      Prediction: Higher scores are more deleterious.
      Author(s): Eugene Davydov, Stanford University, CS Department
    phyloP7way_vertebrate
      Algorithm: PhyloP
      Description: Phylogenetic p-values
      Method: Phylogenetic p-values calculated from an LRT, score-based test, GERP test; uses 7 species
      Prediction: Higher scores are more deleterious.
      Author(s): Adam Siepel, UCSC
    phyloP20way_mammalian
      Algorithm: PhyloP
      Description: Phylogenetic p-values
      Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
      Prediction: Higher scores are more deleterious.
      Author(s): Adam Siepel, UCSC
    phastCons7way_vertebrate
      Algorithm: phastCons
      Method: A phylogenetic hidden Markov model (phylo-HMM); uses 7 species
      Prediction: Higher scores are more deleterious.
      Author(s): Adam Siepel, UCSC
    phastCons20way_mammalian
      Algorithm: phastCons
      Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
      Prediction: Higher scores are more deleterious.
      Author(s): Adam Siepel, UCSC
    SiPhy_29_way
      Algorithm: SiPhy
      Method: Probabilistic framework, HMM; uses 29 species
      Prediction: Higher scores are more deleterious.
      Author(s): Manuel Garber, Broad Institute of MIT & Harvard
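
The categorical predictions for SIFT, Polyphen2 HDIV, and fathmm-MKL follow directly from the score thresholds listed in the table. The Python sketch below restates those cutoffs as functions. The function names are hypothetical, and treating each cutoff as a simple lower or upper bound (so that a Polyphen2 HDIV score between 0.956 and 0.957 is labelled P) is an assumption made to keep the example total over all scores.

```python
def sift_pred(score: float) -> str:
    """SIFT: D (deleterious) if score <= 0.05, otherwise T (tolerated)."""
    return "D" if score <= 0.05 else "T"


def polyphen2_hdiv_pred(score: float) -> str:
    """Polyphen2 HDIV: D (>= 0.957), P (0.453 up to the D cutoff), B (below 0.453)."""
    if score >= 0.957:
        return "D"
    if score >= 0.453:
        return "P"
    return "B"


def fathmm_mkl_pred(score: float) -> str:
    """fathmm-MKL: D (deleterious) if score >= 0.5, otherwise T (tolerated)."""
    return "D" if score >= 0.5 else "T"


# Made-up scores, used only to exercise the three functions.
for name, func, score in [
    ("SIFT", sift_pred, 0.02),
    ("Polyphen2 HDIV", polyphen2_hdiv_pred, 0.60),
    ("fathmm-MKL", fathmm_mkl_pred, 0.31),
]:
    print(name, score, func(score))
```

In practice the *_pred columns already carry these labels, so functions like these are mainly useful for checking a table or for applying a stricter cutoff of one's own.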