dbNSFP

Database type:          variant
Number of records:      89,617,785
Distinct variants:      84,484,850
Reference genome hg18:  chr, hg18_pos, ref, alt
Reference genome hg19:  chr, pos, ref, alt

Field:                  chr
Type:                   string
Comment:                Chromosome number
Missing entries:        0 
Unique Entries:         24

Field:                  pos
Type:                   integer
Comment:                physical position on the chromosome as to hg19
                        (1-based coordinate)
Missing entries:        0 
Unique Entries:         28,060,014
Range:                  6007 - 249212562

Field:                  ref
Type:                   string
Comment:                Reference nucleotide allele (as on the + strand)
Missing entries:        0 
Unique Entries:         4

Field:                  alt
Type:                   string
Comment:                Alternative nucleotide allele (as on the + strand)
Missing entries:        0 
Unique Entries:         4

Field:                  aaref
Type:                   string
Comment:                reference amino acid
Missing entries:        0 
Unique Entries:         22

Field:                  aaalt
Type:                   string
Comment:                alternative amino acid
Missing entries:        0 
Unique Entries:         22

Field:                  hg18_pos
Type:                   integer
Comment:                physical position on the chromosome as to hg18
                        (1-based coordinate)
Missing entries:        44,904 (0.1% of 89,617,785 records)
Unique Entries:         28,043,425
Range:                  4381 - 247179185

Field:                  genename
Type:                   string
Comment:                common gene name
Missing entries:        0 
Unique Entries:         20,264

Field:                  Uniprot_acc
Type:                   string
Comment:                Uniprot accession number. Multiple entries separated
                        by ";".
Missing entries:        17,068,597 (19.0% of 89,617,785 records)
Unique Entries:         55,816

Field:                  Uniprot_id
Type:                   string
Comment:                Uniprot ID number. Multiple entries separated by ";".
Missing entries:        20,254,026 (22.6% of 89,617,785 records)
Unique Entries:         37,250

Field:                  Uniprot_aapos
Type:                   integer
Comment:                amino acid position as to Uniprot. Multiple entries
                        separated by ";".
Missing entries:        17,068,597 (19.0% of 89,617,785 records)
Unique Entries:         2,687,476
Range:                  1 - 9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9

Field:                  Interpro_domain
Type:                   string
Comment:                Domain or conserved site in which the variant is
                        located. Domain annotations come from the InterPro
                        database. The number in brackets following a specific
                        domain is the number of times InterPro assigns the
                        variant position to that domain, typically from
                        different prediction databases. Multiple entries
                        separated by ";".
Missing entries:        60,454,832 (67.5% of 89,617,785 records)
Unique Entries:         9,922

Field:                  cds_strand
Type:                   string
Comment:                coding sequence (CDS) strand (+ or -)
Missing entries:        0 
Unique Entries:         5

Field:                  refcodon
Type:                   string
Comment:                reference codon
Missing entries:        2,270,742 (2.5% of 89,617,785 records)
Unique Entries:         1,754

Field:                  SLR_test_statistic
Type:                   float
Comment:                SLR test statistic for testing natural selection on
                        codons. A negative value indicates negative selection,
                        and a positive value indicates positive selection.
                        Larger magnitude of the value suggests stronger
                        evidence.
Missing entries:        46,683,780 (52.1% of 89,617,785 records)
Unique Entries:         511,811
Range:                  -188.177 - 108.85

Field:                  codonpos
Type:                   integer
Comment:                position on the codon (1, 2 or 3)
Missing entries:        2,270,742 (2.5% of 89,617,785 records)
Unique Entries:         4
Range:                  1 - 3;2;3

Field:                  fold_degenerate
Type:                   integer
Comment:                degenerate type (0, 2 or 3)
Missing entries:        2,270,742 (2.5% of 89,617,785 records)
Unique Entries:         79
Range:                  0 - 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;0

Field:                  Ancestral_allele
Type:                   string
Comment:                Ancestral allele (based on 1000 genomes reference
                        data). The following comes from its original README
                        file:
                          ACTG - high-confidence call, ancestral state
                                 supported by the other two sequences
                          actg - low-confidence call, ancestral state
                                 supported by one sequence only
                          N    - failure, the ancestral state is not
                                 supported by any other sequence
                          -    - the extant species contains an insertion at
                                 this position
                          .    - no coverage in the alignment
Missing entries:        2,488,820 (2.8% of 89,617,785 records)
Unique Entries:         10
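
The Ancestral_allele codes listed above can be decoded with a small lookup. This is only a sketch: the function name and the returned labels are illustrative and are not part of dbNSFP.

```python
def describe_ancestral(code: str) -> str:
    """Map a single Ancestral_allele code to the confidence class from the README."""
    if len(code) != 1:
        raise ValueError(f"expected a single character, got {code!r}")
    if code in {"A", "C", "G", "T"}:
        return "high-confidence"   # supported by the other two sequences
    if code in {"a", "c", "g", "t"}:
        return "low-confidence"    # supported by one sequence only
    if code == "N":
        return "failure"           # not supported by any other sequence
    if code == "-":
        return "insertion"         # extant species has an insertion here
    if code == ".":
        return "no coverage"       # no coverage in the alignment
    raise ValueError(f"unknown ancestral allele code: {code!r}")
```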

Field:                  Ensembl_geneid
Type:                   string
Comment:                Ensembl gene id
Missing entries:        0 
Unique Entries:         20,839

Field:                  Ensembl_transcriptid
Type:                   string
Comment:                Ensembl transcript ids (separated by ";")
Missing entries:        0 
Unique Entries:         112,159

Field:                  aapos
Type:                   integer
Comment:                amino acid position as to the protein; "-1" if the
                        variant is a splicing site SNP (2bp on each end of an
                        intron)
Missing entries:        0 
Unique Entries:         4,315,466
Range:                  -1 - 9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;9;27;9;9;9;9;35;9;9;9

Field:                  SIFT_score
Type:                   float
Comment:                SIFT score. If a score is smaller than 0.05 the
                        corresponding NS is predicted as "D(amaging)";
                        otherwise it is predicted as "T(olerated)".
Missing entries:        12,024,501 (13.4% of 89,617,785 records)
Unique Entries:         101
Range:                  0 - 1

Field:                  SIFT_score_converted
Type:                   float
Comment:                SIFTnew=1-SIFTori. The larger the more damaging.
Missing entries:        12,024,501 (13.4% of 89,617,785 records)
Unique Entries:         101
Range:                  0 - 1

Field:                  SIFT_pred
Type:                   string
Comment:                If SIFTori is smaller than 0.05 (SIFTnew>0.95) the
                        corresponding NS is predicted as "D(amaging)";
                        otherwise it is predicted as "T(olerated)".
Missing entries:        12,024,501 (13.4% of 89,617,785 records)
Unique Entries:         2
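
The relationship between the three SIFT fields can be sketched as follows. The function names are illustrative; only the formula SIFTnew=1-SIFTori and the 0.05 cutoff come from the field comments above.

```python
def sift_convert(sift_ori: float) -> float:
    """SIFTnew = 1 - SIFTori, so that larger values are more damaging."""
    return 1.0 - sift_ori


def sift_pred(sift_ori: float) -> str:
    """Return 'D' (damaging) if the original score is below 0.05, else 'T'."""
    return "D" if sift_ori < 0.05 else "T"
```

Note that the cutoff is applied to the original score; on the converted scale the same rule reads SIFTnew > 0.95.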

Field:                  Polyphen2_HDIV_score_max
Type:                   float
Comment:                The maximum (most damaging) value of Polyphen2 score
                        based on HumDiv, i.e. hdiv_prob. Use
                        Polyphen2_HDIV_score to get a list of all scores.
Missing entries:        17,086,068 (19.1% of 89,617,785 records)
Unique Entries:         1,001
Range:                  0 - 1

Field:                  Polyphen2_HDIV_score
Type:                   string
Comment:                Polyphen2 score based on HumDiv, i.e. hdiv_prob. The
                        score ranges from 0 to 1, and the corresponding
                        prediction is "probably damaging" if it is in
                        [0.957,1]; "possibly damaging" if it is in
                        [0.453,0.956]; "benign" if it is in [0,0.452]. Score
                        cutoff for binary classification is 0.5, i.e. the
                        prediction is "neutral" if the score is smaller than
                        0.5 and "deleterious" if the score is larger than 0.5.
                        Multiple entries separated by ";".
Missing entries:        17,084,053 (19.1% of 89,617,785 records)
Unique Entries:         8,590,602

Field:                  Polyphen2_HDIV_pred
Type:                   string
Comment:                Polyphen2 prediction based on HumDiv, "D" ("probably
                        damaging"), "P" ("possibly damaging") and "B"
                        ("benign"). Multiple entries separated by ";". Because
                        multiple values may be present, use an expression
                        such as 'D' in Polyphen2_HDIV_pred instead of 'D' =
                        Polyphen2_HDIV_pred to filter variants that are
                        probably damaging.
Missing entries:        17,084,053 (19.1% of 89,617,785 records)
Unique Entries:         83,942
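
Because the HumDiv score and prediction fields hold ";"-separated lists, they need to be split before use. The sketch below shows one way to do that in Python, using the HumDiv intervals quoted above. The helper names are hypothetical, and treating "." as a placeholder for a missing value is an assumption about the raw data rather than something stated in this listing.

```python
def parse_scores(field: str) -> list[float]:
    """Split a ';'-separated score field, skipping empty or '.' items (assumed missing)."""
    return [float(x) for x in field.split(";") if x not in ("", ".")]


def hdiv_category(score: float) -> str:
    """Classify one HumDiv score: D in [0.957,1], P in [0.453,0.956], B in [0,0.452]."""
    if score >= 0.957:
        return "D"
    if score >= 0.453:
        return "P"
    return "B"


def is_probably_damaging(pred_field: str) -> bool:
    """True if any of the ';'-separated predictions is 'D'."""
    return "D" in pred_field.split(";")
```

For example, max(parse_scores(field)) gives the value that Polyphen2_HDIV_score_max reports for the same record, assuming at least one score is present.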

Field:                  Polyphen2_HVAR_score_max
Type:                   float
Comment:                The maximum (most damaging) value of Polyphen2 score
                        based on HumVar, i.e. hvar_prob. Use
                        Polyphen2_HVAR_score to get a list of all scores.
Missing entries:        17,086,068 (19.1% of 89,617,785 records)
Unique Entries:         1,001
Range:                  0 - 1

Field:                  Polyphen2_HVAR_score
Type:                   string
Comment:                Polyphen2 score based on HumVar, i.e. hvar_prob. The
                        score ranges from 0 to 1, and the corresponding
                        prediction is "probably damaging" if it is in
                        [0.909,1]; "possibly damaging" if it is in
                        [0.447,0.908]; "benign" if it is in [0,0.446]. Score
                        cutoff for binary classification is 0.5, i.e. the
                        prediction is "neutral" if the score is smaller than
                        0.5 and "deleterious" if the score is larger than 0.5.
                        Multiple entries separated by ";".
Missing entries:        17,084,053 (19.1% of 89,617,785 records)
Unique Entries:         10,999,020

Field:                  Polyphen2_HVAR_pred
Type:                   string
Comment:                Polyphen2 prediction based on HumVar, "D" ("probably
                        damaging"), "P" ("possibly damaging") and "B"
                        ("benign"). Multiple entries separated by ";". Because
                        multiple values may be present, use an expression
                        such as 'D' in Polyphen2_HVAR_pred instead of 'D' =
                        Polyphen2_HVAR_pred to filter variants that are
                        probably damaging.
Missing entries:        17,084,053 (19.1% of 89,617,785 records)
Unique Entries:         83,681

Field:                  LRT_score
Type:                   float
Comment:                The original LRT two-sided p-value (LRTori).
Missing entries:        21,548,464 (24.0% of 89,617,785 records)
Unique Entries:         826,817
Range:                  0 - 1

Field:                  LRT_score_converted
Type:                   float
Comment:                Converted LRT original p-value (LRTnew). We converted
                        the LRTori to a score suggested by our Human Mutation
                        (2011) paper: LRTnew=1-LRTori*0.5 if Omega<1, or
                        LRTnew=LRTori*0.5 if Omega>=1.
Missing entries:        21,548,464 (24.0% of 89,617,785 records)
Unique Entries:         1,168,826
Range:                  0 - 1
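
The LRT conversion above depends on both the original p-value and Omega (the LRT_Omega field). A minimal sketch, with an illustrative function name:

```python
def lrt_convert(lrt_ori: float, omega: float) -> float:
    """LRTnew = 1 - LRTori*0.5 if Omega < 1, otherwise LRTnew = LRTori*0.5."""
    if omega < 1:
        return 1.0 - lrt_ori * 0.5
    return lrt_ori * 0.5
```

With this rule, a small p-value at a site with Omega < 1 maps close to 1, while any site with Omega >= 1 maps to at most 0.5.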

Field:                  LRT_pred
Type:                   string
Comment:                LRT prediction, D(eleterious), N(eutral) or U(nknown)
Missing entries:        21,548,464 (24.0% of 89,617,785 records)
Unique Entries:         3

Field:                  MutationTaster_score
Type:                   float
Comment:                MutationTaster score
Missing entries:        1,143,911 (1.3% of 89,617,785 records)
Unique Entries:         598,533
Range:                  0 - 1

Field:                  MutationTaster_score_converted
Type:                   float
Comment:                The converted score suggested by our Human Mutation
                        (2011) paper: if the prediction is "A" or "D"
                        MTnew=MTori; if the prediction is "N" or "P",
                        MTnew=1-MTori.
Missing entries:        4,373,664 (4.9% of 89,617,785 records)
Unique Entries:         999,050
Range:                  0 - 1

Field:                  MutationTaster_pred
Type:                   string
Comment:                MutationTaster prediction, "A"
                        ("disease_causing_automatic"), "D"
                        ("disease_causing"), "N" ("polymorphism") or "P"
                        ("polymorphism_automatic")
Missing entries:        1,143,911 (1.3% of 89,617,785 records)
Unique Entries:         4
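
The MutationTaster_score_converted rule combines the original score with the prediction code. A short sketch of that rule follows; the function name is illustrative, and raising an error on an unknown code is a choice made here, not something the source specifies.

```python
def mt_convert(mt_ori: float, pred: str) -> float:
    """MTnew = MTori for 'A' or 'D'; MTnew = 1 - MTori for 'N' or 'P'."""
    if pred in ("A", "D"):      # disease_causing_automatic, disease_causing
        return mt_ori
    if pred in ("N", "P"):      # polymorphism, polymorphism_automatic
        return 1.0 - mt_ori
    raise ValueError(f"unknown MutationTaster prediction: {pred!r}")
```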

Field:                  MutationAssessor_score
Type:                   float
Comment:                MutationAssessor functional impact combined score
                        (MAori)
Missing entries:        14,986,410 (16.7% of 89,617,785 records)
Unique Entries:         2,145
Range:                  -5.545 - 5.975

Field:                  MutationAssessor_score_converted
Type:                   float
Comment:                Scaled to 0-1: MAnew=(MAori-(-5.545))/(5.975-(-5.545))
Missing entries:        14,986,410 (16.7% of 89,617,785 records)
Unique Entries:         2,139
Range:                  0 - 1
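
The MutationAssessor scaling is an ordinary min-max rescaling over the observed range of MutationAssessor_score. A sketch, with hypothetical constant and function names:

```python
MA_MIN = -5.545  # minimum of MutationAssessor_score in this release
MA_MAX = 5.975   # maximum of MutationAssessor_score in this release


def ma_convert(ma_ori: float) -> float:
    """MAnew = (MAori - (-5.545)) / (5.975 - (-5.545))."""
    return (ma_ori - MA_MIN) / (MA_MAX - MA_MIN)
```

The constants are tied to the range reported for this release, so they would need to be revisited if the underlying score range changes.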

Field:                  MutationAssessor_pred
Type:                   string
Comment:                MutationAssessor's functional impact of a variant:
                        predicted functional (high, medium) or predicted
                        non-functional (low, neutral). Please refer to Reva et
                        al. Nucl. Acids Res. (2011) 39(17):e118 for details
Missing entries:        14,986,410 (16.7% of 89,617,785 records)
Unique Entries:         4

Field:                  FATHMM_score
Type:                   float
Comment:                FATHMM default score (weighted for human inherited-
                        disease mutations with Disease Ontology); If a score
                        is smaller than -1.5 the corresponding NS is predicted
                        as "D(AMAGING)"; otherwise it is predicted as
                        "T(OLERATED)". If there is more than one score
                        associated with the same NS due to isoforms, the
                        smallest score (most damaging) was used. Please refer
                        to Shihab et al Hum. Mut. (2013) 34(1):57-65 for
                        details
Missing entries:        19,342,889 (21.6% of 89,617,785 records)
Unique Entries:         2,135
Range:                  -16.13 - 10.64

Field:                  FATHMM_score_converted
Type:                   float
Comment:                Scaled to 0-1 and reverse direction (the larger the
                        more damaging):
                        FATHMMnew=1-(FATHMMori-(-16.13))/(10.64-(-16.13))
Missing entries:        19,342,889 (21.6% of 89,617,785 records)
Unique Entries:         2,135
Range:                  0 - 1
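
Unlike the MutationAssessor scaling, the FATHMM conversion also reverses the direction, because lower original scores are more damaging. A sketch with hypothetical names:

```python
FATHMM_MIN = -16.13  # minimum of FATHMM_score in this release
FATHMM_MAX = 10.64   # maximum of FATHMM_score in this release


def fathmm_convert(fathmm_ori: float) -> float:
    """FATHMMnew = 1 - (FATHMMori - (-16.13)) / (10.64 - (-16.13))."""
    return 1.0 - (fathmm_ori - FATHMM_MIN) / (FATHMM_MAX - FATHMM_MIN)
```

The most damaging original score (-16.13) therefore maps to 1 and the least damaging (10.64) maps to 0.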

Field:                  FATHMM_pred
Type:                   string
Comment:                If a FATHMM_score is <=-1.5 the corresponding NS is
                        predicted as "D(AMAGING)"; otherwise it is predicted
                        as "T(OLERATED)".
Missing entries:        19,342,889 (21.6% of 89,617,785 records)
Unique Entries:         2

Field:                  GERP_NR
Type:                   float
Comment:                GERP++ neutral rate
Missing entries:        541,067 (0.6% of 89,617,785 records)
Unique Entries:         1,258
Range:                  0.0465 - 6.17

Field:                  GERP_RS
Type:                   float
Comment:                GERP++ RS score, the larger the score, the more
                        conserved the site.
Missing entries:        541,067 (0.6% of 89,617,785 records)
Unique Entries:         8,412
Range:                  -12.3 - 6.17

Field:                  PhyloP_score
Type:                   float
Comment:                PhyloP score, the larger the score, the more conserved
                        the site.
Missing entries:        64,695 (0.1% of 89,617,785 records)
Unique Entries:         10,245
Range:                  -11.958 - 2.941

Field:                  mg29way_pi
Type:                   string
Comment:                The estimated stationary distribution of A, C, G and T
                        at the site, using SiPhy algorithm based on 29 mammals
                        genomes.
Missing entries:        0 
Unique Entries:         7,239,991

Field:                  mg29way_logOdds
Type:                   float
Comment:                SiPhy score based on 29 mammals genomes. The larger
                        the score, the more conserved the site.
Missing entries:        1,348,155 (1.5% of 89,617,785 records)
Unique Entries:         223,955
Range:                  0.0003 - 37.9718

Field:                  LRT_Omega
Type:                   float
Comment:                estimated nonsynonymous-to-synonymous-rate ratio
                        (reported by LRT)
Missing entries:        21,548,464 (24.0% of 89,617,785 records)
Unique Entries:         842,708
Range:                  0 - 7780.54

Field:                  UniSNP_ids
Type:                   string
Comment:                rs numbers from UniSNP, which is a cleaned version of
                        dbSNP build 129, in format: rs number1;rs number2;...
Missing entries:        89,510,596 (99.9% of 89,617,785 records)
Unique Entries:         100,701

Field:                  KGp1_AC
Type:                   integer
Comment:                Alternative allele count in the whole 1000Gp1 data.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         2,172
Range:                  0 - 2184

Field:                  KGp1_AF
Type:                   float
Comment:                Alternative allele frequency in the whole 1000Gp1
                        data.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         2,571
Range:                  0 - 1

Field:                  KGp1_AFR_AC
Type:                   integer
Comment:                Alternative allele counts in the 1000Gp1 African
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         493
Range:                  0 - 492

Field:                  KGp1_AFR_AF
Type:                   float
Comment:                Alternative allele frequency in the 1000Gp1 African
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         1,062
Range:                  0 - 1

Field:                  KGp1_EUR_AC
Type:                   integer
Comment:                Alternative allele counts in the 1000Gp1 European
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         759
Range:                  0 - 758

Field:                  KGp1_EUR_AF
Type:                   float
Comment:                Alternative allele frequency in the 1000Gp1 European
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         1,185
Range:                  0 - 1

Field:                  KGp1_AMR_AC
Type:                   integer
Comment:                Alternative allele counts in the 1000Gp1 American
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         363
Range:                  0 - 362

Field:                  KGp1_AMR_AF
Type:                   float
Comment:                Alternative allele frequency in the 1000Gp1 American
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         735
Range:                  0 - 1

Field:                  KGp1_ASN_AC
Type:                   integer
Comment:                Alternative allele counts in the 1000Gp1 Asian
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         573
Range:                  0 - 572

Field:                  KGp1_ASN_AF
Type:                   float
Comment:                Alternative allele frequency in the 1000Gp1 Asian
                        descendent samples.
Missing entries:        89,278,976 (99.6% of 89,617,785 records)
Unique Entries:         939
Range:                  0 - 1

Field:                  ESP6500_AA_AF
Type:                   float
Comment:                Alternative allele frequency in the African American
                        samples of the NHLBI GO Exome Sequencing Project
                        (ESP6500 data set).
Missing entries:        88,817,528 (99.1% of 89,617,785 records)
Unique Entries:         27,424
Range:                  0 - 1

Field:                  ESP6500_EA_AF
Type:                   float
Comment:                Alternative allele frequency in the European American
                        samples of the NHLBI GO Exome Sequencing Project
                        (ESP6500 data set).
Missing entries:        88,817,528 (99.1% of 89,617,785 records)
Unique Entries:         22,975
Range:                  0 - 1