Variant Consequences

 

For each variant that is mapped to the reference genome, we identify each Ensembl transcript that overlaps the variant. We then use a rule-based approach to predict the effects that each allele of the variant may have on the transcript. The set of consequence terms, defined by the Sequence Ontology (SO), that can currently be assigned to each combination of an allele and a transcript is shown in the table below. Note that each allele of each variant may have a different effect in different transcripts.
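As a rough illustration of the first step described above, the sketch below finds the transcripts that overlap a variant and lists every (allele, transcript) combination that would then need a consequence call. It is not the Ensembl implementation: the `Transcript` and `Variant` classes, the function names and the identifiers `TX_A`, `TX_B` and `TX_C` are hypothetical, and coordinates are assumed to be 1-based and inclusive on a single chromosome.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Transcript:
    stable_id: str  # hypothetical identifier, not a real Ensembl ID
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive


@dataclass(frozen=True)
class Variant:
    start: int
    end: int
    alt_alleles: tuple


def overlapping_transcripts(variant, transcripts):
    """Return the transcripts that share at least one base with the variant."""
    return [t for t in transcripts
            if t.start <= variant.end and variant.start <= t.end]


def allele_transcript_pairs(variant, transcripts):
    """List each (allele, transcript ID) combination needing a consequence call."""
    hits = overlapping_transcripts(variant, transcripts)
    return list(product(variant.alt_alleles, [t.stable_id for t in hits]))


transcripts = [
    Transcript("TX_A", 100, 500),
    Transcript("TX_B", 450, 900),
    Transcript("TX_C", 1000, 1200),
]
variant = Variant(start=480, end=480, alt_alleles=("A", "T"))
pairs = allele_transcript_pairs(variant, transcripts)
print(pairs)
# [('A', 'TX_A'), ('A', 'TX_B'), ('T', 'TX_A'), ('T', 'TX_B')]
```

Each pair is evaluated separately, which is why one allele can receive different consequence terms in different transcripts.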

This approach is applied to all germline variants and somatic mutations stored in the Ensembl variation databases (though we do not yet calculate consequences for structural variants). The resulting consequence type calls, along with information determined as part of the process, such as the cDNA and CDS coordinates and the affected codons and amino acids in coding transcripts, are stored in the Ensembl Variation database and displayed on the website. You can use this pipeline for your own data via the Variant Effect Predictor (VEP).

We have used SO terms by default since Ensembl release 68. There is an equivalent SO term for each of our old Ensembl terms, but in a few cases a more specific SO term is available, as shown in the table below. If you have text files or VEP outputs with our old Ensembl terms, you can easily update these to use the SO terms by running the following script, e.g.

perl convert_ensembl_to_SO_consequences.pl input.txt > converted.txt

The diagram below shows the location of each display term relative to the transcript structure:
[Figure: consequence diagram]

 

Spliceosomal introns often reside within the sequence of eukaryotic protein-coding genes. Within the intron, a donor site (5' end of the intron), a branch site (near the 3' end of the intron) and an acceptor site (3' end of the intron) are required for splicing. The splice donor site includes an almost invariant sequence GU at the 5' end of the intron, within a larger, less highly conserved region. The splice acceptor site at the 3' end of the intron terminates the intron with an almost invariant AG sequence. Upstream (5'-ward) from the AG there is a region high in pyrimidines (C and U), known as the polypyrimidine tract. Further upstream from the polypyrimidine tract is the branchpoint, which includes an adenine nucleotide involved in lariat formation.

The consensus sequence for an intron (in IUPAC nucleic acid notation) is: G-G-[cut]-G-U-R-A-G-U (donor site) ... intron sequence ... Y-U-R-A-C (branch sequence 20-50 nucleotides upstream of acceptor site) ... Y-rich-N-C-A-G-[cut]-G (acceptor site). However, the specific sequence of intronic splicing elements and the number of nucleotides between the branchpoint and the nearest 3' acceptor site affect splice site selection.[6][7]

Point mutations in the underlying DNA, or errors during transcription, can also activate a cryptic splice site in a part of the transcript that is not usually spliced. This results in a mature messenger RNA with a missing section of an exon. In this way, a point mutation that might otherwise affect only a single amino acid can manifest as a deletion or truncation in the final protein.
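The sketch below shows how the almost invariant GU and AG dinucleotides relate to the splice consequence terms. It checks an intron for the canonical GU...AG boundaries, then classifies an intronic position using the definitions in the consequence table on this page: the 2 bases at the 5' end (splice donor), the 2 bases at the 3' end (splice acceptor), and bases 3-8 from either end (splice region). The function names and the precedence between overlapping rules are assumptions made for illustration. Only intronic positions are covered, so the 1-3 exonic bases of the splice region are left out.

```python
def has_canonical_splice_sites(intron_seq):
    """True if the intron starts with GU (GT in DNA) and ends with AG."""
    seq = intron_seq.upper().replace("T", "U")
    return len(seq) >= 4 and seq.startswith("GU") and seq.endswith("AG")


def intronic_splice_term(pos, intron_len):
    """Classify a 1-based position counted from the 5' end of an intron.

    Assumed precedence: donor, then acceptor, then splice region.
    """
    if not 1 <= pos <= intron_len:
        raise ValueError("position lies outside the intron")
    from_donor = pos                      # 1 = first intronic base
    from_acceptor = intron_len - pos + 1  # 1 = last intronic base
    if from_donor <= 2:
        return "splice_donor_variant"
    if from_acceptor <= 2:
        return "splice_acceptor_variant"
    if 3 <= from_donor <= 8 or 3 <= from_acceptor <= 8:
        return "splice_region_variant"
    return "intron_variant"


print(has_canonical_splice_sites("GTAAGTCCCTTTTTCAG"))  # True
print(intronic_splice_term(2, 100))    # splice_donor_variant
print(intronic_splice_term(5, 100))    # splice_region_variant
print(intronic_splice_term(50, 100))   # intron_variant
print(intronic_splice_term(100, 100))  # splice_acceptor_variant
```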

[Figure: Simple illustration of exons and introns in pre-mRNA]

 

The terms in the table below are shown in order of severity (most severe first) as estimated by Ensembl, and this ordering is used on the website summary views. The ordering is necessarily subjective, and API and VEP users can always get the full set of consequences for each allele and make their own severity judgement. The IMPACT rating is a separate rating given for compatibility with other variant annotation tools (e.g. SnpEff).

* | SO term | SO description | SO accession | Display term | IMPACT
  | transcript_ablation | A feature ablation whereby the deleted region includes a transcript feature | SO:0001893 | Transcript ablation | HIGH
  | splice_acceptor_variant | A splice variant that changes the 2 base region at the 3' end of an intron | SO:0001574 | Splice acceptor variant | HIGH
  | splice_donor_variant | A splice variant that changes the 2 base region at the 5' end of an intron | SO:0001575 | Splice donor variant | HIGH
  | stop_gained | A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript | SO:0001587 | Stop gained | HIGH
  | frameshift_variant | A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three | SO:0001589 | Frameshift variant | HIGH
  | stop_lost | A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript | SO:0001578 | Stop lost | HIGH
  | start_lost | A codon variant that changes at least one base of the canonical start codon | SO:0002012 | Start lost | HIGH
  | transcript_amplification | A feature amplification of a region containing a transcript | SO:0001889 | Transcript amplification | HIGH
  | inframe_insertion | An inframe non synonymous variant that inserts bases into the coding sequence | SO:0001821 | Inframe insertion | MODERATE
  | inframe_deletion | An inframe non synonymous variant that deletes bases from the coding sequence | SO:0001822 | Inframe deletion | MODERATE
  | missense_variant | A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved | SO:0001583 | Missense variant | MODERATE
  | protein_altering_variant | A sequence_variant which is predicted to change the protein encoded in the coding sequence | SO:0001818 | Protein altering variant | MODERATE
  | splice_region_variant | A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron | SO:0001630 | Splice region variant | LOW
  | incomplete_terminal_codon_variant | A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed | SO:0001626 | Incomplete terminal codon variant | LOW
  | stop_retained_variant | A sequence variant where at least one base in the terminator codon is changed, but the terminator remains | SO:0001567 | Stop retained variant | LOW
  | synonymous_variant | A sequence variant where there is no resulting change to the encoded amino acid | SO:0001819 | Synonymous variant | LOW
  | coding_sequence_variant | A sequence variant that changes the coding sequence | SO:0001580 | Coding sequence variant | MODIFIER
  | mature_miRNA_variant | A transcript variant located within the sequence of the mature miRNA | SO:0001620 | Mature miRNA variant | MODIFIER
  | 5_prime_UTR_variant | A UTR variant of the 5' UTR | SO:0001623 | 5 prime UTR variant | MODIFIER
  | 3_prime_UTR_variant | A UTR variant of the 3' UTR | SO:0001624 | 3 prime UTR variant | MODIFIER
  | non_coding_transcript_exon_variant | A sequence variant that changes non-coding exon sequence in a non-coding transcript | SO:0001792 | Non coding transcript exon variant | MODIFIER
  | intron_variant | A transcript variant occurring within an intron | SO:0001627 | Intron variant | MODIFIER
  | NMD_transcript_variant | A variant in a transcript that is the target of NMD | SO:0001621 | NMD transcript variant | MODIFIER
  | non_coding_transcript_variant | A transcript variant of a non coding RNA gene | SO:0001619 | Non coding transcript variant | MODIFIER
  | upstream_gene_variant | A sequence variant located 5' of a gene | SO:0001631 | Upstream gene variant | MODIFIER
  | downstream_gene_variant | A sequence variant located 3' of a gene | SO:0001632 | Downstream gene variant | MODIFIER
  | TFBS_ablation | A feature ablation whereby the deleted region includes a transcription factor binding site | SO:0001895 | TFBS ablation | MODIFIER
  | TFBS_amplification | A feature amplification of a region containing a transcription factor binding site | SO:0001892 | TFBS amplification | MODIFIER
  | TF_binding_site_variant | A sequence variant located within a transcription factor binding site | SO:0001782 | TF binding site variant | MODIFIER
  | regulatory_region_ablation | A feature ablation whereby the deleted region includes a regulatory region | SO:0001894 | Regulatory region ablation | MODERATE
  | regulatory_region_amplification | A feature amplification of a region containing a regulatory region | SO:0001891 | Regulatory region amplification | MODIFIER
  | feature_elongation | A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence | SO:0001907 | Feature elongation | MODIFIER
  | regulatory_region_variant | A sequence variant located within a regulatory region | SO:0001566 | Regulatory region variant | MODIFIER
  | feature_truncation | A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence | SO:0001906 | Feature truncation | MODIFIER
  | intergenic_variant | A sequence variant located in the intergenic region, between genes | SO:0001628 | Intergenic variant | MODIFIER

* Corresponding colours for the Ensembl web displays.
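
Users who retrieve the full set of consequences for an allele may want to reduce it to a single summary term. The sketch below does this with the severity ordering from the table above. It is an illustration rather than the code used by the website: the names `SEVERITY_ORDER`, `SEVERITY_RANK` and `most_severe` are hypothetical, and the list simply copies the table's row order, so it should be edited if you prefer your own severity judgement.

```python
# SO terms in the row order of the table above (most severe first).
SEVERITY_ORDER = [
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "transcript_amplification", "inframe_insertion", "inframe_deletion",
    "missense_variant", "protein_altering_variant", "splice_region_variant",
    "incomplete_terminal_codon_variant", "stop_retained_variant",
    "synonymous_variant", "coding_sequence_variant", "mature_miRNA_variant",
    "5_prime_UTR_variant", "3_prime_UTR_variant",
    "non_coding_transcript_exon_variant", "intron_variant",
    "NMD_transcript_variant", "non_coding_transcript_variant",
    "upstream_gene_variant", "downstream_gene_variant", "TFBS_ablation",
    "TFBS_amplification", "TF_binding_site_variant",
    "regulatory_region_ablation", "regulatory_region_amplification",
    "feature_elongation", "regulatory_region_variant", "feature_truncation",
    "intergenic_variant",
]

# Lower rank means more severe.
SEVERITY_RANK = {term: rank for rank, term in enumerate(SEVERITY_ORDER)}


def most_severe(terms):
    """Return the most severe SO term; raises KeyError for an unknown term."""
    return min(terms, key=lambda term: SEVERITY_RANK[term])


print(most_severe(["intron_variant", "missense_variant", "splice_region_variant"]))
# missense_variant
```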