Nomenclature for the description of sequence variants (mutation nomenclature)

Versioning

Last modified January 14, 2013

NOTE: this website has been frozen since May 1, 2016. It has been replaced by a new version at http://www.HGVS.org/varnomen. These pages serve as an archival copy only.

Introduction

The recommendations for the description of sequence variants are designed to be stable, meaningful, memorable and unequivocal. Still, every now and then small modifications will need to be made to remove small inconsistencies and/or to clarify confusing conventions. In addition, the recommendations may be extended to resolve cases that were hitherto not covered. To allow users to specify up to what point they follow the HGVS recommendations we have started to work with version numbers.

As of now, any change in the recommendations will get a new version number based on the date of the change. Both in the version list, and on the page giving details of the change, it will be clearly marked using a format like date 2012-08-31. The version of the HGVS recommendations including that change will be version 2.120831.
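As an illustration of the numbering scheme described above, the following minimal sketch derives the version string from the date of a change. The function name hgvs_version and the major parameter are hypothetical and only serve this example; the only rule taken from the text is that a change dated 2012-08-31 yields version 2.120831.

```python
from datetime import date


def hgvs_version(change_date: date, major: int = 2) -> str:
    """Build a version string from the date of a change.

    Assumption for illustration: the part after the dot is the
    change date written as two-digit year, month and day (YYMMDD).
    """
    return f"{major}.{change_date:%y%m%d}"


# The example from the text: a change dated 2012-08-31.
print(hgvs_version(date(2012, 8, 31)))
```

Because the date is written year first, version numbers within the same major version sort in chronological order.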

At the top of all pages on this site you will also find a Last modified date. This date indicates when the respective page was modified last. When this includes changes/extensions of the HGVS recommendation, the version number of the recommendation will also change. Note however that it often happens that simply a typing error was corrected, an example was added, an explanation was further clarified, a question answered, etc. In such cases the recommendations do not actually change and the version number will thus also not change.

Versions

Version 0 - On the page "History regarding the description of sequence variants" we give an overview of all publications on the description of sequence variants. These papers can be considered as pre-versions of the first recommendations, a version 0.

Version 1 - we consider the 2000 publication of den Dunnen JT and Antonarakis SE (Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum.Mutat. 15 (1): 7-12) as a more formal set of recommendations, i.e. version 1.

Version 2 - We are currently preparing a new publication that will summarize the current HGVS recommendations. The most significant and latest changes for version 2.0 compared to version 1.0 include:

Reference sequence - the recommendation is to use a Locus Reference Genomic sequence (LRG) (Dalgleish et al. 2010) as the reference sequence for variant descriptions. LRGs support descriptions using both genomic and coding DNA reference sequences and have been specifically made for application in a diagnostic setting (see Reference Sequences).
In addition, indicators for new types of reference sequences have been added (e.g. m. and n., see Standards) as well as indicators to specify different transcripts / protein isoforms generated from one gene (see Standards)
Definitions - to enhance clarity as well as to facilitate computational analysis and description of sequence variants, the basic types of variants had to be defined more strictly. In addition, descriptions have been prioritized, meaning that when a description is possible according to several classes, e.g. as a duplication or an insertion, one specific class is preferred (for an overview see Standards - definitions).
Pre-existing standards - several scientists have pointed out that we have thus far neglected the fact that some standards already existed before those for the description of sequence changes were made. It is thus essential that we follow these standards in our recommendations. The most important of these are the pre-existing standards from the IUPAC (International Union of Pure and Applied Chemistry) and IUBMB (International Union of Biochemistry and Molecular Biology) for the description of nucleic acids and amino acids (see below). These include letter codes to describe incompletely specified residues at both DNA and protein level (see Standards). The most controversial of these changes is the one where the description of the stop codon at protein/amino acid level changed from 'X' to 'Ter'/'*', since 'X' in the IUPAC-IUB nomenclature means an "unspecified" or "unknown" amino acid.
- Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences - IUBMB (NC-IUB)
  specifying the description of nucleotides (see list)
- Nomenclature and Symbolism for Amino Acids and Peptides - IUPAC-IUB
  specifying the description of amino acids (see list)
- NCBI standards for sequence files and database searches (e.g. BLAST)
Incorporate ISCN standards - to describe microscopically visible chromosomal changes, the cytogenetics community uses the ISCN (International System for Human Cytogenetic Nomenclature) standards (see ISCN-2005); the latest update is from 2009 (editors Lisa Shaffer, Marilyn Slovak, Lynda Campbell). Whereas initially only direct chromosome spreads were used, later hybridisation technologies like FISH (Fluorescent In Situ Hybridisation) and arrays (arrayCGH, SNP-arrays) were introduced to determine the state of specific sequences tested. On the HGVS pages we have since 2005 suggested ways to describe changes detected using such technologies (see Uncertainties). These recommendations have now matured and been incorporated. Furthermore, where possible, we have incorporated established ISCN standards in the HGVS recommendations. Examples include the use of "/" to describe somatic variants and "//" for chimerism (see Standards).
Simplification - in the 2000 recommendations (v1.0), some symbols were used for more than one purpose, which may lead to confusion. For example, the "+" character was used both in nucleotide numbering (indicating an intronic position) and to separate two alleles, while for the latter the ";" character was also used. The recommendation is now to use only ";". A complete overview of the characters and codes used can be found at the Standards page.
Prediction / experimental proof - it is often not clear whether a description of a variant at protein level is based on experimental evidence or merely a prediction based on what was detected at DNA level. To make this distinction more obvious, the recommendation is to describe the variant at protein level between brackets, like p.(Arg12Gly), when it is a prediction based on DNA data only. When RNA has been analysed, and some experimental evidence exists to support the prediction, the variant may be described without brackets, like p.Arg12Gly.
Repeated sequences - the 2000 recommendations were not very specific regarding the description of variability in repeated sequences, mono-, di-, tri-nucleotide stretches, etc. Recommendations for the description of such variability have now been set (see Recommendations). The format designed is also used to describe more complex copy number variation of larger stretches of DNA, e.g. the presence of two additional copies of one or more exons of a gene, often with the breakpoints not fully characterised.
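
The bracket convention described under "Prediction / experimental proof" above can be sketched in a few lines. This is only an illustration: the function name describe_protein_variant and its parameters are hypothetical, and the sketch does nothing more than add or omit the brackets around a change such as Arg12Gly.

```python
def describe_protein_variant(change: str, experimentally_supported: bool) -> str:
    """Return a protein-level variant description.

    The change is placed between brackets when it is only a prediction
    based on DNA data, and written without brackets when experimental
    evidence (e.g. from RNA analysis) supports it.
    """
    if experimentally_supported:
        return f"p.{change}"
    return f"p.({change})"


# Prediction from DNA data only.
print(describe_protein_variant("Arg12Gly", experimentally_supported=False))
# Supported by experimental evidence.
print(describe_protein_variant("Arg12Gly", experimentally_supported=True))
```

The first call gives the bracketed form p.(Arg12Gly), the second the plain form p.Arg12Gly, matching the two examples in the text.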

See the Version list for additional changes and the latest version number.