Recommendations for the description of DNA sequence variants - v2.0


Last modified January 28, 2016

NOTE: this website has been frozen since May 1, 2016. It has been replaced by a new version at http://www.HGVS.org/varnomen. These pages serve as an archival copy only.




DNA level

(suggestions extending the published recommendations are in italics)



Substitutions

A nucleotide substitution is a sequence change where one nucleotide is replaced by one other nucleotide (see Standards - Definition). Nucleotide substitutions are described using a ">"-character (indicating "changes to").
NOTE: changes involving two or more consecutive nucleotides are described as deletion/insertions (indels, see Deletion/insertions).

NOTE: it is not correct to describe "polymorphisms" as c.76A/G (see Discussion)
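
As an illustration of the ">" format, the following minimal Python sketch builds a substitution description from its parts. The function name and its parameters are hypothetical and not part of the recommendations; positions are taken as plain integers, so intronic or UTR numbering is not covered.

```python
VALID_BASES = set("ACGT")


def describe_substitution(prefix: str, position: int, ref: str, alt: str) -> str:
    """Return a description such as 'c.76A>G' (hypothetical helper)."""
    # A substitution replaces exactly one nucleotide by one other nucleotide;
    # changes of two or more consecutive nucleotides are indels instead.
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("ref and alt must each be a single nucleotide (A, C, G or T)")
    if ref == alt:
        raise ValueError("ref and alt are identical, so nothing is substituted")
    return f"{prefix}.{position}{ref}>{alt}"


print(describe_substitution("c", 76, "A", "G"))  # c.76A>G
```

Rejecting multi-nucleotide input mirrors the first NOTE above: such changes belong under deletion/insertions.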


Deletions

A nucleotide deletion is a sequence change where one or more nucleotides are removed (see Standards - Definition). Deletions are described using "del" after an indication of the first and last nucleotide(s) deleted, separated by a "_" (underscore). For all descriptions the most 3' position possible is arbitrarily assigned to have been changed.
NOTE: to discriminate known variable sequences from other changes it is recommended to describe individual alleles differing from the reference sequence like g.210T[5] (preferred over g.210_211delTT) or g.121T[9] (preferred over g.210_211dupTT) (see Repeated sequences). 
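
The rule that the most 3' position possible is assigned can be sketched in Python as follows. This is an illustration only: the function names and the short reference sequence are made up, and positions are 1-based on a plain genomic ("g.") sequence.

```python
from typing import Tuple


def shift_3prime(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Slide a deleted range (1-based, inclusive) as far 3' as possible."""
    # Deleting start..end gives the same result as deleting start+1..end+1
    # whenever the first deleted base equals the base just 3' of the range.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(seq: str, start: int, end: int, prefix: str = "g") -> str:
    """Return e.g. 'g.9_10del' after applying the 3' rule (hypothetical helper)."""
    start, end = shift_3prime(seq, start, end)
    if start == end:
        return f"{prefix}.{start}del"
    return f"{prefix}.{start}_{end}del"


reference = "AGCTTTTTTTGA"  # made-up sequence: a run of seven T's at positions 4-10
print(describe_deletion(reference, 4, 5))  # g.9_10del
print(describe_deletion(reference, 2, 2))  # g.2del
```

Whichever two T's of the run are reported as deleted, the description comes out the same, which is the point of fixing the position arbitrarily at the 3' end.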


Duplications

Duplications are designated by "dup" after an indication of the first and last nucleotide(s) duplicated. It should be noted that the description "dup" (see Standards) may by definition only be used when the sequence copy is directly 3'-flanking the original copy. For all descriptions the most 3' position possible is arbitrarily assigned to have been changed. For the addition of more than 1 copy (3, 4, 5, etc.) see Repeated sequences and see Discussion.
NOTE: to discriminate known variable sequences from other changes it is recommended to describe individual alleles differing from the reference sequence like g.210T[5] (preferred over g.215_216del) or g.210T[9] (preferred over g.215_216dup) (see Repeated sequences). 


Insertions

Insertions are designated by "ins" after an indication of the nucleotides flanking the insertion site, followed by a description of the nucleotides inserted. Duplicating insertions should be described as duplications (see Discussion), not as insertions. For large insertions the number of inserted nucleotides should be mentioned, together with an accession.version number referring to a sequence database file containing the complete inserted sequence.
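
The distinction between "ins" and "dup" can be illustrated with a small Python sketch. It assumes a short reference sequence held in memory, 1-based positions, and an insertion site given as the position of the nucleotide 5' of it; the function name is hypothetical. Insertions at the very end of the reference and large insertions referring to an accession.version number are not handled.

```python
def describe_insertion(seq: str, after_pos: int, inserted: str, prefix: str = "g") -> str:
    """Describe inserting `inserted` between after_pos and after_pos + 1."""
    if not inserted or not 1 <= after_pos < len(seq):
        raise ValueError("need a non-empty insert between two reference positions")
    # Apply the 3' rule: while the next reference base equals the first
    # inserted base, the same result is obtained one position further 3'
    # with the inserted sequence rotated by one nucleotide.
    while after_pos < len(seq) and inserted[0] == seq[after_pos]:
        inserted = inserted[1:] + inserted[0]
        after_pos += 1
    n = len(inserted)
    # The insert is a duplication when it copies the sequence directly 5' of
    # it, i.e. the copy is directly 3'-flanking the original.
    if after_pos >= n and seq[after_pos - n:after_pos] == inserted:
        if n == 1:
            return f"{prefix}.{after_pos}dup"
        return f"{prefix}.{after_pos - n + 1}_{after_pos}dup"
    return f"{prefix}.{after_pos}_{after_pos + 1}ins{inserted}"


reference = "AGCTTTTTTTGA"  # made-up sequence: a run of seven T's at positions 4-10
print(describe_insertion(reference, 4, "T"))   # g.10dup
print(describe_insertion(reference, 3, "GC"))  # g.2_3dup
print(describe_insertion(reference, 2, "CA"))  # g.3_4insAC
```

In the last call the insert is shifted one position 3' and rotated from CA to AC; both descriptions yield the same variant sequence, but only the shifted one follows the 3' rule.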


Deletion / insertions (indels)

Deletion/insertions of two or more consecutive nucleotides (indels) are described as a deletion followed by an insertion (see Discussion).
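
A minimal Python sketch of such a description is given below. The combined "delins" spelling is an assumption of this example, since the text above only states "a deletion followed by an insertion"; the function name and the positions used are likewise made up for illustration.

```python
def describe_delins(start: int, end: int, inserted: str, prefix: str = "c") -> str:
    """Return e.g. 'c.112_117delinsTG' (hypothetical helper, 'delins' assumed)."""
    if start > end:
        raise ValueError("start must not be 3' of end")
    if not inserted:
        raise ValueError("without inserted nucleotides this is a plain deletion")
    # The deleted range is given first, then the inserted nucleotides.
    deleted_range = str(start) if start == end else f"{start}_{end}"
    return f"{prefix}.{deleted_range}delins{inserted}"


print(describe_delins(112, 117, "TG"))  # c.112_117delinsTG
```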


Inversions

Inversions are designated by "inv" after an indication of the first and last nucleotides affected by the inversion.
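
To make the effect of an inversion concrete, the Python sketch below applies one to a short made-up sequence by replacing the affected range with its reverse complement, and builds the matching "inv" description. The function names are hypothetical and positions are 1-based and inclusive.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_inversion(seq: str, start: int, end: int) -> str:
    """Return seq with positions start..end replaced by their reverse complement."""
    if not 1 <= start < end <= len(seq):
        raise ValueError("an inversion affects a range of at least two nucleotides")
    segment = seq[start - 1:end]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return seq[:start - 1] + inverted + seq[end:]


def describe_inversion(start: int, end: int, prefix: str = "g") -> str:
    """Return e.g. 'g.3_6inv' (hypothetical helper)."""
    return f"{prefix}.{start}_{end}inv"


reference = "ATGACCTA"  # made-up sequence
print(describe_inversion(3, 6))             # g.3_6inv
print(apply_inversion(reference, 3, 6))     # ATGGTCTA
```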


Conversions

Conversions are designated by "con" after an indication of the first and last nucleotides affected by the conversion, followed by a description of the origin of the new nucleotides (see Discussion).


Translocations

Translocations are described at the molecular level using the format "t(X;4)(p21.2;q34)", followed by the usual numbering, indicating the position of the translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession.version numbers should be given (see Discussion).


More changes in one individual

Two or more changes in a gene are described by combining the changes, per chromosome (maternal and paternal), between square brackets ("[;];[;]") and using a semicolon (";") as separator: "[first change maternal; second change maternal]; [first change paternal; second change paternal]" (see Discussion). When changes are in different genes on different chromosomes a space (" ") is used to separate the different chromosomes ("[;] [;]").
NOTE: mixed descriptions like c.[76A>C];g.[91C>G] should not be used.
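
The bracket format for two alleles of one gene can be sketched in Python as follows. The function name and the example changes are hypothetical; the sketch takes a single prefix for the whole description, which rules out mixed descriptions by construction, and it does not cover changes in different genes.

```python
from typing import List


def describe_alleles(maternal: List[str], paternal: List[str], prefix: str = "c") -> str:
    """Return e.g. 'c.[76A>C;83G>C];[87del]' (hypothetical helper)."""
    if not maternal or not paternal:
        raise ValueError("this sketch expects at least one change per chromosome")

    def allele(changes: List[str]) -> str:
        # Changes on the same chromosome share one pair of square brackets
        # and are separated by a semicolon.
        return "[" + ";".join(changes) + "]"

    # The two chromosomes are themselves separated by a semicolon.
    return f"{prefix}.{allele(maternal)};{allele(paternal)}"


print(describe_alleles(["76A>C", "83G>C"], ["87del"]))  # c.[76A>C;83G>C];[87del]
```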

Mosaicism

Mosaicism - two different nucleotides in one position caused by somatic mosaicism are described as "[=/nucleotide 2]" (see FAQ).

Chimerism

Chimerism - two different nucleotides in one position caused by chimerism are described as "[=//nucleotide 2]"


Repeated sequences

A frequently occurring sequence change is the variability of repeated sequences. Within this category we discriminate both small sequences (mono-, di-, tri-, etc. nucleotide repeats) as well as the much larger ones. Such changes are described using the format "position-first-repeat-unit_[number]" (e.g. g.123_124[4]) where position-first-repeat-unit gives the location of the first unit of the variable sequence repeat and [number] the number of units present in the allele described.
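
Counting the units of a repeat and writing the description can be sketched in Python as shown below. The function names and sequences are made up for illustration. For repeat units of two or more nucleotides the sketch uses the range format given above (e.g. g.123_124[4]); for mononucleotide repeats it uses the position-plus-nucleotide form seen in examples such as g.210T[5].

```python
def count_repeat_units(seq: str, start: int, unit_len: int) -> int:
    """Count consecutive copies of the unit that begins at `start` (1-based)."""
    first = start - 1
    unit = seq[first:first + unit_len]
    if unit_len < 1 or first < 0 or len(unit) != unit_len:
        raise ValueError("the first repeat unit must lie within the sequence")
    count = 0
    pos = first
    # Step through the sequence one unit at a time while the unit repeats.
    while seq[pos:pos + unit_len] == unit:
        count += 1
        pos += unit_len
    return count


def describe_repeat(seq: str, start: int, unit_len: int, prefix: str = "g") -> str:
    """Return e.g. 'g.3_4[4]' or 'g.4T[7]' (hypothetical helper)."""
    count = count_repeat_units(seq, start, unit_len)
    if unit_len == 1:
        return f"{prefix}.{start}{seq[start - 1]}[{count}]"
    return f"{prefix}.{start}_{start + unit_len - 1}[{count}]"


print(describe_repeat("GGCACACACATT", 3, 2))  # g.3_4[4]
print(describe_repeat("AGCTTTTTTTGA", 4, 1))  # g.4T[7]
```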

Examples


Complex changes

Sequence changes can be very complex, involving several changes at a specific location. The description of such changes using the recommendations given above can become rather complicated and at some point, although literally correct, effectively meaningless. In such cases the recommendation is to submit the sequence that has been determined to GenBank and to use the accession.version number in the description.



Copyright © HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen - Disclaimer