Recommendations for the description of DNA sequence variants - v2.0


Last modified November 11, 2013

Since references to WWW-sites are not yet acknowledged as citations, please mention den Dunnen JT and Antonarakis SE (2000). Hum.Mutat. 15: 7-12 when referring to these pages.


Contents


DNA level

(suggestions extending the published recommendations are in italics)



Substitutions

A nucleotide substitution is a sequence change where one nucleotide is replaced by one other nucleotide (see Standards - Definition). Nucleotide substitutions are described using a ">"-character (indicating "changes to").
NOTE: changes involving two or more consecutive nucleotides are described as deletion/insertions (indels, see Deletion/insertions).

NOTE: it is not correct to describe "polymorphisms" as c.76A/G (see Discussion)
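
As an illustration of the substitution format, the following is a minimal sketch, not an official implementation. The function name describe_substitution and the default "c" prefix are assumptions made for this example only.

```python
NUCLEOTIDES = set("ACGT")


def describe_substitution(position, ref, alt, prefix="c"):
    """Format a single-nucleotide substitution, e.g. c.76A>G.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # A substitution replaces exactly one nucleotide by one other nucleotide.
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError("ref and alt must each be a single nucleotide (A, C, G or T)")
    if ref == alt:
        raise ValueError("ref and alt are identical, so nothing is substituted")
    # ">" reads as "changes to"; a "/" separator (c.76A/G) is not used.
    return f"{prefix}.{position}{ref}>{alt}"


print(describe_substitution(76, "A", "G"))
```

Changes that affect two or more consecutive nucleotides are rejected here on purpose, since they are described as deletion/insertions.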


Deletions

A nucleotide deletion is a sequence change where one or more nucleotides are removed (see Standards - Definition). Deletions are described using "del" after an indication of the first and last nucleotide(s) deleted, separated by a "_" (underscore). For all descriptions the most 3' position possible is arbitrarily assigned to have been changed.
NOTE: to discriminate known variable sequences from other changes it is recommended to describe individual alleles differing from the reference sequence like g.210T[5] (preferred over g.210_211delTT) or g.210T[9] (preferred over g.210_211dupTT) (see Repeated sequences).
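
The rule that the most 3' position possible is assigned to the change can be sketched as follows. This is a hedged example that assumes 1-based positions and a reference supplied as a plain string; the helper names shift_to_3prime and describe_deletion are hypothetical.

```python
def shift_to_3prime(reference, start, end):
    """Move a deleted range (1-based, inclusive) to its most 3' equivalent position."""
    # The range can slide one base 3' whenever the first deleted base equals
    # the base directly after the range: the resulting sequence is identical.
    while end < len(reference) and reference[start - 1] == reference[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(reference, start, end, prefix="g"):
    """Format a deletion after applying the 3' rule (illustrative helper)."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("start and end must lie inside the reference")
    start, end = shift_to_3prime(reference, start, end)
    span = str(start) if start == end else f"{start}_{end}"
    return f"{prefix}.{span}del"


# A stretch of seven T's at positions 3 to 9: deleting the first two T's
# gives the same sequence as deleting the last two.
print(describe_deletion("GATTTTTTTC", 3, 4))
```

For a known variable stretch such as this one, the NOTE above recommends the repeat format instead of a deletion description.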


Duplications

Duplications are designated by "dup" after an indication of the first and last nucleotide(s) duplicated. It should be noted that the description "dup" (see Standards) may by definition only be used when the sequence copy is directly 3'-flanking the original copy. For all descriptions the most 3' position possible is arbitrarily assigned to have been changed. For the addition of more than 1 copy (3, 4, 5, etc.) see Repeated sequences and Discussion.
NOTE: to discriminate known variable sequences from other changes it is recommended to describe individual alleles differing from the reference sequence like g.210T[5] (preferred over g.215_216del) or g.210T[9] (preferred over g.215_216dup) (see Repeated sequences). 


Insertions

Insertions are designated by "ins" after an indication of the nucleotides flanking the insertion site, followed by a description of the nucleotides inserted. Duplicating insertions should be described as duplications (see Discussion), not as insertions. For large insertions the number of inserted nucleotides should be mentioned, together with an accession.version number referring to a sequence database file containing the complete inserted sequence.
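
Deciding between "ins" and "dup" can be sketched as below. This is one possible approach under stated assumptions: positions are 1-based, the reference is a plain string, the insertion site is first shifted to its most 3' position, and the function name describe_insertion is hypothetical.

```python
def describe_insertion(reference, position, inserted, prefix="g"):
    """Describe `inserted` placed between `position` and `position + 1` (1-based).

    Illustrative helper: returns a "dup" description when the inserted
    sequence is a copy directly 3'-flanking an identical stretch.
    """
    if not inserted:
        raise ValueError("nothing to insert")
    if not 0 <= position <= len(reference):
        raise ValueError("position must lie inside the reference")
    # Shift the insertion site 3' as far as possible. Each step is allowed when
    # the first inserted base equals the next reference base; the inserted
    # sequence is rotated so the resulting sequence stays the same.
    while position < len(reference) and reference[position] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        position += 1
    size = len(inserted)
    # If the stretch directly 5' of the site equals the inserted sequence,
    # the change is a duplication of that stretch.
    if position >= size and reference[position - size:position] == inserted:
        first = position - size + 1
        span = str(first) if size == 1 else f"{first}_{position}"
        return f"{prefix}.{span}dup"
    return f"{prefix}.{position}_{position + 1}ins{inserted}"


print(describe_insertion("GATC", 2, "G"))   # a true insertion
print(describe_insertion("GATC", 2, "TA"))  # a copy of "AT", so a duplication
```

Large insertions, which are described with a length and an accession.version number, are outside the scope of this sketch.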


Deletion / insertions (indels)

Deletion/insertions of two or more consecutive nucleotides (indels) are described as a deletion followed by an insertion (see Discussion).


Inversions

Inversions are designated by "inv" after an indication of the first and last nucleotides affected by the inversion.
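
A short sketch of the inversion format follows. It assumes that an inversion replaces the affected range by its reverse complement, which the text above does not spell out, and it uses 1-based inclusive positions. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def apply_inversion(reference, start, end):
    """Return the sequence after inverting positions start..end (1-based, inclusive)."""
    if not 1 <= start < end <= len(reference):
        raise ValueError("an inversion needs a range of at least two nucleotides")
    inverted = reverse_complement(reference[start - 1:end])
    return reference[:start - 1] + inverted + reference[end:]


def describe_inversion(start, end, prefix="g"):
    """Format an inversion from its first and last affected nucleotide."""
    if end <= start:
        raise ValueError("end must be larger than start")
    return f"{prefix}.{start}_{end}inv"


print(describe_inversion(2, 4), apply_inversion("GACCTG", 2, 4))
```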


Conversions

Conversions are designated by "con" after an indication of the first and last nucleotides affected by the conversion, followed by a description of the origin of the new nucleotides (see Discussion).


Translocations

Translocations are described at the molecular level using the format "t(X;4)(p21.2;q34)", followed by the usual numbering, indicating the position of the translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession.version numbers should be given (see Discussion).


More changes in one individual

Two or more changes in one individual are described by combining the changes, per allele (chromosome) between square brackets ("[]").

Changes in different alleles (e.g. in recessive diseases) are described as "[change allele 1];[change allele 2]". Mixed descriptions like c.[76A>C];g.[91C>G] should not be used.

Two variations in one allele, separated by at least one nucleotide, are described as "[first change ; second change]". For the description of haplotypes see Discussion.
NOTE: "separated by at least one nucleotide" means the description c.76_77delinsTT is preferred over c.[76A>T; 77G>T].
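
The NOTE above can be sketched in code: substitutions on one allele that hit consecutive positions are merged into a single deletion/insertion, while changes separated by at least one nucleotide are combined between square brackets. The function name and the (position, ref, alt) input format are assumptions for this example, and whitespace around ";" is omitted.

```python
def describe_substitutions_on_one_allele(substitutions, prefix="c"):
    """Combine (position, ref, alt) substitutions found on a single allele.

    Illustrative helper: runs of consecutive positions become one "delins".
    """
    if not substitutions:
        raise ValueError("no changes given")
    # Group substitutions whose positions are directly adjacent.
    groups = []
    for position, ref, alt in sorted(substitutions):
        if groups and position == groups[-1][-1][0] + 1:
            groups[-1].append((position, ref, alt))
        else:
            groups.append([(position, ref, alt)])
    changes = []
    for group in groups:
        if len(group) == 1:
            position, ref, alt = group[0]
            changes.append(f"{position}{ref}>{alt}")
        else:
            inserted = "".join(alt for _, _, alt in group)
            changes.append(f"{group[0][0]}_{group[-1][0]}delins{inserted}")
    if len(changes) == 1:
        return f"{prefix}.{changes[0]}"
    return f"{prefix}.[{';'.join(changes)}]"


print(describe_substitutions_on_one_allele([(76, "A", "T"), (77, "G", "T")]))
print(describe_substitutions_on_one_allele([(76, "A", "T"), (78, "G", "T")]))
```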

Mosaicism - two different nucleotides in one position caused by somatic mosaicism are described as "[=/nucleotide 2]" (see FAQ).

Chimerism - two different nucleotides in one position caused by chimerism are described as "[=//nucleotide 2]".

Two sequence changes with alleles unknown are described as "[change allele 1 (;) change allele 2]" (see FAQ).
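
The bracket formats for combining changes can be sketched as three small helpers. The names are hypothetical, the example positions are illustrative, and whitespace around the separators is omitted here as a simplification. Each helper takes one reference prefix for the whole description, so mixed descriptions such as c.[76A>C];g.[91C>G] cannot be produced.

```python
def same_allele(changes, prefix="c"):
    """Two or more changes on one allele: [first change;second change]."""
    return f"{prefix}.[{';'.join(changes)}]"


def different_alleles(allele1, allele2, prefix="c"):
    """Changes on different alleles: [change allele 1];[change allele 2]."""
    return f"{prefix}.[{';'.join(allele1)}];[{';'.join(allele2)}]"


def alleles_unknown(change1, change2, prefix="c"):
    """Two changes whose allelic phase is unknown: [change 1(;)change 2]."""
    return f"{prefix}.[{change1}(;){change2}]"


print(same_allele(["76A>C", "83G>C"]))
print(different_alleles(["76A>C"], ["91C>G"]))
print(alleles_unknown("76A>C", "91C>G"))
```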


Repeated sequences

A frequently occurring sequence change is the variability of repeated sequences. Within this category we distinguish both small repeat units (mono-, di-, tri-, etc. nucleotide repeats) and much larger ones. Such changes are described using the format "position-first-repeat-unit_[number]" (e.g. g.123_124[4]) where position-first-repeat-unit gives the location of the first unit of the variable sequence repeat and [number] the number of units present in the allele described.
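
Counting repeat units and writing the description can be sketched as below. The example assumes 1-based positions and an allele sequence given as a plain string. It uses the g.123_124[4] style for units of two or more nucleotides and the g.210T[5] style for single-nucleotide units; the function names are hypothetical.

```python
def count_repeat_units(sequence, start, unit_length):
    """Count tandem copies of the unit that begins at `start` (1-based)."""
    if unit_length < 1:
        raise ValueError("unit_length must be at least 1")
    unit = sequence[start - 1:start - 1 + unit_length]
    if start < 1 or len(unit) < unit_length:
        raise ValueError("the first repeat unit must lie inside the sequence")
    count = 0
    position = start - 1
    # Step through the sequence one unit at a time while the unit repeats.
    while sequence[position:position + unit_length] == unit:
        count += 1
        position += unit_length
    return unit, count


def describe_repeat(sequence, start, unit_length, prefix="g"):
    """Format a repeated sequence as position-first-repeat-unit[number]."""
    unit, count = count_repeat_units(sequence, start, unit_length)
    if unit_length == 1:
        return f"{prefix}.{start}{unit}[{count}]"
    return f"{prefix}.{start}_{start + unit_length - 1}[{count}]"


print(describe_repeat("GGCACACACATT", 3, 2))  # four CA units starting at position 3
print(describe_repeat("ATTTTTG", 2, 1))       # five T's starting at position 2
```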

Examples


Complex changes

Sequence changes can be very complex, involving several changes at a specific location. The description of such changes using the recommendations given above can become rather complicated and at some point, although literally correct, effectively meaningless. In such cases the recommendation is to submit the sequence that has been determined to GenBank and to use the accession.version number in the description.



Copyright HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen - Disclaimer