Description of sequence changes: examples DNA-level


Last modified November 16, 2015

Since references to websites are not yet acknowledged as citations, please cite den Dunnen JT and Antonarakis SE (2000), Hum. Mutat. 15: 7-12, when referring to these pages.



Introduction

This page gives examples of how to describe sequence variations. Examples are given separately for descriptions at DNA, RNA and protein level. All examples are described relative to a Reference Sequence: depending on the level, a genomic or coding DNA sequence (DNA level), an mRNA sequence (RNA level) or an amino acid sequence (protein level).


Reference sequence DNA-level


This page gives examples of how to describe sequence variations in a DNA sequence. For examples at other levels, see the pages describing changes in RNA and in protein. All examples are described relative to a Reference Sequence, here a coding DNA sequence.

| Part of gene | Genomic Reference Sequence (nucleotide numbering) | Coding DNA Reference Sequence (nucleotide numbering) | Protein Reference Sequence |
|---|---|---|---|
| 5' gene flanking region | 1 to 270 | (-300 to -31) | - |
| exon 1, 5' UTR | 271 to 300 | -30 to -1 | - |
| exon 1, coding region | 301 to 312 | 1 to 12 | 1 to 4 |
| intron 1 | 313 to 412 | 12+1 ... 12+50, 13-50 ... 13-1 | - |
| exon 2 | 413 to 488 | 13 to 88 | 5 to 29 (30) |
| intron 2 | 489 to 688 | 88+1 ... 88+100, 89-100 ... 89-1 | - |
| exon 3 | 689 to 723 | 89 to 123 | 30 to 41 |
| intron 3, contains rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136) | 724 to 1023 | 123+1 ... 123+150, 124-150 ... 124-1 | - |
| exon 4 | 1024 to 1200 | 124 to 300 | 42 to 100 |
| intron 4 | 1201 to 1600 | 300+1 ... 300+200, 301-200 ... 301-1 | - |
| exon 5, coding region | 1601 to 1630 | 301 to 330 | 101 to 109 |
| exon 5, 3' UTR, containing a (CA)7-stretch from nucleotides 1700 to 1713 (coding DNA *70 to *83); poly-A addition site at 1825 (coding DNA *195) | 1631 to 1850 | *1 to *220 | - |
| 3' gene flanking region | 1851 to 2000 | (*221 to *370) | - |

NOTE: nucleotides in introns in the 5' UTR are numbered like -23+1, -23+2, ..., -22-2, -22-1. Nucleotides in introns in the 3' UTR are numbered like *154+1, *154+2, ..., *155-2, *155-1. 

Legend:
Reference sequence of the imaginary gene used for the examples given on this page. Nucleotide +1 in the coding DNA reference sequence is the A of the ATG translation initiation codon. Abbreviations used: nt = nucleotide, UTR = untranslated region of the mRNA. For a picture of part of this hypothetical sequence see Figure.
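The relation between the genomic and the coding DNA numbering of the imaginary gene can be sketched in code. The Python sketch below hard-codes the exon boundaries listed in the table; the function and constant names are illustrative assumptions and not part of the nomenclature. It also assumes, as is the case for this gene, that neither UTR contains an intron.

```python
# Exons of the imaginary gene (genomic numbering, 1-based, inclusive),
# taken from the reference sequence table on this page.
EXONS = [(271, 312), (413, 488), (689, 723), (1024, 1200), (1601, 1850)]
CDS_START = 301        # genomic position of the A of the ATG (coding DNA 1)
CDS_END = 1630         # genomic position of the last coding nucleotide (330)
GENOMIC_LENGTH = 2000  # length of the genomic reference sequence


def _exonic_index(pos):
    """Count exonic nucleotides from the start of exon 1 up to and including pos."""
    return sum(min(pos, end) - start + 1 for start, end in EXONS if pos >= start)


def _exonic_label(pos):
    """Coding DNA number of a genomic position that lies inside an exon."""
    index = _exonic_index(pos)
    first, last = _exonic_index(CDS_START), _exonic_index(CDS_END)
    if index < first:               # 5' UTR: -30 ... -1
        return str(index - first)
    if index > last:                # 3' UTR: *1 ... *220
        return "*" + str(index - last)
    return str(index - first + 1)   # coding region: 1 ... 330


def genomic_to_coding(g):
    """Translate a genomic position into coding DNA numbering (without 'c.')."""
    if not 1 <= g <= GENOMIC_LENGTH:
        raise ValueError("position lies outside the genomic reference sequence")
    # Flanking regions: plain offsets are valid only because this gene
    # has no intron in either UTR (assumption stated above).
    if g < EXONS[0][0]:
        return str(g - CDS_START)
    if g > EXONS[-1][1]:
        return "*" + str(g - CDS_END)
    for start, end in EXONS:
        if start <= g <= end:
            return _exonic_label(g)
    # Intronic: number from the nearest exon; the 5' half uses "+",
    # the 3' half uses "-".
    for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:]):
        if prev_end < g < next_start:
            up, down = g - prev_end, next_start - g
            if up <= down:
                return f"{_exonic_label(prev_end)}+{up}"
            return f"{_exonic_label(next_start)}-{down}"
    raise ValueError("position could not be placed")
```

For example, `genomic_to_coding(800)` gives `123+77`, the first nucleotide of the alternatively spliced exon in intron 3, and `genomic_to_coding(1825)` gives `*195`, the poly-A addition site.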


General


Publications that report changes in different sequences (genes), or that report linkage or association studies, should prevent any confusion about which variant resides in which sequence. An easy way to achieve this is to include an unequivocal identifier of the reference sequence in the description, e.g. NM_004006.2:c.3G>T or DMD:c.3G>T (see Discussion).
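A description that carries its reference sequence can be split mechanically at the colon. The helper below is a minimal sketch; `split_description` is a hypothetical name, and the only format assumed is the one shown in the two examples above (identifier, colon, description).

```python
def split_description(text):
    """Split 'reference:description' into its two parts.

    Raises ValueError when the reference sequence identifier is missing,
    which is the situation the recommendation above is meant to prevent.
    """
    reference, separator, description = text.partition(":")
    if not separator or not reference or not description:
        raise ValueError(f"no reference sequence given in {text!r}")
    return reference, description


print(split_description("NM_004006.2:c.3G>T"))  # ('NM_004006.2', 'c.3G>T')
```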


Substitutions


Substitutions are designated by a ">"-character after the number of the affected nucleotide.
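As an illustration, a substitution description can be assembled from the position, the reference nucleotide and the new nucleotide. The sketch below is in Python; the function name and the nine-nucleotide coding DNA fragment are invented for the example and are not the sequence of the imaginary gene.

```python
def describe_substitution(coding_seq, position, new_base):
    """Build a coding DNA substitution description such as 'c.3G>T'.

    coding_seq starts at the A of the ATG, so index 0 is nucleotide 1.
    """
    if not 1 <= position <= len(coding_seq):
        raise ValueError("position lies outside the reference fragment")
    ref_base = coding_seq[position - 1]
    if new_base == ref_base:
        raise ValueError("new nucleotide equals the reference nucleotide")
    return f"c.{position}{ref_base}>{new_base}"


CODING_SEQ = "ATGGCCTTA"  # hypothetical fragment, for illustration only
print(describe_substitution(CODING_SEQ, 3, "T"))  # c.3G>T
```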


Deletion


Deletions are designated by "del" after a description of the deleted segment, i.e. the first (and last) nucleotide(s) deleted (see also Discussion). To describe deletions with unknown breakpoints, e.g. based on Southern blotting, PCR, arrayCGH, SNP array data, etc. see Uncertainties.
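The position part of a deletion can be derived as in the Python sketch below. It uses "_" to join the first and last deleted nucleotide and applies the convention, taken from the Recommendations and not stated on this page, that in a stretch of identical nucleotides the most 3' position is described as changed. The function name and the reference fragment are invented for the example.

```python
def describe_deletion(coding_seq, first, last=None):
    """Describe the deletion of nucleotides first..last (1-based, inclusive)."""
    last = first if last is None else last
    if not 1 <= first <= last <= len(coding_seq):
        raise ValueError("deleted range lies outside the reference fragment")
    # Shift towards the 3' end while the result stays the same: removing
    # first..last equals removing first+1..last+1 whenever the nucleotide
    # at 'first' is identical to the one directly after 'last'.
    while last < len(coding_seq) and coding_seq[first - 1] == coding_seq[last]:
        first += 1
        last += 1
    position = str(first) if first == last else f"{first}_{last}"
    return f"c.{position}del"


CODING_SEQ = "ATGGCCTTA"  # hypothetical fragment, for illustration only
print(describe_deletion(CODING_SEQ, 3))     # c.4del (one G of the GG stretch)
print(describe_deletion(CODING_SEQ, 5, 7))  # c.5_7del
```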


Duplication


Duplications are designated by "dup" after a description of the duplicated segment, i.e. the first (and last) nucleotide(s) duplicated (even when a mono-nucleotide is duplicated, see Recommendations). To describe duplications with unknown breakpoints, e.g. based on Southern blotting, PCR, arrayCGH, SNP array data, etc. see also Uncertainties.


Insertion


Insertions are designated by "ins" after the nucleotides flanking the insertion. NOTE: duplicating insertions (incl. duplication of a mono-nucleotide) should be described as duplications (see above).
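The rule that duplicating insertions are described as duplications can be checked mechanically. The Python sketch below first moves the insertion to its most 3' position (a convention from the Recommendations, assumed here) and then tests whether the inserted nucleotides copy the sequence directly 5' of that position. The function name and the reference fragment are invented for the example, and the fragment must extend beyond the insertion site.

```python
def describe_insertion(coding_seq, after_position, inserted):
    """Describe an insertion directly 3' of nucleotide after_position (1-based)."""
    if not inserted:
        raise ValueError("nothing inserted")
    if not 1 <= after_position < len(coding_seq):
        raise ValueError("insertion must lie between two nucleotides of the fragment")
    # Shift to the most 3' equivalent position: each step rotates the
    # inserted sequence by one nucleotide and moves the site one to the right.
    while after_position < len(coding_seq) and coding_seq[after_position] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        after_position += 1
    if after_position == len(coding_seq):
        raise ValueError("fragment too short to place the change at its most 3' position")
    n = len(inserted)
    first = after_position - n + 1
    # Duplicating insertion: the inserted copy equals the n nucleotides 5' of it.
    if first >= 1 and coding_seq[first - 1:after_position] == inserted:
        position = str(after_position) if n == 1 else f"{first}_{after_position}"
        return f"c.{position}dup"
    return f"c.{after_position}_{after_position + 1}ins{inserted}"


CODING_SEQ = "ATGGCCTTA"  # hypothetical fragment, for illustration only
print(describe_insertion(CODING_SEQ, 5, "TA"))  # c.5_6insTA
print(describe_insertion(CODING_SEQ, 3, "G"))   # c.4dup
```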


Sequence repeat variability


For the recommendations on how to describe sequence repeat variability, see Recommendations.


Inversion


Inversions are designated by "inv" after the nt number of the nucleotides inverted.
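An inversion replaces a segment by its reverse complement. The Python sketch below returns both the description and the resulting sequence; the function name and the reference fragment are invented for the example, and "_" is used to join the first and last inverted nucleotide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_inversion(coding_seq, first, last):
    """Invert nucleotides first..last (1-based, inclusive).

    Returns (description, changed sequence).
    """
    if not 1 <= first < last <= len(coding_seq):
        raise ValueError("an inversion needs a range of at least two nucleotides")
    segment = coding_seq[first - 1:last]
    inverted = segment.translate(COMPLEMENT)[::-1]  # reverse complement
    if inverted == segment:
        # e.g. GGCC: the reverse complement is identical, nothing changes
        raise ValueError("segment is its own reverse complement")
    changed = coding_seq[:first - 1] + inverted + coding_seq[last:]
    return f"c.{first}_{last}inv", changed


CODING_SEQ = "ATGGCCTTA"  # hypothetical fragment, for illustration only
print(apply_inversion(CODING_SEQ, 4, 7))  # ('c.4_7inv', 'ATGAGGCTA')
```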


Gene conversion


Gene conversions are designated by "con" after the nt numbers of the nucleotides converted, followed by a description of the origin of the new sequence, in the format "region_changed" con "region of origin" (see Discussion).


Translocation


Translocations are designated in the format "t(X;4)(p21.2;q34)", followed by the usual description, placed between brackets, indicating the exact translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession numbers should be given (see Discussion).


Complex


Complex rearrangements consist of several different types of the six elementary content changes: substitution, deletion, duplication, insertion, inversion and translocation. Such rearrangements can be very complex and difficult to describe. Specific recommendations to describe such changes have not been made. Complex rearrangements are best described as a combination of the elementary changes.

Deletion/insertions ("indels") are described as a deletion ("del") followed by an insertion ("ins"), placed after a description of the deleted segment, i.e. the first (and last) nucleotide(s) deleted (see Discussion).
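An indel description can be derived by comparing a reference fragment with the changed fragment and trimming what they share at both ends. The Python sketch below is a simplified illustration; the function name and the sequences are invented for the example, and it does not check whether the change would be better described as an inversion or a duplication.

```python
def describe_delins(ref_seq, changed_seq):
    """Describe the difference between two coding DNA fragments as a 'delins'.

    Both fragments start at nucleotide 1. Raises ValueError when the
    change is not a deletion/insertion.
    """
    shortest = min(len(ref_seq), len(changed_seq))
    prefix = 0  # nucleotides shared at the 5' end
    while prefix < shortest and ref_seq[prefix] == changed_seq[prefix]:
        prefix += 1
    suffix = 0  # nucleotides shared at the 3' end, not overlapping the prefix
    while suffix < shortest - prefix and ref_seq[-1 - suffix] == changed_seq[-1 - suffix]:
        suffix += 1
    deleted = ref_seq[prefix:len(ref_seq) - suffix]
    inserted = changed_seq[prefix:len(changed_seq) - suffix]
    if not deleted or not inserted:
        raise ValueError("pure deletion, pure insertion or no change")
    if len(deleted) == 1 and len(inserted) == 1:
        raise ValueError("single nucleotide replaced: describe as a substitution")
    first, last = prefix + 1, prefix + len(deleted)
    position = str(first) if first == last else f"{first}_{last}"
    return f"c.{position}delins{inserted}"


print(describe_delins("ATGGCCTTA", "ATGTTTTA"))  # c.4_6delinsTT
```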


Miscellaneous




Copyright © HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen