Description of sequence changes: examples protein-level
Last modified May 12, 2007
Since references to WWW-sites
are not yet acknowledged as citations, please mention den
Dunnen JT and Antonarakis SE (2000). Hum.Mutat. 15:7-12 when referring to
these pages.
Introduction
This page gives examples of the description of sequence variants at the protein
level; examples of changes at the DNA and RNA level are given on other pages. All
examples are described relative to a reference sequence, here the amino acid (protein)
sequence.
Reference sequence
| Part of gene | amino acid numbering, protein Reference Sequence | nucleotide numbering, coding DNA Reference Sequence | nucleotide numbering, genomic Reference Sequence |
| 5' gene flanking region | - | (-300 to -31) | 1 to 270 |
| exon 1, 5' UTR | - | -30 to -1 | 271 to 300 |
| exon 1, coding region | 1 to 4 | 1 to 12 | 301 to 312 |
| intron 1 | - | 12+1 ... 12+50, 13-50 ... 13-1 | 313 to 412 |
| exon 2 | 5 to 29 (30) | 13 to 88 | 413 to 488 |
| intron 2 | - | 88+1 ... 88+100, 89-100 ... 89-1 | 489 to 688 |
| exon 3 | 30 to 41 | 89 to 123 | 689 to 723 |
| intron 3, contains rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136) | - | 123+1 ... 123+150, 124-150 ... 124-1 | 724 to 1023 |
| exon 4 | 42 to 100 | 124 to 300 | 1024 to 1200 |
| intron 4 | - | 300+1 ... 300+200, 301-200 ... 301-1 | 1201 to 1600 |
| exon 5, coding region | 101 to 109 | 301 to 330 | 1601 to 1630 |
| exon 5, 3' UTR, containing a (CA)7-stretch from nts 1700 to 1713 (coding DNA *71 to *83); poly-A addition site at 1825 (coding DNA *195) | - | *1 to *220 | 1631 to 1850 |
| 3' gene flanking region | - | (*221 to *370) | 1851 to 2000 |
Legend:
Reference sequence of the imaginary gene used for the examples given on this page.
Nucleotide +1 in the coding DNA reference sequence is the A of the ATG translation
initiation codon.
Abbreviations used: nt = nucleotide, nts = nucleotides, UTR = untranslated region of the
mRNA. For a picture of part of this hypothetical sequence see
Figure.
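The relation between the three numbering schemes in the table can be sketched in a few lines of Python. This is only an illustration for the imaginary gene: the names `CODING_EXONS`, `codon_number` and `genomic_position` are hypothetical, and the exon boundaries are copied from the table.

```python
# Coding exons of the imaginary gene, copied from the reference table:
# (first coding DNA nt, last coding DNA nt, genomic position of the first nt)
CODING_EXONS = [
    (1, 12, 301),      # exon 1, coding part
    (13, 88, 413),     # exon 2
    (89, 123, 689),    # exon 3
    (124, 300, 1024),  # exon 4
    (301, 330, 1601),  # exon 5, coding part (codon 110 is the stop codon)
]


def codon_number(c_pos: int) -> int:
    """Return the codon (amino acid) number that contains coding DNA nucleotide c_pos."""
    if not 1 <= c_pos <= 330:
        raise ValueError("position is outside the coding region (1 to 330)")
    return (c_pos - 1) // 3 + 1


def genomic_position(c_pos: int) -> int:
    """Return the genomic nucleotide number for a coding DNA nucleotide."""
    for first, last, genomic_first in CODING_EXONS:
        if first <= c_pos <= last:
            return genomic_first + (c_pos - first)
    raise ValueError("position is outside the coding region (1 to 330)")


print(codon_number(88))       # nucleotide 88 lies in the codon split by intron 2
print(genomic_position(162))  # genomic position of coding DNA nucleotide 162
```

Nucleotide 88 maps to codon 30, which is why the table lists exon 2 as "5 to 29 (30)": the codon for amino acid 30 starts in exon 2 and is completed in exon 3.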
General
It should be noted that the descriptions at protein level, even more than those at RNA
level, are mostly deduced and not based on experimental evidence.
Publications describing changes at protein level should make it clear whether experimental
proof was available or not. In fact, when changes are reported for which experimental
proof is not available, one should consider listing them between brackets.
Sequence changes at protein level are basically described like those at the DNA level, with a few modifications:
- the three-letter amino acid code is preferred (see Discussion),
with "X" designating a translation termination codon;
for clarity, this page describes changes using the three-letter amino acid code
- amino acid numbering
- descriptions start with the amino acid, followed by its number (like p.Cys24 or p.C24)
- the translation initiator Methionine is numbered as +1
- amino acids after the stop codon (here after X110) are numbered relative to the stop
codon as Gly110+1, Trp110+2, etc.
- "silent" changes
description of so-called "silent" changes in the format p.Leu54Leu (or
p.L54L) is not allowed, because it is non-informative and not unequivocal
(there are five possibilities at DNA level which may underlie p.Leu54Leu);
such changes should be described at DNA level, in the format
c.162C>G.
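One way to read the "five possibilities" is as the five other Leucine codons of the standard genetic code. The sketch below assumes the reference codon for Leu54 is CTC. That is an inference, not something stated on this page: c.162 is the third base of codon 54 (nucleotides 160 to 162), and CTC is the only Leucine codon ending in C. The function name `silent_alternatives` is hypothetical.

```python
# The six Leucine codons of the standard genetic code (DNA alphabet).
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def silent_alternatives(ref_codon: str) -> list[str]:
    """List the Leucine codons that differ from the reference Leucine codon."""
    ref = ref_codon.upper()
    if ref not in LEU_CODONS:
        raise ValueError(f"{ref_codon} is not a Leucine codon")
    return sorted(LEU_CODONS - {ref})


# Assumed reference codon for Leu54 (see the lead-in): CTC.
alternatives = silent_alternatives("CTC")
print(len(alternatives), alternatives)
```

Each of these alternatives would be written p.Leu54Leu at protein level, whereas a DNA-level description such as c.162C>G (CTC to CTG) identifies exactly one of them.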
Substitutions
Substitutions can be described without using the specific ">"-character
which is used on DNA and RNA level (i.e. p.Trp26Cys instead of p.Trp26>Cys).
- translation initiation site (Methionine codon, see Discussion)
example: at DNA ATG (Met) changes to CTG (Leu) - description depends on the
consequences of the change on the translation product (protein) produced;
- effect unknown (most cases) - p.0? (alternatively
p.Met1?)
- no protein produced - p.0 (experimental data should be available)
- new upstream translation initiation site -
p.[Met1extMet-8; Met1Leu] (alternatively p.[M1extM-8; M1L])
NOTE: effectively this is an insertion (of Met-8 to Thr-1 between Thr-1 and
Met1) combined with an amino acid change (Met1Leu)
- new downstream translation initiation site - p.Met1_His15del (alternatively
p.M1_H15del)
NOTE: effectively this is a deletion
- one amino acid to
- another amino acid (missense change)
Tryptophan 26 to a Cysteine; p.Trp26Cys (alternatively p.W26C)
- a stop codon (nonsense change)
Tryptophan 26 to a stop codon; p.Trp26X (alternatively p.W26X)
- translation termination site (stop codon, nonstop change)
- p.X110SerextX*17 (alternatively p.X110SextX*17)
the stop codon (X) at position 110 is changed to a codon for Serine (Ser, S), adding a tail of 16 new amino acids to the protein's
C-terminus, after which a new stop codon is reached
NOTE: effectively this is a substitution (X110Ser) combined with an
insertion (from Gly*1 to the new stop codon X*17)
NOTE: polymorphic variants are sometimes
described as p.36Leu/Ile (p.36L/I) or p.36Leu/Leu (p.36L/L) but this is not correct (see Discussion).
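As an illustration of the substitution format, the following sketch converts a three-letter description into its one-letter alternative and rejects the "silent" format that is not allowed. It handles only simple substitutions written without the ">"-character; the names `THREE_TO_ONE`, `SUBSTITUTION` and `to_one_letter` are hypothetical and not part of any recommendation.

```python
import re

# Three-letter to one-letter amino acid codes; "X" is the translation
# termination codon as used on this page.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "X": "X",
}

# Matches simple substitutions such as p.Trp26Cys or p.Trp26X.
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2}|X)(\d+)([A-Z][a-z]{2}|X)$")


def to_one_letter(description: str) -> str:
    """Convert e.g. 'p.Trp26Cys' to 'p.W26C'; raise ValueError otherwise."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description}")
    ref, position, new = match.groups()
    if ref not in THREE_TO_ONE or new not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {description}")
    if ref == new:
        raise ValueError("silent changes should be described at DNA level")
    return f"p.{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[new]}"


print(to_one_letter("p.Trp26Cys"))
print(to_one_letter("p.Trp26X"))
```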
Deletions
Deletions are designated by "del" after a description of the
deleted segment, i.e. the first (and last) amino acid(s) deleted.
- MKLGHQQQCC to M_LGHQQQCC is described as p.Lys2del
(alternatively p.K2del)
- MKLGHQQQCC to MKL___QQCC is described as p.Gly4_Gln6del
(alternatively p.G4_Q6del)
- MKLGHQQQCC to MKLGHQQCC is described as p.Gln8del (alternatively p.Q8del)
NOTE: for deletions in single amino acid stretches or tandem repeats, the
most 3' residue is arbitrarily assigned to have been deleted
- if a deletion creates a new amino acid at the deletion junction the change is described
as an insertion/deletion (see indels)
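The deletion examples, including the rule that the last residue of a repeat is taken as deleted, can be reproduced with a small sketch. It assumes a simple deletion with no new amino acid at the junction; the function name `describe_deletion` is hypothetical, and underscores in the variant sequence are treated as gap characters, as in the examples above.

```python
# One-letter to three-letter amino acid codes.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_deletion(ref: str, var: str) -> str:
    """Describe a simple deletion that turns protein sequence ref into var."""
    var = var.replace("_", "")  # the page writes gaps as underscores
    length = len(ref) - len(var)
    if length <= 0:
        raise ValueError("variant is not shorter than the reference")
    # Try the last possible position first, so that in a repeat the most
    # C-terminal residue(s) are reported as deleted.
    for start in range(len(var), -1, -1):
        if ref[:start] + ref[start + length:] == var:
            first, last = start + 1, start + length
            if first == last:
                return f"p.{THREE[ref[first - 1]]}{first}del"
            return (f"p.{THREE[ref[first - 1]]}{first}"
                    f"_{THREE[ref[last - 1]]}{last}del")
    raise ValueError("not a simple deletion")


print(describe_deletion("MKLGHQQQCC", "M_LGHQQQCC"))
print(describe_deletion("MKLGHQQQCC", "MKL___QQCC"))
print(describe_deletion("MKLGHQQQCC", "MKLGHQQCC"))
```

Scanning from the end of the sequence is a design choice that makes the repeat rule fall out naturally: the first match found is the last possible placement of the deletion.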
Duplications
Duplications are designated by "dup" after a description of the
duplicated segment, i.e. the first (and last) amino acid(s) duplicated.
- MKLGHQQQCC to MKLGHQGHQQQCC is described as p.Gly4_Gln6dup
(alternatively p.G4_Q6dup)
- MKLGHQQQCC to MKLGHQQQQCC is described as p.Gln8dup (alternatively
p.Q8dup)
NOTE: for duplications in single amino acid stretches or tandem repeats, the
most 3' residue is arbitrarily assigned to have been duplicated
- MKLGHQQQCC to MKLGHQHQQQCC is described as p.His5_Gln6dup (alternatively
p.H5_Q6dup)
NOTE: duplicating insertions in single amino acid stretches (or short tandem
repeats) should be described as a duplication and not as an insertion - so in the example
shown p.Gln6_Gln7insHisGln (alternatively p.Q6_Q7insHQ) is not correct
- variability of short sequence repeats is designated as
p.Gln6(3_6)
(alternatively p.Q6(3_6)) describing that the Glutamine (Gln, Q) stretch
starting at position 6
in MKLGHQQQCC is found repeated 3 to 6 times in the population
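The choice between "dup" and "ins" can be sketched as follows: place the extra residues as far towards the C-terminus as possible, then check whether they copy the residues directly in front of them. This is an illustration only; the name `describe_gain` is hypothetical, and insertions at the very start or end of the protein are deliberately not handled.

```python
# One-letter to three-letter amino acid codes.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_gain(ref: str, var: str) -> str:
    """Describe extra residues in var as a duplication or an insertion."""
    length = len(var) - len(ref)
    if length <= 0:
        raise ValueError("variant is not longer than the reference")
    # Try the last possible position first (most C-terminal placement).
    for start in range(len(ref), -1, -1):
        if var[:start] + var[start + length:] != ref:
            continue
        gained = var[start:start + length]
        if start >= length and ref[start - length:start] == gained:
            # The gained residues copy the residues directly before them.
            first, last = start - length + 1, start
            if first == last:
                return f"p.{THREE[ref[first - 1]]}{first}dup"
            return (f"p.{THREE[ref[first - 1]]}{first}"
                    f"_{THREE[ref[last - 1]]}{last}dup")
        if start == 0 or start == len(ref):
            raise ValueError("terminal insertions are not handled in this sketch")
        inserted = "".join(THREE[residue] for residue in gained)
        return (f"p.{THREE[ref[start - 1]]}{start}"
                f"_{THREE[ref[start]]}{start + 1}ins{inserted}")
    raise ValueError("not a simple duplication or insertion")


print(describe_gain("MKLGHQQQCC", "MKLGHQGHQQQCC"))
print(describe_gain("MKLGHQQQCC", "MKLGHQQQQCC"))
print(describe_gain("MKLGHQQQCC", "MKLGHQHQQQCC"))
print(describe_gain("MKLGHQQQCC", "MKQSLGHQQQCC"))
```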
Insertions
Insertions are designated by "ins" after a description of the
amino acids flanking the insertion site, followed by a description of the inserted amino
acids. When the insertion is large it may be described by its length (e.g. p.K2_L3ins34)
but the inserted sequence should be described in detail in a footnote or as the accession
number of the sequence as submitted to a sequence database (Genbank, EMBL, DDBJ).
Duplicating insertions should be described as duplications (see
Discussion).
- MKLGHQQQCC to MKQSLGHQQQCC is described as p.Lys2_Leu3insGlnSer
(alternatively p.K2_L3insQS)
- insertion of a 345 nucleotide sequence in intron 3; the sequence of the insertion needs
to be submitted to a sequence database (Genbank, EMBL, DDBJ) and the accession number
should be given
When an insertion creates a new amino acid at the insertion junction the change is
described as an insertion/deletion (see indels)
Translocations
Translocations at protein level occur when a translocation at DNA level leads to the
production of a fusion protein, joining the N-terminal end of the protein on one
chromosome to the C-terminal end of the protein on the other chromosome (and vice versa).
No recommendations have been made so far to describe protein translocations.
- t(X;17)(DMD:p.Met1_Val1506; SGCA:p.Val250_X387) describes a fusion protein resulting from a
translocation between the chromosomes X and 17; the fusion protein contains an N-terminal
segment of DMD (dystrophin, amino acids Methionine-1 to Valine-1506), and a C-terminal
segment of SGCA (alpha-sarcoglycan, amino acids Valine-250 to the stop codon at 387)
Complex rearrangements
Complex rearrangements are rearrangements which consist of several different types of
the six elementary content changes: substitution, deletion, duplication,
insertion, inversion and translocation. Such rearrangements can be very complex and
difficult to describe. Specific recommendations to describe such changes have not been made.
Complex rearrangements can be best described as a combination of the elementary
changes.
Deletion/insertions (indels) are described as a deletion
followed by an insertion (see Discussion)
- p.Cys28_Lys29delinsTrp (alternatively p.C28_K29delinsW) denotes a 3 bp deletion
affecting the codons for Cysteine-28 and Lysine-29, substituting them for a codon for
Tryptophan
- p.Cys28delinsTrpVal (alternatively p.C28delinsWV) denotes a 3 bp insertion in the codon
for Cysteine-28, generating codons for Tryptophan (W) and Valine (V)
- frame shift changes (see Discussion) cause, from a specific point onwards, the replacement of the normal
C-terminal end of a protein for a new segment, encoded by the shifted reading frame. Frame
shift changes can thus be best considered as deletion/insertions (indels).
Frame shifts are designated by "fs" after the amino acid(s) affected by the
change. Descriptions either use a short ("fs" only) or
long ("fsX#") notation; the long description should include the change occurring at the site of the frame shift
(see Discussion). In "fsX#", "X#" indicates at which codon position the new reading frame ends in a stop codon
(X). The position of the stop in the new reading frame is calculated starting at the first amino acid that is changed by the frame shift, and ending at the first stop codon (X#).
Thus, effectively, on DNA/RNA level the change might be
one or more coding triplets up- or downstream.
NOTE: the shifted reading frame is thus open for '#-1' amino acids.
- p.Arg97ProfsX23 (short p.Arg97fs) denotes a frame shifting change with Arginine-97 as the first affected amino acid, changing into a Proline and the
new reading frame ending in a stop at position 23
- p.Arg97HisfsX5 (short p.Arg97fs) denotes a frame shifting change deleting amino acids Arginine-97 to Tryptophan-99,
changing into a Histidine at the deletion junction and ending in the
new reading frame in a stop at position 5
- p.Leu30SerfsX3 (short p.Leu30fs) denotes a frame shifting change that deletes amino acids
Leucine-30 to Cysteine-42 (exon 3 of the gene), substituting these for a Serine at the deletion junction and ending in the
new reading frame in a stop at position 3
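The way "X#" is counted in the long frame shift notation can be sketched as below. Both sequences are given in one-letter code and end in the stop codon "X". The variant sequence used in the example is made up for illustration and does not come from this page; the name `describe_frameshift` is hypothetical.

```python
# One-letter to three-letter amino acid codes.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_frameshift(ref: str, var: str) -> str:
    """Return the long 'fsX#' description for a frame-shifted protein."""
    if not var.endswith("X"):
        raise ValueError("variant sequence must end in the stop codon 'X'")
    # Find the first amino acid that is changed by the frame shift.
    for index, (old, new) in enumerate(zip(ref, var)):
        if old != new:
            break
    else:
        raise ValueError("no changed amino acid found")
    if old == "X":
        raise ValueError("stop codon change; not a frame shift in this sketch")
    if new == "X":
        raise ValueError("first changed codon is a stop: a nonsense change")
    # Count from the first changed amino acid (position 1) up to and
    # including the first stop codon in the new reading frame.
    stop_index = var.index("X", index)
    count = stop_index - index + 1
    return f"p.{THREE[old]}{index + 1}{THREE[new]}fsX{count}"


# Hypothetical example: reference MKLGHQQQCC plus stop, and an invented
# frame-shifted product in which Gln6 is the first changed amino acid.
print(describe_frameshift("MKLGHQQQCCX", "MKLGHRSAX"))
```

In this invented example the result is p.Gln6ArgfsX4: the shifted frame is open for three amino acids (Arg, Ser, Ala), in line with the '#-1' note above.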
Miscellaneous
- Two changes in one allele are described as "[first
change; second change]" (see Discussion)
- p.[Trp13X; Pro43Ala] (alternatively p.[W13X; P43A]) describes a Tryptophan-13
to stop codon and Proline-43 to Alanine change in one allele (on one chromosome)
- Two changes in one individual when unknown in
which allele are described as "[first change (+) second change]" (see Discussion)
- p.[Trp13X(+)Glu61Gln] (alternatively p.[W13X(+)E61Q]) describes a Tryptophan-13
to stop codon and Glutamic acid-61 to Glutamine change in one individual
while it is unknown whether these changes are on the same or different
alleles
- Changes in recessive disease (one change in each allele) are
described as "[change allele-1]+[change allele-2]" (see
Discussion)
- p.[Trp13X]+[Cys28Arg] (alternatively p.[W13X]+[C28R]) describes a Tryptophan-13
to stop codon change in one allele (chromosome) and a Cysteine-28 to Arginine change in
the other allele (chromosome)
- p.[Trp13X]+[?] (alternatively p.[W13X]+[?]) describes a Tryptophan-13 to stop
codon change in one allele (chromosome) and an unknown change in the other allele
(chromosome)
- p.[Trp13X]+[=] (alternatively p.[W13X]+[=]) describes a Tryptophan-13 to stop
codon change in one allele (chromosome) and a normal sequence in the other allele
(chromosome)
- two changes in different genes -
SGCA:p.[Arg175X]+SGCB:p.[Cys305Ser] - the SGCA gene contains a nonsense
change, the SGCB gene a Cys-to-Ser substitution
- When the effect on protein level is unknown, changes can be best described
as "p.?" (see also above)
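The three bracket formats above differ only in their separators, which a short sketch can make explicit. The function names `same_allele`, `phase_unknown` and `two_alleles` are hypothetical helpers, and the individual changes are passed without the "p." prefix.

```python
def same_allele(*changes: str) -> str:
    """Two or more changes known to be in one allele."""
    return "p.[" + "; ".join(changes) + "]"


def phase_unknown(*changes: str) -> str:
    """Changes in one individual when the allele is unknown."""
    return "p.[" + "(+)".join(changes) + "]"


def two_alleles(allele_1: list[str], allele_2: list[str]) -> str:
    """One list of changes per allele; use ['?'] or ['='] where needed."""
    return "p.[" + "; ".join(allele_1) + "]+[" + "; ".join(allele_2) + "]"


print(same_allele("Trp13X", "Pro43Ala"))
print(phase_unknown("Trp13X", "Glu61Gln"))
print(two_alleles(["Trp13X"], ["Cys28Arg"]))
print(two_alleles(["Trp13X"], ["?"]))
```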
Copyright © HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen