Description of sequence changes: examples protein-level
Last modified May 08, 2009
Since references to WWW-sites are not
yet acknowledged as citations, please mention den Dunnen JT
and Antonarakis SE (2000). Hum.Mutat. 15:7-12 when referring to these pages.
Introduction
This page gives examples of the description of sequence variants at protein level;
examples describing changes at DNA and RNA level are given on other pages. All examples
are described relative to a reference sequence, here the amino acid (protein) sequence.
Reference sequence
| Part of gene | amino acid numbering, protein Reference Sequence | nucleotide numbering, coding DNA Reference Sequence | nucleotide numbering, genomic Reference Sequence |
| 5' gene flanking region | - | (-300 to -31) | 1 to 270 |
| exon 1, 5' UTR | - | -30 to -1 | 271 to 300 |
| exon 1, coding region | 1 to 4 | 1 to 12 | 301 to 312 |
| intron 1 | - | 12+1 ... 12+50, 13-50 ... 13-1 | 313 to 412 |
| exon 2 | 5 to 29 (30) | 13 to 88 | 413 to 488 |
| intron 2 | - | 88+1 ... 88+100, 89-100 ... 89-1 | 489 to 688 |
| exon 3 | 30 to 41 | 89 to 123 | 689 to 723 |
| intron 3, contains rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136) | - | 123+1 ... 123+150, 124-150 ... 124-1 | 724 to 1023 |
| exon 4 | 42 to 100 | 124 to 300 | 1024 to 1200 |
| intron 4 | - | 300+1 ... 300+200, 301-200 ... 301-1 | 1201 to 1600 |
| exon 5, coding region | 101 to 109 | 301 to 330 | 1601 to 1630 |
| exon 5, 3' UTR, containing a (CA)7-stretch from nts 1700 to 1713 (coding DNA *71 to *83); poly-A addition site at 1825 (coding DNA *195) | - | *1 to *220 | 1631 to 1850 |
| 3' gene flanking region | - | (*221 to *370) | 1851 to 2000 |
Legend:
Reference sequence of the imaginary gene used for the examples given on this page. Nucleotide
+1 in the coding DNA reference sequence is the A of the ATG translation initiation codon.
Abbreviations used: nt = nucleotide, nts = nucleotides, UTR = untranslated region of the
mRNA. For a picture of part of this hypothetical sequence see
Figure.
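The relation between the three numbering schemes in the table can be sketched in a few lines of Python. This is only an illustration for the imaginary gene on this page: the names `EXONS`, `coding_to_genomic` and `codon_number` are hypothetical, and the exon boundaries are copied from the table (exonic coding positions only, no introns or UTRs).

```python
# Coding exon segments of the imaginary gene, copied from the table:
# (first coding nt, last coding nt, genomic position of the first coding nt).
EXONS = [
    (1, 12, 301),      # exon 1, coding part
    (13, 88, 413),     # exon 2
    (89, 123, 689),    # exon 3
    (124, 300, 1024),  # exon 4
    (301, 330, 1601),  # exon 5, coding part (includes the stop codon)
]


def coding_to_genomic(c_pos: int) -> int:
    """Map an exonic coding DNA position (c.1 to c.330) to its genomic position."""
    for c_start, c_end, g_start in EXONS:
        if c_start <= c_pos <= c_end:
            return g_start + (c_pos - c_start)
    raise ValueError(f"c.{c_pos} is outside the coding region")


def codon_number(c_pos: int) -> int:
    """Number of the codon (amino acid) that contains coding DNA position c_pos."""
    return (c_pos - 1) // 3 + 1
```

Under these assumptions `codon_number(88)` and `codon_number(89)` both give 30, which is why the table lists exon 2 as "5 to 29 (30)": codon 30 is split between exons 2 and 3.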
General
It should be noted that the descriptions at protein level, even more than those at RNA
level, are mostly deduced and not based on experimental evidence.
Publications describing changes at protein level should make it clear whether experimental
proof was available or not. In fact, when changes are reported for which experimental
proof is not available, one should consider listing them between brackets.
Sequence changes at protein level are basically described like those at the DNA level, with a few modifications:
- the three-letter amino acid code is preferred (see Discussion),
with "X" designating a translation termination codon;
for clarity, this page describes changes using the three-letter amino acid code
- amino acid numbering
- descriptions start with the amino acid, followed by its number (like p.Cys24 or p.C24)
- the translation initiator Methionine is numbered as +1
- amino acids after the stop codon (here after X110) are numbered relative to the stop
codon as Gly110+1, Trp110+2, etc.
- "silent" changes
description of so-called "silent" changes in the format p.Leu54Leu (or p.L54L)
is not allowed because it is non-informative and not unequivocal (there are five
possibilities at DNA level which may underlie p.Leu54Leu); such changes should be
described at DNA level, in the format c.162C>G.
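Why p.Leu54Leu is not unequivocal can be checked against the standard genetic code. The sketch below is an illustration only: the reference codon CTC for Leu54 is an assumption (the page does not give the sequence), and `CODON_TABLE` and `synonymous_codons` are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon: str) -> list[str]:
    """All other codons that encode the same amino acid as `codon`."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


# Assumed reference codon for Leu54 (not given on this page).
alternatives = synonymous_codons("CTC")
```

With CTC as the assumed reference codon, `alternatives` holds the five other Leucine codons; c.162C>G (third base of codon 54) corresponds to just one of them, CTG.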
Substitutions
Substitutions can be described without using the specific ">"-character
which is used on DNA and RNA level (i.e. p.Trp26Cys instead of p.Trp26>Cys).
- translation initiation site (Methionine codon, see Discussion)
example: at DNA ATG (Met) changes to CTG (Leu) - description depends on the
consequences of the change on the translation product (protein) produced;
- effect unknown (most cases) - p.0? (alternatively p.Met1?)
- no protein produced - p.0 (experimental data should be available)
- new upstream translation initiation site - p.[Met1extMet-8; Met1Leu]
(alternatively p.[M1extM-8; M1L])
NOTE: effectively this is an insertion (of Met-8 to Thr-1 between Thr-1 and
Met1) combined with an amino acid change (Met1Leu)
- new downstream translation initiation site - p.Met1_His15del (alternatively
p.M1_H15del)
NOTE: effectively this is a deletion
- one amino acid to
- another amino acid (missense change)
Tryptophan 26 to a Cysteine; p.Trp26Cys (alternatively p.W26C)
- a stop codon (nonsense change)
Tryptophan 26 to a stop codon; p.Trp26X (alternatively p.W26X)
- translation termination site (stop codon, nonstop change)
- p.X110SerextX*17 (alternatively p.X110SextX*17)
the stop codon (X) at position 110 is changed to a codon for Serine (Ser, S), adding a
tail of 16 new amino acids to the protein's C-terminus after which a new stop codon is
reached
NOTE: effectively this is a substitution (X110Ser) combined with an
insertion (from Gly*1 to the new stop codon X*17)
NOTE: polymorphic variants are sometimes
described as p.36Leu/Ile (p.36L/I) or p.36Leu/Leu (p.36L/L) but this is not correct (see Discussion).
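The three-letter and one-letter forms of a simple substitution map onto each other mechanically. The sketch below illustrates this for missense and nonsense descriptions only; `to_one_letter` is a hypothetical helper, not part of any official tool, and it does not handle extensions, p.0 or p.[...] descriptions.

```python
import re

# One-letter to three-letter amino acid codes (20 standard amino acids).
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr".split(),
))
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}

# Reference amino acid, position, then new amino acid or "X" (stop codon).
SUBSTITUTION = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|X)")


def to_one_letter(description: str) -> str:
    """Convert e.g. 'p.Trp26Cys' to 'p.W26C' (simple substitutions only)."""
    match = SUBSTITUTION.fullmatch(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description}")
    ref, position, new = match.groups()
    new_code = "X" if new == "X" else THREE_TO_ONE[new]
    return f"p.{THREE_TO_ONE[ref]}{position}{new_code}"
```

For the examples above, `to_one_letter("p.Trp26Cys")` gives `p.W26C` and `to_one_letter("p.Trp26X")` gives `p.W26X`.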
Deletions
Deletions are designated by "del" after a description of the
deleted segment, i.e. the first (and last) amino acid(s) deleted.
- MKLGHQQQCC to M_LGHQQQCC is described as p.Lys2del
(alternatively p.K2del)
- MKLGHQQQCC to MKL___QQCC is described as p.Gly4_Gln6del
(alternatively p.G4_Q6del)
- MKLGHQQQCC to MKLGHQQCC is described as p.Gln8del (alternatively p.Q8del)
NOTE: for deletions in single amino acid stretches or tandem repeats, the
most 3' residue is arbitrarily assigned to have been deleted
- if a deletion creates a new amino acid at the deletion junction the change is described
as an insertion/deletion (see indels)
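The deletion examples can be reproduced with a small Python sketch. It assumes a pure deletion (no new amino acid at the junction) and takes the longest common prefix of the two sequences, which places the deletion at the most 3' position as in the NOTE above. `describe_deletion` is a hypothetical helper name.

```python
# One-letter to three-letter amino acid codes (20 standard amino acids).
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr".split(),
))


def describe_deletion(ref: str, var: str) -> str:
    """Describe a simple in-frame deletion, given one-letter protein sequences."""
    length = len(ref) - len(var)
    if length <= 0:
        raise ValueError("variant is not shorter than the reference")
    # The longest common prefix shifts the deletion as far 3' as possible.
    prefix = 0
    while prefix < len(var) and ref[prefix] == var[prefix]:
        prefix += 1
    if ref[:prefix] + ref[prefix + length:] != var:
        raise ValueError("not a simple deletion (consider an indel description)")
    first, last = prefix + 1, prefix + length  # 1-based positions deleted
    start = f"{ONE_TO_THREE[ref[first - 1]]}{first}"
    if length == 1:
        return f"p.{start}del"
    return f"p.{start}_{ONE_TO_THREE[ref[last - 1]]}{last}del"
```

Applied to the three examples, this sketch returns `p.Lys2del`, `p.Gly4_Gln6del` and `p.Gln8del`.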
Duplications
Duplications are designated by "dup" after a description of the
duplicated segment, i.e. the first (and last) amino acid(s) duplicated.
- MKLGHQQQCC to MKLGHQGHQQQCC is described as p.Gly4_Gln6dup
(alternatively p.G4_Q6dup)
- MKLGHQQQCC to MKLGHQQQQCC is described as p.Gln8dup (alternatively
p.Q8dup)
NOTE: for duplications in single amino acid stretches or tandem repeats, the
most 3' residue is arbitrarily assigned to have been duplicated
- MKLGHQQQCC to MKLGHQHQQQCC is described as p.His5_Gln6dup (alternatively
p.H5_Q6dup)
NOTE: duplicating insertions in single amino acid stretches (or short tandem
repeats) should be described as a duplication and not as an insertion - so in the example
shown p.Gln6_Gln7insHisGln (alternatively p.Q6_Q7insHQ) is not correct
- variability of short sequence repeats is designated as p.Gln6(3_6)
(alternatively p.Q6(3_6)) describing that the Glutamine (Gln, Q) stretch starting at
position 6 in MKLGHQQQCC is found repeated 3 to 6 times in the population
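The duplication examples follow the same logic: the extra residues are located by the longest common prefix (most 3' position) and must be an exact copy of the residues directly preceding them. A minimal sketch, assuming a pure duplication; `describe_duplication` is a hypothetical helper name.

```python
# One-letter to three-letter amino acid codes (20 standard amino acids).
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr".split(),
))


def describe_duplication(ref: str, var: str) -> str:
    """Describe a simple duplication, given one-letter protein sequences."""
    length = len(var) - len(ref)
    if length <= 0:
        raise ValueError("variant is not longer than the reference")
    # The longest common prefix shifts the extra copy as far 3' as possible.
    prefix = 0
    while prefix < len(ref) and ref[prefix] == var[prefix]:
        prefix += 1
    extra = var[prefix:prefix + length]
    is_insertion = ref[:prefix] + extra + ref[prefix:] == var
    # The extra residues must copy the segment directly preceding them.
    if not is_insertion or prefix < length or ref[prefix - length:prefix] != extra:
        raise ValueError("not a duplication")
    first, last = prefix - length + 1, prefix  # 1-based positions duplicated
    start = f"{ONE_TO_THREE[ref[first - 1]]}{first}"
    if length == 1:
        return f"p.{start}dup"
    return f"p.{start}_{ONE_TO_THREE[ref[last - 1]]}{last}dup"
```

For the three examples this yields `p.Gly4_Gln6dup`, `p.Gln8dup` and `p.His5_Gln6dup`.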
Insertions
Insertions are designated by "ins" after a description of the
amino acids flanking the insertion site, followed by a description of the inserted amino
acids. When the insertion is large it may be described by its length (e.g. p.K2_L3ins34)
but the inserted sequence should be described in detail in a footnote or as the accession
number of the sequence as submitted to a sequence database (GenBank, EMBL, DDBJ).
Duplicating insertions should be described as duplications (see
Discussion).
- MKLGHQQQCC to MKQSLGHQQQCC is described as p.Lys2_Leu3insGlnSer
(alternatively p.K2_L3insQS)
- insertion of a 345 nucleotide sequence in intron 3; the sequence of the insertion needs
to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession number
should be given
When an insertion creates a new amino acid at the insertion junction the change is
described as an insertion/deletion (see indels)
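An insertion description names the two flanking amino acids and the inserted residues, and it should not be used for a duplicating insertion. The sketch below illustrates both points for pure insertions; `describe_insertion` is a hypothetical helper name, and insertions before the first or after the last residue are left out.

```python
# One-letter to three-letter amino acid codes (20 standard amino acids).
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr".split(),
))


def describe_insertion(ref: str, var: str) -> str:
    """Describe a simple, non-duplicating insertion (one-letter sequences)."""
    length = len(var) - len(ref)
    if length <= 0:
        raise ValueError("variant is not longer than the reference")
    prefix = 0
    while prefix < len(ref) and ref[prefix] == var[prefix]:
        prefix += 1
    inserted = var[prefix:prefix + length]
    if ref[:prefix] + inserted + ref[prefix:] != var:
        raise ValueError("not a simple insertion (consider an indel description)")
    if prefix >= length and ref[prefix - length:prefix] == inserted:
        raise ValueError("duplicating insertion: describe it as a duplication")
    if prefix == 0 or prefix == len(ref):
        raise ValueError("insertion has no flanking amino acid on one side")
    left = f"{ONE_TO_THREE[ref[prefix - 1]]}{prefix}"    # residue before the site
    right = f"{ONE_TO_THREE[ref[prefix]]}{prefix + 1}"   # residue after the site
    new = "".join(ONE_TO_THREE[aa] for aa in inserted)
    return f"p.{left}_{right}ins{new}"
```

For MKLGHQQQCC to MKQSLGHQQQCC the sketch returns `p.Lys2_Leu3insGlnSer`; for MKLGHQQQCC to MKLGHQHQQQCC it raises an error, because that change is a duplication.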
Translocations
Translocations at protein level occur when a translocation at DNA level leads to the
production of a fusion protein, joining the N-terminal end of the protein on one
chromosome to the C-terminal end of the protein on the other chromosome (and vice versa).
No recommendations have been made so far to describe protein translocations.
- t(X;17)(DMD:p.Met1_Val1506; SGCA:p.Val250_X387) describes a fusion protein resulting
from a translocation between the chromosomes X and 17; the fusion protein contains an
N-terminal segment of DMD (dystrophin, amino acids Methionine-1 to Valine-1506), and a
C-terminal segment of SGCA (alpha-sarcoglycan, amino acids Valine-250 to the stop codon at
387)
Complex rearrangements
Complex rearrangements are rearrangements which combine several of the six elementary
content changes: substitution, deletion, duplication, insertion, inversion and
translocation. Such rearrangements can be very complex and difficult to describe.
Specific recommendations to describe such changes have not been made.
Complex rearrangements are best described as a combination of the elementary
changes.
Deletion/insertions (indels) are described as a deletion
followed by an insertion (see Discussion)
- p.Cys28_Lys29delinsTrp (alternatively p.C28_K29delinsW) denotes a 3 bp deletion
affecting the codons for Cysteine-28 and Lysine-29, replacing them with a codon for
Tryptophan
- p.Cys28delinsTrpVal (alternatively p.C28delinsWV) denotes a 3 bp insertion in the codon
for Cysteine-28, generating codons for Tryptophan (W) and Valine (V)
- frame shift changes (see
Discussion) cause, from a specific point onwards, the replacement of the normal
C-terminal end of a protein by a new segment, encoded by the shifted reading frame. Frame
shift changes can thus best be considered as deletion/insertions (indels). Frame shifts
are designated by "fs" after the amino acid(s) affected by the
change. Descriptions use either a short ("fs" only) or a long
("fsX#") notation; the long description should include the change occurring at
the site of the frame shift (see Discussion). In
"fsX#", "X#" indicates at which codon position the new reading frame
ends in a stop codon (X). The position of the stop in the new reading frame is calculated
starting at the first amino acid that is changed by the frame shift (counted as position
1), and ending at the first stop codon (X#). Thus, effectively, on DNA/RNA level the
change might be one or more coding triplets up- or downstream.
NOTE: the shifted reading frame is thus open for '#-1' amino acids.
- p.Arg97ProfsX23 (short p.Arg97fs) denotes a frame shifting change with Arginine-97 as
the first affected amino acid, changing into a Proline and the new reading frame ending in
a stop at position 23
- p.Arg97HisfsX5 (short p.Arg97fs) denotes a frame shifting change deleting amino
acids Arginine-97 to Tryptophan-99, changing into a Histidine at the deletion junction and
ending in the new reading frame in a stop at position 5
- p.Leu30SerfsX3 (short p.Leu30fs) denotes a frame shifting change that deletes
amino acids Leucine-30 to Cysteine-42 (exon 3 of the gene), replacing these with a
Serine at the deletion junction and ending in the new reading frame in a stop at position
3
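How the number in "fsX#" is counted can be illustrated by translating a reference and a variant coding sequence and comparing the proteins. The sketch below uses the standard genetic code; the 21-nucleotide sequence in the usage note is invented for illustration (it is not the imaginary gene of this page), and `translate` and `describe_frameshift` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr".split(),
))


def translate(cds: str) -> str:
    """Translate from the first base up to and including the first stop ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def describe_frameshift(ref_cds: str, var_cds: str) -> str:
    """Long-form frame shift description, e.g. 'p.Lys2AsnfsX4'."""
    ref_p, var_p = translate(ref_cds), translate(var_cds)
    first = 0  # 0-based index of the first amino acid that differs
    while first < min(len(ref_p), len(var_p)) and ref_p[first] == var_p[first]:
        first += 1
    if first >= min(len(ref_p), len(var_p)) or "*" in (ref_p[first], var_p[first]):
        raise ValueError("no changed amino acid found before a stop codon")
    if not var_p.endswith("*"):
        raise ValueError("no stop codon reached in the new reading frame")
    # The first changed amino acid counts as position 1, the stop as position #.
    stop_position = len(var_p) - first
    return (f"p.{ONE_TO_THREE[ref_p[first]]}{first + 1}"
            f"{ONE_TO_THREE[var_p[first]]}fsX{stop_position}")
```

For the invented reference ATGAAACTTGGCCTGACCTAA (MKLGLT followed by a stop), deleting nucleotide 4 gives ATGAACTTGGCCTGACCTAA, which translates to MNLA followed by a stop. The sketch describes this as `p.Lys2AsnfsX4`: the shifted frame is open for 3 amino acids (Asn, Leu, Ala) and the stop is at position 4.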
Miscellaneous
- Two changes in one allele are described as "[first
change; second change]" (see Discussion)
- p.[Trp13X; Pro43Ala] (alternatively p.[W13X; P43A]) describes a Tryptophan-13 to
stop codon and Proline-43 to Alanine change in one allele (on one chromosome)
- Two changes in one individual, when it is unknown whether they are in the same
allele, are described as "[first change (+) second change]" (see
Discussion)
- p.[Trp13X(+)Glu61Gln] (alternatively p.[W13X(+)E61Q]) describes a Tryptophan-13 to
stop codon and Glutamic acid-61 to Glutamine change in one individual while it is unknown
whether these changes are on the same or different alleles
- Recessive diseases (one change in each allele) are
described as "[change allele-1]+[change allele-2]" (see
Discussion)
- p.[Trp13X]+[Cys28Arg] (alternatively p.[W13X]+[C28R]) describes a Tryptophan-13 to
stop codon change in one allele (chromosome) and a Cysteine-28 to Arginine change in the
other allele (chromosome)
- p.[Trp13X]+[?] (alternatively p.[W13X]+[?]) describes a Tryptophan-13 to stop codon
change in one allele (chromosome) and an unknown change in the other allele (chromosome)
- p.[Trp13X]+[=] (alternatively p.[W13X]+[=]) describes a Tryptophan-13 to stop codon
change in one allele (chromosome) and a normal sequence in the other allele (chromosome)
- two changes in different genes -
SGCA:p.[Arg175X]+SGCB:p.[Cys305Ser] - the SGCA gene contains a nonsense change, the SGCB
gene a Cys-to-Ser substitution
- When the effect on protein level is unknown, changes can be best described
as "p.?" (see also above)
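The three bracket formats above differ only in how the individual changes are joined. A minimal Python sketch, with hypothetical helper names, that composes them from change descriptions written without the "p." prefix:

```python
def same_allele(*changes: str) -> str:
    """Several changes in one allele: p.[first change; second change]."""
    return "p.[" + "; ".join(changes) + "]"


def phase_unknown(*changes: str) -> str:
    """Changes in one individual, allele unknown: p.[first(+)second]."""
    return "p.[" + "(+)".join(changes) + "]"


def two_alleles(allele_1: str, allele_2: str) -> str:
    """One change per allele: p.[change allele-1]+[change allele-2]."""
    return f"p.[{allele_1}]+[{allele_2}]"
```

For example, `two_alleles("Trp13X", "=")` gives `p.[Trp13X]+[=]` and `phase_unknown("Trp13X", "Glu61Gln")` gives `p.[Trp13X(+)Glu61Gln]`.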
Copyright HGVS 2007 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen