Recommendations for the description of protein sequence variants (v2.0)
Last modified January 14, 2013
Since references to WWW-sites are not yet acknowledged as citations, please mention den Dunnen JT
and Antonarakis SE (2000). Hum.Mutat. 15:7-12 when referring to these pages.
Protein level
(suggestions extending the published recommendations in italics)
Protein designations describe a sequence variant at amino acid level. Protein level
variants will rarely be experimentally determined, i.e. using amino acid sequencing. In
some cases indirect evidence might come from sizing (Western blot analysis) or
localisation (immuno-histochemical staining). In most cases protein descriptions will be
deduced, predicted from the changes detected on DNA and/or RNA level.
Sequence changes at protein level are described like those at the DNA level with the
following modifications / additions;
- a "p." is used to indicate description at protein level
- amino acids are described as "Trp26" (i.e. with a capital first
letter, not as "trp26" or "TRP26")
- amino acid numbering
- the translation initiator Methionine is numbered as +1
- the protein reference sequences should represent the primary translation product,
not a processed mature protein, and thus include signal peptide sequences (see FAQ)
- amino acids originating from changes introducing upstream translation initiation are
numbered like nucleotides (like ..., Gln-2, Thr-1)
- amino acids originating from changes resulting in translation of intronic sequences are
numbered like nucleotides (like Val4+1, Ser4+2, ..., Phe5-2, Gln5-1)
- amino acids originating from no-stop changes causing translation
downstream of the translation termination codon are numbered like nucleotides (like Gln*1,
Ser*2, ...)
- descriptions at protein level should describe the changes
observed on protein level and not try to incorporate any knowledge regarding the
change at DNA-level (see FAQ)
- to indicate that the description at protein level is without any experimental proof it is
recommended that, when neither RNA nor protein has been analysed, the description is
given between brackets (e.g. p.(Arg22Ser), see Discussion 2012-10-12)
- protein variants
- substitution variants are described using the format p.Trp26Cys
and do not use the ">"-character (indicating "changes
to") used on DNA- and RNA level
- initiating methionine (Met1); p.0, p.Met1?,
p.Met1_Lys45del, p.Met1ValextMet-12
- nonsense; p.Trp26*
- no-stop change (stop codon); p.*110Glnext*17
- deletion variants are described using "del"
after an indication of the first and last amino acid(s) deleted separated by a "_"
(underscore); p.Lys2del, p.Cys28_Met30del
- duplication variants are described using "dup"
after an indication of the first and last amino acid(s) duplicated separated by a "_"
(underscore); p.Gly4dup, p.His7_Gln8dup
- insertion variants are described using "ins"
after an indication of the amino acids flanking the insertion site, separated by a "_"
(underscore) and followed by a description of the amino acid(s) inserted; p.Lys2_Met3insGln,
p.Ala24_Thr25insGlnSerLys, p.Trp182_Gln183ins17
NOTE: duplicating insertions should be described as duplications (see Discussion), not as insertions
- variability of short sequence repeats is described as; p.Gln6(3_6)
- insertion-deletion variants (indels) are described using "delins"
as a deletion followed by an insertion after an indication of the amino acid(s) flanking
the site of the insertion/deletion separated by a "_"
(underscore); p.Cys28delinsTrpVal, p.Cys28_Met29delinsTrp
- frame shift variants are described using "fs"
after the first amino acid affected by the change. The description does not include the
deletion at protein level from the site of the frame shift to the natural end of the
protein (stop codon), so p.Arg97Profs*23 and not p.Arg97_Gln2309delProfs*23
- short ("fs"); p.Arg97fs
- long ("fs*#"); p.Arg97Glyfs*16
NOTE: the shifted reading frame includes the new amino acid (Gly) and is
open for 15 amino acids
NOTE: for frame shifting insertions the inserted amino acid residues are not
described, only the total length of the new shifted frame is given (i.e. including the
inserted amino acids); p.Glu5Valfs*5 and not something like p.Glu5Valins2fs*3
- alleles are described using square brackets ("[ ]")
- for all descriptions the most 3' position possible is arbitrarily assigned to
have been changed
- general descriptions
- unknown effect
- p.? - protein has not been analysed, an
effect is expected but difficult to predict
- p.(=) - protein has not been analysed, but no change is
expected
- p.= - protein has not been analysed, RNA was, but
no change is expected
- amount of protein
changes which affect the promoter of a gene, the transcription initiation site
(cap site), the translation initiation site, etc. may affect the amount of protein
produced;
- p.0 - no protein can be detected (experimental
data should be available)
- p.0? - probably no protein is produced
Silent changes
Description of so called "silent" changes in the format p.(Leu54Leu) (or
p.(L54L)) should not be used (correct is p.(=)); descriptions should be given at DNA
level (see Discussion).
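The substitution and silent-change rules can be sketched in Python. This is a minimal illustration only: the helper name describe_substitution, its predicted flag and the use of one-letter input codes are assumptions made for the example and are not part of the recommendations.

```python
# Hypothetical helper: format a single-residue change at protein level.
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def describe_substitution(ref, pos, alt, predicted=True):
    """ref and alt are one-letter codes ('*' for stop); pos is 1-based."""
    if ref == alt:
        body = "="  # silent change: never written as Leu54Leu
    else:
        body = f"{THREE[ref]}{pos}{THREE[alt]}"  # no ">" at protein level
    # Brackets mark a description deduced without RNA or protein analysis.
    return f"p.({body})" if predicted else f"p.{body}"

print(describe_substitution("W", 26, "C", predicted=False))  # p.Trp26Cys
print(describe_substitution("W", 26, "*"))                   # p.(Trp26*)
print(describe_substitution("L", 54, "L"))                   # p.(=)
```

The silent case is handled before any residue names are formatted, so a description such as p.(Leu54Leu) cannot be produced by this sketch.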
Substitutions
Substitutions have the format p.Trp26Cys and do not use the ">"-character
(indicating "changes to") used on DNA- and RNA level.
- missense changes
p.Trp26Cys denotes that amino acid Tryptophan-26 (Trp, W) is changed to a Cysteine (Cys)
- nonsense changes
p.(Trp26*) indicates that neither RNA nor protein was analysed but amino acid Tryptophan-26 (Trp, W) is
predicted to change to a stop codon (*) (alternatively p.(W26*) or p.(Trp26Ter))
- initiating methionine changes (Met1) (see Discussion, see Examples)
- p.0 - no protein is produced (experimental data should be available)
- p.Met1? - denotes that amino acid Methionine-1 (translation initiation
site) is changed and that it is unclear what the consequence of this change is
- p.Met1_Lys45del - a new translation initiation site is activated (at
Met46)
(date 2012-08-31: new translation initiation site, see Discussion)
- p.Met1ext-5 - a variant in the 5' UTR activates a new upstream
translation initiation site starting with amino acid Met-5 (Methionine -5)
- p.Met1Valext-12 - amino acid Met1 is changed to Val,
activating an upstream translation initiation site at position -12 (Methionine-12)
NOTE: recently modified from p.Met1ValextMet-12 (see Discussion)
- no-stop change (substitution in stop codon)
"ext*#" is used to indicate the extension of a protein sequence until
a new stop codon is reached at "#" amino acids downstream as a consequence of a
variant changing the natural stop codon into an amino acid.
date 2012-11-01 When
no stop codon is predicted this is indicated by "ext*?".
- p.*110Glnext*17 (alternatively p.*110Qext*17) describes a variant in the stop
codon (*) at position 110, changing it to a codon for Glutamine (Gln, Q) and adding a tail
of 17 new amino acids (incl. Gln110) to the protein's C-terminus after which a new stop
codon (*17) is reached ('codon *127').
- date 2012-11-01 p.*327Argext*?
(alternatively p.*327Rext*?) describes a variant in the stop codon (*) at position
327, changing it to a codon for Arginine (Arg, R) and adding a tail of new amino acids of
unknown length since the shifted frame does not contain a new stop codon (see FAQ).
Deletions
Deletions are described using "del" after an indication of the
first and last amino acid(s) deleted separated by a "_" (underscore).
- p.Lys2del in the sequence MKMGHQQQCC denotes a deletion of amino
acid Lysine-2 (Lys, K) to MMGHQQQCC
- p.Gln8del in the sequence MKMGHQQQCC denotes a Glutamine-8 (Gln,
Q) deletion to MKMGHQQCC
- p.(Cys28_Met30del) denotes that neither RNA nor protein was analysed but the predicted change is a
deletion of three amino acids, from Cysteine-28 to Methionine-30
NOTE: for all descriptions the most 3' position possible is
arbitrarily assigned to have been changed
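The deletion examples and the most-3' rule can be checked with a short Python sketch. The function names describe_deletion and apply_deletion are hypothetical, and the one-letter sequence input is an assumption made for convenience.

```python
# Hypothetical helpers illustrating "del" descriptions and the 3' rule.
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def describe_deletion(seq, start, end=None):
    """Describe deletion of residues start..end (1-based, inclusive) of seq."""
    end = start if end is None else end
    # Shift the deleted window towards the C-terminus while the result is
    # unchanged, so the most 3' position possible is reported.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start, end = start + 1, end + 1
    first = f"{THREE[seq[start - 1]]}{start}"
    if start == end:
        return f"p.{first}del"
    return f"p.{first}_{THREE[seq[end - 1]]}{end}del"

def apply_deletion(seq, start, end=None):
    """Return seq with residues start..end (1-based, inclusive) removed."""
    end = start if end is None else end
    return seq[:start - 1] + seq[end:]

seq = "MKMGHQQQCC"
print(describe_deletion(seq, 2), apply_deletion(seq, 2))  # p.Lys2del MMGHQQQCC
print(describe_deletion(seq, 6), apply_deletion(seq, 6))  # p.Gln8del MKMGHQQCC
```

Deleting Gln6, Gln7 or Gln8 gives the same protein, which is why the sketch reports all three as p.Gln8del.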
Duplications
Duplications are described using "dup" after an indication of
the first and last amino acid(s) duplicated separated by a "_" (underscore).
- p.Gly4_Gln6dup in the sequence MKMGHQQQCC denotes a duplication of amino acids Glycine-4
(Gly, G) to Glutamine-6 (Gln, Q) (i.e. MKMGHQGHQQQCC)
- duplicating insertions in single amino acid stretches (or short tandem repeats) are
described as a duplication, e.g. a duplicating HQ insertion in the HQ-tandem repeat
sequence of MKMGHQHQCC to MKMGHQHQHQCC is described as
p.His7_Gln8dup (not p.Gln8_Cys9insHisGln)
NOTE: for all descriptions the most 3' position possible is
arbitrarily assigned to have been changed
Insertions
Insertions are described using "ins" after an indication of the
amino acids flanking the insertion site, separated by a "_" (underscore)
and followed by a description of the amino acid(s) inserted. Duplicating insertions should
be described as duplications (see Discussion), not
as insertions. Since for large insertions the amino acids can be derived from the DNA
and/or RNA descriptions they need not be described exactly but the total number may be
given (like "ins17").
- p.Lys2_Met3insGlnSerLys denotes that the sequence GlnSerLys (QSK) was inserted between
amino acids Lysine-2 (Lys, K) and Methionine-3 (Met, M), changing MKMGHQQQCC to MKQSKMGHQQQCC
- p.Trp182_Gln183ins17 describes a variant that inserts 17 amino acids between amino acids
Trp182 and Gln183
NOTE: it must be possible to deduce the 17 inserted amino acids from the
description at DNA or RNA level
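The choice between "dup" and "ins" can be sketched as follows. The helper describe_insertion is hypothetical; it assumes an interior insertion site that is already the most 3' position possible, and it only recognises a duplication of the residues directly preceding that site.

```python
# Hypothetical helper: report a duplicating insertion as "dup", otherwise "ins".
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def describe_insertion(seq, after, inserted):
    """Describe `inserted` placed directly after residue `after` (1-based)."""
    n = len(inserted)
    # A copy of the n residues just before the site is a duplication.
    if after >= n and seq[after - n:after] == inserted:
        first = f"{THREE[seq[after - n]]}{after - n + 1}"
        if n == 1:
            return f"p.{first}dup"
        return f"p.{first}_{THREE[seq[after - 1]]}{after}dup"
    left = f"{THREE[seq[after - 1]]}{after}"
    right = f"{THREE[seq[after]]}{after + 1}"
    return f"p.{left}_{right}ins{''.join(THREE[a] for a in inserted)}"

print(describe_insertion("MKMGHQHQCC", 8, "HQ"))   # p.His7_Gln8dup
print(describe_insertion("MKMGHQQQCC", 2, "QSK"))  # p.Lys2_Met3insGlnSerLys
```

The first call corresponds to the HQ tandem repeat example (MKMGHQHQCC to MKMGHQHQHQCC), the second to the GlnSerLys insertion between Lys2 and Met3.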
Variability of short sequence repeats
Variability of short sequence repeats is described as p.Gln6(3_6); the description
indicates that a stretch of Glutamines (Gln, Q) is present, starting at amino acid
position 6 (e.g. in MKMGHQQQCC), which is found with a variable length from 3 to 6 in the
population.
NOTE: the underscore is used to indicate the range (3 to 6 times).
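As a small illustration, the repeat description can be built from a reference sequence and a set of observed repeat lengths. Both helper names (run_length, describe_repeat) and the list of observed lengths are assumptions made for this sketch.

```python
# Hypothetical helpers for single-residue repeat stretches.
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def run_length(seq, start):
    """Length of the single-residue run beginning at 1-based position start."""
    unit = seq[start - 1]
    n = 0
    while start - 1 + n < len(seq) and seq[start - 1 + n] == unit:
        n += 1
    return n

def describe_repeat(seq, start, observed_lengths):
    """Format the range of repeat lengths seen in a population."""
    low, high = min(observed_lengths), max(observed_lengths)
    return f"p.{THREE[seq[start - 1]]}{start}({low}_{high})"

print(run_length("MKMGHQQQCC", 6))                   # 3
print(describe_repeat("MKMGHQQQCC", 6, [3, 4, 6]))   # p.Gln6(3_6)
```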
Insertion-deletions (indels)
Insertion-deletions (indels) are described using "delins" as a
deletion followed by an insertion after an indication of the amino acid(s) flanking the
site of the insertion/deletion separated by a "_" (underscore,
see Discussion).
- p.(Cys28_Lys29delinsTrp) indicates that neither RNA nor protein was analysed but the predicted change
is a 3 bp deletion that affects the codons for Cysteine-28 and Lysine-29, replacing
them with a codon for Tryptophan
- p.Cys28delinsTrpVal denotes a 3 bp insertion in the codon for Cysteine-28, generating
codons for Tryptophan (Trp, W) and Valine (Val, V)
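A "delins" description can be assembled in the same way. The helper describe_delins is hypothetical, and the calls below reuse the short example sequence MKMGHQQQCC from the other sections rather than a protein containing Cys28.

```python
# Hypothetical helper: format a deletion-insertion at protein level.
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def describe_delins(seq, start, end, inserted):
    """Residues start..end (1-based, inclusive) are replaced by `inserted`."""
    if start == end and len(inserted) == 1:
        # One residue replaced by one residue is a substitution, not a delins.
        raise ValueError("use the substitution format instead")
    first = f"{THREE[seq[start - 1]]}{start}"
    span = first if start == end else f"{first}_{THREE[seq[end - 1]]}{end}"
    return f"p.{span}delins{''.join(THREE[a] for a in inserted)}"

print(describe_delins("MKMGHQQQCC", 9, 9, "WV"))   # p.Cys9delinsTrpVal
print(describe_delins("MKMGHQQQCC", 9, 10, "W"))   # p.Cys9_Cys10delinsTrp
```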
Frame shifts
Frame shifting variants are described using "fs" after the first amino acid
affected by the change. Descriptions use either a short ("fs") or a long
("fs*#") form. The description of frame shifts does not
include the deletion at protein level from the site of the frame shift to the natural end
of the protein (stop codon). For frame shifting insertions the inserted amino acid
residues are not described, only the total length of the new shifted frame is given (i.e.
including the inserted amino acids).
NOTE: typing error in "den Dunnen &
Antonarakis (2000)". The suggestion to use ">" to
indicate "delins" in frame shift descriptions has been retracted.
- short description - uses "fs" only, e.g. p.Arg97fs
- long description - uses "fs*#" (see Discussion)
- includes the change occurring at the site of the frame shift, e.g. p.Arg97Gly
- "fs*#" indicates at which codon position the new reading frame ends in a stop
(*). The position of the stop in the new reading frame is calculated starting at the first
changed amino acid that is created by the frame shift, and ending at the first stop codon
(fs*#), e.g. p.Arg97Glyfs*16
NOTE: the shifted reading frame is thus open for '#-1' amino acids
- Examples;
- p.Arg97Profs*23 (not p.Arg97_Thr109delinsProfs*23; short p.Arg97fs)
denotes a frame shifting change with Arginine-97 as the first affected amino acid,
replacing it with a Proline and creating a new reading frame ending in a stop at position
23 (counting starts with the Proline as amino acid 1)
- p.Glu5Valfs*5 describes a frame shifting insertion (do not use p.Glu5Valins2fs*3)
- p.(Tyr4*) indicates that neither RNA nor protein was analysed but the predicted consequence of
the change c.12delC in the sequence ATG-GAT-GCA-TAC-GTG-ACG to ATG-GAT-GCA-TA.-G TG-A CG
is a change of Tyr to a translation termination codon.
- p.Asp2Metfs*4 (alternatively p.Asp2fs) describes the consequence of the change
c.4delG in the sequence ATG-GAT-GCA-TAC-GTG-ACG to ATG- .AT-G CA-T AC-G TG-A CG.
- p.Glu5Valfs*5 (alternatively p.Glu5fs) describes the consequence of the change
c.6_13dup in the sequence ATG-GAT-GCA-TAC-GAG-ATG-AGG to
ATG-GAT-GCA-TAC-GT-G CA-T AC-G AG-A TG-A GG.
- date 2012-11-01 p.Ile327Argfs*?
(alternatively p.Ile327fs) describes the consequences of a frame shifting
change (e.g. a 1-nucleotide insertion) with Isoleucine-327 as the first affected amino
acid, replacing it with an Arginine and creating a new reading frame which does not
encounter a new stop codon (see FAQ).
NOTE: the changes observed should be described on protein level and not
try to incorporate any knowledge regarding the change at DNA-level (see Recommendation). Thus, p.His150Hisfs*10 is not
correct, but p.Gln151Thrfs*9 is.
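The frame shift examples above can be reproduced with a small Python sketch that translates the reference and the changed coding sequence and reports the first amino acid that actually differs. The function names are hypothetical, the standard genetic code is assumed, and the sketch assumes the caller already knows the change is a frame shift or a nonsense change (an in-frame change would be mis-described). Results are given between brackets because they are predictions from DNA.

```python
# Hypothetical sketch: derive "fs*#" descriptions from coding DNA.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr *".split(),
))

def translate(cds):
    """Translate up to and including the first stop; ignore a partial codon."""
    protein = ""
    for i in range(0, len(cds) - 2, 3):
        aa = CODONS[cds[i:i + 3]]
        protein += aa
        if aa == "*":
            break
    return protein

def describe_frameshift(ref_cds, var_cds):
    ref, var = translate(ref_cds), translate(var_cds)
    # First position (1-based) where the amino acid really changes; residues
    # that stay the same are skipped, so "His150Hisfs" cannot be produced.
    pos = next((i + 1 for i, (r, v) in enumerate(zip(ref, var)) if r != v), None)
    if pos is None:
        raise ValueError("no amino acid change in the compared region")
    old, new = THREE[ref[pos - 1]], THREE[var[pos - 1]]
    if new == "*":
        return f"p.({old}{pos}*)"  # immediate stop: nonsense, not "fs"
    stop = var.find("*", pos - 1)
    # Count from the first changed amino acid (1) up to the stop codon.
    length = stop - (pos - 1) + 1 if stop != -1 else "?"
    return f"p.({old}{pos}{new}fs*{length})"

ref = "ATGGATGCATACGTGACG"
print(describe_frameshift(ref, ref[:3] + ref[4:]))     # c.4delG  -> p.(Asp2Metfs*4)
print(describe_frameshift(ref, ref[:11] + ref[12:]))   # c.12delC -> p.(Tyr4*)
ref2 = "ATGGATGCATACGAGATGAGG"
print(describe_frameshift(ref2, ref2[:13] + ref2[5:13] + ref2[13:]))  # c.6_13dup -> p.(Glu5Valfs*5)
```

When the shifted frame contains no stop codon within the supplied sequence, the sketch returns "fs*?".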
More changes in one individual
Two or more changes in one individual are described by combining the changes, per
allele (chromosome) between brackets ("[]").
- Changes in different alleles (e.g. in recessive diseases) are
described as "[change allele 1];[change allele 2]" (see
Discussion).
- p.[(Ala25Thr)];[(Ala25Thr)] indicates that neither RNA nor protein was analysed but the predicted
change is amino acid Alanine-25 to Threonine on both chromosomes (homozygous).
- p.[Ala25Thr];[?] denotes a change of amino acid Alanine-25 to Threonine in one allele
and an unknown change in the other allele
NOTE: "unknown change in the other allele" does not only
mean that no DNA-change was detected in that other allele but includes cases where the
consequence of a detected change is unclear or can not be predicted (e.g. the consequence
of a change at the splice site)
- p.[Ala25Thr];[=] denotes a change of amino acid Alanine-25 to Threonine in one allele
and a normal sequence (indicated by "=") in the other allele (see FAQ)
- Two variations in one allele
- deriving from two independent changes at DNA level are described as
"[first change;second change]" (see Discussion).
- p.[(Ala25Thr; Gly28Val)] indicates two predicted changes in one allele (neither RNA nor protein
was analysed); amino acid Alanine-25 to Threonine and Glycine-28 to Valine
- deriving from one change at DNA level that has more than one effect on
RNA/protein level are described as "[first change, second change]" (see Discussion).
- p.[Asn26His, Ala25_Gly29del] denotes two protein changes deriving from a change in one
allele at DNA level (c.76A>C) resulting in two transcripts (r.[76a>c, 73_88del]);
amino acid Asparagine-26 to Histidine and a deletion of amino acids Alanine-25 to
Glycine-29
- Two sequence changes with alleles unknown are described as "[change 1(;)change
2]" (see Discussion).
- p.[Ala25Thr(;)Gly794Val] denotes that two changes were identified in one individual
(amino acid Alanine-25 to Threonine and Glycine-794 to Valine), but it is not known
whether these changes are in the same allele or in different alleles
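The bracket conventions for alleles can be sketched with two small formatters. The names describe_alleles and describe_unphased are hypothetical; each change is passed as a string without the "p." prefix, and changes on one allele are joined with ";" without a space, following the general format "[first change;second change]".

```python
# Hypothetical formatters for allele descriptions.
def describe_alleles(allele1, allele2):
    """Each argument is a list of changes found on one allele."""
    return "p." + ";".join("[" + ";".join(a) + "]" for a in (allele1, allele2))

def describe_unphased(changes):
    """Changes seen in one individual whose allele assignment is unknown."""
    return "p.[" + "(;)".join(changes) + "]"

print(describe_alleles(["Ala25Thr"], ["="]))            # p.[Ala25Thr];[=]
print(describe_alleles(["Ala25Thr"], ["?"]))            # p.[Ala25Thr];[?]
print(describe_unphased(["Ala25Thr", "Gly794Val"]))     # p.[Ala25Thr(;)Gly794Val]
```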
Mosaicism is described using "/"
- p.[=/Arg83Ser] describes a mosaic organism or somatic tissue where the allele in some
cells contains the normal sequence (Arg83 described as '='), while other cells contain a
Ser at this position
Chimerism is described using "//"
- p.[=//Arg83Ser] describes a chimeric organism where the allele in some cells contains
the normal sequence (Arg83 described as '='), while other cells contain a Ser at this
position
Copyright © HGVS 2010 All Rights Reserved
Website Created by Rania Horaitis, Nomenclature by J.T. Den Dunnen