Uncertainties in the description of sequence variants

Coping with uncertainties in the description of sequence variants

Last modified April 14, 2015

NOTE: this website has been frozen since May 1, 2016. It has been replaced by a new version at http://www.HGVS.org/varnomen. These pages serve as an archival copy only.

Summary
Introduction
- NOTE - duplications
Subjects

Summary

This page discusses how to cope with uncertainties in the description of sequence variants and, where available, links to existing recommendations. Do you agree or disagree? Did we miss cases? Do you want to make suggestions? Please contact us by E-mail (to: HGVSmn @ JohanDenDunnen.nl) or use the HGVS-mutnomen Facebook page.

Introduction

Often, clear changes can be detected in the genome, but a precise description in relation to the genomic sequence is not possible. Examples include cytogenetically detectable changes, changes detected using Fluorescence In Situ Hybridisation (FISH) and changes detected using array technology. Other examples are cases where genomic deletions / duplications are detected indirectly using RNA analysis; although the change can be described exactly at RNA level, the genomic sequence spanning the breakpoint of the rearrangement is required to describe the change at DNA level. Finally, there are examples, mostly from older publications, where the changes detected are not described precisely ("an 11 nucleotide deletion in exon 3") or uniquely ("changing amino acid Gly-17 to Arg").

In many diseases, changes are found that delete or duplicate (sets of) whole exons and thereby seriously affect the function of the gene. These changes are detected with technologies like PCR, Southern blotting, MAPH and MLPA. These analyses, however, do not reveal the breakpoints at the molecular level. Incorporating these variants in sequence variation databases is very important, primarily so that their frequency can be determined. In addition, e.g. in Duchenne/Becker Muscular Dystrophy (DMD/BMD), the breakpoints can predict the severity of the disease: a disrupted (out-of-frame) reading frame causes DMD, an open (in-frame) reading frame BMD. There is a clear demand to be able to include such changes in catalogs of all DNA sequence variants found.

It is clear that the goal should be to describe these variants as precisely as possible. However, in real life this is often not possible, and one may question whether a lot of detail adds useful information. The deletion of genomic sequences can be detected using several different methods, including FISH, Southern blot, quantitative PCR, MAPH and MLPA. One could argue that, to be as precise as possible, the description of the deletion should include information on the probe sequences used. With some methods, the presence of a probe signal only indicates that a (large) part of the probe is present (e.g. BAC-derived FISH signals). In PCR-based methods, the absence of a signal does not mean that the entire target sequence is missing; PCR results are also negative when only one of the two primer annealing sites is deleted or mutated (see FAQ).

MLPA uses a pair of short oligonucleotides to detect the copy number of a specific sequence (mostly an exon). Effectively, a decreased signal for a probe only indicates that the pair of oligonucleotide sequences (20-30 nucleotides) did not both hybridize to the target sequence and could therefore not be ligated, amplified and detected. It is common practice (after excluding other variants when only 1 exon probe is affected) to describe the result as a change affecting the entire set of sequences (exons), like "a deletion of exons 23-27" or "c.3163-?_3786+?del". If the location of the probe had to be included, the description would become too complex to be useful. Furthermore, if the probe hybridizes from position c.3211 to c.3236, which position should be used? And only part of the target sequence might be deleted.

Main recommendation
To indicate uncertainties in the description of sequence changes, the question mark ("?") and brackets ("()") are used. When the exact position of a change is not known, the range of the uncertainty is listed between brackets (like (5' border_3' border)), and the change should be described at DNA level as precisely as possible. When it is difficult to give an exact nucleotide position for a specific probe/sequence tested, a rule of thumb is to use the central nucleotide.
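The rule of thumb amounts to simple arithmetic on the probe's first and last position. The sketch below is an illustration only: the function name is hypothetical, it handles plain integer positions (g. numbering or exonic c. positions) but not intronic offsets such as c.87+1, and because c. numbering has no position 0 it is not valid for ranges crossing from c.-1 to c.1. For an even-length probe there are two central nucleotides; taking the 5' one is an arbitrary tie-break assumed here, not part of the recommendation.

```python
def central_nucleotide(start: int, end: int) -> int:
    """Return the central position of a probe spanning start..end (inclusive).

    Hypothetical helper. For even-length probes the two middle positions
    tie; this sketch takes the 5' one (floor division rounds down).
    """
    if end < start:
        raise ValueError("end must not precede start")
    return (start + end) // 2


# The MLPA probe mentioned above, hybridizing from c.3211 to c.3236 (26 nt)
print(central_nucleotide(3211, 3236))  # 3223
```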

Based on these considerations, the description of a deletion has the format (last-positive_first-negative)_(last-negative_first-positive)del. The details of the description depend on the technology used to detect the change:

microscopy - chromosome banding
FISH - genomic coordinates (probe names)
arrays - genomic coordinates (probes, SNP IDs)
MLPA - coding DNA/genomic coordinates (exons)
PCR - coding DNA/genomic coordinates
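As an illustration of the bracketed deletion format, the sketch below derives (last-positive_first-negative)_(last-negative_first-positive)del from probe results listed in genomic order. It is a hypothetical helper, not part of any HGVS tool: labels may be positions, clone accessions or SNP IDs, a missing flanking positive probe is written as "?", and it assumes a single contiguous run of negative probes.

```python
from typing import List, Sequence, Tuple


def deletion_description(probes: Sequence[Tuple[str, bool]], suffix: str = "del") -> str:
    """Return (last-positive_first-negative)_(last-negative_first-positive)<suffix>.

    probes: (label, present) pairs in genomic order; present=False means the
    probe scored deleted. Hypothetical helper for illustration only.
    """
    negative: List[int] = [i for i, (_, present) in enumerate(probes) if not present]
    if not negative:
        raise ValueError("no probe scored negative")
    first, last = negative[0], negative[-1]
    if last - first + 1 != len(negative):
        raise ValueError("negative probes are not one contiguous run")
    # "?" when no positive probe was tested beyond the deleted run
    before = probes[first - 1][0] if first > 0 else "?"
    after = probes[last + 1][0] if last + 1 < len(probes) else "?"
    return f"({before}_{probes[first][0]})_({probes[last][0]}_{after}){suffix}"


# FISH results from the X-chromosome example further below
fish = [("AC096506.5", True), ("AL109609.5", False),
        ("AL451144.5", False), ("AL050305.9", True)]
print("chrX:g." + deletion_description(fish))
# chrX:g.(AC096506.5_AL109609.5)_(AL451144.5_AL050305.9)del
```

Passing suffix="dup" gives the duplication form discussed in the NOTE on duplications, subject to the caveat raised there about tandem duplications.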

For further details see below.

NOTE

For duplications the same recommendations hold, except that duplications are designated by "dup" instead of "del": (last-positive_first-negative)_(last-negative_first-positive)dup (see Recommendations). Note, however, that by definition (see Standards) "dup" may only be used when the additional copy is located directly 3'-flanking the original copy (a tandem duplication). In most cases there is no experimental proof of this: one simply detects the presence of an additional copy, which can be located anywhere in the genome (inserted / transposed). Discussions are ongoing on how best to include this uncertainty in the description (see Recommendations). It should be clear, though, that describing a duplication like c.3163-?_3786+?dup is in general not correct.

Subjects

Incomplete descriptions

When the exact position of a change is not known, the range of the uncertainty is listed between brackets ("()", see Recommendation). Similarly, when insertions have not been specified (e.g. "ins5") or when an insertion was not sequenced but its length estimated (e.g. from gel electrophoresis), brackets are used to indicate the uncertainty.

Examples

c.(67_70)insG (p.Gly23fs) indicates the insertion of a G at an unknown position within the sequence of amino acid codon 23
g.11_12ins(100) (alternatively g.11_12insN[100]) indicates the insertion of about 100 nucleotides between positions g.11 and g.12

g.11_12ins(1) (alternatively g.11_12insN) indicates the insertion of one unspecified nucleotide (N) between positions g.11 and g.12
g.11_12ins(5) (alternatively g.11_12insNNNNN) indicates the insertion of 5 unspecified nucleotides (NNNNN) between positions g.11 and g.12

p.(fs*) is the best way to describe, at protein level, the consequence of c.577_578ins(4) (with the 4 inserted nucleotides not specified)
c.(165_253)del11 indicates an 11-nucleotide deletion in exon 3 (NOTE: in the coding DNA reference sequence, exon 3 spans positions 165 to 253)

c.(1521_1524)del3 is the description that should be connected to the "deltaF508" (p.(Phe508del)) variant often found in patients with Cystic Fibrosis when no DNA data are given. The sequence surrounding codon Phe-508 in the CFTR gene is ...-ATC-TTT-GGT-... (c.1519 to c.1527), and three different deletions (TC-T, C-TT and -TTT-) would give the reported change at protein level. Applying the 3' rule for variant descriptions (see Recommendations), these correspond to 2 different changes at DNA level: c.1521_1523del and c.1522_1524del. Assuming that the change at DNA level is c.1522_1524delTTT, i.e. deletion of exactly the triplet encoding Phe-508, is wrong: the reported change is mostly c.1521_1523delCTT. So, without specification in the manuscript, one cannot be certain.
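The reasoning above can be checked mechanically: try every 3-nucleotide deletion in the window, keep those giving the reported protein change, and shift each as far 3' as possible. The sketch below is illustrative and its function names are hypothetical; it assumes the window c.1519 to c.1527 begins at the first base of a codon, uses the standard genetic code, and applies the 3' rule only within the window (a real normaliser would need more flanking sequence).

```python
from itertools import product
from typing import List

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate the complete codons of an in-frame sequence (one-letter codes)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def shift_3prime(seq: str, start: int, length: int) -> int:
    """Move a deletion of seq[start:start+length] as far 3' as the window allows."""
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


def deletions_explaining(window: str, first_pos: int, protein: str, length: int = 3) -> List[str]:
    """List distinct (3'-shifted) deletions whose translation equals `protein`."""
    starts = set()
    for i in range(len(window) - length + 1):
        mutant = window[:i] + window[i + length:]
        if translate(mutant) == protein:
            starts.add(shift_3prime(window, i, length))
    return [f"c.{first_pos + s}_{first_pos + s + length - 1}del{window[s:s + length]}"
            for s in sorted(starts)]


# c.1519 to c.1527 of CFTR: ATC (Ile-507), TTT (Phe-508), GGT (Gly-509)
window = "ATCTTTGGT"
print(translate(window))                         # IFG
print(deletions_explaining(window, 1519, "IG"))  # ['c.1521_1523delCTT', 'c.1522_1524delTTT']
```

The two results are the two DNA-level changes named in the text, which is why the uncertain description c.(1521_1524)del3 is needed to cover both.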

Exonic deletions / duplications

The description should use the basic format (last-positive_first-negative)_(last-negative_first-positive), be based on the reference sequence used and include the position of the most extreme region(s) tested (e.g. the segment amplified by PCR, the probes used for hybridisation, etc.).

For clarity, and to keep descriptions from specific tests (e.g. MLPA) from becoming too complicated, it is allowed to assume that when a probe for a specific exon scores deleted (duplicated), the entire exon is deleted (duplicated), i.e. detailed knowledge of the exact location of the probe sequence used is not used in the description.

NOTE: it should be clearly indicated which technology was used (MLPA, PCR, etc.) and where the primer/probe sequences can be found. For rearrangements affecting only one exon, it should be indicated whether DNA sequencing was performed to exclude variants affecting the primer/probe target sequences. When different probes for one exon score differently, this information must be used in the description of the sequence change (see FAQ).

Deletions are designated by "del" after an indication of the first and last nucleotide(s) deleted (see Recommendations).

Examples

date 2012-11-01: c.(87+1_88-1)_(300+1_301-1)del (alternatively c.88-?_300+?del) denotes the deletion of exons 3 to 4, starting at an unknown position in intron 2 (c.87+1_88-1, i.e. downstream of c.87 (the last nucleotide of exon 2) but upstream of c.88 (the first nucleotide of exon 3)), and ending at an unknown position in intron 4 (c.300+1_301-1).
NOTE: the original suggestion was to describe this change as c.88-?_300+?del. However, this description is confusing because it cannot be distinguished from the identical description used when no probes were tested upstream of c.88 or downstream of c.300. Only when one knows the details, e.g. that the MLPA test used did include probes up- and downstream, would one understand the description, which is not acceptable.
c.(?_-30)_(12+1_13-1)del describes a deletion starting at an unknown position upstream of the 5' end of the gene (located at coding DNA nucleotide c.-30) and ending in the intron between coding DNA nucleotides c.12+1 and c.13-1 (intron 1).
c.(?_-1)_(*1_?)del is a standard way to describe the deletion of the entire protein coding region of a gene (coding DNA reference sequence).
NOTE: when more details are available regarding the deletion, based on the probes tested to determine its location, the description can be specified like c.(?_-189)_(*884_?)del, meaning the deletion starts 5' of c.-189 and extends beyond c.*884 in the 3'UTR.
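For MLPA-type results, where (as allowed above) a deleted exon probe is taken to mean that the entire exon is deleted, the intron-based description can be generated from an exon table. The sketch below is illustrative: the function name is hypothetical, the exon coordinates are made up (chosen only to match positions quoted in the examples above), and it assumes every exon was tested, so the exons flanking the deleted block scored present. When the first or last exon is deleted, the open side is written with "?", as in c.(?_-30)_(12+1_13-1)del.

```python
from typing import List, Tuple

# Hypothetical gene: (start, end) of exons 1..5 in coding DNA numbering
EXONS: List[Tuple[int, int]] = [(-30, 12), (13, 87), (88, 190), (191, 300), (301, 452)]


def exon_deletion(exons: List[Tuple[int, int]], first: int, last: int, suffix: str = "del") -> str:
    """Describe a deletion of exons first..last (1-based), breakpoints in the flanking introns."""
    n = len(exons)
    if not 1 <= first <= last <= n:
        raise ValueError("exon numbers out of range")
    start_first = exons[first - 1][0]
    end_last = exons[last - 1][1]
    # 5' side: intron between the preceding exon and the first deleted exon
    if first > 1:
        left = f"({exons[first - 2][1]}+1_{start_first}-1)"
    else:
        left = f"(?_{start_first})"
    # 3' side: intron between the last deleted exon and the following exon
    if last < n:
        right = f"({end_last}+1_{exons[last][0]}-1)"
    else:
        right = f"({end_last}_?)"
    return f"c.{left}_{right}{suffix}"


print(exon_deletion(EXONS, 3, 4))  # c.(87+1_88-1)_(300+1_301-1)del
print(exon_deletion(EXONS, 1, 1))  # c.(?_-30)_(12+1_13-1)del
```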

NOTE: when regions outside the gene have been tested and found to be present / absent, this can be indicated in a "Remarks" field. Alternatively, one can determine the positions of the genomic sequences tested and use these positions in the description (e.g. the position of the oligonucleotides used in PCR, MLPA or SNP arrays, and/or of the probe used for hybridisation).

FISH-detected rearrangements

Many chromosomal rearrangements, especially in the past, have been detected using techniques like Southern blotting and Fluorescence In Situ Hybridisation (FISH). The description of these changes was often in tabular or graphical format based on either position-ordered probe names or relative chromosomal positions. Especially when FISH was used, relatively little or often even no actual DNA sequence of the probe(s) used was known. In those days, even when a probe sequence was known, this information was of little help to determine the probe location more precisely.

With the availability of the reference human genome sequence, this situation has dramatically changed. Now, any piece of DNA probe sequence can be used to position that probe with great precision on the human genome map. In addition, the clones used to generate the human genome sequence are freely available and have become the preferred probes for new FISH experiments. The latter is especially true for genome-wide array-CGH experiments using ordered PAC/BAC-clones. As a consequence of these developments it is now possible to describe these changes based on DNA sequences. NOTE: see Discussion.

Following the recommendation to describe rearrangements using the format (5' border_3' border), for FISH probes this becomes (last-positive-clone_first-negative-clone)_(last-negative-clone_first-positive-clone).

Examples

chrX:g.(AC096506.5_AL109609.5)_(AL451144.5_AL050305.9)del describes a genomic deletion on the X-chromosome detected using FISH. The deletion spans from PAC probes RP4-556A22 (GenBank AL109609.5) to RP11-151J4 (GenBank AL451144.5), both yielding no signal. On the telomeric side (p-arm) the closest probe tested positive was PAC RP11-64I1 (GenBank AC096506.5), on the centromeric side the closest probe tested positive was RP6-60B16 (GenBank AL050305.9).
NOTE: a description like (AC096506.5_AL109609.5)del indicates that the breakpoint of the deletion is somewhere between these two sequences (here PAC-derived) and gives a direct link to the human genome sequence.
When the genomic positions of these sequences are known, the deletion can also be described directly in relation to the genome sequence as hg19 chrX:g.(32218983_32238146)_(32984039_33252615)del, i.e. (genomic-end-position-last-positive-clone_genomic-start-position-first-negative-clone)_(genomic-end-position-last-negative-clone_genomic-start-position-first-positive-clone). Although this description is more precise in relation to the genome sequence, the probe sequences used cannot be derived from it and need to be reported separately. Note the addition of "hg19" to indicate the reference genome build used for the description.
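Once the coordinates of each clone in the chosen build are known, the conversion to genome positions is mechanical. In the sketch below, the clone names and coordinates are hypothetical placeholders (not the real positions of the GenBank clones cited above) and the function name is illustrative; it applies the end-of-last-positive / start-of-first-negative rule just described.

```python
from typing import NamedTuple, Sequence


class Clone(NamedTuple):
    name: str
    start: int     # genomic start position in the chosen build
    end: int       # genomic end position in the chosen build
    present: bool  # True = FISH signal detected, False = no signal


def clones_to_genomic(clones: Sequence[Clone], chrom: str, build: str) -> str:
    """Describe a deletion from clones listed in genomic order (hypothetical helper)."""
    neg = [i for i, c in enumerate(clones) if not c.present]
    if not neg:
        raise ValueError("no clone scored negative")
    first, last = neg[0], neg[-1]
    if last - first + 1 != len(neg):
        raise ValueError("negative clones are not one contiguous run")
    # End of the last positive clone / start of the first positive clone, or "?" if untested
    left = str(clones[first - 1].end) if first > 0 else "?"
    right = str(clones[last + 1].start) if last + 1 < len(clones) else "?"
    return (f"{build} {chrom}:g.({left}_{clones[first].start})_"
            f"({clones[last].end}_{right})del")


# Placeholder clones and coordinates, not real GenBank entries
clones = [
    Clone("clone_A", 1_000_000, 1_150_000, True),
    Clone("clone_B", 1_200_000, 1_350_000, False),
    Clone("clone_C", 1_800_000, 1_900_000, False),
    Clone("clone_D", 1_950_000, 2_100_000, True),
]
print(clones_to_genomic(clones, "chrX", "hg19"))
# hg19 chrX:g.(1150000_1200000)_(1900000_1950000)del
```

As the text notes, the output no longer shows which clones were used, so the clone names still have to be reported separately.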

chrX:g.(?_AL109609.5)_(AL451144.5_?)del describes a genomic deletion on the X-chromosome detected using FISH. The deletion spans from PAC probes RP4-556A22 (GenBank AL109609.5) to RP11-151J4 (GenBank AL451144.5), both yielding no signal. No flanking positive probes were tested, making it unclear how far the deletion extends (compare with previous description).

chrX:g.(AC096506.5_AL109609.5)_AL451144.5del describes a genomic deletion on the X-chromosome detected using FISH. The deletion spans from PAC probes RP4-556A22 (GenBank AL109609.5) to RP11-151J4 (GenBank AL451144.5). On the telomeric side (p-arm) the closest probe tested positive was PAC RP11-64I1 (GenBank AC096506.5), while RP4-556A22 (GenBank AL109609.5) tested negative. On the centromeric side the probe RP11-151J4 (GenBank AL451144.5) gave a reduced signal, indicating that the breakpoint lies inside this clone (note that this identifier is not between brackets).
An alternative description is hg19 chrX:g.(AC096506.5_AL109609.5)_AL451144.5:g.(1_100207)del, where AL451144.5:g.(1_100207) indicates that the deletion junction lies within nucleotides 1 to 100,207 of this sequence. It is debatable whether this addition is very informative. Note also the addition of "hg19" to indicate the reference genome build used for the description.

Array-detected rearrangements

Basically, chromosomal rearrangements and other DNA sequence variants detected using array technology can, based on the array probe sequences used, be described in the same way as FISH-detected rearrangements (see FISH-detected rearrangements). An advantage here is that the array probes are often exactly defined, mostly being relatively short oligonucleotides (20-60-mers). This information can be used to define, at nucleotide level, the positions tested and thus the ranges in which the breakpoints of the rearrangement lie.

For deletions the basic format is (last-probe-present_first-probe-deleted)_(last-probe-deleted_first-probe-present).

Examples

hg19 chrX:g.(32218983_32238146)_(32984039_33252615)del describes a deletion on the X chromosome, based on reference genome build hg19, starting between nucleotides 32,218,983 and 32,238,146 and ending between nucleotides 32,984,039 and 33,252,615.

hg19 chrX:g.(?_32238146)_(32984039_?)del describes a deletion on the human X chromosome, based on reference genome build hg19, starting upstream of nucleotide 32,238,146 and ending downstream of nucleotide 32,984,039.
NOTE: the description thus leaves it unclear how far the deletion might extend, suggesting that no up- or downstream probes were tested (and scored positive).
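Because each bracketed range bounds one breakpoint, such a description also bounds the size of the deletion. The sketch below (hypothetical function name) parses descriptions of the two forms shown above. It assumes each number marks a single probe position, with the inner positions scored deleted and the outer ones present; since real probes span 20-60 nucleotides, the figures are approximate. A "?" on either side leaves the maximum size open.

```python
import re
from typing import Optional, Tuple

# Matches the end of descriptions like "g.(a_b)_(c_d)del", where a or d may be "?"
DEL_PATTERN = re.compile(r"g\.\((\?|\d+)_(\d+)\)_\((\d+)_(\?|\d+)\)del$")


def deletion_size_range(description: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) deletion size in nucleotides.

    Assumes inner positions were scored deleted and outer positions present;
    the maximum is None when an outer bound is "?".
    """
    match = DEL_PATTERN.search(description)
    if match is None:
        raise ValueError(f"unsupported description: {description!r}")
    outer_start, inner_start, inner_end, outer_end = match.groups()
    # Both inner positions are deleted, so count them inclusively
    minimum = int(inner_end) - int(inner_start) + 1
    if "?" in (outer_start, outer_end):
        return minimum, None
    # Both outer positions are present, so exclude them
    maximum = int(outer_end) - int(outer_start) - 1
    return minimum, maximum


print(deletion_size_range("hg19 chrX:g.(32218983_32238146)_(32984039_33252615)del"))
# (745894, 1033631)
print(deletion_size_range("hg19 chrX:g.(?_32238146)_(32984039_?)del"))
# (745894, None)
```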

g.(rs2342234_rs3929856)_(rs10507342_rs947283)del (alternatively hg19 chr13:g.(18858133_18867056)_(24517730_24531502)del) describes a genomic deletion on chromosome 13 detected using a SNP-array. The deletion spans from dbSNP entries rs3929856 to rs10507342, both yielding no signal. On the centromeric side (q-arm) the closest probe tested positive was rs2342234; on the telomeric side the closest probe tested positive was rs947283.
NOTE: although the alternative description based on the human genome nucleotide numbering is more precise, the information regarding the probe sequences used is lost.

g.(?_rs3929856)_(rs10507342_?)del (alternatively hg19 chr13:g.(?_18867056)_(24517730_?)del) describes a deletion detected using an array spanning from dbSNP entries rs3929856 to rs10507342. The description assumes no flanking positive probes were tested, making it unclear how far the deletion extends.

Cytogenetic rearrangements

A nomenclature system to describe cytogenetically detectable rearrangements was proposed early on (see ISCN 1985). Current recommendations in this area are made by the "Standing Committee on Human Cytogenetic Nomenclature" and were published recently as ISCN 2013.

Two changes in one individual

When two sequence changes are found in one gene of an individual but it is unknown whether they are located on the same or on different chromosomes, the change is described using the format c.[76A>C(;)483G>C] (see Recommendations).
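For comparison, the general HGVS recommendations write two variants known to be on the same chromosome as c.[76A>C;483G>C] and on different chromosomes as c.[76A>C];[483G>C]. The hypothetical helper below only selects the notation from what is known about the phase; it is a formatting sketch and does not validate the variant descriptions themselves.

```python
from typing import Optional


def two_variant_description(prefix: str, first: str, second: str,
                            phase: Optional[str] = None) -> str:
    """Combine two variants of one gene into a single description.

    phase: "cis" (same chromosome), "trans" (different chromosomes) or
    None when the phase is unknown. Hypothetical helper.
    """
    if phase is None:
        return f"{prefix}.[{first}(;){second}]"
    if phase == "cis":
        return f"{prefix}.[{first};{second}]"
    if phase == "trans":
        return f"{prefix}.[{first}];[{second}]"
    raise ValueError(f"unknown phase: {phase!r}")


print(two_variant_description("c", "76A>C", "483G>C"))           # c.[76A>C(;)483G>C]
print(two_variant_description("c", "76A>C", "483G>C", "trans"))  # c.[76A>C];[483G>C]
```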
