Nomenclature for the description of sequence variants (mutation nomenclature)

Opinions please

Last modified August, 2011

RNA editing

As guardian of the HGVS recommendations for the description of sequence variants, I have received requests to indicate how to describe modifications of DNA, RNA and protein molecules. The most pressing are those concerning RNA editing and DNA methylation.

My suggestion for RNA editing, now open for comments, is to describe it as follows:

g.1287@ on genomic DNA level
c.143@ on coding DNA level
r.143c>u on RNA level
p.(His48Pro) on protein level (i.e. simply the predicted consequences)

So, for a nucleotide that is edited at RNA level, the description at DNA level simply adds a "@" to the position.
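To make the proposed format concrete, here is a minimal sketch in Python of how software might recognise such a description. It assumes only the simple form shown above (a "g." or "c." prefix, an integer position and a trailing "@"); the function name parse_edit_site is hypothetical, and the "@" suffix itself is a suggestion under discussion, not an accepted recommendation.

```python
import re

# Matches only the simple form proposed in the text: prefix "g." or "c.",
# an integer position, and a trailing "@". Intronic or UTR positions such as
# c.88+2 or c.-14 are deliberately not handled in this sketch.
EDIT_SITE = re.compile(r"(?P<level>[gc])\.(?P<position>\d+)@")


def parse_edit_site(description):
    """Return (level, position) for an '@'-marked site, or None otherwise."""
    match = EDIT_SITE.fullmatch(description)
    if match is None:
        return None
    return match.group("level"), int(match.group("position"))


print(parse_edit_site("g.1287@"))  # ('g', 1287)
print(parse_edit_site("c.143@"))   # ('c', 143)
print(parse_edit_site("c.="))      # None
```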

Based on the current recommendations, the descriptions at DNA level could be g.= / c.= because nothing changes; the change occurs only at RNA and potentially protein level. The reason to suggest a specific description at DNA level is to make it easier to retrieve information on RNA editing from databases:

i) some databases use c.= to indicate the second, normal allele in dominant diseases. Searching for c.= would thus retrieve all these entries.
ii) a search for g.= / c.= would retrieve all entries for which RNA editing has been reported, not just those for the specific site one is interested in.
iii) when g.= / c.= is used at DNA level, allowing specific searches at RNA level would require discriminating between "DNA-based" changes (like r.143c>u) and those derived from RNA editing (e.g. r.143c@u). This seems less attractive, especially since we already have a proper nomenclature for describing changes at RNA level.
iv) even today some databases still have no fields for RNA findings and thus cannot store data derived from RNA editing.
v) currently most genomic information is retrieved through a DNA file (e.g. exome variant queries) and people are more likely to 'forget' to check RNA data. Consequently, they would miss the fact that a changed site is one known to be edited in RNA (with whatever functional consequences).
vi) when analysing RNA-based sequence data, people will encounter sequence variants, check the DNA database, not find the variant (RNA editing is not marked in the database) and subsequently fail when trying to confirm the variants through genomic DNA analysis, wasting precious resources.
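The retrieval argument can be illustrated with a small Python sketch. The records below are invented purely for illustration and are not taken from any real database; the search function is likewise hypothetical and only does an exact match on the DNA-level description.

```python
# Hypothetical records: (entry id, DNA-level description, free-text note).
RECORDS = [
    ("entry-1", "c.=", "normal second allele, dominant disease"),
    ("entry-2", "c.=", "RNA editing reported at position 143"),
    ("entry-3", "c.=", "RNA editing reported at position 977"),
    ("entry-4", "c.143@", "RNA editing reported at position 143"),
]


def search(records, dna_description):
    """Return the ids of all entries whose DNA-level description matches exactly."""
    return [entry_id for entry_id, dna, _note in records if dna == dna_description]


# Searching for "c.=" mixes normal alleles with every edited site.
print(search(RECORDS, "c.="))     # ['entry-1', 'entry-2', 'entry-3']

# Searching for the site-specific description retrieves only that site.
print(search(RECORDS, "c.143@"))  # ['entry-4']
```

With c.= the position of the edited site is only recoverable from the free-text note, whereas the "@" description carries it in the searchable field itself.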

RNA editing is not of one type, and the description r.143c>u is probably not strictly correct because the 'c' is not really changed to a 'u'. At some point we probably need to suggest ways to describe exactly the modification that was found, but I believe we can do that later. Such recommendations can then be combined with those for DNA modifications (like methylation with methyl or hydroxymethyl groups), making sure they follow the same rules.

The use of the '@' character versus other characters (&, $, ~, #) is of course debatable. Another option is to use a three-letter abbreviation like 'del' and 'ins', e.g. edt (g.1287edt / c.143edt), but this seems less attractive (longer and potentially confusing). The '@' is just there like a short footnote, indicating 'note this site, something is happening at ('@') this position'.

Best regards,

Johan den Dunnen
Human Genome Variation Society
mail to: ddunnen @ HumGen.nl