Human Genome Variation Society Meeting

Shanghai, China

14 April 2002

Brief Meeting Report

Posted 13th May 2002

This informal meeting attracted 20-30 people during the afternoon, and productive discussion took place alongside the following presentations:

Assessment of electronic data collection from PubMed for the establishment of Chinese Gene Variation Database (CGVdb) - Kwang-Jen Hsiao.
Dr. Hsiao presented a comprehensive account of his study of collecting Chinese mutations from the published literature, including a detailed examination of the effectiveness of the protocol developed.

HGVbase - Current Status - Anthony J. Brookes & Heikki Lehväslaiho.
This talk outlined the current status of HGVbase and its plans for the future. Recent developments include implementation of genome draft coordinates and searches, annotation of duplicated genome segments and functional predictions for amino-acid variants, autovalidation of ever-changing RefSeq positions, haplotype and genotype representation, and automated mining of PubMed followed by email requests for data submission. Ongoing activities will provide flexible representation of phenotypes, a fully functional API, and standardized data exchange structures for communication with the other databases plus the HGVSYS Waystation and Office.

A Review of Locus Specific Mutation Databases - Richard G.H. Cotton.
Dick Cotton presented the data from a comprehensive analysis of LSDBs by a visiting professor in our laboratory, Prof. Mireille Claustres, who independently reviewed 94 LSDBs against 80 content criteria. This revealed extreme heterogeneity between LSDBs. An outline of an ideal LSDB from the point of view of the >100 curators was shown. This work has now been published: Genome Research 12:680-688.

HGVSYS - Current Status - Saeed Teebi.
Saeed reported that political events had delayed the implementation of the WayStation, but that it was only a matter of weeks away from activation, by which time the Wiley side should also be ready.

LSDB in a box discussion.
Difficulties were seen as too many variables and requirements. It was thought that it should be simple, with storage, and linked with other models. It should provide a dump to the database. It was suggested that a Finnish database company, "Medical Bioinformatics", could be useful. It was also suggested that MutationView might, if possible, have a role. It was concluded that the content needed to be defined, that a champion was needed and that, clearly, funds were required.

One factor which makes an all-encompassing LSDB complex is the "view" of the data taken by Curators. Many Curators have a mutation-centric view, others a patient-centric view, some a disease-centric view, while some are interested primarily in amino acid changes and/or the alterations to expressed protein. This is probably a major factor (amongst others) limiting the widespread use of systems such as MuStaR, UMDB, MutBASE, and MutationView.

It was suggested that a modular approach to an LSDB in a box would be one way forward. Independent modules dealing with e.g. DNA level changes, RNA level changes, protein level changes, phenotype, and patient information would allow Curators to "pick 'n' mix" those modules suited to their particular view and for which they have data. Records within modules would be connected to appropriate records in other modules. This modular approach should simplify design and coding, and would allow changes to e.g. nucleotide numbering in one module to be independent of data on, say, protein outcome in another. It might also make the handling of multi-genic disorders easier, as a "database" could include more than one locus. The technical implementation of such a system would need further discussion.
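
As a rough illustration only of the modular idea discussed above, the following Python sketch keeps DNA level and protein level records in separate modules joined by record identifiers. Every name in it (ModularLsdb, DnaChange, ProteinChange, shift_numbering, the loci and the nucleotide positions) is hypothetical and invented for the example; no design was agreed at the meeting.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DnaChange:
    """One record in the (hypothetical) DNA level module."""
    record_id: str
    locus: str
    position: int   # nucleotide number in whatever numbering the curator uses
    change: str     # e.g. "C>T"


@dataclass
class ProteinChange:
    """One record in the (hypothetical) protein level module."""
    record_id: str
    trivial_name: str    # e.g. "R240X"
    dna_record_id: str   # link to a record in the DNA module


@dataclass
class ModularLsdb:
    """A 'database' made of independent modules; it may hold several loci."""
    dna: Dict[str, DnaChange] = field(default_factory=dict)
    protein: Dict[str, ProteinChange] = field(default_factory=dict)

    def add_dna(self, rec: DnaChange) -> None:
        self.dna[rec.record_id] = rec

    def add_protein(self, rec: ProteinChange) -> None:
        # A protein record must point at an existing DNA record.
        if rec.dna_record_id not in self.dna:
            raise KeyError(f"unknown DNA record: {rec.dna_record_id}")
        self.protein[rec.record_id] = rec

    def shift_numbering(self, locus: str, offset: int) -> None:
        # Renumber one locus in the DNA module; other modules are untouched.
        for rec in self.dna.values():
            if rec.locus == locus:
                rec.position += offset

    def protein_outcomes(self, dna_record_id: str) -> List[str]:
        # Follow the links from a DNA record to its protein level records.
        return sorted(
            p.trivial_name
            for p in self.protein.values()
            if p.dna_record_id == dna_record_id
        )
```

In this sketch a Curator with no protein data would simply leave the protein module empty, and renumbering a locus with shift_numbering leaves the linked protein records as they were, because the link is by record identifier rather than by position.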

Collection of Mutations from Diagnostic Labs.
Dick Cotton explained that Graham Taylor and, more recently, others in Australia were trying to obtain funds to set up such a system for their countries, which could serve as models for others around the world.

Nomenclature.
The main problem with nomenclature is where to start the numbering from. This is especially so before all genes and their regulatory regions have been sequenced. The informaticists clearly favour a scheme dependent on position in the gene, defined at the DNA level, which places the mutation exactly, whereas the biochemist and clinician are happy with the amino acid based nickname. There was agreement that a further discussion paper was needed.

The main problem with mutation nomenclature is in specifying the location of a mutation. In general, the use of several pieces of information can pinpoint a mutation accurately, but a concise method, which could be included as part of the "Summary" mutation description, would be useful. Clearly, assigning nucleotide numbers according to genomic DNA numbering is ideal, but what about mutations in incomplete regions of sequence? Heikki Lehväslaiho described a system which would update the nucleotide numbering based on the most recent draft genome sequence. This would seem ideal, but until such time as the sequence stabilises, it could mean the continual renumbering of mutations, which could be problematic for those working in the field. (It also does not solve the problem for other species, which are not yet near having a complete genome sequence.) One suggestion was to have a second location system running in parallel, which would be more useful to those working on a specific locus. This, however, leads back to the original problem. Possible solutions would be:

a) allow curators to specify numbering for their loci in a "free for all" ad hoc fashion
b) devise a system based on "well known landmarks" for each locus e.g. exon/intron boundaries
c) provide a set of guidelines or good practice rules, which curators could use to develop their own ad hoc systems.
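
To make option (b) more concrete, here is a minimal Python sketch, assuming a mutation is stored as an offset from a named landmark rather than as an absolute nucleotide number. The landmark names and coordinates are invented for illustration, and describe and resolve are hypothetical helpers, not part of any system presented at the meeting.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LandmarkPosition:
    landmark: str   # hypothetical landmark name, e.g. "exon3_start"
    offset: int     # nucleotides downstream (+) or upstream (-) of the landmark


def describe(absolute: int, landmarks: Dict[str, int]) -> LandmarkPosition:
    """Express an absolute coordinate relative to the nearest landmark."""
    # Ties are broken alphabetically by landmark name so the result is stable.
    name = min(landmarks, key=lambda k: (abs(absolute - landmarks[k]), k))
    return LandmarkPosition(name, absolute - landmarks[name])


def resolve(pos: LandmarkPosition, landmarks: Dict[str, int]) -> int:
    """Convert a landmark-relative position back to an absolute coordinate."""
    return landmarks[pos.landmark] + pos.offset


# Two invented versions of a draft sequence in which the landmarks have moved.
draft_v1 = {"exon2_start": 1000, "exon3_start": 5000}
draft_v2 = {"exon2_start": 1040, "exon3_start": 5100}

stored = describe(5012, draft_v1)       # stored once by the curator
current = resolve(stored, draft_v2)     # recomputed against the newer draft
```

Under these assumptions the stored description does not change when the draft sequence is revised; only the landmark table is updated, and absolute coordinates are recomputed on demand. The sketch breaks down if the sequence between the landmark and the mutation itself changes, which is one reason guidelines as in option (c) might still be needed.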

The "trivial" mutation names based on amino acid change (e.g. R240X) were not seen as a problem, as protein sequence was stable. It was suggested that these trivial names could be appended to the DNA level "summary" mutation description. There was agreement that a further discussion paper was needed.

Ethics.
Dick Cotton had presented the ethical needs and problems from a mutation database perspective to the HUGO ethics committee in the morning, and presented these again to the HGVS meeting as a practical guide for curators. They were:

. Absolute obligation of confidentiality.
. Ethical review board - for HGVS?
. Anonymization. Prior to transmission with system in place - not for publication?
. Inform donors on transmission to LSDB? - not for publications?
. No disclosure of genetic information without consultation. Publications?
. International standardization.
. Take vulnerable persons into account.
. Clarify purpose and limits of LSDB.
. No interactive databases? (none now?)
. No virtual physician relationship.
. Common ethical framework from start.
. Take specific communities/cultures into account.
. Define ethical principles as well as scientific workings.
. Clinical patient centred databases?
. Baltimore lecture?
. New paper. Practical application of ethical principles to LSDBs.
. Y2K HUMU paper - see: http://www3.interscience.wiley.com/cgi-bin/abstract/68503062/START

The meeting agreed these were a good foundation for a practical guide that was based on an earlier paper by Bartha Knoppers (Y2K issue of Human Mutation).

Further developments of these principles have been sent out to HGVS members for comment; non-members are also encouraged to comment.

Copyright HUGO MDI 2000. Created by Rania Horaitis. Posted 29th June 2001.
Coordinator Rania Horaitis horaitis@mail.medstv.unimelb.edu.au