NEWSLETTERS

NEWSLETTER OF THE
HUGO MUTATION DATABASE INITIATIVE
SPONSORED BY THE MARCH OF DIMES (U.S.A)
No.4 May 1998

INSIDE THIS ISSUE:

NEW WORKING GROUPS-CALL FOR VOLUNTEERS
CORRESPONDENCE
ESTABLISHMENT OF NEW LSDB-PROGRESS
CALENDAR OF EVENTS
USEFUL CONTACTS

New working Groups

Six new working groups have recently been established. Anybody interested in joining any of these groups (who has not already put their name on the list) should contact Rania (horaitis@ariel.ucs.unimelb.edu.au). The groups are as follows:

Call for Volunteers

A. Copyright and Intellectual Property
B. Patient Aspects of Databases.
C. Ethnic and National Databases.
D. Polymorphisms/SNPs.
E. Quality Control and Peer Review.
F. Spectral Databases

The members volunteering so far (to be updated monthly) may be viewed elsewhere on the website: List of Volunteers

Turin Meeting

A short database meeting was held in Turin on the 27th March; there were 69 registrants. The 4 established working groups gave their reports and the 6 new working groups (above) were approved. A more detailed report will be distributed soon.

Database Lists

A list of databases available on the web is now complete. This will be made available on the MDI and HUGO websites in the next couple of weeks.

CORRESPONDENCE, NOTES & NOTICES

Mutability Web Program

I have written a very simple web-based program called "Mutability" which will take a DNA sequence (typed or pasted into a web form) and determine how many single point mutations would result in nonsense, missense, or neutral mutations. It then displays the sequence with potential stops and CpGs highlighted. The URL is: http://www.hgu.mrc.ac.uk/Softdata/Mutability/ Instructions and a more detailed description are at: http://www.hgu.mrc.ac.uk/Softdata/Mutability/muthelp.htm It was written to satisfy a specific need here at MRC HGU, but it may be of use to others.

Dr Alastair Brown MRC Human Genetics Unit, Edinburgh EH4 2XU, UK
Alastair.Brown@hgu.mrc.ac.uk, Fax: +44 (0)131 343 2620
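
The kind of tally described in the note above can be sketched roughly as follows. This is not the Mutability source code: the function names, the use of the standard genetic code, and the treatment of a stop codon changed into an amino acid codon (counted here as missense) are all assumptions made for illustration. The input is assumed to be an in-frame coding sequence containing only A, C, G and T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations(cds):
    """Tally every single-base substitution in an in-frame coding sequence.

    Hypothetical helper: trailing bases that do not fill a codon are ignored.
    """
    cds = cds.upper()
    counts = {"nonsense": 0, "missense": 0, "neutral": 0}
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        original = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a mutation
                mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if mutant == original:
                    counts["neutral"] += 1   # same amino acid (or stop stays stop)
                elif mutant == "*":
                    counts["nonsense"] += 1  # new stop codon
                else:
                    counts["missense"] += 1  # different amino acid
    return counts


def cpg_positions(seq):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
```

Each codon contributes nine possible substitutions, so a sequence of n complete codons always yields 9n classified changes.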

Note from EBI.

An article describing the federated mutation database system at EBI has just come out in the May issue of Trends in Genetics (1).

With permission from the publisher, an HTML version of the article is available at: http://www2.ebi.ac.uk/mutations/publications/TIG98.html

I'd like to take this opportunity to thank all database curators who have made their data available and all people involved in the Mutation Database Initiative. Without MDI this work would not have been possible.

1. Lehväslaiho H, Ashburner M & Etzold T: Unified access to mutation databases. Trends in Genetics 14:205-206, 1998.

Heikki Lehvaslaiho EMBL Outstation, EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge. CB10 1SD, UK.
Fax: 44-(0)-1223-494-468, Email: heikki@ebi.ac.uk

MutRes- Searchable Mutation Resources List

The number of databases in the searchable Mutation Resources list, MutRes, has reached a round number, 100, so I thought this might be a good time to update the ftp server copy.

#Mutation Resources List (MutRes), version 1.3 980406
Name Mutation Resources List
Acc M0000000
Type database
Contact Heikki Lehvaslaiho
Address EMBL Outstation, European Bioinformatics Institute
Address Wellcome Trust Genome Campus, Hinxton
Address Cambs CB10 1SD, United Kingdom
Email heikki@ebi.ac.uk
Fax +44 (0)1223 494 644
Phone +44 (0)1223 494 468
Size 125 entries, 100 databases
URL http://srs.ebi.ac.uk:5000/srs5bin/cgi-bin/wgetz?-fun+Pagelibinfo+-info+MUTRES
URL ftp://ftp.ebi.ac.uk/pub/databases/mutres/
Update 19971024
//
#
#
# Version history
#
# 1.2 971024 105 entries, 85 databases
# 1.3 980406 125 entries, 100 databases
#
Heikki Lehvaslaiho
EMBL Outstation, EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge.
CB10 1SD, UK. Fax: 44-(0)-1223-494-468, Email: heikki@ebi.ac.uk
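
A record in the layout shown above could be read with a few lines of code. The sketch below assumes, from the example alone, that each line is a key followed by a space and a value, that lines starting with "#" are comments, that "//" ends a record, and that keys such as Address and URL may repeat. The function name and the shortened sample are hypothetical; this is not an official MutRes or SRS parser.

```python
from collections import defaultdict


def parse_flat_records(text):
    """Parse 'Key value' lines into a list of records.

    Each record maps a key to the list of values seen for it, so that
    repeated keys (Address, URL) are preserved in order.
    """
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        key, _, value = line.partition(" ")
        current[key].append(value.strip())
    if current:  # tolerate a final record with no terminator
        records.append(dict(current))
    return records


# Shortened sample modelled on the record above.
SAMPLE = """\
#Mutation Resources List (MutRes), version 1.3 980406
Name Mutation Resources List
Acc M0000000
Type database
Address EMBL Outstation, European Bioinformatics Institute
Address Wellcome Trust Genome Campus, Hinxton
Size 125 entries, 100 databases
URL ftp://ftp.ebi.ac.uk/pub/databases/mutres/
//
"""

records = parse_flat_records(SAMPLE)
```

Keeping every value in a list avoids special-casing the multi-line address, at the cost of single-valued fields such as Acc also being lists.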

Establishment of new LSDB-Progress

Many genes have been selected as candidates for the creation and curation of an lsdb. Of the genes chosen, 48% have more than 15 mutations listed in OMIM, 46% have more than 7 and fewer than 15, and 6% have fewer than 7. The OMIM figures are expected to be about a quarter of the number of mutations that an lsdb would hold. For example, OMIM lists 119 mutations for CFTR, whereas the Cystic Fibrosis db currently lists 770. A total of 81 potential database curators have been contacted so far. These people were identified through OMIM, the journal Human Mutation, Scriver et al's The Metabolic and Molecular Bases of Inherited Disease, and suggestions from MDI members. Half of them have needed re-contacting as we had received no reply. To date we have received 45 replies and have encouraged the formation of 41 new lsdbs. These groups are now active at various stages, from collecting data to putting the finishing touches on their web-based databases. We are about to embark on a campaign to contact new possible curators for those genes where we have received no reply from potential curators, and for other genes with fewer mutations in OMIM whose potential curators have not yet been contacted.

If you are interested in creating a database for your gene of interest and have not already been contacted, or have not replied to our queries, please let us know.

R.Horaitis
Co-Ordinator MDI
Mutation Research Centre

Membership of HUGO

If you are not already a member of HUGO please consider becoming one. This will help ensure a membership interested in key issues arising out of genomics, such as variation. Membership forms can be obtained from HUGO (hugo@hugo-europe.org.uk).

CALENDAR OF EVENTS

HUGO MUTATION DETECTION TRAINING COURSE.

PLACE: Hinxton Hall, Cambridge, UK.
TIME: 4-7 September, 1998
ORGANISERS:
Graham Taylor (UK)
Richard G. H. Cotton (Australia)
Ulf Landegren (Sweden)
Mireille Claustres (France)
Restricted to 50 applicants.
FOR MORE DETAILS SEE:
http://www.leedsdna.demon.co.uk/mutation98.htm

TO REGISTER: CONTACT HUGO
Fax: 44-171-935-8341
Email: hugo@hugo-europe.org.uk

HUGO MUTATION DATABASE INITIATIVE MEETING

PLACE: Denver, Colorado, U.S.A
TIME: 8.00AM-7.00PM, OCTOBER 27TH 1998
ORGANISERS:
R. HORAITIS (AUSTRALIA)
R.G.H. COTTON (AUSTRALIA)
TO REGISTER: CONTACT R. HORAITIS
EMAIL: horaitis@ariel.ucs.unimelb.edu.au
FAX: 61-3-9288-2988

HUGO MUTATION DATABASE INITIATIVE
NEWSLETTER NO. 3, MARCH, 1998

1. Alliance Working Group Report:

· The HUGO Mutation Database Meeting in Turin will be held from 3.00pm to 6.30pm 27th of March, in the Dublin Room of the Lingotto Conference Centre. You are all invited to attend. If interested please contact Liz Evans at HUGO: hugo@hugo-europe.org.uk

· A listing of locus-specific databases and related information acknowledging their curators has been completed. This will be placed on the initiative's website soon with links to the lsdbs.

· Rania Horaitis was asked to speak about the initiative at the 20th Annual Conference on the Organisation and Expression of the Genome at Lorne, Australia. There was much enthusiasm and support from the attendees.

· Six new Working Groups were suggested at the Baltimore meeting. It is now time to formally establish these groups. Potential Chairs have been asked and when these are finalised a call for members of these groups will be made. These groups are: Quality Control/Peer Review, Patient component, Copyright and intellectual property, Ethnic/National databases, Polymorphisms, and Mutation spectrum databases. However, interested parties may contact Rania now at horaitis@ariel.ucs.unimelb.edu.au

2. Report from Content and Software Working Group (Charles Scriver convenor):

o Seven members of the working group attended the HUGO MDI meeting in Baltimore, October 1997 at which time additional participants were recruited to the working group. A number of salient issues were identified at this meeting.

o Drafts of the guideline document have been written and distributed to over 30 participants and readers. The document builds on a preceding guideline document on nomenclature of mutations. The new document describes the rationale for having mutation databases, identifies the difference between genomic and locus-specific databases, further identifies the relevance of having mutation databases created separately from databases for genetic diseases (and patients) and databases for genomic research. The guideline document identifies the components of content (entities and attributes) and how they are accommodated by the techniques of informatics. Several different structures can be developed and various ways of structuring mutation data are identified. The software to map a database into an informatics system and to deploy it constitutes the third part of the report.

o The document was initially sent to approximately 15 people. Replies were received from 8; two persons gave strong advice which will lead to important revisions in the document; the remainder found the document helpful and promising.

o A revision of the document is in progress; it may be ready for distribution and further review at HGM-98 (Turin).

o Several additional activities parallel the Group's work on the guidelines document:

i. Horaitis and Cotton have developed a mini guidelines document on "how to set up a locus-specific database". This advice may be linked from the HUGO MDI website. Its URL is: http://www.debelle.mcgill.ca/guidelines
ii. A directory of mutation databases and related resources is being developed by Horaitis.
iii. A letter documenting HUGO MDI was published in Science (Cotton, McKusick and Scriver, Science 279, 10-11, 1998 (Jan. 2)).

Note: References (website addresses) in the Science report contain errors: 80/~cotton should show the tilde not a hyphen. (A correction was published in a later issue of Science).

Charles Scriver (mc77@musica.mcgill.ca) Convenor Software and Content Working Group

3. Report from Nomenclature Working Group Convenor (Stylianos Antonarakis):

The article "Recommendations for a Nomenclature System for Human Gene Mutations" by Stylianos Antonarakis and the Nomenclature working group has been published: Hum Mut 11:1-3 (1998).

4. Report from EBI. (Heikki Lehvaslaiho):

Our federated SRS-based mutation database system now contains nearly all publicly available locus specific mutation databases. At the moment there are over 14 000 unique sequence changes described in 34 human genes. All original fields can be searched and browsed via a Web interface and query results can be viewed either as list of original entries or fields or using a core mutation view common to all databases.

A new database can be added to the system in one working day. The resulting computer code can then be copied from the EBI database description page into a local SRS program. Curators wanting to have their database added are invited to contact us.

Detailed recommendations * for mutation database design and content based on the lessons learned from creating this system can be found at EBI Mutation Web pages at http://www2.ebi.ac.uk/mutations/.

We are working on making data more easily available for computational analysis over the network and on validating the data. A Web-based program, DNA Mutation Checker, is available for validation. The same code can easily be modified to work as a submission tool to any database.

Heikki Lehvaslaiho <heikki@ebi.ac.uk>
EMBL - European Bioinformatics Institute

* These recommendations are currently being incorporated into the HUGO MDI Software and Content document by C. Scriver and colleagues.

5. Report from HGMD Cardiff:

David N. Cooper, Edward V. Ball, Peter Stenson and Michael Krawczak
Institute of Medical Genetics
University of Wales College of Medicine
Heath Park
Cardiff CF4 4XN, UK

The Human Gene Mutation Database (HGMD) represents a comprehensive core collection of data on published germline mutations in nuclear genes underlying human inherited disease. By February 1998, the database contained over 12,500 different lesions in a total of 692 different genes, with new entries currently accumulating at a rate of over 2,500 per annum. Although originally established for the scientific study of mutational mechanisms in human genes, HGMD has acquired a much broader utility to researchers, physicians and genetic counsellors so that it was made publicly available at http://uwcm.ac.uk/uwcm/mg/hgmd0.html in April 1996. Mutation data in HGMD are accessible on the basis of every gene being allocated one webpage per mutation type, if data of that type are present. Meaningful integration with phenotypic, structural and mapping information has been accomplished through bi-directional links between HGMD and both the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. Hypertext links have also been established to Medline abstracts through Entrez, and to a collection of 516 reference cDNA sequences also used for data checking. Being both comprehensive and fully integrated into the existing bioinformatics structures relevant to human genetics, HGMD has established itself as the central core database of inherited human gene mutations.

Introduction

The Human Gene Mutation Database (HGMD), maintained at the Institute of Medical Genetics in Cardiff, represents a comprehensive core collection of data on germline mutations underlying human inherited disease. Thus, HGMD comprises published single base-pair substitutions in coding, regulatory and splicing-relevant regions of human nuclear genes as well as deletions, duplications, insertions, repeat expansions and "indels", plus a number of complex rearrangements not covered by the above categories. Somatic gene mutations and mitochondrial genome mutations are not included.

The curators of HGMD have adopted a policy of entering each mutation only once in order to avoid confusion between recurrent and identical-by-descent lesions. Reliable discrimination between these two alternatives would require information available only for a very small proportion of known lesions. Therefore, although data on the regional, ethnic and haplotype context of mutations would be extremely useful in terms of epidemiological and population genetics research, any unselective accumulation of literature reports would have resulted in an inflation of references with little immediate scientific use.

Although originally established for the scientific study of mutational mechanisms in human genes (1), HGMD has acquired a much broader utility in that it provides information of practical importance to researchers in human molecular genetics, physicians interested in a particular inherited condition in a given patient or family, and genetic counsellors. In view of its potential usefulness, the curators of HGMD made the database publicly available (2) through the WorldWideWeb in April 1996.

Data coverage and structure

By February 1998, HGMD contained over 12,500 different lesions in a total of 692 different genes (Table 1). Entries are accumulating at a rate of over 2,500 per annum. Coverage is limited to original published reports although some data are taken from "Mutation Updates" or review articles. Mutations reported only in abstract form are not generally included. Data acquisition for HGMD has been accomplished by a combination of manual and computerised search procedures, scanning in excess of 250 journals on a weekly/monthly basis.

All HGMD entries comprise a reference to the first literature report of a mutation, the associated disease state as specified in that report, the gene name, HUGO-approved symbol and chromosomal location. In cases where a gene symbol has not yet been made available owing to the recency of the cloning report, a provisional symbol has been adopted which is denoted by lower-case letters. Single base-pair substitutions in coding regions are presented in terms of a triplet change with an additional flanking base included if the mutated base lies in either the first or third position in the triplet. While substitutions causing regulatory abnormalities are logged in with eight nucleotides flanking the site of mutation on both sides, no flanking sequence has been included yet for substitutions leading to aberrant splicing. Micro-deletions and micro-insertions (of less than 20 bp) are presented in terms of the deleted/inserted bases in lower case plus (in upper case) 10 bp DNA sequence flanking both ends of the lesion. Either the codon number or, in cases where a lesion extends outwith the coding region of the gene in question, other positional information, is provided e.g. 5' UTR (5' untranslated region) or E6I6 (denotes exon 6/intron 6 boundary). Codon numbering may in some cases display inconsistencies owing to the common use of different numbering systems for the same protein. For the majority of genes, however, residue numbering has been standardized with respect to a generally accepted numbering system employing the appropriate reference cDNA sequence. For gross deletions, gross insertions and complex rearrangements, information regarding the nature and location of a lesion is logged in narrative form because of the extremely variable quality of the original data reported.
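
The presentation of micro-deletions and micro-insertions described above (changed bases in lower case, 10 bp of flanking sequence in upper case) can be illustrated with a short sketch. This is not HGMD's own software: the function names, the 0-based coordinates and the test sequence are assumptions made for illustration, and no size limit on the lesion is enforced.

```python
def format_micro_deletion(reference, start, length, flank=10):
    """Show a deletion as lower-case bases between upper-case flanks.

    `start` is a 0-based index into `reference` (an assumption of this
    sketch); flanks are shortened if the lesion lies near a sequence end.
    """
    if length < 1 or start < 0 or start + length > len(reference):
        raise ValueError("deletion must lie within the reference sequence")
    reference = reference.upper()
    left = reference[max(0, start - flank):start]
    deleted = reference[start:start + length].lower()
    right = reference[start + length:start + length + flank]
    return left + deleted + right


def format_micro_insertion(reference, position, inserted, flank=10):
    """Show an insertion before 0-based `position` in the same style."""
    if position < 0 or position > len(reference):
        raise ValueError("insertion point must lie within the reference sequence")
    reference = reference.upper()
    left = reference[max(0, position - flank):position]
    right = reference[position:position + flank]
    return left + inserted.lower() + right


# Made-up reference sequence, used only to show the output layout.
REFERENCE = "AAAAACCCCCGGGGGTTTTTAAAAACCCCC"
deletion_view = format_micro_deletion(REFERENCE, 12, 3)
insertion_view = format_micro_insertion(REFERENCE, 15, "ac")
```

With the made-up reference, the deletion is rendered as AAACCCCCGGgggTTTTTAAAAA and the insertion as CCCCCGGGGGacTTTTTAAAAA.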

Data access

HGMD is accessible on the basis of every gene being allocated one webpage per mutation type, if data of that type are present. Since HGMD is partly dependent upon industrial funding and involves considerable editorial work over and above mere literature screening (e.g. to ensure the consistency of nucleotide sequence information, amino acid residue numbering and gene symbol usage), unsolved copyright problems have so far precluded HGMD from being downloadable in its entirety. However, once the closer cooperation with publicly funded bioinformatics institutions currently envisaged has been put in place, unrestricted access to the database will become possible. During its first 22 months on the Internet, HGMD has been accessed approximately 2,000 times per week.

Meaningful integration of the data with phenotypic, structural and mapping information on human genes has been accomplished through bi-directional links between HGMD and both the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. In addition, hypertext links have been established from HGMD references to Medline abstracts through Entrez. Hypertext links have also been set up to "reference cDNA sequences" (516 to date) which are also used for data checking. It is planned to include "reference genomic DNA sequences" in the future so as to provide the DNA sequence environment for splicing and regulatory mutations.

The links to GDB and OMIM have enforced the standardisation of disease and gene nomenclature in HGMD. Thus HGMD can be searched either by HUGO-approved gene symbols, GDB accession numbers, or OMIM-compatible disease or gene names. For genes for which Locus-Specific Mutation Databases are available on the Internet, these databases (currently ~40) can be accessed either from the corresponding gene-specific HGMD pages or via the Locus-Specific Mutation Database page (3). Mutation maps are now included for each gene to provide a pictorial representation of those mutations that occur within the coding region (i.e. missense and nonsense mutations, micro-insertions and micro-deletions).

Conclusions and Outlook

Being both comprehensive and fully integrated into the existing bioinformatics structures relevant to human genetics, HGMD has established itself as the central core database of inherited human gene mutations. In order to improve the accuracy, efficiency and rapidity of mutation publication, however, direct submission of mutation data to a central resource capable of (and responsible for) checking the novelty and consistency of data is both necessary and desirable. Although some Locus-Specific Databases have included mutations not published anywhere in the literature, even the close integration of these facilities will be inadequate to the task of meeting the demands likely to be made upon a central data repository. A substantial proportion of published mutation data are derived from genes in which only a handful of lesions have so far been characterised, not therefore warranting the establishment of a Locus-Specific Database. Indeed, such a resource is currently accessible via the Internet for only 62/692 (9%) of genes also referred to in HGMD. Although mutation data associated with these genes should comprise ~45% of the mutations in HGMD (assuming the Locus-Specific Databases to be sufficiently comprehensive), the obvious lack of general coverage stresses the point that comprehensive collection of mutation data can only be performed in a generalised fashion. To this end, HGMD has collaborated with Springer-Verlag GmbH, Heidelberg, to establish an online system for the submission and electronic publication of human gene mutation data (4). These data are now being published regularly by Springer's journal Human Genetics in both electronic and printed form. Once published, the data are transmitted to Cardiff and deposited in HGMD.

References

1. Cooper DN, Krawczak M (1993) Human Gene Mutation. BIOS, Oxford.
2. http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
3. http://www.uwcm.ac.uk/uwcm/mg/oth_mut.html
4. http://link.springer.de/journals/humangen/mutation

Acknowledgements

The authors wish to thank SmithKline Beecham, Pfizer and the Deutsche Forschungsgemeinschaft for their financial support and Iain Fenton for computer assistance.

Table 1. Number of HGMD entries by mutation type (February 1998)

Mutation Type                                       No. of Entries
Single base-pair substitutions, missense/nonsense   7959
Single base-pair substitutions, splicing            1200
Single base-pair substitutions, regulatory          106
Small deletions (<=20bp)                            2067
Small insertions (<=20bp)                           731
Small indels (<=20bp)                               96
Repeat expansions                                   16
Gross deletions (>20bp)                             177
Gross insertions and duplications (>20bp)           72
Complex rearrangements including inversions         84
Total                                               12508
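
As a quick consistency check on Table 1, the category counts can be summed and expressed as shares of the total. The sketch below only re-uses the figures printed in the table; the variable names are arbitrary.

```python
# Counts copied from Table 1 (February 1998).
HGMD_ENTRIES = {
    "Single base-pair substitutions, missense/nonsense": 7959,
    "Single base-pair substitutions, splicing": 1200,
    "Single base-pair substitutions, regulatory": 106,
    "Small deletions (<=20bp)": 2067,
    "Small insertions (<=20bp)": 731,
    "Small indels (<=20bp)": 96,
    "Repeat expansions": 16,
    "Gross deletions (>20bp)": 177,
    "Gross insertions and duplications (>20bp)": 72,
    "Complex rearrangements including inversions": 84,
}

total = sum(HGMD_ENTRIES.values())

# Percentage of all entries contributed by each mutation type.
shares = {name: round(100 * count / total, 1) for name, count in HGMD_ENTRIES.items()}

for name, count in HGMD_ENTRIES.items():
    print(f"{name:<52}{count:>6}{shares[name]:>7.1f}%")
print(f"{'Total':<52}{total:>6}")
```

The categories sum to the printed total of 12508, and missense/nonsense substitutions account for roughly 63.6% of entries.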

HUGO MUTATION DATABASE INITIATIVE
NEWSLETTER NO. 2, JANUARY, 1998

1. The 4th International Mutation Database Meeting was held in Baltimore in October 1997. There were 67 registrants and the meeting was very successful. The meeting program is available online on the website (see below). The attendees and abstracts will be available on the website soon.

2. For those who have not heard, there is a small HUGO Mutation Database meeting in Turin, Italy starting 3pm on March 27th 1998, adjoined to the Human Genome Meeting (HGM'98) 28-30 March 1998. This is mainly to present reports from the working groups and to elicit discussion. Those interested in attending should contact R. Horaitis. (horaitis@ariel.ucs.unimelb.edu.au)

3. An article entitled "Recommendations for a Nomenclature System for Human Gene Mutations" by S. Antonarakis, with contributions from the Nomenclature Working Group and the wider community (31 names in total), has just been published in Human Mutation Vol. 11 #1 p1-3. The article will soon be available on-line through the publisher's Internet service, Wiley InterScience. You may access Wiley InterScience through a link on the home page of the Human Mutation Web Site at http://journals.wiley.com/humanmutation/ or by going directly to the Wiley InterScience site at http://www.interscience.wiley.com/. We are encouraging all locus specific databases to conform to these standards.

4. Charles Scriver, Piotr Nowacki and other members of the Content and Software Working Group are currently preparing a Guidelines document describing in detail how to set up a locus specific database. This will be circulated and/or posted for more general comment.

5. The campaign to encourage locus-specific databases has begun. Sixty-seven potential curators have been contacted so far, and we have had a positive response from twenty-seven. If you are interested in creating a database and curating for your gene of interest, and have not already indicated this to us, please contact us for further information.

6. Dr Matthias Wjst recently announced a new database for asthma and allergy linkages and mutations. This is available at "http://cooke.gsf.de". He is willing to share details and software with others wishing to set up a common disease/linkage database. For further details contact Dr Wjst (wjst@gsf.de).

Finally, our initiative's community now numbers ~350 and is growing. All communications with members have been documented and the interests and expertise of each member have been listed. Please advise us of any others you think may be interested.

Websites: HUGO-Mutation Database Initiative (HUGO-MDI): http://ariel.ucs.unimelb.edu.au:80/~cotton/mut_database.htm

HUGO-MDI Mirror site: http://www2.ebi.ac.uk/mutations/


NEWSLETTER NO. 1 AUGUST, 1997

This is intended to be the first of a series of newsletters to those who have indicated they are interested in joining the Mutation Database Initiative and others working in the area, who have not necessarily joined as yet. It seems prudent to keep this to one short printed page in the interest of people reading it. It will also be posted on the Web site.

1. The pace of co-ordination should increase in October as Rania Horaitis, formerly Editorial Assistant of Human Mutation, takes the position of full-time Co-ordinator. We are grateful to the March of Dimes for the funds making this possible.

2. For those who have not seen the advertisements there is a HUGO Mutation Database meeting in Baltimore on the 27th-28th October, 1997 associated with the American Society of Human Genetics meeting. Details can be obtained from Liz Evans, HUGO London, Email:- hugo@hugo-europe.org.uk, Fax (44 171 935 8341) and Telephone (44 171 935 8085). Closing date for abstracts is the 19th September.

3. Several new items will be posted on the Initiative's Web site by the end of September.

4. A manuscript has been submitted for publication suggesting content and software for locus specific mutation databases. If and when accepted, comments will be invited.

5. Manuscripts have been submitted on criteria for causation of disease by mutations and again comment will be invited.

6. The current mutation nomenclature system is on the Web site and will be formalised shortly.

7. Volunteers for curating specific databases, for specialised tasks or for working group membership, are invited at any time as are ideas for the future of an integrated central database/locus specific database system. Please contact Liz Evans or Dick Cotton on these matters.


This is updated each time there is a newsletter by the Co-ordinator Rania Horaitis