REPORT OF THE 6TH HUGO MUTATION DATABASE INITIATIVE MEETING

27TH MARCH, 1999

BRISBANE, AUSTRALIA.

9.00 - 17.00

For reasons of economy and convenience, MDI meetings are held in association with the ASHG and HUGO HGM meetings, which also maximises the chance of attracting relevant and committed registrants. The most recent meeting was held in Brisbane, Australia, in association with HGM 99. There were 30 attendees, far fewer than the 123 in Denver, probably because the location is far from both Europe and North America. Nevertheless, it was a productive meeting, as leaders of most working groups were present.

It was a very informal meeting. Reports and plans were received from a few groups. Considerable time was allowed for discussion of the presented material, issues and future plans.

Prof. Richard Cotton reported on the activities of the ALLIANCE Working Group, which serves a co-ordinating role for the MDI. The MDI community is growing: members now number >500, compared with ~350 in 1997. Thirteen new locus-specific databases (LSDBs) have been launched in the last year due to MDI encouragement. Previously established databases have also been busy improving their sites.

The Alliance group has taken on the role of disseminating information via a bimonthly newsletter, a website, and publications. These include an appendix to Chapter 1, "Mutation databases: Overview & catalogues", in "Metabolic and Molecular Bases of Inherited Disease", 8th Ed., McGraw Hill (in preparation), and Unit 7.11, "Human mutation databases", in "Current Protocols in Human Genetics", Wiley-Liss (now in press). The journal Human Mutation has also agreed to publish our meeting reports.

Following calls made by the Alliance group, the recommended mutation nomenclature has now been adopted by the journals Human Mutation, AJHG, Mutation Research, the "Trends" journals and Nature Genetics. The Leeds Castle polyposis group has also formally announced that it will adopt these recommendations.

It was mentioned that volunteers are urgently required for the following groups: copyright & intellectual property, patient aspects of databases, ethnic and national databases, quality control and peer review, spectral databases, polymorphisms/SNPs, and software assessment.

The quality control group has made the draft questions for a mutation entry form available for viewing at http://ariel.ucs.unimelb.edu.au:80/~cotton/entry.htm. Any comments on this form must be sent to Rania (horaitis@ariel.ucs.unimelb.edu.au) by May 14th at the latest, so that a final recommendation may be made. More on this later in the report.

Reports were given by some of the CENTRAL databases.

Dr Heikki Lehvaslaiho from the EBI reported on the cross-linking of mutation databases. The federated approach to central mutation databases at EBI has led to a collection of most of the publicly available mutation databases under one structure and user interface (http://www2.ebi.ac.uk/mutations/). This is implemented using the SRS program developed by Thure Etzold. Verification and database cross-linking are now the main emphasis. The two largest single locus databases are the p53 databases at IARC (Lyon, France) and Hopital Necker-Enfants Malades (Paris, France). Cross-linking of identical entries has been done using the trivial name of the mutation, the literature reference and the original name of the mutation. A general mechanism for linking single locus databases and general databases through protein level mutation descriptions has been implemented. Using interspecies gene homologue lists, the links can be extended to other species. This provides an easy way of comparing the coverage of different databases. Comparing allelic variants from OMIM with SWISS-PROT sequences and variants has given insight into the various ways protein sequences are commonly numbered. A small improvement has been added to the EBI Mutation Checker: it is now possible to use a genomic reference sequence with standard cDNA numbering. Finally, at the last MDI meeting in Denver, a need was recognised for a utility that tracks the updates and existence of mutation database web sites. The weekly status of MutRes (the EBI database of mutation databases) entries is now collected into the MutResStatus database, which is available through the SRS interface.
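
As a rough illustration of the cross-linking step described above, the following Python sketch pairs entries from two databases when their trivial mutation name, literature reference and original mutation name all agree after simple normalisation. The record layout, field names, identifiers and sample values are assumptions made for this example only; this is not the EBI or SRS implementation.

```python
from collections import defaultdict

# The three fields the report names as the basis for cross-linking.
# The field names themselves are hypothetical.
KEY_FIELDS = ("trivial_name", "reference", "original_name")


def link_key(entry):
    """Build a matching key: lower-cased, with runs of whitespace collapsed."""
    return tuple(" ".join(str(entry[f]).lower().split()) for f in KEY_FIELDS)


def cross_link(db_a, db_b):
    """Return (id_a, id_b) pairs for entries whose keys are identical."""
    index = defaultdict(list)
    for entry in db_b:
        index[link_key(entry)].append(entry["id"])
    links = []
    for entry in db_a:
        for other_id in index.get(link_key(entry), []):
            links.append((entry["id"], other_id))
    return links


# Placeholder records for illustration (not real database content).
db_one = [
    {"id": "A1", "trivial_name": "R175H", "reference": "ref-001",
     "original_name": "Arg175His"},
    {"id": "A2", "trivial_name": "R248W", "reference": "ref-002",
     "original_name": "Arg248Trp"},
]
db_two = [
    {"id": "B7", "trivial_name": "r175h", "reference": "Ref-001",
     "original_name": "Arg175His"},
    {"id": "B9", "trivial_name": "R273C", "reference": "ref-003",
     "original_name": "Arg273Cys"},
]

links = cross_link(db_one, db_two)
print(links)
```

Requiring all three fields to agree keeps the match conservative; a real system would presumably need further normalisation of mutation names and references than this sketch attempts.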

Dr Steve Sherry was to speak on computing and information resources for genome analysis at the NCBI but was unable to attend due to a sudden illness.

The Genome Database (GDB) was reported on by Dr Jamie Cuticchia. As we all know, GDB had announced its termination, but it now has guaranteed funding for 4-5 years from various philanthropic sources and is able to continue its work. Operations have moved from Johns Hopkins University (JHU) in Baltimore to the Hospital for Sick Children (HSC) in Toronto. However, JHU has agreed to provide space and support for the GDB staff, who now number 5; an increase of 5 more has been proposed. Curatorial staff will be at both JHU and HSC; the latter will assume general operations and provide computer support. JHU will have curatorial control of GDB, with high speed connections to HSC. The server is due to move to HSC in April. The URL http://www.gdb.org is active but now points to HSC, not JHU. It is proposed that HUGO serve as an advisory board for GDB. GDB is now focussed more on science than on informatics, which was its emphasis in the past, and aims to work with the community on the acquisition and integration of information rather than on developing the database itself. There are a few problems. The code is the result of long in-house development and is difficult to maintain. So far, former GDB programmers have generously offered their help free of charge, but the long term solution is to move away from the existing code and onto commercial platforms. Data submission and curatorial work will continue, and GDB will work with HUGO to define its role. It is hoped that GDB will become more federated, i.e. work with other databases; for example, searching the CF database with GDB tools is now possible. GDB would like to prepare a shell and provide assistance to those curating mutation information, and wishes to work with the gene advisory committee to curate the appropriate information and give up those parts that can be better curated elsewhere. GDB also wishes to work with other databases, e.g. GenBank, to define better relationships and linkages. GDB is willing to provide hardware for hosting "virtual" servers. It is also willing to host meetings on federation of databases.

Two types of SOFTWARE that may be used in database curation were presented, Mutation View and MuStaR™.

MUTATION VIEW was demonstrated by Dr Shinsei Minoshima. This database/software application has been developed further since the last meeting. It has a very user friendly interface with visual data presentation. At present the database contains information for 16 eye disease genes, 2 Parkinson's genes, 3 Charcot-Marie-Tooth genes, 4 heart disease genes and 1 gene for autoimmune disease. The software is made readily available to curators of LSDBs who wish to establish a world-wide database.

Dr Alastair Brown gave a progress report and demonstration of MUSTAR™, a mutation storage and retrieval program (http://www.hgu.mrc.ac.uk/Softdata/Mustar/). The PAX6 and PAX2 mutation databases run on this program (http://www.hgu.mrc.ac.uk/Softdata/PAX2/ http://www.hgu.mrc.ac.uk/Softdata/PAX6/). The system comprises a curation program used to maintain an LSDB, and a suite of programs to create and run a web site for the database. It is free to the academic community for non-commercial use and incorporates HUGO MDI recommendations. New mutations are entered by the curator using the curation program, either directly or by importing submissions made via the web site. The curation program can also be used to update existing entries, to search the database, and to export the data for the web site software. The program is a stand-alone Microsoft Access application written for Windows 95/98/NT, but can be run on a Macintosh or UNIX system running SoftWindows; it can be kept on a laptop if necessary. The curation program was demonstrated. It has a friendly, form-like user interface through which new mutations may be submitted, simple and advanced searches run, other information entered, and data accepted from an exported flat file. The curation program is currently being tested and will be available in its basic version soon. Until then, an MS Access table is available to simplify the creation of a flat file database for use with the web site software; it does not contain the error checking facilities of the full curation program, but provides drop-down lists of field values where possible. Dr Brown then demonstrated the web site. An access-controlled curator control panel manages and updates the web pages. The MuStaR software creates tables of all mutations as well as summary tables by intron and exon. Data are listed by systematic mutation name. Clicking a mutation produces that entry's full record.
A submission form allows remote users to submit new mutations to the curator, who is then notified that a new mutation is to be entered. Anyone with web access may search an on-line copy of the database, and the database may also be downloaded as plain text. The main changes to the software since the Denver meeting are that the curation program has a better user interface, the database can handle splice variants, and all the pages can be curated except the submittal and search forms, which will be added to the curation program soon. The current status of MuStaR is that the curation program is still to be tested, the Access table may be used as a temporary measure, the web software is ready, and improvements continue to be added. As future additions, quality assurance fields are to be added, and mutation data are to be separated from patient and phenotype data, with the separated data connected by links. The entire system has been based around the requirements of users, recommendations made by the HUGO MDI working parties, and suggestions made by participants at the Denver MDI meeting in October 1998.

As the meeting was held in Australia, Dr Tim Littlejohn, Head of the Australian National Genomic Information Services (ANGIS, http://www.angis.org.au), presented an outline of bioinformatics in Australia. ANGIS provides bioinformatics services through a network of internet-based bioinformatics servers around the nation. These servers provide access to a large number of molecular sequence databases and bioinformatics software tools for the analysis of genomic information. The ANGIS service is accessed through an integrated WWW-based workbench named "WebANGIS". A number of introductory, advanced and one-week bioinformatics courses are also run, so ANGIS plays a significant role in bioinformatics education in Australia. User support is also part of its services. ANGIS supports mutation detection and analysis through its core tools and through some specialised resources such as the Online Mendelian Inheritance in Animals (OMIA) database developed by A/Professor Frank Nicholas at the University of Sydney. ANGIS has established collaborations abroad, with a server established at the University Putra Malaysia and with the WebANGIS system being adapted for use at the South African National Bioinformatics Institute. Recently ANGIS has licensed its technologies to an Australian company, EnCompass Bioinformatics (www.en-bio.com), to bring these bioinformatics technologies to the wider international research community.

One LSDB was presented in Brisbane. This was a disease-centred mutation database for familial hypertrophic cardiomyopathy (FHC) (http://www.angis.org.au/Databases/heartbreak.html), presented by Dr David Fung. FHC is an autosomal hereditary disorder of the sarcomere. Until now there has been no public database dedicated to the collection of FHC-associated data. This database aims to provide researchers and clinicians with an online resource containing summarised and updated information on mutations, molecular epidemiology, and phenotypes. Unpublished mutations may also be recorded. Due to limited funds and time, Dr Fung's team made full use of freeware available either on the web or bundled with Linux, such as PerlBuilder, Netscape, DBI, DBD, CGI.pm, PostgreSQL and Apache. Thus, the database was created on a minimal budget. Data security was implemented by restricting access to the master database while making the production copy publicly available. The HUGO MDI recommendations are used throughout the database.

Ample time was allowed for the discussion of ISSUES in the afternoon.

Content of LSDB: It was recommended that databases should follow the guidelines soon to be published: Scriver C.R., Nowacki P.M., and Lehvaslaiho H. Guidelines and Recommendations for Content, Structure and Deployment of Mutation Databases. Human Mutation, Vol. 13, No. 5 (1999).

Mutation entry form: It was agreed that the form used to enter new mutations in an LSDB should incorporate all the recommendations in the Scriver et al. guidelines. The draft form is available on the MDI website for viewing (http://ariel.ucs.unimelb.edu.au:80/~cotton/entry.htm). This is a non-interactive form and is only a draft of the questions which should be included. It was decided that the form will remain available for comment until May 14th 1999. All comments are to be sent to Rania (horaitis@ariel.ucs.unimelb.edu.au). A final recommendation will then be made. Dr Jamie Cuticchia commented that a Java applet should be used with a standardized mutation form, and that there should be one place on the web where people are able to submit their mutations. The form should then pass the appropriate information on to the relevant LSDB. This amounts to a proposal for a central entry form.
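
The central entry form idea can be sketched in a few lines of Python: a single submission point checks a record and forwards it to the LSDB registered for that gene. This is only an illustration under stated assumptions. The registry, the required field names and the "unassigned" fallback are hypothetical and are not taken from the draft form or from any MDI software; the two database addresses are the PAX2 and PAX6 sites mentioned earlier in this report.

```python
# Hypothetical registry: gene symbol -> LSDB that should receive the record.
LSDB_REGISTRY = {
    "PAX6": "http://www.hgu.mrc.ac.uk/Softdata/PAX6/",
    "PAX2": "http://www.hgu.mrc.ac.uk/Softdata/PAX2/",
}

# Assumed minimal fields; the real form would follow the agreed guidelines.
REQUIRED_FIELDS = ("gene", "mutation_name", "submitter")


def route_submission(submission, registry=LSDB_REGISTRY):
    """Check a submission and return (destination, cleaned_record).

    The destination is the registered LSDB for the gene, or the string
    "unassigned" when no LSDB is registered for it.
    """
    missing = [f for f in REQUIRED_FIELDS if not submission.get(f)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    gene = submission["gene"].strip().upper()
    destination = registry.get(gene, "unassigned")
    record = dict(submission, gene=gene)  # copy with the normalised symbol
    return destination, record


# Placeholder submission for illustration only.
example = {"gene": " pax6 ", "mutation_name": "placeholder", "submitter": "example"}
destination, record = route_submission(example)
print(destination, record["gene"])
```

Keeping the routing table separate from the form itself means new LSDBs could be registered without changing the submission page.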

Nomenclature: A decision needs to be made on the naming of complex mutations.

Software: Various types of software have been made available to the MDI for the creation of LSDBs. It has been suggested that the MDI make recommendations. The MDI, however, cannot dictate that a specific package must be used. The software should require minimum human intervention, and should store data in a format which can be integrated into the central databases, so that standard data exchange is possible. The central database staff should therefore work with the software creators and the curators of LSDBs.

Central databases:

Copyright & Intellectual Property: It was agreed that information should be made freely available. However, there are problems: data in an LSDB may be copied in its entirety and placed on another LSDB without the curator's knowledge or any acknowledgement. Authors may also wish to have some protection for mutations that are unpublished. The question of whether unpublished mutations should be part of an electronic journal was raised; however, Prof. Van Ommen pointed out that websites are increasingly accepted as a form of publication.

Polymorphisms/SNP databases: Everyone is to be encouraged to deposit polymorphisms into a database. Dr Lehvaslaiho commented that there is a trend towards adding extra tables for polymorphisms to a database; however, these differ from the general database structure and make it harder to integrate the data into the SRS system. He recommended that polymorphisms be included in the general structure of the database. Dr Sylvia Spengler commented that the map location of a polymorphism is important and should be included.

Ethnic & National databases: National databases are important in particular countries for healthcare, e.g. F9 in the UK. It was decided countries and specific ethnic groups should be urged to document their mutations for their own national interests.

Patient aspects of databases: It was acknowledged that patients who wish to find information will find the LSDB. The LSDB should include some information for them, even if it only points to another site; the PAH database, for example, does include such information. Some databases, e.g. the F9 and BTK databases, have registered every patient together with their mutation. A Thalassaemia database is now under construction in the UK.

Spectral databases: No comments were made.

Electronic journal: The overwhelming philosophy is that information on the web should be made freely available; however, web journals are often considered second class to print journals. Funding to keep such a journal running is an issue, although advertising is a possible avenue. An incentive to enter mutations into databases could be that a print journal will not accept information unless it has been submitted to a database; however, sometimes a mutation may never be published. It may be possible to give an electronic journal prestige comparable to that of a print journal, perhaps by running it under the aegis of existing journals or even of HUGO. Prof. Cotton is to put a proposal together and circulate it for further comments.

FINAL RESOLUTIONS:

1. The mutation submittal form will be available for comment for 1 month, until May 14th, and will be modified if necessary. This will then be the HUGO MDI recommended form.

2. Dr Jamie Cuticchia to produce an interactive user-friendly mutation entry form for the MDI.

3. Guidelines should be put in place about copyright protection of databases.

4. Polymorphisms should not be separated in databases but included in the general structure of the database.

5. Proposal for an electronic journal should be pursued.

6. Complex nomenclature needs pursuing.

7. Next meeting agreed for October 19th, 1999, in San Francisco, U.S.A.

Reported by Rania Horaitis

Posted 26th April, 1999