Working Group Draft 2
December 23rd, 1999

HUGO/MDI COMMUNITY-WIDE DATABASE: PROJECT SPECIFICATIONS

 

A. INTRODUCTION

 

Mutation databases are increasingly consulted by clinicians and those in the diagnostic arena, as evidenced by the growing use of locus-specific databases (LSDBs) and HGMD, a listing of published mutations. There are approximately as many unpublished mutations as published ones, and it is becoming more difficult to publish mutations. The HUGO Mutation Database Initiative (MDI) was formed in 1994 to ensure that all mutations are collected accurately and reliably and then distributed through the Internet. This task will become more important as biotechnology uncovers more mutations and comes to depend on this information for progress.

Following extensive discussion, the October 19, 1999 MDI meeting passed a resolution committing the Community to study and, if feasible, implement a new institution designed to (i) strengthen existing LSDBs, and (ii) put integrated, next-generation data-sets and software tools into the hands of researchers. The proposed facility, which might be dispersed and made up of many components, would support itself by fundamentally new methods that may include sales to the commercial sector. It would provide basic data and research tools to Community members (including LSDBs) without charge.

The first step is to create detailed specifications describing the proposed facility. To this end, a working group including leading bioinformatic experts, LSDB operators, central database operators, and an industry representative was formed. This document reflects their work.

MDI's consultants will use the final version of this document to prepare a detailed business plan. The plan will be presented and put to a vote at MDI's April 9, 2000 meeting in Vancouver. (New addition, May 10, 2000: a synopsis of the Vancouver meeting is now online.) In view of this deadline, it is important that we receive your comments on or before January 31, 2000. Stakeholders and other relevant parties will then be canvassed for expressions of interest in forming components of the facility.

B. GENERAL

B1. Description

Spec. B1.1:

The goal of the HUGO Mutation Database Initiative is to collect and distribute, accurately, efficiently, and completely, a record of predominantly disease-causing variation together with other variations where relevant (e.g., inorganic variation).

Spec. B1.2:

To achieve this goal, MDI has resolved to study and, if feasible, implement a new proposed facility that would:

Comment:

MDI has laid the groundwork for the proposed facility by, among other things, promoting debate and consensus-building discussions; fostering the development of common nomenclatures; encouraging appropriate software development; and developing the particular ideas embodied in these Project Specifications. As a result of these activities, most if not all of the components needed for success are already in place. The Community needs both LSDBs ("an inch wide and a mile deep") and general databases ("a mile wide and an inch deep").

B2. Community Support.

Spec. B2.1:

The proposed facility's data will come from periodic, voluntary contributions by LSDBs, clinicians, and other members of the Community as set forth more fully in these Project Specifications. Community members will be asked to pledge their support in advance.

B3. Funding.

Spec. B3.1:

The proposed facility will use traditional public and/or private sector grants (cf. the SNP Consortium) to the maximum extent possible.

Comment:

MDI will investigate funding opportunities which may exist through NIH, MRC, the US Department of Energy, HUGO, ASHG, and various other public and private institutions. Persons reviewing these Project Specifications are invited to suggest additional possibilities. In addition to conventional cash grants, the proposed facility may be eligible for significant in-kind assistance, services, and/or expertise.

Spec. B3.2:

If suitable grants cannot be found, the proposed facility should be prepared to support itself by selling suitably commercialized versions of its databases and tools to the private sector.

Comment:

This model was originally suggested in Steve Maurer's Jan. 2000 contribution to Human Mutation. It was discussed in detail during a three-hour stakeholders' meeting on October 18 and at the MDI meeting on October 19. Pending a detailed business plan, it appears viable.

B4. Launch Date.

Spec. B4.1:

The proposed facility should be launched in autumn 2000 (Northern Hemisphere) and offer products to Community and private sector users no later than spring 2001.

C. GENERAL BENEFITS TO COMMUNITY.

C1. Core Data.

Spec. C1.1:

The proposed facility should include all data fields currently found on MDI's model form (http://ariel.ucs.unimelb.edu.au:80/~cotton/entry.htm). All volunteer contributors should include this core data in their submissions.

Comment:

MDI has achieved considerable consensus on what types of data/nomenclatures are needed. The model form reflects this.

C2. Optional Data.

Spec. C2.1:

The proposed facility should include certain additional data fields (e.g., patient I.D. numbers) as optional. The Community would determine and periodically revise the proposed facility's list of optional fields.
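
As a rough illustration of how the split between core and optional data might be enforced at submission time, the following Python sketch checks a submitted record against a set of required core fields and a Community-maintained set of optional fields. All field names here are hypothetical placeholders: the actual core fields are those on MDI's model form, and the optional list would be set by the Community.

```python
# Hypothetical field names for illustration only; the real core fields
# are those on MDI's model form, and the optional list is Community-set.
CORE_FIELDS = {"gene", "mutation_name", "submitter"}
OPTIONAL_FIELDS = {"patient_id"}


def validate_submission(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    # Every core field must be present and non-empty.
    for field in sorted(CORE_FIELDS):
        if not record.get(field):
            problems.append(f"missing core field: {field}")
    # Anything that is neither core nor optional is flagged for review.
    for field in sorted(set(record) - CORE_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unrecognized field: {field}")
    return problems
```

Because the optional list is a plain set, the Community could revise it periodically without changing the validation logic.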

C3. Advanced Computing Architecture/Powerful Tools.

Spec. C3.1:

The proposed facility should be built around a computing architecture that (i) uses state-of-the-art bioinformatic techniques, (ii) facilitates maximally flexible user-defined searches, and (iii) has been specifically tailored to human mutation data.

Comment:

Particular software architectures (e.g., object-oriented, relational, and object-relational) will be evaluated according to the proposed facility's needs in light of then-existing technology. Candidates submitting proposals to design and operate the proposed facility will be asked to identify which tools and architectures would best serve the Community's needs. Candidates will be encouraged to research and develop sophisticated databases, tools, and new management techniques.

Spec. C3.2:

In order to meet the goals set forth in Spec. B1.2(i), the proposed facility will develop a set of powerful bioinformatic software products and related data-sets, hereinafter referred to as the Core Tools. The Core Tools will be made available to academic, government, and qualified non-profit members of the Community without charge.

Comment:

For economic reasons, the proposed facility may need to sell commercial versions of the Core Tools to the private sector at substantial prices (~$10,000/subscriber/year). To the maximum extent possible, any extra features included in these commercialized products will involve services (e.g. nightly updates) which are relatively unimportant to academic and non-profit users.

Comment:

Non-profit institutions whose missions and activities are substantially similar to those of academic researchers would receive access to the Core Tools without charge. Eligibility would be interpreted liberally on a case-by-case basis.

C4. Curation and Maintenance.

Spec. C4.1:

The proposed facility should add value by curating, editing, and updating its data.

C5. New Databases Over Time.

Spec. C5.1:

In order to increase its financial viability, the proposed facility may create new tools and databases aimed at commercial users. The proposed facility shall take reasonable steps to develop and add non-commercial versions of such tools and databases to its Core Tools.

C6. First Sale Principle.

Spec. C6.1:

Society only benefits from scientific data when such data is used. For this reason, the proposed facility would encourage academic, government, non-profit, and commercial users to re-use, re-publish, and re-process any or all data in its collection. To the maximum extent feasible, the proposed facility shall avoid any business model (e.g., pass-through rights) that impairs this goal.

Comment:

Detailed proposals will be developed by our business consultants. The basic strategy was discussed by Steve Maurer at the October 19 meeting and can be found in his contribution to Human Mutation's forthcoming Jan. 2000 issue.

C7. Rights in Contributed Data.

Spec. C7.1:

The proposed facility will have the full, non-exclusive right to use, modify, and redistribute all data submitted to it for any purpose described in these Project Specifications. Volunteers submitting data will retain all other rights to use and, if appropriate, license, sell, and redistribute the data as they see fit. Volunteers are encouraged to exercise these retained rights according to copyleft principles.

Comment:

Copyleft refers to the practice of providing copyrighted materials without charge to any member of the public who requests them, provided that he or she agrees to keep those materials (and any derivative products) freely available to the public on the same terms. The basic idea is to keep commercial entities from using the material to create private label products that can be sold at a profit.

Spec. C7.2:

Consistent with sound economic principles, the proposed facility will conduct its business using copyleft principles to the maximum extent possible and appropriate.

Comment:

MDI will instruct its consultants to study and report on copyleft options as part of their business plan. Persons interested in copyleft are asked to submit business models consistent with these Project Specifications for further study.

D. COMMUNITY CONTROL

D1. Governing Body

Spec. D1.1:

The proposed facility will be operated by and on behalf of the Community as a whole. Management would be performed by a Governing Board elected by the Community. Whenever possible, major management decisions will be made in consultation with the Community after full and fair discussion (e.g., at MDI meetings).

Comment:

The structure, powers, and duties of the Governing Board and its members will be specified in a detailed charter. The charter would be presented to and voted on by Community members at an MDI meeting held prior to the proposed facility's launch date.

Spec. D1.2:

Day-to-day operation of the database will be left to the reasonable discretion of one or more contract operator(s). See subpart F, below.

D2. Science Advisory Panel

Spec. D2.1:

All persons and entities contributing, maintaining, or curating data for the proposed facility shall do so in accordance with generally recognized practices, procedures and methodologies then-current in the Community.

Spec. D2.2:

The Governing Board should appoint a Science Advisory Panel. Among its other duties, the Panel will be responsible for (i) reviewing the qualifications and credentials of any person (including, but not limited to, LSDB operators) who wishes to review, edit, curate, or annotate raw data on behalf of the proposed facility, and (ii) taking reasonable steps to ensure that such activities are performed in a timely, accurate, and conscientious manner consistent with recognized practices, procedures and methodologies then-current in the Community.

Spec. D2.3:

From time to time, the Governing Board shall call on the Science Advisory Panel to provide such other expertise and advice as may be appropriate.

D3. Ethics Advisory Panel

Spec. D3.1:

The Governing Board shall appoint an Ethics Advisory Panel to prepare policy guidelines for the proposed facility. Thereafter, the Panel will continue to advise and consult with respect to ongoing and proposed operations.

Spec. D3.2:

From time to time, the Governing Board shall call on the Ethics Advisory Panel to provide such other expertise and advice as may be appropriate.

D4. Public Communications

Spec. D4.1:

It is reasonable to think that the proposed facility's databases will occasionally be consulted by lay people, working clinicians, and other medical professionals. The proposed facility shall take all steps reasonably necessary to help these individuals use the proposed facility in a safe and ethical manner.

Comment:

One strategy would be to answer questions by telephone or e-mail. Staffing could be accomplished by a network of Community volunteers, partnership with a charitable institution (e.g., March of Dimes), or through a commercial firm willing to support the effort without charge as a public service.

D5. Responsibility for Reviewing, Maintaining and Curating Data; Orphan Mutations.

Spec. D5.1:

Each participating LSDB operator will be responsible for keeping his or her LSDB current and up-to-date.

Comment:

In addition to making their data available to the proposed facility, some LSDB operators may choose to operate a parallel web site on their own servers. In order to maintain true parallelism, participating LSDB owners will be asked to forward new information to the proposed facility as soon as it appears on their own servers.

Spec. D5.2:

The proposed facility will receive data from persons (including, but not limited to, clinicians and healthcare workers) who do not operate LSDBs. Where submitted data falls within the subject matter of an existing LSDB, the proposed facility will immediately forward it to that LSDB's operator. The operator shall (i) review, curate, and annotate the data as necessary, and (ii) promptly return it to the proposed facility for posting.

Comment:

The Community will have to decide whether to make as-submitted raw data available on-line. Alternatives include keeping raw data confidential; making it temporarily available until a reviewed/curated version becomes available; and making it permanently available so that interested researchers can form their own independent judgments.

Spec. D5.3:

The proposed facility will receive data for which there is no existing LSDB. A volunteer network of Community members will be created to review, curate, and annotate such data on behalf of the proposed facility.
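
The routing described in Specs. D5.2 and D5.3 can be sketched as a small lookup: data covered by an existing LSDB goes to that LSDB's operator, and everything else (an orphan mutation) goes to the volunteer network. This is a minimal sketch only; the dictionary of operators, the use of a gene symbol as the lookup key, and the queue name are assumptions made for illustration and are not prescribed by these Project Specifications.

```python
def route_submission(gene, lsdb_operators, volunteer_pool):
    """Decide who reviews a submission from a non-LSDB contributor.

    lsdb_operators maps a gene symbol to the operator of its LSDB;
    volunteer_pool names the volunteer network's review queue.
    Both are hypothetical structures used only for this sketch.
    """
    operator = lsdb_operators.get(gene)
    if operator is not None:
        # D5.2: an existing LSDB covers this subject matter,
        # so the data is forwarded to that LSDB's operator.
        return ("lsdb_operator", operator)
    # D5.3: no LSDB exists for this data (an orphan mutation),
    # so the volunteer network reviews and curates it.
    return ("volunteer_network", volunteer_pool)
```

In either case the reviewed data would then be returned to the proposed facility for posting.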

E. BENEFITS TO LSDBs

E1. Technical Support.

Spec. E1.1:

In return for regular contributions of data, the proposed facility should strengthen and support existing LSDBs.

Comment:

Support may include:

Additional ideas may come from bids and contract proposals.

E2. Publication Credit/Gateway Support.

Spec. E2.1:

The proposed facility would be designed so that users could readily find out which LSDBs contain data on a particular subject and follow links back to participating LSDBs.

Spec. E2.2:

The proposed facility shall take all reasonable steps required to ensure that contributors receive full attribution for all data submitted.

Comment:

MDI will instruct its consultants to study and report on economically viable strategies for recording attribution as part of its business report. These options include, but are not limited to, routine publication of submitted data in PubMed Central.

F. CONTRACT OPERATORS

F1. Creation, Operation, and Maintenance.

Spec. F1.1:

The Community will solicit bids and proposals from all qualified candidates interested in designing, creating, and operating the proposed facility or any component of it. Bids and proposals will be evaluated according to merit.

Spec. F1.2:

Several Community members already operate large database/software projects and may be interested in creating and operating all or part of the proposed facility. Consistent with Spec. F1.1, it is in the Community's interest to strengthen these entities by paying them to perform work for the proposed facility wherever possible.

Comment:

Central databases such as HGMD, NCBI, and EBI, together with current software creators, are obvious candidates for successful bids.

Spec. F1.3:

Entities performing paid work for the proposed facility would remain free to operate and maintain other projects as they see fit.

Spec. F1.4:

Contracts to operate and maintain the proposed facility would be re-bid at three-year intervals.

G. INTERFACING WITH THE COMMUNITY

G1. Deposits.

Spec. G1.1:

The proposed facility would use state-of-the-art software to make data submissions as easy as possible.

Comment:

The current lack of clear and easy methods for submitting data and entering it into computationally advanced databases is a major obstacle to (i) the creation of new LSDBs, and (ii) the publication of observations by members of the clinical community. Advanced, easy-to-use forms would maximize the publication of information which currently goes unpublished and uncollected.

Comment:

The proposed facility would make a special effort to encourage submissions by (i) researchers interested in launching new LSDBs, (ii) clinicians, and (iii) other Community members who have not previously published data.

G2. Access.

Spec. G2.1:

The proposed facility's projects will be made available over the Internet, on CD-ROM, and via other reasonably convenient media.

H. PRIVATE SECTOR ISSUES

H1. Computing Architectures

Spec. H1.1:

The proposed facility's commercial data products would be presented in convenient computing formats that private sector customers could run and manipulate in-house. The products would also be offered in flat-file versions.

Comment:

Particular formats (e.g., Sybase or Oracle) will be evaluated according to potential customers' needs in light of then-existing technology. Candidates submitting proposals to design and operate the proposed facility will be asked to propose suitable formats.

H2. Business Model

Spec. H2.1:

The product would be sold by subscription. Advance subscriptions would be solicited to help cover start-up costs.

I. GATEWAYS

I1. Partnerships

Spec. I1.1:

Users of the proposed facility could use it as a gateway to PubMed Central and/or other on-line journals.

Spec. I1.2:

The proposed facility would be free to accept advertising in those cases where it was economically desirable to do so.

Spec. I1.3:

The proposed facility would be free to accept gateway fees when users exploited links to other on-line resources. Gateway fees would not be charged to academic, government, or qualified non-profit users unless the on-line resource was prepared to absorb the entire fee from its own resources without charging the affected user.
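
The gateway-fee rule above reduces to a simple decision, sketched here in Python. The user-type labels and the function name are hypothetical; the sketch only encodes the stated condition that exempt users are never charged themselves, so a fee may be collected for their link only when the on-line resource absorbs it.

```python
# Hypothetical labels for the user categories exempted by the rule.
FEE_EXEMPT_USERS = {"academic", "government", "qualified_non_profit"}


def may_collect_gateway_fee(user_type, resource_absorbs_fee):
    """Return True if the facility may accept a gateway fee for this link."""
    if user_type in FEE_EXEMPT_USERS:
        # Exempt users must not be charged; a fee is acceptable only
        # when the on-line resource pays it entirely from its own funds.
        return resource_absorbs_fee
    # For all other users the facility is free to accept gateway fees.
    return True
```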

J. STAKEHOLDER SUPPORT FOR THE SPECIFICATIONS AT 23 DEC. 1999

Cuticchia J. Representing GDB
Katz M. Representing M.O.D
Lehväslaiho H. Representing EBI
Maurer S. Representing the Law
HUGO
Cotton R.G.H. Convenor and representing HUGO
Van Ommen G.-J HUGO President
Locus specific databases
Auerbach A. Fanconi anaemia database
Gottlieb B. Androgen receptor database
Scriver C. Representing LSDB, software and the MDI
Software
Beroud C. Representing UMD software & LSDBs
Shimizu N. Representing MutationView, software and JBIC
Industry
Micklem G. Representing Incyte
GeneClinics
Tarczy-Hornoch P. Representing Patient Aspects of databases

COMMENTS ON THIS DOCUMENT SHOULD BE DIRECTED TO RANIA AT horaitis@ariel.ucs.unimelb.edu.au AS SOON AS POSSIBLE, BY 31 JANUARY 2000.

Updated 10th May 2000