Sharing Genotype and Phenotype Data between Stakeholders: The UK Experience

This abstract has open access

Abstract Description

The introduction of whole genome sequencing and other genotyping platforms into the delivery of precision medicine poses critical questions about how to support data archiving, analysis and interpretation for the generation of clinical-grade reports. This is one of the key challenges for Unified Health Care systems that are in the process of implementing omics-based delivery of more precise interventions in the prevention and treatment of disease.

The 100,000 Genomes Project set a number of key principles. Those participating in the project are invited to sign a national consent after having been informed about the project through a unified national participant information leaflet. Phenotype data are captured using Human Phenotype Ontology (HPO) terms. Sequencing results are generated by a single laboratory and processed through a unified analysis pipeline, which monitors sequencing quality and calls variants. The decision to centralise the informatics systems was taken at the outset of the project. The reasons underlying this decision are that a centralised system is easier and more cost-effective to build, and that it makes large-scale analyses far easier than a federated data ecosystem would. Concerns about a central design stifling innovation need to be recognised; the central system functions as a hub-and-spoke model, with locally delivered but centrally purchased compute solutions for the interpretation of genotype and phenotype data by multi-disciplinary teams (MDTs).

Bringing innovation to the diagnosis and care for patients with rare diseases and cancer is critically dependent on the sharing of data between hospitals and community health care services. Having a centralised compute architecture allows the standardisation of processes and data. This makes it easier to achieve data capture from the plurality of healthcare providers in a Universal Health Care system, like the National Health Service.

Data models need to be developed so that rich information can be exchanged in a form that is computable and that humans can also understand. Open Application Programming Interface (API) modules that can be used to interact with the system programmatically should be the preferred option, and a modular architecture provides opportunities to bring third-party analytics to the data. The 100,000 Genomes Project philosophy for analysis has been to define the interpretation services that end users require, which can be applied as needed instead of being "hard coded" into the algorithms of an analytics pipeline. This modular architecture provides hospital users at Genomic Laboratory Hubs and other locations with a tailored analysis service that can be applied to the data. The approach currently supports Tiering (variant classification), Exomiser (gene prioritisation) and Cardioclassifier (classification); many others will be added in the future.
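The idea of interpretation services that are applied on request, rather than hard coded into a pipeline, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Case` record, the `register` and `interpret` helpers and the two toy services are hypothetical names, and the toy rule is a placeholder that does not reproduce the project's Tiering, Exomiser or Cardioclassifier logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Case:
    """Hypothetical minimal case record: HPO terms, a gene panel and called variants."""
    case_id: str
    hpo_terms: List[str]
    panel_genes: Set[str]
    variants: List[dict] = field(default_factory=list)


# An interpretation service takes a case and returns a report fragment.
Service = Callable[[Case], dict]

# Registry of available services, keyed by name (hypothetical design).
_REGISTRY: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that adds a service to the registry under the given name."""
    def wrap(fn: Service) -> Service:
        _REGISTRY[name] = fn
        return fn
    return wrap


@register("toy_tiering")
def toy_tiering(case: Case) -> dict:
    """Placeholder rule only: variants in a panel gene are 'tier1', all others 'tier3'."""
    tiers = {
        v["id"]: ("tier1" if v.get("gene") in case.panel_genes else "tier3")
        for v in case.variants
    }
    return {"tiers": tiers}


@register("toy_phenotype_summary")
def toy_phenotype_summary(case: Case) -> dict:
    """Counts the HPO terms recorded for the case."""
    return {"n_hpo_terms": len(case.hpo_terms)}


def interpret(case: Case, requested: List[str]) -> Dict[str, dict]:
    """Apply only the services the end user asked for, in the order requested."""
    unknown = [name for name in requested if name not in _REGISTRY]
    if unknown:
        raise KeyError(f"unregistered services: {unknown}")
    return {name: _REGISTRY[name](case) for name in requested}
```

A hub would call, for example, `interpret(case, ["toy_tiering"])` and receive only that service's output. Adding a third-party service then means registering one more function, without changing the central pipeline.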

Combining centralised data processing with locally driven variant interpretation at hospital-based Genomic Laboratory Hubs poses a risk of disconnect: data analysis at the centre may proceed faster than interpretation can be completed at the decentralised hubs. Tools need to be developed to monitor the level of disconnect, because otherwise a national project is at risk of failing. One way interpretation can be improved is by building a knowledge base from the information accumulated in the 100,000 Genomes Project and the local Genomic Medicine Service.
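One simple way to quantify such a disconnect is to count, per hub, the cases that have been analysed centrally but not yet interpreted locally, together with how long they have been waiting. The sketch below assumes a hypothetical `CaseStatus` record with an analysis date and an optional interpretation date; the field names and the choice of median waiting time as the metric are illustrative assumptions, not a description of any existing monitoring tool.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, List, Optional


@dataclass
class CaseStatus:
    """Hypothetical tracking record for one case."""
    case_id: str
    hub: str
    analysed_on: date                      # central analysis completed
    interpreted_on: Optional[date] = None  # None while awaiting local interpretation


def backlog_by_hub(cases: List[CaseStatus], today: date) -> Dict[str, dict]:
    """Summarise open (analysed but not interpreted) cases for each hub."""
    waits: Dict[str, List[int]] = {}
    for c in cases:
        if c.interpreted_on is not None:
            continue  # already interpreted, so not part of the backlog
        waits.setdefault(c.hub, []).append((today - c.analysed_on).days)
    return {
        hub: {"open_cases": len(days), "median_wait_days": median(days)}
        for hub, days in waits.items()
    }
```

Hubs with no open cases do not appear in the result. Tracking these numbers over time would show whether the gap between central analysis and local interpretation is growing.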

In summary, population-scale genomics is today driven towards centralisation by the type of sequencing technology. This has created a requirement to centralise information processing. However, clinical data are to remain decentralised at hospital level. Applications for pushing and pulling the clinical data into a 'safe haven' are required for the analysis of joined-up data. We are convinced that, for the time being, a system of distributed expertise for variant interpretation by MDTs in the context of the clinical phenotype is essential to reap the benefits for patient care of genotyping by whole genome sequencing or other genotyping platforms.

Abstract ID

HAC701463

Submission Type

Speaker