It is commonly acknowledged that healthcare data has tremendous potential to improve overall public health and unlock significant innovation in both the public and private healthcare sectors. Use cases based on the sharing and processing of healthcare data (e.g., machine learning and AI) are numerous.
Recent notable examples include data-driven algorithms that improve cancer diagnosis, reduce dangerous drug interactions, and predict the best treatment option based on a patient’s specific symptoms, history and demographics.
Because healthcare data is dispersed across many actors, a key enabler is healthcare data sharing, which is the focus of this paper.
This requires overcoming a range of distinct challenges, including careful consideration of patient privacy concerns, anonymization of data without loss of relevance, and the harmonization of data across disparate sources and IT solutions.
These barriers to sharing healthcare data are gradually being broken down, driven by a few major national initiatives across Europe (Health Data Hub in France, NHS in the UK, MII in Germany) and Israel, which we review in this paper.
While regulation of healthcare data sharing is still at an early stage, what opportunities do start-ups and corporations have to launch data sharing projects?
When should they enter the game? And what is missing to seize these opportunities alongside the national initiatives?
In 2019, the European Union adopted a new directive on open data and the re-use of public sector information (PSI). It introduces the concept of high value data sets, which are held by public sector bodies and public undertakings and must be made available for re-use by a wider audience. To date, these priority data sets cover geospatial, earth observation and environment, meteorological, statistics, companies and mobility data.
The current directive notably excludes both healthcare data (avoiding the complexities of personal data) and privately owned data.
All of these high value data sets have an important common characteristic: they do not require extensive anonymization work before being released to the public and they are predominantly owned by public entities; two good reasons for the European Commission not to consider healthcare data within the scope of the first directive.
Sharing the valuable public healthcare data of millions of citizens at the European level takes time. Furthermore, the data is fragmented, redundant and incomplete. Part of it is also owned by private parties who may not be keen on sharing it for free. Nevertheless, major recent national initiatives are paving the way for broader data sharing at the European level.
The private sector has already faced its fair share of data management constraints since the introduction of the GDPR (General Data Protection Regulation), and further obligations are likely to follow. Mandatory data sharing by private actors, as under PSD2 (Payment Services Directive 2), is a vivid example of how sharing can be imposed on corporates (banks in this case), and the next iteration of the GDPR is also in development.
The public sharing of healthcare data is gaining momentum and European corporates (pharmaceutical companies, medical facilities, insurers, etc.) are not immune from regulated sharing. Nevertheless, healthcare actors can have a range of business reasons to proactively begin sharing their data.
Health data can be segmented into two main categories: individual data (related to patients’ health status, medical experts’ diagnoses, and payments and reimbursements) and aggregated data (related to facilities or practitioners, including overall medical payments and reimbursements).
These data sets are generated by different entities, each with varying levels of access to the same data. Exhibit 1 summarizes data ownership, although the specifics vary from country to country.
Although the use of healthcare data is often of a sensitive nature, use cases demonstrate that data sharing can lead to various benefits, including improvements in public health outcomes, cost efficiency enhancements, and an improved ability to simulate and model healthcare economics.
Selected prospective use cases in this area include the following:
Health practitioners are then willing to share these data sets in order to showcase their strengths and be selected.
Healthcare data sharing’s main challenge lies in the balance between aggregation (to protect personal data) and informational value (many use cases require individual level information).
Anonymization is a critical pain point in unlocking healthcare data sharing, not least because it is needed to comply with the GDPR. Its goal is to prevent any direct or indirect attempt to re-identify the individual linked to the data. In this respect, it should not be confused with pseudonymization, which only prevents direct patient identification. Pseudonymization is the approach adopted by the French Health Data Hub, in order to preserve high informational value.
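To make the distinction concrete, here is a minimal Python sketch of pseudonymization, assuming a keyed hash (HMAC-SHA256) replaces the patient identifier. The field names, sample values and key are hypothetical and do not describe the Health Data Hub’s actual pipeline. The direct identifier disappears, yet birth year and postcode remain, so the output is pseudonymized rather than anonymized: it could still be re-identified indirectly, for instance by cross-referencing it with another data set.

```python
import hashlib
import hmac

# Hypothetical secret key: held only by the data controller, never shared
# with data recipients (otherwise pseudonyms could be recomputed).
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Replace the direct identifier with a keyed hash (HMAC-SHA256).

    Indirect identifiers (birth year, postcode) are left untouched, which is
    why the result is pseudonymized, not anonymized.
    """
    out = dict(record)  # copy so the source record is not modified
    patient_id = out.pop("patient_id")
    out["pseudo_id"] = hmac.new(
        key, patient_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return out


# Hypothetical records: two hospital stays of the same patient.
records = [
    {"patient_id": "P-000123", "birth_year": 1956, "postcode": "75011",
     "diagnosis": "heart failure"},
    {"patient_id": "P-000123", "birth_year": 1956, "postcode": "75011",
     "diagnosis": "type 2 diabetes"},
]
shared = [pseudonymize(r) for r in records]

# The same patient always gets the same pseudonym, so stays remain linkable
# for longitudinal studies, which preserves informational value.
print(shared[0]["pseudo_id"] == shared[1]["pseudo_id"])  # True
print("patient_id" in shared[0])  # False
```

A keyed hash is used here rather than a plain hash because patient identifiers often come from a small, guessable space that an attacker could simply hash exhaustively.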
Full anonymization is indeed systematically detrimental to the value of the data set: the more it is anonymized or aggregated, the smaller the scope of applications. Finding the right balance is a challenge and the data sharing IT platform must carefully address anonymization workflows.
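One common way to quantify this trade-off is k-anonymity: a data set is k-anonymous if every record shares its quasi-identifiers (here, birth year and postcode) with at least k − 1 other records. The Python sketch below is an illustration chosen for this purpose, not a method the paper attributes to any initiative. The records and the generalization rules (decade of birth, first two postcode digits, i.e. the French département) are hypothetical. Generalizing raises k from 1 to 3, but the number of distinct profiles drops from 6 to 2, so the data can no longer answer questions at the level of a single birth year or neighborhood.

```python
from collections import Counter

# Hypothetical records: (birth_year, postcode, diagnosis).
RECORDS = [
    (1956, "75011", "heart failure"),
    (1958, "75012", "diabetes"),
    (1957, "75019", "heart failure"),
    (1983, "69003", "asthma"),
    (1981, "69007", "diabetes"),
    (1984, "69001", "asthma"),
]


def quasi_identifiers(record, generalized: bool):
    """Return the quasi-identifiers of a record, optionally generalized."""
    birth_year, postcode, _diagnosis = record
    if not generalized:
        return (birth_year, postcode)
    # Coarser granularity: decade of birth and 2-digit département code.
    return (birth_year // 10 * 10, postcode[:2])


def k_anonymity(records, generalized: bool) -> int:
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    groups = Counter(quasi_identifiers(r, generalized) for r in records)
    return min(groups.values())


def distinct_profiles(records, generalized: bool) -> int:
    """Number of distinct quasi-identifier combinations (a rough proxy for detail)."""
    return len({quasi_identifiers(r, generalized) for r in records})


for generalized in (False, True):
    print(f"generalized={generalized}: k={k_anonymity(RECORDS, generalized)}, "
          f"profiles={distinct_profiles(RECORDS, generalized)}")
# generalized=False: k=1, profiles=6
# generalized=True: k=3, profiles=2
```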
Data harmonization, as exemplified by the German initiative to form consortia of health actors (as described in the next section), must also be considered among the top priority challenges to address. Ensuring that the data has the same meaning and the same granularity for the two sharing parties can be extremely time consuming. This is especially true for unstructured data, which represents a sizable proportion of healthcare data.
Many other challenges stand in the way, including technological choices for sharing. Many voices were raised against the French Health Data Hub for using Microsoft’s Azure cloud platform, which sparked fears that the health data of millions of French citizens could be accessed by U.S. authorities under the CLOUD Act.
This also shows how sensitive the subject is, and that clear communication and transparency are important to keep public opinion from turning against such initiatives.
Meanwhile, innovative technology solutions are emerging, such as the Substra Foundation, recently launched by the French start-up Owkin. Substra is an open source platform based on blockchain technology and relying on advanced cryptography and machine learning. It gives healthcare actors a way to share individual-level data while protecting data confidentiality. Apricity, a start-up focused on fertility, is starting to use Substra to secure the processing of sensitive fertility data under the UK’s strict and detailed regulation of that field.
Embleema, another start-up partnering with large pharmaceutical groups, is also pushing for more healthcare data sharing. It uses the distributed ledger and smart contract features of blockchain to give patients control over how, and with whom, they share their data.
Several national initiatives have been launched recently, with Israel as the pioneer in data sharing and access (see Exhibit 2).
Israel is a potent example of what centralized data management can achieve: only four HMOs cover all Israeli citizens, and systems are interoperable across hospitals. With a genetically diverse population and 20 years of digitized medical records, the country is sitting on a gold mine. In 2018, a $300M state initiative was launched to further centralize the data and make it available to corporates and start-ups. As an illustration, this initiative has helped attract AXA-backed Kamet (a start-up studio focused on the healthcare and medical sectors) to open offices in Israel. The ties between the four publicly owned national HMOs and Israeli start-ups have no equivalent anywhere else in the world. Among the many examples of these public/private partnerships, Maccabi (one of the four HMOs) granted the start-up Ibex access to 6 million pathology exams to train its algorithm to provide second-opinion diagnoses of prostate cancer biopsies.
In Europe: France, the UK, Germany and Scandinavian countries are leading the way.
The French national health database, SNDS, combines national health insurance reimbursement data, data on the medical activity of hospitals, and medical causes of death. One year ago, the French government launched an ambitious program and platform named Health Data Hub to open this database to researchers and corporations. In each call for projects, proposals are selected on their potential to enrich the database and on their ability to demonstrate a contribution to the general interest. Although access is still limited to occasional calls for projects, Health Data Hub officials state that their near-term objective is to offer a more agile and inclusive process for interested parties. Among the 10 projects selected in the first call, many involve private companies or start-ups. Besides the previously mentioned Malakoff Médéric initiative, another interesting example is the Hydro project, in which the start-up Implicity is using SNDS data to build a model that predicts heart attacks.
The Health Data Hub just launched a second call for projects tailored to AI startups.
The UK has opened to the general public an anonymized and heavily aggregated version of data (on the population and on facility performance) from both the NHS and some private hospitals (but no other private parties). The NHS has thus taken on the role of centralizing public and private data for the sake of transparency, in a top-down approach. This is especially useful for comparing the performance of private and public hospitals.
Germany has also committed to sharing healthcare data through the German Medical Informatics Initiative (MII). By investing €160M by 2021, the German government is supporting IT system interoperability and data integration within consortia formed by university hospitals, higher education institutions, businesses and non-university hospitals. Beyond interoperability, each consortium is working on specific use cases based on its data. For instance, the HiGHmed consortium is developing an automated early-warning system to support the algorithmic detection of pathogen clusters in hospitals, with the help of many public and research institutions, but also of private companies such as Siemens.
Scandinavian countries have a long tradition of leveraging their health data. In Sweden, for example, some data points can be traced back to before the 1960s, as government healthcare data are systematically collected and linked to education and income data. These data sets are accessible to research organizations and, to some extent, corporates, which enables them to link care received with long-term socioeconomic outcomes at the individual level. For example, it is possible to link successful early intervention with the ability to find employment, obtain a degree, etc.
Sharing healthcare data can bring value by bridging gaps across a range of medical industry stakeholders; however, this value can only be unlocked if the right sharing framework is implemented. As the exchange of healthcare data increases, companies should seize the opportunity to build robust exchange methods, both in terms of data management and technology.
The remaining barriers, ranging from the anonymization of personal and sensitive data (while keeping informational value) to data harmonization, are being broken down thanks to new technologies designed to secure the sharing of healthcare data. The Substra foundation, recently released by Owkin, is a perfect example.
It is based on blockchain technology and uses advanced cryptography and machine learning to allow hospitals to securely share personal data. It is an open source project, so anyone can use this platform.
Governments have recognized this value, and momentum around healthcare data sharing is building in Europe, driven by ambitious initiatives. In the short term, we expect national initiatives to continue to grow in scope and volume. Joining ongoing programs would let corporates explore how to leverage the value of their data while strengthening ties with tomorrow’s lawmakers.
The launchpad for private actors to initiate healthcare data sharing is set and healthcare players should continue developing competencies on data technology and regulation.
 They should be available free of charge, in machine readable formats and provided via Application Programming Interfaces (APIs).
 Evaluation of the extent to which the desired outcome occurs and what influences it.
 For example, cross-referencing several data sets to re-identify individuals.
 U.S. federal law that can compel U.S.-based technology companies to provide data to U.S. authorities, even when that data is stored abroad.
 Health Maintenance Organizations (HMOs) are public health service providers and insurers. Every Israeli citizen must be enrolled in one of the four HMOs.
 Four consortia have been created to date (DIFUTURE, HiGHmed, MIRACUM and SMITH), and others may appear in the future.