Phoenix Children’s isn’t alone. Paul Black, CEO of EHR and practice management provider Allscripts, recently told Healthcare Dive how the organization uses data lakes, tapping artificial intelligence, machine learning and human ingenuity to sift out “correlations” hospitals might not see.
Certainly, interest in the technology is growing: research forecasts the data lake market to expand at a rate of 28 percent between 2017 and 2023.
So what does it take to make the most of a data lake? The first step is to understand the tool and what it can offer.
What Is a Healthcare Data Lake?
Essentially, a data lake is an architecture used to store high-volume, high-velocity, high-variety, as-is data in a centralized repository for Big Data and real-time analytics.
Healthcare organizations can pull vast amounts of data — structured, semistructured and unstructured — into a data lake in real time, from virtually any source. Data can be ingested from Internet of Things sensors, clickstream activity on a website, log files, social media feeds, videos and online transaction processing (OLTP) systems, for instance.
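To make the "as-is" ingestion idea concrete, here is a minimal sketch of landing raw records in a lake. Everything in it — the `ingest_raw` function, the directory layout and the source names — is invented for illustration; real deployments typically land data in cloud object storage rather than a local folder, but the principle is the same: store the payload verbatim, with no schema enforced at write time.

```python
import json
import pathlib
from datetime import datetime, timezone

def ingest_raw(lake_root: str, source: str, record: dict) -> pathlib.Path:
    """Land a record in the lake as-is, partitioned by source and ingest date.

    No schema is enforced at write time ("schema on read"): the payload is
    stored verbatim, so structured and semistructured data alike can land
    in the same repository.
    """
    now = datetime.now(timezone.utc)
    partition = pathlib.Path(lake_root) / source / now.strftime("%Y/%m/%d")
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / f"{now.strftime('%H%M%S%f')}.json"
    path.write_text(json.dumps(record))
    return path

# Records from very different feeds land side by side, untransformed.
iot_path = ingest_raw("lake", "iot_sensors", {"device": "pump-7", "bpm": 72})
log_path = ingest_raw("lake", "web_clickstream", {"page": "/portal", "ms": 412})
```

Because nothing is transformed on the way in, the hard work — and the risk of the lake turning into a swamp — shifts to how the data is organized afterward, which is where metadata comes in.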
A Healthy Data Lake Requires Maintenance
There are no constraints on where the data hails from, but it’s a good idea to use metadata tagging to add some level of organization to what’s ingested, so that relevant data can be surfaced for queries and analysis.
“To ensure that a lake doesn’t become a swamp, it’s very helpful to provide a catalog that makes data visible and accessible to the business, as well as to IT and data-management professionals,” says Doug Henschen, vice president and principal analyst at Constellation Research.
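The catalog Henschen describes can be as simple as an index from metadata tags to dataset locations. The sketch below is a toy illustration, not any particular catalog product; the `DataCatalog` class, the tags and the file paths are all hypothetical.

```python
from collections import defaultdict

class DataCatalog:
    """Minimal metadata catalog: tag datasets at ingest, look them up later.

    Tagging every landed dataset with searchable metadata is what keeps a
    lake from becoming a "swamp" of unfindable files.
    """
    def __init__(self):
        self._tags = defaultdict(set)  # tag -> set of dataset paths

    def register(self, path: str, tags: list[str]) -> None:
        for tag in tags:
            self._tags[tag].add(path)

    def find(self, *tags: str) -> set[str]:
        """Return datasets carrying all of the given tags."""
        sets = [self._tags[t] for t in tags]
        return set.intersection(*sets) if sets else set()

catalog = DataCatalog()
catalog.register("lake/iot_sensors/2024/01/09/a.json",
                 ["vitals", "phi", "streaming"])
catalog.register("lake/web_clickstream/2024/01/09/b.json",
                 ["clickstream", "de-identified"])

print(catalog.find("vitals", "streaming"))
# prints {'lake/iot_sensors/2024/01/09/a.json'}
```

A query by tag surfaces only the relevant datasets, which is exactly the visibility-for-the-business role Henschen assigns to a catalog.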
New York’s Montefiore Health System, for example, sees the value of a well-maintained data lake for the large volumes of data it deals with; it links that data to metadata and ontologies, Dr. Parsa Mirhaji tells HealthTech. The health system uses its multisourced, tagged data to support artificial intelligence and deep learning.
Montefiore has created an environment in which researchers can experiment and learn from the data better than they could if they had to rely on just a massive information repository, says Mirhaji, director of clinical research informatics at the Albert Einstein College of Medicine and Montefiore Medical Center-Institute for Clinical Translational Research.
“It requires consistent management of metadata, terminology management, ontology management, linked open data and modeling, as well as the kinds of automated algorithms that can use these resources efficiently to solve difficult problems,” he says.
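One small piece of the terminology management Mirhaji mentions is normalizing site-specific field names to a shared vocabulary so multisourced records become comparable. The mapping and field names below are invented for illustration; real systems map to standard ontologies such as SNOMED CT or LOINC.

```python
# Hypothetical mapping of site-specific terms to a shared vocabulary term,
# so records ingested from different sources can be compared and queried
# consistently.
LOCAL_TO_ONTOLOGY = {
    "hr": "heart_rate",
    "pulse": "heart_rate",
    "heart rate": "heart_rate",
    "bp_sys": "systolic_bp",
}

def normalize(record: dict) -> dict:
    """Rewrite a record's keys to shared terms, keeping unknown keys as-is."""
    return {LOCAL_TO_ONTOLOGY.get(k.lower(), k): v for k, v in record.items()}

# Two sources name the same measurement differently; after normalization
# they agree.
a = normalize({"HR": 72})
b = normalize({"pulse": 72})
assert a == b == {"heart_rate": 72}
```

Done consistently across every feed, this kind of linking is what lets automated algorithms treat the lake as one coherent resource rather than a pile of incompatible files.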