Phoenix Children’s isn’t alone. Paul Black, CEO of EHR and practice management provider Allscripts, recently told Healthcare Dive how the organization uses data lakes, tapping artificial intelligence, machine learning and human ingenuity to sift out “correlations” hospitals might not see.
Certainly, interest in the technology is growing: research forecasts the data lake market to expand at a rate of 28 percent between 2017 and 2023.
So what does it take to make the most of a data lake? The first step is to understand the tool and what it can offer.
What Is a Healthcare Data Lake?
Essentially, a data lake is an architecture used to store high-volume, high-velocity, high-variety, as-is data in a centralized repository for Big Data and real-time analytics.
Healthcare organizations can pull vast amounts of data — structured, semistructured and unstructured — into a data lake in real time, from virtually any source. Data can be ingested from Internet of Things sensors, clickstream activity on a website, log files, social media feeds, videos and online transaction processing (OLTP) systems, for instance.
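To make the "as-is" ingestion idea concrete, here is a minimal sketch of landing raw records in a lake. Everything in it — the `ingest_raw` function, the directory layout and the source names — is invented for illustration; real deployments typically land data in cloud object storage rather than a local folder, but the principle is the same: store the payload verbatim, with no schema enforced at write time.

```python
import json
import pathlib
from datetime import datetime, timezone

def ingest_raw(lake_root: str, source: str, record: dict) -> pathlib.Path:
    """Land a record in the lake as-is, partitioned by source and ingest date.

    No schema is enforced at write time ("schema on read"): the payload is
    stored verbatim, so structured and semistructured data alike can land
    in the same repository.
    """
    now = datetime.now(timezone.utc)
    partition = pathlib.Path(lake_root) / source / now.strftime("%Y/%m/%d")
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / f"{now.strftime('%H%M%S%f')}.json"
    path.write_text(json.dumps(record))
    return path

# Records from very different feeds land side by side, untransformed.
iot_path = ingest_raw("lake", "iot_sensors", {"device": "pump-7", "bpm": 72})
log_path = ingest_raw("lake", "web_clickstream", {"page": "/portal", "ms": 412})
```

Because nothing is transformed on the way in, the hard work — and the risk of the lake turning into a swamp — shifts to how the data is organized afterward, which is where metadata comes in.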
A Healthy Data Lake Requires Maintenance
There are no constraints on where the data hails from, but it’s a good idea to use metadata tagging to add some level of organization to what’s ingested, so that relevant data can be surfaced for queries and analysis.
“To ensure that a lake doesn’t become a swamp, it’s very helpful to provide a catalog that makes data visible and accessible to the business, as well as to IT and data-management professionals,” says Doug Henschen, vice president and principal analyst at Constellation Research.
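The catalog Henschen describes can be as simple as an index from metadata tags to dataset locations. The sketch below is a toy illustration, not any particular catalog product; the `DataCatalog` class, the tags and the file paths are all hypothetical.

```python
from collections import defaultdict

class DataCatalog:
    """Minimal metadata catalog: tag datasets at ingest, look them up later.

    Tagging every landed dataset with searchable metadata is what keeps a
    lake from becoming a "swamp" of unfindable files.
    """
    def __init__(self):
        self._tags = defaultdict(set)  # tag -> set of dataset paths

    def register(self, path: str, tags: list[str]) -> None:
        for tag in tags:
            self._tags[tag].add(path)

    def find(self, *tags: str) -> set[str]:
        """Return datasets carrying all of the given tags."""
        sets = [self._tags[t] for t in tags]
        return set.intersection(*sets) if sets else set()

catalog = DataCatalog()
catalog.register("lake/iot_sensors/2024/01/09/a.json",
                 ["vitals", "phi", "streaming"])
catalog.register("lake/web_clickstream/2024/01/09/b.json",
                 ["clickstream", "de-identified"])

print(catalog.find("vitals", "streaming"))
# prints {'lake/iot_sensors/2024/01/09/a.json'}
```

A query by tag surfaces only the relevant datasets, which is exactly the visibility-for-the-business role Henschen assigns to a catalog.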
New York’s Montefiore Health System, for example, sees the value of a well-maintained data lake for the large volumes of data it deals with; it links that data to metadata and ontologies, Dr. Parsa Mirhaji tells HealthTech. The health system uses its multisourced, tagged data to support artificial intelligence and deep learning.
Montefiore has created an environment in which researchers can experiment and learn from the data better than they could if they had to rely on just a massive information repository, says Mirhaji, director of clinical research informatics at the Albert Einstein College of Medicine and Montefiore Medical Center-Institute for Clinical Translational Research.
“It requires consistent management of metadata, terminology management, ontology management, linked open data and modeling, as well as the kinds of automated algorithms that can use these resources efficiently to solve difficult problems,” he says.
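One small piece of the terminology management Mirhaji mentions is normalizing site-specific field names to a shared vocabulary so multisourced records become comparable. The mapping and field names below are invented for illustration; real systems map to standard ontologies such as SNOMED CT or LOINC.

```python
# Hypothetical mapping of site-specific terms to a shared vocabulary term,
# so records ingested from different sources can be compared and queried
# consistently.
LOCAL_TO_ONTOLOGY = {
    "hr": "heart_rate",
    "pulse": "heart_rate",
    "heart rate": "heart_rate",
    "bp_sys": "systolic_bp",
}

def normalize(record: dict) -> dict:
    """Rewrite a record's keys to shared terms, keeping unknown keys as-is."""
    return {LOCAL_TO_ONTOLOGY.get(k.lower(), k): v for k, v in record.items()}

# Two sources name the same measurement differently; after normalization
# they agree.
a = normalize({"HR": 72})
b = normalize({"pulse": 72})
assert a == b == {"heart_rate": 72}
```

Done consistently across every feed, this kind of linking is what lets automated algorithms treat the lake as one coherent resource rather than a pile of incompatible files.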