Apr 10 2024
Data Analytics

How Data Engineering Helps Health Systems Reach Their Most Critical Goals

Data engineering is a growing requirement for healthcare organizations as they work toward value-based care models, health equity and population health.

Healthcare organizations generate millions of pieces of data, and clinicians’ need to pull meaningful insights from that data continues to grow. That’s where data engineers come in. They can design tools to expedite a broad array of processes and help health system leaders improve efficiency, patient safety and healthcare delivery.

Data engineering solutions rapidly process huge volumes of data generated from complex inputs, such as on-premises software and cloud-based computing, while handling formats and interfaces such as XML, JSON, API calls, HL7 and Parquet to facilitate the organization, analysis and HIPAA-compliant sharing of patient data. In addition to collecting this information, data engineers standardize the data, improve its quality and integrate the aggregate data to produce meaningful insights that can enhance a hospital’s clinical and operational performance.

While many organizations will focus their analysis primarily on federally required compliance and patient safety, data can also be deployed to identify trends in the population, improve the quality of care, lower the cost of services, predict case volume and staffing needs, and drive institutional decision-making. Data analysis can also be useful in determining appropriate inventory levels, forecasting medical supply demands and reducing supply chain costs. In the dynamic and competitive healthcare marketplace, effective data utilization can be the differentiating factor in successful strategic plans across all healthcare functions.


Data engineering in healthcare is constantly evolving to create solutions for the industry’s most pressing issues: ensuring the safety of frontline workers while maintaining adequate staffing levels during the pandemic; moving toward value-based care models; advancing health equity; and improving revenue cycle management. In each case, data engineering can play a significant role in developing solutions by integrating data from electronic health records, payer claims and tools that present a comprehensive view of patient information. The following use cases illustrate how data engineering applications are transforming healthcare.

Data Engineering Supports Population Health Strategies

Creating an effective population health strategy requires an organization to take a data-driven approach and study both internal and external data, especially from other providers and payers outside the organization that are ready to share data through clinically integrated network (CIN) contracts. However, integrating external data from multiple sources presents several challenges related to data quality, data feed format, cadence, standard codes, onboarding data from a new payer into the organization’s enterprise data, and more.

Our health system’s team developed data engineering solutions to overcome these challenges, including:

  • Data quality services that helped to standardize the data and bring payer data into a uniform structure
  • Data integration mappings developed to feed the payer data into the standard structure and load it into the enterprise data warehouse
  • Quality checks on each layer of data driven through rule sets configured in the database for each payer’s data
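
As a rough illustration of the three pieces above, the following Python sketch maps two payers’ feeds into one uniform structure and runs per-payer quality rules before anything is loaded. It is only a sketch under stated assumptions: the payer names, column names, date formats and rules are hypothetical, and the rule sets live in a dictionary here rather than in database tables. The article does not describe the actual implementation.

```python
from datetime import datetime

# Hypothetical per-payer mappings: source column -> uniform column.
PAYER_MAPPINGS = {
    "payer_a": {"MemberID": "member_id", "ProcCode": "procedure_code",
                "PaidAmt": "paid_amount", "SvcDate": "service_date"},
    "payer_b": {"mbr": "member_id", "cpt": "procedure_code",
                "amount_paid": "paid_amount", "dos": "service_date"},
}

# Hypothetical date format used by each payer's feed.
DATE_FORMATS = {"payer_a": "%m/%d/%Y", "payer_b": "%Y-%m-%d"}


def standardize(payer, raw):
    """Rename a raw row's columns and normalize amounts and dates."""
    mapping = PAYER_MAPPINGS[payer]
    row = {target: raw.get(source) for source, target in mapping.items()}
    row["payer"] = payer
    try:
        row["paid_amount"] = float(row["paid_amount"])
    except (TypeError, ValueError):
        row["paid_amount"] = None  # left for the quality rules to flag
    try:
        parsed = datetime.strptime(row["service_date"], DATE_FORMATS[payer])
        row["service_date"] = parsed.date().isoformat()
    except (TypeError, ValueError):
        row["service_date"] = None
    return row


# Rules shared by all payers, plus hypothetical payer-specific additions.
COMMON_RULES = [
    ("member_id present", lambda r: bool(r["member_id"])),
    ("service_date valid", lambda r: r["service_date"] is not None),
    ("paid_amount non-negative",
     lambda r: r["paid_amount"] is not None and r["paid_amount"] >= 0),
]
PAYER_RULES = {
    "payer_b": [("procedure_code is 5 characters",
                 lambda r: len(r["procedure_code"] or "") == 5)],
}


def failed_rules(row):
    """Return the names of every rule the standardized row violates."""
    rules = COMMON_RULES + PAYER_RULES.get(row["payer"], [])
    return [name for name, check in rules if not check(row)]


def process_feed(payer, raw_rows):
    """Split a payer feed into rows ready to load and rows held back."""
    accepted, rejected = [], []
    for raw in raw_rows:
        row = standardize(payer, raw)
        failures = failed_rules(row)
        if failures:
            rejected.append((row, failures))
        else:
            accepted.append(row)
    return accepted, rejected
```

Under this design, onboarding another payer would mean adding one mapping, one date format and any payer-specific rules, while the downstream structure stays unchanged.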

This enterprise data was later used to analyze and bundle various clinical procedures to reduce the cost to patients without affecting the quality of service.

The data engineering solution enabled our health system to achieve strategic goals for improving population health. Armed with data, hospital leaders were able to analyze and compare different providers’ costs for procedures for specific diagnoses. Their analysis allowed for certain costs to be reduced, making healthcare more affordable for the population. In some cases, it also generated rebates to providers who were part of the CIN. 

The solution also made it possible to process payer data rapidly and enabled the parallel processing of multiple sets of payer data. The uniform structure created to store all payer data helped to onboard new payers faster and made it easier to merge new payer data with existing enterprise data so that it was available for dashboards and analytics solutions. Furthermore, the quality checks caught data issues before they could cause problems downstream, expediting the loading of quality data into the analytics system. With this design, onboarding any new payer data into the population health system can be achieved in two weeks.

Another benefit of the population health solution is how it builds and feeds critical data from other programs into our internal analytics system. With the ability to integrate this information and create analytics internally, we eliminated the need for external vendors and reduced data costs to the organization.

Data Engineering Was Key for Staff Management During the Pandemic

One of the many challenges hospitals faced during the pandemic was keeping clinicians safe as they worked on the front lines to care for patients. As soon as the first COVID-19 vaccine became available, our organization found that it needed a quick and efficient way to organize and prioritize clinical staff and to develop a system to administer the vaccine to those who were most responsible for patient care.

Data integration supported vaccine delivery by connecting cloud and on-premises systems with external systems. Using the Informatica Cloud tool, we built an extract, transform and load (ETL) mapping to connect to the Workday human resources program. This allowed us to obtain and prioritize a list of eligible clinicians and critical administrative staff to build a data repository.

The ETL process also checked the designated personnel files for vaccine consent forms and scheduled vaccine administration in accordance with the prioritization. It was quickly designed with a change data capture feature, which checked the system for new consent forms and added those personnel to the scheduling list. Once they were approved by the system, these individuals were automatically notified about their vaccination schedule through email and text messages using the command tasks built in the Informatica Cloud system.
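
The change data capture step might look something like the Python sketch below: each run picks up only the consent forms recorded since the previous run’s watermark, adds those staff members to the schedule and keeps the schedule ordered by priority tier. The record fields, tier numbers and function name are hypothetical. The actual Informatica Cloud mapping is not shown in this article, so treat this as an illustration of the idea rather than the implementation.

```python
from datetime import datetime


def capture_new_consents(consents, schedule, last_run):
    """Add staff whose consent forms are newer than last_run.

    Mutates `schedule` in place and returns the new watermark, which
    the caller would persist for the next run.
    """
    scheduled_ids = {entry["staff_id"] for entry in schedule}
    watermark = last_run
    for form in consents:
        if form["signed_at"] <= last_run:
            continue  # already handled by an earlier run
        watermark = max(watermark, form["signed_at"])
        if form["staff_id"] not in scheduled_ids:
            schedule.append({"staff_id": form["staff_id"],
                             "tier": form["tier"]})
            scheduled_ids.add(form["staff_id"])
    # Lower tier number = higher priority. The sort is stable, so
    # staff within a tier keep the order in which they were added.
    schedule.sort(key=lambda entry: entry["tier"])
    return watermark
```

Running the function a second time with the returned watermark adds nothing, which is the property that makes frequent, repeated polling safe.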

Additionally, as illustrated in Figures 1 and 2, a data integration service further integrated with the absentee system to keep track of staff who had tested positive for COVID-19, feeding their quarantine days into a reporting layer that populated vaccine data dashboards. This functionality helped management monitor our clinicians’ health and make quick decisions on workforce availability so they could recruit frontline workers as needed.

Figure 1: Dashboard report helps hospital leaders quickly see vaccine status for clinical staff. (This chart was created from sample data and does not reflect any real metrics or actual healthcare organization.) Source: Suresh Munuswamy


Figure 2: Dashboard report displays quarantine data for clinical staff. (This chart was created from sample data and does not reflect any real metrics or actual healthcare organization.) Source: Suresh Munuswamy


Monitoring Critical Care Patients’ Health with Data Engineering

Healthcare organizations must contend with the huge volume of data generated by patient monitoring devices, registrations, document scanning and EHR systems. In one effective use case, a data engineering application delivered alerts to a cardiac specialty team. The team wanted to closely monitor patients who had recently undergone heart surgeries, such as coronary artery bypass grafting or surgical valve and transcatheter aortic valve replacement, to improve their quality of care and obtain data for their research.

The challenge was dealing with high volumes of both real-time and historical data. An Informatica command task notified physicians on the cardiac team via email and text message, based on a patient’s surgical event. This solution was designed with dynamic rules and parameters with reusable capabilities so that an organization can add more procedures to the data collection or deploy the same application to monitor other health issues.
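
To make the idea of dynamic, reusable rules concrete, here is a minimal Python sketch in which the alert logic stays fixed and the procedures, recipients and follow-up window are all supplied as data. Everything named here is an assumption made for illustration: the `AlertRule` fields, the procedure labels (placeholders, not real billing codes), the 30-day window and the email address. The article does not show how the Informatica task was configured.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    procedure_codes: frozenset  # procedures that trigger this rule
    recipients: tuple           # who gets notified
    window_days: int            # how long after surgery to keep alerting


# Hypothetical rule for the cardiac team.
RULES = [
    AlertRule(
        name="cardiac surgery follow-up",
        procedure_codes=frozenset({"CABG", "SAVR", "TAVR"}),
        recipients=("cardiac-team@example.org",),
        window_days=30,
    ),
]


def alerts_for(event, rules):
    """Return (recipient, message) pairs for every rule the event matches."""
    alerts = []
    for rule in rules:
        in_scope = event["procedure"] in rule.procedure_codes
        in_window = event["days_since_surgery"] <= rule.window_days
        if in_scope and in_window:
            message = (
                f"{rule.name}: patient {event['patient_id']} had "
                f"{event['procedure']} {event['days_since_surgery']} day(s) ago"
            )
            alerts.extend((recipient, message) for recipient in rule.recipients)
    return alerts
```

Monitoring a different procedure or another health issue then means appending one more `AlertRule`, with no change to `alerts_for`.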

Improving the Patient Experience with Data Now and in the Future

In the age of value-based healthcare, hospitals have an imperative — and a financial incentive — to improve the patient experience. Data engineering enabled a robust and scalable approach for one provider that was looking for a better way to capture and analyze patient experience data.

The organization used the Informatica application integration service and REST API calls through the data integration service to integrate various cloud systems, including the Press Ganey patient satisfaction survey system, the EHR and supply chain management. These data inputs were processed and streamlined in Informatica Cloud, which then converted the data into the Parquet file format and published it directly to Microsoft Azure Blob Storage.

The solution processed the data in an optimized and automated way, without manual intervention. Once data was published in blob storage, it was fed directly into an Azure machine learning (ML) model, which helped the organization review patients’ end-to-end journeys alongside the model’s predictions, providing data for effective decision-making about improving patients’ experiences and satisfaction survey scores.

As these use cases demonstrate, data engineering can provide a variety of solutions that help healthcare organizations implement a diverse range of strategic plans. Data engineering tools are now powered by generative artificial intelligence, which supports increased productivity and automation. The intelligent structure model allows systems to process a wide variety of complex data formats, from Excel to HL7. This capability makes it possible to process external data from new vendors and offers organizations the ability to scale solutions as needs arise.

Data engineering simplifies the data feed and automates the delivery of data into AI and ML models to help healthcare leaders assess their organization’s performance, identify trends, make evidence-based decisions and predict future population health needs. Future data engineering solutions with generative AI and large language models will give organizations the tools to accelerate the application of new data strategies with confidence that their healthcare data is secure. The possibilities for data engineering uses are limited only by our imaginations.
