May 08 2023
Data Analytics

How to Navigate Structured and Unstructured Data as a Healthcare Organization

The average hospital generates roughly 137 terabytes of data every day. Most of it is unstructured, which makes it hard to work with if there is no data management plan in place.

It’s no secret that healthcare is a data-driven business. The average hospital produces roughly 50 petabytes of data every year. That’s more than twice the amount of data housed in the Library of Congress, and it amounts to 137 terabytes per day. Due in large part to the proliferation of medical devices, genetic testing and patient-generated health data, coupled with near-universal use of electronic health record systems, the amount of data generated in healthcare has been increasing at a rate of 47 percent per year.
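As a quick check on those figures, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: it assumes decimal units (1 petabyte = 1,000 terabytes), a 365-day year and growth that compounds steadily at 47 percent per year, none of which the cited estimates specify. The variable names are illustrative.

```python
import math

PB_PER_YEAR = 50        # average hospital output cited in the article
TB_PER_PB = 1_000       # assumption: decimal units (1 PB = 1,000 TB)
DAYS_PER_YEAR = 365     # assumption: non-leap year
ANNUAL_GROWTH = 0.47    # growth rate cited in the article

# Convert the yearly volume to a daily volume in terabytes.
tb_per_day = PB_PER_YEAR * TB_PER_PB / DAYS_PER_YEAR

# Assumption: steady compound growth, so doubling time = ln(2) / ln(1 + rate).
doubling_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

print(f"{tb_per_day:.0f} TB per day")
print(f"Volume doubles about every {doubling_years:.1f} years")
```

Under these assumptions, the daily figure comes out at about 137 terabytes, and data volume would double in a little under two years.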

As organizations generate and collect medical data, they need to be mindful of how it will be stored, secured, analyzed and shared with other healthcare entities. Each of these steps is vital for managing populations, caring for individual patients, and monitoring service utilization and costs, but each also poses challenges. This is especially true if data is in an unstructured format and needs to be processed, or “normalized,” before it’s readable by machines.

Defining Structured vs. Unstructured Data in Healthcare

Structured data tends to be quantitative — that which can be easily formatted for a database and, as a result, easily plugged into analytics and decision support systems. Structured data in healthcare consists of demographic information (first and last name, date of birth, home address, gender), vital signs (height, weight, blood pressure, blood glucose), and data elements such as diagnostic or billing codes, medications and laboratory test results.

Unstructured data, by contrast, is undefined in its native format. Examples of unstructured data in healthcare include medical images and written narratives (clinical notes, problem lists, discharge summaries and radiology reports).

Most healthcare data is unstructured: about 80 percent, by most estimates. This is largely because medical imaging data makes up 80 percent of all clinical content, according to NetApp. A single chest X-ray may be 15 megabytes, but a 3D mammogram can be 300 megabytes and a digital pathology file can be 3 gigabytes, roughly the same as a high-definition full-length movie.

The Challenges of Working with Unstructured Data in Healthcare

Unstructured data is immensely valuable to healthcare. “If you approach it from a high level, clinical notes are a glimpse into the physician’s brain,” says Brian Laberge, solution engineer at software and solutions provider Wolters Kluwer.

In addition, written notes often capture the severity of a patient’s health condition or nuanced nonclinical social needs far better than highly structured diagnostic codes, he adds.

Clinical and administrative staff can easily parse free text for relevant information, such as a diagnosis or a treatment recommendation. The difficulty stems from what comes next.

“The storage requirements are more than a few rows of data from diagnostic codes and demographic information,” Laberge says.

This is both a blessing (because unstructured data doesn’t need to be stored in a data warehouse with a rigid structure) and a curse (because organizations can amass unstructured data quickly and haphazardly).

The proliferation of unstructured data in healthcare can pose data retention, purging and destruction challenges as well. The issue isn’t how much data must be stored or how long it must be kept; it’s knowing what has been stored and where, Laberge says.

For example, organizations commonly purge medical records that are inactive or delete research data sets once a study has been completed. With these unstructured data types, he says, “It’s not just a single database that you’re deleting. There are likely more files out there, and there’s metadata associated with them.”

Working with Patient-Generated Health Data

Patient-generated health data comes with its own set of concerns. It may be available in real time from sources such as monitoring devices or digital therapeutics applications, and it may be structured in its own right, but most of it is only transferable into EHRs as unstructured summary reports, notes Natalie Schibell, vice president and principal analyst at Forrester. (The same is true of visit summaries that come from urgent care, retail health or telehealth providers not affiliated with a health system.)

In these situations, the valuable nuance of the summary document is largely lost. The result is an incomplete picture of a patient’s health, which makes it difficult for health systems to analyze their vast data stores and see which patients need more attention, Schibell says. It also contributes to wasteful spending, as physicians without readily available results will simply order another test. “There’s a big risk in duplicative and disruptive care,” she adds.

Six Steps to Making Unstructured Data More Meaningful in Healthcare

The American Hospital Association has suggested that now’s the time for hospitals to transform themselves into data-driven organizations. This will improve clinical and business decision-making, the AHA said, while also helping hospitals better serve their patients and their communities in times of need.

Becoming a data-driven organization depends on the ability to derive meaning from unstructured data. While this is a tall order for many health systems, there are six key steps organizations can take to move forward.

  1. Optimize storage: Organizations should look at where data is stored as well as how those storage arrays are synced and distributed. Anything that can be migrated to the cloud should be. This will free up space onsite for the most recent and relevant data.
  2. Classify data: Data should be sorted into groups based on how it will be used, who needs to access it, what level of confidentiality it needs, and what security policies apply to it. It’s also critical to look at the format of the data and determine whether it can, in fact, be structured.
  3. Bring order to unstructured data: If unstructured data has clinical or business value, it will benefit from normalization, which aims to make it look more like structured data. “Given the sheer volume of this data, you can’t do it manually,” Schibell says — but artificial intelligence and natural language processing can help.
  4. Look for context: NLP alone is insufficient for normalizing unstructured data, Laberge says. A clinical note may include the word diabetes, but that doesn’t automatically mean a patient has diabetes. The physician may have recorded that the patient doesn’t have diabetes, or that the patient’s father has diabetes.
  5. Code to industry standards: Once the context of data is understood, organizations should code as much information as possible to applicable industry standards such as ICD-10 or SNOMED. This helps bring structure to unstructured data, which makes it readable — and useful — for analytics and machine learning models.
  6. Give guidance to data science: Many data scientists don’t have a clinical background and may not know, for example, that a Type 2 diabetes diagnosis can be expressed using one of nearly two dozen ICD-10 codes. Clinical teams should provide data science teams with appropriate guidance before they dive into a data set, Laberge says.
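
To make steps 4 through 6 concrete, here is a minimal Python sketch of why context matters when normalizing a clinical note. It is a simplified, rule-based illustration and not the method Laberge or any vendor describes: the cue lists and the function names `classify_mention` and `is_type2_diabetes_code` are hypothetical, and a production system would rely on a clinically reviewed NLP pipeline. The code check assumes ICD-10 coding, where Type 2 diabetes codes fall under the E11 category; verify this against the code set your organization actually uses.

```python
import re

# Hypothetical cue lists for illustration; real systems need far larger,
# clinically reviewed vocabularies.
FAMILY_CUES = ["father", "mother", "brother", "sister", "family history"]
NEGATION_CUES = ["no", "denies", "without", "negative for",
                 "does not have", "doesn't have"]


def _has_cue(text: str, cues: list[str]) -> bool:
    """Return True if any cue appears as a whole word or phrase in text."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def classify_mention(sentence: str, term: str) -> str:
    """Label how a term is used in one sentence.

    Returns "absent", "family_history", "negated" or "affirmed".
    Only the words before the term are checked for cues.
    """
    text = sentence.lower()
    match = re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    if match is None:
        return "absent"
    before = text[:match.start()]
    if _has_cue(before, FAMILY_CUES):
        return "family_history"
    if _has_cue(before, NEGATION_CUES):
        return "negated"
    return "affirmed"


def is_type2_diabetes_code(code: str) -> bool:
    """Assumption: ICD-10 Type 2 diabetes codes share the E11 prefix."""
    return code.strip().upper().startswith("E11")


for note in ["Patient has diabetes.",
             "Patient denies diabetes.",
             "Patient's father has diabetes."]:
    print(note, "->", classify_mention(note, "diabetes"))
```

A plain keyword search would flag all three sample notes as diabetes cases, while the context check separates an affirmed diagnosis from a negated one and from family history. Grouping codes by category prefix, as `is_type2_diabetes_code` does, is one way clinical teams can hand data scientists a rule that covers every Type 2 diabetes code rather than a single one.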

As with many large-scale technology initiatives, the secret to success with unstructured data in healthcare is a well-defined scope and use case, Laberge says. Instead of trying to boil the ocean, organizations should focus on a key business metric or other quantifiable area of improvement.

“You need clarity about what you want to get out of the data you have,” says Laberge.
