Jul 06 2021

How AI Learning Can Protect Patient Privacy and Still Offer Valuable Research

Training artificial intelligence systems in healthcare requires extensive data, which can lead to privacy and regulatory issues. Federated learning is one solution.

Healthcare organizations continue to make great strides in their use of artificial intelligence applications to improve patient care. But for AI systems to produce high-quality algorithms, they need large, diverse data sets.

Compiling these data sets can be a challenge because of regulatory and ethical obligations that restrict access to patient data. These obligations can lead chief medical informatics officers to adopt policies that forbid healthcare data from leaving an organization.

Reluctance to compile large data sets, driven by the increasing risks of financial penalties and reputational damage, clashes with the rapidly growing interest in creating and deploying medical AI models within healthcare. New solutions are needed that enable the training of AI models while also protecting patient privacy.

Training Needed for AI in Medical Imaging

The amount of training data in medical imaging, especially publicly available data, is a fraction of what is available in other fields. The shortage of curated and representative data sets is one of the largest impediments to developing meaningful AI solutions for medical imaging, and the protection of patient privacy adds to the difficulty.

Recently, companies such as NVIDIA and Google have created software tools that enable data-distributed techniques for training AI. One example is federated learning, which works by deploying AI models to each participating institution in a discrete group (or “federation”). Models are then trained individually at each institution on that institution’s local data. During training, the models are periodically sent to a central federated server, where they are aggregated into a single model. The aggregated model is then redistributed to each institution for further training. This is the key step in preserving privacy: the models themselves consist only of parameters that have been tuned to data, not the protected data itself.

Over time, this process allows AI models to receive the benefit of knowledge learned at every institution within the federation. Once training is complete, a single aggregated model is produced that has been, indirectly, trained on data from all institutions in the federation.
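The cycle described above can be sketched in a few lines of code. This is a minimal, hypothetical simulation, not any vendor’s actual implementation: the “model” is just a list of numeric weights, and “local training” is a stand-in step that nudges each weight toward the mean of a site’s local data. The point it illustrates is structural: only model weights travel to the central server, never the raw data held at each institution.

```python
# Minimal sketch of federated averaging (one common aggregation scheme).
# All names here are illustrative; real systems train neural networks
# with frameworks such as those from NVIDIA or Google.

def local_train(weights, local_data, lr=0.1):
    """Stand-in for local training: nudge weights toward the local data mean."""
    mean = sum(local_data) / len(local_data)
    return [w + lr * (mean - w) for w in weights]

def aggregate(models, sizes):
    """Server step: average the models, weighted by each site's data size."""
    total = sum(sizes)
    n_params = len(models[0])
    return [
        sum(m[i] * s for m, s in zip(models, sizes)) / total
        for i in range(n_params)
    ]

def federated_round(global_weights, institutions):
    """One round: deploy model -> train locally -> aggregate -> redistribute."""
    local_models = [local_train(list(global_weights), data) for data in institutions]
    sizes = [len(data) for data in institutions]
    return aggregate(local_models, sizes)

# Two hypothetical institutions; their raw data never leaves this list --
# only the trained weights are sent to the aggregation step.
sites = [[1.0, 2.0, 3.0], [10.0, 12.0]]
model = [0.0, 0.0]
for _ in range(5):
    model = federated_round(model, sites)
```

After several rounds, the global model reflects knowledge from both sites, weighted by how much data each contributed, which mirrors how a real federation indirectly trains on every institution’s data.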


Signs of Promise for Medical AI

In a study published this year, our team in the UCLA Computational Diagnostics Lab investigated using a federated learning architecture to train a deep-learning AI model to locate and delineate the prostate within MRIs using data from different institutions. We found that federated learning produced an AI model that performed better, both on data from the participating institutions and on data from outside institutions, than models trained on any single participating institution’s data alone.

To understand the enthusiasm for federated learning, consider if organizations in a federation were smartphone users who had agreed to allow an algorithm to analyze the images on their phones, potentially allowing model training with distributed compute and data from a vast user group. From this perspective, one can imagine analogous scenarios in medicine in which patients can opt in to federations for compensation. This could speed up innovation within the medical AI space.

Successful medical AI algorithm development requires exposure to a large quantity of data that is representative of patients across the globe. Our findings demonstrate an alternative to the financial, legal and ethical complexities of compiling that data centrally: Institutions can team up into federations and develop innovative, valuable medical AI models that perform just as well as those developed through the creation of massive, siloed data sets, with less risk to privacy.
