Machine learning is like a smart, experienced, but inarticulate colleague. It excels at classification tasks in healthcare, but when it comes to recommending treatment, there is a disconnect between the “what” and the “why.”
Software built on machine learning algorithms can digest millions of images of benign moles and melanomas, and on its own extract the key features that differentiate the two. Based on this learning, the software can then recognize which new images are benign and which are not. The problem is that even though the software can tell the two apart, it cannot verbalize the reason for its judgment.
But new consensus guidelines for designing clinical decision support (CDS) software are emerging, and they aim to leave the healthcare professional in control of the decision-making.
Part of the impetus for these guidelines is new federal legislation that defines the scope of software regulated by the U.S. Food and Drug Administration. In the 21st Century Cures Act, passed in December 2016, Congress specified that the FDA will not regulate CDS software so long as the software enables healthcare-professional users to independently review the basis of the recommendations. Thus, one of the key design features for unregulated software is that it reveals the basis for the recommendation, including the underlying clinical intelligence.
An understanding and knowledge of the clinical intelligence allows the user to do two things: assess the quality and reliability of that intelligence, and mentally compare notes to understand why the software is offering a particular recommendation. Doctors who disagree with the software will need to decide which recommendation to follow — theirs or the software’s. Keeping the doctor in control of the decision-making reduces the risk that an erroneous recommendation by the technology would hurt the patient, in part by forcing the doctor to reconcile any differences in opinion.
By way of example, say a physician examines a patient named Ms. Smith, analyzes her symptoms, and concludes that Ms. Smith should take Drug A. As an extra quality measure, the physician decides to consult a CDS program that uses both machine learning and expert systems, such as medical society guidelines. After analyzing Ms. Smith’s electronic health records and symptoms, the software recommends Drug B.
Conceivably there are many reasons why the software might recommend Drug B. Perhaps it:
- Notices in the EHR that the patient is allergic to Drug A
- Is following hospital-created guidelines that recommend Drug B first because it is cheaper
- Discerns based on most recent clinical trial results that Drug B is likely more effective based on the presence of a genetic marker
- Is using an outdated database that doesn’t include Drug A
- Simply notices, through machine learning, an association between patients with symptoms similar to Ms. Smith’s and improvement in those symptoms when taking Drug B, but with a confidence level of only 50 percent
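The scenarios above suggest a simple design principle: a recommendation should travel with its basis. A minimal sketch in Python, where the field names and basis categories are illustrative assumptions rather than any real CDS schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    """A CDS recommendation that carries its basis for independent review."""
    drug: str
    basis: str          # e.g. "allergy_rule", "cost_guideline", "ml_association"
    detail: str         # human-readable explanation of the basis
    confidence: Optional[float] = None  # only meaningful for ML-derived output

    def summary(self) -> str:
        text = f"Recommend {self.drug} ({self.basis}): {self.detail}"
        if self.confidence is not None:
            text += f" [confidence: {self.confidence:.0%}]"
        return text

# The Ms. Smith example: an ML-derived association at 50 percent confidence
rec = Recommendation(
    drug="Drug B",
    basis="ml_association",
    detail="patients with similar symptoms improved on Drug B",
    confidence=0.50,
)
print(rec.summary())
```

With a structure like this, the drug-allergy case and the cost-guideline case would simply use a different `basis` and `detail`, and the physician sees at a glance which kind of reconciliation is needed.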
If the software simply tells the doctor to use Drug B for Ms. Smith, with no explanation, the physician is left to guess why that might be the case. The reconciliation between the doctor’s own opinion and the software’s recommendation produces the real learning.
In the example above, without any clues, that reconciliation may be difficult. Reverse engineering is not always possible. But it would make a substantial difference to the physician to know which reason serves as the basis for the software’s recommendation. If the issue is a drug allergy, the doctor can investigate to confirm whether the allergy information for the patient is correct. If the software does not contain information on Drug A, the doctor can comfortably decide what to recommend based on her or his additional knowledge. If the issue is cost, the doctor will be able to work with the patient to decide what to do.
A software recommendation with no explanation puts the physician in a quandary. Indeed, from a malpractice standpoint, if the doctor ignores the software’s recommendation and the patient does not do well, the doctor could be second-guessed in court.
How Can Machine Learning–Based CDS Developers Address the Black Box?
Biomedical research scientists are working to address the challenge of articulating machine learning models in a clear and concise manner.
In the meantime, there are five key steps developers can take.
Explain what can be explained. Don’t make the problem bigger than it has to be. If the software is actually a blend of expert systems and machine learning, and if a particular recommendation is based on expert systems, such as simply looking up the drug allergy in the patient’s EHR or recommending a treatment because it is cheaper, the recommendation ought to reveal that reason.
Communicate the quality of the machine learning algorithms. When the source is truly machine learning, the software needs to reveal that source, along with information that will help the user gauge the quality and reliability of the machine learning algorithm. Through a page in the user interface that can be periodically updated, the developer could explain to the user the extent to which the system has been validated and the historical batting average of the software. That context helps the user understand the reliability of the software in general.
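The “historical batting average” could be as simple as the fraction of past recommendations that were borne out in practice. A hypothetical sketch, assuming recommendations are logged with a confirmed/unconfirmed outcome (the logging scheme is an assumption, not a description of any real product):

```python
def batting_average(history):
    """Fraction of logged recommendations marked as confirmed.

    `history` is a list of (recommendation_id, confirmed) pairs, where
    `confirmed` is True if the recommendation was borne out in practice.
    Returns None when there is no history to report yet.
    """
    if not history:
        return None
    hits = sum(1 for _, confirmed in history if confirmed)
    return hits / len(history)

# e.g. 8 of 10 past recommendations confirmed
log = [(i, i % 5 != 0) for i in range(10)]  # ids 0 and 5 unconfirmed
print(batting_average(log))  # 0.8
```

A periodically updated page in the user interface could surface this number alongside the validation studies the developer has run.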
Describe the data sources used for learning. Providing a thorough explanation of the data sets used to feed and test the machine can provide important context and assurance to the clinician.
State the association as precisely as possible. With machine learning, what we are really seeing is an association: something in the patient-specific information triggers a match to what the software has seen in other cases. Even though the software can’t articulate exactly what in the data triggers the association, or even which features it looked at, that doesn’t make it any different from a radiologist who points to a place on an image and says, “I’ve seen that before, and it’s been malignant.” Much of what we “know” in medicine is really just association without a deeper understanding of a causal relationship. Software built on machine learning needs to explain that it has spotted an association, and state as precisely as it can the details of that association. In the case of Ms. Smith, the software might note an association between patients presenting with symptoms X, Y, and Z and improvement in those symptoms when taking Drug B.
Convey the confidence level. While software based on machine learning does a miserable job of explaining the clinical logic it follows, it excels at communicating its confidence level in a particular recommendation. And that’s quite valuable: that information helps the user decide how much deference to give the recommendation.
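One simple way to translate a raw confidence score into the deference a clinician should give is threshold bands. The cutoffs below are arbitrary placeholders for illustration, not clinically validated values:

```python
def deference_band(confidence: float) -> str:
    """Map an ML confidence score (0-1) to guidance for the clinician.

    The thresholds here are illustrative assumptions only.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.9:
        return "high confidence: strong candidate, still verify the basis"
    if confidence >= 0.6:
        return "moderate confidence: weigh against your own assessment"
    return "low confidence: treat as a prompt for review, not a directive"

# The Ms. Smith example: a 50 percent confidence level
print(deference_band(0.50))
```

In the Ms. Smith scenario, surfacing the 50 percent figure alongside the recommendation immediately tells the physician this is a weak signal worth questioning, not a directive.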
Some argue that in the future we won’t need to rely as much on individual physicians at the point of care; instead, committees of experts would rigorously review and approve software for use.
Machine learning is emerging as an undeniably useful tool in the healthcare realm. However, in order to close the gap between physician- and software-recommended treatments, it will be essential to either get the machine learning system to provide context for its recommendation or validate the software through a third party.