If you only had the headlines to go on, you’d be left thinking that Artificial Intelligence (AI)-driven health technology is at its genesis, as the world scrambles to regulate this new breed of algorithm that seems to have arrived out of nowhere.
In reality, AI engines have been powering health technology for decades. Think Computer Aided Detection (CAD), for example.
We can thank the advent of machine learning (ML) and deep learning (DL) algorithms, and their rapid proliferation in health tech, for this renewed attention. I don’t plan on getting into the differences between machine learning and its subtype, deep learning, in this blog. If you’d like a refresher, NVIDIA provides a good overview. Suffice it to say that ML/DL algorithms represent a departure from their physics-based algorithmic cousins in that they can be trained. In imaging analytics, for example, this means that ML/DL models “learn” to identify patterns within the pixel data (and sometimes metadata) by analyzing scores of imaging samples that together paint a representative portrait of the intended patient population. Training concludes once the algorithm can effectively accomplish the task it set out to do, whether that is to accurately predict the likelihood a patient may develop breast cancer, to flag a suspicious lesion, to prioritize a potentially time-sensitive exam in a radiologist’s worklist, or to characterize tissue features in dense regions of the lung, like VIDA Insights.
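If you’d like to see what that training process looks like in code, here’s a deliberately minimal sketch in PyTorch. Everything in it is illustrative: the random tensors stand in for a curated imaging dataset, the tiny network stands in for a real architecture, and the accuracy threshold stands in for whatever acceptance criterion a developer actually defines.

```python
# Minimal sketch of supervised training for an imaging classifier (illustrative only).
# The data, model, and acceptance criterion below are hypothetical placeholders.
import torch
import torch.nn as nn

# Stand-ins for a curated, representative training/validation set
train_images = torch.randn(512, 1, 64, 64)   # e.g., CT patches
train_labels = torch.randint(0, 2, (512,))   # e.g., lesion present / absent
val_images   = torch.randn(128, 1, 64, 64)
val_labels   = torch.randint(0, 2, (128,))

model = nn.Sequential(                        # deliberately tiny CNN
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

TARGET_ACCURACY = 0.90                        # hypothetical acceptance criterion
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(train_images), train_labels)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_acc = (model(val_images).argmax(1) == val_labels).float().mean().item()
    if val_acc >= TARGET_ACCURACY:            # "training concludes" once the task is met
        break
```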
Given the proliferation of these new manifestations of AI, and the relative dearth of information in circulation about how major regulators like the U.S. FDA intend to regulate it, the attention now being placed on its regulation by regulators, the public, and industry is welcome news.
Don’t get me wrong. It’s not exactly a wild west out there today, despite what you may be hearing. AI is not escaping regulation. The objective of this renewed focus on ML/DL regulation is to introduce a regulatory framework that caters to the unique qualities and potential of ML/DL while addressing the challenges that come with them.
Industry presently benefits from multiple routes to bringing ML/DL algorithms to market. In fact, they are precisely the same routes available to other medical devices, whether software-only, hardware, or some combination of the two: the 510(k), De Novo, and PMA pathways. And if the risk profile permits it, some ML/DL algorithms may not require FDA oversight at all, just like some types of clinical decision support software, scalpels, basic DICOM image viewers, and other medical devices that carry a low risk of patient harm when put into qualified hands.
But this raises an important question: how do these generalist pathways to market address the unique challenge of transparency in ML/DL algorithms?
ML/DL-based algorithms have a reputation for behaving like “black boxes,” which is to say visibility into how they arrive at a particular conclusion is restricted, raising questions about transparency and trust.
But transparency does not reduce to the degree to which an algorithm is explainable.
While limited explainability may hinder transparency, a combination of other factors reinforces it. Here’s why patients, care providers, industry, and regulators are learning to trust AI in today’s regulatory landscape.
Bias is a major topic in machine learning today. That’s because an algorithm trained and tested on datasets that fail to represent the intended patient population may work well for certain patients in certain regions while performing suboptimally for others elsewhere. With biased datasets, inequitable provision of medical care risks creeping in.
The importance of countering bias is not lost on industry leaders.
To preserve the generalizability of algorithm results, skilled developers employ large, independent training and testing datasets that are representative of the intended patient population. There may be times when developers need to stray from distributions that approximate the intended patient population, but with good reason. Some parameters, whether age, sex, disease severity, weight, or some other factor, may pose special challenges that make it difficult to achieve desired performance levels on exams exhibiting those features when sticking with distributions that strictly mimic the patient population. To get around this, developers enrich their datasets with additional examples of those special cases so that, through additional training, the algorithm can push past the hurdles that stand in the way of optimal performance.
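To make the enrichment idea concrete, here’s a minimal sketch assuming exam-level metadata in a pandas DataFrame. The column names, counts, and minimum-per-stratum target are hypothetical; a real pipeline would source additional exams for the underrepresented group rather than simply resampling the ones it already has.

```python
# Minimal sketch of dataset enrichment for an underrepresented subgroup (illustrative only).
import pandas as pd

# Hypothetical exam-level metadata for a curated training set
exams = pd.DataFrame({
    "exam_id":  range(10),
    "severity": ["mild"] * 8 + ["severe"] * 2,   # severe disease is underrepresented
})

# Population-matched sampling alone leaves too few "severe" examples to learn from,
# so we enrich that stratum to a minimum count while leaving the rest untouched.
MIN_PER_STRATUM = 6
enriched = []
for severity, group in exams.groupby("severity"):
    if len(group) < MIN_PER_STRATUM:
        # resample with replacement to reach the target (a real pipeline would add new exams)
        group = group.sample(MIN_PER_STRATUM, replace=True, random_state=0)
    enriched.append(group)

training_set = pd.concat(enriched, ignore_index=True)
print(training_set["severity"].value_counts())
```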
Incidentally, it is this very focus on establishing representative datasets that opens up the potential for standardizing care, yet in a highly personalized manner. Standardization and personalization may seem like contradictory terms, but they are not. At the risk of putting it too simplistically, their union puts us on track to a new era of healthcare in which patients who exhibit the same set of conditions can expect to receive the same care, regardless of who sees them. ML/DL-based algorithms represent an important part of this equation because of their aptitude for generating the same results for the same conditions no matter how many times the algorithm is run against the same set of data, whether a CT image, voice recording, or the like. For those of you familiar with the challenges of inter-reader variability, the importance of this quality of AI is unlikely to be lost.
But back to the challenge of transparency in AI.
While the inner workings of these algorithms may not be explainable, the ground truth against which an algorithm’s performance is compared in product testing is verifiable. When the standard of care entails visual assessment of a health condition by a qualified radiologist, such as detection of suspicious lesions in a mammogram, developers can compare the algorithm’s performance to the radiologist’s performance (or to a consensus of radiologists when inter-reader variability is a concern) to demonstrate equivalent performance (or better!). Despite struggling to know how the algorithm arrived at a particular conclusion, we can still have confidence in the conclusion itself, knowing that it aligns with the conclusion of a qualified physician.
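Here’s a minimal sketch of what that comparison can look like. The reads are made up, and consensus-as-ground-truth plus sensitivity and specificity are just one common choice of reference and metrics among many.

```python
# Minimal sketch of testing an algorithm against a radiologist consensus ground truth
# (illustrative only; the per-case reads below are fabricated for the example).
import numpy as np

# Per-case reads: 1 = suspicious lesion flagged, 0 = none
consensus_truth = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])  # majority of three readers
algorithm_calls = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 1])
single_reader   = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1])

def sensitivity_specificity(calls, truth):
    tp = np.sum((calls == 1) & (truth == 1))
    tn = np.sum((calls == 0) & (truth == 0))
    fn = np.sum((calls == 0) & (truth == 1))
    fp = np.sum((calls == 1) & (truth == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Scoring both against the same verifiable ground truth supports an equivalence claim
print("algorithm:", sensitivity_specificity(algorithm_calls, consensus_truth))
print("reader:   ", sensitivity_specificity(single_reader, consensus_truth))
```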
Of course, this approach does not do much for transparency unless the results are shared, and regulated developers do share them: first through the course of a 510(k) submission, and then with their customer base following clearance. Customers are just as keen as patients and regulators to confirm the performance attributes of any new technology they adopt, not just ML/DL algorithms.
So, while the “how” of an algorithm may sometimes be opaque, rigorous training and testing based on large, representative samples go a long way towards giving regulators, healthcare providers, and patients confidence in the results.
But trust is earned another way, too.
ML/DL algorithms are typically designed to support verification by the qualified user. If the algorithm claims a patient has honeycombing in the left lung, say, the physician can confirm this with a quick glance at the original CT exam. The more the algorithm produces results that align with the physician’s assessment, the more trust builds. This is not to say that algorithms are perfect. Like physicians themselves, they don’t always get it right. But when they don’t, the physician is there to overrule the finding. With many AI technologies today, inaccurate results are obvious, keeping physicians firmly in the driver’s seat.
AI, and machine learning in particular, poses unique challenges, like the explainability problem discussed above. But these challenges are dwarfed by the contribution these technologies can make to healthcare, and old ways of regulating AI just aren’t compatible with the potential of machine learning. So, if we are on track to solving the puzzle of transparency in algorithm development, and I think we are, based on some of the points raised earlier and the growing number of frameworks beginning to surface that prescribe practical methods for developing explainable, quality algorithms (here's one from Xavier Health), then an important focus of regulators must be: how do we facilitate the introduction of safe and effective AI technologies to market?
Towards this end, FDA, as many of you may now be aware, released an action plan at the beginning of the year that steps through how the Agency plans to evolve AI regulation to ensure safe and effective solutions continue to reach the market, while recognizing the unique potential of AI to adapt and improve with unprecedented speed.
FDA has so far signaled that it plans to proceed with the Pre-Specifications and Predetermined Change Control Plan first introduced in the discussion paper that preceded the action plan. While the pre-specifications step through anticipated modifications to the algorithm over time, the change control plan addresses how those modifications will be introduced in a controlled manner. This opens up the possibility of introducing changes outside of, say, a 510(k) that might otherwise have required pre-market notification under an interpretation of FDA guidance on the 510(k) decision process for software devices. If FDA agrees with a developer's Predetermined Change Control Plan, product improvements can be introduced to market more swiftly without compromising safety. Confirmation that FDA intends to introduce these concepts is welcome news not just to developers, but to patients, too, who stand to benefit the most from rapid (albeit controlled) product iterations. The improvement introduced to market via an accelerated pathway might just be the one needed to catch a cancer at its earliest, most treatable stage.
FDA also unveiled its plan to forge ahead with establishing a set of Good Machine Learning Practices (GMLP) to give industry and other stakeholders greater clarity with respect to training and testing expectations. Frameworks are currently in place to regulate AI, like AI in CAD, and they can often inform how to approach designing and testing machine learning-based technologies in other contexts, but standardization is lacking. In other words, developers are not necessarily measured with the same yardstick (all other factors considered equal), and with expectations sometimes ambiguously defined, current approaches risk delaying market entry. Earlier access to safe and effective algorithms translates into better patient outcomes, as we have seen. Because GMLP has the real potential to standardize and clarify expectations, it may support earlier access at lower cost. This news has made many in industry cautiously optimistic.
FDA also indicated that it intends to redouble its efforts to promote transparency, given AI’s unique qualities, some of which were highlighted earlier. While many developers already make information about training/testing data and algorithm performance available to their end users, FDA aims to formalize these requirements. If enacted, we hope this move will further ease concerns about transparency and bias and improve trust in AI-based software.
So, these are the positive steps FDA is taking to update regulatory frameworks for AI/ML technologies as a result of the influx of machine learning algorithms in healthcare and advances in algorithm development.
Less positive, perhaps, and something of a shock, really, if I’m to be honest, was FDA’s announcement that the Agency intends to waive 510(k) requirements for some kinds of software devices.
Without FDA oversight, the burden shifts to patients and healthcare providers to weed out unvalidated technologies. Without FDA oversight, can patients and healthcare providers count on having visibility into training methodologies, testing data set distributions, and performance data? How about mitigations for usability-oriented hazards that are prevalent in AI, like automation bias? Will the pool of high-quality devices be diluted?
For lower risk devices, these considerations may not matter much. But some of the devices FDA plans to deregulate carry greater risks, like algorithms that operate in the background, quietly flagging exams with potentially time-sensitive findings. With patients advocating for greater transparency in AI, and FDA signaling through its action plan that it has been listening to their and other stakeholders' concerns, this move is raising eyebrows. FDA's stated rationale--that the dearth of adverse event reports about these technologies justifies deregulating them--is also causing consternation, since many of the technologies in question are new. Is it possible that reports just haven't had time to accumulate?
It’s anyone’s guess where FDA will land on this. Time will tell. Regardless, AI is here to stay. And with the right checks and balances in place, that’s a good thing. The benefits of AI technology to patients and healthcare providers could fill the internet. Or at least another blog.