The FDA’s New Release: Good Machine Learning Practice for Medical Device Development - Guiding Principles

The U.S. Food and Drug Administration (FDA) has just published guiding principles for Good Machine Learning Practice for Medical Device Development (the GMLP principles). The publication follows the Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device Action Plan, published at the beginning of the year, and was issued jointly by the FDA, Health Canada and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA).

The GMLP principles provide developers with recommended, non-mandatory rules to follow when developing AI/ML for the healthcare sector. Applying them in the unique and challenging area of healthcare will undoubtedly require further adjustments.

The GMLP principles are:

The Total Product Life Cycle Approach

An in-depth understanding of a model’s intended integration into the clinical workflow, and of the desired benefits and associated patient risks, can help ensure that ML-enabled medical devices are safe and effective. This will require involving medical expertise more closely throughout the development process.

Good Software Engineering and Security Practices Are Implemented

Good software engineering practices, data quality assurance, data management, and robust cybersecurity practices are needed. These practices include methodical risk management and a methodical design process. This principle seems to refer to the accepted medical device regulatory requirements for quality management and risk management.

Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population

Data collection should ensure that the relevant characteristics of the intended patient population (for example, in terms of age, gender, race, and ethnicity), use, and measurement inputs are sufficiently represented. This is important to prevent or mitigate any bias.
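As a rough illustration of how a development team might screen a dataset for representation gaps, the sketch below compares each subgroup's share of the collected data with its share of the intended patient population. The function name, the age bands, the target shares and the 5-point tolerance are all hypothetical values chosen for the example; the GMLP principles do not prescribe any particular method or threshold.

```python
from collections import Counter

def representation_gaps(samples, target_shares, tolerance=0.05):
    """Return subgroups whose share of the dataset differs from the
    intended-population share by more than `tolerance`.

    All names and thresholds here are illustrative assumptions.
    """
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, target in target_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - target) > tolerance:
            gaps[group] = (observed, target)
    return gaps

# Hypothetical dataset: one age-band label per study participant.
samples = ["18-40"] * 50 + ["41-65"] * 38 + ["65+"] * 12

# Hypothetical shares of each age band in the intended patient population.
target_shares = {"18-40": 0.35, "41-65": 0.40, "65+": 0.25}

gaps = representation_gaps(samples, target_shares)
for group, (observed, target) in sorted(gaps.items()):
    print(f"{group}: {observed:.0%} of dataset vs {target:.0%} of intended population")
```

In this invented example the youngest band is over-represented and the oldest band is under-represented, which is the kind of imbalance the principle asks developers to identify and address. The same check could be repeated for gender, race, ethnicity or site.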

Training Data Sets Are Independent of Test Sets

Training and test datasets are selected and maintained to be appropriately independent of one another. All potential sources of dependence, including patient, data acquisition, and site factors, should be addressed.
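One common way to address patient-level dependence is to split the data by patient rather than by individual record, so that no patient contributes data to both sets. The sketch below is a minimal example under that assumption; the record layout and the `patient_id` field are hypothetical, and the same grouping idea could be applied to acquisition device or clinical site.

```python
import random
from collections import defaultdict

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both the training
    set and the test set. Field names are illustrative assumptions."""
    by_patient = defaultdict(list)
    for record in records:
        by_patient[record["patient_id"]].append(record)

    # Shuffle patients (not records) and reserve a fraction for testing.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))

    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test

# Hypothetical data: 10 patients with 5 scans each.
records = [{"patient_id": f"P{i % 10}", "scan": i} for i in range(50)]
train, test = split_by_patient(records)

train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

A purely random split over the 50 scans would almost certainly place scans from the same patient on both sides, which tends to inflate measured test performance.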

Selected Reference Datasets Are Based Upon Best Available Methods

Where available, accepted reference datasets used in model development and testing help demonstrate model robustness and generalizability across the intended patient population. In certain areas of health, this principle is difficult to apply because it is not clear that an accepted reference standard exists.

Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device

The model design supports the active mitigation of known risks, such as overfitting, performance degradation, and security risks. The clinical benefits and risks related to the product are well understood, and support the conclusion that the product can safely and effectively achieve its intended use.

The Performance of the Human-AI Team

Where the model has a “human in the loop”, human factors considerations and the human interpretability of the model outputs are addressed.

Testing Demonstrates Device Performance During Clinically Relevant Conditions

Device performance should be evaluated in clinically valid conditions and independently of the training data set.

Users Are Provided Clear, Essential Information

This information includes the product’s intended use and indications for use, and its known limitations. Users are also to be made aware of device modifications and updates from real-world performance monitoring, the basis for decision-making when available, and a means to communicate product concerns to the developer.

Deployed Models Are Monitored for Performance and Re-training Risks Are Managed

Monitoring in the real world is important for improving safety and performance. Additionally, when models are periodically or continually trained after deployment, there are appropriate controls in place to manage risks of overfitting, unintended bias, or degradation of the model (for example, dataset drift) that may impact the safety and performance of the model as it is used by the Human-AI team.
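To make the idea of dataset drift more concrete, the sketch below computes a population stability index (PSI) for a single input measurement, comparing the values seen during development with those seen after deployment. PSI is only one of several commonly used drift heuristics and is not named in the GMLP principles; the bin edges, the sample values and the 0.25 alert threshold are assumptions made for the example.

```python
import math

def population_stability_index(reference, live, bin_edges, eps=1e-6):
    """PSI between two samples of one feature, using fixed bin edges.
    A larger value indicates a larger shift in the distribution."""
    def shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Hypothetical measurement values at development time and after deployment.
reference = list(range(100))
live_same = list(range(100))
live_shifted = [v + 30 for v in range(100)]
edges = [25, 50, 75]

psi_same = population_stability_index(reference, live_same, edges)
psi_shifted = population_stability_index(reference, live_shifted, edges)

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb value, not a regulatory figure
print(psi_same > ALERT_THRESHOLD, psi_shifted > ALERT_THRESHOLD)
```

A check of this kind only flags that the inputs have changed. Whether the change affects safety or performance still has to be assessed clinically before any re-training is carried out.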

The GMLP principles are a starting point for creating a framework for implementing AI in medical devices and are still subject to change and development. Although these principles seem to be based on common AI practices in other sectors, implementing them in the health sector is not without difficulties and requires an in-depth, holistic view of the product and its analysis from data-scientific, medical and regulatory perspectives.