Certification and Foundations of Safe Machine Learning Systems in Healthcare

A Project funded by the Carl Zeiss Foundation (2022–2028)



Machine learning (ML), in particular deep learning, has advanced many areas such as computer vision, natural language processing, and speech recognition to an unprecedented level of performance. Deep-learning-based systems have also been applied in healthcare, where they have produced diagnostic decisions of a quality similar to or better than that of the respective clinical experts. To date, however, the enormous potential of ML to improve the quality of decision making in healthcare has not been fully leveraged. In particular, there are serious concerns about using ML in safety-critical systems such as medical applications. Modern deep learning lacks transparency, privacy, robustness, reproducibility and reliability. Moreover, it has been shown that ML systems can be unfair and exhibit undesired biases due to the choice of the training data or the learning algorithm itself. Beyond these concerns, ethical questions regarding responsibility as well as potential undesired feedback loops need to be addressed when using ML-based decision-making systems in healthcare. Finally, the regulatory process for ensuring the safe application of ML systems in healthcare is still in its infancy, both at the national and the international level.


The main goal of this proposal is to enable the beneficial use of ML in healthcare through research into the foundations of safe ML systems and the development of protocols and automatic tools for their certification. In order to align the methodological research with the specific problems, needs and requirements in medicine, we apply and develop the techniques in the context of medical prototype applications. We aim at a holistic treatment of the safety of ML systems, which involves the following properties and challenges:

  • Robustness of decisions under natural and adversarial modifications of the input
  • Transparency of decisions via explanations targeting expert clinicians as well as patients
  • Reliability of decisions via uncertainty quantification (potentially triggering human intervention when needed) and performance generalisation guarantees
  • Fairness of decisions by avoiding biases or, at the very least, making them fully transparent
  • Privacy of patient data by using privacy-preserving machine learning
  • Reproducibility of decisions via FAIRness and reproducible data generation, annotation, training and monitoring of the running system
  • Ethics of decision-making in healthcare by early identification of implicit trade-offs, undesired feedback loops or other obstacles to the beneficial use of algorithmic decision-making
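
As an illustration of the reliability property above, the following is a minimal sketch of uncertainty-triggered human intervention via selective prediction; the function name and the threshold value are illustrative assumptions, not part of the project, and a real system would calibrate the threshold on held-out data:

```python
def selective_predict(probs, threshold=0.8):
    """Return the predicted class index, or None to abstain and
    defer the case to a human expert.

    probs: class probabilities for one input (e.g. a softmax output).
    threshold: minimum confidence required for an automatic decision
               (illustrative value, not calibrated).
    """
    confidence = max(probs)
    if confidence < threshold:
        return None  # uncertainty too high -> trigger human review
    return probs.index(confidence)

# A confident prediction is returned automatically ...
print(selective_predict([0.92, 0.05, 0.03]))  # -> 0
# ... while an ambiguous case is deferred to the clinician.
print(selective_predict([0.40, 0.35, 0.25]))  # -> None
```

In practice one would report the resulting coverage (fraction of cases decided automatically) alongside the accuracy on the accepted cases, since the two trade off against each other.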

While the importance of most of these properties for certifying ML systems in safety-critical domains is widely recognised, the underlying problems are far from solved. The project aims to advance the state of the art in safe machine learning through fundamental research. At the same time, we ground and focus our research in particular prototypes of medical applications of ML, where we also explore the trade-offs between these different properties when one wants to achieve all of them together in order to build a safe ML system.

The second major goal is the certification of ML systems, for which we have the following specific objectives:

  • Development of techniques and protocols for the statistical and provable certification of properties of ML systems
  • Guidelines on the quantification of certified properties and full transparency on the trade-offs between the different objectives
  • Automatic certification tools which can be easily applied by notified bodies and companies
  • Investigation of techniques for the certification of updates of running systems
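
As a minimal sketch of what a statistical certificate for one property can look like, the following computes a distribution-free lower confidence bound on a model's accuracy from i.i.d. test results via Hoeffding's inequality; the function name and the numbers are illustrative assumptions, and a real certification protocol would of course involve far more than a single bound:

```python
import math

def certified_accuracy_lower_bound(correct, n, delta=0.01):
    """Lower confidence bound on the true accuracy, valid with
    probability at least 1 - delta over the draw of the i.i.d.
    test set, by Hoeffding's inequality:

        p >= p_hat - sqrt(ln(1/delta) / (2 n))

    correct: number of correctly classified test samples.
    n:       total number of test samples.
    delta:   allowed failure probability of the certificate.
    """
    p_hat = correct / n
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - margin)

# 940 of 1000 correct: empirical accuracy 0.94, certified at the
# 99% confidence level to be at least about 0.89.
print(round(certified_accuracy_lower_bound(940, 1000), 3))  # -> 0.892
```

The bound tightens as the test set grows, which illustrates why certified performance guarantees put concrete requirements on the amount of evaluation data.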