Responsible AI: Protecting Against Data Leakage
Artificial Intelligence (AI) can bring incredible new capabilities to medical technology and dementia assessment, but AI also carries ethical and structural risks that need to be identified and controlled. Here at SPARK Neuro, we take these risks very seriously, because we know that AI risk can lead to ineffective or unsafe products, and the furthering of structural inequities. Responsible AI is core to our company’s values and mission, and we aim to help guide the industry towards adopting AI best practices, so that technological advances are safe, effective, ethical, and equitable for all.
For this reason, in 2024 we published our paper “Data leakage in deep learning studies of translational EEG” in Frontiers in Neuroscience. Our aim was to identify and quantify a major risk that we commonly see in clinical research, called data leakage. Data leakage occurs when a model is tested on data that is not truly independent of the data it was trained on. Most clinical datasets contain many recordings or segments from each patient, and common model-building techniques that ignore this structure allow data from the same patient to appear in both the training set and the test set. The result can be massive overestimation of model performance, and products that promise high accuracy but fail in the real world.
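To make the idea concrete, here is a minimal sketch of one common form of leakage in datasets with many segments per patient: splitting by segment rather than by subject. It is an illustration only, not the code used in the paper. The function names, the toy dataset of (subject, segment) pairs, and the 20% test fraction are all assumptions chosen for this example.

```python
import random

def make_segments(n_subjects=20, segments_per_subject=10):
    """Toy stand-in for EEG data: one (subject_id, segment_index) pair per segment."""
    return [(s, k) for s in range(n_subjects) for k in range(segments_per_subject)]

def segment_wise_split(segments, test_fraction=0.2, seed=0):
    """Leaky: shuffle individual segments, ignoring which subject they came from."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def subject_wise_split(segments, test_fraction=0.2, seed=0):
    """Leak-free: hold out whole subjects, so no subject spans both sets."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in segments})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [seg for seg in segments if seg[0] not in test_subjects]
    test = [seg for seg in segments if seg[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects that appear in both sets; any overlap means leakage is possible."""
    return {s for s, _ in train} & {s for s, _ in test}

segments = make_segments()
leaky_train, leaky_test = segment_wise_split(segments)
clean_train, clean_test = subject_wise_split(segments)

print("subjects shared after segment-wise split:", len(shared_subjects(leaky_train, leaky_test)))
print("subjects shared after subject-wise split:", len(shared_subjects(clean_train, clean_test)))  # 0
```

With the segment-wise split, a model can score well on the test set simply by recognizing patterns specific to individuals it has already seen. Holding out whole subjects removes that shortcut, so test performance is a better guide to how the model would behave on new patients.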
In this paper, Brookshire et al. describe the extent of this problem, specifically with regard to EEG and Deep Neural Networks (DNNs). They quantify the overestimation by recreating models published in the literature and comparing the reported performance with the performance obtained once data leakage is removed. In one particularly stunning example, they recreate a model that claims to diagnose Alzheimer’s disease using EEG and DNNs with a reported 99% accuracy. After correcting for data leakage, they show that the true performance is in fact not significantly better than chance (53%).
Brookshire et al. show that a majority of papers (at least ~62%, probably more) using DNNs and EEG have this data leakage problem and therefore have overestimated performance. As such, data leakage presents a major problem for the burgeoning field of Clinical AI and for AI’s promise of improved patient health. With this publication, we hope to help guide the industry towards better controls in AI development.