Why AI Courses Must Teach Students to Recognize Bias in Medical Data
Each year, an influx of students pursuing courses in artificial intelligence (AI) for healthcare brightens the prospect of revolutionizing medical diagnostics and treatment recommendations. Despite this excitement, one area of concern remains under-addressed: educating students on the critical importance of assessing the quality of, and the biases inherent in, the training data used to develop these AI models.
The Unseen Shortcomings in Healthcare Data and the Role of AI Education
This oversight in AI education is highlighted by Leo Anthony Celi, a physician and senior research scientist at MIT’s Institute for Medical Engineering and Science. In a recent article, he explains how data bias, especially in clinical data collected primarily from white males, can degrade AI systems’ performance when they are applied to more diverse populations. For instance, pulse oximeters often overestimate oxygen saturation levels in people of color because of their underrepresentation in clinical trials. This is only the tip of the iceberg: there are countless more cases where medical equipment and data systems overlook population diversity, producing skewed results and potentially harmful decisions.
Another crucial issue centers on the use of electronic health records (EHRs) as a basis for AI models. Although an essential part of modern medicine, EHRs were never designed to serve as learning systems, and they are rife with inconsistencies and biases. Celi, however, is not all doom and gloom: rather than replacing the entire EHR infrastructure, which is currently not feasible, he advocates for responsible ways of using the existing data. Innovative approaches such as transformer models are being explored to better understand the correlations between lab results, vital signs, and treatments. This approach could help lessen the impact of missing or biased data, which is often shaped by social determinants of health and implicit provider biases.
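To make the missing-data problem concrete, here is a minimal sketch, not drawn from the article itself: the table, column names, and missingness rates are invented for illustration. It shows the kind of audit students could run before training on EHR-style data, checking whether a lab value is missing more often for one demographic group than another:

```python
import numpy as np
import pandas as pd

# Synthetic EHR-style table: one lab value is missing more often
# for one group, mimicking how social determinants can shape
# which measurements ever get recorded. All values are invented.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n, p=[0.7, 0.3]),
    "lactate": rng.normal(2.0, 0.5, size=n),
})

# Simulate group-dependent missingness in the lab result:
# 10% missing for group A, 40% missing for group B.
miss_prob = np.where(df["group"] == "A", 0.10, 0.40)
df.loc[rng.random(n) < miss_prob, "lactate"] = np.nan

# Audit: fraction of missing lactate values per group.
missing_by_group = df["lactate"].isna().groupby(df["group"]).mean()
print(missing_by_group)
```

A model trained naively on this table would silently learn more about group A than group B; surfacing the per-group missingness is the first step toward handling it deliberately.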
Addressing the Imperfections and Maximizing Learning
The challenges become apparent in Celi’s experience teaching AI in healthcare. Since he began in 2016, his MIT team has observed that students were being taught to optimize models for statistical performance rather than to question the integrity of the underlying data. A review of 11 online courses revealed the scale of the problem: only five mentioned data bias at all, and a mere two offered substantial discussion of the subject. As AI continues to establish its footprint in healthcare, the onus lies with educators to ensure students can not only build models but also scrutinize the data fueling them. Bridging this divide will require a shift in focus from model building alone to understanding the data as well, an area Celi believes should account for at least half of the course content.
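The gap between optimizing a metric and scrutinizing the data can be illustrated with a small, purely synthetic sketch; the data, groups, and thresholds below are assumptions for illustration, not anything from the article. A decision rule that looks accurate overall can still fail on an underrepresented group whose feature–label relationship differs, and an aggregate metric hides this:

```python
import numpy as np

rng = np.random.default_rng(1)
n_major, n_minor = 900, 100

# The feature-label relationship differs by group, e.g. a sensor
# reading that is systematically biased for the minority group.
x_major = rng.normal(0, 1, n_major)
y_major = (x_major > 0).astype(int)
x_minor = rng.normal(0, 1, n_minor)
y_minor = (x_minor > 1.0).astype(int)  # shifted true boundary

x = np.concatenate([x_major, x_minor])
y = np.concatenate([y_major, y_minor])
group = np.array(["major"] * n_major + ["minor"] * n_minor)

# A simple "trained" rule: pick the threshold that minimizes
# error on the pooled data, which the majority group dominates.
candidates = np.linspace(-2, 2, 401)
errors = [np.mean((x > t).astype(int) != y) for t in candidates]
threshold = candidates[int(np.argmin(errors))]
pred = (x > threshold).astype(int)

# Aggregate accuracy looks healthy; per-group accuracy does not.
overall = np.mean(pred == y)
per_group = {g: np.mean(pred[group == g] == y[group == g])
             for g in ["major", "minor"]}
print(f"threshold={threshold:.2f}, overall={overall:.2f}")
print(per_group)
```

Reporting only the overall accuracy is exactly the habit Celi warns against; stratifying the evaluation by group is the kind of data scrutiny he wants courses to teach.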
One initiative helping to solve this puzzle is the MIT Critical Data consortium, which has been hosting international datathons since 2014. These events unite clinicians, data scientists, and healthcare professionals to collaboratively examine local datasets, aiming to understand health and disease within each region’s unique cultural and systemic context. These collaborations foster an environment where critical thinking thrives organically.
Embracing the imperfections in data can also be a step toward improvement, albeit a challenging one. A good example is the MIMIC database, which took more than 10 years to arrive at a usable schema, largely because users acknowledged and pointed out its flaws. Celi’s reminder here is apt: even without all the answers, inspiring people to start asking the right questions can be a game-changer. As students and researchers engage with AI development in healthcare, they need to remain aware of both its transformative potential and the ethical responsibilities that come with it.
For a more in-depth discussion on this subject with Leo Anthony Celi, visit MIT News.