
Why machine learning models can fail in new settings, and what we can do about it

Machine learning models have their fair share of admirers, mostly for their ability to dig into colossal datasets and churn out highly accurate results. But they're not invincible. Quite the contrary, according to recent findings from MIT scientists, who uncovered a chink in the otherwise resilient armor of top-rated models: their accuracy often fails to carry over from one setting to another.

One might assume high accuracy is a testament to generalizability. Not according to MIT researchers, though. Marzyeh Ghassemi, an associate professor in MIT's Department of Electrical Engineering and Computer Science, notes that a model that is a superstar in one context can crash in another; in the team's experiments, this happened for up to 75 percent of the datasets studied. She counsels caution against blindly relying on average performance metrics when putting models into service in real-world scenarios.

When Models Falter and What Lies Beneath

An enlightening paper presented by the team at the 2025 Neural Information Processing Systems (NeurIPS) conference reveals how deep this problem runs. In essence, they found that models trained to diagnose illnesses from chest X-rays in one hospital could perform deplorably in another. The hitch? Aggregated statistics largely overlook this discrepancy, camouflaging poor performance on specific patient groups, such as those with pleural disease or an enlarged cardiomediastinum.
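The masking effect described above is easy to see in miniature. The sketch below uses entirely synthetic numbers (the group names and accuracies are invented for illustration, not taken from the MIT study): an aggregate accuracy of 90 percent can coexist with a subgroup on which the model is always wrong.

```python
from collections import defaultdict

def accuracy_by_group(labels, preds, groups):
    """Return overall accuracy plus a per-group breakdown."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# 90 "typical" patients the model gets right, plus 10 patients with a
# pleural finding it gets entirely wrong (synthetic data).
labels = [1] * 90 + [1] * 10
preds  = [1] * 90 + [0] * 10
groups = ["typical"] * 90 + ["pleural"] * 10

overall, per_group = accuracy_by_group(labels, preds, groups)
print(overall)               # 0.9 -- the aggregate looks strong...
print(per_group["pleural"])  # 0.0 -- ...but one subgroup fails completely
```

A single averaged number would never surface the second line, which is exactly the discrepancy the researchers warn about.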

A key issue identified by the researchers was the presence of spurious correlations: relationships learned during training that do not hold in new environments. When a model relies on one, the fallout can be far-reaching, if not catastrophic. For instance, an imaging model may associate specific markings on X-rays from one hospital with a disease, then fail to detect the same disease in another hospital's scans where the marking is absent. Unlearning these spurious relationships is a genuine challenge.
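To make the X-ray example concrete, here is a deliberately tiny, fully synthetic sketch (the features, data, and decision-stump "model" are all invented for illustration). In the training hospital, a scanner-specific marking predicts disease perfectly, so a model choosing the single most predictive feature latches onto the marking instead of the weaker genuine signal, then collapses at a hospital without that marking.

```python
def best_stump(data):
    """Pick the single feature that best predicts the label on this data.
    Each row is (feature_0, ..., feature_k, label)."""
    best_f, best_acc = None, -1.0
    n_features = len(data[0]) - 1
    for f in range(n_features):
        acc = sum(row[f] == row[-1] for row in data) / len(data)
        if acc > best_acc:
            best_f, best_acc = f, acc
    return best_f

train = [  # (scanner_marking, genuine_signal, disease) -- synthetic
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0),
]
# The marking predicts disease 8/8 times here; the genuine signal only 6/8,
# so the stump picks the spurious feature.
f = best_stump(train)   # f == 0: the scanner marking wins

test = [  # new hospital: no marking, disease still present
    (0, 1, 1), (0, 1, 1), (0, 0, 0), (0, 1, 1),
]
acc = sum(row[f] == row[-1] for row in test) / len(test)
print(acc)  # 0.25: the stump misses every diseased patient
```

The stump is a stand-in for a real classifier, but the failure mode is the same one the paper describes: the learned shortcut works only where the shortcut exists.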

Poking Holes in Traditional Beliefs and Looking Ahead

The conventional wisdom was that if models ranked highly in one setting, they would shine equally in another. This premise, referred to as "accuracy-on-the-line," met its downfall at the hands of the MIT team's investigations. Their work showed that models crowned in one context could in fact be the laggards in another.

The researchers navigated this situation with a novel algorithm called OODSelect, spearheaded by MIT postdoc Olawale Salaudeen. The technique scrutinizes thousands of models trained on one dataset and then retested on different data, casting a spotlight on those that performed admirably in the initial setting but flunked significantly in a new one.
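The selection idea behind that scrutiny can be sketched in a few lines. This is not the actual OODSelect implementation (the team's released code should be consulted for that); it is a minimal illustration, with invented model names, accuracies, and thresholds, of flagging models whose in-distribution strength masks an out-of-distribution collapse.

```python
def select_surprising_models(results, id_threshold=0.9, drop_threshold=0.2):
    """Flag models that look strong in-distribution but fall sharply
    out of distribution. `results` maps model name -> (id_acc, ood_acc).
    Thresholds here are illustrative, not taken from the paper."""
    flagged = []
    for name, (id_acc, ood_acc) in results.items():
        if id_acc >= id_threshold and (id_acc - ood_acc) >= drop_threshold:
            flagged.append(name)
    return flagged

results = {  # synthetic accuracies for three hypothetical models
    "model_a": (0.95, 0.93),  # transfers well
    "model_b": (0.94, 0.60),  # strong in-distribution, collapses OOD
    "model_c": (0.80, 0.78),  # mediocre but stable
}
print(select_surprising_models(results))  # ['model_b']
```

Model B is exactly the kind of case aggregate leaderboards reward and deployment punishes, which is why surfacing such models is the algorithm's goal.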

What’s the way forward? The team has already put forth their code and the identified subsets for others to use, hoping that the machine learning community will embrace OODSelect. This way, organizations that stumble upon areas where their models are underperforming can course-correct by taking targeted steps to improve those specific areas.

“We hope the released code and OODSelect subsets serve as a bridge,” the researchers write, indicating their drive towards creating benchmarks and models that grapple with the adverse effects of spurious correlations.

To explore this topic in greater detail, check out the original article from MIT News: Why it's critical to move beyond overly aggregated machine-learning metrics.

Max Krawiec
