
MIT study reveals hidden shortcomings in large language models

A recent MIT study has uncovered a critical flaw in large language models (LLMs). These advanced artificial-intelligence tools have revolutionized a range of sectors, powering services from customer-service bots to platforms that summarize medical notes. An unexpected problem has emerged, however: the models can draw the wrong conclusions during training.

The dilemma of syntax overriding meaning

Surprisingly, LLMs do not rely exclusively on domain knowledge when responding to queries. Instead, they tend to fall back on familiar grammatical structures encountered during training. This learning shortcut can lead them to produce convincing yet misguided responses, especially when faced with unfamiliar or syntactically deceptive questions.

These models are trained on a broad spectrum of Internet text, from which they learn relationships between words, phrases, and sentence formats. In the process, LLMs come to associate specific syntactic patterns, or "syntactic templates", with particular subjects or domains. For instance, a model might learn that the structure of a question such as "Where is Paris located?" is typically associated with geographic inquiries. Consequently, when presented with a nonsensical query that follows the same structure, such as "Quickly sit Paris clouded?", the model may still respond with "France", regardless of the question's absurdity.
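The failure mode described above can be illustrated with a deliberately simplified toy (this is not the MIT team's method): a "model" that keys its answers off a crude syntactic template, here the sequence of capitalized versus lowercase tokens, rather than off meaning. All names and the shape heuristic below are illustrative assumptions.

```python
def shape(question: str) -> tuple:
    """Collapse a question to a crude syntactic template:
    'W' for a capitalized token, 'w' for a lowercase one."""
    tokens = question.rstrip("?").split()
    return tuple("W" if t[0].isupper() else "w" for t in tokens)

# Pretend "training" tied this template to geography lookups.
GEO_TEMPLATE = shape("Where is Paris located?")  # ('W', 'w', 'W', 'w')
CAPITALS = {"Paris": "France", "Tokyo": "Japan"}

def toy_model(question: str) -> str:
    """Answer by template match alone -- the failure mode in question."""
    if shape(question) == GEO_TEMPLATE:
        for tok in question.rstrip("?").split():
            if tok in CAPITALS:
                return CAPITALS[tok]
    return "I don't know"

print(toy_model("Where is Paris located?"))     # France
print(toy_model("Quickly sit Paris clouded?"))  # France -- same shape, no meaning
```

The nonsense query shares the capitalization pattern of the geographic template, so the toy model confidently answers "France", mirroring the behavior the study describes at a much smaller scale.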

What began as an innocent reliance on pattern-oriented reasoning has transformed into a serious liability, especially in high-stakes environments. This flaw means that AI models like LLMs can fail unpredictably when summarizing clinical records, generating financial reports, or handling sensitive customer data. “This is a byproduct of how we train models”, explains Marzyeh Ghassemi, an associate professor at MIT and senior author of the study. “But models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failures.”

Discovery, exploitation, and evolution

To probe the issue further, the research team ran synthetic experiments that essentially restricted each domain to a single syntactic template during training. Strikingly, the results showed that LLMs could still generate accurate answers even to nonsensical queries, as long as those queries followed the familiar grammatical structure. Rephrasing a question with a different structure, however, produced incorrect answers from the models, even though the meaning was unchanged.

The study also brought to light the unnerving fact that this syntactic bias could potentially be manipulated by malicious users to bypass the AI’s safety protocols. Vinith Suriyakumar, an MIT graduate student and co-author of the study, emphasizes this concern, stating that “we need to figure out new defenses based on how LLMs learn language, rather than just ad hoc solutions.”

The research did not propose specific fixes, but the team did develop a new tool for developers. This benchmarking tool lets developers check whether a model leans too heavily on syntactic patterns, helping to improve model reliability before deployment. The MIT team also plans to investigate potential mitigation strategies, such as incorporating more diverse syntactic templates into the training data, and to examine how the problem could affect reasoning models, a subcategory of LLMs designed to solve multi-step problems.
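The article does not describe how the team's benchmarking tool works internally, but the general idea of probing template reliance can be sketched as a consistency check: if a model answers a question and a meaning-preserving rephrasing of it differently, the syntactic template may be doing the work rather than the content. The function and dummy model below are hypothetical illustrations, not the released tool.

```python
from typing import Callable

def template_reliance(model: Callable[[str], str],
                      pairs: list[tuple[str, str]]) -> float:
    """Fraction of (question, paraphrase) pairs the model answers
    inconsistently; higher values suggest template over-reliance."""
    mismatches = sum(model(q) != model(p) for q, p in pairs)
    return mismatches / len(pairs)

# A dummy model that keys on surface form only (hypothetical):
def brittle_model(q: str) -> str:
    return "France" if q.startswith("Where is") else "unsure"

pairs = [
    ("Where is Paris located?", "Paris is located where?"),
    ("Where is Tokyo located?", "In which country is Tokyo?"),
]
print(template_reliance(brittle_model, pairs))  # 1.0 -- fully template-bound
```

A score near zero would mean the model's answers survive rewording; a score near one, as with this brittle dummy, flags exactly the syntactic dependence the study warns about.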

The study has drawn attention from professionals outside the investigating group. “This work highlights the importance of linguistic awareness in LLM safety research,” commented Jessy Li, an associate professor at the University of Texas. This project was made possible through support from the National Science Foundation, the Gordon and Betty Moore Foundation, Schmidt Sciences, a Google Research Award, and a Bridgewater AIA Labs Fellowship.

Max Krawiec
