Recent research from MIT has uncovered a critical weakness in large language models (LLMs). These advanced AI tools have revolutionized various sectors, powering services that range from customer-service bots to platforms that summarize medical notes. An unexpected problem has emerged, however: during training, these models may be learning the wrong lessons.
LLMs, surprisingly, don’t rely exclusively on domain knowledge when responding to queries. Instead, they tend to lean on familiar grammatical structures they encountered during training. This shortcut can lead them to produce convincing yet misguided responses, especially when faced with unfamiliar or syntactically deceptive questions.
These models are trained on a broad spectrum of Internet text, allowing them to establish relationships between words, phrases, and sentence formats. In the process, LLMs associate specific syntactic patterns, or “syntactic templates,” with particular subjects or domains. For instance, a model might learn that the structure of a question such as “Where is Paris located?” is typically associated with geographic inquiries. Consequently, even when presented with a nonsensical query following the same structure, like “Quickly sit Paris clouded?”, the model would still respond with “France”, regardless of the question’s absurdity.
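The failure mode described above can be caricatured in a few lines of code. The sketch below is purely illustrative, not the study’s method: a hypothetical responder that matches only the surface shape of a query (a crude word-pattern skeleton) and never checks its meaning, so a nonsensical question with the familiar shape gets the same confident answer.

```python
# Illustrative toy: a "template-matching" responder that keys on sentence
# shape rather than meaning. The pattern and answers are hypothetical.
import re

# Crude syntactic skeleton: any four-word question mentioning "Paris".
GEO_TEMPLATE = re.compile(r"^\w+ \w+ Paris \w+\?$")

def answer(query: str) -> str:
    # Only the surface pattern is checked, never the semantics.
    if GEO_TEMPLATE.match(query):
        return "France"
    return "I don't know"

print(answer("Where is Paris located?"))     # sensible query -> "France"
print(answer("Quickly sit Paris clouded?"))  # absurd query, same shape -> "France"
```

A real LLM’s template reliance is statistical rather than a literal regex, but the observable behavior is the same: form, not meaning, drives the response.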
What began as an innocent reliance on pattern-oriented reasoning has become a serious liability, especially in high-stakes environments. This flaw means that LLMs can fail unpredictably when summarizing clinical records, generating financial reports, or handling sensitive customer data. “This is a byproduct of how we train models,” explains Marzyeh Ghassemi, an associate professor at MIT and senior author of the study. “But models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failures.”
To probe this problem further, the research team ran synthetic tests in which each domain was essentially restricted to a single syntactic template during training. Surprisingly, the results showed that LLMs could give correct answers even to nonsensical queries, as long as those queries followed a familiar grammatical structure. Rephrasing a query with a different structure caused the models to answer incorrectly, even though the meaning was unchanged.
The study also brought to light the unnerving fact that this syntactic bias could potentially be manipulated by malicious users to bypass the AI’s safety protocols. Vinith Suriyakumar, an MIT graduate student and co-author of the study, emphasizes this concern, stating that “we need to figure out new defenses based on how LLMs learn language, rather than just ad hoc solutions.”
The research didn’t propose specific fixes, but the team did develop a new tool for developers. This benchmarking tool allows developers to uncover whether a model leans too heavily on syntactic patterns, thereby helping to improve model reliability before deployment. The MIT team also plans to investigate potential mitigation strategies, such as incorporating more diverse syntactic templates into the training data, and to examine how this problem could affect reasoning models – a subcategory of LLMs designed to solve multi-step problems.
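The article doesn’t describe how the benchmarking tool works internally, but one plausible kind of check it could run is a consistency probe: ask the same question in several syntactic forms and flag models whose answers diverge. The sketch below assumes this approach; `query_model` is a hypothetical stand-in, here mocked to behave like a template-reliant model.

```python
# Minimal sketch of a syntactic-consistency probe. `query_model` is a
# hypothetical placeholder, mocked to mimic a template-reliant model.

def query_model(prompt: str) -> str:
    # Mock: answers correctly only for the familiar training-time template.
    return "France" if prompt.startswith("Where is") else "unknown"

def syntactic_consistency(paraphrases: list[str]) -> float:
    """Fraction of paraphrases whose answer matches the canonical form."""
    answers = [query_model(p) for p in paraphrases]
    reference = answers[0]  # canonical phrasing listed first
    return sum(a == reference for a in answers) / len(answers)

paraphrases = [
    "Where is Paris located?",             # canonical template
    "Paris is located in which country?",  # same meaning, new structure
    "In which country would one find Paris?",
]
print(f"consistency: {syntactic_consistency(paraphrases):.2f}")
```

A score well below 1.0 on meaning-preserving paraphrases would suggest the model’s answers track sentence structure rather than content.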
The study has drawn attention from professionals outside the investigating group. “This work highlights the importance of linguistic awareness in LLM safety research,” commented Jessy Li, an associate professor at the University of Texas. This project was made possible through support from the National Science Foundation, the Gordon and Betty Moore Foundation, Schmidt Sciences, a Google Research Award, and a Bridgewater AIA Labs Fellowship.