MIT Study Reveals Hidden Shortcomings in Large Language Models
Recent research from MIT has uncovered a critical flaw in large language models (LLMs). These AI tools now power services ranging from customer-service bots to medical note-summarization platforms, but the study suggests the models may be learning the wrong lessons during training.
The Dilemma of Syntax Superseding Sense
LLMs, it turns out, do not rely solely on domain knowledge when answering queries. They also lean on familiar grammatical structures encountered during training. This shortcut can produce convincing but misguided responses, especially when a model faces unfamiliar or syntactically deceptive questions.
These models are trained on a broad spectrum of Internet text, which lets them establish relationships between words, phrases, and sentence formats. In the process, they come to associate specific syntactic patterns, or “syntactic templates”, with particular subjects or domains. For instance, a model might learn that the structure of a question such as “Where is Paris located?” is typically associated with geographic inquiries. Presented with a nonsensical query that follows the same structure, like “Quickly sit Paris clouded?”, the model may still respond “France”, regardless of the question’s absurdity.
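The failure mode can be made concrete with a toy sketch (this is an illustration, not the study's code or an actual LLM): a "model" that keys its answers on a sentence's syntactic shape rather than its meaning will confidently answer nonsense that matches the shape, yet stumble on a rephrasing that preserves the meaning. The template and word list here are invented for the example.

```python
import re

# Hypothetical "geography question" template: four words, the third
# capitalized, ending in "?" — a crude stand-in for the syntactic
# patterns an LLM may learn implicitly during training.
GEO_TEMPLATE = re.compile(r"^\w+ \w+ (?P<entity>[A-Z]\w+) \w+\?$")

CAPITAL_LOOKUP = {"Paris": "France", "Tokyo": "Japan"}

def toy_model(query: str) -> str:
    """Answer based on syntactic shape, ignoring whether the query makes sense."""
    m = GEO_TEMPLATE.match(query)
    if m and m.group("entity") in CAPITAL_LOOKUP:
        return CAPITAL_LOOKUP[m.group("entity")]
    return "I don't know"

print(toy_model("Where is Paris located?"))    # France
print(toy_model("Quickly sit Paris clouded?")) # France — nonsense, but same shape
print(toy_model("Paris is located where?"))    # I don't know — same meaning, new shape
```

The last two lines capture the study's two symptoms in miniature: nonsense with a familiar template gets a confident answer, while a meaning-preserving rephrase does not.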
This reliance on pattern-matching becomes a serious liability in high-stakes environments: models can fail unpredictably when summarizing clinical records, generating financial reports, or handling sensitive customer data. “This is a byproduct of how we train models,” explains Marzyeh Ghassemi, an associate professor at MIT and senior author of the study. “But models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failures.”
Exploring, Exploiting, and Evolving
To probe the issue further, the research team ran synthetic experiments in which each domain was restricted to a single syntactic template during training. The models still produced accurate responses to nonsensical queries as long as those queries followed the familiar grammatical structure, while rephrasing a question with a different structure yielded incorrect answers even though the meaning was unchanged.
The study also showed that this syntactic bias could be exploited by malicious users to bypass an AI model’s safety protocols. Vinith Suriyakumar, an MIT graduate student and co-author of the study, emphasizes this concern: “We need to figure out new defenses based on how LLMs learn language, rather than just ad hoc solutions.”
The research did not propose specific fixes, but the team developed a benchmarking tool that lets developers detect whether a model leans too heavily on syntactic patterns, helping to improve model reliability before deployment. The MIT team also plans to investigate mitigation strategies, such as incorporating more diverse syntactic templates into training data, and to examine how the problem affects reasoning models, a subcategory of LLMs designed to solve multi-step problems.
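The article does not describe how the benchmarking tool works, but one plausible probe (a hypothetical sketch, with invented probe data and a stub model) would pair each well-formed query with nonsense sharing its template and with a meaning-preserving rephrase, then score how often a model's answer tracks syntax rather than meaning:

```python
from typing import Callable

# Hypothetical probe set: each entry pairs a real query with
# (a) nonsense sharing its syntactic template and
# (b) a rephrase that preserves its meaning.
PROBES = [
    {
        "original": "Where is Paris located?",
        "same_template_nonsense": "Quickly sit Paris clouded?",
        "same_meaning_rephrase": "Paris is located where?",
    },
]

def syntactic_reliance(model: Callable[[str], str]) -> float:
    """Fraction of probes where the answer follows the template, not the meaning."""
    flagged = 0
    for p in PROBES:
        answer = model(p["original"])
        follows_template = model(p["same_template_nonsense"]) == answer
        breaks_on_rephrase = model(p["same_meaning_rephrase"]) != answer
        if follows_template and breaks_on_rephrase:
            flagged += 1
    return flagged / len(PROBES)

def brittle_model(query: str) -> str:
    """Stub model with the failure pattern described in the article."""
    answers = {
        "Where is Paris located?": "France",
        "Quickly sit Paris clouded?": "France",
        "Paris is located where?": "I don't know",
    }
    return answers.get(query, "I don't know")

print(syntactic_reliance(brittle_model))  # 1.0 — fully syntax-driven on this probe
```

A score near 1.0 would flag a model whose answers are driven by sentence shape; a robust model would answer the rephrase correctly and refuse the nonsense, scoring near 0.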
The study has drawn attention from researchers outside the group. “This work highlights the importance of linguistic awareness in LLM safety research,” commented Jessy Li, an associate professor at the University of Texas. The project was supported by the National Science Foundation, the Gordon and Betty Moore Foundation, Schmidt Sciences, a Google Research Award, and a Bridgewater AIA Labs Fellowship.