Have you heard the buzz about large language models (LLMs) such as ChatGPT? They’re programming wonders that can instantly pen essays, brainstorm meals, or even help craft your emails. But as amazing as they are, they haven’t historically been great at challenging tasks like problem-solving, especially when it comes to mathematics or complex reasoning. However, this shortcoming is becoming less and less of an issue.
Enter the new wave of LLMs referred to as reasoning models, which are showing considerable improvement in handling complex tasks. These models, unlike their predecessors that relied heavily on language patterns to guess answers, employ more deliberate, step-by-step strategies much like a human would.
What’s more, researchers at MIT’s McGovern Institute for Brain Research noted a striking similarity between how humans and these new models approach difficult tasks. Interestingly, they discovered that the tasks which demand the most mental effort from humans are also the ones that impose the most computational strain on reasoning models. This led to a new concept: the “cost of thinking” transcends the human-machine divide.
This somewhat unexpected alignment took the MIT team headed by Associate Professor Evelina Fedorenko by surprise. People who develop these models tend to focus on crafting a system that performs well and gives accurate results under numerous conditions rather than mimicking human cognitive behavior. Thus, the convergence of human and machine effort was an unexpected but exciting discovery.
These reasoning models are still fundamentally artificial neural networks – systems that learn by analyzing data and recognizing patterns. However, they go a step further than their predecessors by addressing deeper cognitive tasks, such as math problems or coding. A key innovation is their approach to problem-solving: these models break problems down into smaller parts, which considerably enhances their performance.
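The idea of breaking a problem into smaller parts can be illustrated with a toy sketch. This is not how an actual reasoning model works internally – the function and step format below are purely illustrative – but it shows the difference between producing an answer in one shot and producing explicit intermediate steps:

```python
# Illustrative sketch only: solving a two-step arithmetic problem as a
# sequence of explicit intermediate steps, rather than in a single jump.
# The step strings loosely mimic a model's visible "chain of thought".
def solve_stepwise(a: int, b: int, c: int):
    """Compute a * b + c, recording each intermediate step."""
    steps = []
    subtotal = a * b
    steps.append(f"step 1: {a} * {b} = {subtotal}")
    total = subtotal + c
    steps.append(f"step 2: {subtotal} + {c} = {total}")
    return total, steps

total, steps = solve_stepwise(12, 4, 7)
# total == 55; steps holds the two intermediate calculations
```

Each intermediate result is checked and reused by the next step, which is loosely why decomposition helps: errors surface early instead of compounding inside one opaque guess.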
Engineers also use reinforcement learning to train these models, rewarding correct answers and penalizing incorrect ones. Over time, the model learns to favor problem-solving paths that lead to accurate conclusions more often, replicating a more human-like cognitive process. This method, though slower than the processes used by earlier LLMs, improves accuracy to a great extent.
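The reward-and-penalty loop can be sketched in miniature. The toy below is an assumption-laden simplification, not the actual training procedure: two hypothetical solving strategies ("guess" and "step_by_step") with made-up success rates stand in for the model's candidate reasoning paths, and a running value estimate stands in for the learned policy:

```python
import random

# Toy reinforcement-learning sketch (illustrative, not the real method):
# reward correct answers (+1), penalize incorrect ones (-1), and track a
# running value estimate for each hypothetical solving strategy.
def train(num_steps: int = 2000, lr: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    # Made-up probability that each strategy yields a correct answer
    accuracy = {"guess": 0.3, "step_by_step": 0.9}
    value = {path: 0.0 for path in accuracy}
    for _ in range(num_steps):
        path = rng.choice(sorted(accuracy))               # explore uniformly
        reward = 1.0 if rng.random() < accuracy[path] else -1.0
        value[path] += lr * (reward - value[path])        # running estimate
    return value

values = train()
# The step-by-step strategy ends up with the higher learned value,
# so a policy that follows the values would prefer it.
```

The point of the sketch is the feedback signal: nothing tells the learner *how* to solve problems, only which outcomes were right, yet the more careful path wins out.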
Andrea Gregor de Varda, a postdoctoral fellow at MIT’s K. Lisa Yang ICoN Center, along with Fedorenko, conducted an experiment to test this theory. They observed not only accuracy but also how much effort each problem required. For humans, this involved tracking response times to the millisecond. For models, they looked at how many tokens, or internal pieces of language, the model generates while working through a problem. As it turns out, the harder the problem, the more tokens the model generates – much like how we humans metaphorically ‘talk to ourselves’ when encountering a tricky problem.
Seven types of problems, including arithmetic and intuitive reasoning, were posed to both humans and the reasoning model. As expected, harder problems took longer for humans to solve and also required the reasoning model to produce more tokens. However, while these findings are compelling, de Varda cautions against jumping to the conclusion that these models fully mirror human cognition. He highlights that they still function primarily in an abstract, non-linguistic space, so there is more to learn about how closely they model human thought processes.
Many questions remain unanswered. For instance, do these models represent information like our brains do? Can they tackle issues requiring real-world knowledge beyond their training data? As researchers explore these frontiers, one intriguing suggestion is clear: machines might slowly but surely be evolving closer to human-like cognition, not because they were explicitly programmed to, but possibly because it’s simply the most effective way to think.
Read more about the intricate relationship between human and machine cognition in the full article at MIT News.