Artificial intelligence (AI) has become a quiet but effective assistant for many professionals. From a scientist brainstorming a research idea to a CEO looking to optimize human resources and finance, people across fields are turning to AI agents. These agents are semi-autonomous software systems that use large language models (LLMs) to solve problems and complete tasks quickly.
LLMs become more powerful when paired with AI agents, which add adaptability and efficiency. One widely recognized application is automating the translation of legacy codebases into modern programming languages. For instance, a software company might use an LLM to translate a codebase one file at a time, testing each file as it goes. However, the process becomes laborious and time-consuming when the LLM makes errors that must be fixed manually.
This problem led researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Asari AI to develop EnCompass, a framework that lets AI agents automatically backtrack and retry when an LLM makes a mistake. It also spares programmers from writing lengthy error-handling code by hand.
EnCompass stands out for its ability to clone the program’s runtime, which allows multiple solution attempts to run in parallel. In other words, it explores many possible paths rather than just one in order to find the best result. With EnCompass, developers mark particular operations whose results can vary, such as LLM calls. These markers, called ‘branchpoints’, let the program explore multiple scenarios, much as in a choose-your-own-adventure story, and settle on the best outcome.
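The branching idea can be illustrated with a small Python sketch. This is not EnCompass’s API: `fake_llm`, `passes`, and `explore_all_branches` are hypothetical names invented for illustration, and the "LLM" is a hard-coded stand-in that returns a fixed list of candidate outputs. The sketch treats the call to `fake_llm` as a point where results can vary, tries each candidate, and keeps the first path that passes its test.

```python
# Toy illustration only: every name here is hypothetical, not part of EnCompass.

def fake_llm(prompt):
    """Stand-in for an LLM call: returns several candidate outputs for a prompt."""
    candidates = {
        "translate add": [
            "def add(a, b): return a - b",  # a plausible but wrong translation
            "def add(a, b): return a + b",  # the correct translation
        ],
    }
    return candidates[prompt]


def passes(code, test):
    """Return True if `code` defines functions that satisfy `test`."""
    namespace = {}
    try:
        exec(code, namespace)   # define the candidate function
        exec(test, namespace)   # run the assertion against it
        return True
    except Exception:
        return False


def explore_all_branches(prompt, test):
    """Treat the LLM call as a branch point: try each candidate in turn
    and return the first one that passes, or None if all of them fail."""
    for candidate in fake_llm(prompt):
        if passes(candidate, test):
            return candidate
    return None


best = explore_all_branches("translate add", "assert add(2, 3) == 5")
print(best)
```

Here the wrong candidate is discarded automatically and the search moves on to the next one, which is the "backtrack and retry" behavior described above, reduced to a single branch point and an exhaustive loop.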
Users can also select or define a strategy for navigating these branches. EnCompass supports several built-in search strategies, such as Monte Carlo tree search and beam search. Alternatively, users can implement custom strategies tailored to their tasks.
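For readers unfamiliar with beam search, here is a minimal generic sketch in Python. It is a textbook-style version written for this article, not EnCompass’s implementation; the `expand` and `score` functions and the digit-string toy problem are assumptions chosen only to keep the example small. At each level the search expands every state in the beam and keeps only the `beam_width` highest-scoring children.

```python
import heapq


def beam_search(start, expand, score, beam_width, depth):
    """Generic beam search: at each level, expand every state in the beam
    and keep only the `beam_width` highest-scoring children."""
    beam = [start]
    for _ in range(depth):
        children = [child for state in beam for child in expand(state)]
        if not children:
            break
        beam = heapq.nlargest(beam_width, children, key=score)
    return max(beam, key=score)


# Hypothetical toy problem: build a 3-character digit string with the
# largest digit sum, choosing one of "1", "2", "3" at each step.
def expand(state):
    return [state + digit for digit in "123"]


def score(state):
    return sum(int(ch) for ch in state)


best = beam_search("", expand, score, beam_width=2, depth=3)
print(best)
```

Because the problem (`expand`, `score`) is passed in separately from the search routine, the same problem could be handed to a different strategy without being rewritten. Setting `beam_width=1`, for example, turns this into a greedy search.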
The reported benefits are substantial. In a test where an AI agent used EnCompass to translate Java code repositories into Python, the amount of code needed to implement search fell by 82%, a saving of 348 lines. Accuracy also improved by 15-40% across five repositories when a two-level beam search strategy was used.
“With EnCompass, we’ve detached the search strategy from the AI agent’s underlying workflow. This allows programmers to experiment freely with different search strategies to discover the most effective one,” said MIT EECS PhD student and CSAIL researcher Zhening Li ’25, MEng ’25.
EnCompass has shown promising results for agents implemented in Python that call LLMs. It can support agents that manage extensive code libraries, design scientific experiments, and even draft blueprints for complex hardware such as rockets. However, EnCompass currently works best with agents that follow a specific programmatic workflow; it is less effective for agents governed entirely by LLMs.
In the coming years, the EnCompass team plans to extend the framework to more general search frameworks. They aim to test the system on highly complex tasks and to study how it could support collaboration between AI agents and humans, for example in co-designing hardware or translating large codebases.
EnCompass thus marks an important step for AI agents and the search-based techniques that are reshaping software development workflows. By cleanly separating an agent’s logic from its search strategy, it lays a foundation for building systematic, reliable, and high-performing AI systems.
For a more detailed account, you can read the original article on MIT News.