After a mine collapse, time is of the essence. Search-and-rescue missions become extremely challenging: a robot navigating the hazardous, partially caved-in shaft must rapidly map its surroundings and ascertain its own position. When the robot can rely only on its onboard cameras for navigation, that task becomes arduous, to say the least.
Even with recent advances in machine learning enabling robots to perform such tasks from visual data, limitations persist. Current models can only process a limited number of images at once, which becomes a significant roadblock when a robot must review and analyze thousands of images in real time.
Enter the researchers at MIT, who have come up with an AI-driven system that marries the strengths of contemporary deep learning with traditional computer vision techniques. The method can process a virtually unlimited number of images and quickly generate detailed 3D maps of complex environments, such as a crowded office hallway.
Instead of digesting a gigantic scene in one fell swoop, the system divides the environment into smaller subsections, or “submaps”. These are subsequently aligned and merged into a complete 3D reconstruction, all while tracking the robot’s position in real time. The beauty of the method lies in its simplicity, speed, and scalability, making it apt for applications ranging from search-and-rescue missions to industrial logistics and extended-reality experiences.
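The divide-and-merge idea can be sketched in a few lines of Python. This is an illustrative toy, not the team’s code: the chunk size and overlap are assumptions, and real submaps would hold 3D points reconstructed by a learned model rather than raw frame indices.

```python
def split_into_submaps(frames, chunk_size=10, overlap=2):
    """Divide a long image stream into small overlapping chunks ("submaps").

    Each chunk is kept small enough for a learned reconstruction model
    that can only handle a few dozen images at a time. The overlap
    between consecutive chunks provides shared content for aligning
    and merging the submaps later.
    """
    step = chunk_size - overlap
    return [frames[i:i + chunk_size] for i in range(0, len(frames), step)]


# Stand-ins for 25 camera images from the robot's video stream.
frames = list(range(25))
submaps = split_into_submaps(frames)
# Each submap can now be reconstructed independently, then the pieces
# are aligned via their overlapping frames and merged into one map.
```

Processing each small chunk independently is what keeps the pipeline fast and scalable: no single step ever has to hold the whole scene in memory.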
The essence of this breakthrough lies in rethinking a long-standing robotics problem: simultaneous localization and mapping (SLAM). Traditional SLAM algorithms struggle in visually complex environments or rely heavily on pre-calibrated hardware. Machine-learning models offer an alternative but are restricted by the amount of data they can handle at once, usually around 60 images.
MIT’s system addresses this impediment by focusing on smaller fragments of the environment. Although each submap is created from just a few snapshots, the pieces are quickly stitched together into an overarching, cohesive map, accelerating the process and enabling the robot to tackle larger and more varied terrain.
At the outset, aligning the submaps seemed straightforward, but the team soon discovered that imperfections in the machine-learning models can leave each submap slightly distorted. Traditional alignment methods, which use only rotation and translation, failed because the submaps themselves were deformed. So the team revisited decades-old computer vision research, fusing those insights with modern AI.
The result was a more flexible mathematical framework that accounts for submap distortions. This enabled the system to accurately align even warped submaps, producing a reliable 3D reconstruction and the precise camera-position estimates critical for robotic navigation. In testing, the system outperformed existing methods in both speed and precision, reconstructing intricate environments from short smartphone videos with an error of less than five centimeters.
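One way to picture a “more flexible” alignment is to replace a rigid rotation-plus-translation fit with a least-squares affine fit, whose extra degrees of freedom (scale, shear) can absorb mild warps in a submap. The sketch below is a hypothetical stand-in for illustration, not the paper’s actual formulation:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map taking matched 3D points src -> dst.

    A rigid fit only allows rotation and translation, so it cannot
    correct a sheared or stretched submap; an affine fit can.
    """
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)   # (4, 3) affine matrix
    return M

def apply_affine(M, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

# Toy example: a submap that is shifted and slightly sheared
# relative to its neighbour, mimicking model-induced distortion.
rng = np.random.default_rng(0)
dst = rng.standard_normal((50, 3))                 # reference submap points
shear = np.eye(3) + 0.05 * rng.standard_normal((3, 3))
src = dst @ shear.T + np.array([0.3, -0.1, 0.2])   # distorted copy

M = fit_affine(src, dst)
err = np.abs(apply_affine(M, src) - dst).max()     # near-zero residual
```

Because the distortion here is itself affine, the least-squares fit recovers it almost exactly, whereas a rotation-plus-translation fit would leave a visible residual.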
Looking ahead, the team envisions refining the method for even more complex environments and deploying it on real robots in the field. Their work ultimately showcases the value of combining foundational knowledge with cutting-edge AI to tackle real-world challenges. As MIT Associate Professor Luca Carlone aptly puts it, “Knowing about traditional geometry pays off. If you understand deeply what’s going on in the model, you can get much better results and make things much more scalable.”
This research, supported by the U.S. National Science Foundation, the Office of Naval Research, and the National Research Foundation of Korea, is set to be presented at the Conference on Neural Information Processing Systems.