
Teaching Robots to Understand Their Bodies—With Just a Camera

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have engineered a new way to control robots that draws its inspiration from human adaptive learning. Unlike traditional robotic systems, which rely on arrays of complex sensors and movement models carefully constructed by hand, the new system, called Neural Jacobian Fields (NJF), lets robots learn their own bodily movements and responses to commands purely through observation.

In the heart of the CSAIL lab, a soft robotic hand nimbly grasps a tiny object. Remarkably, the hand carries no sensors; its movements, observed by a single camera, are directed by visual data alone. This technology departs from the rigid programming approach that has been the norm and steps into the realm of teaching and learning. The robots become students: observing, learning, and adapting their movements much as humans do.

“This work points to a shift from programming robots to teaching robots,” says Sizhe Lester Li, the MIT PhD student who led the research. “Instead of coding every movement, we can show a robot a task and let it figure out how to accomplish it.”

This innovation flips the traditional model, which relies on rigid design and sensor-packed hardware to ensure control. NJF allows for unprecedented freedom, enabling robots, whether soft, irregular, or sensorless, to build their own internal understanding of movement simply by watching and adapting. This approach opens up new possibilities for engineers to create bio-inspired machines without worrying about later control or modeling complications.

“It’s like how you would learn to control your limbs. You observe, wiggle, and adapt,” explains Li. “That’s the same principle our system employs.”

The team put NJF to the test on various robotic forms: a pneumatic soft hand, a rigid Allegro hand, a 3D-printed arm, and a rotating platform with no sensors. In each case, the system used visual data from random movements to learn the robot’s geometry and its response to commands. Once trained, the robot needs only a single monocular camera to operate in real time, running at about 12 frames per second, considerably faster than simulation-based alternatives.
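To give a feel for how camera-only, closed-loop control can work, here is a deliberately simplified sketch. It assumes a small Jacobian matrix has already been learned that maps actuator commands to the image-space motion of a tracked point; each "frame," the controller solves for a command that nudges the point toward a target. This is an illustrative toy, not NJF's actual implementation, and the matrix values are invented for the example.

```python
import numpy as np

# Hypothetical learned Jacobian: maps 3 actuator commands to the
# 2D image-space motion of a tracked fingertip (values invented).
J = np.array([[0.8, -0.2, 0.1],
              [0.1,  0.5, -0.3]])

x = np.array([0.0, 0.0])        # current fingertip position in the image
target = np.array([0.4, -0.1])  # where we want the fingertip to appear

# Closed-loop visual servoing: each iteration plays the role of one
# camera frame. Solve for a command that reduces the image-space error,
# apply it, and observe the resulting motion.
for _ in range(20):
    error = target - x
    u = np.linalg.pinv(J) @ (0.5 * error)  # damped step toward the target
    x = x + J @ u                          # simulated observed motion

print(np.round(x, 3))  # fingertip has converged to the target
```

The damping factor (0.5) keeps each step conservative, which is the usual practice when the Jacobian is only an estimate of the true sensitivity.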

Embedded in NJF is a neural network that captures two critical aspects: the robot’s 3D shape and its response to control signals. The system learns by observing the robot performing random actions, bypassing the need for human annotation or pre-existing models.
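The core idea of learning how commands map to visible motion from random actions can be illustrated with a toy linear version: drive the system with random commands, record how a tracked point moves in the camera image, and fit the command-to-motion map by least squares. This is a hypothetical, linearized sketch; the actual NJF system learns a full neural representation of shape and motion, not a single matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "true" sensitivity of a toy robot: maps 3 actuator commands
# to the 2D image-space displacement of one tracked point.
J_true = np.array([[0.8, -0.2, 0.1],
                   [0.1,  0.5, -0.3]])

# Self-supervised data collection: apply random commands and observe
# the resulting point displacements, with a little camera noise.
U = rng.normal(size=(200, 3))                          # random commands
dX = U @ J_true.T + 0.01 * rng.normal(size=(200, 2))   # observed motion

# Fit the Jacobian by least squares: find J minimizing ||dX - U @ J.T||.
J_fit, *_ = np.linalg.lstsq(U, dX, rcond=None)
J_hat = J_fit.T

print(np.round(J_hat, 2))  # close to J_true
```

No human ever labels the data here; the random commands themselves generate the supervision, which mirrors the self-learning framing in the article.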

“Vision alone can provide the cues needed for localization and control,” says Daniela Rus, CSAIL director and co-author of the study. “This opens the door to robots that can function in high-chaos, unstructured environments—without the need for costly infrastructure.”

Current challenges include the need to train each robot individually using multiple cameras, and the lack of tactile sensing. The team, with support from the Solomon Buchsbaum Research Fund, the MIT Presidential Fellowship, the National Science Foundation, and the Gwangju Institute of Science and Technology, is working to address these limitations.

“Similar to how humans develop an intuitive sense of their bodies’ movements,” says Li, “NJF instills robots with that kind of embodied understanding, laying the foundation for flexible, adaptive control in the real world.”

More detailed information about this research can be found on MIT News: https://news.mit.edu/2025/vision-based-system-teaches-machines-understand-their-bodies-0724

Max Krawiec
