{"id":8194,"date":"2026-03-11T05:00:00","date_gmt":"2026-03-11T04:00:00","guid":{"rendered":"https:\/\/aitrendscenter.eu\/revolutionizing-long-term-visual-task-planning-with-ai-at-mit\/"},"modified":"2026-03-11T05:00:00","modified_gmt":"2026-03-11T04:00:00","slug":"revolutionizing-long-term-visual-task-planning-with-ai-at-mit","status":"publish","type":"post","link":"https:\/\/aitrendscenter.eu\/pl\/revolutionizing-long-term-visual-task-planning-with-ai-at-mit\/","title":{"rendered":"Rewolucja w d\u0142ugoterminowym planowaniu zada\u0144 wizualnych dzi\u0119ki sztucznej inteligencji na MIT"},"content":{"rendered":"<p>MIT researchers have brought forth a revolutionary AI-based technique that significantly improves long-term visual task planning, like robot navigation. This ground-breaking method is reportedly twice as effective as some of the existing techniques \u2014 a big accomplishment in the world of AI-driven innovation.<\/p>\n<p>This advancement revolves around a vision-language model, a system designed to understand visual scenarios and map necessary actions to fulfill a given objective. But what makes it stand apart? It&#8217;s its ability to generate ready-to-use files for traditional planning software, basically automatically doing half of the job for you. Plus, with a success rate of around 70% \u2014 significantly outperforming the 30% rate of standard methods \u2014 this method is nothing short of a game-changer.<\/p>\n<h5>Adapting to New Challenges and Collaborative Efforts<\/h5>\n<p>This system&#8217;s distinct feature, as asserted by Yilun Hao, the lead author of the paper and a graduate student at MIT, is its ability to tackle problems it has never seen before. 
Such adaptability is vital in real-world scenarios, where unpredictability is the name of the game.<\/p>\n<p>But Hao didn&#8217;t achieve this feat alone \u2014 he joined forces with Yongchao Chen (MIT Laboratory for Information and Decision Systems, or LIDS), Yang Zhang (MIT-IBM Watson AI Lab) and Chuchu Fan (Associate Professor in AeroAstro and a principal investigator in LIDS). Their collective efforts bore fruit remarkable enough to be showcased at the International Conference on Learning Representations.<\/p>\n<h5>Addressing Visual Tasks and Creating Reliable Solutions<\/h5>\n<p>The team used a vision-language model (VLM) to bridge the gap between complex reasoning, planning and visual inputs, a move that tests the power of AI on real-life challenges such as autonomous driving or robotic assembly. However, since VLMs often stumble when reasoning about spatial relationships between objects in a scene or working through multiple steps, the team paired them with formal planners to create VLM-guided formal planning (VLMFP).<\/p>\n<p>VLMFP comprises two specialised VLMs that transform visual planning problems into files ready for traditional planning software. The system starts with a small model, SimVLM, which describes visual scenarios in natural language. A larger model, GenVLM, then uses SimVLM&#8217;s descriptions to generate initial files in the Planning Domain Definition Language (PDDL). These files are fed into a classical PDDL solver, which produces a step-by-step plan.<\/p>\n<h5>Future Prospects<\/h5>\n<p>VLMFP has delivered impressive results, achieving about 60% success on six 2D planning tasks and over 80% success on two 3D tasks: multirobot collaboration and robotic assembly. 
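The two-stage pipeline described above (SimVLM describes the scene, GenVLM writes PDDL files, a classical solver produces the plan) can be sketched in miniature. Everything below is an illustrative assumption, not the authors' code: SimVLM and GenVLM are learned models and the solver is an external planner, so each stage is replaced by a hand-written stand-in; only the data flow between stages mirrors the system.

```python
# Illustrative sketch of the VLMFP two-stage pipeline. SimVLM, GenVLM,
# and the PDDL solver are hand-written stand-ins; only the hand-off
# between the stages mirrors the system described in the article.
import re
from collections import deque

def sim_vlm_describe(scene: dict) -> str:
    """Stand-in for SimVLM: describe the visual scene in natural language."""
    return (f"A robot stands at {scene['robot']} and must reach "
            f"{scene['goal']}; passable moves are {scene['edges']}.")

def gen_vlm_write_pddl(description: str, scene: dict) -> tuple[str, str]:
    """Stand-in for GenVLM: emit PDDL domain/problem files for the planner."""
    domain = ("(define (domain nav)\n"
              "  (:predicates (at ?c) (edge ?a ?b))\n"
              "  (:action move :parameters (?a ?b)\n"
              "    :precondition (and (at ?a) (edge ?a ?b))\n"
              "    :effect (and (at ?b) (not (at ?a)))))")
    inits = " ".join(f"(edge {a} {b})" for a, b in scene["edges"])
    problem = (f"(define (problem go) (:domain nav)\n"
               f"  (:init (at {scene['robot']}) {inits})\n"
               f"  (:goal (at {scene['goal']})))")
    return domain, problem

def classical_solver(domain: str, problem: str) -> list[str]:
    """Stand-in for a classical PDDL solver: breadth-first search over
    the move graph recovered from the problem file."""
    start = re.search(r"\(:init \(at (\w+)", problem).group(1)
    goal = re.search(r"\(:goal \(at (\w+)", problem).group(1)
    edges = re.findall(r"\(edge (\w+) (\w+)\)", problem)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        cell, plan = frontier.popleft()
        if cell == goal:
            return plan
        for a, b in edges:
            if a == cell and b not in seen:
                seen.add(b)
                frontier.append((b, plan + [f"(move {a} {b})"]))
    return []  # no valid plan found

scene = {"robot": "c0", "goal": "c2", "edges": [("c0", "c1"), ("c1", "c2")]}
description = sim_vlm_describe(scene)                     # stage 1: SimVLM
domain, problem = gen_vlm_write_pddl(description, scene)  # stage 2: GenVLM
plan = classical_solver(domain, problem)                  # classical planner
print(plan)  # -> ['(move c0 c1)', '(move c1 c2)']
```

The point of the split is visible even in this toy: the VLM stages only have to produce text (a description, then PDDL), while the guarantees of plan validity come from the symbolic solver at the end.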
It also generated valid plans for more than half of the scenarios it had never encountered before, clearly outperforming traditional methods.<\/p>\n<p>In the future, the team hopes to further refine VLMFP, allowing it to handle even more complex scenarios and to reduce the mistakes VLMs make. Ultimately, they believe generative AI models could evolve into agents capable of tackling still more complicated problems, signifying a great leap in AI-driven problem-solving.<\/p>\n<p>This work was partially supported by the MIT-IBM Watson AI Lab. For more information, you can check out the original news article <a href=\"https:\/\/news.mit.edu\/2026\/better-method-planning-complex-visual-tasks-0311\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>MIT researchers have introduced a revolutionary AI-based technique that significantly improves long-term visual task planning, such as robot navigation. This ground-breaking method is reportedly twice as effective as some existing techniques \u2014 a big accomplishment in the world of AI-driven innovation. This advancement revolves around a vision-language model, a system designed to understand visual scenes and map out the actions needed to fulfill a given objective. What makes it stand apart? Its ability to generate ready-to-use files for traditional planning software, effectively automating half of the job for you. 
Plus, with a success rate of around 70% [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":8195,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,47],"tags":[],"class_list":["post-8194","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation","category-ai-news","post--single"],"_links":{"self":[{"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/posts\/8194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/comments?post=8194"}],"version-history":[{"count":0,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/posts\/8194\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/media\/8195"}],"wp:attachment":[{"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/media?parent=8194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/categories?post=8194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aitrendscenter.eu\/pl\/wp-json\/wp\/v2\/tags?post=8194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}