Picture this for a second: your adorable French Bulldog, Bowser, is at the local dog park. Amid the blur of dogs darting about, your eyes pick out Bowser instantly. But what if you wanted an AI to do the same while you're stuck at the office? That's where things get complicated.
Today's vision-language models (VLMs), like the popular GPT-5, are excellent at singling out general objects: identifying a 'dog' or a 'tree' is a breeze. The challenge arises when these models are asked to pinpoint a specific, personalized object. Ask an AI to recognize Bowser the Frenchie in a line-up of French Bulldogs, and it will probably fumble. That is a real obstacle for anyone hoping to use AI for tasks such as pet monitoring, object tracking, or assistive technology.
To bridge this gap, researchers from MIT and the MIT-IBM Watson AI Lab devised a new training method that helps AI models recognize personalized objects more reliably across diverse scenes. They retrained VLMs on specially curated video-tracking data, which follows the same object across a series of frames. This setup essentially forces the model to rely on contextual cues rather than memorized knowledge. The model is given a handful of example images of a specific object, for instance a pet or a backpack, and the retrained system then becomes far better at locating that object in new images, while retaining the model's broader capabilities, as sketched below.
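To make the idea concrete, here is a minimal sketch of how one might assemble a few-shot localization sample from video-tracking data of the kind described above. The frame paths, the pseudo-name "Rex", and the `build_sample` helper are illustrative assumptions for this sketch, not the authors' actual pipeline.

```python
# Sketch: packing a few tracked frames of one object into an in-context
# localization sample. All file names and field names are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrackedFrame:
    image_path: str                  # path to one video frame
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) of the tracked object

def build_sample(context: List[TrackedFrame], query: TrackedFrame, pseudo_name: str) -> dict:
    """Combine context frames (object shown with its box) and one held-out
    query frame (box withheld) into a single localization sample."""
    return {
        # The model sees these frames with their boxes as "this is <pseudo_name>".
        "context": [
            {"image": f.image_path, "box": f.box, "label": pseudo_name}
            for f in context
        ],
        # It must then predict the box for the same object in an unseen frame.
        "query": {"image": query.image_path},
        "target_box": query.box,
        "prompt": f"Locate {pseudo_name} in the last image.",
    }

if __name__ == "__main__":
    frames = [
        TrackedFrame("clip_000/frame_01.jpg", (34, 50, 120, 160)),
        TrackedFrame("clip_000/frame_07.jpg", (60, 48, 150, 158)),
        TrackedFrame("clip_000/frame_15.jpg", (90, 55, 180, 170)),
    ]
    sample = build_sample(frames[:-1], frames[-1], pseudo_name="Rex")
    print(sample["prompt"])
```

Because every context frame shows the very same object instance, the only way to solve the query is to compare it against the examples, which is the contextual behavior the training data is meant to encourage.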
This advance could prove transformative in several areas. From AI systems that track specific animals for environmental studies, to assistive technologies that help visually impaired users find personal items in their homes, the possibilities are wide-ranging. The technique could also strengthen robotics and augmented-reality tools that need to identify specific objects quickly and accurately in a changing environment.
The project is led by Jehanzeb Mirza, an MIT postdoc and lead author of the research paper. Alongside Mirza, a team of researchers from MIT, the Weizmann Institute of Science, and IBM played a key role in the work. Their findings will be presented at the upcoming International Conference on Computer Vision.
According to Mirza, the ultimate goal is for these models "to learn from context, just like humans do". If an AI model can achieve this, then rather than retraining it for each new task, it could be fed a few examples and infer how to perform the task from that context alone. In his view, that would be an unrivaled ability. This vision isn't without its challenges, however. The research community has yet to find a definitive answer to why VLMs struggle where humans don't. The problem could lie in how the visual and language components are integrated, with some visual information getting lost along the way, but the conclusion isn't clear-cut yet.
The team's work has already produced impressive results. With their newly curated dataset, they observed an average improvement of 12 percent in personalized object localization, and when pseudo-names were used instead of the actual object names, performance rose by up to 21 percent. The gains also grew with model size. Moving forward, the team plans to dig deeper into the learning inconsistencies between VLMs and LLMs, and to investigate new strategies for improving VLM performance without constant retraining.
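The pseudo-naming result is easy to illustrate. Below is a small, self-contained sketch of the idea: swapping the real category word for an arbitrary name so the model cannot fall back on memorized class knowledge. The name pool and prompt template are assumptions made for illustration, not the exact wording used in the paper.

```python
# Sketch: replacing the category word with a category-free pseudo-name
# in a localization prompt. Names and template are hypothetical.
import random

PSEUDO_NAMES = ["Blip", "Koro", "Nima", "Tass"]  # arbitrary tokens with no class meaning

def pseudo_name_prompt(category: str, rng: random.Random) -> tuple[str, str]:
    """Return (pseudo_name, prompt); the real category word never appears."""
    name = rng.choice(PSEUDO_NAMES)
    prompt = (
        f"The object shown in the example images is called {name}. "
        f"Find {name} in the new image and return its bounding box."
    )
    assert category.lower() not in prompt.lower()  # the class word stays hidden
    return name, prompt

rng = random.Random(0)
print(pseudo_name_prompt("dog", rng)[1])
```

Hiding the class word removes the shortcut of answering "where is a dog in general" and leaves the model no option but to match against the specific instance shown in the examples.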
Mirza and his team see enormous potential for fast, instance-specific adaptation in practical workflows, and they are convinced that their data-centric approach can support the broader adoption of vision-language foundation models. Working with Mirza on this groundbreaking effort, which was funded by the MIT-IBM Watson AI Lab, were Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, Assaf Arbelle, and Shimon Ullman.
Further details can be found in the original article.