
A new way to edit and generate images without conventional AI generators

Reimagining How AI Creates and Edits Images

We’re living in an age when artificial intelligence can whip up a bizarre fantasy or a hyper-realistic photo from a few words. All that magic rests on giant neural networks quietly learning from millions—sometimes billions—of photos and written descriptions. But the next breakthrough in AI art might be sleight of hand: what if you could create and transform images with AI, but without the heavy machinery of a traditional image generator?

That exact thought inspired a group of researchers from MIT and Facebook AI Research. Their latest paper, unveiled at the 2025 International Conference on Machine Learning (ICML), shows how a surprisingly simple technique could dramatically streamline how AI conjures up and edits pictures.

From Class Project to Groundbreaking Discovery

The journey started as a class project for MIT grad student Lukas Lao Beyer, mentored by Professor Kaiming He. What began modestly soon attracted collaborators—Tianhong Li, Xinlei Chen, and Sertac Karaman—and took on a much larger scope.

Beyer’s inspiration came from new research out of the Technical University of Munich and ByteDance. That team created a “1D tokenizer”: an unusual AI model that distills a 256×256 image down to just 32 tokens, each a 12-bit binary value. Imagine it as an ultra-condensed “language” with a vocabulary of 4,096 words (2^12 possible values per token), only this language describes pictures instead of ideas.
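The compression at work here is easy to check with back-of-the-envelope arithmetic (illustrative only; the tokenizer’s actual architecture is described in the TUM/ByteDance paper):

```python
# A 256x256 RGB image at 8 bits per channel, versus 32 tokens of 12 bits each.
raw_bits = 256 * 256 * 3 * 8      # 1,572,864 bits for the raw pixels
token_bits = 32 * 12              # 384 bits for the token sequence
vocab_size = 2 ** 12              # 12 bits per token -> 4,096 possible "words"

print(vocab_size)                 # 4096
print(raw_bits // token_bits)     # the raw image holds 4096x more bits
```

So the token sequence is roughly four thousand times smaller than the pixel data it stands in for, which is why each individual token carries so much visual meaning.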

Fascinated by how much information these tokens carried, Beyer began experimenting: by swapping out tokens in an image, he watched the picture morph—resolution jumped, colors dulled or popped, backgrounds became crisp or blurred, and sometimes a bird’s pose simply changed direction.

No Generator Needed: Editing at the Core

Here’s where it gets wild: rather than redrawing pixels directly, Beyer and the team started editing specific tokens to get the changes they wanted in an image. It turns out this kind of swap isn’t just for tinkering; it offers a whole new way to generate images, too. By stacking a 1D tokenizer with a decoder that reconstructs the image, plus a guiding neural network called CLIP (which keeps the AI’s work aligned with a text prompt), the team could “nudge” bundles of tokens until a picture of, say, a red panda began looking like a tiger—or even create something entirely new by starting from random tokens and gradually refining them to fit a prompt.
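The “start random, refine to fit” loop can be sketched in miniature. Everything below is a toy stand-in: the names `toy_score` and `refine` are mine, the scorer is a numeric proxy rather than CLIP judging decoded images against a prompt, and a greedy hill-climb over single-token swaps stands in for the team’s CLIP-guided optimization. It only illustrates the shape of the idea: hold 32 discrete tokens and keep any swap that improves a score.

```python
import random

VOCAB = 4096   # 12-bit codebook, as in the 1D tokenizer
SEQ_LEN = 32   # 32 tokens per image

def toy_score(tokens, target):
    # Stand-in for "agreement with the prompt": higher when tokens sit
    # numerically closer to a target sequence. The real system decodes
    # the tokens into an image and scores it with CLIP against text.
    return -sum(abs(t - g) for t, g in zip(tokens, target))

def refine(tokens, target, steps=2000, seed=0):
    # Greedy hill-climb in token space: propose single-token swaps and
    # keep only the swaps that improve the score.
    rng = random.Random(seed)
    tokens = list(tokens)
    best = toy_score(tokens, target)
    for _ in range(steps):
        i = rng.randrange(SEQ_LEN)                          # pick a position
        trial = tokens[:i] + [rng.randrange(VOCAB)] + tokens[i + 1:]
        s = toy_score(trial, target)
        if s > best:                                        # keep helpful swaps
            tokens, best = trial, s
    return tokens

rng = random.Random(1)
target = [rng.randrange(VOCAB) for _ in range(SEQ_LEN)]     # "the prompt"
start = [rng.randrange(VOCAB) for _ in range(SEQ_LEN)]      # random init
refined = refine(start, target)
print(toy_score(start, target), "->", toy_score(refined, target))
```

Because only single tokens ever change, the same loop doubles as an editor: start from a real image’s tokens instead of random ones, and the refinement nudges the picture toward the new target rather than building it from scratch.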

What’s astonishing is how much this sidesteps the need for old-school image generators. Traditional generators are hulking beasts—slow to train, power-hungry to run. This more minimalist method avoids much of that computational slog, making advanced AI art potentially cheaper and more widespread. It can also perform tricks like inpainting—filling in missing image parts—using the same tokenizer and decoder setup. Sometimes, the boldest advances come from reusing tools in unexpected ways.
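Inpainting falls out of the same token-space idea almost for free: freeze the tokens covering the intact part of the image and re-optimize only the tokens at the missing positions. The sketch below reuses the same toy scorer as a stand-in for the real decoder-plus-CLIP scoring; the function name `inpaint` and the mask layout are my own illustrative choices.

```python
import random

VOCAB, SEQ_LEN = 4096, 32

def toy_score(tokens, target):
    # Numeric stand-in for scoring a decoded image against a prompt.
    return -sum(abs(t - g) for t, g in zip(tokens, target))

def inpaint(tokens, target, masked, steps=2000, seed=0):
    # Same greedy swap loop, but proposals only touch masked positions,
    # so the known part of the image is guaranteed to stay fixed.
    rng = random.Random(seed)
    tokens, best = list(tokens), toy_score(tokens, target)
    for _ in range(steps):
        i = rng.choice(masked)                              # masked slots only
        trial = tokens[:i] + [rng.randrange(VOCAB)] + tokens[i + 1:]
        s = toy_score(trial, target)
        if s > best:
            tokens, best = trial, s
    return tokens

rng = random.Random(1)
target = [rng.randrange(VOCAB) for _ in range(SEQ_LEN)]
image = list(target)
masked = [4, 5, 6, 7]                  # pretend these four tokens were lost
for i in masked:
    image[i] = rng.randrange(VOCAB)    # corrupt the masked region
restored = inpaint(image, target, masked)
print(toy_score(image, target), "->", toy_score(restored, target))
```

The key design point is the constraint, not the optimizer: by construction, every token outside the mask is untouched, so the “repair” can only happen where the image was missing.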

Experts are paying attention. Saining Xie from NYU called the results “pretty surprising,” while Princeton’s Zhuang Liu pointed out the promising possibility of cutting AI image costs.

And the potential isn’t limited to pictures. Sertac Karaman highlighted how the same idea—compressing actions or driving directions into tokens—could benefit fields like robotics or self-driving cars. Beyer agrees: “The power of that kind of compression could unlock amazing things,” he says.

At its heart, this new approach shows what can happen when you dare to challenge the old ways and make the most of the tools at hand. By thinking differently, the MIT team hasn’t just found a shortcut—they’ve paved a new road for AI creativity.

Read the original article: https://news.mit.edu/2025/new-way-edit-or-generate-images-0721

Max Krawiec
