Words sometimes aren’t enough to get an idea across. A quick sketch, such as a circuit diagram, can convey a complex concept more effectively. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stanford University have now brought artificial intelligence into that creative process with SketchAgent, an AI system that sketches more like a human does.
SketchAgent uses a multimodal language model, a system trained on both text and images. It turns natural language prompts into simple, hand-drawn-style sketches in a matter of seconds. Unlike many AI art tools that emphasize photorealistic images or stylized cartoons, SketchAgent focuses on the process of sketching, mimicking the way humans draw one stroke at a time, which allows more organic, iterative visualizations. It can draw a simple house on its own or work on a more complex doodle collaboratively with a human, taking text-based instructions and sketching each component individually.
In developing SketchAgent, the researchers did not train the AI on huge databases of human sketches. Instead, they taught it a so-called ‘sketching language’ that breaks a drawing down into a sequence of strokes mapped onto a grid, with each stroke numbered and labeled. This representation lets the system work out how to sketch concepts it hasn’t encountered before.
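To make the idea of a grid-based sketching language more concrete, here is a minimal Python sketch of how numbered, labeled strokes on a grid might be represented and rendered. Everything in it is an assumption made for illustration: the `Stroke` class, the 50-cell grid, the cell size, and the SVG output are hypothetical and are not taken from the SketchAgent paper or code.

```python
from dataclasses import dataclass
from html import escape

GRID_SIZE = 50  # assumed grid resolution (cells per side); not from the source
CELL_PX = 8     # assumed pixel width of one grid cell


@dataclass
class Stroke:
    """One numbered, labeled stroke (hypothetical format)."""
    number: int                     # drawing order
    label: str                      # what the stroke depicts, e.g. "roof"
    cells: list[tuple[int, int]]    # (column, row), 1-based, row 1 at the bottom


def cell_to_pixel(col: int, row: int) -> tuple[float, float]:
    """Map a grid cell to the pixel coordinates of its center."""
    if not (1 <= col <= GRID_SIZE and 1 <= row <= GRID_SIZE):
        raise ValueError(f"cell ({col}, {row}) is outside the grid")
    x = (col - 0.5) * CELL_PX
    y = (GRID_SIZE - row + 0.5) * CELL_PX  # SVG's y axis points downward
    return x, y


def strokes_to_svg(strokes: list[Stroke]) -> str:
    """Render strokes in drawing order as labeled SVG polylines."""
    size = GRID_SIZE * CELL_PX
    lines = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for s in sorted(strokes, key=lambda s: s.number):
        pixels = (cell_to_pixel(c, r) for c, r in s.cells)
        points = " ".join(f"{x:g},{y:g}" for x, y in pixels)
        lines.append(
            f'  <polyline id="s{s.number}" points="{points}" fill="none" stroke="black">'
            f"<title>{escape(s.label)}</title></polyline>"
        )
    lines.append("</svg>")
    return "\n".join(lines)


# A simple house described stroke by stroke (illustrative data).
house = [
    Stroke(1, "walls", [(15, 10), (15, 30), (35, 30), (35, 10), (15, 10)]),
    Stroke(2, "roof", [(15, 30), (25, 42), (35, 30)]),
    Stroke(3, "door", [(23, 10), (23, 20), (28, 20), (28, 10)]),
]

if __name__ == "__main__":
    print(strokes_to_svg(house))
```

Because each stroke carries its own number and label, a drawing described this way can be replayed one stroke at a time, and individual parts can be added or removed without touching the rest.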
The team includes Yael Vinker, Tamar Rott Shaham, Alex Zhao, and Antonio Torralba from MIT, and Kristine Zheng and Judith Ellen Fan from Stanford. They are set to present their work at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR).
What makes SketchAgent stand out is that it draws each stroke sequentially, much like a human would, which results in sketches that feel natural and human-like. Other AI models can generate visually engaging images from text, but they often miss the step-by-step creativity involved in sketching. SketchAgent can also draw a wide range of ideas, from butterflies and DNA helices to the iconic Sydney Opera House, because it taps into the broad knowledge of pre-trained language models, even though these models don’t naturally know how to draw.
Another notable feature of SketchAgent is its ability to work collaboratively with humans. Testing showed that the AI’s contributions were vital to the final sketches. For example, when the AI-drawn mast was removed from a sailboat sketch, the drawing became unrecognizable. The researchers also tried different language models to find out which produced the most human-like drawings. Claude 3.5 Sonnet emerged as the top performer, outperforming GPT-4o and Claude 3 Opus at producing recognizable vector-based sketches.
Despite its potential, SketchAgent still has a few kinks to iron out. It currently does well with basic stick figures and doodles but struggles with more complex images such as logos, text, or detailed creatures like unicorns and cows. It also occasionally misinterprets user intentions, for example by producing a two-headed bunny, likely because the AI’s step-by-step process became misaligned with its human collaborator. To address these problems, the research team plans to train SketchAgent on synthetic data from diffusion models and to refine the user interface so that it is more intuitive and responsive during joint sketching sessions.
Nevertheless, SketchAgent points toward a new kind of human-AI communication. By supporting visual communication through sketches, it could open up possibilities for teachers, researchers, and anyone wishing to express their ideas in visual form. Lead author Yael Vinker said, “Many people don’t realize how often they draw in daily life, whether it’s brainstorming or explaining something visually. SketchAgent aims to replicate that process, helping AI become a more effective tool for visual expression.” As AI continues to advance, tools like SketchAgent might change how we interact with machines, moving beyond words to shared, visual creativity.
Source: MIT News