Computer-aided design (CAD) systems are tried-and-true tools used to design many of the physical objects we use every day. But mastering CAD software requires extensive expertise, and many tools involve such a high level of detail that they don’t lend themselves to brainstorming or rapid prototyping.
In an effort to make design faster and more accessible to non-experts, researchers at MIT and elsewhere have developed an AI-powered robotic assembly system that allows people to build physical objects simply by describing them in words.
Their system uses a generative AI model to create a 3D representation of an object’s geometry based on user input. Then, a second generative AI model reasons about the desired object and figures out where different components should go according to the object’s function and geometry.
The system can automatically build objects from a set of premade parts using robotic assembly. It can also iterate on designs based on user feedback.
The researchers used this end-to-end system to create furniture, including chairs and shelves, from two types of premade components. Components can be disassembled and reassembled at will, reducing the amount of waste generated through the manufacturing process.
They evaluated these designs through a user study and found that more than 90 percent of participants preferred the objects created by the AI-powered system over those produced by baseline approaches.
Although this work is an initial demonstration, the framework could be particularly useful for rapid prototyping of complex objects such as aerospace components and architectural objects. In the long term, it could be used to make furniture or other items locally in homes, without the need to ship bulky products from a central facility.
“Sooner or later, we want to be able to communicate and talk to robots and AI systems in the same way we talk to each other to build things together. Our system is the first step toward enabling that future,” says lead author Alex Kyaw, a graduate student in the MIT departments of Electrical Engineering and Computer Science (EECS) and Architecture.
Kyaw is joined on the paper by MIT Architecture graduate student Richa Gupta; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, an EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as others from Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.
Creating a Multicomponent Design
While generative AI models are good at generating 3D representations, known as meshes, from text prompts, most produce a single, undifferentiated representation of an object’s geometry that lacks the component-level details needed for robotic assembly.
Separating these meshes into components is challenging for a model because specifying the components depends on the geometry and functionality of the object and its parts.
The researchers addressed these challenges by using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text. They task the VLM with figuring out how two types of preformed parts, structural components and panel components, should fit together to form an object.
“There are many ways we can place panels on a physical object, but the robot needs to see the geometry and reason about that geometry to make decisions. By acting as both the robot’s eyes and brain, the VLM enables the robot to do this,” Kyaw says.
A user prompts the system with text, perhaps by typing “make me a chair,” and it returns an AI-generated image of a chair to get started.
Then, the VLM reasons about the chair and determines where the panel components should go on top of the structural components, guided by the functionality of several example objects it has seen previously. For example, the model may dictate that the seat and backrest should have panels to create a surface for someone sitting and reclining in the chair.
It outputs this information as text, such as “seat” or “backrest”. Each surface of the chair is then labeled with numbers, and the information is sent back to the VLM.
The VLM then selects the numbered labels corresponding to the surfaces of the 3D mesh that should receive panels, completing the design.
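To make that pipeline concrete, here is a minimal sketch of the prompt-to-panel-assignment loop in Python. Every helper name (generate_mesh, render_labeled_views, query_vlm, parse_panel_labels) is a hypothetical stand-in rather than the authors’ actual code, and the data structures are assumptions for illustration only.

```python
# Minimal sketch of the text-to-assembly pipeline described above.
# All helper functions are hypothetical stand-ins, not the authors' API.

from dataclasses import dataclass, field


@dataclass
class DesignState:
    prompt: str                       # user's text description, e.g. "make me a chair"
    mesh: object = None               # 3D mesh from a text-to-3D generative model
    panel_surfaces: list = field(default_factory=list)  # labels of surfaces chosen for panels


def design_from_text(prompt: str) -> DesignState:
    state = DesignState(prompt=prompt)

    # Step 1: a generative model turns the text prompt into a 3D mesh
    # capturing the object's overall geometry.
    state.mesh = generate_mesh(prompt)                  # hypothetical helper

    # Step 2: each candidate surface of the mesh is numbered, and rendered
    # views showing those numbered labels are prepared for the VLM.
    labeled_views = render_labeled_views(state.mesh)    # hypothetical helper

    # Step 3: the VLM reasons about the object's function (e.g. a chair
    # needs surfaces for sitting and reclining) and returns the label
    # numbers of surfaces that should receive panel components.
    reply = query_vlm(                                  # hypothetical helper
        images=labeled_views,
        question=(
            f"This object was requested as: '{prompt}'. "
            "Which numbered surfaces should receive panels, and why?"
        ),
    )
    state.panel_surfaces = parse_panel_labels(reply)    # hypothetical helper
    return state
```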
Human-AI Co-Design
The user remains in the loop throughout this process and can refine the design by giving the model a new prompt, such as “Use only the panel on the backrest, not the seat.”
“The design space is huge, so we limit it through user feedback. We believe this is the best way to do this because people have different preferences, and it would be impossible to create a perfect model for everyone,” says Kyaw.
“The human-in-the-loop process allows users to steer AI-generated designs and have a sense of ownership over the end result,” says Gupta.
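As a follow-up to the earlier sketch, the refinement step might look something like the snippet below, reusing the same hypothetical helpers; the feedback string and prompt wording are illustrative assumptions, not the authors’ implementation.

```python
# Sketch of the human-in-the-loop refinement step, reusing the hypothetical
# helpers from the previous snippet.

def refine_design(state: "DesignState", feedback: str) -> "DesignState":
    # The user's follow-up prompt (e.g. "Use only the panel on the backrest,
    # not the seat") is sent back to the VLM along with the labeled views,
    # and the panel assignment is recomputed.
    labeled_views = render_labeled_views(state.mesh)    # hypothetical helper
    reply = query_vlm(                                  # hypothetical helper
        images=labeled_views,
        question=(
            f"Current panel surfaces: {state.panel_surfaces}. "
            f"The user now says: '{feedback}'. "
            "Return the updated list of numbered surfaces that should receive panels."
        ),
    )
    state.panel_surfaces = parse_panel_labels(reply)    # hypothetical helper
    return state
```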
Once the 3D mesh is finalized, a robotic assembly system builds the object using preformed parts. These reusable parts can be taken apart and reassembled in different configurations.
The researchers compared the results of their method to a baseline algorithm that places panels on all upward-facing horizontal surfaces, and another that places panels at random. In a user study, more than 90 percent of participants preferred the designs created by the researchers’ system.
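For contrast, the two baselines described above could be sketched roughly as follows. The Surface attributes (a numeric label and a unit normal), the angle threshold, and the number of random picks are assumptions for illustration, not details from the paper.

```python
# Rough sketches of the two baseline placement strategies the VLM-driven
# approach was compared against: (1) panel every upward-facing horizontal
# surface, (2) panel surfaces at random.

import random

UP = (0.0, 0.0, 1.0)  # world up direction


def upward_horizontal_baseline(surfaces, cos_threshold=0.95):
    # A surface counts as "upward-facing horizontal" when its unit normal
    # points nearly straight up. Note this heuristic would panel a seat
    # but never a backrest, which is part of why it falls short.
    chosen = []
    for s in surfaces:
        nx, ny, nz = s.normal  # assumed unit normal per surface
        if nx * UP[0] + ny * UP[1] + nz * UP[2] >= cos_threshold:
            chosen.append(s.label)
    return chosen


def random_baseline(surfaces, k=2, seed=None):
    # Pick k surfaces at random, ignoring geometry and function entirely.
    rng = random.Random(seed)
    surfaces = list(surfaces)
    return [s.label for s in rng.sample(surfaces, k=min(k, len(surfaces)))]
```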
They also asked the VLM to explain why it decided to place panels in those areas.
“We learned that the vision-language model is able to understand, to some extent, the functional aspects of the chair, like sitting and reclining, and to explain why it’s placing panels on the seat and backrest. It’s not just randomly making these assignments,” Kyaw says.
In the future, the researchers want to improve the system to handle more complex and nuanced user prompts, such as a table made of glass and metal. They also want to incorporate additional premade components, such as gears, hinges, or other moving parts, so the objects can have greater functionality.
Davis says, “Our hope is to significantly lower the barrier of access to design tools. We’ve shown that we can use generative AI and robotics to turn ideas into physical objects in a fast, accessible, and sustainable way.”