Generative AI and robotics are taking us closer to the day when we can ask for an item and have it made in a matter of minutes. In fact, MIT researchers have developed a speech-to-reality system, an AI-powered workflow that lets a user speak a request to a robotic arm and “bring objects into existence,” creating things like furniture in as little as five minutes.
With the speech-to-reality system, a robotic arm mounted on a table receives spoken input from a human, such as “I want a simple stool,” and then builds the object from modular components. To date, the researchers have used the system to create stools, shelves, chairs, a small table, and even a decorative dog statue.
“We are combining natural language processing, 3D generative AI, and robotic assembly,” says Alexander Htet Kyaw, an MIT graduate student and Fellow of the Morningside Academy for Design (MAD). “These are fast-moving areas of research that have not been brought together before in a way that you can actually create physical objects from just a simple speech signal.”
Video: Speech to Reality: On-Demand Production Using 3D Generative AI and Discrete Robotic Assembly
The idea began when Kyaw, a graduate student in the departments of Architecture and Electrical Engineering and Computer Science, took Professor Neil Gershenfeld’s course, “How to Make Almost Anything.” In that class, he created a speech-to-reality system. He continued to work on the project with Hwan Jeon, a graduate student in the Department of Mechanical Engineering at the MIT Center for Bits and Atoms (CBA), which is directed by Gershenfeld, and with Mianna Smith of the CBA.
The speech-to-reality system starts with speech recognition and a large language model that interpret the user’s request, followed by 3D generative AI that creates a digital mesh representation of the object, and a voxelization algorithm that breaks the mesh into assembly components.
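To make the voxelization step concrete, here is a minimal sketch, assuming the open-source trimesh library; the file name "stool.obj" and the cube pitch are illustrative placeholders rather than values from the MIT system.

```python
# Minimal voxelization sketch (illustrative only, not the MIT team's code).
# Assumes the open-source `trimesh` library; "stool.obj" and the pitch are
# placeholders standing in for the AI-generated mesh and the module size.
import trimesh

# Load the mesh produced by the 3D generative AI stage.
mesh = trimesh.load("stool.obj", force="mesh")

# Break the mesh into a grid of cube-sized cells, one per modular component.
pitch = 0.05  # edge length of one assembly cube, in mesh units
voxels = mesh.voxelized(pitch=pitch).fill()

# `matrix` is a 3D boolean occupancy grid; each True cell is one cube to place.
occupancy = voxels.matrix
print(f"{occupancy.sum()} cubes needed, grid shape {occupancy.shape}")
```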
After that, geometric processing modifies the AI-generated assembly to account for real-world manufacturing and physical constraints, such as the number of components, overhangs, and the connectivity of the geometry. A feasible assembly sequence and an automated path plan are then constructed so the robotic arm can assemble physical objects from user prompts.
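As a rough illustration of the kind of constraint checking and sequencing described here, the sketch below orders the cubes of a voxel grid layer by layer from the bottom up and flags unsupported overhangs; it is a simplified stand-in under assumed conventions, not the authors’ actual algorithm.

```python
# Simplified assembly-sequencing sketch (not the authors' algorithm).
# Given a 3D boolean occupancy grid (z = height), place cubes bottom-up and
# flag overhangs: cubes with no occupied cell or table directly beneath them.
import numpy as np

def assembly_sequence(occupancy: np.ndarray):
    """Return cube coordinates in bottom-up placement order plus unsupported cubes."""
    sequence, unsupported = [], []
    _, _, zs_all = np.nonzero(occupancy)
    for z in sorted(set(zs_all)):               # build one layer at a time
        for x, y in np.argwhere(occupancy[:, :, z]):
            supported = z == 0 or occupancy[x, y, z - 1]
            (sequence if supported else unsupported).append((x, y, z))
    return sequence, unsupported

# Toy example: a partial base layer with one floating cube above an empty column.
grid = np.zeros((2, 2, 2), dtype=bool)
grid[:, :, 0] = True          # base layer resting on the table
grid[0, 0, 0] = False         # remove one base cube...
grid[0, 0, 1] = True          # ...so the cube above it becomes an overhang
order, overhangs = assembly_sequence(grid)
print("placement order:", order)
print("overhangs to repair:", overhangs)
```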
By leveraging natural language, the system makes design and manufacturing more accessible to people without expertise in 3D modeling or robotic programming. And, unlike 3D printing, which can take hours or days, the system assembles objects in minutes.
“This project is an interface between humans, AI, and robots to co-create the world around us,” Kyaw says. “Imagine a scenario where you say ‘I want a chair,’ and within five minutes there is a physical chair standing in front of you.”
The team has immediate plans to improve the weight-bearing capacity of the furniture by changing the way the cubes connect to one another, from magnets to a stronger type of connection.
“We also developed a pipeline to convert voxel structures into viable assembly sequences for small, distributed mobile robots, which can help translate this work to structures at any size scale,” says Smith.
The purpose of using modular components is to reduce the waste that goes into making physical objects: a finished object can be taken apart and its components recombined into something different, for example turning a sofa into a bed when you no longer need the sofa.
Because Kyaw also has experience using gesture recognition and augmented reality to interact with robots during manufacturing, he is currently working on incorporating both speech and gesture control into the speech-to-reality system.
Drawing on memories of the Replicators in the “Star Trek” franchise and the robots in the animated film “Big Hero 6,” Kyaw explains his approach.
“I want to increase access for people to create physical objects in a fast, accessible, and sustainable way,” he says. “I’m working toward a future where the essence of matter is truly under your control. Where reality can be generated on demand.”
The team presented their paper, “Speech to Reality: On-Demand Production Using Natural Language, 3D Generative AI, and Discrete Robotic Assembly,” at the Association for Computing Machinery (ACM) Symposium on Computational Fabrication (SCF ’25), held at MIT on Nov. 21.