Imagine that a robot is helping you wash the dishes. You ask it to grab a soapy bowl out of the sink, but its gripper slightly misses the mark.
Using a new framework developed by MIT and NVIDIA researchers, you could correct the robot's behavior with simple interactions. The method would let you point to the bowl or trace a trajectory to it on a screen, or simply give the robot's arm a nudge in the right direction.
Unlike other methods for correcting robot behavior, this technique does not require users to collect new data and retrain the machine-learning model that powers the robot's brain. It enables the robot to use intuitive, real-time human feedback to choose a feasible action sequence that gets as close as possible to satisfying the user's intent.
When the researchers tested their framework, its success rate was 21 percent higher than an alternative method that did not leverage human interventions.
In the long run, this framework could make it easier for a user to guide a factory-trained robot to perform a wide variety of household tasks, even if the robot has never seen the user's home or the objects in it.
“We can’t expect consumers to collect data and fine-tune a neural network model. Consumers will expect the robot to work right out of the box, and if it doesn’t, they would want an intuitive mechanism to customize it. That is the challenge we tackled in this work,” says Felix Yanwei Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this method.
His co-authors include Lirui Wang PhD ’24 and Yilun Du PhD ’24; senior author Julie Shah, an MIT professor of aeronautics and astronautics and director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as Balakumar Sundaralingam, Xuning Yang, Yu-Wei Chao, Claudia Perez-D'Arpino PhD ’19, and Dieter Fox of NVIDIA. The research will be presented at the International Conference on Robotics and Automation.
Mitigating misalignment
Recently, researchers have begun using pre-trained generative AI models to learn a “policy,” or a set of rules, that a robot follows to complete a task. Generative models can solve many complex tasks.
During training, the model only sees feasible robot motions, so it learns to generate valid trajectories for the robot to follow.
While these trajectories are valid, that doesn’t mean they always align with a user’s intent in the real world. The robot might have been trained to grab boxes off a shelf without knocking them over, but it could fail to reach the box on top of someone’s bookshelf if that shelf is oriented differently than those it saw in training.
To overcome these failures, engineers typically collect data demonstrating the new task and retrain the generative model, a costly and time-consuming process that requires machine-learning expertise.
Instead, the MIT researchers wanted to allow users to steer the robot’s behavior during deployment when it makes a mistake.
But if a human interacts with the robot to correct its behavior, that could inadvertently cause the generative model to choose an invalid action. The robot might reach the box the user wants, but knock books off the shelf in the process.
“We want to allow the user to interact with the robot without introducing those kinds of mistakes, so we get a behavior that is much more aligned with user intent during deployment, but that is also valid and feasible,” Wang says.
Their framework accomplishes this by providing the user with three intuitive ways to correct the robot’s behavior, each of which offers certain advantages.
First, the user can point to the object they want the robot to manipulate in an interface that shows its camera view. Second, they can trace a trajectory in that interface, allowing them to specify how they want the robot to reach the object. Third, they can physically move the robot’s arm in the direction they want it to follow.
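The three correction modes could be represented in software as a common "intent" object that the downstream policy consumes. The sketch below is purely illustrative; the class and function names are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class UserIntent:
    """Illustrative container for the three correction modes described above."""
    kind: str            # "point", "trace", or "nudge" (hypothetical labels)
    target: np.ndarray   # a 2D goal point, a 2D screen path, or a 3D arm pose

def intent_from_point(pixel_xy):
    # Clicking an object in the camera view yields a single 2D goal point.
    return UserIntent("point", np.asarray(pixel_xy, dtype=float))

def intent_from_trace(screen_points):
    # Tracing on screen yields an ordered 2D path for the robot to follow.
    return UserIntent("trace", np.asarray(screen_points, dtype=float))

def intent_from_nudge(arm_pose):
    # Physically moving the arm yields a full pose, with no projection loss.
    return UserIntent("nudge", np.asarray(arm_pose, dtype=float))
```

Representing all three modes uniformly would let one sampling procedure handle any of them; the trade-off the article notes is that 2D inputs lose some spatial information that a physical nudge preserves.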
“When you are mapping a 2D image of the environment to actions in 3D space, some information is lost. Physically nudging the robot is the most direct way to specify user intent without losing any of that information,” says Wang.
Sampling for success
To ensure these interactions don’t cause the robot to choose an invalid action, such as colliding with other objects, the researchers use a specific sampling procedure. This technique lets the model choose an action from the set of valid actions that most closely aligns with the user’s goal.
“Rather than just imposing the user’s will, we give the robot an idea of what the user intends but let the sampling procedure oscillate around its own set of learned behaviors,” Wang says.
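The idea of choosing, from the policy's own valid behaviors, the one closest to the user's goal can be sketched roughly as follows. This is a minimal stand-in, not the paper's algorithm: `policy_sampler`, `is_valid`, and the endpoint-distance score are assumed placeholders for the learned generative policy and its actual alignment objective:

```python
import numpy as np

def sample_aligned_action(policy_sampler, is_valid, user_goal, n_candidates=64):
    """Draw candidate trajectories from the learned policy, discard invalid
    ones, and return the valid candidate closest to the user's intent."""
    candidates = [policy_sampler() for _ in range(n_candidates)]
    valid = [traj for traj in candidates if is_valid(traj)]
    if not valid:
        return None  # fall back to the policy's default behavior
    # Pick the valid trajectory whose endpoint lands nearest the user's goal.
    return min(valid, key=lambda traj: float(np.linalg.norm(traj[-1] - user_goal)))
```

Because every candidate comes from the policy itself and is checked for validity, the user's input biases the choice without ever forcing the robot outside its learned, feasible behaviors.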
This sampling method enabled the researchers’ framework to outperform the other methods it was compared against, both in simulations and in experiments with a real robot arm in a toy kitchen.
While their method might not always complete a task right away, it gives users the advantage of being able to correct the robot immediately if they see it doing something wrong, rather than waiting for it to finish and then giving it new instructions.
Moreover, after a user nudges the robot a few times until it picks up the correct bowl, it could log that corrective action and incorporate it into its behavior through future training. Then, the next day, the robot could pick up the correct bowl without needing a nudge.
“But the key to that continuous improvement is having a way for the user to interact with the robot, which is what we have shown here,” Wang says.
In the future, the researchers want to boost the speed of the sampling procedure while maintaining or improving its performance. They also want to test robot policy generation in novel environments.