Google DeepMind’s latest AI model can execute a variety of tasks, learn, and improve itself.
The world is increasingly adapting to the changing tides of technology. Even as artificial intelligence continues to make strides across industries, robotics is not far behind. Google DeepMind has just introduced a new dimension to robotics: its latest creation, known as RoboCat, can perform a variety of tasks across different robotic arms.
RoboCat stands in a league of its own in robotics owing to its ability to ‘tackle and adjust’ to various tasks using different types of robots in real-world scenarios. This is something that Google DeepMind claims has never been achieved before in robotics.
In its official post, DeepMind said that most robots are programmed to perform specific tasks, but that advances in AI may allow robots to perform many more. It added that progress on general-purpose robots has been slow because of the time needed to gather real-world training data.
“RoboCat is a foundation agent for robotic manipulation; as such, it can perform many tasks with multiple robot types, and it can adapt quickly to previously unseen types of robots and skills. We can communicate a task we want RoboCat to perform on any robot by showing it a desired configuration of objects to one of the cameras – this becomes the agent’s goal,” the company said in its demonstration video.
What is RoboCat?
Google DeepMind claims RoboCat is a self-improving AI agent for robotics. It learns to perform a wide range of tasks across different arms and then generates new training data on its own to improve itself.
Earlier research has explored robots that can learn to multitask at scale, as well as robots that combine the understanding of language models with the real-world capabilities of a helper robot. According to Google DeepMind, RoboCat is the first agent to perform and adapt to multiple tasks and to do so across different real robots.
How does RoboCat learn and improve itself?
RoboCat is based on Google DeepMind’s multimodal model Gato, which can process language, images, and actions from simulated and physical environments. The company said it combined Gato’s architecture with a large training dataset consisting of sequences of images and actions from various robot arms solving hundreds of tasks. After this first round of training, the company said, it launched RoboCat into a self-improvement training cycle with a set of previously unseen tasks. The learning of each new task took place in five steps.
“The combination of all this training means the latest RoboCat is based on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform,” the company said in its official post.
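The self-improvement cycle described above can be pictured as a loop in which each round folds new demonstrations and the agent's own practice data back into the training set. What follows is only a toy Python sketch of that idea, not DeepMind's code: `Agent`, `fine_tune`, `practise`, and `self_improvement_round` are hypothetical names, the intermediate fine-tuning step is an assumption, and the 50 per cent success rate is a placeholder.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy stand-in for a generalist agent (hypothetical, not RoboCat's interface)."""
    version: int = 0
    dataset: list = field(default_factory=list)  # every trajectory seen so far


def fine_tune(agent, demos):
    """Hypothetical step: adapt a copy of the agent to a new task from demonstrations."""
    return Agent(version=agent.version, dataset=agent.dataset + demos)


def practise(adapted, task, episodes, rng):
    """Hypothetical step: the adapted agent attempts the task and logs every attempt.

    The 0.5 success probability is a placeholder, not a reported figure.
    """
    return [
        {"task": task, "source": "self", "success": rng.random() < 0.5}
        for _ in range(episodes)
    ]


def self_improvement_round(agent, task, demos, episodes, rng):
    """One round: adapt, practise, merge the new data, and produce the next version."""
    adapted = fine_tune(agent, demos)
    self_generated = practise(adapted, task, episodes, rng)
    # The next version is "trained" on the old data plus both kinds of new data.
    merged = agent.dataset + demos + self_generated
    return Agent(version=agent.version + 1, dataset=merged)


rng = random.Random(0)
demos = [{"task": "stack-blocks", "source": "human", "success": True} for _ in range(5)]
agent = self_improvement_round(Agent(), "stack-blocks", demos, episodes=20, rng=rng)
print(agent.version, len(agent.dataset))
```

In this sketch the training set grows with every round, which mirrors the article's point that the dataset includes self-generated data alongside human demonstrations.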
RoboCat is essentially a visual goal-conditioned decision transformer trained on video clips of hundreds of tasks being performed. The data is gathered from a wide range of real-world robot arm types and simulated environments.
The most noteworthy facet of this agent is that it continues to learn and improve itself with each new task. The first version achieved a success rate of nearly 36 per cent on previously unseen tasks after being presented with 500 demonstrations. However, Google DeepMind says that after RoboCat learned more tasks, its success rate more than doubled. The versatility, adaptability, and multimodal capabilities of RoboCat could have far-reaching benefits in the field of robotics.
Source: indianexpress.com