Google AI introduces Robotics Transformer 1 (RT-1), a multitask model that tokenizes robot inputs and output actions to enable efficient inference at runtime


Many of the latest advances across subfields of machine learning come from transferring knowledge from large, diverse, task-agnostic datasets to expressive models that can effectively absorb all of this data. This capability has been convincingly demonstrated in computer vision, natural language processing, and speech recognition, but whether it carries over to robotics remains an open question. A major obstacle is the lack of large and diverse robotic data, which limits a model's ability to learn a broad range of robotic behaviors. Another concern is the lack of scalable models that can learn from such large datasets and generalize beyond them.

Researchers at Google AI have pursued this direction, arguing that the key to general robotic models is combining open-ended, task-agnostic training with a high-capacity architecture that can absorb all of the diverse robotic data. To test this hypothesis, the team created Robotics Transformer 1 (RT-1), a multitask model that tokenizes the robot's inputs and output actions to enable efficient inference at runtime and make real-time control feasible. The model was trained on a real-world robotics dataset of more than 130,000 episodes, collected over an extended period with 13 robots from Everyday Robots (EDR).
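To make "tokenizing output actions" concrete, here is a minimal sketch of one common approach: discretizing each continuous action dimension into uniform bins so that a Transformer can predict actions as integer tokens. The 256-bin default, the per-dimension `low`/`high` bounds, and the helper names `tokenize_action` and `detokenize_action` are illustrative assumptions, not RT-1's released code.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed number of bins per action dimension


def tokenize_action(action: Sequence[float], low: Sequence[float],
                    high: Sequence[float], num_bins: int = NUM_BINS) -> List[int]:
    """Map each continuous action dimension to an integer bin in [0, num_bins - 1]."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)          # keep values inside the assumed bounds
        frac = (clipped - lo) / (hi - lo)          # rescale to [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))  # top edge goes to last bin
    return tokens


def detokenize_action(tokens: Sequence[int], low: Sequence[float],
                      high: Sequence[float], num_bins: int = NUM_BINS) -> List[float]:
    """Map each bin index back to the center of its bin."""
    values = []
    for token, lo, hi in zip(tokens, low, high):
        width = (hi - lo) / num_bins
        values.append(lo + (token + 0.5) * width)
    return values


if __name__ == "__main__":
    low, high = [-1.0] * 3, [1.0] * 3              # hypothetical bounds for a 3-D action
    print(tokenize_action([-1.0, 0.0, 1.0], low, high))  # → [0, 128, 255]
```

Decoding to bin centers bounds the round-trip error by half a bin width, which is the precision traded away in exchange for treating control as a classification problem over tokens.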

RT-1's main distinguishing components are image tokenization, action tokenization, and token compression. The model is built on a Transformer architecture that outputs tokenized actions from its inputs: a short history of images captured by the robot's camera and a task description written in natural language. In the image tokenization step, the input images are passed through a model pretrained on ImageNet, and the output is then flattened into image tokens. FiLM (feature-wise linear modulation) layers, conditioned on the task instruction, help the image encoder extract the features relevant to the task at hand. The TokenLearner attention module then adaptively learns a small, soft selection of image tokens that compresses the full set. Passing fewer tokens to the Transformer is what speeds up inference.
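The two conditioning and compression ideas above can be illustrated in plain Python. This is a simplified sketch, not RT-1's implementation: `film` applies the standard FiLM transform (a per-channel scale and shift predicted from the instruction embedding by hypothetical weight matrices), and `token_learner` is a TokenLearner-style pooling that uses a linear scorer with a softmax over token positions, whereas the original TokenLearner learns its attention maps with small convolutional/MLP layers. The token counts in the usage block are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def film(features: Vector, text_emb: Vector, w_gamma: Matrix, w_beta: Matrix) -> Vector:
    """FiLM: scale and shift each feature channel with parameters predicted from the text embedding."""
    gamma = matvec(w_gamma, text_emb)  # per-channel scale
    beta = matvec(w_beta, text_emb)    # per-channel shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_learner(tokens: List[Vector], scorers: Matrix) -> List[Vector]:
    """Compress N tokens into len(scorers) tokens; each output is an attention-weighted average."""
    dim = len(tokens[0])
    learned = []
    for w in scorers:
        # One attention distribution over all input positions per output token.
        weights = softmax([sum(a * b for a, b in zip(w, t)) for t in tokens])
        learned.append([sum(wt * t[c] for wt, t in zip(weights, tokens)) for c in range(dim)])
    return learned


if __name__ == "__main__":
    # Hypothetical sizes: 81 image tokens of width 4 compressed to 8 tokens.
    tokens = [[math.sin(i + c) for c in range(4)] for i in range(81)]
    scorers = [[math.cos(s + c) for c in range(4)] for s in range(8)]
    compressed = token_learner(tokens, scorers)
    print(len(compressed), len(compressed[0]))  # → 8 4
```

Because the Transformer's attention cost grows with sequence length, shrinking each image's token set before the Transformer is where the inference savings come from.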

The researchers emphasized that building a system that generalizes to new tasks and is robust to varied distractors and backgrounds requires a large and diverse dataset of robot trajectories. To create this dataset, they used 13 EDR robots to collect 130,000 episodes over a 17-month period. The collected activities include picking and placing objects, opening and closing drawers, and knocking objects over, among others. Each episode is also annotated with a written description of the robot's action.
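One way to picture how annotated episodes become training examples for a model that reads a short image history is the sketch below. It pairs each timestep's action with a sliding window of recent frames and the episode's instruction. The `Episode` layout, the string frame placeholders, the default window of 6 frames, and padding by repeating the first frame are all illustrative assumptions rather than details of the RT-1 data pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    instruction: str          # natural-language annotation for the whole episode
    frames: List[str]         # placeholder frame identifiers standing in for camera images
    actions: List[List[int]]  # one tokenized action per frame


Example = Tuple[List[str], str, List[int]]


def make_training_examples(ep: Episode, history: int = 6) -> List[Example]:
    """Build (frame window, instruction, action) triples, one per timestep."""
    if len(ep.frames) != len(ep.actions):
        raise ValueError("each frame needs exactly one action")
    examples = []
    for t in range(len(ep.frames)):
        start = max(0, t - history + 1)
        window = ep.frames[start:t + 1]
        # Early timesteps lack a full history; pad by repeating the first frame (an assumption).
        window = [ep.frames[0]] * (history - len(window)) + window
        examples.append((window, ep.instruction, ep.actions[t]))
    return examples


if __name__ == "__main__":
    ep = Episode("open the top drawer", ["f0", "f1", "f2"], [[3], [7], [1]])
    for example in make_training_examples(ep, history=2):
        print(example)
```

Sharing one instruction across every timestep of an episode is what lets a single annotation per episode supervise many (observation, action) pairs.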

The team evaluated RT-1's generalization and performance against three baseline models in four categories: performance on seen tasks, performance on unseen tasks, robustness, and long-horizon scenarios. RT-1 performed much better than the baselines in all four, showing markedly stronger zero-shot generalization to new tasks, environments, and objects. The researchers also examined how tokenization, action representation, dataset composition, and several other design decisions in the model and training set affect its performance.

In summary, RT-1 is a simple and scalable action-generation model suited to real-world robotics tasks. For future work, the researchers plan to grow the number of robot skills faster by developing methods that let even non-experts train a robot through guided data collection and model prompting. They also expect scalable attention and memory mechanisms to improve the reaction speed and context retention of robotics Transformers. Google has open-sourced the RT-1 code in the hope that it will be a useful tool for future research on scaling up robot learning. Further details are available on the project website.


Khushboo Gupta is a Consultant Trainee at MarktechPost. She is currently pursuing her Bachelor of Technology degree at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by participating in various challenges.
