Home robot Astro has been turning heads since Amazon unveiled the device last fall. Customers can ask the pet-sized robot to patrol the house, check on pets, handle video calls, order groceries, and even bring them a drink. But few people are more amazed at its abilities than the scientists who brought it to life.
“Even as someone who works on this kind of stuff for a living, it feels like magic,” says Wontak Kim, an audio engineer at Amazon who helped his team teach Astro to precisely process audio.
It may sound like magic, but Astro’s ability to respond to requests in a crowded room is actually the result of countless hours of dedicated work. Kim’s team, which is part of Amazon’s Devices and Services organization, includes audio scientists and engineers at Amazon’s Audio Lab in Cambridge, Massachusetts. Working with colleagues in Sunnyvale, California, and Bellevue, Washington, they designed and built Astro’s voice features, including speech recognition and voice and video calls. They knew that for a home robot to succeed, it needed to clearly understand and process voice requests. And not only that: Astro’s video calling feature has to run essentially in real time for customers to be able to use it.
“Humans cannot tolerate latency with sound,” says Mrudula Athi, an audio scientist on Kim’s team. “Even 20 ms of delay is immediately noticeable. So, for Astro, we needed to process and clean up 125 frames of the audio signal per second.”
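Athi’s 125-frames-per-second figure implies a budget of 8 ms per frame. A minimal sketch of that streaming constraint, assuming a hypothetical 16 kHz sample rate and a trivial pass-through stage standing in for the real enhancement pipeline (neither detail is from the article):

```python
# Frame-based streaming sketch: 125 frames/s means an 8 ms frame budget.
# The 16 kHz sample rate and the placeholder "enhance" stage are
# illustrative assumptions, not Astro's actual pipeline.
SAMPLE_RATE = 16_000                         # assumed sample rate (Hz)
FRAMES_PER_SEC = 125                         # from the article
FRAME_LEN = SAMPLE_RATE // FRAMES_PER_SEC    # 128 samples = 8 ms

def enhance(frame):
    """Placeholder for the real per-frame clean-up (denoise, de-echo)."""
    return [0.5 * s for s in frame]          # trivial gain stands in for DSP/DNN

def process_stream(samples):
    """Consume audio one 8 ms frame at a time, as a real-time system must."""
    out = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        out.extend(enhance(samples[start:start + FRAME_LEN]))
    return out
```

Processing one second of audio this way means 125 calls to `enhance`, each of which must finish well inside its 8 ms window or the output falls behind the input.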
The magic lies in untangling the sound waves
Astro’s voice features use Amazon Alexa, the company’s AI voice assistant. On any Alexa-enabled device, Alexa doesn’t hear speech the way we hear it when someone speaks to us. When you make a voice request, the sound waves bounce off walls and ceilings on their way to the device’s microphones.
With Astro, this challenge is compounded by the fact that the robot moves around the house. To satisfy customers, the robot needed to accurately process speech requests despite the distraction of pets or other common household noises, the subtle hum of the electric motors powering it, and any music or other audio being played. For example, when Astro moves over a tile floor, says Amit Chhetri, a senior scientist on the Sunnyvale team, “the noise level of the wheels in the microphones is higher than that of speech.”
The magic lies in untangling all the extra sounds.
“If you send all this noise to a speech recognition system, it won’t work very well,” says Athi. “Our job is to take those microphone signals and make sure they are clean enough that Alexa can perform at a level that results in a good customer experience.”
All that sound sorting has to happen quickly, too.
This is a tough problem, and Amazon has put together some serious brainpower to solve it. The Astro audio team included acoustic scientists well versed in the physics of sound, applied researchers building algorithms for processing sound waves, and software engineers weaving those algorithms into robust code.
Taking AI-based algorithms to a new level
The team first focused on muting background noise during audio and video calls, so people can talk and understand each other even while the robot is moving around a noisy space. To make it all work as quickly as it should, the team used an AI-based algorithm called a deep neural network (DNN), which is often used to tackle sound and computer vision problems. But they took it to a new level: Chhetri designed a new network architecture that reduces background noise and eliminates echo, allowing Astro to handle calls clearly.
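The article doesn’t describe Chhetri’s network, but the echo-cancellation half of the problem has a well-known classical baseline that the DNN approach builds beyond: a normalized least-mean-squares (NLMS) adaptive filter that estimates the echo path from the loudspeaker reference and subtracts it from the microphone signal. A stdlib-only sketch of that baseline (the filter length, step size, and toy echo path are all assumptions, and this is not Amazon’s method):

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Classical NLMS echo canceller: adapt a FIR estimate of the echo
    path from the far-end (loudspeaker) signal, subtract it from the mic."""
    w = [0.0] * taps                  # adaptive filter weights
    out = []
    for n in range(len(mic)):
        # most recent `taps` far-end samples, newest first
        x = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))    # estimated echo
        e = mic[n] - y                              # residual after cancellation
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out

def power(sig):
    """Mean signal power, for comparing echo vs. residual levels."""
    return sum(s * s for s in sig) / len(sig)

# Toy scenario: the mic hears only a delayed, attenuated copy of the speaker.
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(4000)]
echo = [0.6 * (far[n - 3] if n >= 3 else 0.0) for n in range(len(far))]
residual = nlms_echo_cancel(far, echo)
```

Once the filter converges, the residual is far quieter than the raw echo; a DNN-based approach like the team’s can additionally suppress nonlinear echo and background noise that a linear filter cannot model.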
Using simulation data
DNNs, especially advanced ones like the network Chhetri, Athi, and the team built, require a lot of data to train. This is where the team’s audio simulation expert came in. Thanks to the data he generated, Athi says, the engineers were able to rely on simulated audio “of a person speaking from different locations in different types of rooms, with different levels of artificial room noise.” Amazon’s acoustic scientists typically use simulation data for projects like helping devices identify sound sources. But with Astro, the team had to take it a step further. Since the robot makes its own noise, they needed more data on Astro itself to build their speech enhancement model.
Another Amazon team had recorded Astro’s distinctive noises as it drove around the house in all sorts of scenarios. Athi says this data was ideal for their speech enhancement problem, so she mixed it with the speech data sets she had collected to train the model, and that solved the problem.
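Mixing recorded noise into clean speech at controlled signal-to-noise ratios (SNRs) is the standard way to build such training sets. A stdlib-only sketch of the idea (the synthetic tone, white noise, and 5 dB target are placeholders, not Amazon’s data):

```python
import math, random

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to the speech, yielding one noisy training example."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # gain that puts the noise exactly snr_db below the speech
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

random.seed(1)
# Stand-ins: a 220 Hz tone for speech, white noise for recorded wheel noise.
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1, 1) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)   # one example at 5 dB SNR
```

Sweeping `snr_db` across a range of values turns one clean recording and one noise recording into many training examples, covering both quiet and harsh conditions.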
A “state-of-the-art” solution
The audio team was happy with the result, but they still had to fit all that code inside the robot itself, another unique challenge. Working with colleagues across the country, they pulled it off, and the result, Athi says, is incredibly advanced.
“The amount of noise reduction we get, the speech enhancement performance we have, and our ability to operate in real time, not in the cloud but on the device… that whole thing together is state of the art,” she says.
Having Astro’s speech enhancement running on the device itself is one of the things Athi says she’s most proud of in her career. But Kim, Athi, Chhetri, and the rest of the audio team won’t be stopping anytime soon. They’re continuing to improve Alexa’s speech recognition and Astro’s speech enhancement, and they have a number of projects in progress that they’re excited to bring to customers.
“We’re very proud to work in this voice field for Amazon and for customers,” says Kim.
Want to know more about all the fun, convenience, and security Astro has to offer? Check out the updates Amazon announced for the home robot at its Devices and Services launch event.
Illustration by Mojo Wang.