Posted on April 22, 2019 by Charles Morris
If AIs can win at Jeopardy! and beat human champions at popular video games, how hard can it be for them to learn to drive? Trent Eady, writing on Medium, tells us that AlphaStar, a new AI from Google’s DeepMind, beat professional gamer MaNa at StarCraft II in December. There’s some dispute as to whether this particular man-vs-machine matchup was a fair fight, and MaNa defeated a new version of AlphaStar in a January rematch. However, the AI’s performance was undoubtedly impressive, and the way it learned to play StarCraft has some interesting parallels with the way Tesla is believed to be training its AI systems to drive.
Above: Tesla's Autopilot continues to improve (Image: Tesla)
Mr. Eady explains that AlphaStar mastered StarCraft using two different machine learning techniques. In supervised imitation learning, an AI examines a huge number of examples of something and eventually learns to recognize it. In the classic example, if you show an AI a million photos of cats, it will learn to tell a cat from a dog (which isn’t as easy as it sounds). The second technique, reinforcement learning, is a process of trial and error: an AI takes an action, at first more or less at random, observes the effect, and learns which actions lead to the desired results. “Imitation learning followed by reinforcement learning is a one-two punch I suspect we could see a lot of in the future,” writes Eady.
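For readers who like to see ideas in code, here is a minimal Python sketch of that one-two punch. Everything in it is an assumption made for illustration: the five-position corridor, the demonstration data, the function names `imitate` and `reinforce`, and the small head start given to imitated actions. It is a toy, not a description of how AlphaStar or Autopilot is actually trained.

```python
import random
from collections import Counter, defaultdict

GOAL = 4            # toy corridor with positions 0..4; reaching 4 ends an episode
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching GOAL."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def imitate(demos):
    """Imitation learning: in each state, copy the demonstrator's most common action."""
    votes = defaultdict(Counter)
    for state, action in demos:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def reinforce(policy, episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Reinforcement learning: tabular Q-learning, started from the imitated policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for s, a in policy.items():
        q[(s, a)] = 0.1  # assumption: a small head start for imitated actions
    for _ in range(episodes):
        state = rng.randrange(GOAL)
        for _ in range(50):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}

# Hypothetical demonstrations as (state, action) pairs; position 2 is never shown.
demos = [(0, +1), (1, +1), (1, +1), (1, -1), (3, +1)]
cloned = imitate(demos)
refined = reinforce(cloned)
print("after imitation:    ", cloned)
print("after reinforcement:", refined)
```

The imitated policy has nothing to say about position 2, because the demonstrator never visited it. Trial and error fills that gap, which is one plausible reason to follow imitation with reinforcement.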
Tesla vehicles produced since October 2016 include the hardware suite that Tesla says will eventually enable full self-driving. As these cars drive around, they collect data from the cameras and other sensors, and the Autopilot computers can use that data to learn by example. The driving decisions of a human driver are analogous to the pictures of cats in the example above.
Above: Tesla's Autopilot in action (Instagram: themaverique)
As Amir Efrati writes in The Information, “Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations.”
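To make the idea of predicting steering from examples concrete, here is a deliberately tiny Python sketch. It swaps the neural network for a single linear unit and uses synthetic data in which a hypothetical driver steers back toward the center of the lane. The data, the `train` function and the learning rate are all assumptions for illustration; nothing here reflects Tesla's actual system.

```python
# Hypothetical logged human driving: (lane offset in metres, steering command).
# The data is synthetic: the "driver" always steers back toward the lane centre.
samples = [(x, -0.5 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def train(samples, lr=0.1, epochs=200):
    """Fit steering = w * offset + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(samples)
print(f"learned rule: steering = {w:.3f} * offset + {b:.3f}")
```

The model is never told the driver's rule. It recovers something very close to it purely from examples, which is the essence of learning from human driving data, although a real system would face vastly more inputs and far messier examples.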
How much data is enough? There are currently about 400,000 cars with the Autopilot hardware on the road, and they drive a collective 12.5 million miles per day. The size of the fleet is increasing by over 5,000 cars per week. Tesla is the only company that has anything like this data set.
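A quick back-of-the-envelope calculation puts those figures in perspective. This Python sketch uses only the numbers quoted above and assumes, purely for illustration, that the daily mileage and weekly fleet growth stay constant. Since growth is reported as "over 5,000" cars per week, the last figure is a lower bound.

```python
CARS = 400_000                    # cars with Autopilot hardware
FLEET_MILES_PER_DAY = 12_500_000  # collective miles driven per day
NEW_CARS_PER_WEEK = 5_000         # reported as "over 5,000", so a lower bound

miles_per_car_per_day = FLEET_MILES_PER_DAY / CARS  # 31.25
fleet_miles_per_year = FLEET_MILES_PER_DAY * 365    # 4,562,500,000
cars_added_per_year = NEW_CARS_PER_WEEK * 52        # 260,000

print(f"Average miles per car per day: {miles_per_car_per_day:.2f}")
print(f"Fleet miles per year at today's rate: {fleet_miles_per_year:,}")
print(f"Cars added per year at today's rate: {cars_added_per_year:,}")
```

At the quoted rates, the fleet covers roughly 4.6 billion miles a year.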
Mr. Eady reports that Google’s Waymo tested supervised imitation learning on a small scale, using data from up to 75,000 miles of driving, with impressive results. Israeli firm Mobileye, which previously supplied equipment for Tesla’s Autopilot, and is now owned by Intel, has achieved some success with reinforcement learning, and released a video showing its system in action on the famously chaotic streets of Jerusalem.
OpenAI, one of Elon Musk’s many recent side projects, used reinforcement learning to train an AI called OpenAI Five to play a video game called Dota 2, and it was able to beat a team of skilled human players. OpenAI’s Chief Scientist, Ilya Sutskever (who delivered a lengthy speech on the latest advances in AI at the recent AI Frontiers conference), said that, although AI researchers are skeptical that reinforcement learning is capable of solving hard problems, his organization’s work with OpenAI Five demonstrates that, until now, reinforcement learning simply hadn’t been applied with the scale of training data required.
Above: Insight into OpenAI's efforts with Dota 2 (YouTube: OpenAI)
Mr. Eady reminds us that we don’t know for sure that Tesla is using reinforcement learning to train Autopilot; so far, the company has only said it’s using imitation learning. However, considering the vast amounts of driving data Tesla has available, and the success that others have been demonstrating with reinforcement learning, it seems likely that the company is combining the two techniques in its quest to teach a machine to match (or, hopefully, exceed) human driving ability. In a recent posting for an intern position, Tesla said it was looking for candidates with expertise in reinforcement learning.