Posted on August 01, 2019 by Charles Morris
As you read this, Tesla’s vehicles are out there on the world’s roads, constantly learning. Like precocious children, they’re steadily building a mental image of the world around them, a body of knowledge that forms the heart of Tesla’s Autopilot suite of features.
Above: Tesla's Model X (Instagram: @fromwhereicharge)
One of the chief architects of this 24/7 educational endeavor is Andrej Karpathy, Tesla’s Director of Artificial Intelligence and Autopilot Vision. Karpathy is a specialist in machine learning and image recognition, two fields that are seen as the keys to autonomous driving. He recently hosted a workshop on Neural Network Multi-Task Learning, where he gave a talk entitled Multi-Task Learning in the Wilderness. (For those who prefer the executive summary, YouTuber Mother Frunker helpfully condensed his talk into a nine-minute video; scroll below.)
Karpathy begins by describing some of Autopilot’s current features: Navigate on Autopilot, which can pretty much drive your Tesla on the highway, executing lane changes and passing slower vehicles automatically; and Summon, which can bring your Tesla to you from a dark and rainy parking lot. “Then you get in like royalty,” says Karpathy. “It’s the best...when it works.”
Above: Summary of Tesla AI director Andrej Karpathy's recent presentation on neural network multi-task learning (YouTube: Mother Frunker)
In terms of functionality, the self-driving systems being tested by potential competitors may seem the same as Tesla’s, but under the hood, they are very different. As Karpathy explains, most other autonomy systems rely on LIDAR and high-definition maps, whereas Tesla’s Autopilot uses neither. It relies primarily on eight cameras that provide a 360-degree view around the car. The system parses the video inputs using neural networks to create an image of the surrounding scene.
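As a rough illustration of the camera-based approach (the function names, shapes, and pooling step here are invented stand-ins, not Tesla's actual architecture), fusing features from eight camera feeds into one scene representation might be sketched like this:

```python
import numpy as np

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a per-camera convolutional backbone: reduces an
    H x W x 3 video frame to a small flat feature vector."""
    return frame.mean(axis=(0, 1))  # toy global pooling: 3 values per camera

def fuse_cameras(frames: list) -> np.ndarray:
    """Concatenate per-camera features into a single 360-degree
    representation of the scene around the car."""
    return np.concatenate([extract_features(f) for f in frames])

# Eight synthetic "camera frames" covering the full view around the vehicle
frames = [np.random.rand(96, 128, 3) for _ in range(8)]
scene = fuse_cameras(frames)
print(scene.shape)  # (24,): 8 cameras x 3 toy features each
```

A real vision stack would replace the pooling with deep convolutional networks and produce far richer features, but the structure (per-camera extraction, then fusion into one scene) is the same idea.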
As safety is the prime goal, nothing less than 99.999% accuracy will do. Like a student determined to earn a scholarship, the system must study and study, constantly building on what it’s already learned. It’s an iterative multi-step process: build a dataset; train your network; deploy it and test it. When you notice that the network is “misbehaving,” you incorporate the errors into the training set. As Karpathy puts it, “you spin this data engine loop over and over,” and the system learns from its mistakes until it reaches the desired level of accuracy.
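The "data engine loop" Karpathy describes can be sketched as follows (a deliberately toy example: the "model" is just a lookup table, and `ground_truth` plays the role of human labelers; none of this reflects Tesla's actual pipeline):

```python
# Toy "data engine": deploy the model, harvest the cases where it
# misbehaves, label them, fold them back into the training set, repeat.
def predict(model, x):
    return model.get(x, "unknown")

# Stand-in for the real world plus human labelers
ground_truth = {i: ("stop_sign" if i % 3 == 0 else "traffic_light")
                for i in range(30)}

model = {}  # the system starts out knowing nothing
for iteration in range(3):
    deployed_inputs = range(iteration * 10, iteration * 10 + 10)
    mistakes = [x for x in deployed_inputs
                if predict(model, x) != ground_truth[x]]
    # the errors are labeled and incorporated into the training set
    model.update({x: ground_truth[x] for x in mistakes})

accuracy = sum(predict(model, x) == ground_truth[x]
               for x in ground_truth) / len(ground_truth)
print(accuracy)  # 1.0 once every observed failure has been folded back in
```

The loop structure, not the lookup table, is the point: each pass through deployment surfaces new failure cases, and each retraining pass absorbs them, driving accuracy toward the target.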
Above: The future for Tesla's Autopilot will eventually be hands-free (Instagram: themaverique)
When you break down the number and variety of “data points” that (good) human drivers effortlessly process, it’s quite amazing. Static objects, moving objects, road signs, overhead signs, traffic lights, lane lines and road markings, curbs, crosswalks: each of these “tasks” actually has multiple sub-tasks (e.g., a moving object could be a car, a bus, a pedestrian, a cat, and so on), and all must be correctly identified in real time. The vehicle passes through many different types of environments, from residential neighborhoods to bridges to tunnels to tollbooths.
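This is where the multi-task structure of the talk's title comes in: rather than training a separate network per task, a shared backbone feeds many task-specific output heads. A minimal sketch (the task names and linear heads are illustrative assumptions, not Tesla's design):

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(image: np.ndarray) -> np.ndarray:
    """Shared feature extractor that every task head reuses
    (toy version: global average pooling over the image)."""
    return image.mean(axis=(0, 1))  # shape (3,)

# One "head" per task; each task has its own sub-task classes.
TASK_CLASSES = {
    "moving_objects": ["car", "bus", "pedestrian", "cat"],
    "traffic_lights": ["red", "yellow", "green"],
    "lane_lines":     ["solid", "dashed"],
}
heads = {task: rng.normal(size=(3, len(classes)))
         for task, classes in TASK_CLASSES.items()}

def predict_all(image: np.ndarray) -> dict:
    """Run every task head on the same shared features, so one
    forward pass yields a prediction for every task at once."""
    features = backbone(image)
    return {task: TASK_CLASSES[task][int(np.argmax(features @ W))]
            for task, W in heads.items()}

preds = predict_all(rng.random((96, 128, 3)))
print(preds)  # one prediction per task from a single shared pass
```

Sharing the backbone is what makes it feasible to run dozens of tasks in real time on in-car hardware: the expensive feature computation happens once, and only the lightweight heads are task-specific.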
As Karpathy explains, it’s not all just ones and zeros. Humans have to do a certain amount of “massaging” of the data to ensure that the system is building a useful map of the real driving environment. The end result is a system that’s already transforming the way Tesla owners drive, and is growing more capable by the day.