The Companies You’ve Never Heard of Training Your Autonomous Driving Car

In the current euphoria around autonomous vehicles, a number of new players have entered the fray from outside the traditional automotive industry. Self-driving startups like Zoox and May Mobility are names that typically come to mind. These companies are building the full intelligent systems: the brains that enable an autonomous vehicle (AV) to reason about how to navigate on the road.

Amid this growing hype, far less attention has centered on a number of lesser known but increasingly important players in the AV ecosystem. Such players do not strive to be the brains of the AV but are rather focused on the important task of training the brains of an AV to operate intelligently. For our purposes here and given the lack of market definition in this field, let’s call such companies AV-training companies. As I’ll explore, such players are an emerging part of the AV ecosystem. While lesser known, they will become increasingly important to the development of autonomous driving technology.

1. How Automotive Companies Train Their AVs Today: The Emerging Role of AV-Training Companies

Let’s start at the foundations. How do automotive companies train the AV today? Today automotive companies primarily rely on two techniques: 1) real-world driving and 2) AV simulation:

  1. Real-world driving: Training the car relies in part on real-world driving of the AV. Industry experts often cite 6 billion miles as the number of miles it will take to train an AV to drive with the accuracy and intelligence of a human being. However, this is much easier said than done, given the high cost in time and money of driving such distances. Moreover, real-world driving is by itself insufficient because automotive companies have no means of ethically or effectively testing all the various events that may happen on the road, the so-called corner cases. For example, you can’t effectively test how a car would respond in an imminent crash, or resolve the ethical quandary of deciding whether to kill the baby or the grandma in a would-be accident. That is why simulation becomes necessary.
  2. Simulation: Automotive companies also use software simulations to train their AVs to respond correctly in various scenarios. Simulations can replicate not only the software and hardware dynamics of a given car, but also the environment in which an AV would drive (e.g. the road, other moving cars, pedestrians, etc.). Accordingly, simulators address the biggest practical limitations of real-world training in that 1) they are much cheaper and faster to run and 2) they enable the testing of corner-case scenarios (e.g. car crashes, ethical dilemmas, etc.) that would be impossible to run in real life. Like real-world driving, however, simulators are not a cure-all: the technology is still immature for the automotive market and faces various limitations.
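To make the scale of the real-world driving problem concrete, here is a rough back-of-the-envelope sketch in Python. Only the 6-billion-mile target comes from the estimate cited above; the fleet size, average speed, and hours of operation are purely illustrative assumptions, not industry figures.

```python
# Back-of-the-envelope estimate of how long real-world training could take.
# TARGET_MILES is the figure cited in the text; every other number below
# is an illustrative assumption.
TARGET_MILES = 6_000_000_000


def years_to_drive(target_miles, fleet_size, avg_speed_mph, hours_per_day):
    """Years needed for a fleet to accumulate target_miles of driving."""
    miles_per_day = fleet_size * avg_speed_mph * hours_per_day
    return target_miles / miles_per_day / 365


# Assumed: 100 test cars driving around the clock at an average of 25 mph.
years = years_to_drive(TARGET_MILES, fleet_size=100, avg_speed_mph=25, hours_per_day=24)
print(f"{years:.0f} years")
```

Under these assumptions the fleet covers 60,000 miles a day and would need roughly 274 years to reach the target, which is why mileage alone cannot carry the training effort.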

Given these two approaches to training the AV, two types of companies have emerged in this new AV training frontier:

  1. Training-Data-as-a-Service companies that offer services to label data produced during real-world driving; and
  2. Simulator companies that provide the software simulation technology to automotive companies for training their AV systems.

A conceptual framework for understanding the two main types of AV training companies. Note that this does not purport to be a comprehensive list of all such companies.

Let’s explore these two types of AV training companies in turn.

2. Training-Data-as-a-Service Companies

When operating in the real world, AVs generate a lot of data. Cars have to process information coming from various sensors, including lidar, camera, and radar, among others. From this data, they must identify any objects they encounter on the road. Is that box-like blob a car? Is that iridescent rhombus a person? Is that shiny beam a red light or a green light? AVs will have to answer all such questions to run effectively.

The only way that an AV can learn how to identify a given object is by training it with labeled data. Today automotive companies go through terabytes and terabytes of data, labeling objects so that AVs can understand them. A person tasked with labeling, for example, may go through thousands of pictures, drawing bounding boxes around objects that look like cars, so that the next time the AV comes across a similar-looking object, it will know that it is indeed another car.

The output of an AV labeling task from MightyAI.
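For readers who want a feel for what a bounding-box label looks like as data, here is a minimal sketch in Python. The `BoxLabel` record and its field names are hypothetical and do not reflect any vendor's actual format. The `iou` function computes intersection-over-union, a standard measure of how closely two boxes overlap, which is one common way to compare two annotators' boxes for the same object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxLabel:
    """One labeled object in an image (hypothetical schema, pixel coordinates)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: BoxLabel, b: BoxLabel) -> float:
    """Intersection-over-union of two boxes: 1.0 is identical, 0.0 is disjoint."""
    overlap_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    overlap_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = overlap_w * overlap_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


# Two annotators draw slightly different boxes around the same car.
first = BoxLabel("car", 0, 0, 10, 10)
second = BoxLabel("car", 5, 0, 15, 10)
agreement = iou(first, second)
print(f"agreement: {agreement:.2f}")
```

Here the two boxes share half their width, so the overlap is 50 square pixels out of a combined 150, an agreement of about 0.33.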

As you might guess, labeling is a time-consuming and expensive process, as training the AV requires loads and loads of labeled data to be effective. Many automotive OEMs simply outsource such repetitive tasks to interns. Such projects, however, still are not scalable; after all, there are only so many interns you can hire, and interns aren’t always the cheapest resource.

Understanding the need for labeled data by automotive companies, a number of startups have emerged to provide training data as a service. MightyAI and CrowdFlower are probably the two best-known players in this space. They provide labeled data at scale and at far lower cost than automotive companies could achieve by completing such projects internally. They typically work by outsourcing the task of labeling to their networks of thousands of contractors living in other countries, with the Philippines as the go-to country for most such providers due to the low cost of labor. Some providers also combine the manual work of paid contractors with their own machine learning algorithms that can identify objects, as a way to label large data sets more cheaply and at greater scale.
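One plausible way to combine machine labeling with human contractors is to let a model pre-label everything and send only its low-confidence guesses to people. The sketch below illustrates that idea in Python. It is an assumption about how such a pipeline might be arranged, not a description of any named provider's system, and the `Prediction` type, the 0.9 threshold, and the sample data are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    """A model's proposed label for one image (hypothetical structure)."""
    image_id: str
    label: str
    confidence: float


def route(predictions: List[Prediction],
          threshold: float = 0.9) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted labels and ones needing human review."""
    auto_accepted, needs_human = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto_accepted.append(prediction)
        else:
            needs_human.append(prediction)
    return auto_accepted, needs_human


# Made-up model outputs for three images.
predictions = [
    Prediction("img_001", "car", 0.97),
    Prediction("img_002", "pedestrian", 0.62),
    Prediction("img_003", "traffic_light", 0.91),
]
auto_accepted, needs_human = route(predictions)
print(len(auto_accepted), "auto-accepted;", len(needs_human), "sent to contractors")
```

The threshold is the cost lever: raising it sends more images to paid contractors and improves accuracy, while lowering it cuts cost at the risk of more labeling errors.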

Training-data-as-a-service companies represent an interesting emerging market in the AV ecosystem. However, what is less clear about this market is the ability of such companies to become large, scalable businesses. This is because the likely trajectory of such companies is a race to the bottom: that is, the company that can label data the most cheaply will win. Of course, ensuring the accuracy of the labeled data will figure as another factor determining a player’s competitiveness, but the ultimate arbiter of success will come down to a company’s ability to optimize its business model to provide the cheapest solution.

3. AV Simulation Companies

To compensate for the practical limitations of real-world driving, automotive companies are currently using software simulators to train AVs. Simulators replicate in digital form the software and hardware components of AV technology, as well as the digital world in which such AVs operate. Simulators help engineers test the effectiveness of an AV, as well as optimize the system as a whole. You can think of software simulation as involving two main parts: 1) the training agent: the simulation of the AV you want to train; and 2) the environment: the simulation of the dynamic environment in which the training agent operates, including such elements as road textures, road geometries, and dynamic agents like other cars and pedestrians.

Cognata’s simulation engine.
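The split between training agent and environment can be illustrated with a toy simulation loop in Python. This is a deliberately simplified sketch, not how Cognata or any other product works: the environment is a single lead car on a straight road, the agent follows a made-up braking rule, and every number is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """The simulated world: here, just one lead car on a straight road."""
    lead_car_position: float = 50.0  # metres ahead of the agent's start (assumed)
    lead_car_speed: float = 10.0     # metres per second (assumed)

    def step(self, dt: float) -> None:
        self.lead_car_position += self.lead_car_speed * dt


@dataclass
class TrainingAgent:
    """The simulated AV under test, driven by a toy rule-based policy."""
    position: float = 0.0
    speed: float = 15.0  # metres per second (assumed)

    def act(self, gap: float, dt: float) -> None:
        # Toy policy: brake at 5 m/s^2 whenever the gap drops below 20 m.
        if gap < 20.0:
            self.speed = max(0.0, self.speed - 5.0 * dt)
        self.position += self.speed * dt


def run_simulation(steps: int = 200, dt: float = 0.1) -> float:
    """Run the loop and return the smallest gap observed, in metres."""
    env, agent = Environment(), TrainingAgent()
    closest = env.lead_car_position - agent.position
    for _ in range(steps):
        gap = env.lead_car_position - agent.position
        agent.act(gap, dt)   # the agent observes the gap and moves
        env.step(dt)         # the rest of the world advances
        closest = min(closest, env.lead_car_position - agent.position)
    return closest


closest_gap = run_simulation()
print(f"closest approach: {closest_gap:.2f} m")
```

A closest gap that stays positive means the toy agent never hit the lead car. A real simulator replaces both classes with far richer models, but the loop of observe, act, and advance the world is the same basic shape.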

Simulators are not new technology. Engineers across aerospace, robotics, and even automotive have used them for development and R&D for the last couple of decades to simulate how machines act in the real world.

What is “new” in this market, however, is the emergence of startups developing simulation products specifically tailored to the needs of autonomous driving. Notable companies include Cognata, Righthook, Monodrive, and AImotive. Such companies provide a holistic solution for AV simulation, covering both the AV undergoing training and the environment in which it operates. Other companies, such as Ansys, also provide engineering simulation solutions but sell their products across multiple verticals (automotive, aerospace, consumer products, and so forth) rather than tackling the automotive market alone.

Given the lack of mature simulation solutions historically and even still today, most automotive companies have ended up building AV simulators of their own, drawing on existing open-source and proprietary technologies. For example, an automotive company with an in-house simulation platform may use a game engine such as Unity or Unreal to model how moving objects would interact in a certain environment; it may use the OpenSCENARIO format to describe and run different scenarios in the simulation; and it may build its own software module for modeling the AV’s path and behavioral planning. At the end of the day, however, building simulators often falls far outside the core competency of automotive companies. Automotive companies are better suited to tackling the software and hardware that actually make it into the car than to building the software tools that enable such development. Hence automotive companies have begun to explore the simulation solutions provided by this new tide of startups.

Importantly, it will be difficult for AV-simulation startups to become big businesses by playing within the automotive ecosystem alone. Automotive companies have already built out a lot of their simulation software using open-source and proprietary solutions. As such, they are unlikely to seek holistic platform solutions from simulator startups; instead, they will purchase only certain software modules, in areas like scenario generation or the training environment, for which they do not already have sophisticated solutions. Coupled with the limited number of automotive OEMs, Tier-1s, and even Tier-2s that would potentially use such software, this makes the market opportunity for simulation startups extremely small. By my estimate, the market is no bigger than a few hundred million dollars.

Simulation startups bent solely on the autonomous driving problem therefore face a difficult battle ahead. A company may still be able to become a big business in this space, but it will have to tackle market verticals beyond automotive. Ansys is one multi-vertical simulation company well-positioned here. Further, companies that can provide the necessary technologies adjacent to the simulation market also have significant room to grow. Rescale is one such company. It provides high-performance computing infrastructure to run simulations for various applications across multiple industries. Beyond such market-agnostic solutions, opportunities within the AV simulation space are limited.

4. The Venture Capital Opportunity(?) in AV Training

Now that I’ve belabored the point of the importance of AV training, an important question remains. If you’re a startup founder itching for the next big thing, you might wonder whether you should start your new AV training company. On the flip side, if you’re a venture capitalist, you might consider whether investing in this market is a smart thing to do. Let me give you the most lawyerly answer I can conceive of: it depends.

For training-data-as-a-service companies, success will ultimately come down to how they optimize their business model and technology development toward the goal of becoming the cheapest solution. In this effort, they should never compromise on the quality of their labeled data, and they should develop technical mechanisms to ensure quality control.

Simulation startups must overcome their biggest hurdle, the limited market opportunity for simulation tools within automotive, by developing solutions that can serve multiple market verticals (e.g. automotive, aerospace, robotics, etc.). Some may also tackle problems adjacent to the simulation space, as Rescale does with its high-performance-computing solution.

While AV-training companies are the lesser-known players in the autonomous driving ecosystem, they will play an increasingly important role in the future of autonomous driving. How automotive companies perform in the race toward autonomous driving will depend not only on how well and quickly they can build such technologies but also on how well they buy from and partner with such companies, whether startups or established players. As for AV-training companies, the race is on!

Feel free to add me and message me via LinkedIn if you’re interested in talking shop:

Disclaimer: This blog represents solely the opinions of myself, not my employer.

Principal at BMW i Ventures. VC trends. AI themes. Social commentaries. A personal blog bridging tech, business, and human issues by a curious mind.