Waymo Safety Impact

It may seem like the miles driven by Waymo (hundreds of millions of miles) pale in comparison to the billions of miles driven in the cities where Waymo drives, or the trillions of miles driven annually in the entire United States. When comparing the rates of two populations, however, the conclusions you can draw from data are governed by what is called statistical power. The question being answered by the Safety Impact Data Hub is: are the Waymo and benchmark crash rates different? The inputs to this calculation are the number of crashes and the number of miles driven by the Waymo and benchmark populations. The crash counts are modeled using a Poisson distribution, the most common distribution for handling count data.

An example of this problem would be to examine the number of students who do not pass an exam. In a school district, say that 300 out of 1,000 students who take the same test do not pass (3 do not pass per 10 test takers). One could ask whether Class A, with 20 students, performed differently than the overall population on this test (note we are assuming passing or not passing the test is independent of being in Class A for the sake of this simplified example). Say Class A had 10 out of 20 students who did not pass the exam (5 do not pass per 10 test takers). Class A had a not-pass rate that is nearly double the rate of the school district. When we use a Poisson confidence interval, however, the rate of not passing in the class of 20 is not statistically different from the school district average at the 95% confidence level. If we instead compare Class A to the entire state of 100,000 students (with the same rate of 3 not passing per 10 test takers, or 30,000 out of 100,000), the 95% confidence intervals of this comparison are almost identical to those of the comparison to the school district (300 out of 1,000 test takers). This means that for this comparison, the uncertainty from the small number of observations in Class A (only 20 students) is much larger than the uncertainty in the larger population. Take another class, Class B, that had only 1 out of 20 students not pass the test (0.5 do not pass per 10 test takers). When applying the 95% confidence intervals, Class B does have a statistically different not-pass rate from the school district average (as well as from the state average).
This example shows that when comparing rates of events in two populations where one population is much larger than the other (measured by test takers, or miles driven), the two things that drive statistical significance are: (a) the number of observations in the smaller population (more observations = significance sooner) and (b) bigger differences in the rates of occurrence (bigger difference = significance sooner).
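
The school example can be checked with a short sketch. This is a minimal illustration and not necessarily the calculation used by the Safety Impact Data Hub: it assumes an exact (Garwood) Poisson interval on each class's count of students who did not pass, and it treats the school district rate of 0.3 as a fixed value with no uncertainty of its own. The function names (poisson_cdf, poisson_exact_ci, differs_from_benchmark) are hypothetical helpers written for this example.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def _bisect(pred, lo, hi, iters=200):
    """Locate the point where pred flips from True (at lo) to False (at hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_exact_ci(k, alpha=0.05):
    """Exact two-sided confidence interval for a Poisson mean, given count k."""
    search_max = k + 10 * math.sqrt(k + 1) + 10  # generous upper search limit
    # Lower bound: the mean at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda mu: 1 - poisson_cdf(k - 1, mu) < alpha / 2, 0.0, search_max)
    # Upper bound: the mean at which P(X <= k) equals alpha / 2.
    upper = _bisect(lambda mu: poisson_cdf(k, mu) > alpha / 2, 0.0, search_max)
    return lower, upper

def differs_from_benchmark(not_passed, class_size, benchmark_rate):
    """True if the benchmark rate lies outside the class's 95% rate interval.

    Assumption: the benchmark rate is treated as known exactly.
    """
    lo, hi = poisson_exact_ci(not_passed)
    return not (lo / class_size <= benchmark_rate <= hi / class_size)

district_rate = 300 / 1000
for name, not_passed in (("Class A", 10), ("Class B", 1)):
    lo, hi = poisson_exact_ci(not_passed)
    verdict = differs_from_benchmark(not_passed, 20, district_rate)
    print(f"{name}: rate interval {lo / 20:.3f} to {hi / 20:.3f}, "
          f"differs from district: {verdict}")
```

Because the benchmark is held fixed here, the same result applies whether the benchmark is the school district or the state; only the class's own count and the size of the gap determine the outcome.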

Now consider another experiment with Waymo data. Consider the figure below, which keeps the number of Waymo crashes with an airbag deployment in any vehicle (34) and the Waymo VMT (71.1 million miles) constant while assuming different orders of magnitude of miles driven in the human benchmark population (benchmark rate of 1.649 incidents per million miles with 17.8 billion miles traveled). The point estimate is that Waymo has 71% fewer of these crashes than the benchmark. The confidence intervals (also sometimes called error bars) show the uncertainty in this reduction at a 95% confidence level (95% confidence is the standard in most statistical testing). If the error bars do not cross 0%, that means that from a statistical standpoint we are 95% confident the result is not due to chance, which we also refer to as statistical significance. This “simulation” shows the effect on statistical significance of varying the VMT of the benchmark population. This comparison would be statistically significant even if the benchmark population had fewer miles driven than the Waymo population (10 million miles). Furthermore, as long as the human benchmark has more than 100 million miles, there is almost no discernible difference in the confidence intervals of the comparison. This means that comparisons in large US cities (based on billions of miles) are no different from a statistical perspective than a comparison to the entire US annual driving (trillions of miles). As in the school test example, Waymo has driven enough miles (tens to hundreds of millions of miles) and the reductions are large enough (70%-90% reductions) that statistical significance can be achieved.
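
The effect of benchmark size can be sketched numerically. The sketch below is an assumption-laden illustration and not necessarily the method behind the figure: it uses a normal approximation on the log of the rate ratio, with standard error sqrt(1/waymo_crashes + 1/benchmark_crashes), and it derives the benchmark crash count as rate times miles. The function name reduction_with_ci and the list of benchmark mileages (including 3,000,000 million miles as a stand-in for trillions of miles) are hypothetical choices for this example.

```python
import math
from statistics import NormalDist

WAYMO_CRASHES = 34       # airbag-deployment crashes, from the text
WAYMO_MILES_M = 71.1     # Waymo miles driven, in millions, from the text
BENCHMARK_RATE = 1.649   # benchmark incidents per million miles, from the text

def reduction_with_ci(benchmark_miles_m, conf=0.95):
    """Return (point, low, high) percent reduction versus the benchmark.

    Assumption: log-normal (Wald) interval on the rate ratio.
    """
    # Assumption: expected benchmark crash count is rate x miles.
    benchmark_crashes = BENCHMARK_RATE * benchmark_miles_m
    rate_ratio = (WAYMO_CRASHES / WAYMO_MILES_M) / BENCHMARK_RATE
    # The benchmark term 1 / benchmark_crashes shrinks toward zero as miles grow.
    se_log = math.sqrt(1 / WAYMO_CRASHES + 1 / benchmark_crashes)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    ratio_low = rate_ratio * math.exp(-z * se_log)
    ratio_high = rate_ratio * math.exp(z * se_log)
    # The highest plausible ratio corresponds to the smallest reduction.
    return tuple(100 * (1 - r) for r in (rate_ratio, ratio_high, ratio_low))

for miles_m in (10, 100, 1_000, 17_800, 3_000_000):
    point, low, high = reduction_with_ci(miles_m)
    print(f"benchmark {miles_m:>9,} M miles: {point:.0f}% fewer "
          f"(95% CI {low:.0f}% to {high:.0f}%)")
```

The standard error shows why the interval stops narrowing: once the benchmark crash count is large, 1/benchmark_crashes is negligible next to 1/34, so the Waymo crash count alone sets the width of the error bars.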