The Autobahn is a world-famous German system of motorways. From afar it is not possible to distinguish the Autobahn from motorway systems in other countries. However, the Autobahn is very different in one way: on most of the Autobahn there are no speed limits. Want to drive 160? 180? 240? Fine, then do that.
Obviously, doing away with strictly enforced speed limits increases the probability of more serious, more frequent and more deadly car crashes. In other words, in Germany we pay for freedom with lives. And as a result, there is constant pressure on the German government to introduce a speed limit.
One reason why it is so hard to settle the debate over whether to introduce speed limits is that no one really knows how costly, in terms of human lives, the lack of a speed limit is. No one did, that is, until recently, when a data journalist at Der Spiegel, a respected Hamburg-based news magazine, made an honest attempt to find out.[i]
Math from Hamburg
The article makes a big claim in its headline: speed limits could save up to 140 lives a year.[ii] 140 lives, in a country of around 80 million, is astonishing. If this were the true number, the policy would be a no-brainer. Unfortunately, it probably isn’t the true number. In fact, if it were, it would imply that 34% of all road deaths in 2017 could have been avoided, if only speed limits had been introduced. Which sounds – and probably is – too high.
The author of the article, Patrick Stotz, is clearly highly skilled. He carries out a complex programming exercise and presents his analysis to readers in a language that everyone, data-minded or not, can easily understand. More importantly, Stotz is completely transparent about his work. In a separate blog post[iii] Stotz provides more detail on his results, and on his Github page[iv] he shares all the data, for anyone – including me – to download and scrutinise.
A bit about the data
Stotz truly carries out some impressive data collection, matching and cleaning work. I am not entirely sure how he compiled this data (that is on me; I am not a programmer), but if it is accurate (and I have no reason to believe it is not), Stotz should be praised for creating a fantastic database of speed limits and road accidents.
The database contains 42,410 road segments (90% of them less than 1.1 km long). Each segment has information on the following variables:[v]
- speed limit (states the speed limit on the segment),
- total annual traffic (how many km were driven on the road in 2017),
- number of fatal accidents, and
- number of serious accidents.
The data is not complete for all roads and all deaths, although it comes close. It is also not clear whether there are biases in the dataset that would favour motorway deaths over rural-road deaths, or vice versa. These are of course very important issues, which should be addressed if possible. But for the sake of clarity I have decided to leave them aside in this note.[vi]
All in all, a great dataset that enriches the discussion on the speed limit.[vii]
How is the figure 140 calculated?
The first thing Stotz does in his calculation is split the road segments into two groups:
- segments without speed limit; and
- segments with speed limit.
For each group he then sums up, across segments:
- Total annual traffic (km driven), and
- Total fatal accidents.
Then, for each group of segments, he divides the total fatal accidents by the total annual traffic. The metric he gets out of that calculation is: deadly accidents per million km driven.
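The grouping-and-dividing step can be sketched in a few lines. The individual segment rows below are invented placeholders; I have only chosen them so that the group totals line up with the figures replicated in this note (190 deaths over roughly 114 million km without a limit, 52 deaths over roughly 56.5 million km with one).

```python
import pandas as pd

# Hypothetical segment rows mimicking the structure of the database.
# Individual values are made up; only the group totals are meaningful.
segments = pd.DataFrame({
    "has_limit":       [True, True, False, False],
    "traffic_mkm":     [30.0, 26.5, 60.0, 54.0],   # annual traffic, million km
    "fatal_accidents": [25, 27, 100, 90],
})

# Sum traffic and deaths per group, then divide to get the death rate.
totals = segments.groupby("has_limit")[["traffic_mkm", "fatal_accidents"]].sum()
totals["death_rate"] = totals["fatal_accidents"] / totals["traffic_mkm"]
```

With these placeholder totals, the rates come out at 0.92 (with limit) and 1.67 (without), matching Table 1 below.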
Table 1: Replicated results of the Spiegel article
| Segment | Fatal accidents, per million km driven |
| --- | --- |
| Segments with speed limits | 0.92 |
| Segments without speed limits | 1.67 |
Note: In the original calculation the rate for segments with speed limits was 0.95. This is most likely due to a difference in the treatment of unclear speed limits (such as “signal”): I removed any ambiguous speed limits, which Stotz probably did not. This is, however, not important in the context of my analysis.
In order to get to a counterfactual number of deaths (i.e. how many lives would have been lost had there been a speed limit), the article assumes that if speed limits were introduced, the death rate on segments currently without a speed limit would be the same as the death rate observed today on roads with speed limits.
In the dataset, there were a total of 242 deaths, 190 of which occurred on segments without a speed limit. The segments without a speed limit account for around 114 million kilometres of driving. So, to replicate Stotz’s calculation, I multiplied the 0.92 (the death rate on segments with a limit) by the total kilometres driven on segments without a limit, giving roughly 105 counterfactual deaths. The difference from the 190 actual deaths is 84.6.
84.6 is not the number of fatal accidents that would have been prevented had there been a speed limit in 2017. It is only the number of fatal accidents that would not have occurred had the fatality rate been the same on segments with and without a speed limit. That is an important distinction.
The 84.6 lives account for around 35% of all deaths recorded in the database. But the database contains only 242 fatal accidents, whereas the number of accidents (and total fatalities) in 2017 was higher. Stotz gets around that issue with a simple assumption of no bias in his data, i.e. that his sample is representative of all roads in 2017. He never states this, but it is intrinsic to how he arrives at the 140 number. Adopting his assumption, I can multiply my 35% figure (the change in deaths if a speed limit were introduced) by the actual death toll in 2017 (409) and get 143 lives saved, only two away from Stotz’s calculation.[viii]
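The whole chain of arithmetic fits in a few lines. I back out the traffic figure from the reported numbers (190 deaths at a rate of 1.67 per million km), so treat it as approximate rather than an exact value from the database; rounding means the results land near, not exactly on, the 84.6 and 143 quoted in the text.

```python
# Counterfactual sketch using the figures from this replication.
deaths_no_limit = 190
rate_with_limit = 0.92            # deaths per million km on limited segments
km_no_limit = 190 / 1.67          # ≈ 114 million km, backed out from the rate

counterfactual = rate_with_limit * km_no_limit   # deaths expected with a limit
avoided = deaths_no_limit - counterfactual       # ≈ 85 fewer fatal accidents
share = avoided / 242                            # ≈ 35% of recorded deaths
lives_saved = share * 409                        # ≈ 144, extrapolated to 2017
```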
What’s wrong with that approach
As with any calculation of such uncertainty, there are a number of areas that can be critiqued. However, the key issue I take with the calculation has to do with the comparator group (that is, the group used to calculate the counterfactual deaths), and with its implicit assumption of a speed limit well below 130 km/h.
That assumption crashes the 140 figure to its death, at 110 km/h.
What Stotz does in his calculation is aggregate all segments of the Autobahn that have a speed limit into a single comparator group, regardless of whether a segment has a limit of 130, 100 or 60 km/h. By aggregating all limited roads into a single group, Stotz dilutes the comparator group – effectively raising the denominator (millions of km driven) of the death rate on segments with a speed limit.
For the sake of clarity: since almost no fatal accidents occur at speeds under 90 km/h, this approach deflates the death rate unreasonably. In fact, it implies that the traffic-weighted average speed limit that would have to be imposed on the segments currently without a limit is around 110 km/h.[ix]
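The weighted-average claim is just a traffic-weighted mean of the limits in the comparator group. The limits and traffic weights below are invented placeholders, not values from the database; they only illustrate how a group dominated by 100–120 km/h roads averages out near 110 km/h.

```python
# Illustrative sketch of the traffic-weighted average speed limit
# in the comparator group. All numbers below are hypothetical.
limits  = [80, 100, 120, 130]        # km/h
traffic = [8.0, 17.0, 23.0, 8.5]     # million km driven (made up weights)

weighted_avg = sum(l * t for l, t in zip(limits, traffic)) / sum(traffic)
# ≈ 110 km/h with these weights
```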
Table 2: traffic, deadly accidents and death-rate, by speed limit.
The discussion in Germany is not about whether a speed limit of 110 km/h should be applied, but about whether a limit of 130 km/h should be put in place. Therefore, the reasonable comparator group is not all limited roads, including 80 km/h roads, but the roads that currently have a speed limit of 130 km/h – which are associated with a death rate of 1.53.
When I replicate Stotz’s calculation with the death rate of this more reasonable comparator, I get that 32 lives would have been spared, had the death rate on segments with no speed limit been the same as the death rate on roads with a 130 km/h limit.
That approach is, however, not without shortcomings. Firstly, the sample size on 130 km/h roads is small: they represent a small fraction of traffic as well as a small number of deaths (see table above). Alternatively, it might therefore make sense to use the 120 km/h roads, as they account for around 15% of traffic and 20% of deaths. Since the death rate on 120 km/h roads is relatively low, that calculation would give a much higher maximum estimate of 216 lives saved.[x] But now we are trading off realism and interpretation for sample size; the debate is not about setting the limit to 120 km/h, but 130 km/h.
My attempt to calibrate the value
Setting potential biases and intrinsic assumptions aside, another important issue in the calculation is the small sample size. The fact that the best comparator group (130 km/h roads) has only 15 recorded deaths prevents us from generalising from any number derived from a calculation with that figure in the numerator. It also makes the result extremely prone to small errors that might exist in the database.
As an example, my calculation using 130 km/h roads as the comparator group implied that 15 accidents (32 lives) could have been avoided, had the death rate on segments without a speed limit been the same as on 130 km/h segments. Now suppose there is an error in the database and 17 accidents in fact occurred on 130 km/h segments. That error alone would suffice to push the death rate of the 130 km/h segments above that of the segments without a speed limit, and turn the entire headline on its head: introducing speed limits could kill 17 people a year. Which is nonsense.
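The sign flip is easy to verify numerically. I back out the traffic on 130 km/h segments from the reported rate (15 deaths at 1.53 per million km, roughly 9.8 million km), which is an assumption on my part rather than a figure from the database.

```python
# Sensitivity sketch: two extra recorded deaths flip the conclusion.
km_130 = 15 / 1.53            # ≈ 9.8 million km, backed out from the rate
rate_no_limit = 1.67          # deaths per million km on unlimited segments

rate_15 = 15 / km_130         # = 1.53, below the unlimited-segment rate
rate_17 = 17 / km_130         # ≈ 1.73, above it: the headline reverses
```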
Therefore, in order to increase the sample size, I had to get creative. The database also contains information on serious accidents, so rather than looking only at the small number of deadly accidents, I decided to see what the figure would be if we looked at all serious accidents (deadly and non-deadly), effectively rephrasing the question to:
“what could the reduction in the rates of serious accidents be, if speed limits were to be introduced?”
For a number of reasons, the estimate from this exercise is still far from the ultimate truth (it is clearly not causally robust, and there is no shortage of omitted-variable bias). But with a much larger sample size, I think I can improve the accuracy of the number somewhat.
Table 3: serious (deadly and non-deadly) accidents
The answer to the question I posed earlier (“what could the reduction in the rates of serious accidents be, if speed limits were to be introduced?”) again depends. In this calculation, it turns out that serious accidents are more likely to occur on 120 km/h roads than on 130 km/h roads. Again, we are not dealing with very large sample sizes. However, it can be said that around 13-19% of serious accidents could have been avoided, had the accident rate on segments without a speed limit been the same as on roads with a 120 or 130 km/h limit. Allowing for a bit of assumption flexibility, that equates to somewhere in the range of 52-77 lives. Which is still a huge number and would suffice for an explosive headline.[xi]
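The jump from an accident-reduction range to a range of lives is a straight extrapolation against the 2017 death toll. The exact reduction shares below are my own back-calculation from the quoted 52-77 range, and the step assumes deadly and non-deadly serious accidents would fall at the same rate, as footnote [xi] discusses.

```python
# Extrapolation sketch: applying the ~13-19% serious-accident
# reduction range to the 2017 death toll of 409.
total_deaths_2017 = 409
low, high = 0.127, 0.188      # back-calculated reduction shares (assumption)

lives_low  = round(low * total_deaths_2017)    # ≈ 52
lives_high = round(high * total_deaths_2017)   # ≈ 77
```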
Table 4: reduction in serious accidents, by different speed limit serious accident rates.
Some final words
This article is not meant as an attack. Stotz deserves applause for his work; it is really good. Instead, it is written to help people – data journalists and readers of data journalism alike – understand the complexities that lie behind the statements they make and read. Being a data journalist is hard. Good news outlets, such as Der Spiegel, should invest more in supporting their data journalists. Hire a chief statistics officer for quality control. Get proof-readers, not for words but for stats.
Although I do not expect this article to go viral in Germany (especially in light of the fact that I am too lazy to translate it), I would still like to pull my safety brake and make my personal position clear: in principle, I am against introducing a speed limit. I think it is one of the many things that reflect the often unnoticed German free spirit. Just like the fact that there is a nude beach right across from the Cologne cathedral, in the middle of Germany’s fourth largest (and greatest) city. Or the low age limit (which is entirely ignored anyway) on alcohol purchases. Germans love their rules, and German bureaucracy is what the House that Sends You Mad (see Asterix and the 12 Tasks) is built on. But at its core, Germany is not a nanny state. And if some Bavarian wants to reduce his life expectancy by rocketing down the Autobahn, then that is his freedom.
At least that was my conviction, before I looked at this data.
The externality caused by fast drivers is substantial. The driver of the BMW pays only part of the cost (with his life) when he crashes into slow, innocent bystanders, who never agreed to decrease their life expectancy. This probably happens all the time and amounts to a transfer of value from careful people to people who like to live dangerously. The danger seekers get freedom, but at the cost of a higher probability of death for the careful folks. And that is not fair.
Which puts me on the fence.
Anyway, the dataset contains lots of information. Spiegel should keep working on it. They should try to understand what caused these accidents and look at the role speed played. They should slowly, but steadily, build more detail into the dataset. And even though they will never be able to get the details on thousands of road deaths, they could get them on hundreds. And that could really help inform policymakers. Because if you have to choose, good data is better than big data.
[i] The main reason is because it is an awful partisan debate. A bit like the American gun debate, but objectively not as dumb.
[v] The data contains more variables, but these are the ones that Stotz used in his calculation and that I will use in mine.
[vi] Although I could speculate on biases, I do not know enough about the data collection process to be in a position to make meaningful comments. But it would be interesting to hear Stotz’s thoughts, as he is obviously very close to the data.
[vii] The data is an aggregated cross-section for one year. In other words, there is no time dimension, which makes it hard to come up with a credible empirical strategy that can truly address causality.
[viii] When I replicated Stotz’s calculation I got 143 lives saved, a slightly higher number than he did. In his calculation, Stotz got 141 (before rounding in the headline, which I fully endorse!). I suspect the difference comes down to differences in our data-cleaning methods. However, a difference of two is nothing to lose sleep over.
[ix] I am being a little bit liberal with the language here, but to be clear the 110 km/h is the weighted average of the speed limits of the individual speed limits in the comparator group.
[x] This would imply that 62% of all lives lost on roads without a limit and on 130 km/h roads could have been saved, had there been a 120 km/h limit. Which is completely unbelievable, especially when looking at the relatively moderate death rates on German roads compared to other developed nations, all of which do have speed limits.
[xi] When extrapolating to the number of deaths, I am making a big assumption about the distribution of deadly and non-deadly accidents on 130 km/h roads and roads without speed limits. This could easily be challenged, and if there is a large difference in these rates, I could potentially be underestimating the lives spared.