Archive for January, 2008
This is a summary of my more detailed post on Friday, outlining the key points.
- The DfT surveyed traffic in June and July.
- They waited until September to ensure no late corrections to the data.
- By this time many of the bikers surveyed had taken their bikes off the road for the winter.
- At this point the DfT checked the figures against the DVLA’s VED database.
- The bikes which had been removed from the road were mistakenly assumed to be evading tax.
- The error was then amplified by a “corrective” assumption: that tax dodgers use their bikes less, are therefore more likely to be missed by the survey, and so the number of evaders is underestimated. This one step doubled the number of bikers assumed to be riding without tax. It did so by assuming that bike mileage would relate to evasion in the same way as it does for trucks (the only category of vehicle they have stats for). That ignores the important fact that the average motorcycle covers far fewer miles than the average truck. The bikes you see most often at a survey site are therefore not necessarily those travelling furthest, as they would be for trucks, but are much more likely to belong to people who simply live nearby. This undermines the assumption that the survey sees lower than actual levels of evasion (because tax dodgers travel less far), as only a very few of the people on bikes are travelling large distances.
- Because of the small number of motorcyclists surveyed, the DfT’s own figures show that the margin for error would be at least 20% either way even if the incorrect assumptions were to have been true.
What does this mean?
These are quite major flaws in the methodology of the survey and (I think) blow apart the reported figures. The headline figure was extrapolated from an “observed” figure of only 16%, on the basis that more tax-evading motorcyclists would have been missed because they don’t travel as far, which I’ve shown above is almost certainly a flawed assumption. If only around 10% of the riders had SORNed their bike at the end of August or during early September, most of the “untaxed” bikers would disappear from the stats. Add to that a 20% margin of error because of the small survey size, and the figures may well be comparable with the rates for cars. A precise figure is going to be very difficult to arrive at, as no-one currently has the data needed to quantify the errors more precisely.
Whose fault is it?
The mathematics used in the statistical modelling was all applied correctly. The errors arose because of mistaken assumptions about how motorbikes are used and would probably have been spotted if a single representative of the motorcycling community had been consulted at the design stage of the survey. What probably should have been spotted is the ridiculously high figure of 38% evasion, which should, I believe, have raised alarm bells. I suspect that this is why Southampton University were asked to double check the result, but they only checked the statistical techniques used, and did not carry out an assessment of the way the VED data had been obtained nor of the validity of the underlying assumptions.
So the blame for all of this lies with whoever designed the survey and data processing methodology, and not with anyone who actually carried it out.
I think at the very least all bikers are owed an apology from Edward Leigh MP, of the Public Accounts Committee, for his intemperate remarks. And another apology is due, I feel, from the DfT, for managing to balls up the figures in quite such a spectacular fashion.
You probably saw on the news the shocking headline that 40% of motorcycle users are riding around on untaxed bikes. The head of the House of Commons Public Accounts Committee (PAC) (download pdf of the PAC’s Vehicle Excise Duty report here), Edward Leigh, went as far as to comment:
Large parts of the biking community are cocking a snook at the law.
Which would be fine if it bore any resemblance to reality. Anyone who is actually a part of the biking community will have probably been scratching their heads trying to work out who these evaders might be, as have our friends at MAG – they say:
Anecdotal visual studies carried out by the group at motorcycle events do not reflect anything remotely like this level of non compliance. (MAG article)
So what’s going on?
Well, a press release by the Motor Cycle Industry Association (MCIA) (download it in Word format here) questions some of the methodologies used by the government statisticians. David Taylor, head of the MCIA, says:
We are expected to believe that motorcycle VED evasion rose by 47 per cent from an already highly unlikely figure the previous year. Common sense suggests that the estimate of nearly 40 per cent is wildly inaccurate, or they would surely be very easy to catch.
That seems obvious enough, but what isn’t at all obvious is the methodology and statistics adopted by the Department for Transport (DfT), who commissioned the traffic survey, the private company that carried out the survey, National Statistics who analysed it, DVLA who provided large amounts of data and the Commons PAC (the group of MPs who have to interpret all this), who caused all the fuss.
Now there’s no requirement to be trained in statistics if you are an MP on the PAC, so despite Mr Leigh’s intemperate outbursts against bikers, we can’t blame him or the rest of the committee for taking the DfT report at face value.
When you take a deeper look at the DfT report and the National Statistics report that underlies it, though, it’s obvious that there are some big assumptions regarding the data and the statistical techniques that have been carried out on it.
This might get quite heavy, but bear with me – there’s not too much maths.
Now I’m assuming that not many people reading this have any kind of statistics qualification, so I’ll try to summarise what I’ve been able to work out without using too much maths. Unfortunately, the relevant bodies above haven’t been that kind, so if anyone is better at this kind of maths than me, here are the original documents. Feel free to post comments:
1. National Statistics VED stats 2006
2. National Statistics / DfT report of above VED stats (pdf)
3. Statistical Review of the VED figures by Southampton University
In what follows I’ll refer to these documents by the number I’ve given each one.
Looking at the full published figures, you will find hidden away in an appendix a list of confidence ranges for the final estimate of the percentage of vehicles with no VED (road tax) (1, appendix E7 / Table 18). Confidence range is a statistical term, but it’s not that hard to grasp, using the figure in question as an example. The evasion rate recorded, 37.8 percent, is only an estimate. What the data collected allow them to say is that, assuming all their prior assumptions are true, they are 95% confident the true figure lies somewhere between 29.9% and 45.7%. This is a truly massive margin for error (for comparison, the equivalent 95% confidence figures for cars are 4.0%-4.6%) and shows that even in the best case scenario the margin for error in the bike figures is running at around ±20% of the estimate (about ±8 percentage points).
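To make the size of that margin concrete, here is a minimal sketch of the arithmetic, using only the interval endpoints quoted above. The function name is my own, and treating the midpoint of each interval as the estimate is a simplification.

```python
def relative_margin(lower, upper):
    """Return (midpoint, half_width, half_width / midpoint) for an interval.

    Inputs are percentages; the third value expresses the margin as a
    fraction of the central estimate.
    """
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    return midpoint, half_width, half_width / midpoint


# 95% confidence ranges quoted in the National Statistics figures
bikes = relative_margin(29.9, 45.7)
cars = relative_margin(4.0, 4.6)

for label, (mid, half, rel) in (("bikes", bikes), ("cars", cars)):
    print(f"{label}: {mid:.1f}% +/- {half:.1f} points ({rel:.0%} of the estimate)")
```

For bikes the margin works out at roughly a fifth of the estimate; for cars it is well under a tenth.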
Why is this figure so massive? Well, there are a few reasons, but it mainly comes down to the fact that the number of bikes counted in the survey was much, much smaller than the number of cars. For every bike they counted, they saw 110 cars (1, Table 10). Whenever you have a small sample, your uncertainty will be larger. Small uncertainties in a sample also tend to balloon when you perform other operations on that sample, as you introduce extra uncertainties which multiply through.
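As a rough illustration of why sample size matters so much, the sketch below uses the textbook standard error for a proportion from a simple random sample. This is not the DfT's actual model: the sample size and evasion rate are made-up values, and only the 110-to-1 ratio comes from the survey.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


CARS_PER_BIKE = 110   # from the survey: 110 cars seen for every bike
n_bikes = 1_000       # hypothetical number of bike sightings
n_cars = n_bikes * CARS_PER_BIKE
p = 0.10              # hypothetical evasion rate, same for both groups

se_bikes = standard_error(p, n_bikes)
se_cars = standard_error(p, n_cars)

# With the same underlying rate, the uncertainty scales with 1 / sqrt(n),
# so the bike figure is about sqrt(110) times less certain than the car figure.
print(f"bike uncertainty is {se_bikes / se_cars:.1f}x the car uncertainty")
```

Whatever values are chosen for the hypothetical sample size and rate, the ratio of the two uncertainties stays at the square root of 110, a little over ten.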
As I mentioned in passing above, all of these error margins are “best case scenarios” and rely very much on the assumptions made by the method used to derive the figures. If incorrect estimates are made because of these assumptions, the final figures will be seriously distorted.
For this survey I believe that the assumptions made are in many cases entirely wrong, and I believe this has played a large part in inflating the figures. And although the figures have been independently checked by statisticians, those doing the checking have done so on the basis that these assumptions are true (3), as they rightly state at the start of the analysis.
Assumption (b) is a crucial one. In statistical language it is this:
the observed sample of vehicles sighted in the Roadside Traffic Observation Survey is a simple random sample with replacement of the registered vehicles (3, p13)
This is not immediately obvious, and uses technical terms, but what it means is this. Every time the person at the side of the road takes a measurement, the chance of seeing a particular vehicle pass by is the same as if he were picking registration numbers at random from a massive lottery machine containing a single ball for each vehicle in Britain. This would be justifiable if you were recording the traffic on every road in Britain, but in practice, with only 249 sites, this method has the potential to be badly skewed. Anyone who happens to live near one of the sampling sites has a much higher probability of being ‘picked’ (and probably picked multiple times at that) than someone who never passes by.
What this means is that the selection of sample sites is going to have a large bearing on what is recorded. The sample sites chosen represent 1 of each kind of road (as defined by the DfT) per police force area (49 of those) with London getting an extra 3 of each. Motorways are sampled by local government region, and fewer of those are picked.
How have they selected which roads to measure? Well they left that “to the discretion of contractors” who had to reach a set minimum number of vehicle sightings at each location. We can guess they probably chose fairly busy examples of each type of road. It seems likely that the type of person who is going to dodge road tax is more likely to be of a lower social class than average, and live in a scummier area, probably closer than average to a busy road. This is supposition, but it is reasonable, and there is nothing in the results the DfT have presented to suggest that this kind of sampling bias has been effectively eliminated.
Another assumption made by the survey is stated quite clearly in (3):
One of the most important assumptions in the model is that the average number of sightings of a given vehicle is proportional to its mileage. This hypothesis is not testable from the survey data itself because the mileage of individual vehicles is not directly observed through the survey process. However, the first time that this working assumption was adopted – see §4 in Appendix C of (Department of Transport, 1984) – a postal survey of the keepers of heavy goods vehicles was used to test the adequacy of this hypothesis. Given that this research was carried out some time ago and for a limited sample of vehicles in a single tax class, the Department for Transport should investigate whether alternative data sources exist, or could be obtained, which could be used to re-examine the validity of this crucial assumption.
Or, put differently, it seems unlikely that motorcycles on the road today are being used in the same way as trucks were used in 1984! Estimated mileages for bikes are therefore likely to be way out of kilter with actual figures.
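A toy example shows how sightings at a single fixed site can part company with mileage. Both riders and all of their numbers are invented purely for illustration; nothing here comes from the survey data.

```python
# Hypothetical riders: (label, annual miles, passes of one fixed survey site per year)
riders = [
    ("local commuter", 2_000, 400),
    ("long-distance tourer", 10_000, 2),
]


def shares(values):
    """Return each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]


# What the model assumes: sightings are proportional to mileage
mileage_shares = shares([miles for _, miles, _ in riders])
# What happens at this site: sightings are proportional to passes
sighting_shares = shares([passes for _, _, passes in riders])

for (label, _, _), m, s in zip(riders, mileage_shares, sighting_shares):
    print(f"{label}: {m:.0%} of mileage, {s:.0%} of sightings")
```

In this made-up case the commuter covers a sixth of the combined mileage but accounts for nearly all of the sightings, which is the opposite of what a mileage-proportional model would expect.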
All of these assumptions seem likely to overestimate the proportion of bikes appearing to evade road tax, but there is potentially a far bigger problem, which is not even mentioned in any of the DfT documentation – bikes are much more likely than the average vehicle to be taken off the road.
Many bikers, as we all know (but perhaps the DfT doesn’t) are fairweather bikers. Many more people own a bike but might, like my Dad, keep it locked in a garage for years at a time. Many of these people will have notified the DVLA that the bike is stored off-road via a SORN form, but I would guess that a lot of people don’t. We probably all know someone with a bike in their garage that they probably didn’t use at all last year.
Notice there are two different figures for motorbikes without VED. The figures are 16% of motorbikes in traffic, and 37.8% of vehicle stock (i.e. all bikes with a reg number). Why is the second figure more than double? The logic runs like this. The vehicles they spot on the road tend to be those that travel more miles, so they will have not counted lots of vehicles that have a fairly low mileage. Because they also assume that untaxed vehicles have lower average mileages, they apply a corrective figure.
What they don’t seem to have taken account of, though, is that at any given time, a very large number of motorcycles are sat in a garage not being used for months at a time. Hence, for those bikes with a mileage of 0, they may be estimating high levels of tax evasion!
If the assumptions about motorbike usage are corrected, we might find that this doubling effect will vanish.
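To see how a mileage correction can turn 16% into 37.8%, here is a deliberately simplified two-group sketch: taxed and untaxed bikes, each with a single average mileage, and sightings proportional to mileage. This is my own simplification, not the model the DfT used, and the function names are hypothetical.

```python
def stock_share(traffic_share, mileage_ratio):
    """Untaxed share of vehicle stock implied by the untaxed share of traffic.

    mileage_ratio is (average untaxed mileage) / (average taxed mileage).
    Each group's sightings are divided by its relative mileage to recover
    relative vehicle counts.
    """
    untaxed_weight = traffic_share / mileage_ratio
    taxed_weight = 1 - traffic_share
    return untaxed_weight / (untaxed_weight + taxed_weight)


def implied_mileage_ratio(traffic_share, stock_share_value):
    """Mileage ratio that would reconcile a traffic share with a stock share."""
    traffic_odds = traffic_share / (1 - traffic_share)
    stock_odds = stock_share_value / (1 - stock_share_value)
    return traffic_odds / stock_odds


ratio = implied_mileage_ratio(0.16, 0.378)
print(f"implied untaxed/taxed mileage ratio: {ratio:.2f}")
print(f"stock share at that ratio: {stock_share(0.16, ratio):.1%}")
print(f"stock share if mileages are equal: {stock_share(0.16, 1.0):.1%}")
```

Under this simplification, the published pair of figures implies that untaxed bikes cover a little under a third of the mileage of taxed ones. If the two groups actually cover similar mileages, the stock figure falls back to the 16% seen in traffic.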
But there is one last doubt I have about these figures, one that might potentially show massive levels of VED evasion where very little exists.
It’s to do with the way the survey has been carried out and the figures derived.
All of the survey data was collected in June and July of 2006, a time of the year when a lot of bikes are on the road. At some point the registration numbers recorded were checked against the DVLA’s VED database. The survey results were first published on January 25th 2007. Following that a review was carried out on the figures, and the PAC finally got round to studying them just a few days ago.
The crucial question is this – when were the data compared against the DVLA database, and what figures did they use?
The survey contractors almost certainly didn’t have the means to check the DVLA database in real time, and probably didn’t have access to the data anyway. It seems likely that they would have returned all the data at once, at the conclusion of the survey period, though they probably carried out their own checks on it before they did. So it seems likely that the check against DVLA records took place some time between July 06 and January 07. It then took the DfT and National Statistics a further six months to process the data. They probably had a lot of validation and checking work to do, but exactly what, and in what order, we do not know.
So consider the following scenario. I get my bike out of storage on March 1st and tax it for six months. During June I ride by one of the government surveys and am counted as “on the road” and “in traffic”. Come August 1st the bike is back in the garage and SORNed for the winter. If the DfT didn’t get round to checking the VED database until September I’ll probably be recorded as not being taxed.
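The scenario boils down to checking the same bike's tax status on two different dates. In the sketch below the tax period follows the example above, while the exact sighting and check days are my own placeholders (only the months are suggested by the text), and the function is hypothetical.

```python
from datetime import date


def taxed_on(day, tax_start, tax_end):
    """True if the bike's tax was in force on the given day (inclusive)."""
    return tax_start <= day <= tax_end


# Bike taxed from March 1st, then SORNed on August 1st (tax surrendered)
tax_start = date(2006, 3, 1)
tax_end = date(2006, 7, 31)

sighting_day = date(2006, 6, 15)   # assumed day within the June survey
check_day = date(2006, 9, 15)      # assumed day for the database check

status_when_seen = taxed_on(sighting_day, tax_start, tax_end)
status_when_checked = taxed_on(check_day, tax_start, tax_end)

print(f"taxed when seen on the road: {status_when_seen}")
print(f"taxed when the database was checked: {status_when_checked}")
```

The bike was legal when it was counted, but a check that only looks at status on the later date would record it as untaxed.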
Did this happen? I don’t know. But realistically, a huge number of bikers are going to put their bikes in storage through the winter and claim a VED refund by way of SORN, so if it did, the effect could be a huge overestimate of the number of bikers dodging tax.
Again I don’t know if this has happened, but I have asked National Statistics for more detail on how the VED status was derived and when (if) they reply, I will post here.
Well, this is a mammoth post already, but we’ve reached the end. There are other areas where an unintended bias may have crept into the sample, but we’ve considered what I suspect are the biggest sources of potential error.
The bottom line is that I think it is extremely unlikely that as many as 40% of bikers are evading VED and what this shows more than anything is the danger of placing total faith in your statistics when the underlying assumptions are not a realistic model of the situation you are trying to assess, as well as the difficulty of designing a survey.
And the motorbike insurance angle, of course, is that all of these people without tax presumably have no insurance. Given the prevalence of Automatic Number Plate Recognition (ANPR) vehicles run by the police forces now, you’d think they would have noticed if 40% of the motorbikes going by were uninsured, wouldn’t you?
I’ve received my reply from National Statistics to my query about when the survey data was checked against the DVLA database.
Here is the text of the email:
Thank you for your enquiry.
The data for the VED survey in June 2006 was checked against the DVLA system in September 2006 to allow for late updates to be made.
So, as I guessed above, anyone who SORNed their taxed bike before the September check will have been assumed to be dodging tax. This is going to have had a massive impact on the figures as reported in the press, and the true figures for untaxed riding are likely to be much smaller than the ones the government has taken at face value.