Districts where residents have the highest average educational achievement tend to be the smaller ones. Staying true to the tradition of blogging, we’re stating this without having conducted any research whatsoever. Still, we’re confident in the assertion.
Oh, and did we mention that districts where residents have the lowest average educational achievement also tend to be the smaller ones? Yep. That’s right.
Eh? How does that work? Well, it’s a consequence of the higher variability of averages in smaller groups.
In a city like London, the average IQ will be close to the UK national average (probably about 100). Granted, for London, it may be a fraction higher, due to the likelihood that an international city is a draw for talent—but it won’t be much above the average.
However, imagine a picturesque hamlet where all nine residents have an IQ of exactly 100. A successful entrepreneur with a genius-level IQ (say 160) decides to build his dream house there and move in. The hamlet’s average IQ jumps to 106. Had the entrepreneur moved to London instead, the city’s average would have changed imperceptibly.
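For anyone who wants to check the hamlet arithmetic, here is a minimal sketch. The function name is ours, and the London population of nine million is an assumed round figure rather than an official count.

```python
def mean_after_newcomer(population, current_mean, newcomer_iq):
    """Return a group's mean IQ after one newcomer joins it."""
    total = population * current_mean + newcomer_iq
    return total / (population + 1)


# Nine residents at IQ 100, plus one entrepreneur at 160.
hamlet = mean_after_newcomer(9, 100, 160)

# Assumption: London has roughly nine million residents averaging 100.
london = mean_after_newcomer(9_000_000, 100, 160)

print(hamlet)                    # 106.0
print(f"{london - 100:.8f}")     # the shift in London's average, a tiny fraction of a point
```

The same newcomer moves the hamlet's average by six points and London's by a few millionths of a point.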
Relying on averages alone is misleading. We also need to consider sample size.
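A rough simulation makes the point about sample size. This is only a sketch under assumptions of our own: every resident's IQ is drawn from the same normal distribution (mean 100, standard deviation 15, the usual IQ scaling), and the district sizes of 10 and 1,000 are arbitrary.

```python
import random
import statistics


def spread_of_district_means(district_size, n_districts=500, seed=0):
    """Simulate many districts of one size; return the std dev of their mean IQs."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_districts):
        # Assumption: each resident's IQ is an independent draw from N(100, 15).
        residents = [rng.gauss(100, 15) for _ in range(district_size)]
        means.append(statistics.fmean(residents))
    return statistics.stdev(means)


small_spread = spread_of_district_means(10)
large_spread = spread_of_district_means(1000)

print(f"spread of averages, districts of 10:   {small_spread:.2f}")
print(f"spread of averages, districts of 1000: {large_spread:.2f}")
```

Everyone comes from the same population, yet the averages of the small districts scatter far more widely than those of the large ones. That is why small districts turn up at both the top and the bottom of the league table.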
Let’s take another example. Imagine there’s a software development project that has three components—a database, a server application and an iPhone application. All three components are essential parts of the overall system.
Each component is assigned to a separate team and all are asked for estimates of how long their projects will take to complete. For the sake of simplicity, we’ll assume that they all say eight weeks—which we’ll interpret as being a 50% chance that the component will be completed within eight weeks. So, there’s a 50% chance that the entire project will be completed within eight weeks, right?
Well, that’s how it would probably be reported by many project managers, but it’s wrong. In fact, there’s only a 12.5% chance that the project will be completed within eight weeks. All three sub-projects have to go well for the project to deliver on time and, assuming the teams’ outcomes are independent of one another, the chance of that is 0.5 × 0.5 × 0.5 = 0.125. So, it’s highly likely that the project will miss its deadline. It’s impossible to say by how much, as the sub-project estimates are single-point estimates rather than (more realistic) distributions, but that’s a topic for another time.
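The 12.5% figure is a product of probabilities, which a few lines of Python can confirm. The sketch assumes the three teams' outcomes are independent, which the example implies but never states; the function name is ours.

```python
import math


def chance_all_on_time(component_chances):
    """Probability that every component finishes on time.

    Assumption: the components succeed or fail independently of one another.
    """
    return math.prod(component_chances)


# Database, server application and iPhone application, each at 50%.
project_chance = chance_all_on_time([0.5, 0.5, 0.5])
print(project_chance)  # 0.125
```

If the teams' fortunes are linked (say, they all depend on the same late specification), the true figure will differ, but it cannot be higher than the 50% chance of any single component.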
Clearly, these are fairly simple examples. But this kind of “average” thinking is going on every day. And, as the importance of data as a decision-making tool grows, sloppy analysis is going to increasingly undermine the value of good data.