Almost as soon as you start aggregating numbers, you start making cognitive mistakes. For instance, look at these two scenarios.
1. Women are roughly 50% of the population, yet they are only 10% of your workforce. Is some sort of management intervention necessary?
2. Your manufacturing plant has a robotic process that has been stable and measured for many years. Last week it deviated outside the 3-sigma range. Is some sort of management intervention necessary?
In the manufacturing example, we have a defined set of inputs, a stable, limited-variable process, and a defined way to measure output. Yes, something is going on. As managers, let’s take action based on the math.
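To make the manufacturing case concrete, here's a minimal sketch of that 3-sigma check in Python. The readings and the `out_of_control` helper are hypothetical, and real statistical process control uses proper control charts built from many subgroups, but the decision rule is the same: a stable history defines the limits, and last week's point either falls inside them or it doesn't.

```python
import statistics

def out_of_control(history, measurement, sigmas=3.0):
    """Flag a measurement that falls outside the historical n-sigma band."""
    mean = statistics.fmean(history)
    sigma = statistics.stdev(history)
    lower, upper = mean - sigmas * sigma, mean + sigmas * sigma
    return not (lower <= measurement <= upper)

# Hypothetical readings from the stable robotic process, then last
# week's deviant measurement.
baseline = [10.01, 9.98, 10.02, 10.00, 9.99, 10.01, 10.00, 9.97, 10.03, 10.00]
print(out_of_control(baseline, 10.90))  # True: the math says act
```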
In the first example, we are asked to reason by correlation and analogy. Because something occurs at one rate in one place, we are asked whether a similar thing should occur somewhere else. No, the math does not say anything with certainty either way. Sure, you might have strong moral feelings about it, and you should definitely act on them, but from a measurement standpoint there is simply nothing there to guide you. As managers, if we take action we must be clear that we are taking it based on something besides math: perhaps intuition, or our best judgment of how a workforce should look statistically. (These can be very good reasons to take action.)
Yet we persist in treating both of these scenarios exactly the same way. Somebody presents us with numbers, and asks us to decide. After all, they’re both just statistics, right?
In the Monty Hall problem, switching your initial choice actually improves your odds of winning, something that is totally counterintuitive to most people. The history of statistics is full of stories like this. When the Monty Hall problem first appeared in Parade magazine, over 10,000 readers, roughly 1,000 of them PhDs, wrote in insisting that the mathematically correct answer was in fact incorrect.
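If you doubt the answer, simulation settles it faster than argument. Here's a quick Monte Carlo sketch (mine, not from the original column) that plays the game many times with and without switching:

```python
import random

def play(switch):
    """One round: three doors, one car; the host opens a goat door."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
for switch in (False, True):
    wins = sum(play(switch) for _ in range(trials))
    print(f"switch={switch}: win rate ~ {wins / trials:.3f}")
# Staying wins about 1/3 of the time; switching wins about 2/3.
```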
People do very badly with statistics. This has not gotten any better over time. And it impacts a hell of a lot more than just math problems in Sunday magazines.
I spent four hours with the Monty Hall problem the first time I saw it. I finally realized you should always switch, but I was still uncomfortable with the answer. Others seem to find the answer quite easily. Likewise, there are mistakes people make with statistics that I seem fairly good at pointing out, while others struggle. I have a high aptitude for math, so my inclination is to believe that different types of problems engage different emotional centers of the brain in different people. Not sure. It would be interesting to see a psychological study of some of these problems framed in various ways for different audiences. I probably shouldn’t hold my breath, though. About 20% of psychology studies that have been examined by mathematicians show serious errors in, you guessed it, statistics.
One of the reasons I led this piece with a political-type example is that this kind of reasoning is common in politics, and we're all familiar with hearing political statistics. Lots of folks play fast and loose with statistics to make political points. If I told you the United States has lost most of its manufacturing jobs, is that a problem? What if I told you the United States manufactures more than any other country, but manages to do so with remarkably few people? (Much like how the U.S. produces the most agricultural goods while using very few people to do so.) Would you still think it's a problem? You could argue it either way, of course, but the point is that the same observable reality can be presented in various ways, thereby slanting the story. As the old saying goes, there are lies, damned lies, and statistics.
Yet we are stuck with them. In business, any time we have to make decisions inside a large organization, we are going to be presented with statistics. 91% of people who visit our website come back. Is that a great number? Sure! Is there anything we can do with it? Not really. Watch it as we continue to change things; that's about it. The number itself gives you zero information about causation, which is really what matters when you're running a business. It just shows a great aggregate metric. Most businesses would assume that the combination of things they are doing creates the metric, but the reality is that it's the things the business does plus the unique situation of every user. There's a lot in that number that we don't know. In fact, the hard number "91" actually gives us a sense of security that is not warranted at all (without more information, of course).
Facebook made money because their team was able to generalize huge pieces of the way most users' brains work and combine them in such a way as to make a sticky app. Aside from the delays caused by switching costs, if any piece of this generalized model proves fragile, another model will replace it. People say Facebook is a great app because the site's stickiness is good, but that's backwards: the site's stickiness is good because it's a great app. "Stickiness" is an aggregate number; it represents the result of the quality of the app. It's a result, not a cause. The statistic shows you some kind of vague, generalized, mashed-up result. It never gives you causes.
So when people start throwing statistics around, be very careful about what kinds of assumptions and leaps of faith you are being required to make. Statistics are terrible at providing insight, although they might be terrific in terms of feedback. "Sales are up 10%" is good feedback that things in general are better than before, but it tells you absolutely nothing about what changed in the world to cause the increase. And of course, that's the most important information to know!
Website designers have had this problem for years. You put up a site, instrument it, promote it for a while, and then what? Tons of statistics, that's what. Anybody who has used Google Analytics or one of the other packages has seen the pages and pages of reports, graphs, and statistics those packages can generate. Page A has more people spending time on it than Page B, but Page B has greater click-through. Is that a good thing for Page A? Or Page B?
Some website owners are lucky: they have landing pages, and the only thing they care about is getting people to click through (down the "sales funnel") to an order. In this case, they'll make two entry pages, page A and page B, and compare how each performs. Each page is instrumented, and they carefully look at how changes in the funnel change the behavior of visitors. This is called A/B testing.
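For a sense of what that comparison looks like in code, here's a sketch: a plain two-proportion z-test on made-up click-through counts for pages A and B. The counts and the `ab_test` helper are illustrative only; real A/B tooling layers sample-size planning and multiple-test corrections on top of this.

```python
import math

def ab_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-proportion z-test: does page B's conversion rate differ from A's?"""
    p_a, p_b = conv_a / visitors_a, conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal approximation.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Made-up funnel counts for two entry pages.
p_a, p_b, z, p = ab_test(conv_a=120, visitors_a=2400,
                         conv_b=157, visitors_b=2380)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

Note that even a significant result here only tells you *that* B converts better, not *why*; the causation problem from earlier still applies.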
Most website owners, however, are not so lucky. They are content creators, and their goal is to provide engaging, sticky content. There are lots of ways to measure that, and no easy funnel to help us out. There is no one universal metric that makes sense. Maybe ten million people visit regularly, but only once a year. That's a great site, but those statistics tell you absolutely nothing about why, or about how to make it better.
Whether you have a funnel or not, statistics get in the way much more than they help. The critical skill of a good businessperson is selectively looking at certain statistics, making guesses about the market, and then quickly testing those guesses. Bad businesspeople make no guesses, make all the wrong guesses, or make guesses that take too long to prove out. Out of all the startup skills I've studied, this one, effectively inferring intent from reams of numbers, is probably the most difficult. That's why they tell founders to directly and physically interact with customers as much as possible. It's hard as hell to get anything out of a statistical report.
But still, I wonder if A/B testing might be useful in a lot more places than just sales pages, from regular content sites to industrial statistical process control to politics and economics. It’s a tool, once again, that we technologists have had to mature out of necessity. It should find wider use and acceptance. Because we ask ourselves this same question over and over again in business and life without realizing it. Can we identify which one thing, if changed, has the impact we want — what is a cause of change? If we’re serious about diagnostics in the rest of our world, we probably should be doing a lot more A/B testing in all kinds of places.
Yes, I know about multivariate testing. This article is meant only as an introduction to some of the general startup issues around statistics.
If you've read this far and you're interested in Agile, you should take my No-frills Agile Tune-up Email Course, and follow me on Twitter