Tuesday, 3 February 2009
A defense of classical statistics against the Bayesians
Introduction
There are lots of attacks on classical statistics, and only a few brave souls seem willing to defend it philosophically. So, although my sympathies are Bayesian, I would like to try to present a couple of arguments against Bayesian statistics and in favour of the classical approach.
The main argument is practical. When I read an article, I am simply not very interested in the priors of the author or any other experts. All I need in order to update my own beliefs is the result of the experiment. In addition to this practical defence of the classical approach, there are also some conceptual problems with the Bayesian method, such as the problem of how to update beliefs with probability one or zero.
Brief definitions
A Bayesian interprets probabilities as subjective beliefs and updates her beliefs using Bayes' theorem when she receives new evidence. A classical statistician interprets probabilities as measures of relative frequency and has little use for Bayes' theorem: since the priors are subjective, they should not be used as inputs when trying to arrive at an objective and scientific conclusion.
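To make the Bayesian half of this definition concrete, here is a minimal sketch of a single application of Bayes' theorem. The numbers are hypothetical, not from any study: a condition with a 1% base rate, a test with 90% sensitivity and a 5% false-positive rate.

```python
# Hypothetical numbers for illustration only: prior P(H) = 0.01,
# P(E|H) = 0.90, P(E|not H) = 0.05.

def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) via Bayes' theorem."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

posterior = bayes_update(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(round(posterior, 3))  # 0.154 -- one positive test lifts the belief from 1% to ~15%
```

The point for what follows: the posterior always depends on the prior, which is exactly what the classical statistician objects to.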
In practice
To examine the practical differences between the two approaches, consider how you would arrive at a probability about the effect of a new drug. The Bayesian starts by forming a prior belief about the effect of the drug, then conducts an experiment, then uses the results from the experiment to update the prior beliefs and arrive at the final estimate of the effect of the new drug (the posterior distribution). The classical approach is to go straight to the experiment and present the results from this experiment without forming or updating prior probabilities.
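The contrast can be sketched with invented trial data (the numbers and the Beta(2, 2) prior are my own assumptions, chosen only for illustration): 30 of 40 patients improve. The classical report is just the relative frequency; a Bayesian combines the same data with a prior.

```python
# Invented data: 30 of 40 patients improve on the drug.
successes, trials = 30, 40

# Classical summary: simply the observed relative frequency.
freq = successes / trials

# Bayesian summary: with a Beta(a, b) prior on the improvement rate and
# binomial data, the posterior is Beta(a + successes, b + failures).
a, b = 2, 2  # assumed prior, centred on 0.5
posterior_mean = (a + successes) / (a + b + trials)

print(freq)                      # 0.75
print(round(posterior_mean, 3))  # 0.727 -- pulled slightly toward the prior mean of 0.5
```

Notice that the frequency 0.75 is all a reader needs to run the Bayesian calculation with her own choice of a and b, which is the practical argument of this post.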
Now, philosophically I have little doubt that the Bayesians are right that probabilities should be interpreted as subjective beliefs. However, this does not mean that I am very interested in the prior beliefs of the author about the consequences of the drug and their probabilities. I have my own priors and all I need from the article is the result from the experiment in order to update my own beliefs. At a practical level therefore, all I want is the information contained in the classical approach.
One might argue that this is a bit too harsh and that it would be interesting to know the priors of a group of experts. I agree! However, although presenting expert opinions is an advantage, I still do not think these should be aggregated into a prior belief and used as an input in an updating process before presenting the final verdict. Why not? Because there is no such thing as a "group subjective belief." Only individuals have beliefs, and there are lots of aggregation problems and psychological biases in the group process leading to the "expert consensus." So while I would like to be informed of these beliefs in an article, I do not want the author to use an expert prior, or his interpretation of what the expert consensus is, and update on this. Based on the information available and my reading of the expert opinions, I will form my own initial priors; all I really need and want from a paper is the classical result: the relative frequencies.
What about beliefs with probability one or zero?
Let's say that you make a mistake and falsely believe that something is impossible. An experiment then shows that it is possible. Clearly you should change your mind! The problem is that it is impossible to do so if you are a true Bayesian. You simply cannot update beliefs that are assigned a probability of zero or one. They stay like that forever, regardless of the evidence.
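This "stuck at zero" property follows directly from the arithmetic of Bayes' theorem, since the prior multiplies the numerator. A small numerical illustration (the likelihoods are hypothetical):

```python
# Once the prior is exactly 0, no evidence can move it: Bayes' theorem
# multiplies the prior into the numerator, so 0 stays 0.

def update(prior, p_e_given_h, p_e_given_not_h):
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

belief = 0.0  # "impossible"
for _ in range(10):  # ten rounds of strong evidence in favour of H
    belief = update(belief, 0.99, 0.01)
print(belief)  # 0.0 -- stuck forever

belief = 0.001  # even a tiny nonzero prior recovers quickly
for _ in range(10):
    belief = update(belief, 0.99, 0.01)
print(belief > 0.999)  # True
```

The asymmetry is the whole point: any nonzero prior, however small, is eventually rescued by the evidence, while an exact zero never is.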
This is clearly absurd, and I guess that most Bayesians would like to shout at me that this simply means you should not assign anything a probability of zero or one. So instead of being a problem, they turn it into a partial virtue: as Bayesians we are careful not to be too sure, because we are aware of the foolishness that would result from being too certain. And the fact that Bayes' theorem does not work for zeros and ones is simply a demonstration of this foolishness.
Although I am attracted to this argument, there are still some problems. First of all, to put it in general terms: if I happen to be a fool at some point about some issue and assign it a probability of zero or one, I would like to have a system that allows me to recover and not be stuck! Pointing out that it was stupid of me to do something like that in the first place does not eliminate the problem that it is impossible to get out. True, I should not have done so in the first place, but what if I did? That is the problem, and it is a problem the Bayesians cannot escape. (At least this is what I believe with a certainty of about 0.87.)
Let me add a second argument that makes things even worse. Saying that you should never assign something a probability of zero or one (unless it is an analytic truth, like a mathematical theorem) is easy to do if the list of possible outcomes is well defined. But what if the state space is infinite, or there are some states you simply have not thought of? In theory, failing to list an outcome is equivalent to assigning it a probability of zero, and in practice this is a much easier mistake to make than explicitly assigning something a probability of zero or one. We may be fools for being absolutely certain about an outcome we have heard about, but it seems much more common, and less foolish, simply to fail to list all the possible and relevant outcomes. Logically the two are equivalent for Bayesians, who must assume a complete description of the relevant state space. This, I think, makes it more difficult to dismiss the zero/one argument as mere foolishness. Instead it becomes a conceptual problem for the Bayesians: how do we do statistics in a world in which we often fail to list all the possible outcomes?
Conclusion
I am a Bayesian in the sense that I think beliefs are subjective. However, I am most interested in the results of experiments when they are untainted by other people's subjective beliefs. In this way it becomes easier for me to update my own beliefs using my own priors. So in this sense I like the Bayesian philosophy, but I still like articles to take a classical approach. Even the Bayesian philosophy gives me some worries, though, since it seems to lead to the absurd conclusion that we should maintain beliefs that are obviously false, given that we at some point made a foolish mistake or, slightly less foolishly, made a mistake when imagining the possible outcomes.
Literature
There is a ton of literature on this topic, mostly by Bayesians pounding on the classical approach. A philosophical discussion from both perspectives can be found in "Bayes or Bust." An interesting and more formal attack on Bayesianism is "The Limitation of Bayesianism" by Pei Wang.
Popular inaccuracies: A review of Malcolm Gladwell’s “The Tipping Point”
Malcolm Gladwell’s book The Tipping Point has itself become something of a fad. It is hugely popular, and this might lead one to believe that the author really has discovered the social laws that explain why some things – books, diseases, rumours, habits – catch on, while others seem to stumble before they get off the ground. Is this true?
Despite the popularity of the book, I remain sceptical of much of its content. First of all, the arguments are far too anecdotal to be convincing. Anecdotes make for easy and entertaining reading. However, precisely because anecdotes are so easy to remember, they can convince us when we should be sceptical. Second, one of the major messages is at best an oversimplification and at worst very misleading. It is simply not true that it is necessary to have some highly connected individuals to get an epidemic going. Third and finally, on some points – such as the importance of zero tolerance in fighting crime – detailed investigation reveals more doubt than the enthusiastic picture painted in the book.
A brief summary
How do epidemics – social and real – get going? The answer, Gladwell claims, lies in three principles. First, he assigns a crucial role to certain individuals. The most important individuals are those who are well connected (have many friends, sexual partners and so on), knowledgeable, and able to sell the message. Second, the thing that is to be spread has to have “stickiness.” If it is information, it has to be easily remembered. Finally, the context can sometimes be such that an epidemic spreads easily, for instance by encouraging certain kinds of behaviour.
Obviously there is some truth to these arguments. Gladwell’s main point, however, is not just that these are important principles. He also argues that they unite a lot of rather diverse phenomena and show how small things can have large consequences. The spread of rumours, HIV, crime, syphilis, smoking and lots of other phenomena are used to illustrate the importance of the three principles, and in all cases he argues that small changes can have large effects. For instance, small things that make a message glue itself more easily to the brain can make a large difference to the sales of a product.
How are we convinced?
To convince the reader of his message, Gladwell gives us illustrative examples. He tells us about the homosexual flight attendant who supposedly had 2500 partners, about a successful businessman who had an extremely large circle of friends, and how the spread of syphilis in a city could be traced back to a small group of individuals. To be fair, Gladwell also cites scientific research. The reader is told about a wide range of experiments and results. However, the research is used selectively. He uses articles that support his views without pausing to mention potential weak spots or rival views. In sum, the combination of anecdotes and scientific references can be quite convincing at first, but on closer inspection it reveals several weaknesses. A good example is the role he assigns to “special” individuals.
The role of special individuals
The flight attendant with many sexual partners is used to convince us that some extremely well-connected individuals are crucial to the spread of diseases. It is clearly true that such individuals could spread a disease rapidly. Still, we must ask whether the existence of these individuals really is necessary and important for the quick spread of a disease. It turns out that it is not. To understand this, think first about a community in which everybody knows their neighbours and nobody else. In this imagined community everybody has the same number of friends, and there are no special individuals in the sense of a person who is extremely well connected. In such a community it would take some time for a disease to spread from one random person to another, because it would have to go through the whole chain of neighbours.
Now imagine that we create some friendships across the community so that our inhabitants meet some people in addition to their neighbours. There might be some differences in the number of friends each individual has, but this is not important. The important point is simply that we allow some more or less random connections between people other than their neighbours. It turns out that in such a community a disease or a rumour can spread very quickly (the small-world property). Note that the key is not that some people are extremely well connected. It is the existence of cross-cutting connections that matters for the speed at which something spreads. Reading Gladwell’s book one gets the opposite impression: that some individuals are extremely important. This is misleading in an important way. It makes us give more prominence to individuals than we should, and less to the structure of the system.
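The thought experiment above can be checked with a toy simulation. The setup is my own simplification, assuming a ring where each person knows only their four nearest neighbours, to which we add a handful of random long-range friendships; note that nobody becomes a hub, since degrees stay nearly equal.

```python
import random

def steps_to_reach_all(n, shortcuts, seed=0):
    """Steps for a rumour to reach all n people, starting from person 0."""
    rng = random.Random(seed)
    # Ring lattice: each person knows the two nearest neighbours on each side.
    friends = {i: {(i - 1) % n, (i + 1) % n, (i - 2) % n, (i + 2) % n}
               for i in range(n)}
    # Sprinkle a few random cross-cutting friendships (no hubs created).
    for _ in range(shortcuts):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            friends[a].add(b)
            friends[b].add(a)
    # Breadth-first spread: each step, everyone tells all their friends.
    infected, frontier, steps = {0}, {0}, 0
    while len(infected) < n:
        frontier = {j for i in frontier for j in friends[i]} - infected
        infected |= frontier
        steps += 1
    return steps

print(steps_to_reach_all(200, 0))   # 50 -- the rumour crawls around the circle
print(steps_to_reach_all(200, 10))  # far fewer steps with a few shortcuts
```

Adding edges can never slow the spread, and in practice even ten random shortcuts among 200 people cut the spreading time dramatically, which is the structural point against the "special individuals" story.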
The devil is in the details
The spread of crime can be viewed as an epidemic, and the question then becomes what it is that is driving the epidemic. In this book Gladwell seems to favour the so-called zero-tolerance or “broken windows” argument, and he cites researchers who have argued for it. However, while it is intuitive that our surroundings affect us and our behaviour, closer investigation reveals weaknesses in the empirical evidence connecting the drop in crime with the introduction of zero-tolerance policies. In several places the drop appears to have started before the policy was introduced. Also, while Gladwell may be excused for not knowing about Levitt’s work on the potential importance of abortion in explaining this drop, he must have known that there are many alternative possibilities we should explore before simply accepting the zero-tolerance argument just because it fits his thesis so well (that small things can have big effects).
In sum
One might complain that it is unfair to use such strict criteria for a popular science book. I sympathize with this. One clearly cannot write a book full of statistics and expect to sell a lot of copies in airports. One could also argue that at least the book stimulates thought and may serve as an appetizer to more rigorous arguments. Still, there are lots of popular science books that manage to be critical and popular without compromising too much on substance. Simon Singh’s Fermat’s Last Theorem is an excellent example. Although not a bad book, The Tipping Point ends up compromising too much on substance.