I just saw a post on Statistical Modeling about some of the worst uses of statistical graphics this year. Be sure to check it out. I’d have to say I agree with that assessment. The case deals with two pictures of a road during the Crimean War. In the first picture, the road is covered in cannonballs. In the second, the road is clear. Errol Morris challenged his readers to figure out which picture came first. The correct answer is the clear road.
Morris uses pie charts and bar graphs to display the reasons people gave for their decisions. While colorful, these graphs are also meaningless. So, given the data, I z-normalized the “on” choices (cannonballs on the road first) and the “off” choices (clear road first) separately, rescaling each set of counts so that it has mean 0 and standard deviation 1. I used the same bar graph setup (except horizontal this time). Since I normalized each distribution, the actual number of voters one way or the other no longer really makes a difference. I am just comparing how strongly each side preferred a given reason relative to that side’s other reasons. This assumes that there is some significance to a person not choosing a particular reason, which may be incorrect.
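For anyone who wants to try the same thing, here is a minimal sketch of the z-normalization step in Python. The counts below are made-up placeholders, not Morris's actual tallies, and the helper name `z_normalize` is my own. I am also assuming the population standard deviation (dividing by n), which is one of two reasonable choices.

```python
from statistics import mean, pstdev

def z_normalize(counts):
    """Rescale a list of counts to mean 0 and standard deviation 1."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation (divides by n)
    if sigma == 0:
        raise ValueError("all counts are equal, so z-scores are undefined")
    return [(c - mu) / sigma for c in counts]

# Reason categories as named in this post.
reasons = [
    "shadows",
    "shelling",
    "characteristics/artistic",
    "number and position of balls",
    "practical concerns",
]

# HYPOTHETICAL counts for illustration only, not the real survey data.
on_counts = [30, 10, 8, 25, 15]
off_counts = [12, 20, 16, 18, 10]

# Normalize each side on its own, so group size drops out.
on_z = z_normalize(on_counts)
off_z = z_normalize(off_counts)

for reason, on_score, off_score in zip(reasons, on_z, off_z):
    print(f"{reason:30s} on={on_score:+.2f}  off={off_score:+.2f}")
```

Because each side is normalized on its own, a positive score only says that a reason was more popular than average within that side. It says nothing about how many people picked that side overall.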
Click to enlarge the graph if it’s not properly visible:
So what I think my chart shows is that shadows are the worst feature to choose for correctly guessing which came first. People who focused on either the shelling or characteristics/artistic features were more likely to choose correctly. The most confusing feature is the number and position of the balls. Also confusing were practical concerns. If I were going to train a support vector machine to classify images of this type, I would use the three features: shelling, characteristics/artistic and shadows.
So what do you think? Am I way off on trying to normalize these and make this kind of assessment? I am, after all, a statistics amateur.