Thursday, April 30, 2009

Statistical reasoning

I came across an interesting article in Cognitive Daily this afternoon about reasoning with statistics. The article described three studies designed to test how well people reason about statistics and observed data.

In the first study, people were asked which hospital would be more likely to have a day on which 60% of the babies born were boys: a hospital with 15 births a day or one with 45 births a day. Most respondents said the two hospitals were equally likely to have such a day. Very few understood that the smaller the sample, the more likely it is to show large day-to-day swings.

In another study, respondents were asked to imagine that they were on a remote South Pacific island where they came across a blue bird of a previously unknown species and an obese native. They were then asked to estimate what proportion of the new bird species was blue and what proportion of the native population was obese. Most respondents assumed the blue bird was highly representative of its species, whether they had seen one blue bird, three or twenty. They did not assume that one obese native was representative of all natives; instead, the more obese natives they observed, the higher the percentage of the population they estimated to be obese.
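The bird example is the same sample-size question in a different form. As a rough illustration (this is not the method used in the studies), the Python sketch below applies the Wilson score interval, a standard 95% confidence interval for a proportion, to show how little one blue bird tells you about how much of the species is blue. It assumes the birds observed are a random sample of the species; `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Every bird seen so far was blue; only the number of sightings changes.
for seen in (1, 3, 20):
    low, high = wilson_interval(seen, seen)
    print(f"{seen:2d} of {seen:2d} blue -> plausible share blue: {low:.0%} to {high:.0%}")
```

Under these assumptions, a single blue bird is consistent with anywhere from about 21% to 100% of the species being blue. Three blue birds raise the lower bound to about 44%, and twenty in a row raise it to about 84%.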


The final study looked at a high school senior trying to decide where to go to college: Ivy College or Liberal College. Our senior had friends at both schools. His friends at Ivy had many complaints about the college, its atmosphere, its social scene and so on. His friends at Liberal were very content. He then visited both schools: he loved Ivy but came away lukewarm about Liberal.

A large majority (74%) of the respondents said he should go to Ivy, given his friends' comments and his visit.

Researchers then ran a second version of the same scenario, only this time our senior made a list of the classes, sites and activities at both colleges and picked several at random to see before making his visits. Once this detail was added, the share of respondents picking Ivy dropped to 56%. The researchers concluded that, once told about the list, respondents looked at the problem from a more statistical viewpoint: his friends at the schools had a far better sample of each environment than he could get on a single visit.

The lesson is that sample size affects the reliability of any statistic generated from it. So when confronted with statistical evidence, from DNA samples to field sobriety tests to breath-alcohol tests, focus on the sample size: the larger the sample, the more reliable the numbers.
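The claim that larger samples give more reliable numbers can be checked with a quick simulation. The Python sketch below is purely illustrative: it assumes a yes/no measurement with a true rate of 50% (as in the hospital example), repeatedly draws samples of a given size, and measures how much the observed rate wanders from one sample to the next. The helper name `spread_of_sample_rates`, the number of trials and the fixed seed are arbitrary choices, not anything from the studies.

```python
import random
import statistics


def spread_of_sample_rates(sample_size: int, trials: int = 2000,
                           true_rate: float = 0.5, seed: int = 1) -> float:
    """Standard deviation of the observed rate across many simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(trials):
        # Count how many of the sample_size yes/no draws come up "yes".
        hits = sum(rng.random() < true_rate for _ in range(sample_size))
        rates.append(hits / sample_size)
    return statistics.pstdev(rates)


for n in (5, 50, 500):
    simulated = spread_of_sample_rates(n)
    theoretical = (0.5 * 0.5 / n) ** 0.5  # binomial standard error for p = 0.5
    print(f"n={n:3d}: simulated spread {simulated:.3f}, theory {theoretical:.3f}")
```

The simulated spread should track the textbook standard error, which shrinks in proportion to 1/sqrt(n). A hundredfold larger sample cuts the typical error roughly tenfold.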

On field sobriety testing, point out the relatively small sample sizes used in the "validation studies." Look at everything: the location of the tests, the ages and sexes of the participants, the participants' level of intoxication in lab testing, the time of day and the number of officers who took part in the study.

As for the breath test result: just how accurate do you really think two breath samples taken two minutes apart are?
