
Wednesday, August 10, 2011

Numbers, numbers, numbers

ESPN has decided that fantasy football geeks need a new stat to drool all over. According to "The Worldwide Leader," the NFL's passer rating system is passé and in need of an update. I'm not going to bore you with the details - but I will agree that the current rating system yields a number that means next to nothing.

But here's the problem with statistical analysis in football - unlike baseball in which you can boil every confrontation down to pitcher v. batter, football is a team sport and the result of a pass play is far more dependent on the other 20 players on the field.

Besides, numbers have never been nearly as important in football as they are in baseball. The rules in football change every so often because the offense, or the defense, has "too much" of an advantage. The season has expanded from 10 to 12 to 14 to 16 games. There's also the realization that every yard gained on the football field is the result of an entire team working together. In baseball, if you hang a slider, it's more than likely going to find its way into the bleachers.

Before "Big Head" Barry Bonds took the title of Home Run King away from Hank Aaron, everyone knew what the numbers 755 and 714 meant. Before the juicers wiped Roger Maris' single-season home run mark off the books, everyone knew what the numbers 61 and 60 meant. Nolan Ryan is baseball's strikeout king and has thrown more no-hitters than anyone else. "The Splendid Splinter" Ted Williams is the last big-leaguer to hit over .400 for a season. Pete Rose is the all-time hits leader. Quick -- what's the significance of the number 56?

If you're a baseball fan, you know that's the number of consecutive games in which the "Yankee Clipper" Joe DiMaggio got a hit. It is also one of the few records unlikely ever to be broken.

Thanks to Bill James, we can argue over a beer until the end of time about who the best baseball player was. We've got batting average, on-base percentage, slugging percentage and OPS (on-base percentage + slugging percentage) in our debate kit. On the pitching side we've got ERA and WHIP (walks plus hits per inning pitched). In the world of baseball, these numbers mean something - they always have and they always will.
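Part of what makes those baseball numbers stick is that they are simple arithmetic. Here is a minimal sketch of OPS and WHIP using the standard textbook definitions; the function names and the sample stat line are hypothetical, for illustration only. Innings pitched must be passed as a true number of innings (200 1/3 innings is 200.333..., not the box-score shorthand 200.1).

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP: times on base divided by the plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    return times_on_base / (at_bats + walks + hit_by_pitch + sacrifice_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """OPS is simply on-base percentage plus slugging percentage."""
    return obp + slg


def whip(walks, hits, innings_pitched):
    """WHIP: walks plus hits allowed, per inning pitched (true innings, not 200.1 shorthand)."""
    return (walks + hits) / innings_pitched


# Hypothetical season line, for illustration only.
obp = on_base_percentage(hits=150, walks=50, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)
slg = slugging_percentage(singles=100, doubles=30, triples=5, home_runs=15, at_bats=500)
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops(obp, slg):.3f}")
```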

But who has the record in the NFL for most passing yards in a season? Most rushing yards? Most receptions? Most touchdowns? Most career passing yards? Most career rushing yards? And, even if you know who, what are the numbers?

We don't know because it isn't important in football.

So, thank you, ESPN, for another meaningless stat that no one outside fantasy football will concern themselves with. Just like the esteemed members of our legislature, ESPN has created a solution for a problem that didn't exist.

Thursday, April 30, 2009

Statistical reasoning

I came across an interesting article in Cognitive Daily this afternoon about reasoning with statistics. The article described three studies conducted to determine how well people understood statistics and observable data.

In the first study, people were asked which hospital would be more likely to have a day on which 60% of the babies born were boys: a hospital with 15 births a day or one with 45 births a day. Most respondents said the likelihood should be the same; very few understood that the smaller the sample, the more likely it is to swing widely from one day to the next.
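The arithmetic behind the hospital question is a plain binomial calculation. A minimal sketch, assuming "60% boys" means at least 60% and treating every birth as an independent coin flip with a 50% chance of a boy (the real figure is slightly above 50%); the function name is hypothetical:

```python
from fractions import Fraction
from math import ceil, comb


def prob_day_at_least_share_boys(births, share=Fraction(3, 5), p_boy=0.5):
    """Probability that at least `share` of one day's births are boys.

    Assumes each birth is independent with P(boy) = p_boy (a simplification)."""
    # Smallest number of boys that reaches the share; Fraction keeps this exact.
    threshold = ceil(share * births)
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(threshold, births + 1)
    )


for births in (15, 45):
    print(births, round(prob_day_at_least_share_boys(births), 3))
```

Under these assumptions the 15-birth hospital hits 60% boys on roughly 30% of days (9949 out of 32768 equally likely outcomes), while the 45-birth hospital does so far less often: the smaller sample wanders further from 50%.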

In another study, respondents were asked to imagine that they were on a remote South Pacific island and had discovered a new species of bird as well as an obese native. They were then asked to estimate the proportion of the new species that was blue and the proportion of the native population that was obese. Most respondents assumed the bird's blue color was representative of the whole species, whether they had seen one blue bird, three or twenty. They also believed that a single obese native was not necessarily representative of all natives, but the more obese natives they observed, the higher the percentage they estimated.


The final study looked at a high school senior trying to decide where to go to college: Ivy College or Liberal College. Our senior had friends at both schools. His friends at Ivy had many complaints about the college, the atmosphere, the social scene, etc. His friends at Liberal were very content. He then visited both schools: he loved Ivy but came away not so hot about Liberal.

A large majority (74%) of the respondents, given his friends' comments and his visit, said he should go to Ivy.

Researchers then ran a second test using the same scenario, only this time our senior made a list of classes, sites and activities at both colleges and picked several at random before making his visits. Once this detail was added, the share of respondents picking Ivy dropped to 56%. The researchers concluded that the list prompted respondents to look at the problem more statistically: his friends at the schools had sampled far more of the environment than he could in a single visit.

The lesson to be learned is that sample size affects the reliability of the statistics generated from it. So, when confronted with statistical evidence, from DNA samples to field sobriety tests to breath-alcohol tests, focus on the sample size. The larger the sample, the more reliable the resulting numbers.
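To put a rough number on "more reliable," here is a minimal sketch using the textbook standard error of a proportion, sqrt(p(1 - p) / n), with a normal-approximation 95% interval. It is a generic illustration, not the method used in the studies above or in any sobriety-test validation study; the 0.5 proportion, the sample sizes and the function names are arbitrary choices for the example.

```python
from math import sqrt


def standard_error_of_proportion(p, n):
    """Standard error of an observed proportion p from n independent samples."""
    return sqrt(p * (1 - p) / n)


def approx_95_interval(p, n):
    """Rough 95% interval: p +/- 1.96 * SE, clipped to [0, 1].

    The normal approximation is poor for very small n; this only shows the trend."""
    half_width = 1.96 * standard_error_of_proportion(p, n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


for n in (10, 100, 1000):
    low, high = approx_95_interval(0.5, n)
    print(f"n={n:5d}: {low:.3f} to {high:.3f}")
```

The interval narrows in proportion to 1/sqrt(n), so cutting the uncertainty in half takes four times as many samples.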

On field sobriety testing, point out the relatively small sample sizes used in the "validation studies." Look at everything from the location of the tests, the ages and sexes of the participants, the level of intoxication of participants in lab testing, the time of day and the number of officers who took part in the study.

As for the breath test result -- just how accurate do you really think two breath samples taken two minutes apart are?