Book Rounds: Statistical Smarts


Book Rounds, Professional Skills Development / Tuesday, July 2nd, 2019

How to Lie with Statistics

Darrell Huff

A humorous, insightful and easy read outlining the foibles of statistics and our interpretation of them. Highly recommended as a refresher for those who feel they've lost touch with statistics, or for those who would appreciate a reasonable reference to offer owners who wish to be well-informed. The author uses many real-life examples of statistics gone wrong, which make the concepts easier to understand.

The title is rather tongue-in-cheek: the author points out that our infatuation with precision often overrides our common sense, and that this causes problems just as often as the outright misuse or misinterpretation of statistics does. The author sums up his purpose with "The crooks already know these tricks; honest men must learn them in self-defense."

The common manipulations he suggests we be wary of are as follows: 

Sample Bias: Even the most honest of researchers can be blindsided by unintentional sample bias. Be especially wary of self-selected or self-reported samples: they rarely provide a truly representative sample, so any conclusions drawn from them should be limited to the population they actually represent. Sample size matters too. Small samples are more easily swayed by chance and are less likely to reflect the true distribution of the population. Check both the number and the source of the data points.
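To make both warnings in this paragraph concrete, here is a minimal Python simulation. Everything in it is hypothetical: the population of 1-to-10 satisfaction scores and the assumption that happier people are more likely to answer a survey are invented purely for illustration. The self-selected survey overstates the true average, and the means of 5-item samples bounce around far more than the means of 500-item samples.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 satisfaction scores on a 1-10 scale.
population = [rng.randint(1, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Self-selection (an assumption for illustration): the chance of answering
# the survey rises with satisfaction, from 10% at score 1 to 100% at score 10.
respondents = [s for s in population if rng.random() < s / 10]
survey_mean = statistics.mean(respondents)

def spread_of_means(sample_size, trials=2000):
    """Standard deviation of the means of many random samples of one size,
    i.e. how much a result can swing from one sample to the next."""
    means = [statistics.mean(rng.sample(population, sample_size)) for _ in range(trials)]
    return statistics.stdev(means)

spread_small = spread_of_means(5)
spread_large = spread_of_means(500)

print(f"True mean: {true_mean:.2f}, self-selected survey mean: {survey_mean:.2f}")
print(f"Spread of sample means, n=5: {spread_small:.2f}; n=500: {spread_large:.2f}")
```

Neither problem is fixed by the other: a huge self-selected sample is still biased, and a perfectly random sample of five is still noisy.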

Average: Average can actually be a very non-specific term, and may refer to the mean, median, or mode. Most often the mean is used. This can be helpful, but because it is the sum of all values divided by the number of values, it may not give a clear picture: a few values at the extreme of the range can pull the mean toward that extreme. The median is the value with half of the values above it and half below, so it is far less affected by outliers and often gives a clearer picture of the typical value. The mode is the most frequently occurring value, so it can still skew the picture, but toward frequency rather than range. Knowing which kind of average is being used tells you more about the possible bias. The author warns that "an unqualified 'average' may be … meaningless".
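A short Python sketch with made-up numbers (hypothetical annual incomes, in thousands of dollars) shows how a single extreme value drags the mean far away from the median and mode. It uses only the standard library's `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical incomes in thousands of dollars: nine modest values plus one outlier.
incomes = [30, 32, 32, 35, 38, 40, 42, 45, 48, 1000]

print("mean:  ", mean(incomes))    # 134.2  (pulled up by the single 1000)
print("median:", median(incomes))  # 39.0   (average of the 5th and 6th values, 38 and 40)
print("mode:  ", mode(incomes))    # 32     (the only value that appears twice)
```

Reported as "the average income," the mean here suggests about 134 thousand, even though nine of the ten people earn 48 thousand or less.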



Data presentation: Data may be presented in a variety of ways that warp our understanding. Graphs and other visual representations can be manipulated so that they invite us to subconsciously draw conclusions that may not be correct. A graph should be labeled on both axes. Check these labels to make sure they make sense. Do you know what the numbers are supposed to represent, or is there just a set of numbers strung up the side of the graph? The data may be transformed honestly and fairly (say, placed on a logarithmic scale), yet the transformation can still change how large the differences between groups appear. Alternatively, an axis can be truncated so that a chunk of the range is omitted (for instance, by not starting at zero), which again toys with our visual perception by exaggerating small differences. Always be suspicious of visual depictions that aren't labeled.
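The truncated-axis trick comes down to simple arithmetic. This sketch (the function name `apparent_ratio` and the two scores are hypothetical) compares the ratio of two bar heights when the axis starts at zero with the ratio when it starts just below the smaller value.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn heights of two bars with values a >= b when the
    y-axis begins at axis_start. With axis_start=0 this is simply a / b."""
    if not (axis_start < b <= a):
        raise ValueError("expects axis_start < b <= a")
    return (a - axis_start) / (b - axis_start)

# Hypothetical example: two groups scoring 52 and 50.
true_ratio = apparent_ratio(52, 50)               # 52 / 50 = 1.04
truncated = apparent_ratio(52, 50, axis_start=48)  # 4 / 2 = 2.0

print(f"Axis from 0:  taller bar is {true_ratio:.2f}x the shorter one")
print(f"Axis from 48: taller bar is {truncated:.2f}x the shorter one")
```

A real 4% difference is drawn as if one group were twice the other, without a single number on the graph being false.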

https://imgs.xkcd.com/comics/normal_distribution.png

Correlation vs Causation: A very common error is to identify a correlation and presume causation. Correlation only indicates that two factors tend to vary together. It is entirely possible (and often probable) that this is merely a coincidence. Sensationalism and natural human tendencies lead us to presume that one factor caused the other. Even if causation is present, the statistics showing correlation don't clarify which factor is the cause and which is the effect. In other words, if you believe a reported correlation indicates causation, consider the chicken-or-egg question. Finally, two factors may be genuinely related, yet the relationship may be the consequence of an entirely different factor that drives both. For example, every time your child gets a cold, they have a runny nose and a cough. That doesn't mean the cough causes the runny nose, nor that the runny nose causes the cough. Rather, the cold virus causes both!
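The runny-nose-and-cough example can be simulated. In this sketch the probabilities (a 10% chance of a cold on any given day, and how likely each symptom is with or without one) are invented assumptions. Across all days the two symptoms are clearly correlated, but looking only at days with a cold, or only at days without one, the correlation essentially disappears, because in this model the cold is the only link between them.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical model: the cold drives both symptoms; the symptoms never
# influence each other. Each record is (has_cold, runny_nose, cough) as 0/1.
records = []
for _ in range(10_000):
    has_cold = rng.random() < 0.10
    runny_nose = rng.random() < (0.80 if has_cold else 0.05)
    cough = rng.random() < (0.70 if has_cold else 0.05)
    records.append((int(has_cold), int(runny_nose), int(cough)))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def symptom_correlation(rows):
    """Correlation between runny nose and cough within a set of records."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])

overall = symptom_correlation(records)
with_cold = symptom_correlation([r for r in records if r[0] == 1])
without_cold = symptom_correlation([r for r in records if r[0] == 0])

print(f"All days:          r = {overall:.2f}")
print(f"Days with a cold:  r = {with_cold:.2f}")
print(f"Days without cold: r = {without_cold:.2f}")
```

Splitting the data by the suspected third factor is one simple way to check whether a correlation survives once that factor is held fixed.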

p-value: This is a critical tool for evaluating the data. The p-value is the probability of obtaining results at least as extreme as those observed if chance alone were at work (that is, if there were no true difference). It is not, strictly speaking, the probability that the findings are due to chance. A generally accepted threshold for significance is p < 0.05. While generally accepted, this cutoff is somewhat arbitrary. A researcher may choose to set significance at p < 0.01, partly to impress upon others their integrity and stringent evaluation methods, and partly to show that their findings would be particularly unlikely if chance alone were responsible. There is nothing to stop a researcher from setting their significance threshold at 0.1 or even 0.5, other than perhaps peer ridicule. Be suspicious of data presented or implied to be significant that are not accompanied by a p-value.
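One way to see what a p-value measures is a permutation test, sketched here in Python. It is one common method among several, not necessarily what any given study uses, and the two groups of measurements are hypothetical. The idea: if the group labels truly did not matter, randomly reshuffling them should fairly often produce a difference as large as the observed one, and the p-value estimates how often.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Estimates how often randomly relabeling the pooled data produces a
    difference in means at least as large (in absolute value) as the one
    actually observed, i.e. how surprising the result is if chance alone
    were at work. The +1 terms keep the estimate from ever being exactly 0."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the labels are meaningless
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for a treated and an untreated group.
treated = [12, 15, 14, 16, 18, 17, 15, 19]
control = [11, 12, 13, 12, 14, 13, 12, 15]
p_value = permutation_p_value(treated, control)
print(f"p is approximately {p_value:.4f}")
```

Note what the number does and does not say: a small p means a gap this large would rarely arise from shuffling alone, not that the treatment is certainly responsible or that the effect is large enough to matter.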

https://www.facebook.com/sassyeconometrics/posts/i-hope-p-value-jokes-are-still-funny/1953677048179439/

Numbers: Be cautious of a false sense of security when numbers are thrown at you. We tend to assume greater validity when precise numbers are given, but that can be a false assumption. Small numeric differences may be meaningless when the thing being measured is subjective or imprecise. For example, with intelligence tests, can you really classify someone as smarter than another person on the basis of a two-point difference in scores? Percentages can also play some pretty underhanded tricks on our understanding of data. Be wary of data presented as, or compared using, percentages. A 1% raise for a company's employees could mean roughly a hundred dollars a year (about $120 for an employee who grosses $1,000 a month) or ten thousand dollars a year (for one who grosses $1 million a year). The CEO and the employee are going to have very different feelings about that 1% increase.
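The salary comparison is plain arithmetic; this tiny sketch (with a hypothetical helper named `raise_amount`) just makes explicit that the same 1% means very different dollar amounts depending on the base.

```python
def raise_amount(annual_salary, percent):
    """Dollar value of a percentage raise applied to an annual salary."""
    return annual_salary * percent / 100

employee = raise_amount(1_000 * 12, 1)  # grosses $1,000 a month
ceo = raise_amount(1_000_000, 1)        # grosses $1 million a year

print(f"Employee: ${employee:,.0f} per year")  # Employee: $120 per year
print(f"CEO:      ${ceo:,.0f} per year")       # CEO:      $10,000 per year
```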

Indirect conclusions: Drawing a conclusion by inference is a very natural human tendency, but it can be very risky and lead to false conclusions. These indirect conclusions may be made for you (often by non-professionals interpreting data, such as news reporters or other media sources), or the information may be designed to prey on your humanness by letting you unwittingly do the dirty work yourself. For instance, a hand soap may claim to reduce bacteria by 99% (with proven studies). You think, "That's fantastic. It's got to be better than the hand soap that makes no claims about effectiveness! I'll buy this (for 80 cents more)!" And yet, no one studied (or reported) whether that reduction in bacteria actually changes the incidence of disease. Your supercomputer of a brain subconsciously drew that inferred conclusion. Those extra 80 cents may not have accomplished anything other than relieving you of some spare change. Advertisers love this dirty little trick, and it can be employed even by people we think we should trust.

https://www.forbes.com/sites/erikaandersen/2012/03/23/true-fact-the-lack-of-pirates-is-causing-global-warming/#7fcf22013a67

As the author points out, "despite its mathematical base, statistics is as much an art as it is a science." Often, several statistical methods may be appropriate, and the statistician must subjectively select the one they feel best reflects the data. Nevertheless, it is prudent to ask yourself of every statistic you face: "Does it make sense?" Do not let the sciency feel of numbers dazzle you into abandoning your common sense! Those with unscrupulous biases are hoping you'll do just that. Now we know their dirty, lying tricks, though, and are prepared! Go forth, and be skeptical!
