Decision Mechanics

Insight. Applied.

Statistical intuition

May 23, 2021 By editor

The Monty Hall problem is a probability puzzle based on an old US game show. You are shown three doors. One conceals a car, while the other two conceal goats. The game show host, who knows where the car is, invites you to pick a door. He then opens one of the other two doors to reveal a goat and asks, "Do you want to stick with your original selection or switch to the remaining door?"

What should you do…assuming that you don’t wish to own a goat?

I’ll take all the fun out of it. You should switch. It doubles your chances of getting the car, from 1/3 to 2/3.

Convinced? Probably not. Even when presented with the solution, many people struggle to accept it. It’s not particularly intuitive.

Martin Johnsson recently discussed this on his blog. He used paper simulation, computer modelling and mathematics to try to satisfy himself of the wisdom of switching. He concluded:

…I’m not sure I have convinced myself of the solution to the generalised problem yet.

Reasoning about probabilities is hard. It’s very easy to be led astray by our "gut". As the eminent statistician Sir David Spiegelhalter has noted:

…when asked a basic school question using probability, I have to […] try it a few different ways, and finally announce what I hope is the correct answer.

Because our instincts are so unreliable here, it’s essential to draw on the formal methods of statistics and Monte Carlo simulation when making important decisions.
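If you'd rather check the Monty Hall answer than take my word for it, here is a minimal Monte Carlo sketch. It assumes the standard rules: the host always opens a goat door that isn't your pick, and chooses at random when he has two options. The function names `play` and `win_rate` are mine, not from any of the sources mentioned above.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning for a given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stick: ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough trials the sticking strategy should settle near 1/3 and the switching strategy near 2/3.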


Photo by Sergiu Vălenaș on Unsplash

Filed Under: Data science Tagged With: intuition, monty hall problem, probability, statistics

Guess the Correlation

April 14, 2021 By editor

People find it difficult to intuitively gauge the level of correlation between variables.

Guess the Correlation is an 80s-style video game that lets you flex your estimation muscles.

Just be aware that it doesn’t seem to present negative correlations, so you’ll have to intuit those elsewhere.
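If you want to practise on negative correlations too, you can generate your own examples. The sketch below is one simple way to do it, assuming standard normal variables; `correlated_sample` and `pearson_r` are illustrative helpers of my own and have nothing to do with how the game itself works.

```python
import math
import random


def correlated_sample(rho, n=1000, seed=0):
    """Draw n (x, y) pairs of standard normals with target correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        xs.append(x)
        # Mixing x with independent noise gives correlation rho with x.
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs, ys = correlated_sample(rho=-0.7)
    print(pearson_r(xs, ys))
```

Plot `xs` against `ys` with your favourite charting tool, guess the value, then compare it with `pearson_r`.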

Filed Under: Data analysis, Data science Tagged With: correlation, game, statistics

Mathematics & Statistics Awareness Month

April 9, 2021 By editor

April is Mathematics & Statistics Awareness Month.

Let’s celebrate it by making sure we embrace statistics in our data science projects. It’s not just about Python, folks!

Filed Under: Data science Tagged With: statistics

Sharks are definitely scarier than mosquitos

March 24, 2021 By editor

Bill Gates retweeted a World Health Organization infographic showing that mosquitos kill vastly more people than sharks every year—on the order of 100,000 times more.

In his tweet Bill captioned the infographic with, "Why I would rather encounter a shark in the wild rather than a mosquito." Presumably he’s referring to man-eating sharks.

This was an informal comment designed to highlight the misery caused by malaria—a cause that is at the centre of Bill’s philanthropy. Clearly it wasn’t supposed to be a serious risk assessment.

But it illustrates how confusing conditional probabilities are, and how easy it is to make invalid statistical inferences.

The data in the infographic refer to the probability that, given you are dead, you were killed by a shark or a mosquito. Chances are that it was a mosquito—not a shark. That seems intuitive.

Technically, we can denote this as

$P(\text{shark} \mid \text{death}) \ll P(\text{mosquito} \mid \text{death})$

I’m not convinced by Bill’s implication that it’s better to encounter a shark than a mosquito. I grew up after "Jaws" was released. Intuitively, surely sharks are much more dangerous, right?

The risk posed by meeting either of these creatures is the probability of being killed given you met them. If we encountered man-eating sharks as often as we encounter mosquitos we’d be getting munched on constantly.

Sharks are definitely scarier. We can represent this more formally as

$P(\text{death} \mid \text{shark}) \gg P(\text{death} \mid \text{mosquito})$

It’s important that we distinguish between $P(\text{mosquito} \mid \text{death})$ and $P(\text{death} \mid \text{mosquito})$ when drawing inferences.
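One way to see how the two quantities are linked is to write out the definition of conditional probability. This is only a sketch: the symbols $D_x$ (being killed by creature $x$) and $E_x$ (encountering creature $x$) are my own notation, not the infographic's, and it assumes that nobody is killed by a creature they never encountered, so that $P(D_x \cap E_x) = P(D_x)$.

$P(D_x \mid E_x) = \frac{P(D_x \cap E_x)}{P(E_x)} = \frac{P(D_x)}{P(E_x)}$

Death counts only tell us about the numerator. The risk per encounter also depends on the denominator, and mosquito encounters are so much more common than shark encounters that the ordering can flip.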

Fortunately, man-eating sharks live in the ocean and I don’t. Given that, I’m willing to sign up for more killer sharks and fewer mosquitos.

Filed Under: Data science, General Tagged With: conditional probability, statistics

Confidence intervals

August 24, 2020 By editor

Statistical confidence intervals are almost always misinterpreted. Consider the following statement.

"The prevalence of the disease P has a 95% confidence interval of 1% <= P <= 5%."

This is commonly taken to imply that there’s a 95% chance that the true prevalence is between 1% and 5%.

This isn’t the case.

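The true prevalence is a fixed, if unknown, number. It is the interval that varies from sample to sample. The 95% describes how reliable the procedure for constructing intervals is, not the probability that the parameter lies inside any one computed interval.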

The correct interpretation of the confidence interval defined above is that if we collect many samples from the population and calculate a 95% confidence interval from each, about 95% of those intervals will contain the true value of P.
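The repeated-sampling interpretation is easy to check by simulation. The sketch below is a deliberately simplified stand-in for the prevalence example: it assumes normally distributed measurements with a known standard deviation, and the parameter values (`true_mean`, `sigma`, `n`) are made up purely for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mean=0.03, sigma=0.01, n=30, trials=10_000,
             level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        # Each sample produces a different interval around its own mean.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

The true mean never moves. Only the intervals do, and roughly 95% of them should capture it.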

In Bayesian statistics we generally calculate credible intervals instead, which do support the intuitive interpretation.

Filed Under: Data science Tagged With: confidence intervals, statistics

Copyright © 2022 · Decision Mechanics Limited · info@decisionmechanics.com