Those of us involved in decision and data science would do well to remember the words attributed to John R. Searle.

If you have to add “scientific” to a field, it probably ain’t.

Insight. Applied.

By editor


Statistical confidence intervals are almost always misinterpreted. Consider the following statement.

"The prevalence of the disease *P* has a 95% confidence interval of 1% <= *P* <= 5%."

This is commonly taken to imply that there’s a 95% chance that the true prevalence is between 1% and 5%.

This isn’t the case.

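A confidence interval expresses uncertainty about the interval itself, not about the parameter of interest: the true prevalence is a fixed but unknown value, while the endpoints of the interval vary from sample to sample.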

The correct interpretation of the confidence interval defined above is that if we drew many samples from the population and calculated a 95% confidence interval from each, about 95% of those intervals would contain the true value of *P*. The particular interval of 1% to 5% either contains *P* or it does not.
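The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only illustrative: the true prevalence of 3%, the sample size of 1,000, and the choice of the Wilson score interval are all assumptions made for the example, not details from the statement above.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n people; count how many have the disease.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wilson_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / trials


# Assumed values, chosen only for illustration.
cov = coverage(true_p=0.03, n=1000, trials=2000)
print(f"Fraction of intervals containing the true prevalence: {cov:.3f}")
```

The printed fraction should land close to 0.95. Each individual interval, though, either contains the true value or misses it; the 95% describes the procedure across many samples.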

In Bayesian statistics we generally calculate *credible* intervals, which do support the intuitive interpretation: given the prior and the model, there is a 95% probability that the parameter lies within the interval.
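As a rough sketch of how a credible interval might be obtained, the example below assumes a uniform Beta(1, 1) prior on the prevalence and a hypothetical sample of 30 cases among 1,000 people. Under those assumptions the posterior is Beta(1 + 30, 1 + 970), and the interval is estimated from random draws rather than computed exactly.

```python
import random


def credible_interval(successes, n, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval under an assumed uniform Beta(1, 1) prior."""
    rng = random.Random(seed)
    # With a Beta(1, 1) prior, the posterior is Beta(1 + successes, 1 + failures).
    alpha = 1 + successes
    beta = 1 + n - successes
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper


# Hypothetical data: 30 cases observed in a sample of 1,000.
lower, upper = credible_interval(successes=30, n=1000)
print(f"95% credible interval: {lower:.3%} to {upper:.3%}")
```

Here the statement "there is a 95% probability that the prevalence lies in this range" is legitimate, but only conditional on the chosen prior and model.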

By editor

Pandas is a Python package for working with tabular data—as in a spreadsheet or database table. It provides similar functionality to R’s data frames.
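To give a flavour of what that looks like in practice, here is a minimal sketch. The column names and figures are invented for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical survey data, invented purely for illustration.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "tested": [200, 300, 250, 250],
    "positive": [4, 9, 10, 5],
})

# Add a derived column, much like a spreadsheet formula.
df["prevalence"] = df["positive"] / df["tested"]

# Aggregate by group, much like a SQL GROUP BY.
by_region = df.groupby("region")[["tested", "positive"]].sum()
by_region["prevalence"] = by_region["positive"] / by_region["tested"]
print(by_region)
```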

As pandas is rich in features it can be difficult to remember all its operations and syntax, so Enthought have produced a visual guide to the package in 8 handy pages.

By editor

Andy Kirk at Visualising Data ran a Twitter poll about the relative accessibility of R and Python to non-developers.

59% said that R was more accessible.

Obviously, the poll is far from scientific, but the comments he received reflect my own experiences of teaching both languages, such as the significance of the RStudio IDE and the `tidyverse` packages in getting people off the ground.

By editor

BBC News are using the R ggplot2 package to create production-ready charts—which illustrates the flexibility and power of this package.

What’s particularly interesting is the approach they’ve taken to training their staff to use R. Initiatives include:

- sharing tips and tricks through a living cookbook (knowledge base)
- a six-week course comprising six short introductions with links to online resources
- a dedicated Slack channel for course participants to discuss issues or ask for help
- an end-of-course project/challenge that exercises the skills developed over the six weeks

The strategies of spreading training over an extended period and embedding it in the workplace have been shown to increase retention and assist in moving from learning to delivering value to the business.