Twitter have an interest in detecting anomalies in their service. Anomalies could be down to user engagement, spamming or technical issues. Regardless of the reasons, it's something they want to know about when it happens. To aid detection of anomalies in their time series data they have developed, and open-sourced, an anomaly detection package for R. Their algorithm is based on the Generalized …
Visual p-hacking
It's that time of year when we start getting the "best x of 2015" posts. Nathan Yau of FlowingData just published his list of the best visualization projects. Yau reckons that this was the year of using visualization to teach about data and statistics. My favorite is "Science Isn't Broken" by Christie Aschwanden of FiveThirtyEight. It's an interactive visual demonstration of how you can shape the …
Don’t tell me you don’t have the data
If they can follow a seal into the freezing Antarctic water, you can collect a few log entries. Really. …
Subjectivity in data science
An article recently published in Nature reinforces the point that the real challenge in data science is not mastery of the technical tools, but the ability to understand and define the problem. Researchers posed the question of whether the color of a soccer player's skin is a factor in how many red cards (serious reprimands) he receives. Seems like a pretty straightforward analysis. The authors …
Analyzing an Isle of Man TT legend
The BBC has an article on using a sensor array to determine what makes 23-time Isle of Man TT winner John McGuinness so quick. Motorcycle riders have been tackling the 38-mile street circuit for over a hundred years. As it's run on (closed) public roads, it's an incredibly dangerous race. Riders average 212kph (132mph) round the course, often coming within inches of stone walls and …