Tableau have just announced that Tableau Public Premium will now be free. Tableau Public is a cloud-based service that makes it easy to analyze and visualize large data sets. Working with Tableau is akin to working with a spreadsheet. The premium service adds the following features: support for larger data sets (up to 10 million rows); a storage limit of 10 gigabytes; the ability to restrict … [Read more...]
Scala on the rise
The TIOBE Index (of programming language popularity) for April 2015 notes that: "Another interesting move this month concerns Scala. The functional programming language jumps to position 25 after having been between position 30 and 50 for many years. Scala seems to be ready to enter the top 20 for the first time in history." Scala is the language in which the hugely popular Apache Spark … [Read more...]
Birth of a Theorem
I've been listening to extracts from "Birth of a Theorem: A Mathematical Adventure" on BBC Radio 4's Book of the Week. It's Cédric Villani's account of the years leading up to his award of the Fields Medal---the most coveted prize in mathematics. We rarely get a chance to see the creative process at work. All we get to see is the final result---wrapped up in a neat little bow. We don't see the … [Read more...]
Dirty data is the biggest challenge facing data scientists
A recent survey of data scientists by CrowdFlower found that, when it comes to challenges, "Dirty data is the #1 hurdle…" My own experience leads me to agree. Access to good quality data remains a huge problem. Organizations would do well to invest in improving the quality of their data before boosting their analytics capabilities. Garbage in, garbage out. One error I see made regularly … [Read more...]
Introduction to Apache Spark Workshop videos
If you are interested in learning about Apache Spark, a popular large-scale data processing engine, the Introduction to Apache Spark Workshop videos from the 2014 Spark Summit are a good place to start. … [Read more...]