Leo Laporte and guests discuss social media profiling in the latest TWiT podcast. They describe credit ratings in China that are formed not just from financial data, but also from social networking activities. According to the podcast, these scores have already been used to deny train travel to 7m people. As Laporte says, "It's out of Black Mirror." … [Read more...]
Statistics books to read for pleasure
I've read quite a few (really) dry, technical books in my time. But even I was shocked to see an article entitled "Statistics Books to Read for Pleasure". Isn't that just a step too far? However, it reminded me of one excellent book that should be required reading after this month's polling meltdown. "Everydata: The Misinformation Hidden in the Little Data You Consume Every Day" is an excellent, … [Read more...]
Machine learning at scale on HDInsight using Microsoft R Server at Spark
Microsoft have published an article on how to conduct a decision tree analysis using Microsoft R Server and Spark on Azure HDInsight. Using four 8-core, 28 GB RAM (D4) worker nodes, they were able to process 170 million rows (37 GB) in around 5 minutes. This was 20% faster than using Spark's own MLlib library, although there's no comparison with Spark's newer ML libraries. … [Read more...]
Building data-driven web applications using R and Shiny
Learning Tree just published my article on using R and Shiny to build data-driven web applications. … [Read more...]
The best job of 2016 is…data scientist!
According to Glassdoor, the best job in 2016 (in America) is data scientist. They determine this based on three factors: number of job openings, salary, and career opportunities rating. The latter two are reported by their users. Personally I'm not buying it. Number 2 is tax manager... … [Read more...]