The BBC has an article on using a sensor array to determine what makes 23-time Isle of Man TT winner John McGuinness so quick. Motorcycle riders have been tackling the 38-mile street circuit for over a hundred years. As it's run on (closed) public roads, it's an incredibly dangerous race. Riders average 212kph (132mph) round the course---often coming within inches of stone walls and … [Read more...]
Prajna—Microsoft’s response to Spark
Microsoft are developing an open-source distributed analytics platform---codename "Prajna". It's apparently inspired by Spark, but will also make it easy for developers to deploy cloud services that exploit the processing capabilities of the platform. It's written in F# and the project is hosted on GitHub. … [Read more...]
Spark 1.5.0 released
Spark 1.5.0 has now been released---and it's a significant one for the data science community. Databricks, in their announcement blog post, state: "Another major theme of this release is data science: Spark 1.5 ships several new machine learning algorithms and utilities, and extends Spark's new R API." Improvements of note include better coverage for the pipeline API and an MLlib API for … [Read more...]
A walk through a Spark Random Forest
Learning Tree International have just published one of my articles on using Random Forest models with Spark. … [Read more...]
R on Spark
The upcoming Apache Spark 1.4 release will include SparkR---an R package that will allow big data (Spark) analyses to be run from the R shell. Computations in SparkR will be comparable to those that use the native Scala language. Future developments are to include machine learning support. … [Read more...]