Wrappers for Spark's MLlib machine learning library in SparkR have been slow to arrive. However, the future looks bright. The imminent 2.0 release will bring k-means support to SparkR, and the 2.1 release is scheduled to include wrappers for the following machine learning stalwarts: Alternating Least Squares (ALS), Decision Trees, Gaussian Mixture Models, Isotonic Regression, Latent Dirichlet … [Read more...]
The Mathematics of Machine Learning
Wale Akinfaderin, a research scientist intern at IBM Research, has written an article on the mathematical background required to fully understand machine learning. He includes a list of educational resources for those who wish to brush up in certain areas. Wale correctly stresses that you don't need this level of knowledge to start making use of machine learning. There are many packaged … [Read more...]
Machine learning algorithm cheat sheet
The recent explosion of interest in machine learning has resulted in a profusion of algorithms. It can be difficult to know which one is best suited to your problem. Recognizing this challenge, Microsoft have produced a machine learning algorithm cheat sheet. It's designed to help you choose between the algorithms available in Microsoft's Azure Machine Learning Studio, but, as many of the … [Read more...]
ScaleR package now available as part of free Microsoft R Client
The ScaleR package provides functions for performing scalable and extremely high-performance data management, analysis, and visualization in R. Until now, it was available only to those who had a Microsoft R Server license. With the introduction of the free Microsoft R Client for Windows tool, you can now work with the full set of ScaleR functions without having to part with a cent. Of course, … [Read more...]
Microsoft announces major commitment to Apache Spark
Microsoft have just announced an extensive commitment for Spark to power Microsoft’s big data and analytics offerings, including Cortana Intelligence Suite, Power BI, and Microsoft R Server. Spark 1.6.1 is available on Azure HDInsight, and integration with R Server is to follow. This will allow R functions to be run at scale over thousands of Spark nodes. … [Read more...]