DataCamp have published an article on the five R packages with the most (direct) downloads, based on their leaderboard. Packages 3-5 are currently swapping positions. As I write this (18 November 2016), the top five are dplyr, devtools, ggplot2, cluster and foreign. It's notable that the list of the most popular packages is heavily weighted towards the manipulation and display of data. This is …
Public data sources
Data science requires data. Yep. Insightful. Unless you work at a data-rich organization, data can be hard to obtain. You may want to try out a new technique or tool. Alternatively, you may need additional data to fuse with your own limited in-house data. In either case, Nathan Yau's updated list of public data sources might help. He lists sources for the following types of …
RStudio 1.0 released
RStudio have released version 1.0 of their eponymous R IDE. They are calling it their "...biggest [release] ever!" It certainly has a number of very significant features. Integrated support for Spark: Spark and R are core tools for data scientists. While Spark has an R API, support for the machine learning libraries is lagging. So, it's great to hear that RStudio now has integrated support …
R at Microsoft
David Smith, R Community Lead at Microsoft, talks about how they are using R. He covers both how it is being integrated into the product line and how it is used internally to analyse operational data. …
Machine learning algorithm cheat sheet
The recent explosion of interest in machine learning has resulted in a profusion of algorithms. It can be difficult to know which one is most suited to your problem. Recognizing this challenge, Microsoft have produced a machine learning algorithm cheat sheet. It's designed to allow you to choose between the algorithms available in Microsoft's Azure Machine Learning Studio, but, as many of the …