Neo4j have an informative whitepaper highlighting the top 5 use cases for graph databases. They highlight the following application areas: fraud detection, real-time recommendations, master data management, network and IT operations, and identity and access management. A similar article on Data Informed also highlights the role graph databases can play in managing the Internet of Things. … [Read more...]
RDDs, DataFrames and Datasets
There are now three Spark APIs for working with large volumes of data: RDD, DataFrame, and Dataset. Which one should we use? Good question. Jules Damji provides a pretty comprehensive answer in an article on the Databricks blog. RDD was the original API for working with large volumes of data. The first thing to note is that the RDD API is not being deprecated. It has an important role to play. RDDs … [Read more...]
ScaleR package now available as part of free Microsoft R Client
The ScaleR package provides functions for performing scalable and extremely high-performance data management, analysis, and visualization in R. Until now, it was only available to those who had a Microsoft R Server license. With the introduction of the free Microsoft R Client for Windows, you can now work with the full set of ScaleR functions without having to part with a cent. Of course, … [Read more...]
Microsoft announces major commitment to Apache Spark
Microsoft have just announced an extensive commitment for Spark to power Microsoft’s big data and analytics offerings, including Cortana Intelligence Suite, Power BI, and Microsoft R Server. Spark 1.6.1 is available on Azure HDInsight, and integration with R Server is following. This will allow R functions to be run at scale over thousands of Spark nodes. … [Read more...]
Microsoft R Server documentation is now online
The complete Microsoft R Server documentation is now available on MSDN and is publicly accessible. It includes comprehensive details of the RevoScaleR High Performance Analytics package. RevoScaleR includes the following analysis functions: rxSummary (basic summary statistics), rxLinMod (linear modeling), rxLogit (logistic regression modeling), rxGlm (generalized linear modeling), rxCovCor … [Read more...]