R was prominent at last month's BUILD, Microsoft's conference for developers. Joseph Sirosh, Corporate Vice President for Machine Learning, said, "If there is a single language that you choose to learn today... let it be R." His keynote demonstrated a genomic data analysis using Revolution R Enterprise (from Revolution Analytics, recently acquired by Microsoft) running on a 1600-core Azure Hadoop cluster. Microsoft are …
Azure Data Lake
Microsoft have announced Azure Data Lake, a big-data repository for storing structured and semi-structured data in native formats. Data lakes can contain single files exceeding many petabytes or huge numbers of small files, so they are equally suited to processing transaction logs and to receiving data from many disparate "internet of things" sensors. As data lakes are compatible with the Hadoop file …
Big data helps to prevent students from dropping out of college
Marist College, in Poughkeepsie, NY, is using predictive analytics to determine whether students are likely to drop out of their courses. The college has developed an early warning system for students attending online and hybrid (traditional lecturing with homework completed online) courses. This system identifies at-risk students based on factors such as reading patterns (clickstream …
OneNet: a distributed functional programming platform from Microsoft
OneNet (now called Prajna) is a distributed functional programming platform being developed at Microsoft. It has a lot in common with Apache Spark. Both platforms are built using functional languages (F# in the case of OneNet, Scala in the case of Spark), which are also the primary languages for developers using the platforms. OneNet will have support for specialized computing devices, …
Happy 5th birthday, Spark
The Apache Spark project was first open-sourced on 31 March 2010. While much has been made of how quickly interest in Spark has grown, it's worth pausing to remember that it has been around for a whole five years. The project has had time to mature and expand into areas where there are practical requirements (e.g. data frames, ML pipelines). Looking forward to what the team comes up with in …