Nathan Yau of FlowingData has published an up-to-date list of the tools he uses to turn raw data into his impressive visualizations. What's always striking about his toolset is that there's no "quick fix". He uses a range of industry-standard data manipulation (e.g. R) and graphic design (e.g. Adobe Illustrator) tools in his work. I particularly liked his comments on processing and …
Microsoft R Server is now on the Microsoft Data Science Virtual Machine
The Microsoft Data Science Virtual Machine (DSVM) now comes pre-configured with Microsoft R Server Developer Edition. Because you can scale the DSVM according to your needs, this is an easy way to get going with some heavy-duty R computations. …
Google’s self-driving car causes crash
New Scientist reports that one of Google's autonomous cars drove into a bus on 14 February 2016. Apparently Google's cars have been involved in 18 accidents in Mountain View since the company started testing in 2010. All of the earlier accidents involved other vehicles striking a stationary or slow-moving Google car. However, in the latest incident the AI decided to pull out into the path of the slow-moving bus, i.e. the AI …
Football match data repository
Football-Data maintains comprehensive CSV files of football matches dating back to the early 90s. Leagues currently covered are Belgian, Dutch, English, French, German, Greek, Italian, Portuguese, Scottish, Spanish and Turkish. The data is well-formatted and free, so it is a great resource if you want to try out some data science techniques with simple, real-world data. …
Kaggle provide home for high quality public datasets
Kaggle have launched Kaggle Datasets, a repository of "high quality public datasets". The repository will support: access (simple, consistent access to the data with clear licensing); analysis (a way to explore the data without downloading it); results (visibility of previous work performed using the data); and conversation (forums for discussing the nuances of the data). …