Microsoft Research have released over 50 free data sets via their Open Data site. They include 38 million tweets from the 2012 US presidential election and profiles of 1 million celebrities (1000 with images). Are there actually 1 million celebrities now?! Maybe someone can analyze the data to confirm. As in all data science we’ll need […]
Linguist Martin Schweinberger has used base R to perform a sociolinguistic analysis of swear word use in Irish English. The data analyzed is from the Irish component of the International Corpus of English. I’m particularly tickled by the fact that the script contains a set of regular expressions that define swear words. For instance search.pattern2 […]
FiveThirtyEight are sharing the data and code behind some of their articles. A goldmine for those wishing to learn more about data science.
R comes with a range of datasets that can be used when learning the basics or trying out a new approach/package. mtcars and weather are popular choices. However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but are useless when it comes to realistically simulating data science tasks. The […]
My paper on visualizing dynamic networks has just been published in the Handbook of Research Methods in Complexity Science. There’s a discount code for those purchasing in the next three months—VIP35.