Jupyter Notebooks are popular with data scientists. Microsoft even offers a free, hosted, “no-install” service for Python, R and F#. However, there are some downsides to notebooks—mostly to do with software engineering best practices. Joel Grus gave a provocative talk at JupyterCon 2018 entitled “I Don’t Like Notebooks”. Yihui Xie then followed up with a […]
Linguist Martin Schweinberger has used base R to perform a sociolinguistic analysis of swear word use in Irish English. The data analyzed is from the Irish component of the International Corpus of English. I’m particularly tickled by the fact that the script contains a set of regular expressions that define swear words. For instance search.pattern2 […]
FiveThirtyEight are sharing the data and code behind some of their articles. A goldmine for those wishing to learn more about data science.
R comes with a range of datasets that can be used when learning the basics or trying out a new approach/package. mtcars and weather are popular choices. However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but are useless when it comes to realistically simulating data science tasks. The […]
Karl Broman and Kara Woo offer some good advice on organizing data in spreadsheets. They advocate confining the use of spreadsheets to data entry and storage—moving calculations and visualizations to other tools. This certainly avoids some of the biggest problems with using spreadsheets. However, spreadsheets don’t enforce any discipline. It’s up to the user to […]