Absence of evidence is not evidence of absence
Google Dataset Search
Google have created Google Dataset Search, a tool that makes it easier to discover datasets. One potential downside is that it relies on dataset owners to provide metadata. While the Google brand means that this might get some traction, not all dataset owners are motivated to help the cause. Publication of datasets is now mandated by some funding bodies, but that doesn't mean that the datasets have … [Read more...]
Jupyter Notebooks—love ’em or hate ’em?
Jupyter Notebooks are popular with data scientists. Microsoft even offers a free, hosted, "no-install" service for Python, R and F#. However, there are some downsides to notebooks, mostly to do with software engineering best practices. Joel Grus gave a provocative talk at JupyterCon 2018 entitled "I Don't Like Notebooks". Yihui Xie then followed up with a response to Grus' talk. Both authors … [Read more...]
R analysis of frequency of swear words in Irish English by gender and age
Linguist Martin Schweinberger has used base R to perform a sociolinguistic analysis of swear word use in Irish English. The data analyzed is from the Irish component of the International Corpus of English. I'm particularly tickled by the fact that the script contains a set of regular expressions that define swear words. For instance: search.pattern2 <- c("[A|a]rse[s|d]{0,1}") Turns out … [Read more...]
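To get a feel for what a pattern like the one quoted above actually matches, here is a minimal sketch. It uses Python's re module rather than R, and it assumes that the spaces in the quoted pattern are extraction noise, so that the intended expression is [A|a]rse[s|d]{0,1}. The helper name matches and the alternative pattern TIDY are hypothetical and are not taken from Schweinberger's script.

```python
import re

# Pattern from the excerpt with the stray spaces removed (an assumption).
PATTERN = re.compile(r"[A|a]rse[s|d]{0,1}")

# A tidier pattern that is probably closer to the intent (hypothetical).
TIDY = re.compile(r"[Aa]rse[sd]?")


def matches(text, pattern=PATTERN):
    """Return every substring of text that the pattern matches."""
    return pattern.findall(text)


# The expected word forms are all found.
print(matches("arse Arse arses arsed"))  # ['arse', 'Arse', 'arses', 'arsed']

# Inside a character class, | is a literal pipe, not alternation.
print(matches("|rse arse|"))             # ['|rse', 'arse|']
print(matches("|rse arse|", TIDY))       # ['arse']

# Without word boundaries, the pattern also fires inside longer words.
print(matches("hearse"))                 # ['arse']
```

For these inputs, R's default regular expressions should treat the character classes the same way, although that is not verified here against the original script or corpus.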