R comes with a range of datasets that can be used when learning the basics or trying out a new approach/package. `mtcars` and `weather` are popular choices.
However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but are useless when it comes to realistically simulating data science tasks.
The `dslabs` package provides datasets more suited to exploring data science techniques. It includes:

- `admissions`: Gender bias among graduate school admissions to UC Berkeley (example of Simpson’s paradox)
- `divorce_margarine`: Divorce rate and margarine consumption data (illustrates spurious causation)
- `gapminder`: Health and income outcomes
- `heights`: Self-reported heights
- `murders`: US gun murders by state for 2010
- `na_example`: Randomly generated count data with missing values
- `outlier_example`: Randomly generated adult male heights (in feet) with outliers
- `polls_us_election_2016`: Fivethirtyeight 2016 Poll Data
- `research_funding_rates`: Gender bias in research funding in the Netherlands
- `rfalling_object`: Simulate falling object data
- `take_poll`: Models results from taking a poll
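
As a quick way to get started, the snippet below is a minimal sketch of loading the package and looking at one of the datasets listed above. It assumes `dslabs` is already installed from CRAN. The variable name `dataset_names` is only illustrative, and `murders` is just one possible choice.

```r
# Assumes dslabs is installed, e.g. via install.packages("dslabs")
library(dslabs)

# Names of the datasets shipped with the package
# (dataset_names is an illustrative variable name)
dataset_names <- data(package = "dslabs")$results[, "Item"]
print(dataset_names)

# Load one dataset and inspect its structure and first rows
data(murders)
str(murders)
head(murders)
```

Any other dataset from the list can be swapped in for `murders` in the same way.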