R comes with a range of datasets that can be used when learning the basics or trying out a new approach or package. `mtcars` and `weather` are popular choices.
However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but are of limited use when it comes to realistically simulating data science tasks.
The dslabs package provides datasets better suited to exploring data science techniques. It includes:
- `admissions`: Gender bias among graduate school admissions to UC Berkeley (an example of Simpson’s paradox)
- `divorce_margarine`: Divorce rate and margarine consumption data (illustrates spurious correlation)
- `gapminder`: Health and income outcomes
- `heights`: Self-reported heights
- `murders`: US gun murders by state for 2010
- `na_example`: Randomly generated count data with missing values
- `outlier_example`: Randomly generated adult male heights (in feet) with outliers
- `polls_us_election_2016`: FiveThirtyEight 2016 poll data
- `research_funding_rates`: Gender bias in research funding in the Netherlands
- `rfalling_object`: Simulates falling-object data
- `take_poll`: Models results from taking a poll
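
As a rough sketch of how these datasets might be used, the snippet below loads `murders` and derives a per-capita rate. It assumes dslabs is already installed from CRAN and that `murders` has the columns `state`, `population` and `total`. The names `available` and `rate` are introduced here purely for illustration.

```r
# One-time setup, if needed: install.packages("dslabs")
library(dslabs)

# Collect the names of everything the package ships as data
available <- data(package = "dslabs")$results[, "Item"]
print(available)

# Load one dataset into the workspace and inspect its structure
data("murders")
str(murders)

# Illustrative derived column: gun murders per 100,000 residents
# (assumes the `total` and `population` columns)
murders$rate <- murders$total / murders$population * 1e5

# Show the five states with the highest rate
head(murders[order(murders$rate, decreasing = TRUE), c("state", "rate")], 5)
```

The same pattern, `data("<name>")` followed by `str()`, should work for the other data frames in the list. `rfalling_object` and `take_poll` are functions rather than stored datasets, so they are called directly instead.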
