Decision Mechanics

Real-world datasets for learning data science in R

March 4, 2018 By editor

R comes with a range of datasets that can be used when learning the basics or trying out a new approach/package. mtcars and weather are popular choices.
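
As a minimal sketch of what working with a built-in dataset looks like, the snippet below inspects mtcars, which ships with R's datasets package. The variable name builtin is only an illustrative choice.

```r
# mtcars is part of the datasets package that is attached by default,
# so no installation or explicit loading is required.
head(mtcars)          # first six rows
str(mtcars)           # column names and types
summary(mtcars$mpg)   # five-number summary and mean of miles per gallon

# Names and titles of the datasets bundled with the datasets package.
# "builtin" is just an illustrative variable name.
builtin <- data(package = "datasets")$results[, c("Item", "Title")]
head(builtin)
```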

However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but are useless when it comes to realistically simulating data science tasks.

The dslabs package provides datasets more suited to exploring data science techniques. It includes:

  • admissions: Gender bias among graduate school admissions to UC Berkeley (an example of Simpson’s paradox)
  • divorce_margarine: Divorce rate and margarine consumption data (illustrates spurious correlation)
  • gapminder: Health and income outcomes
  • heights: Self-reported heights
  • murders: US gun murders by state for 2010
  • na_example: Randomly generated count data with missing values
  • outlier_example: Randomly generated adult male heights (in feet) with outliers
  • polls_us_election_2016: FiveThirtyEight 2016 poll data
  • research_funding_rates: Gender bias in research funding in the Netherlands
  • rfalling_object: Simulates falling object data
  • take_poll: Models results from taking a poll
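
The following is a rough sketch of how one of these datasets might be loaded and explored. It assumes dslabs has already been installed (for example with install.packages("dslabs")) and that the murders data frame has state, population and total columns. The rate column and the variable names available and top5 are added here purely for illustration.

```r
# Assumes the package was installed beforehand: install.packages("dslabs")
library(dslabs)

# Names and titles of the datasets bundled with dslabs
available <- data(package = "dslabs")$results[, c("Item", "Title")]
print(available)

# Load one dataset and take a first look
data(murders)
str(murders)
head(murders)

# Illustrative derived column: gun murders per 100,000 inhabitants
# (assumes the columns "total" and "population" exist)
murders$rate <- murders$total / murders$population * 1e5

# Five entries with the highest rate
top5 <- murders[order(-murders$rate), c("state", "rate")][1:5, ]
print(top5)
```

The other datasets in the list can be loaded the same way by passing their names to data().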

Filed Under: Data analysis, Data science Tagged With: datasets, real-world

Copyright © 2025 · Decision Mechanics Limited · info@decisionmechanics.com