Decision Mechanics

Insight. Applied.

Google Dataset Search

November 7, 2018 By editor

Google have created a tool, Google Dataset Search, to make it easier to discover datasets.

One potential downside is that it requires dataset owners to provide metadata. While the Google brand means that this might get some traction, not all dataset owners are motivated to help the cause. Publication of datasets is now mandated by some funding bodies, but that doesn’t mean that the datasets have to be discoverable. Ideally we’ll see funding bodies now mandating the addition of metadata.
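To give a sense of what "providing metadata" involves, here is a minimal sketch in Python. It assumes the metadata is published as schema.org Dataset markup in JSON-LD, which is the vocabulary Google Dataset Search reads. The property names come from schema.org; the helper function and every value passed to it are hypothetical placeholders.

```python
import json


def build_dataset_metadata(name, description, url, license_url, creator_name, keywords):
    """Return a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": {"@type": "Organization", "name": creator_name},
        "keywords": list(keywords),
    }


# All values below are made up for illustration only.
metadata = build_dataset_metadata(
    name="Example survey results",
    description="Responses to an illustrative survey, one row per respondent.",
    url="https://example.org/datasets/survey",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator_name="Example Research Group",
    keywords=["survey", "example"],
)

# Serialise to JSON-LD text, ready to embed in the dataset's landing page.
json_ld = json.dumps(metadata, indent=2)
print(json_ld)
```

The resulting text would typically sit inside a script element of type application/ld+json on the page that describes the dataset. Which properties are required or recommended is best checked against the current guidelines.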

While we wait for Google Dataset Search to evolve, we can still rely on curated repositories and lists such as

  • UCI Machine Learning Repository
  • Rdatasets

Filed Under: Data analysis, Data science, Machine learning Tagged With: datasets

FiveThirtyEight data

March 6, 2018 By editor

FiveThirtyEight are sharing the data and code behind some of their articles.

This is a goldmine for those wishing to learn more about data science.

Filed Under: Data analysis, Data science Tagged With: datasets, fivethirtyeight

Real-world datasets for learning data science in R

March 4, 2018 By editor

R comes with a range of datasets that can be used when learning the basics or trying out a new approach/package. mtcars and weather are popular choices.

However, most of the common datasets are “toy” examples. They are great for practising basic techniques, but of little use for realistically simulating data science tasks.

The dslabs package provides datasets more suited to exploring data science techniques. It includes

  • admissions: Gender bias among graduate school admissions to UC Berkeley (example of Simpson’s paradox)
  • divorce_margarine: Divorce rate and margarine consumption data (illustrates spurious correlation)
  • gapminder: Health and income outcomes
  • heights: Self-reported heights
  • murders: US gun murders by state for 2010
  • na_example: Randomly generated count data with missing values
  • outlier_example: Randomly generated adult male heights (in feet) with outliers
  • polls_us_election_2016: FiveThirtyEight 2016 poll data
  • research_funding_rates: Gender bias in research funding in the Netherlands
  • rfalling_object: Simulates falling object data
  • take_poll: Models results from taking a poll

Filed Under: Data analysis, Data science Tagged With: datasets, real-world

Copyright © 2021 · Decision Mechanics Limited · info@decisionmechanics.com