If you are interested in learning about Apache Spark, a popular large-scale data processing engine, the Introduction to Apache Spark Workshop videos from the 2014 Spark Summit are a good place to start. … [Read more...]
Those damn users
A picture is worth a thousand words. The above photograph, tweeted by @stuartice, illustrates why we need to think about the wider system when solving problems. This is especially true when, as in this case, the wider system involves sentient actors who may not share our goals. Similar effects can be seen in any situation in which decision-makers are trying to save people from themselves. If we … [Read more...]
Common Core echo chambers
Aankit Patel, a data consultant at the NYC Department of Education, has published an interesting article on the Common Core State Standards (CCSS). The CCSS are nationwide US academic standards for English and mathematics. They were introduced in response to: American students' declining scores on international tests; differing standards being applied by state education departments; the prospects for … [Read more...]
If you have tools, everything looks chartable
Data and charts seem to be inseparable. If there’s data, the temptation to visualize it can be irresistible. We have all these amazing tools that produce amazing graphics with the press of a button—and showing the data feels honest. But it’s often a “cover your ass” strategy. The job of a data scientist is to make sense of data…draw insights from it—not … [Read more...]
Data science with F#
R is great for doing data analysis, but it can be frustrating as a programming language. It doesn't feel familiar to developers (it pays greater homage to its statistical heritage), and there's little consistency across the community-built packages. F#, on the other hand, is first and foremost a programming language. And, as it's a .NET language, it has access to enterprise-grade tooling and … [Read more...]