There are now three Spark APIs for working with large volumes of data: RDD, DataFrame, and Dataset. Which one should we use? It's a good question, and Jules Damji gives a fairly comprehensive answer in an article on the Databricks blog. RDD was the original API for working with large volumes of data. The first thing to note is that the RDD API is not being deprecated; it still has an important role to play. RDDs …