In this presentation from QCon SF, Doug Bateman walks through some of engines in Spark, and demonstrates its ability to rapidly process Big Data. He'll cover:
+ Extracting information with RDDs
+ Querying data using DataFrames
+ Visualizing and plotting data
+ Creating a machine-learning pipeline with Spark-ML and MLLib.
He'll also discuss the internals which make Spark 10-100 times
faster than Hadoop MapReduce and Hive.