Iterative jobs – Machine learning algorithms Ñ Iterative analytics Hadoop is slow when it comes to performs operations multiple times since each time it will come up as another MapReduce job. Introduction
abstraction over data objects Ñ Spark is implemented in Scala Ñ Many distributed operators are available [count, collect, first] Ñ 10x faster than Hadoop in iterative machine learning Ñ Sub-‐‑second latency to scan a 39GB dataset Introduction
created by files in file system [HDFS], transforming an existing RDD etc. Ñ Cacheable Ñ Tracks the lineage (how it was built) – this allows Spark to rebuild a lost RDD Resilient Distributed Dataset