Slide 7
• Deployment and scaling toolbox: autoscale the cluster, autoscale the local storage,
define jobs and clusters or just go ad hoc. Multiple cloud providers through a single
API. Bigger master, smaller workers.
• We wanted to see if Databricks Runtime join and filter optimizations could make
our jobs faster relative to what's offered in Apache Spark
• Superior, easy-to-use tools: Spark history server (only recently available elsewhere),
historical Ganglia screenshots, easy access to logs from a browser.
• Optimized cloud storage access
• Run our Spark jobs in the same environment where we run our notebooks
Why Databricks?