Talk given at the SF Scala / Bay Area Machine Learning Meetup on September 22, 2014.
This talk discusses learning decision trees on a distributed computing cluster using MLlib, the machine learning library built on top of Spark. Decision trees are a powerful machine learning method used in many applications. Spark is an open-source project for large-scale data analytics. The talk explains how trees are implemented on Spark, discusses how best to use MLlib trees in practice, and gives a number of examples.