MLlib
• Core machine learning algorithms that ship with Spark's standard library.
• Primarily written in Scala, but can be used in PySpark via PythonMLLibAPI
• So far it contains algorithms for:
- Regression : Ridge, Lasso, Linear
- Classification : Support Vector Machines, Logistic Regression,
Naive Bayes, Decision Trees
- Linear Algebra : DistributedMatrix, RowMatrix, etc.
- Recommenders : Alternating Least Squares, (SVD++ in GraphX)
- Clustering : K-means
- Optimisation : Stochastic Gradient Descent
- … …
More are being contributed. Look at the Spark JIRA, and look out for the Spark 1.0 release.
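To make the link between the Regression and Optimisation entries concrete, here is a minimal single-machine sketch of stochastic gradient descent fitting a linear model, with an optional L2 (ridge) penalty. It is plain Python written for illustration, not MLlib's implementation and not its API: the function name `sgd_linear_regression`, its parameters, and the toy data are all assumptions made for this example.

```python
import random


def sgd_linear_regression(points, step=0.05, reg=0.0, epochs=200, seed=0):
    """Fit y ~ w.x + b by SGD on squared error.

    points: list of (features, label) pairs, features being a list of floats.
    reg:    L2 penalty on w (0.0 gives plain linear regression, > 0 gives ridge).
    All names and defaults here are hypothetical, chosen for the sketch.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    dim = len(points[0][0])
    w = [0.0] * dim
    b = 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)              # visit the samples in a fresh random order
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            # Gradient of 0.5 * err^2 + 0.5 * reg * ||w||^2 for this one sample
            w = [wi - step * (err * xi + reg * wi) for wi, xi in zip(w, x)]
            b -= step * err
    return w, b


# Toy, noise-free data drawn from y = 2x + 1 with x in [-1, 1]
train = [([k / 10.0], 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

weights, intercept = sgd_linear_regression(train)
# weights[0] should land close to 2.0 and intercept close to 1.0
```

Passing `reg` greater than zero shrinks the learned weight towards zero, which is the ridge behaviour. A distributed version would compute the gradients over partitions of the data rather than one sample at a time on a single machine.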