An introduction to Apache Spark MLlib

An introduction to Apache Spark MLlib: what is it, how does it work, and what can it do?

Mike Frampton

January 24, 2015

Transcript

  1. Apache Spark MLlib
     • What is Apache Spark?
     • What is MLlib?
     • Functionality
     • Dependencies
     • Books
     • Eco-system
     www.semtech-solutions.co.nz [email protected]
  2. Spark – What is it?
     • Alternative to MapReduce for certain applications
     • A low-latency cluster computing system
     • For very large data sets
     • Can be up to 100 times faster than MapReduce for some in-memory workloads
     • Used with Hadoop / HDFS
     • Uses in-memory cluster computing
     • Memory access is faster than disk access
     • Has APIs in Scala / Java / Python
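The in-memory point on this slide can be sketched in PySpark: a data set is cached once and then reused by several computations without being recomputed. This is only an illustration, assuming a working Spark installation; the names `tokenize` and `word_stats`, the sample lines, and the `local[2]` master in the usage comment are hypothetical and do not come from the slides.

```python
def tokenize(line):
    """Split a text line into lower-case words."""
    return line.lower().split()


def word_stats(sc, lines):
    """Count all words and distinct words, reusing one cached RDD.

    `sc` is an existing SparkContext supplied by the caller.
    """
    # cache() asks Spark to keep the tokenized data in memory once computed.
    words = sc.parallelize(lines).flatMap(tokenize).cache()
    total = words.count()                 # first action: computes and caches
    distinct = words.distinct().count()   # second action: can reuse the cache
    return total, distinct


# Usage (needs a Spark installation):
#   from pyspark import SparkContext
#   sc = SparkContext("local[2]", "cache-sketch")
#   print(word_stats(sc, ["Spark is fast", "Spark uses memory"]))
#   sc.stop()
```

Without the `cache()` call, each action would rebuild the tokenized data from its source, which is where disk-bound systems tend to lose time on iterative jobs.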
  3. Spark MLlib – What is it?
     • Spark Machine Learning Library
     • Provided with the Spark install
     • Code in Scala / Java / Python
     • Contains two packages
       – spark.mllib
       – spark.ml (from V1.2)
     • Provides common functionality
       – classification, regression, clustering
       – collaborative filtering, dimensionality reduction
  4. Spark MLlib – Functionality
     • Basic stats
     • Classification and regression
     • Collaborative filtering
     • Clustering
     • Dimensionality reduction
     • Feature extraction and transformation
     • Optimization
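As one example of the functionality listed on this slide, clustering can be sketched with the RDD-based `pyspark.mllib` API. This is a minimal sketch, assuming Spark and NumPy are installed; the helper names `parse_point` and `cluster_points`, the default parameter values, and the sample data in the usage comment are hypothetical and not taken from the slides.

```python
def parse_point(line):
    """Turn a whitespace-separated text line into a list of floats."""
    return [float(tok) for tok in line.split()]


def cluster_points(sc, lines, k=2, max_iterations=10):
    """Train a k-means model on text lines and return its cluster centres.

    `sc` is an existing SparkContext supplied by the caller.
    """
    # Imported lazily so parse_point can be used without Spark installed.
    from pyspark.mllib.clustering import KMeans

    # k-means is iterative, so the parsed points are cached for reuse.
    points = sc.parallelize(lines).map(parse_point).cache()
    model = KMeans.train(points, k, maxIterations=max_iterations)
    return model.clusterCenters


# Usage (needs a Spark installation):
#   from pyspark import SparkContext
#   sc = SparkContext("local[2]", "kmeans-sketch")
#   print(cluster_points(sc, ["0.0 0.0", "0.1 0.1", "9.0 9.0", "9.1 9.1"]))
#   sc.stop()
```

The other areas on the slide follow a similar pattern in this API: build an RDD of vectors or labelled points, call a `train` method, then query the returned model.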
  5. Spark MLlib – Dependencies
     • NumPy for Python
     • Breeze (linear algebra)
     • netlib-java
     • jblas
     • gfortran runtime library
  6. Available Books
     • See our Hadoop book from Apress / Springer: “Big Data Made Easy”
     • Look out for our Apache Spark book, due from Packt in 2015
  7. Contact Us
     • Feel free to contact us at
       – www.semtech-solutions.co.nz
       – [email protected]
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems