An introduction to Apache Mahout

An introduction to Apache Mahout: what is it and
how does it work? What is machine intelligence?
How can Mahout be installed and tested on Hadoop?

Mike Frampton

August 10, 2013

Transcript

  1. Apache Mahout
    • What is it?
    • How does it work?
    • Machine Learning
    • Algorithms
    • Install
    www.semtech-solutions.co.nz [email protected]
  2. Mahout – What is it?
    • Machine learning
    • For large data
    • Based on Hadoop
    • But can work on a non-Hadoop cluster
    • Scalable
    • Released under the Apache License
  3. Mahout – How does it work?
    • Uses Hadoop MapReduce
    • Has many supplied algorithms
    • Supports four use cases
      – Recommendation mining
      – Clustering
      – Classification
      – Frequent itemset mining
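To make the MapReduce idea and the frequent itemset use case concrete, here is a minimal single-process sketch in Python. It is not Mahout code and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `frequent_pairs`) and the shopping-basket data are hypothetical, chosen only to show how a map step emits key/value pairs and a reduce step sums them per key.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(basket):
    """Emit ((item_a, item_b), 1) for every unordered item pair in one basket."""
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def frequent_pairs(baskets, min_support):
    """Return the item pairs that occur together in at least min_support baskets."""
    mapped = (kv for basket in baskets for kv in map_phase(basket))
    reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    return {pair: n for pair, n in reduced.items() if n >= min_support}


# Hypothetical toy data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

result = frequent_pairs(baskets, min_support=3)
print(result)
```

On a real cluster the map and reduce functions would run in parallel on different nodes, and the grouping step would be handled by the framework rather than by a local dictionary.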
  4. Mahout – Machine Learning
    Machine learning – what does it mean?
    • A branch of artificial intelligence
    • Systems that learn from data
    • Classify data after learning
    • Learn on training data sets
    • Generalisation – the ability to classify unseen data sets after learning
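The learn-then-generalise idea can be sketched in a few lines of Python. This is an illustration only, not a Mahout algorithm: it assumes a simple nearest-centroid classifier, and the helper names (`train`, `classify`, `accuracy`) and the data points are hypothetical. The model is built from one labelled set and then checked against samples it has never seen.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(samples):
    """Learn one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], features))


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for features, label in samples if classify(model, features) == label)
    return hits / len(samples)


# Hypothetical data: the model only ever sees `training`.
training = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"), ((9.0, 7.5), "large"),
]
unseen = [((2.0, 1.0), "small"), ((7.0, 9.0), "large"), ((0.5, 1.5), "small")]

model = train(training)
held_out_accuracy = accuracy(model, unseen)
print(held_out_accuracy)
```

Measuring accuracy on the unseen set, rather than on the data used for learning, is what gives an estimate of generalisation.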
  5. Mahout – Algorithms
    Some of the available algorithms (among many others)
    – Collaborative filtering
      • Narrow sense – make predictions about user interests by collecting preferences
      • General sense – multi-agent collaboration for information filtering
    – Mean shift clustering
      • Mode seeking, used for visual tracking
    – Parallel frequent pattern mining
      • Find unique features
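Collaborative filtering in the narrow sense can be sketched as follows. This is a minimal user-based example in plain Python, not Mahout's implementation: it assumes cosine similarity over co-rated items and a similarity-weighted average for the prediction, and the users, items and ratings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two rating dicts, computed over co-rated items only."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict a rating as the similarity-weighted mean of other users' ratings."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = cosine(ratings[user], prefs)
        num += weight * prefs[item]
        den += weight
    return num / den if den else None


# Hypothetical preferences on a 1-5 scale; alice has not rated film_c.
ratings = {
    "alice": {"film_a": 5, "film_b": 1},
    "bob":   {"film_a": 5, "film_b": 1, "film_c": 4},
    "carol": {"film_a": 1, "film_b": 5, "film_c": 2},
}

prediction = predict(ratings, "alice", "film_c")
print(round(prediction, 2))
```

Because alice's ratings match bob's far more closely than carol's, the predicted value lands nearer to bob's rating of film_c than to carol's.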
  6. Mahout – Install
    So how do we install Mahout and test it?
    – Install Maven
      • sudo apt-get install maven3
    – Install Apache Mahout
      • You will need Subversion installed
      • svn co http://svn.apache.org/repos/asf/mahout/trunk
      • Go to the directory containing the pom.xml file
        – mvn install ## in ./trunk
    Full details available in the Mahout install guide on our web site shop
  7. Mahout – Test Install
    So let us run a test
    • cd $MAHOUT_HOME/examples/bin
    • ./build-reuters.sh
    • Choose option 1, kmeans clustering
    • Should finish with the output shown on the next slide
  8. Mahout – Test Install
    cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh
    Please call cluster-reuters.sh directly next time. This file is going away.
    Please select a number to choose the corresponding clustering algorithm
    1. kmeans clustering
    2. fuzzykmeans clustering
    3. lda clustering
    Enter your choice : 1
    ok. You chose 1 and we'll use kmeans Clustering
    .................................
    Inter-Cluster Density: NaN
    Intra-Cluster Density: 0.0
    CDbw Inter-Cluster Density: NaN
    CDbw Intra-Cluster Density: NaN
    CDbw Separation: NaN
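Option 1 in the test run above selects k-means clustering. For readers unfamiliar with the technique, the following is a minimal in-memory sketch of the standard k-means loop (assign each point to its nearest centroid, then recompute the centroids) in plain Python. It is not the code that the Mahout example script runs; the function names, the two-dimensional points and the starting centroids are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, centroids, iterations=10):
    """Run k-means from the given starting centroids; return (centroids, clusters)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # No centroid moved, so the assignment is stable.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```

The Reuters example applies the same idea to document vectors rather than two-dimensional points, with the work distributed across Hadoop.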
  9. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You can just pay for those hours that you need
    • To solve your problems