An Introduction to Apache Giraph

An introduction to Apache Giraph: what is it?
Graph processing (BSP) for Hadoop V2 (YARN).

Mike Frampton

August 03, 2013

Transcript

  1. Apache Giraph
    • What is it?
    • How does it work?
    • Dependencies
    • Examples
    www.semtech-solutions.co.nz [email protected]
  2. Giraph – What is it?
    • Graph processing for Hadoop V2
    • For tasks that don't fit MapReduce
    • Better performance for those tasks
    • Processing by iterations called supersteps
    • Uses Bulk Synchronous Parallel computing (BSP)
      – See Apache Hama presentation
    • Licensed under the Apache License
    • For distributed computing
    • For massive calculations
  3. Giraph – How does it work?
    • Consider an example
      – Input is a chain graph
      – Find the shortest path
      – Three supersteps
      – Vertices have values
      – As do edges
      – Messages are passed between steps
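
The superstep idea above can be sketched in plain Python. This is only a simulation of the vertex-centric BSP pattern, not Giraph code and not the Giraph API: the function name `shortest_paths_bsp`, the three-vertex chain and its edge weights are all assumptions made for illustration, since the graph on the original slide is not in the transcript.

```python
import math


def shortest_paths_bsp(edges, source):
    """Simulate BSP-style single-source shortest paths.

    edges maps each vertex id to a list of (target id, edge weight).
    Returns (vertex values, number of supersteps executed).
    """
    # Every vertex holds a value: its best known distance from the source.
    value = {vertex: math.inf for vertex in edges}
    # Seed the first superstep with a zero-distance message to the source.
    inbox = {source: [0.0]}
    superstep = 0
    while inbox:
        outbox = {}
        # Compute phase: each vertex that received messages handles them.
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < value[vertex]:
                value[vertex] = candidate
                # Tell neighbours about the improved distance.
                for target, weight in edges[vertex]:
                    outbox.setdefault(target, []).append(candidate + weight)
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox
        superstep += 1
    return value, superstep


# Hypothetical chain graph: 0 -> 1 -> 2 with edge values 1.0 and 2.0.
chain = {0: [(1, 1.0)], 1: [(2, 2.0)], 2: []}
distances, supersteps = shortest_paths_bsp(chain, source=0)
print(distances, supersteps)
```

With this assumed chain the loop stops after three supersteps, because the last vertex has no outgoing edges and so sends no further messages. In a real Giraph job the compute phase would run in parallel across workers, with ZooKeeper coordinating the barrier between supersteps.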
  4. Giraph – Dependencies
    • What does Apache Giraph need?
      – Java 1.6
      – Maven 3 or higher
      – ZooKeeper
      – Hadoop
        • YARN (2.0.3-alpha) or
        • Version 0.20.x
    • So Giraph is graph processing for Hadoop V2!
    • Based on Google Pregel
  5. Giraph – Examples
    • Consider the distance between friends problem
      – Facebook friends
      – (and) LinkedIn connections
      – Shortest distance between friends
    • It's a graph
    • Process intensive to do as a MapReduce job
    • See next two slides
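
To see why the friends problem is a graph problem, here is a small Python sketch that merges two friendship lists into one undirected graph and counts hops between two people. The names, the sample data and the function `friend_distance` are hypothetical, and this runs on one machine rather than on Giraph. Each pass of the loop plays roughly the role of one superstep: the current frontier of people "messages" its unvisited neighbours.

```python
def friend_distance(friendships, start, goal):
    """Return the number of hops from start to goal, or None if unreachable."""
    # Build an undirected adjacency map from (person, person) pairs.
    neighbours = {}
    for a, b in friendships:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if start not in neighbours or goal not in neighbours:
        return None

    visited = {start}
    frontier = {start}
    hops = 0
    while frontier:
        if goal in frontier:
            return hops
        # Expand one level: everyone reachable in one more hop.
        next_frontier = set()
        for person in frontier:
            next_frontier |= neighbours[person] - visited
        visited |= next_frontier
        frontier = next_frontier
        hops += 1
    return None


# Hypothetical sample data for the two networks.
facebook_friends = [("ann", "bob"), ("bob", "cat")]
linkedin_connections = [("cat", "dan"), ("ann", "eve"), ("fay", "gus")]
network = facebook_friends + linkedin_connections

print(friend_distance(network, "ann", "dan"))
print(friend_distance(network, "ann", "gus"))
```

Expressed as MapReduce, every level of this expansion would typically be a separate job that rereads and rewrites the graph, which is the overhead the slide refers to as process intensive.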
  6. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You can just pay for those hours that you need to solve your problems