Apache Giraph
● What is it ?
● How does it work ?
● Dependencies
● Examples
www.semtech-solutions.co.nz [email protected]
Giraph – What is it ?
● Graph processing for Hadoop V2
● For tasks that don't fit MapReduce
● Better performance for those tasks
● Processing by iterations called super steps
● Uses Bulk Synchronous Parallel ( BSP ) computing
– See Apache Hama presentation
● Licensed under the Apache License 2.0
● For distributed computing
● For massive calculations
Giraph – How does it work ?
● Consider an example
– Input is chain graph
– Find shortest path
– Three super steps
– Vertices have values
– As do edges
– Messages between steps
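The example above can be sketched in plain Python. This is a rough simulation of the super step model, not the Giraph API: the three-vertex chain, the edge weights 1 and 3, the vertex names and the function name `shortest_paths` are all assumptions made for illustration.

```python
import math


def shortest_paths(edges, source):
    """Simulate Pregel-style single-source shortest paths.

    edges: dict mapping vertex id -> list of (target id, edge weight).
    Returns (distances, number of super steps executed).
    """
    values = {v: math.inf for v in edges}   # every vertex starts at infinity
    inbox = {v: [] for v in edges}          # messages delivered this super step
    superstep = 0
    # Run super step 0 unconditionally, then continue while messages are in flight.
    while superstep == 0 or any(inbox.values()):
        outbox = {v: [] for v in edges}
        for v in edges:
            if superstep > 0 and not inbox[v]:
                continue                    # no messages: the vertex stays halted
            candidate = min(inbox[v], default=math.inf)
            if superstep == 0 and v == source:
                candidate = 0               # the source is at distance 0 from itself
            if candidate < values[v]:
                values[v] = candidate
                # Send an improved distance bound along every outgoing edge.
                for target, weight in edges[v]:
                    outbox[target].append(candidate + weight)
        inbox = outbox                      # barrier: messages arrive next super step
        superstep += 1
    return values, superstep


# Hypothetical chain graph a -> b -> c with edge values 1 and 3.
chain = {"a": [("b", 1)], "b": [("c", 3)], "c": []}
distances, supersteps = shortest_paths(chain, "a")
print(distances, supersteps)  # {'a': 0, 'b': 1, 'c': 4} 3
```

Each pass of the outer loop is one super step: vertices only see messages sent in the previous step, and the run stops once no messages are in flight.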
Giraph – Dependencies
● What does Apache Giraph need ?
– Java 1.6
– Maven 3 or higher
– ZooKeeper
– Hadoop, either YARN ( 2.0.3-alpha ) or version 0.20.x
● So Giraph is graph processing for Hadoop V2 !!
● Based on Google Pregel
Giraph – Examples
● Consider the distance between friends problem
– Facebook friends
– LinkedIn connections
– Shortest distance between friends
● It's a graph
● Processing-intensive to do as a MapReduce job
● See next two slides
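As a small illustration of the distance between friends problem, the sketch below counts hops outward from one person, one hop per iteration. It is plain Python, not Giraph code, and the names in the friendship graph and the function name `degrees_of_separation` are hypothetical.

```python
def degrees_of_separation(friends, start):
    """Return the hop count from `start` to every reachable person.

    friends: dict mapping a person to the set of people they are connected to.
    """
    hops = {start: 0}
    frontier = {start}
    step = 0
    while frontier:
        step += 1
        next_frontier = set()
        for person in frontier:
            for friend in friends[person]:
                if friend not in hops:      # first time this person is reached
                    hops[friend] = step
                    next_frontier.add(friend)
        frontier = next_frontier            # expand one hop per iteration
    return hops


# Hypothetical undirected friendship graph.
friends = {
    "ann": {"bob", "eve"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat"},
    "eve": {"ann"},
}
hops = degrees_of_separation(friends, "ann")
print(hops["dan"])  # 3
```

Every iteration depends on the result of the previous one, which is why an iterative graph model suits this kind of problem.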
Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– [email protected]
● We offer IT project consultancy
● We are happy to hear about your problems
● You can pay for just the hours you need to solve your problems