Graft: A Tool for Debugging and Testing Apache Giraph Programs

Presentation given at the 2nd Giraph & Graph Analytics Meetup at Facebook.

https://github.com/semihsalihoglu/graft
http://doi.acm.org/10.1145/2723372.2735353

Jaeho Shin

August 24, 2015
Transcript

  1. Graft*: A Tool for Debugging and Testing Apache Giraph Programs
     Jaeho Shin <[email protected]>
     Graph Analytics Meetup @ Facebook, Aug 24, 2015
     * https://github.com/semihsalihoglu/graft
     Joint work with Semih Salihoglu, Vikesh Khanna, Ba Quan Truong, and Jennifer Widom
  2. Pregel Review
     [Diagram labels: Vertex Program; incoming messages; vertex state (value, isHalted); outgoing messages; superstep = 0, …, superstep = n-1, superstep = n]
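The superstep model on this slide can be illustrated with a small single-process simulation. This is only a sketch of the Pregel idea, not Giraph code: the class `PregelSketch`, the method `propagateMax`, and the max-value propagation algorithm are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process illustration of Pregel-style supersteps (not the Giraph API). */
public class PregelSketch {
    /** Per-vertex state as named on the slide: a value and an isHalted flag. */
    static final class VertexState {
        long value;
        boolean isHalted;
        VertexState(long value) { this.value = value; }
    }

    /** Propagates the maximum value through the graph; returns final values by vertex id. */
    static Map<Integer, Long> propagateMax(Map<Integer, List<Integer>> edges,
                                           Map<Integer, Long> initial) {
        Map<Integer, VertexState> states = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : initial.entrySet()) {
            states.put(e.getKey(), new VertexState(e.getValue()));
        }
        Map<Integer, List<Long>> inbox = new HashMap<>();
        for (long superstep = 0; ; superstep++) {
            Map<Integer, List<Long>> outbox = new HashMap<>();
            boolean anyActive = false;
            for (Map.Entry<Integer, VertexState> e : states.entrySet()) {
                VertexState s = e.getValue();
                List<Long> inMsgs = inbox.getOrDefault(e.getKey(), Collections.emptyList());
                // A halted vertex is skipped unless an incoming message wakes it up.
                if (s.isHalted && inMsgs.isEmpty()) continue;
                anyActive = true;
                // The "vertex program": read incoming messages, update the value.
                boolean changed = (superstep == 0);
                for (long m : inMsgs) {
                    if (m > s.value) { s.value = m; changed = true; }
                }
                // Outgoing messages are delivered in the next superstep.
                if (changed) {
                    for (int n : edges.getOrDefault(e.getKey(), Collections.emptyList())) {
                        outbox.computeIfAbsent(n, k -> new ArrayList<>()).add(s.value);
                    }
                }
                s.isHalted = true; // vote to halt
            }
            if (!anyActive) break; // every vertex halted and no messages in flight
            inbox = outbox;
        }
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, VertexState> e : states.entrySet()) {
            result.put(e.getKey(), e.getValue().value);
        }
        return result;
    }
}
```

In a real deployment the vertices are partitioned across workers, which is what makes print()-style debugging painful.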
  3. Problem
     “Debugging vertex programs is hard!” (from interviews of Giraph and GPS users)
     Most programmers rely on print() statements!
     Typical debuggers are not useful for distributed programs.
  4. Goal
     A replay-style debugger for Giraph
     • no need to change code for debugging
     • easy browsing of what’s going on
     • easy line-by-line replay of the vertex program
     • option to specify what to capture programmatically
  5. Running Giraph Programs
     [Diagram labels: Giraph Program; Giraph (Worker1, Worker2, …, Workern); HDFS; Input; Output]
         $ hadoop jar \
             target/giraph-examples.jar org.apache.giraph.GiraphRunner \
             org.apache.giraph.examples.SimpleShortestPathsComputation \
             # more command line arguments ...
  6. Graft “Capture”
     [Diagram labels: DebugConfig; Giraph Program; Graft Instrumenter; Instrumented Giraph Program; Giraph (Worker1, Worker2, …, Workern); HDFS; Input; Output; Captured Traces]
         $ giraph-debug DebugConfig \
             target/giraph-examples.jar org.apache.giraph.GiraphRunner \
             org.apache.giraph.examples.SimpleShortestPathsComputation \
             # more command line arguments ...
  7. Graft “Capture”
     DebugConfig example:
         public class DebugConfig {
           boolean shouldDebugSuperstep(long s) { return s < 10; }
           int numberOfRandomVerticesToCapture() { return 5; }
           boolean shouldCaptureNeighborsOfVertices() { return true; }
           boolean shouldCatchExceptions() { return false; }
           boolean shouldDebugVertex(Vertex v, long superstep) { ... }
           boolean isVertexValueCorrect(ID vertexId, Value val) { ... }
           boolean isMessageCorrect(Message msg, ID srcID, ID dstID, int superstep) {
             return msg.value >= 0;
           }
           ...
         }
     Command-line options:
         $ giraph-debug -S{0..9} -R5 -V{1,3,5,7,9} -N -E \
             target/giraph-examples.jar org.apache.giraph.GiraphRunner \
             org.apache.giraph.examples.SimpleShortestPathsComputation \
             # ...
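The DebugConfig predicates on this slide can be read as filters that decide which vertex contexts get captured. The sketch below is a self-contained illustration under assumptions: `CaptureFilterSketch` and `shouldCapture` are hypothetical names, vertex ids and message values are simplified to `long` and `double`, and the rule that both the superstep and the vertex must be selected is an assumption about how the filters combine, not a statement of Graft's actual logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a DebugConfig; types are simplified and this is not Graft's API. */
public class CaptureFilterSketch {
    // Mirrors the slide's example: supersteps 0..9 and vertices 1, 3, 5, 7, 9.
    static final long SUPERSTEP_LIMIT = 10;
    static final Set<Long> VERTICES = new HashSet<>(Arrays.asList(1L, 3L, 5L, 7L, 9L));

    static boolean shouldDebugSuperstep(long superstep) {
        return superstep < SUPERSTEP_LIMIT;
    }

    static boolean shouldDebugVertex(long vertexId, long superstep) {
        return VERTICES.contains(vertexId);
    }

    /** Message constraint from the slide: values must be non-negative. */
    static boolean isMessageCorrect(double msgValue) {
        return msgValue >= 0;
    }

    /** Assumed combination rule: capture only when both filters select the context. */
    static boolean shouldCapture(long vertexId, long superstep) {
        return shouldDebugSuperstep(superstep) && shouldDebugVertex(vertexId, superstep);
    }
}
```

For example, `shouldCapture(3, 0)` is true here, while `shouldCapture(3, 10)` and `shouldCapture(2, 0)` are false.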
  8. Graft “Visualize”
     [Diagram labels: DebugConfig; Giraph Program; Instrumented Giraph Program; Giraph (Worker1, Worker2, …, Workern); HDFS; Input; Output; Captured Traces; Graft GUI; Visualizations]
  9. Graft “Reproduce”
     [Diagram labels: DebugConfig; Giraph Program; Instrumented Giraph Program; Giraph (Worker1, Worker2, …, Workern); HDFS; Input; Output; Captured Traces; Graft Context Reproducer; JUnit Test Cases]
  10. Debugging with Graft
      [Diagram labels: DebugConfig; Giraph Program; Graft Instrumenter; Instrumented Giraph Program; Giraph (Worker1, Worker2, …, Workern); HDFS; Input; Output; Captured Traces; Graft GUI; Visualizations; Graft Context Reproducer; JUnit Test Cases]
  11. Graft Instrumentation
      [Diagram labels: Worker (Giraph) runs; Top Interceptor (Graft): compute(), capture (inMsgs, valBefore, valAfter); User’s Vertex Program: compute(); Bottom Interceptor (Graft): sendMessage(), capture (outMsgs); AbstractComputation (Giraph): sendMessage(); edges labeled “extends” (three) and “calls” (two)]
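One reading of this diagram is that the user's class is sandwiched between two generated classes: a bottom interceptor below it that records outgoing messages, and a top interceptor above it that records the inputs and the value before and after `compute()`. The sketch below illustrates that pattern with plain inheritance. All class names are hypothetical, `BaseComputation` is a stand-in rather than Giraph's real `AbstractComputation`, and the real tool rewrites bytecode instead of requiring hand-written subclasses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the top/bottom interceptor pattern (not Graft's generated code). */
public class InterceptorSketch {
    /** Stand-in for the framework base class; NOT Giraph's AbstractComputation API. */
    static abstract class BaseComputation {
        final List<String> delivered = new ArrayList<>();
        long value;
        abstract void compute(List<Long> inMsgs);
        void sendMessage(int target, long msg) { delivered.add(target + ":" + msg); }
    }

    /** Bottom interceptor: sits between the user's class and the base class. */
    static abstract class BottomInterceptor extends BaseComputation {
        final List<String> capturedOut = new ArrayList<>();
        @Override
        void sendMessage(int target, long msg) {
            capturedOut.add(target + ":" + msg); // capture (outMsgs)
            super.sendMessage(target, msg);      // then do the real send
        }
    }

    /** The user's vertex program, re-parented to extend the bottom interceptor. */
    static class UserComputation extends BottomInterceptor {
        @Override
        void compute(List<Long> inMsgs) {
            for (long m : inMsgs) value = Math.min(value, m);
            sendMessage(7, value + 1);
        }
    }

    /** Top interceptor: extends the user's class; this is what the worker runs. */
    static class TopInterceptor extends UserComputation {
        final List<String> trace = new ArrayList<>();
        @Override
        void compute(List<Long> inMsgs) {
            long valBefore = value;
            capturedOut.clear();
            super.compute(inMsgs); // calls the user's unmodified compute()
            // capture (inMsgs, valBefore, valAfter) together with the outgoing messages
            trace.add("in=" + inMsgs + " before=" + valBefore
                    + " after=" + value + " out=" + capturedOut);
        }
    }
}
```

Because the capture logic lives entirely in the two wrapper classes, the user's `compute()` body stays untouched, which matches the "no need to change code" goal stated earlier in the talk.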
  12. Graft Instrumentation
      • Using Javassist
      • Caching instrumented jars on HDFS for faster reruns and reliable reproduction
      • Short-circuits to reduce instrumentation overhead
      • Asynchronous HDFS writes
  13. Performance Overhead
      [Bar chart: Overhead of Graft. Y-axis: Relative Runtime to 'no-debug' (0 to 2). X-axis: Algorithm & Graph (GC-bipartite, MWM-sk, RW-tw). Series: no-debug, DC-sp, DC-sp+nbr, DC-msg, DC-vv, DC-full. Bar labels, in extraction order: 0, 0, 0, 305, 209, 34, 911, 62747, 2291, 103, 1, 1, 310375, 1, 13503, 1246151, 237467, 24213]
  14. Performance Overhead
      [Same bar chart as the previous slide, with annotations]
      Acceptable!
      • Wall clock time
      • Usually ~15%
      • Less than 30% when fully capturing 1.2M (~18GB)
  15. Limitations
      • Capturing states kept outside Giraph API
      • Capturing unread Aggregators
      • Instrumentation not perfect
      • DebugConfig expressivity limited to single vertex
      • Can’t filter outgoing messages by recipient’s state
  16. Summary
      Graft is a debugger for Apache Giraph programs
      • Streamlines Capture-Visualize-Reproduce process
      • Requires little effort to use (no change to code!)
      • Has acceptable performance overhead (20-30%)
      • Helps create end-to-end test input graphs
  17. Pointers
      • https://github.com/semihsalihoglu/graft
      • Our SIGMOD ’15 Paper: Semih Salihoglu, Jaeho Shin, Vikesh Khanna, Ba Quan Truong, and Jennifer Widom. Graft: A Debugging Tool For Apache Giraph.