Gustavo Pinto
January 14, 2018

Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications

Transcript

  1. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone
    java.util.concurrent.Executors: well used • high-level API • user friendly
  2. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone
    java.util.concurrent.Executors: well used • high-level API • user friendly
    ForkJoin: could be used more • sophisticated API • sophisticated scheduler
  3. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone
    java.util.concurrent.Executors: well used • high-level API • user friendly
    ForkJoin: could be used more • sophisticated API • sophisticated scheduler
  4. Why ForkJoin? Divide and conquer algorithm

    [Diagram: a tree of ForkJoin tasks connected by fork() and join() calls] @gustavopinto
  5. Why ForkJoin? Work Stealing

    [Diagram: ForkJoin workers, each with a queue of numbered ForkJoin tasks]
  6. Why ForkJoin? Work Stealing

    [Diagram: ForkJoin workers, each with a queue of numbered ForkJoin tasks]
  7. Why ForkJoin? Work Stealing

    [Diagram: same workers and queues; task 5 is no longer shown]
  8. Why ForkJoin? Work Stealing

    [Diagram: same workers and queues; task 2 now appears twice]
  9. Why ForkJoin? Work Stealing

    [Diagram: same workers and queues; task 2 appears only in its new position]
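
The divide-and-conquer pattern named on the "Why ForkJoin?" slides can be sketched with the standard `RecursiveTask` API. This is a minimal illustration, not code from the talk: the class name `SumTask`, the `THRESHOLD` value, and the array-sum workload are all assumptions chosen for brevity.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative only: SumTask and THRESHOLD are hypothetical names, not from the slides.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // at or below this size, sum sequentially
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // process the right half in the current thread
        return left.join() + rightSum;   // wait for the left half, then combine
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the processor count
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `SumTask.sum(array)` splits the range recursively until each piece is small enough to sum directly. Forked halves that nobody has started yet are the tasks other workers can steal.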
  10. ForkJoin Applications: What are the ForkJoin applications?

    1. Search for java.util.concurrent.ForkJoinPool 2. Investigate if ForkJoin is indeed used 3. Filter out class assignments and pet projects 4. Try to build and run the code
    Application Programmers
  11. ForkJoin Applications: What are the ForkJoin applications?

    1. Search for java.util.concurrent.ForkJoinPool 2. Investigate if ForkJoin is indeed used 3. Filter out class assignments and pet projects 4. Try to build and run the code
    30 projects selected (~380KLoC), e.g., ecco, ejisto, mandelbrot, knn, jacer, conflate, cq4j, mywiki, exhibitor, warp, lowlatency, …
    Application Programmers
  12. ForkJoin Applications: What about higher-level libraries?

    1. 330K lines of code 2. 21k commits 3. 470 source code contributors 4. Written (mostly) in Scala and Java 5. Well-known and well-used
    System Programmers
  13. This paper

    • A depth-oriented study and restructuring of the akka message passing algorithm
    • A breadth-oriented study of 30 real-world ForkJoin open-source projects
    • A refactoring tool (the first aimed at improving energy consumption of parallel systems)
  14. Understanding Parallelism Bottlenecks

    For each version (v0, …), we measured execution time and energy consumption.
    Intel CPU: 2×8 cores (32 cores w/ hyper-threading), 2.60GHz, 64GB of memory, running Debian, JDK version 1.7.0_71, build 14.
    jRAPL: software-based energy measurement
  15. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling
    [Diagram: an actor and its mailbox] • actors process their own messages • there is no side effect
  16. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling
    [Diagram: an actor and its mailbox] • actors process their own messages • actors exchange, but do not share, the same message • there is no side effect
  17. Overcoming Parallelism Bottlenecks

    Work Stealing
    [Diagram: ForkJoin workers, each with a queue of numbered ForkJoin tasks]
  18. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling (Centralized) (DEcentralized)
    [Diagram: four actors and a mailbox] • tn =
  19. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling (Centralized) (DEcentralized)
    [Diagram: four actors and a mailbox] • tn = mailbox • .fork()
  20. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling (Centralized) (DEcentralized)
    [Diagram: four actors and a mailbox] • tn = mailbox • tn.fork()
  21. Overcoming Parallelism Bottlenecks

    Work Stealing
    [Diagram: ForkJoin workers, each with a queue of numbered ForkJoin tasks]
  22. Overcoming Parallelism Bottlenecks

    Bottleneck #1: Centralized pooling (Centralized) (DEcentralized)
    [Diagram: four actors and a mailbox] • tn = mailbox • tn.fork()
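
One plausible reading of the "tn = mailbox, tn.fork()" slides is that a mailbox is wrapped as a ForkJoin task and, when the caller is already a worker of the pool, forked onto that worker's own queue instead of being submitted through the pool's shared entry point. The sketch below illustrates that idea with standard `java.util.concurrent` calls only. It is an assumption about the technique, not akka's actual code: `MailboxScheduling`, `schedule`, and `runDemo` are hypothetical names, and the "mailbox" is a plain `Runnable`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not taken from akka or from the slides.
class MailboxScheduling {

    // Schedules a mailbox run, preferring the caller's local work queue.
    static void schedule(ForkJoinPool pool, Runnable mailbox) {
        ForkJoinTask<?> tn = ForkJoinTask.adapt(mailbox); // tn = mailbox
        Thread current = Thread.currentThread();
        if (current instanceof ForkJoinWorkerThread
                && ((ForkJoinWorkerThread) current).getPool() == pool) {
            tn.fork();        // decentralized: push onto this worker's own queue
        } else {
            pool.execute(tn); // centralized: external submission to the pool
        }
    }

    // Runs `actors` mailbox tasks from inside the pool and returns how many were processed.
    static int runDemo(final int actors) {
        final ForkJoinPool pool = new ForkJoinPool();
        final AtomicInteger processed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(actors);
        final Runnable mailbox = new Runnable() {
            @Override
            public void run() {
                processed.incrementAndGet();
                done.countDown();
            }
        };
        // The dispatching task runs on a pool worker, so schedule() takes the fork() path.
        pool.execute(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < actors; i++) {
                    schedule(pool, mailbox);
                }
            }
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }
}
```

Forked tasks that are never joined still run: the forking worker drains its own queue, and idle workers can steal from it.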
  23. Overcoming Parallelism Bottlenecks

    Bottleneck #2: Copy on Fork
    [Diagram: task t1 over the array a b c d e f g h] • t2 = first half • t3 = second half
  24. Overcoming Parallelism Bottlenecks

    Bottleneck #2: Copy on Fork
    [Diagram: task t1 over the array a b c d e f g h; task tree t1, t2, t3] • t2 = first half • t3 = second half
  25. Overcoming Parallelism Bottlenecks

    Bottleneck #2: Copy on Fork
    [Diagram: task t1 over the array a b c d e f g h; task tree t1, t2, t3] • t2 = first half • t3 = second half • make global
  26. Overcoming Parallelism Bottlenecks

    Bottleneck #2: Copy on Fork
    [Diagram: task t1 over the array a b c d e f g h, indexed 0 1 2 3 4 5 6 7; task tree t1, t2, t3] • t2 = first half • t3 = second half
  27. Overcoming Parallelism Bottlenecks

    Bottleneck #2: Copy on Fork
    [Diagram: task t1 over the array a b c d e f g h; indices 0, 3 and 4, 7 set apart from 1 2 5 6; task tree t1, t2, t3] • t2 = first half • t3 = second half • Up to 20% energy savings!
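
The "Copy on Fork" slides contrast subtasks that each receive their own half of the array with subtasks that share one array and only carry index bounds. The sketch below shows both shapes side by side, assuming an array-sum workload. `CopyOnForkDemo`, `CopyingSum`, `SharedSum`, and the tiny `THRESHOLD` are hypothetical names and values picked so the eight-element case from the slides still forks.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the pattern and one possible fix; not code from the studied projects.
class CopyOnForkDemo {
    private static final int THRESHOLD = 2; // tiny cutoff so an 8-element array still forks
    private static final ForkJoinPool POOL = new ForkJoinPool();

    static int sumRange(int[] data, int from, int to) {
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Copy on Fork: every split allocates and fills two new half-size arrays.
    static class CopyingSum extends RecursiveTask<Integer> {
        private final int[] data;

        CopyingSum(int[] data) {
            this.data = data;
        }

        @Override
        protected Integer compute() {
            if (data.length <= THRESHOLD) {
                return sumRange(data, 0, data.length);
            }
            int mid = data.length / 2;
            CopyingSum t2 = new CopyingSum(Arrays.copyOfRange(data, 0, mid));           // first half
            CopyingSum t3 = new CopyingSum(Arrays.copyOfRange(data, mid, data.length)); // second half
            t2.fork();
            int second = t3.compute();
            return t2.join() + second;
        }
    }

    // Shared array: subtasks reference the same data and differ only in [from, to).
    static class SharedSum extends RecursiveTask<Integer> {
        private final int[] data;
        private final int from;
        private final int to;

        SharedSum(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                return sumRange(data, from, to);
            }
            int mid = (from + to) >>> 1;
            SharedSum t2 = new SharedSum(data, from, mid);
            SharedSum t3 = new SharedSum(data, mid, to);
            t2.fork();
            int second = t3.compute();
            return t2.join() + second;
        }
    }

    static int copyingSum(int[] data) {
        return POOL.invoke(new CopyingSum(data));
    }

    static int sharedSum(int[] data) {
        return POOL.invoke(new SharedSum(data, 0, data.length));
    }
}
```

Both variants return the same result. The index-based one simply avoids allocating and copying at every level of the recursion, which is safe here because the subtasks only read the shared array.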
  28. Overcoming Parallelism Bottlenecks

    Bottleneck #3: Copy on Join
    [Diagram: t1 and t2 over the elements a b c d e f g h] • t3 = t1 + t2
  29. Overcoming Parallelism Bottlenecks

    Bottleneck #3: Copy on Join
    [Diagram: t1 and t2 over the elements a b c d e f g h; task tree t1, t2, t3] • t3 = t1 + t2
  30. Overcoming Parallelism Bottlenecks

    Bottleneck #3: Copy on Join
    [Diagram: t1 and t2 over the elements a b c d e f g h, indexed 0 1 2 3 and 0 1 2 3; task tree t1, t2, t3] • t3 = t1 + t2
  31. Overcoming Parallelism Bottlenecks

    Bottleneck #3: Copy on Join
    [Diagram: t1 and t2 over the elements a b c d e f g h; indices 0, 3 and 0, 3 set apart from 1 2 and 1 2; task tree t1, t2, t3] • t3 = t1 + t2 • Up to 12% energy savings!
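
"Copy on Join" can be read as the mirror image of the previous bottleneck: each subtask returns its own result array, and the parent concatenates them (t3 = t1 + t2) after joining. A sketch of that shape, next to a variant that writes into one shared output array, follows. The element-squaring workload and the names `CopyOnJoinDemo`, `CopyingSquares`, and `SharedSquares` are assumptions made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch; the squaring workload is only a stand-in for real per-element work.
class CopyOnJoinDemo {
    private static final int THRESHOLD = 2;
    private static final ForkJoinPool POOL = new ForkJoinPool();

    // Copy on Join: partial results are merged into a fresh array at every join.
    static class CopyingSquares extends RecursiveTask<int[]> {
        private final int[] in;
        private final int from;
        private final int to;

        CopyingSquares(int[] in, int from, int to) {
            this.in = in;
            this.from = from;
            this.to = to;
        }

        @Override
        protected int[] compute() {
            if (to - from <= THRESHOLD) {
                int[] out = new int[to - from];
                for (int i = from; i < to; i++) {
                    out[i - from] = in[i] * in[i];
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            CopyingSquares t1 = new CopyingSquares(in, from, mid);
            CopyingSquares t2 = new CopyingSquares(in, mid, to);
            t1.fork();
            int[] r2 = t2.compute();
            int[] r1 = t1.join();
            int[] t3 = new int[r1.length + r2.length]; // t3 = t1 + t2
            System.arraycopy(r1, 0, t3, 0, r1.length);
            System.arraycopy(r2, 0, t3, r1.length, r2.length);
            return t3;
        }
    }

    // Shared output: each subtask writes its own disjoint index range; nothing is merged.
    static class SharedSquares extends RecursiveAction {
        private final int[] in;
        private final int[] out;
        private final int from;
        private final int to;

        SharedSquares(int[] in, int[] out, int from, int to) {
            this.in = in;
            this.out = out;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                for (int i = from; i < to; i++) {
                    out[i] = in[i] * in[i];
                }
                return;
            }
            int mid = (from + to) >>> 1;
            invokeAll(new SharedSquares(in, out, from, mid),
                      new SharedSquares(in, out, mid, to));
        }
    }

    static int[] copyingSquares(int[] in) {
        return POOL.invoke(new CopyingSquares(in, 0, in.length));
    }

    static int[] sharedSquares(int[] in) {
        int[] out = new int[in.length];
        POOL.invoke(new SharedSquares(in, out, 0, in.length));
        return out;
    }
}
```

Because the index ranges never overlap, the shared-output variant needs no locking, and the results are visible to the caller once `invoke` returns.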
  32. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
  33. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    [Diagram: t1 with data a b c d] (step a)
  34. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    [Diagram: t1 with data a b c d] • new tn (step a)
  35. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    [Diagram: t1 with data a b c d] • new tn (step a)
  36. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    [Diagram: t1 with data a b c d] • new tn (step a) • tn.fork() (step b)
  37. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    [Diagram: t1 with data a b c d] • new tn (step a) • tn.fork() (step b) • many times
  38. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb…
  39. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb… • [Diagram: t1 with data a b c d] • new tn (step a)
  40. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb… • [Diagram: t1 with data a b c d] • new tn • list.add(tn) (step a)
  41. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb… • [Diagram: t1 with data a b c d] • new tn • list.add(tn) • global (step a)
  42. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb… • [Diagram: t1 with data a b c d] • new tn • list.add(tn) • global (step a)
    after creating the objects: for task in list: tn.fork() (step b)
  43. Overcoming Parallelism Bottlenecks

    Bottleneck #4: Scattered Data
    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
    t1 = aaaabbbb… • [Diagram: t1 with data a b c d] • new tn • list.add(tn) • global (step a)
    after creating the objects: for task in list: tn.fork() (step b)
    10% energy savings • 3% fewer cache misses • 5% fewer context switches
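
The "abab…" versus "aaaabbbb…" notation on the Scattered Data slides describes the order of two steps inside one parent task: (a) copying data for a subtask and (b) forking it. The sketch below shows both orders behind a single flag, assuming a chunked array-sum workload. `ScatteredDataDemo`, `ChunkSum`, `Splitter`, and the `batched` flag are hypothetical names, and the task list is kept local rather than global for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the two step orders; not code from the studied projects.
class ScatteredDataDemo {
    private static final ForkJoinPool POOL = new ForkJoinPool();

    // Subtask that sums its private copy of a chunk.
    static class ChunkSum extends RecursiveTask<Integer> {
        private final int[] chunk;

        ChunkSum(int[] chunk) {
            this.chunk = chunk;
        }

        @Override
        protected Integer compute() {
            int sum = 0;
            for (int value : chunk) {
                sum += value;
            }
            return sum;
        }
    }

    // Parent task: creates one subtask per chunk, in interleaved or batched order.
    static class Splitter extends RecursiveTask<Integer> {
        private final int[] data;
        private final int chunkSize;
        private final boolean batched;

        Splitter(int[] data, int chunkSize, boolean batched) {
            this.data = data;
            this.chunkSize = chunkSize;
            this.batched = batched;
        }

        @Override
        protected Integer compute() {
            List<ChunkSum> list = new ArrayList<ChunkSum>();
            for (int i = 0; i < data.length; i += chunkSize) {
                int end = Math.min(i + chunkSize, data.length);
                ChunkSum tn = new ChunkSum(Arrays.copyOfRange(data, i, end)); // step a
                list.add(tn);
                if (!batched) {
                    tn.fork(); // step b right after step a: "abab…"
                }
            }
            if (batched) {
                for (ChunkSum tn : list) {
                    tn.fork(); // all copies are done first: "aaaabbbb…"
                }
            }
            int total = 0;
            for (int i = list.size() - 1; i >= 0; i--) {
                total += list.get(i).join(); // join the most recently forked task first
            }
            return total;
        }
    }

    static int run(int[] data, int chunkSize, boolean batched) {
        return POOL.invoke(new Splitter(data, chunkSize, batched));
    }
}
```

Both orders compute the same total. The batched order only changes when the forks happen relative to the copies, which is the restructuring the slides credit with the energy, cache-miss, and context-switch improvements.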
  44. Overcoming Parallelism Bottlenecks

    Patching Bottleneck #2: Copy on Fork • 7/9 of the projects that replied accepted the PR