Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications

Gustavo Pinto
January 14, 2018

Transcript

  1. Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications

    F. Castor • A. Canino • @gustavopinto • Y. D. Liu • G. Xu
  2. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone. java.util.concurrent.Executors: well used • high-level API • user friendly.
  3. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone. java.util.concurrent.Executors: well used • high-level API • user friendly. ForkJoin: could be used more • sophisticated API • sophisticated scheduler.
  4. Modern Java applications run on parallel architectures

    java.lang.Thread: widely used • low-level API • error prone. java.util.concurrent.Executors: well used • high-level API • user friendly. ForkJoin: could be used more • sophisticated API • sophisticated scheduler.
  5. Why ForkJoin? Divide-and-conquer algorithm

  6. Why ForkJoin? Divide-and-conquer algorithm

    fork() fork()
  7. Why ForkJoin? Divide-and-conquer algorithm

    fork() fork() fork() fork() fork() fork()
  8. Why ForkJoin? Divide-and-conquer algorithm

    fork() fork() fork() fork() join() fork() fork() join()
  9. Why ForkJoin? Divide-and-conquer algorithm

    fork() fork() fork() fork() join() fork() fork() join() join() join()
  10. Why ForkJoin? Divide-and-conquer algorithm

    fork() fork() fork() fork() join() fork() fork() join() join() join() (legend: n = ForkJoin task)
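
The fork()/join() tree on these slides can be sketched with a small RecursiveTask. This is only an illustration, not code from the talk: the class name RangeSum, the THRESHOLD cutoff, and the choice of summing a long[] are all assumptions made for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums data[lo, hi) by recursive splitting.
final class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed sequential cutoff
    private final long[] data;
    private final int lo;
    private final int hi;

    RangeSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: solve sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;           // divide
        RangeSum left = new RangeSum(data, lo, mid);
        RangeSum right = new RangeSum(data, mid, hi);
        left.fork();                         // schedule the left half asynchronously
        long rightSum = right.compute();     // keep the right half on this thread
        return left.join() + rightSum;       // wait for the forked half and combine
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));
    }
}
```

Forking one half and computing the other in place is a common idiom because it keeps the current worker busy instead of blocking it in join() right away.
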
  11. Why ForkJoin? Work stealing

    [Diagram: numbered ForkJoin tasks (1 to 13) queued at ForkJoin workers]
  12. Why ForkJoin? Work stealing

    [Diagram: numbered ForkJoin tasks (1 to 13) queued at ForkJoin workers]
  13. Why ForkJoin? Work stealing

    [Diagram: as on slide 12, with task 5 no longer shown]
  14. Why ForkJoin? Work stealing

    [Diagram: as on slide 13, with task 2 also shown at a second position]
  15. Why ForkJoin? Work stealing

    [Diagram: as on slide 14, with task 2 now shown only at its new position]
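
One rough way to observe work stealing on a real pool is the steal-count estimate that ForkJoinPool exposes. The sketch below is an illustration under assumptions: the class StealCountDemo, the Leaves task, and the pool size are hypothetical, and getStealCount() is documented as an estimate, so the printed number varies from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

final class StealCountDemo {
    // Hypothetical task: splits [lo, hi) down to single leaves and counts them.
    static final class Leaves extends RecursiveAction {
        private final int lo;
        private final int hi;
        private final AtomicInteger executed;

        Leaves(int lo, int hi, AtomicInteger executed) {
            this.lo = lo;
            this.hi = hi;
            this.executed = executed;
        }

        @Override
        protected void compute() {
            if (hi - lo <= 1) {
                if (hi > lo) executed.incrementAndGet(); // one leaf of work
                return;
            }
            int mid = (lo + hi) >>> 1;
            // Forks one subtask and runs the other; idle workers may steal the forked one.
            invokeAll(new Leaves(lo, mid, executed), new Leaves(mid, hi, executed));
        }
    }

    // Returns {leaves executed, steal-count estimate reported by the pool}.
    static long[] run(int parallelism, int leaves) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        AtomicInteger executed = new AtomicInteger();
        try {
            pool.invoke(new Leaves(0, leaves, executed));
            return new long[] { executed.get(), pool.getStealCount() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] result = run(4, 1 << 16);
        System.out.println("leaves=" + result[0] + " steals~" + result[1]);
    }
}
```
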
  16. Why ForkJoin? Bedrock for higher-level Java concurrency libraries

  17. System Programmers • Application Programmers

  18. ForkJoin Applications. What are the ForkJoin applications? (Application Programmers)

    1. Search for java.util.concurrent.ForkJoinPool 2. Investigate if ForkJoin is indeed used 3. Filter out class assignments and pet projects 4. Try to build and run the code
  19. ForkJoin Applications. What are the ForkJoin applications? (Application Programmers)

    1. Search for java.util.concurrent.ForkJoinPool 2. Investigate if ForkJoin is indeed used 3. Filter out class assignments and pet projects 4. Try to build and run the code. 30 projects selected (~380 KLoC), e.g., ecco, ejisto, mandelbrot, knn, jacer, conflate, cq4j, mywiki, exhibitor, warp, lowlatency, …
  20. ForkJoin Applications. What about those higher-level libs? (System Programmers)

    1. 330K lines of code 2. 21k commits 3. 470 source code contributors 4. Written (mostly) in Scala and Java 5. Well-known and well-used
  21. This paper

    • A depth-oriented study and restructuring of the Akka message-passing algorithm • A breadth-oriented study of 30 real-world ForkJoin open-source projects • A refactoring tool (the first aimed at improving energy consumption of parallel systems)
  22. Understanding Parallelism Bottlenecks

  23. Understanding Parallelism Bottlenecks. v0 { …

  24. Understanding Parallelism Bottlenecks. v0 { …

    For each version, we measured execution time and energy consumption. Intel CPU: 2×8 cores (32 logical cores with hyper-threading), 2.60 GHz, 64 GB of memory, running Debian, JDK version 1.7.0_71, build 14. JRapl: software-based energy measurement.
  25. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    Actors process their own messages • There is no side effect [Diagram labels: actor, mailbox]
  26. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    Actors process their own messages • Actors exchange, but do not share, the same message • There is no side effect [Diagram labels: actor, mailbox]
  27. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: actor, actor]
  28. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: actor]
  29. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), actor, actor]
  30. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), actor, actor, mailbox]
  31. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), actor, actor, mailbox, tn =]
  32. Overcoming Parallelism Bottlenecks. Work stealing

    [Diagram: numbered ForkJoin tasks queued at ForkJoin workers]
  33. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), actor, actor, mailbox, tn =]
  34. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), (DEcentralized), actor ×4, mailbox, tn =]
  35. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), (DEcentralized), actor ×4, mailbox, tn =, mailbox, .fork()]
  36. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), (DEcentralized), actor ×4, mailbox, tn =, mailbox, .fork(), tn]
  37. Overcoming Parallelism Bottlenecks. Work stealing

    [Diagram: numbered ForkJoin tasks queued at ForkJoin workers]
  38. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

    [Diagram labels: (Centralized), (DEcentralized), actor ×4, mailbox, tn =, mailbox, .fork(), tn]
  39. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

  40. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling

  41. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling. 3.3x 6.4x

  42. Overcoming Parallelism Bottlenecks. Bottleneck #1: Centralized pooling. 3.3x 6.4x
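
The contrast between centralized and decentralized scheduling of a mailbox task might look roughly like the sketch below. It is a simplified assumption, not Akka's actual dispatcher code: MailboxScheduling, scheduleCentralized, scheduleDecentralized, and runDemo are hypothetical names, a mailbox is reduced to a Runnable, and whether execute() from a worker thread really goes through a shared submission path depends on the ForkJoinPool implementation in use. It assumes Java 8 or later for the lambdas.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class MailboxScheduling {
    // Centralized: hand the mailbox to the pool through its submission path.
    static void scheduleCentralized(ForkJoinPool pool, Runnable mailbox) {
        pool.execute(mailbox);
    }

    // Decentralized: a sender already running on a worker of this pool forks
    // the mailbox onto its own deque; any other caller falls back to execute().
    static void scheduleDecentralized(ForkJoinPool pool, Runnable mailbox) {
        if (ForkJoinTask.getPool() == pool) {
            ForkJoinTask.adapt(mailbox).fork();
        } else {
            pool.execute(mailbox);
        }
    }

    // Sends `mailboxes` hypothetical mailbox runs from inside the pool and
    // returns how many of them were processed.
    static int runDemo(final int mailboxes) {
        final ForkJoinPool pool = new ForkJoinPool(4);
        final AtomicInteger processed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(mailboxes);
        final Runnable mailbox = () -> {
            processed.incrementAndGet();
            done.countDown();
        };
        try {
            // The sender runs on a pool worker, like an actor messaging other actors.
            pool.execute(() -> {
                for (int i = 0; i < mailboxes; i++) {
                    scheduleDecentralized(pool, mailbox);
                }
            });
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for mailboxes");
            }
            return processed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1_000));
    }
}
```

Forked tasks sit in the forking worker's own deque, so other workers can pick them up through work stealing rather than all contending on one shared entry point.
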

  43. Overcoming Parallelism Bottlenecks. Bottleneck #2: Copy on Fork

    [Diagram labels: t1 =, t2 = first half, t3 = second half; elements a b c d e f g h]
  44. Overcoming Parallelism Bottlenecks. Bottleneck #2: Copy on Fork

    [Diagram labels: t1 =, t2 = first half, t3 = second half; elements a b c d e f g h; arrays labeled t1, t2, t3]
  45. Overcoming Parallelism Bottlenecks. Bottleneck #2: Copy on Fork

    [Diagram labels: t1 =, t2 = first half, t3 = second half; elements a b c d e f g h; arrays labeled t1, t2, t3; make global]
  46. Overcoming Parallelism Bottlenecks. Bottleneck #2: Copy on Fork

    [Diagram labels: t1 =, t2 = first half, t3 = second half; elements a b c d e f g h; arrays labeled t1, t2, t3; indices 0 1 2 3 4 5 6 7]
  47. Overcoming Parallelism Bottlenecks. Bottleneck #2: Copy on Fork

    [Diagram labels: t1 =, t2 = first half, t3 = second half; elements a b c d e f g h; arrays labeled t1, t2, t3; indices 1 2 5 6 and 0 3 4 7 shown separately] Up to 20% energy savings!
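
A minimal sketch of the copy-on-fork pattern and an index-based alternative follows. It is an illustration under assumptions, not the refactoring tool's output: CopyOnFork, CopyingSum, SharedSum, and the THRESHOLD cutoff are hypothetical, the shared array is passed by reference rather than held in a global field, and ForkJoinPool.commonPool() assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class CopyOnFork {
    static final int THRESHOLD = 2; // assumed tiny cutoff so small inputs still split

    static long sumRange(int[] a, int lo, int hi) {
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }

    // Before: t2 and t3 each receive a fresh copy of their half of t1's data.
    static final class CopyingSum extends RecursiveTask<Long> {
        private final int[] data;

        CopyingSum(int[] data) { this.data = data; }

        @Override
        protected Long compute() {
            if (data.length <= THRESHOLD) return sumRange(data, 0, data.length);
            int mid = data.length / 2;
            CopyingSum t2 = new CopyingSum(Arrays.copyOfRange(data, 0, mid));           // first half
            CopyingSum t3 = new CopyingSum(Arrays.copyOfRange(data, mid, data.length)); // second half
            t2.fork();
            return t3.compute() + t2.join();
        }
    }

    // After: all tasks share one array and carry only the bounds [lo, hi).
    static final class SharedSum extends RecursiveTask<Long> {
        private final int[] data;
        private final int lo;
        private final int hi;

        SharedSum(int[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) return sumRange(data, lo, hi);
            int mid = (lo + hi) >>> 1;
            SharedSum t2 = new SharedSum(data, lo, mid);  // first half, no copy
            SharedSum t3 = new SharedSum(data, mid, hi);  // second half, no copy
            t2.fork();
            return t3.compute() + t2.join();
        }
    }

    static long copying(int[] data) {
        return ForkJoinPool.commonPool().invoke(new CopyingSum(data));
    }

    static long shared(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SharedSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(copying(data) + " " + shared(data));
    }
}
```

Both versions compute the same result; the second one only avoids allocating and filling a new array at every split.
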
  48. Overcoming Parallelism Bottlenecks. Bottleneck #3: Copy on Join

    [Diagram labels: t1 =, t2 =, t3 = t1 + t2; elements a b c d e f g h]
  49. Overcoming Parallelism Bottlenecks. Bottleneck #3: Copy on Join

    [Diagram labels: t1 =, t2 =, t3 = t1 + t2; elements a b c d e f g h; arrays labeled t1, t2, t3]
  50. Overcoming Parallelism Bottlenecks. Bottleneck #3: Copy on Join

    [Diagram labels: t1 =, t2 =, t3 = t1 + t2; elements a b c d e f g h; arrays labeled t1, t2, t3; indices 0 1 2 3 and 0 1 2 3]
  51. Overcoming Parallelism Bottlenecks. Bottleneck #3: Copy on Join

    [Diagram labels: t1 =, t2 =, t3 = t1 + t2; elements a b c d e f g h; arrays labeled t1, t2, t3; indices 1 2, 1 2 and 0 3, 0 3 shown separately] Up to 12% energy savings!
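
Copy on join can be illustrated in the same spirit. In the sketch below, which is an assumption-laden example rather than code from the studied projects, CopyingMap returns a new array from every subtask and concatenates on join (t3 = t1 + t2), while InPlaceMap lets subtasks write into disjoint ranges of one preallocated output. The class names, the squaring operation, and THRESHOLD are hypothetical; ForkJoinPool.commonPool() assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

final class CopyOnJoin {
    static final int THRESHOLD = 2; // assumed tiny cutoff so small inputs still split

    // Before: each subtask returns a fresh array and the parent concatenates them.
    static final class CopyingMap extends RecursiveTask<int[]> {
        private final int[] in;
        private final int lo;
        private final int hi;

        CopyingMap(int[] in, int lo, int hi) { this.in = in; this.lo = lo; this.hi = hi; }

        @Override
        protected int[] compute() {
            if (hi - lo <= THRESHOLD) {
                int[] out = new int[hi - lo];
                for (int i = lo; i < hi; i++) out[i - lo] = in[i] * in[i];
                return out;
            }
            int mid = (lo + hi) >>> 1;
            CopyingMap t1 = new CopyingMap(in, lo, mid);
            CopyingMap t2 = new CopyingMap(in, mid, hi);
            t1.fork();
            int[] right = t2.compute();
            int[] left = t1.join();
            int[] t3 = Arrays.copyOf(left, left.length + right.length); // t3 = t1 + t2
            System.arraycopy(right, 0, t3, left.length, right.length);
            return t3;
        }
    }

    // After: subtasks write into disjoint index ranges of one shared output array.
    static final class InPlaceMap extends RecursiveAction {
        private final int[] in;
        private final int[] out;
        private final int lo;
        private final int hi;

        InPlaceMap(int[] in, int[] out, int lo, int hi) {
            this.in = in; this.out = out; this.lo = lo; this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                for (int i = lo; i < hi; i++) out[i] = in[i] * in[i];
                return;
            }
            int mid = (lo + hi) >>> 1;
            invokeAll(new InPlaceMap(in, out, lo, mid), new InPlaceMap(in, out, mid, hi));
        }
    }

    static int[] copying(int[] in) {
        return ForkJoinPool.commonPool().invoke(new CopyingMap(in, 0, in.length));
    }

    static int[] inPlace(int[] in) {
        int[] out = new int[in.length];
        ForkJoinPool.commonPool().invoke(new InPlaceMap(in, out, 0, in.length));
        return out;
    }

    public static void main(String[] args) {
        int[] in = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Arrays.toString(copying(in)));
        System.out.println(Arrays.toString(inPlace(in)));
    }
}
```
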
  52. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask
  53. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d; step a]
  54. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn; step a]
  55. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn; step a]
  56. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, tn.fork(); steps a, b]
  57. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, tn.fork(), many times; steps a, b]
  58. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask
  59. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn; step a]
  60. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, list.add(tn); step a]
  61. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, list.add(tn), global; step a]
  62. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, list.add(tn), global; after creating the objects: for task in list: tn.fork(); steps a, b]
  63. Overcoming Parallelism Bottlenecks. Bottleneck #4: Scattered Data

    t1 = ababababab… • t1 = aaaabbbb… • a = memory copies for a subtask • b = forks the subtask [Diagram labels: t1, a b c d, new tn, list.add(tn), global; after creating the objects: for task in list: tn.fork(); steps a, b] 10% energy savings • 3% fewer cache misses • 5% fewer context switches
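
The difference between the interleaved order (abab…) and the batched order (aaaa…bbbb…) can be sketched as below. This is an illustrative assumption, not the projects' code: ScatteredData, ChunkSum, Parent, and the batched flag are hypothetical names, the task list is a local variable rather than a global one, and ForkJoinPool.commonPool() assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ScatteredData {
    // Hypothetical subtask: sums its own private chunk.
    static final class ChunkSum extends RecursiveTask<Long> {
        private final int[] chunk;

        ChunkSum(int[] chunk) { this.chunk = chunk; }

        @Override
        protected Long compute() {
            long s = 0;
            for (int v : chunk) s += v;
            return s;
        }
    }

    static final class Parent extends RecursiveTask<Long> {
        private final int[] data;
        private final int chunkSize;
        private final boolean batched;

        Parent(int[] data, int chunkSize, boolean batched) {
            this.data = data;
            this.chunkSize = chunkSize;
            this.batched = batched;
        }

        @Override
        protected Long compute() {
            List<ChunkSum> list = new ArrayList<>();
            for (int lo = 0; lo < data.length; lo += chunkSize) {
                int hi = Math.min(data.length, lo + chunkSize);
                ChunkSum tn = new ChunkSum(Arrays.copyOfRange(data, lo, hi)); // a: memory copy
                list.add(tn);
                if (!batched) tn.fork();             // b right after a: abab...
            }
            if (batched) {
                for (ChunkSum tn : list) tn.fork();  // all b after all a: aaaa...bbbb...
            }
            long total = 0;
            for (ChunkSum tn : list) total += tn.join();
            return total;
        }
    }

    static long sum(int[] data, int chunkSize, boolean batched) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        return ForkJoinPool.commonPool().invoke(new Parent(data, chunkSize, batched));
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 64, false) + " " + sum(data, 64, true));
    }
}
```

Both orders produce the same result; the batched order only groups all the copying before any forking.
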
  64. Overcoming Parallelism Bottlenecks. Automating Bottleneck #2: Copy on Fork

  65. Overcoming Parallelism Bottlenecks. Patching Bottleneck #2: Copy on Fork

  66. Overcoming Parallelism Bottlenecks. Patching Bottleneck #2: Copy on Fork

    7 of the 9 projects that replied have accepted the PR
  67. None
  68. Questions? @gustavopinto gpinto@ufpa.br

  69. Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications

    F. Castor • A. Canino • @gustavopinto • Y. D. Liu • G. Xu