Actors! And now? An Implementer's Perspective on High-level Concurrency Models, Debugging Tools, and the Future of Automatic Bug Mitigation

Stefan Marr
October 17, 2021

The actor model is a great tool for various use cases. Though, it's not the
only tool, and sometimes perhaps not even the best. Consequently, developers
started mixing and matching high-level concurrency models based on the problem
at hand, much like other programming abstractions. Though, this comes with
various problems. For instance, we don't usually have debugging tools that help
us to make sense of the resulting system. If we even have a debugger, it may
barely allow us to step through our programs instruction by instruction.

Let's imagine a better world! One where we can follow asynchronous messages,
jump to the next transaction commit, or break on the next fork/join task
created. Though, race conditions remain notoriously difficult to reproduce. One
solution is to record our program's execution, ideally capturing the bug. Then
we can replay it as often as needed to identify the cause of our bug.

The hard bit here is making record & replay practical.
I will explain how our concurrency-model-agnostic approach allows us
to record model interactions trivially for later replay,
and how we minimized its run-time overhead.
In the case of actor applications, we can even make the snapshotting fast
to be able to limit trace sizes.

Having better debugging capabilities is a real productivity boost.
Though, some bugs will always slip through the cracks.
So, what if we could prevent those bugs from causing issues?
Other researchers have shown how to do it, and I'll conclude this talk
with some ideas on how we can utilize the knowledge we have in our
language implementations to make such mitigation approaches fast.

The talk is based on work done in collaboration with
Dominik Aumayr, Carmen Torres Lopez, Elisa Gonzalez Boix, and Hanspeter Mössenböck.

Transcript

  1. Actors! And now? An Implementer's Perspective on High-level Concurrency Models,

    Debugging Tools, and the Future of Automatic Bug Mitigation Stefan Marr 17 October 2021
  2. Got a Question? Feel free to interrupt me! 2

  3. Job Ad We’re Looking for a Postdoc! 3 Project CaMELot:

    Catch and Mitigate Event-Loop Concurrency Issues https://stefan-marr.de/2021/02/open-postdoc-position-on-language-implementation-and-concurrency/ Please get in touch!
  4. Outcomes of Project MetaConc and work by 4 C. Torres

    Lopez D. Aumayr E. Gonzalez Boix H. Mössenböck
  5. Actors! What are Actors? • Many different variants • For

    the 50 Years’ Edition: – Which model is good for what? • Suitable problems/applications • Unsuitable problems per model – … 5
  6. Communicating Event Loops 6 Actor Actor

  7. Concurrency Bugs are Common in Event Loop Systems (Tip of the Iceberg)

    • 8-27 apps, 3 studies: ≈2-20 concurrency issues per app • Websites in top 500, 6 studies: ≈1-10 concurrency issues per site • C 6 projects, 1 study: 35 known event races • 2 studies: 53 projects, 57 issues; 12 projects, 1000 potential issues • 12 projects, 1 study: 53 concurrency issues
  8. How to get rid of all these bugs? 8

  9. DEBUGGING ACTORS WITH SUITABLE BREAKPOINTS/STEPPING Perhaps not a way to

    get rid of them all, but at least to make it easier 9
  10. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 10 Actor A
  11. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 11 1 Actor A msg send msg receive promise resolver promise resolution
  12. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 12 class Result = ()( public get = ( | result | result := 42. ^ result ) ) 2 Actor A Actor B msg send msg receive promise resolver promise resolution
  13. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 13 class Result = ()( public get = ( | result | result := 42. ^ result ) ) 3 Actor A Actor B msg send msg receive promise resolver promise resolution
  14. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 14 class Result = ()( public get = ( | result | result := 42. ^ result ) ) 4 Actor A Actor B msg send msg receive promise resolver promise resolution
  15. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 15 class Result = ()( public get = ( | result | result := 42. ^ result ) ) 1 2 Actor A Actor B before async after async
  16. prom := aResult <-: get. prom whenResolved: [:r | r

    println ]. Actor Breakpoints/Stepping 16 class Result = ()( public get = ( | result | result := 42. ^ result ) ) 1 Actor A Actor B promise resolver promise resolution
  17. Apgar: A Debugger Made for Actor Programs 17 Carmen’s presentation

    is in about 5.5h here at AGERE
  18. Kómpos Architecture 18 Interpreter Debugger UI Apgar or Kómpos UI

    Kómpos Protocol The “Magic” Bit https://stefan-marr.de/papers/dls-marr-et-al-concurrency-agnostic-protocol-for-debugging/
  19. The Kómpos Debugger 19 Demo: https://stefan-marr.de/2017/10/multi-paradigm-concurrent-debugging/

  20. Even with better debuggers, we’ll still have concurrency bugs in

    our actor systems… 20
  21. Maybe, just maybe! Maybe Actors aren’t the best choice for

    every problem? 21
  22. … Maybe there are no Silver Bullets? CSP Locks, Monitors,

    … Fork/Join Transactional Memory 22 Data Flow Actors
  23. Building an Online Sales-Data Processor 23 {"item": "beer", "price": 5.5,

    "quantity": 344, "customer": "<Prog>", "address": "Pleinlaan 2"} Stream of Sales Events • Track revenue • Report sales revenue over time
  24. Subsystems as Asynchronous Activities 24 Use Actors as Main Abstraction

    Event-Loop Model fits UI and System Paradigms JSON Input Actor DataStore Actor Report Actor {"item": "beer", "price": 5.5,
  25. Parallelize JSON Processing 25 JSON Input Actor JSON fragment channel

    JSON token channel JSON Stream Tokenizer Result channel Data Filter Process Using Communicating Sequential Processes with Channels {"item": "beer", "price": 5.5, • Strict consumer/ producer relationship • Allow for pipeline parallelism
  26. Sales Revenue Over Time based on Large Data Array 26

    Report Actor 1 2 1 1 2 1 2 1 5 3 4 11 7 8 10 1 Construct Sum Tree in parallel Calculate Prefix Sum in parallel Parallel Prefix Sum Calculation with fork/join parallelism
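The two phases named on slide 26 (construct a sum tree, then calculate the prefix sum from it) can be sketched roughly as follows. This is a minimal illustration in Python, not the SOMns code from the talk: `fork_join`, `Node`, `build_sum_tree`, `fill_prefix`, and the `depth` cutoff are hypothetical names, and plain threads stand in for a real fork/join pool.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                       # half-open index range [lo, hi)
    hi: int
    total: int = 0                # sum of data[lo:hi]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def fork_join(task_a, task_b, depth):
    """Run both tasks; fork a thread for task_a while depth > 0, then join."""
    if depth > 0:
        forked = threading.Thread(target=task_a)
        forked.start()
        task_b()
        forked.join()
    else:
        task_a()
        task_b()


def build_sum_tree(data: List[int], lo: int, hi: int, depth: int = 2) -> Node:
    """Phase 1: every node stores the sum of its index range."""
    node = Node(lo, hi)
    if hi - lo == 1:
        node.total = data[lo]
        return node
    mid = (lo + hi) // 2

    def build_left():
        node.left = build_sum_tree(data, lo, mid, depth - 1)

    def build_right():
        node.right = build_sum_tree(data, mid, hi, depth - 1)

    fork_join(build_left, build_right, depth)
    node.total = node.left.total + node.right.total
    return node


def fill_prefix(node: Node, data: List[int], out: List[int],
                offset: int, depth: int = 2) -> None:
    """Phase 2: push the sum of everything left of a subtree down to its leaves."""
    if node.left is None:
        out[node.lo] = offset + data[node.lo]
        return
    fork_join(
        lambda: fill_prefix(node.left, data, out, offset, depth - 1),
        lambda: fill_prefix(node.right, data, out,
                            offset + node.left.total, depth - 1),
        depth,
    )


def prefix_sum(data: List[int]) -> List[int]:
    if not data:
        return []
    out = [0] * len(data)
    root = build_sum_tree(data, 0, len(data))
    fill_prefix(root, data, out, 0)
    return out
```

The two subtrees of a node touch disjoint index ranges, which is why the forked tasks need no locks; only the join at each level orders the work.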
  27. How to build debuggers to support all the Concurrency Models?

    27
  28. Κόμπος: A PLATFORM FOR DEBUGGING COMPLEX CONCURRENT APPLICATIONS 28

  29. The Kómpos Debugger 29 https://stefan-marr.de/papers/dls-marr-et-al-concurrency-agnostic-protocol-for-debugging/

  30. Kómpos Architecture 30 SOMNS Interpreter Debugger UI Kómpos Protocol JSON

    Web Socket
  31. Kómpos Architecture 31 SOMNS Interpreter Debugger UI Kómpos Protocol JSON

    Web Socket Actors CSP STM F/J Threads … Agnostic of Concurrency Models And we have two UIs! Apgar & Kómpos UI
  32. Kómpos Protocol Metadata 32 EntityType id: typeId name: string ActivityType

    icon: string DynamicScopeType BreakpointType name: string label: string applicableTo: Tag[] SteppingType name: string label: string applicableTo: Tag[] activities: ActivityType[] scopes: DynamicScopeType[] Concurrency semantics only known to language
  33. Kómpos Protocol Messages 33 SetBreakpoint location: Coord type: BreakpointType Stopped

    activityId: id location: Coord actType: ActivityType scopes: DynamicScopeType[] DoStep activityId: id type: SteppingType Debugger UI just “lists” available types
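To make the idea on slides 32 and 33 concrete, here is a small Python sketch of how a UI could stay ignorant of concurrency concepts: the language side announces breakpoint types with tags, and the UI only filters and echoes them. The field names `name`, `label`, `applicableTo`, `location`, `type`, and `activityId` come from the slides; everything else is an assumption for illustration, including the fields of `Coord`, the example type and tag names, sending types by name, and the JSON layout with an `action` key. It is not the actual Kómpos wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Coord:
    # Hypothetical fields; the slides only name the type "Coord".
    uri: str
    line: int
    column: int


@dataclass(frozen=True)
class BreakpointType:
    name: str
    label: str
    applicableTo: List[str]   # tags attached to source sections


@dataclass(frozen=True)
class SetBreakpoint:
    location: Coord
    type: str                 # assumption: refer to a BreakpointType by name


@dataclass(frozen=True)
class DoStep:
    activityId: int
    type: str                 # assumption: refer to a SteppingType by name


def applicable(types: List[BreakpointType], tags: List[str]) -> List[BreakpointType]:
    """What the UI does: list the types whose tags match a source location."""
    wanted = set(tags)
    return [t for t in types if wanted & set(t.applicableTo)]


def encode(message) -> str:
    """Serialize a message as JSON, tagged with its class name (assumed layout)."""
    return json.dumps({"action": message.__class__.__name__, **asdict(message)},
                      sort_keys=True)
```

A UI built this way can offer a channel breakpoint or a promise-resolution breakpoint without containing any code that knows what a channel or a promise is.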
  34. A Model-Agnostic Debugger: Example Channel Breakpoints 34 channel out write:

    42. channel in read Process A Process B 1 2 3 4 “just” source locations and ids! UI doesn’t need to know these concepts!
  35. Debuggers can be Great for High-level Concurrency Models! 35 ?

    ? ? Debugger UI Kómpos Protocol Make tools agnostic prom whenResolved: [:r | r println ]. promise resolver promise resolution Offer the Key Features as Breakpoints/Steps
  36. NON-DETERMINISM MAKES FOR UNHAPPY DEBUGGERS Reproduces only 1 in 10?

    How can I fix such a bug??? 36 F
  37. One Solution: Record & Replay • Record event order •

    Replay reorder to fit 37 A B C C B B C A F Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency Model Agnostic Record & Replay D. Aumayr et al. The Art, Science, and Engineering of Programming, Programming, 2021. Efficient and Deterministic Record & Replay for Actor Languages D. Aumayr et al. Proceedings of the 15th International Conference on Managed Languages and Runtimes, ManLang’18.
  38. How is that going to work agnostic to concurrency models?

    38
  39. Looking at Communicating Event Loops 39 Actor Actor What are

    the Points of Non-determinism? Mailbox Mailbox The Mailboxes! (mailbox read order)
  40. Communicating Event Loops 40 B C A C B Mailbox

    Replay messages in same order as originally
  41. Recording Non-determinism in Communicating Event Loops 41 Actor Actor Mailbox

    What to record? Store to mailbox? Read from mailbox? Sender Receiver
  42. For Communicating Event Loops Sender-side and Receiver-Side Recording are “Functionally

    Equivalent” with complexity and performance trade-offs 42 most interesting bit
  43. Overview for Concurrency Models 43 Model Activities Passive Entities Non-

    determinism Communicating Event Loops Actor Promise, Message Message order per actor Threads & Locks Thread Lock, Condition Order of lock acquisitions Communicating Sequential Processes Process Channel Order of channel reads/writes Software Transactional Memory Transaction - Commit order
  44. Instrumented Operation if (RECORD) { … record( type, ordering) }

    else if (REPLAY) { Event e = poll() … } Model Agnostic Framework 44 Framework peek poll record Trace file Thread-local buffers Trace parser Event queues per activity per thread Agnostic of Concurrency Models
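The split on slide 44 between a generic framework (`record`, `peek`, `poll`) and a model-specific instrumented operation might look like the following Python sketch. It is a single-threaded simplification under stated assumptions: the `Tracer` and `Mailbox` classes are hypothetical, the trace lives in memory instead of thread-local buffers and a trace file, and a message is identified by its sender only. The actor-specific part is the receiver-side choice of recording the mailbox read order.

```python
from collections import defaultdict, deque


class Tracer:
    """Generic part: knows activities and orderings, not actors or mailboxes."""

    def __init__(self, mode, trace=()):
        assert mode in ("record", "replay")
        self.mode = mode
        self.trace = []                    # events written while recording
        self.queues = defaultdict(deque)   # replay: one event queue per activity
        for activity, kind, ordering in trace:
            self.queues[activity].append((kind, ordering))

    def record(self, activity, kind, ordering):
        self.trace.append((activity, kind, ordering))

    def peek(self, activity):
        queue = self.queues[activity]
        return queue[0] if queue else None

    def poll(self, activity):
        return self.queues[activity].popleft()


class Mailbox:
    """Model-specific part: the instrumented operation is the mailbox read."""

    def __init__(self, actor_id, tracer):
        self.actor_id = actor_id
        self.tracer = tracer
        self.pending = []                  # (sender, payload) in arrival order

    def deliver(self, sender, payload):
        self.pending.append((sender, payload))

    def receive(self):
        if not self.pending:
            return None
        if self.tracer.mode == "record":
            sender, payload = self.pending.pop(0)
            self.tracer.record(self.actor_id, "msg", sender)
            return sender, payload
        # Replay: only hand out the message the trace says comes next.
        expected = self.tracer.peek(self.actor_id)
        if expected is None:
            return None
        _kind, expected_sender = expected
        for index, (sender, _payload) in enumerate(self.pending):
            if sender == expected_sender:
                self.tracer.poll(self.actor_id)
                return self.pending.pop(index)
        return None                        # expected message has not arrived yet
```

For example, if messages from B, C, and A were processed in that order while recording, a replay run in which they arrive as A, C, B still processes them as B, C, A, because `receive` holds back A and C until B is available.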
  45. Allows us to Record&Replay a Multi-Paradigm Application 45 JSON Input

    Actor DataStore Actor Report Actor {"item": "beer", "price": 5.5, Actors CSP in here Fork/Join in here
  46. SOMNS : A NEWSPEAK FOR CONCURRENCY RESEARCH 46 Newspeak: newspeaklanguage.org

    SOMNS : github.com/smarr/SOMns NS
  47. Performance: Baselines [Plot: runtime factor normalized to Java (lower is better) for Java, Node.js, and SOMns]

    Are We Fast Yet: Cross-Language Comparison https://github.com/smarr/are-we-fast-yet#readme SOMNS is on level of optimized dynamic languages!
  48. Performance: Baselines Savina Actor Benchmark Suite https://github.com/shamsimam/savina#readme

    [Plots: runtime factor normalized to SOMns (lower is better) over 1, 2, 4, 6, and 8 cores for Akka, Jetlang, Scalaz, and SOMns] Competitive with JVM actor frameworks!
  49. Overhead of Recording Actors for Replay Overhead on Savina benchmarks

    over execution without recording (geometric mean) • Specialized: 7.89%, min. -21.42%, max. 36.29% (specialized to actors, without support for other concurrency models) • Sender-side: 7.82%, min. -17.84%, max. 41.23% – Performance is competitive with specialized implementation • Receiver-side: 13.23%, min. -19.33%, max. 53.1% – Not as optimized as specialized 49
  50. Agnostic Record&Replay is Practical! 50 ? ? ?Keep Framework Agnostic

    Mailbox Store to mailbox? Read from mailbox? Capture Non-determinism Per Concurrency Model Framework peek poll record Trace file Thread-local buffers Trace parser Event queues per activity per thread
  51. LONG AND HUGE TRACES MAKE REPLAY IMPRACTICAL Snapshotting Actor Systems

    without Stopping Them 51
  52. Actor Asynchronous and Partial Heap Snapshots 52 snapshot on message

    receive but only objects reachable from a message
  53. Snapshotting without Global Synchronization 53 Message Message Time Message Message

    Message Message Message Start Snapshotting
  54. • Attach send phase number to messages • Messages sent

    in Phase n (previous) are captured Detecting Message Crossovers 54 Actor A Actor B Actor C Message Message [n] Message [n] Time Message Message [n] Message [n] Message Message [n] Message [n+1] Start Snapshotting Phase n Phase n+1 Snapshot before processing
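The phase-number idea on slide 54 can be illustrated with a short Python sketch. It reflects one reading of the slide, with assumptions spelled out: `ActorSystem`, `Actor`, `start_snapshot`, and `captured` are hypothetical names, the phase is a single shared counter, and "capturing" a message just appends it to a list instead of serializing the reachable heap.

```python
from dataclasses import dataclass


@dataclass
class Message:
    payload: object
    phase: int                 # phase in which the message was sent


class ActorSystem:
    def __init__(self):
        self.phase = 0
        self.captured = []     # (receiver, payload) pairs that belong to the snapshot

    def start_snapshot(self):
        # No actor is stopped; only the phase number advances.
        self.phase += 1


class Actor:
    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.mailbox = []
        self.handled = []

    def send(self, target, payload):
        # Attach the send phase number to the message.
        target.mailbox.append(Message(payload, self.system.phase))

    def process_next(self):
        message = self.mailbox.pop(0)
        if message.phase < self.system.phase:
            # Crossover: sent in the previous phase, processed in the new one.
            # Capture it before processing so the snapshot does not lose it.
            self.system.captured.append((self.name, message.payload))
        self.handled.append(message.payload)
```

Messages sent after the snapshot started carry the new phase number and are processed without being captured, which is what lets the system keep running while the snapshot is taken.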
  55. Detecting Snapshot Completion (2) 55 Msg [n-1] Msg [n-1] Msg

    [n-1] Thread 1 Thread n Actors waiting for execution (FIFO) Actors with messages from previous phase Completion Task Actors in current phase Thread Pool message sends may schedule actors for execution Msg [n] Msg [n-1] Msg [n-1]
  56. Detecting Snapshot Completion (3) 56 Actors waiting for execution (FIFO)

    Actors with messages from current phase Completion Task Thread Pool message sends may schedule actors for execution Msg [n] Msg [n] Msg [n] Thread 1 Thread n Msg [n-1]
  57. • Snapshot every second iteration • Worst-case scenario Evaluation -

    Savina 57
  58. • Snapshot every 1000 requests • Latency increases minimally (1.66%

    geo mean) • 20 Million requests total • Slow requests (> 100ms): 5.43% increase (0.007% of total requests) Evaluation – AcmeAir Web Application 58
  59. Snapshots can be Low-Overhead, Without Stop-the-World Pause 59 Actor

  60. BUG MITIGATION If it fails only 1 in 10 times,

    can we avert failure? 60 F Looking for a PostDoc
  61. Bug Mitigation: Basic Idea 61 A B C Detect Event

    Races At Run Time Order A -> B -> C problematic? Let’s swap them! F
  62. Actor Messages Usually Access Predictable Parts of the Heap 62

  63. Use Existing VM Techniques to Minimize Race Detection Overhead 63

    product.setPrice(newPrice) function function (for polymorphic methods) Shape A 1: price(money) 2: id(int) 3: parts(array) 4: name(string) Shape B 1: id(int) 2: name(string) 3: price(money)
  64. Actor Restrict Monitoring to Parts that can Race 64 Shape

    B 1: id(int) 2: name(string) 3: price(money) Very Early, but: Heap Access Patterns promising for light-weight, low-precision race-possibility detection
  65. WRAP-UP/CONCLUSION 65

  66. Job Ad We’re Looking for a Postdoc! 66 Project CaMELot:

    Catch and Mitigate Event-Loop Concurrency Issues https://stefan-marr.de/2021/02/open-postdoc-position-on-language-implementation-and-concurrency/ Please get in touch!
  67. … Maybe there are no Silver Bullets? CSP Locks, Monitors,

    … Fork/Join Transactional Memory 67 Data Flow Actors
  68. Debuggers can be Great for High-level Concurrency Models! 68 Debugger

    UI Kómpos Protocol Make tools agnostic prom whenResolved: [:r | r println ]. promise resolver promise resolution Offer the Key Features as Breakpoints/Steps
  69. Agnostic Record&Replay is Practical! 69 Mailbox Store to mailbox? Read

    from mailbox? Capture Non-determinism Per Concurrency Model Keep Framework Agnostic Framework peek poll record Trace file Thread-local buffers Trace parser Event queues per activity per thread
  70. Snapshots can be Low-Overhead, Without Stop-the-World Pause 70 Actor

  71. Actor And maybe, we can use it to do race-mitigation!

    71 Shape B 1: id(int) 2: name(string) 3: price(money)
  72. 72 Debugger UI Kómpos Protocol Make tools agnostic Mailbox Store

    to mailbox? Read from mailbox? Capture Non-determinism Per Concurrency Model Actor And don’t stop the world for snapshotting! ? ? ?
  73. References

    • Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency Model Agnostic Record & Replay. D. Aumayr, S. Marr, S. Kaleba, E. Gonzalez Boix, H. Mössenböck, <Programming>, p. 39, AOSA Inc., 2021. doi: 10.22152/programming-journal.org/2021/5/14
    • Asynchronous Snapshots of Actor Systems for Latency-Sensitive Applications. D. Aumayr, S. Marr, E. Gonzalez Boix, H. Mössenböck, MPLR'19, p. 157–171, ACM, 2019. doi: 10.1145/3357390.3361019
    • Efficient and Deterministic Record & Replay for Actor Languages. D. Aumayr, S. Marr, C. Béra, E. Gonzalez Boix, H. Mössenböck, ManLang'18, ACM, 2018. doi: 10.1145/3237009.3237015
    • A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools. S. Marr, C. Torres Lopez, D. Aumayr, E. Gonzalez Boix, H. Mössenböck, DLS'17, p. 3–14, ACM, 2017. doi: 10.1145/3133841.3133842
    • Kómpos: A Platform for Debugging Complex Concurrent Applications. S. Marr, C. Torres Lopez, D. Aumayr, E. Gonzalez Boix, H. Mössenböck, <Programming Demo’17>, p. 2:1–2:2, ACM, 2017. Demo. doi: 10.1145/3079368.3079378
    • A Study of Concurrency Bugs and Advanced Development Support for Actor-based Programs. C. Torres Lopez, S. Marr, H. Mössenböck, E. Gonzalez Boix, AGERE!'16 (LNCS), p. 155–185, Springer, 2018. doi: 10.1007/978-3-030-00302-9_6
    • Towards Advanced Debugging Support for Actor Languages: Studying Concurrency Bugs in Actor-based Programs. C. Torres Lopez, S. Marr, H. Mössenböck, E. Gonzalez Boix, AGERE! '16, 2016.