Actors! And now? An Implementer's Perspective on High-level Concurrency Models, Debugging Tools, and the Future of Automatic Bug Mitigation

Stefan Marr
October 17, 2021

The actor model is a great tool for various use cases. However, it's not the
only tool, and sometimes perhaps not even the best. Consequently, developers
have started mixing and matching high-level concurrency models based on the problem
at hand, much like other programming abstractions. This comes with
various problems, though. For instance, we don't usually have debugging tools that help
us make sense of the resulting system. If we have a debugger at all, it may
barely allow us to step through our programs instruction by instruction.

Let's imagine a better world! One where we can follow asynchronous messages,
jump to the next transaction commit, or break on the next fork/join task
created. Even then, race conditions remain notoriously difficult to reproduce. One
solution is to record our program's execution, ideally capturing the bug. Then
we can replay it as often as needed to identify the cause of our bug.

The hard bit here is making record & replay practical.
I will explain how our concurrency-model-agnostic approach allows us
to record model interactions trivially for later replay,
and how we minimized its run-time overhead.
For actor applications, we can even make snapshotting fast enough
to keep trace sizes limited.

Having better debugging capabilities is a real productivity boost.
Though, some bugs will always slip through the cracks.
So, what if we could prevent those bugs from causing issues?
Other researchers have shown how to do it, and I'll conclude this talk
with some ideas on how we can utilize the knowledge we have in our
language implementations to make such mitigation approaches fast.

The talk is based on work done in collaboration with
Dominik Aumayr, Carmen Torres Lopez, Elisa Gonzalez Boix, and Hanspeter Mössenböck.

Transcript

  1. Actors! And now?
    An Implementer's Perspective on
    High-level Concurrency Models,
    Debugging Tools,
    and the Future of Automatic Bug Mitigation
    Stefan Marr
    17 October 2021

  2. Got a Question?
    Feel free to interrupt me!

  3. Job Ad
    We’re Looking for a Postdoc!
    Project CaMELot: Catch and Mitigate
    Event-Loop Concurrency Issues
    https://stefan-marr.de/2021/02/open-postdoc-position-on-language-implementation-and-concurrency/
    Please get in touch!

  4. Outcomes of Project MetaConc
    and work by
    C. Torres Lopez, D. Aumayr,
    E. Gonzalez Boix, H. Mössenböck

  5. Actors! What are Actors?
    • Many different variants
    • For the 50 Years’ Edition:
    – Which model is good for what?
    • Suitable problems/applications
    • Unsuitable problems per model
    – …

  6. Communicating Event Loops
    Actor
    Actor

  7. Concurrency Bugs are Common in
    Event Loop Systems
    • 8-27 apps, 3 studies: ≈2-20 concurrency issues per app
    • Websites in top 500, 6 studies: ≈1-10 concurrency issues per site
    • C, 6 projects, 1 study: 35 known event races
    • 2 studies: 53 projects, 57 issues; 12 projects, 1000 potential issues
    • 12 projects, 1 study: 53 concurrency issues
    Tip of the Iceberg

  8. How to get rid
    of all these bugs?

  9. DEBUGGING ACTORS WITH
    SUITABLE BREAKPOINTS/STEPPING
    Perhaps not a way to get rid of them all, but at least to make it easier

  10. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    Actor A

  11. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    1
    Actor A
    msg send
    msg receive
    promise resolver
    promise resolution

  12. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    class Result = ()(
      public get = (
        | result |
        result := 42.
        ^ result
      )
    )
    2
    Actor A Actor B
    msg send
    msg receive
    promise resolver
    promise resolution

  13. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    class Result = ()(
      public get = (
        | result |
        result := 42.
        ^ result
      )
    )
    3
    Actor A Actor B
    msg send
    msg receive
    promise resolver
    promise resolution

  14. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    class Result = ()(
      public get = (
        | result |
        result := 42.
        ^ result
      )
    )
    4
    Actor A Actor B
    msg send
    msg receive
    promise resolver
    promise resolution

  15. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    class Result = ()(
      public get = (
        | result |
        result := 42.
        ^ result
      )
    )
    1
    2
    Actor A Actor B
    before async
    after async

  16. Actor Breakpoints/Stepping
    prom := aResult <-: get.
    prom whenResolved: [:r |
      r println
    ].
    class Result = ()(
      public get = (
        | result |
        result := 42.
        ^ result
      )
    )
    1
    Actor A Actor B
    promise resolver
    promise resolution
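
The four breakpoint locations named on these slides (message send, message receive, promise resolver, promise resolution) can be made concrete with a small model. The following is only a sketch in Python, not SOMns and not the Kómpos implementation: `Actor`, `Promise`, `trace`, and `printed` are hypothetical names, and the callback runs synchronously here, whereas in a communicating-event-loop system it would be scheduled as a message on Actor A's event loop.

```python
from collections import deque

trace = []  # hypothetical log of the points where a debugger could stop


class Promise:
    def __init__(self):
        self.callbacks = []

    def when_resolved(self, callback):
        self.callbacks.append(callback)

    def resolve(self, value):
        trace.append("promise resolver")        # the code that resolves the promise
        for callback in self.callbacks:
            trace.append("promise resolution")  # a registered callback is about to run
            callback(value)


class Actor:
    def __init__(self, target):
        self.target = target
        self.mailbox = deque()

    def send(self, selector):
        trace.append("msg send")                # sender side
        promise = Promise()
        self.mailbox.append((selector, promise))
        return promise

    def run_one(self):
        selector, promise = self.mailbox.popleft()
        trace.append("msg receive")             # receiver side, before the method runs
        promise.resolve(getattr(self.target, selector)())


class Result:
    def get(self):
        return 42


printed = []
actor_b = Actor(Result())
prom = actor_b.send("get")            # prom := aResult <-: get.
prom.when_resolved(printed.append)    # prom whenResolved: [:r | r println ].
actor_b.run_one()
```

After the run, `trace` lists the four locations in the order a user would step through them, which is the sequence the numbered markers on the slides walk through.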

  17. Apgar: A Debugger Made for Actor Programs
    Carmen’s presentation is in
    about 5.5h here at AGERE

  18. Kómpos Architecture
    Interpreter
    Debugger
    UI
    Apgar or
    Kómpos UI
    Kómpos Protocol
    The “Magic” Bit
    https://stefan-marr.de/papers/dls-marr-et-al-concurrency-agnostic-protocol-for-debugging/

  19. The Kómpos Debugger
    Demo:
    hparadigm-concurrent-debugging/

  20. Even with better debuggers,
    we’ll still have concurrency bugs
    in our actor systems…

  21. Maybe, just maybe!
    Maybe Actors aren’t the best
    choice for every problem?


  22. Maybe there are no Silver Bullets?
    CSP
    Locks, Monitors, …
    Fork/Join
    Transactional Memory
    Data Flow
    Actors

  23. Building an
    Online Sales-Data Processor
    {"item": "beer",
    "price": 5.5,
    "quantity": 344,
    "customer": "",
    "address": "Pleinlaan 2"}
    Stream of Sales Events
    • Track revenue
    • Report sales revenue
    over time

  24. Subsystems as Asynchronous Activities
    Use Actors as Main Abstraction
    Event-Loop Model fits UI and System Paradigms
    JSON Input
    Actor
    DataStore
    Actor
    Report
    Actor
    {"item":
    "beer",
    "price":
    5.5,

  25. Parallelize JSON Processing
    JSON Input
    Actor
    JSON fragment
    channel
    JSON token
    channel
    JSON Stream
    Tokenizer
    Result
    channel
    Data Filter
    Process
    Using Communicating Sequential Processes
    with Channels
    {"item":
    "beer",
    "price":
    5.5,
    • Strict consumer/producer relationship
    • Allow for pipeline parallelism
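
The pipeline on this slide can be sketched as two processes connected by channels. This is an illustration under assumptions, not the talk's code: Python threads stand in for processes, `queue.Queue` stands in for channels (it is buffered, whereas classic CSP channels are rendezvous points), the `DONE` sentinel and the function names are hypothetical, the tokenizer parses whole JSON objects instead of streaming tokens, and the second sales event is made-up sample data.

```python
import json
import queue
import threading

DONE = object()  # sentinel that closes a channel (an assumption of this sketch)


def tokenizer(fragments, tokens):
    # Consumes raw JSON text and produces parsed sales events.
    while (item := fragments.get()) is not DONE:
        tokens.put(json.loads(item))
    tokens.put(DONE)


def data_filter(tokens, results):
    # Consumes parsed events and keeps only what revenue tracking needs.
    while (event := tokens.get()) is not DONE:
        results.put((event["item"], event["price"] * event["quantity"]))
    results.put(DONE)


fragments, tokens, results = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=tokenizer, args=(fragments, tokens)),
    threading.Thread(target=data_filter, args=(tokens, results)),
]
for stage in stages:
    stage.start()

fragments.put('{"item": "beer", "price": 5.5, "quantity": 344}')
fragments.put('{"item": "chips", "price": 2.0, "quantity": 10}')
fragments.put(DONE)

revenue = []
while (entry := results.get()) is not DONE:
    revenue.append(entry)
for stage in stages:
    stage.join()
```

Each stage only reads from its input channel and writes to its output channel, which is the strict consumer/producer relationship that lets the stages overlap in time.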

  26. Sales Revenue Over Time
    based on Large Data Array
    Report
    Actor
    Input array: 1 2 1 1 2 1 2 1
    Prefix sums: 1 3 4 5 7 8 10 11
    Construct Sum Tree
    in parallel
    Calculate Prefix Sum
    in parallel
    Parallel Prefix Sum Calculation
    with fork/join parallelism
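
The two phases named on this slide (construct a sum tree, then calculate the prefix sums from it) can be sketched as follows. This is a minimal illustration, not the SOMns implementation: `fork_join`, `build_sum_tree`, and `fill_prefix` are hypothetical names, one thread is forked per split for clarity, and CPython threads show the fork/join structure without delivering real CPU parallelism. The eight values are sample input.

```python
import threading


def fork_join(left_task, right_task):
    # Fork the left task onto a new thread, run the right one here, then join.
    forked = threading.Thread(target=left_task)
    forked.start()
    right_task()
    forked.join()


def build_sum_tree(values, lo, hi, tree, node=1):
    # Phase 1: each tree node stores the sum of values[lo:hi].
    if hi - lo == 1:
        tree[node] = values[lo]
        return
    mid = (lo + hi) // 2
    fork_join(lambda: build_sum_tree(values, lo, mid, tree, 2 * node),
              lambda: build_sum_tree(values, mid, hi, tree, 2 * node + 1))
    tree[node] = tree[2 * node] + tree[2 * node + 1]


def fill_prefix(values, lo, hi, tree, out, node=1, offset=0):
    # Phase 2: offset is the sum of everything to the left of values[lo:hi].
    if hi - lo == 1:
        out[lo] = offset + values[lo]
        return
    mid = (lo + hi) // 2
    fork_join(lambda: fill_prefix(values, lo, mid, tree, out, 2 * node, offset),
              lambda: fill_prefix(values, mid, hi, tree, out, 2 * node + 1,
                                  offset + tree[2 * node]))


def prefix_sum(values):
    tree = [0] * (4 * len(values))  # heap-style node numbering, 4n slots suffice
    out = [0] * len(values)
    if values:
        build_sum_tree(values, 0, len(values), tree)
        fill_prefix(values, 0, len(values), tree, out)
    return out


sums = prefix_sum([1, 2, 1, 1, 2, 1, 2, 1])
```

The right half of any range needs only the sum of the left half, which the tree already holds, so both halves of the second phase can proceed independently.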

  27. How to build debuggers to
    support all the Concurrency
    Models?

  28. Κόμπος: A PLATFORM FOR DEBUGGING
    COMPLEX CONCURRENT APPLICATIONS

  29. The Kómpos Debugger
    https://stefan-marr.de/papers/dls-marr-et-al-concurrency-agnostic-protocol-for-debugging/

  30. Kómpos Architecture
    SOMns
    Interpreter
    Debugger
    UI
    Kómpos Protocol
    JSON Web Socket

  31. Kómpos Architecture
    SOMns
    Interpreter
    Debugger
    UI
    Kómpos Protocol
    JSON Web Socket
    Actors
    CSP
    STM
    F/J
    Threads

    Agnostic of
    Concurrency
    Models
    And we have
    two UIs! Apgar
    & Kómpos UI

  32. Kómpos Protocol Metadata
    EntityType
      id: typeId
      name: string
    ActivityType
      icon: string
    DynamicScopeType
    BreakpointType
      name: string
      label: string
      applicableTo: Tag[]
    SteppingType
      name: string
      label: string
      applicableTo: Tag[]
      activities: ActivityType[]
      scopes: DynamicScopeType[]
    Concurrency semantics only known to language

  33. Kómpos Protocol Messages
    SetBreakpoint
      location: Coord
      type: BreakpointType
    Stopped
      activityId: id
      location: Coord
      actType: ActivityType
      scopes: DynamicScopeType[]
    DoStep
      activityId: id
      type: SteppingType
    Debugger UI just “lists” available types
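
To illustrate how a UI can stay agnostic, the sketch below models one metadata type and one message from these slides as plain data. It is an assumption-laden illustration, not the Kómpos wire format: the fields of `Coord`, the `"action"` key, the tag strings, and the helper names `encode` and `breakpoint_menu` are all hypothetical; only the message and field names `SetBreakpoint`, `location`, `type`, `name`, `label`, and `applicableTo` come from the slides.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Coord:
    # Assumed shape of a source location; the slide only names the type.
    uri: str
    line: int
    column: int


@dataclass
class BreakpointType:
    name: str
    label: str
    applicableTo: list  # tags of source sections the breakpoint can be set on


@dataclass
class SetBreakpoint:
    location: Coord
    type: BreakpointType


def encode(message):
    # Assumed encoding: a JSON object tagged with the message's class name.
    return json.dumps({"action": type(message).__name__, **asdict(message)})


def breakpoint_menu(breakpoint_types, tag):
    # The UI only lists what the interpreter advertised for a given tag.
    return [bp.label for bp in breakpoint_types if tag in bp.applicableTo]


advertised = [
    BreakpointType("msgSender", "Message send", ["EventualMessageSend"]),
    BreakpointType("channelBeforeSend", "Before write", ["ChannelWrite"]),
]
menu = breakpoint_menu(advertised, "ChannelWrite")
wire = encode(SetBreakpoint(Coord("app.ns", 3, 5), advertised[1]))
```

The UI code never mentions actors or channels: it filters by opaque tags and echoes back whichever `BreakpointType` the user picked, so supporting another concurrency model only requires the interpreter to advertise more types.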

  34. A Model-Agnostic Debugger:
    Example Channel Breakpoints
    channel out
    write: 42.
    channel in
    read
    Process A Process B
    1
    2
    3
    4
    “just” source locations and ids!
    UI doesn’t need to know these
    concepts!

  35. Debuggers can be Great for High-level
    Concurrency Models!
    Debugger
    UI
    Kómpos Protocol
    Make tools agnostic
    prom whenResolved: [:r |
    r println ].
    promise resolver
    promise resolution
    Offer the Key Features
    as Breakpoints/Steps

  36. NON-DETERMINISM MAKES FOR
    UNHAPPY DEBUGGERS
    Reproduces only 1 in 10? How can I fix such a bug???

  37. One Solution: Record & Replay
    • Record event order
    • Replay: reorder to fit
    Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency
    Model Agnostic Record & Replay. D. Aumayr et al. The Art, Science, and Engineering of
    Programming, Programming, 2021.
    Efficient and Deterministic Record & Replay for Actor Languages. D. Aumayr et al.
    Proceedings of the 15th International Conference on Managed Languages and
    Runtimes, ManLang’18.

  38. How is that going to work
    agnostic to concurrency models?

  39. Looking at
    Communicating Event Loops
    Actor
    Actor
    What are the Points of
    Non-determinism?
    Mailbox
    Mailbox
    The Mailboxes!
    (mailbox read order)

  40. Communicating Event Loops
    B
    C
    A
    C
    B
    Mailbox
    Replay messages in same
    order as originally

  41. Recording Non-determinism in
    Communicating Event Loops
    Actor
    Actor
    Mailbox
    What to record?
    Store to
    mailbox?
    Read from
    mailbox?
    Sender Receiver

  42. For Communicating Event Loops
    Sender-side and Receiver-Side
    Recording are
    “Functionally Equivalent”
    with complexity
    and performance trade-offs
    most interesting bit

  43. Overview for Concurrency Models
    Model                               Activities   Passive Entities  Non-determinism
    Communicating Event Loops           Actor        Promise, Message  Message order per actor
    Threads & Locks                     Thread       Lock, Condition   Order of lock acquisitions
    Communicating Sequential Processes  Process      Channel           Order of channel reads/writes
    Software Transactional Memory       Transaction  -                 Commit order

  44. Model Agnostic Framework
    Instrumented Operation
    if (RECORD) {

      record(type, ordering)
    } else if (REPLAY) {
      Event e = poll()

    }
    Framework
    peek
    poll
    record
    Trace
    file
    Thread-local buffers
    Trace
    parser
    Event queues
    per activity
    per thread
    Agnostic of
    Concurrency Models
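
The instrumented-operation pattern above can be sketched for the mailbox-read case. This is a simplified model under assumptions, not the framework from the papers: `Trace` keeps the event queues in memory instead of writing a trace file, `record` takes an activity id where the slide shows a type, and a message is identified only by its sender. In replay, a real system would wait until the expected message arrives; this sketch assumes it is already in the mailbox.

```python
from collections import defaultdict, deque


class Trace:
    """Per-activity event queues: record() appends, peek()/poll() replay."""

    def __init__(self):
        self.events = defaultdict(deque)

    def record(self, activity_id, ordering):
        self.events[activity_id].append(ordering)

    def peek(self, activity_id):
        return self.events[activity_id][0]

    def poll(self, activity_id):
        return self.events[activity_id].popleft()


class Actor:
    def __init__(self, actor_id, trace, mode):
        self.actor_id, self.trace, self.mode = actor_id, trace, mode
        self.mailbox = []      # (sender_id, payload) in arrival order
        self.processed = []

    def receive(self, sender_id, payload):
        self.mailbox.append((sender_id, payload))

    def process_next(self):
        # The instrumented operation: reading the next message from the mailbox.
        if self.mode == "record":
            message = self.mailbox.pop(0)
            self.trace.record(self.actor_id, message[0])
        else:  # replay: pick the message whose sender was recorded next
            expected = self.trace.peek(self.actor_id)
            index = next(i for i, m in enumerate(self.mailbox) if m[0] == expected)
            message = self.mailbox.pop(index)
            self.trace.poll(self.actor_id)
        self.processed.append(message[1])


trace = Trace()
recorded = Actor("store", trace, "record")
for sender, payload in [("C", "c1"), ("B", "b1"), ("A", "a1")]:
    recorded.receive(sender, payload)
for _ in range(3):
    recorded.process_next()

replayed = Actor("store", trace, "replay")
for sender, payload in [("A", "a1"), ("B", "b1"), ("C", "c1")]:  # different arrival order
    replayed.receive(sender, payload)
for _ in range(3):
    replayed.process_next()
```

Although the messages arrive in a different order during replay, the actor processes them in the recorded order. Only the `process_next` branch is specific to actors; `Trace` knows nothing about mailboxes.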

  45. Allows us to Record&Replay
    a Multi-Paradigm Application
    JSON Input
    Actor
    DataStore
    Actor
    Report
    Actor
    {"item":
    "beer",
    "price":
    5.5,
    Actors
    CSP in here
    Fork/Join in here

  46. SOMns: A NEWSPEAK FOR
    CONCURRENCY RESEARCH
    Newspeak: newspeaklanguage.org
    SOMns: github.com/smarr/SOMns

  47. Performance: Baselines
    Are We Fast Yet: Cross-Language Comparison
    https://github.com/smarr/are-we-fast-yet#readme
    [Plot: Runtime Factor normalized to Java (lower is better) for Java, Node.js, and SOMns]
    SOMns is on the level of
    optimized dynamic
    languages!

  48. Performance: Baselines
    Savina Actor Benchmark Suite
    https://github.com/shamsimam/savina#readme
    [Plots: Runtime Factor normalized to SOMns (lower is better) for Akka, Jetlang, Scalaz, and SOMns on 1 to 8 cores]
    Competitive
    with JVM actor
    frameworks!

  49. Overhead of Recording Actors for Replay
    Overhead on Savina benchmarks
    over execution without recording (geometric mean)
    • Specialized: 7.89%
      min. -21.42%, max. 36.29%
      (specialized to actors,
      without support for
      other concurrency models)
    • Sender-side: 7.82%
      min. -17.84%, max. 41.23%
      – Performance is competitive with specialized
        implementation
    • Receiver-side: 13.23%
      min. -19.33%, max. 53.1%
      – Not as optimized as specialized

  50. Agnostic Record&Replay is Practical!
    Keep Framework
    Agnostic
    Mailbox
    Store to
    mailbox?
    Read from
    mailbox?
    Capture Non-determinism
    Per Concurrency Model
    Framework
    peek
    poll
    record Trace
    file
    Thread-local
    buffers
    Trace
    parser
    Event queues
    per activity
    per thread

  51. LONG AND HUGE TRACES MAKE
    REPLAY IMPRACTICAL
    Snapshotting Actor Systems without Stopping Them

  52. Asynchronous and Partial
    Heap Snapshots
    Actor
    snapshot on message receive
    but only objects reachable from a message

  53. Snapshotting without Global
    Synchronization
    Message Message
    Time
    Message Message Message
    Message Message
    Start Snapshotting

  54. Detecting Message Crossovers
    • Attach send phase number to messages
    • Messages sent in Phase n (previous) are
      captured
    Actor A
    Actor B
    Actor C Message Message [n] Message [n]
    Time
    Message Message [n] Message [n]
    Message Message [n] Message [n+1]
    Start
    Snapshotting
    Phase n Phase n+1
    Snapshot before
    processing
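
The phase-number idea on this slide can be sketched as follows. This is a simplified, single-threaded illustration and not the implementation from the MPLR'19 paper: `SnapshotCoordinator`, `captured`, and `processed` are hypothetical names, and only the message payload is captured, whereas the talk describes snapshotting the objects reachable from the message.

```python
class SnapshotCoordinator:
    def __init__(self):
        self.phase = 0          # global snapshot phase number

    def start_snapshot(self):
        self.phase += 1         # no actor is stopped; only the phase changes


class Actor:
    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.mailbox = []
        self.captured = []      # messages written to the snapshot
        self.processed = []

    def send(self, receiver, payload):
        # Attach the phase in which the message was sent.
        receiver.mailbox.append((self.coordinator.phase, payload))

    def process_next(self):
        sent_phase, payload = self.mailbox.pop(0)
        if sent_phase < self.coordinator.phase:
            # Sent before the snapshot started, processed after it: the message
            # crossed the snapshot boundary, so capture it before processing.
            self.captured.append(payload)
        self.processed.append(payload)


coordinator = SnapshotCoordinator()
a, b = Actor(coordinator), Actor(coordinator)
a.send(b, "m1")
b.process_next()                # sent and processed in phase 0
a.send(b, "m2")                 # sent in phase 0 ...
coordinator.start_snapshot()    # ... then the snapshot starts, phase becomes 1
a.send(b, "m3")                 # sent in phase 1
b.process_next()                # m2 crossed over and is captured
b.process_next()                # m3 did not
```

Comparing the attached phase with the current one is a purely local check, which is why no global synchronization point is needed to decide what belongs to the snapshot.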

  55. Detecting Snapshot Completion (2)
    Msg [n-1]
    Msg [n-1]
    Msg [n-1]
    Thread 1 Thread n
    Actors waiting for execution (FIFO)
    Actors with messages from
    previous phase
    Completion
    Task
    Actors in current phase
    Thread Pool
    message sends may schedule
    actors for execution
    Msg [n]
    Msg [n-1]
    Msg [n-1]

  56. Detecting Snapshot Completion (3)
    Actors waiting for execution (FIFO)
    Actors with messages from current
    phase
    Completion
    Task
    Thread Pool
    message sends may schedule
    actors for execution
    Msg [n]
    Msg [n]
    Msg [n]
    Thread 1 Thread n
    Msg [n-1]

  57. Evaluation - Savina
    • Snapshot every second iteration
    • Worst-case scenario

  58. Evaluation – AcmeAir Web Application
    • Snapshot every 1000 requests
    • Latency increases minimally
      (1.66% geo mean)
    • 20 Million requests total
    • Slow requests (> 100ms):
      5.43% increase (0.007%
      of total requests)

  59. Snapshots can be Low-Overhead,
    Without Stop-the-World Pause
    Actor

  60. BUG MITIGATION
    If it fails only 1 in 10 times, can we avert failure?
    Looking for
    a PostDoc

  61. Bug Mitigation: Basic Idea
    A
    B
    C
    Detect Event Races At Run Time
    Order A -> B -> C problematic?
    Let’s swap them!

  62. Messages Usually Access
    Predictable Parts of the Heap
    Actor

  63. Use Existing VM Techniques to
    Minimize Race Detection Overhead
    product.setPrice(newPrice)
    function
    function
    (for polymorphic
    methods)
    Shape A
    1: price(money)
    2: id(int)
    3: parts(array)
    4: name(string)
    Shape B
    1: id(int)
    2: name(string)
    3: price(money)

  64. Restrict Monitoring to Parts
    that can Race
    Actor
    Shape B
    1: id(int)
    2: name(string)
    3: price(money)
    Very Early, but:
    Heap Access Patterns promising for
    light-weight, low-precision
    race-possibility detection

  65. WRAP-UP/CONCLUSION

  66. Job Ad
    We’re Looking for a Postdoc!
    Project CaMELot: Catch and Mitigate
    Event-Loop Concurrency Issues
    https://stefan-marr.de/2021/02/open-postdoc-position-on-language-implementation-and-concurrency/
    Please get in touch!


  67. Maybe there are no Silver Bullets?
    CSP
    Locks, Monitors, …
    Fork/Join
    Transactional Memory
    Data Flow
    Actors

  68. Debuggers can be Great for High-level
    Concurrency Models!
    Debugger
    UI
    Kómpos Protocol
    Make tools agnostic
    prom whenResolved: [:r |
    r println ].
    promise resolver
    promise resolution
    Offer the Key Features
    as Breakpoints/Steps

  69. Agnostic Record&Replay is Practical!
    Mailbox
    Store to
    mailbox?
    Read from
    mailbox?
    Capture Non-determinism
    Per Concurrency Model
    Keep Framework
    Agnostic
    Framework
    peek
    poll
    record Trace
    file
    Thread-local
    buffers
    Trace
    parser
    Event queues
    per activity
    per thread

  70. Snapshots can be Low-Overhead,
    Without Stop-the-World Pause
    Actor

  71. And maybe,
    we can use it to do race-mitigation!
    Actor
    Shape B
    1: id(int)
    2: name(string)
    3: price(money)

  72. Debugger
    UI
    Kómpos Protocol
    Make tools agnostic
    Mailbox
    Store to
    mailbox?
    Read from
    mailbox?
    Capture Non-determinism
    Per Concurrency Model
    Actor
    And don’t stop the world
    for snapshotting!

  73. References
    • Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency Model
      Agnostic Record & Replay
      D. Aumayr, S. Marr, S. Kaleba, E. Gonzalez Boix, H. Mössenböck, The Art, Science, and
      Engineering of Programming, p. 39, AOSA Inc., 2021. doi: 10.22152/programming-journal.org/2021/5/14
    • Asynchronous Snapshots of Actor Systems for Latency-Sensitive Applications
      D. Aumayr, S. Marr, E. Gonzalez Boix, H. Mössenböck, MPLR'19, p. 157–171, ACM, 2019.
      doi: 10.1145/3357390.3361019
    • Efficient and Deterministic Record & Replay for Actor Languages
      D. Aumayr, S. Marr, C. Béra, E. Gonzalez Boix, H. Mössenböck, ManLang'18, ACM, 2018.
      doi: 10.1145/3237009.3237015
    • A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools
      S. Marr, C. Torres Lopez, D. Aumayr, E. Gonzalez Boix, H. Mössenböck, DLS'17, p. 3–14, ACM, 2017.
      doi: 10.1145/3133841.3133842
    • Kómpos: A Platform for Debugging Complex Concurrent Applications
      S. Marr, C. Torres Lopez, D. Aumayr, E. Gonzalez Boix, H. Mössenböck, p.
      2:1–2:2, ACM, 2017. Demo. doi: 10.1145/3079368.3079378
    • A Study of Concurrency Bugs and Advanced Development Support for Actor-based Programs
      C. Torres Lopez, S. Marr, H. Mössenböck, E. Gonzalez Boix, AGERE!'16 (LNCS), p. 155–185, Springer,
      2018. doi: 10.1007/978-3-030-00302-9_6
    • Towards Advanced Debugging Support for Actor Languages: Studying Concurrency Bugs in Actor-
      based Programs
      C. Torres Lopez, S. Marr, H. Mössenböck, E. Gonzalez Boix, AGERE! '16, 2016.
