
Designing for Concurrency

A high-level overview of ways to abstract over concurrency in software applications and services.

Susan Potter

August 23, 2010

Transcript

  1. Types of Clients
     - Hedge funds (e.g. Stark, CIG, CS)
     - Investment banks (e.g. BofA)
     - Trading technology (SaaS/ASP) firms
  2. Concurrent Applications
     - Market data
     - Trading systems (front office)
     - Risk management (middle office)
     - Accounts/party service (back office)
  3. Traditional Approaches
     - Task-based thread pools (e.g. database connections, server sockets)
     - Hand-coded locks (to access shared memory "safely")
     - Data: sharding or replication (to increase throughput on data access)
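The task-based approach above can be sketched in Python; names like `record_fill` are illustrative, not from the deck:

```python
# Minimal sketch of the traditional model: a task-based thread pool plus a
# hand-coded lock guarding shared mutable state.
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0                        # shared mutable state
counter_lock = threading.Lock()    # hand-coded lock protecting it

def record_fill(qty: int) -> None:
    """Every task must take the lock before touching shared memory."""
    global counter
    with counter_lock:
        counter += qty             # "safe" only because every writer locks

with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(1000):
        pool.submit(record_fill, 1)

print(counter)  # 1000
```

Note the fragility the later slides call out: correctness depends on every code path remembering to take `counter_lock`.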
  4. Less Traditional Approaches
     - Actor-based processes (message passing, e.g. as in Erlang)
     - Software Transactional Memory (STM): a consistent, safe way to access shared state
     - Data: decentralized datastores (run map/reduce queries on many nodes at once)
  5. Task-based vs Actor-based
     Task-based:
     - task threads access shared state in objects
     - task threads compete for locks on objects
     - operations within a task thread are synchronous
     - limited task scheduling (e.g. wait, notify)
     Actor-based:
     - mailboxes buffer incoming messages
     - actors do not share state, so they do not compete for locks
     - messages are sent asynchronously
     - actors react to the messages sent to them
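The actor side of that comparison can be sketched with a queue acting as the mailbox; `CounterActor` is a hypothetical name for illustration:

```python
# Minimal actor sketch: the actor owns its state, receives messages through
# a mailbox (queue), and never shares memory with its senders.
import queue
import threading

class CounterActor:
    def __init__(self):
        self.mailbox = queue.Queue()      # buffers incoming messages
        self.count = 0                    # private state: no locks needed
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()      # react to one message at a time
            if msg == "stop":
                return
            self.count += msg             # only this thread touches count

    def send(self, msg):
        self.mailbox.put(msg)             # asynchronous: sender never blocks

actor = CounterActor()
for _ in range(100):
    actor.send(1)
actor.send("stop")
actor._thread.join()                      # wait for the mailbox to drain
print(actor.count)  # 100
```

Because only the actor's own thread reads or writes `count`, there is no lock to compete for.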
  6. When might actors be better?
     - When the complexity of the task-based model becomes the bottleneck (debugging race conditions, deadlocks, livelocks, starvation). Depends on your use case.
     - When the system is conceptually event-driven, which is easier to translate to a high-level abstraction in actor-based models.
  7. Locks vs STM
     Locks:
     - Flexibility: fine- vs coarse-grained choice
     - Pessimistic locking
     - Locking semantics need to be hand-coded
     - Composable operations are not well supported
     STM:
     - Analogous to a database transaction, recording each txn as a log entry
     - Optimistic reading
     - Atomic transactions
     - Supports composable operations
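Python has no STM, but the optimistic read/validate/commit cycle the slide contrasts with pessimistic locking can be roughly sketched; all names here are illustrative:

```python
# Hypothetical sketch of the optimistic idea behind STM: read a snapshot,
# compute, then commit only if no other transaction wrote in the meantime.
import threading

class TVar:
    """A transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self._lock = threading.Lock()   # only to make the commit step atomic

def atomically(tvar, update):
    while True:
        snapshot, seen = tvar.value, tvar.version   # optimistic read
        new_value = update(snapshot)                # side-effect free => undoable
        with tvar._lock:
            if tvar.version == seen:                # validate: no concurrent write
                tvar.value, tvar.version = new_value, seen + 1
                return new_value
        # conflict: another transaction committed first -- retry

balance = TVar(100)
atomically(balance, lambda v: v + 50)
print(balance.value)  # 150
```

The retry loop is why the next slide insists operations be undoable: `update` may run several times before it commits.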
  8. When to use STM?
     - When running on larger numbers of cores/processors (roughly >= 4), where STM can improve performance
     - When hand-coding and debugging locking semantics to prevent deadlocks and livelocks becomes your bottleneck
     - When priority inversion often hinders performance
     - BUT you CAN'T use STM when an operation on shared state cannot be undone. Every operation must be undoable!
  9. Replication vs Decentralized
     Replication:
     - Can improve throughput
     - Some flexibility: replication strategies for a few use cases
     - Requires full replica(s) of the data set on each node
     Decentralized:
     - Improves throughput and the performance of complex queries using map/reduce
     - Flexibility to optimize two of three: Consistency, Availability, Partition tolerance (CAP theorem)
     - Does not require full replica(s) of the data set
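The map/reduce point can be illustrated with a toy query over in-memory partitions standing in for nodes; the data and names are made up:

```python
# Sketch of a map/reduce query running over data partitions in parallel,
# the way a decentralized store would run it on many nodes at once.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

partitions = [                      # stand-ins for per-node data slices
    ["AAPL", "MSFT", "AAPL"],
    ["MSFT", "GOOG"],
    ["AAPL"],
]

def map_phase(partition):
    return Counter(partition)       # each "node" counts only its own slice

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_phase, partitions))

result = sum(partials, Counter())   # reduce: merge the partial counts
print(result["AAPL"])  # 3
```

No node ever needs the full data set, which is exactly the contrast with full replication drawn above.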
  10. When to use decentralized data?
     - You have a large data set you want to distribute without creating/managing your own sharding scheme
     - You want to optimize for two of CAP
     - You run distributed map/reduce (complex) queries
     - BUT the datastore should satisfy your other needs first. It is usually key-value/bucket lookup, not an RDBMS!
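One common way such stores place keys on nodes without a hand-managed sharding scheme is consistent hashing; a minimal, illustrative sketch:

```python
# Hypothetical consistent-hashing ring: node names and keys are hashed onto
# the same ring, and a key belongs to the next node clockwise from its hash.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        hashes = [h for h, _ in self._points]
        i = bisect.bisect(hashes, _hash(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("order:42")   # the same key always maps to the same node
```

Adding or removing a node moves only the keys adjacent to it on the ring, rather than reshuffling everything as a naive modulo scheme would.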
  11. Other Approaches... (not in production)
     - Compiler parallel optimizations, e.g. Haskell sparks
     - Persistent data structures that aid concurrency throughput through better API design
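A persistent data structure can be sketched as an immutable cons list whose "updates" return new versions that share structure with the old, so concurrent readers never need locks; the names are illustrative:

```python
# Persistent (immutable) singly linked list: cons is O(1) and never mutates
# the existing list, so old versions stay valid for concurrent readers.
from typing import Any, NamedTuple, Optional

class Node(NamedTuple):
    head: Any
    tail: Optional["Node"]

def cons(value, lst):
    return Node(value, lst)         # new version; old list is untouched

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

base = cons(2, cons(1, None))
extended = cons(3, base)            # shares base's nodes; base is still (2, 1)
print(to_list(base))      # [2, 1]
print(to_list(extended))  # [3, 2, 1]
```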
  12. General Tips
     - Use SLA metrics/measures to optimize the relevant parts of your concurrent system judiciously
     - Ensure your application fits the use case(s) for the chosen approach
     - Test your hypotheses by benchmarking; NEVER assume your changes have made the impact you expect
     - There is no silver bullet: think, implement, and test!
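The "benchmark, never assume" tip in practice, with a made-up workload: measure both variants before believing one change helped.

```python
# Time two candidate implementations instead of assuming which is faster.
import timeit

def with_list():
    return 99999 in list(range(100_000))    # linear membership scan

def with_set():
    return 99999 in set(range(100_000))     # hash lookup (after build cost)

for fn in (with_list, with_set):
    best = min(timeit.repeat(fn, number=20, repeat=3))
    print(f"{fn.__name__}: {best:.4f}s")
```

Taking the minimum of several repeats reduces noise from other processes; the point is to let the numbers, not intuition, decide.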