Coherence - Architecture key notes


May 30, 2012

Author: aragozin


  1. Presentation overview
    Topics
    • Network model
    • Threading model
    • Cache operations scalability
    • A few well-known pitfalls
    * Use of a partitioned cache is assumed unless stated otherwise
  2. Network overview
    Inter-member transport in a Coherence cluster
    • Message-based protocol: TCMP
    • Guaranteed in-order delivery between members
    • NACKs for low-latency communication
    • Can work over UDP, TCP, SSL (+ Oracle Exabus)
    • Limited use of multicast (deprecated)
    Cluster housekeeping protocol
    • TCP Ring: a fast way to detect killed processes
    • Witness protocol for disconnecting members
    • Communication pausing
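The combination of in-order delivery and NACKs can be illustrated with a toy receiver. This is only a sketch of the general technique (sequence numbers, buffering of out-of-order packets, negative acknowledgements for gaps); it does not reproduce TCMP's actual wire format, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InOrderReceiver:
    """Toy receiver: delivers messages in sequence order and NACKs gaps.

    Illustrative only; not the real TCMP implementation.
    """
    next_seq: int = 0                              # next sequence number to deliver
    pending: dict = field(default_factory=dict)    # out-of-order packets held back
    delivered: list = field(default_factory=list)  # payloads handed to the service

    def receive(self, seq, payload):
        """Accept one packet and return the sequence numbers to NACK."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate packet, nothing to do
        self.pending[seq] = payload
        # Deliver every packet that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if not self.pending:
            return []
        # Everything between next_seq and the highest buffered packet is missing.
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

The receiver asks for a missing packet as soon as a later one arrives, instead of waiting for a sender-side timeout, which is the reason NACKs help latency.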
  3. Network overview
    Coherence*Extend
    • TCP or SSL transport
    • Each remote service has one connection at a time
      - A single TCP connection is shared between all users of the service
      - The number of connections can be increased by using multiple services
    • Request pass-through
      - if the Extend connection and the cluster use the same serialization
  4. Network pipeline
    [Diagram: path of a simple cache GET request through the network pipeline. Components shown: client thread, API, cache service, service thread, worker thread, packet publisher, packet speaker, packet listener, packet receiver and OS, with serialization and deserialization steps.]
  5. Network pipeline
    • Heavy thread usage
      - More cores are better
      - Starving for CPU means more context switches
    • Network IO is effectively single-threaded
      - Multiple nodes per server may be required to utilize the network
    • Each service has a single control thread
  6. Data distribution
    [Diagram: partitions #0 to #5 of Cache A and Cache B distributed across Member 1 and Member 2; four backing maps are shown.]
  7. Data distribution
    • The same partition distribution is used for all caches in the same service
      - can be exploited to collocate data across caches
    • Balancing is by partition count
    • Single backing map per cache
      - per node (default)
      - per partition (can be configured)
    • Partition backups are stored in a separate backing map
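A minimal sketch of why a shared partition distribution enables collocation: if every cache in a service maps keys to partitions with the same function, routing a child entry by its parent's key places both in the same partition, and therefore on the same member. The hash function, the partition count of 257, the round-robin ownership and all names below are assumptions made for illustration, not Coherence's actual algorithm.

```python
import zlib

PARTITION_COUNT = 257  # assumed value, for illustration only


def partition_of(key, associated_key=None):
    """Map a key to a partition; an associated key overrides the routing."""
    routing_key = key if associated_key is None else associated_key
    # crc32 is used only because it is deterministic across runs.
    return zlib.crc32(repr(routing_key).encode("utf-8")) % PARTITION_COUNT


def owner_of(partition, members):
    """Assign partitions round-robin so members own near-equal partition counts."""
    return members[partition % len(members)]
```

For example, an order keyed `("order", 17)` and routed with `associated_key="customer-42"` resolves to the same partition as the key `"customer-42"` in a customer cache of the same service, so both entries can be processed on one member without a network hop.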
  8. Threading overview
    • Control thread: one per service
      - receives network messages
      - performs cache operations if no thread pool is configured
    • Thread pool: optional, size is configurable
      - deserializes request data (if needed)
      - performs the operation (aggregators, processors, backing map access)
      - serializes result data (if needed)
    • Event thread: one per service
      - calls map listeners
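The dispatch rule on this slide can be sketched as follows: without a thread pool the control thread executes the operation itself (and is blocked while doing so); with a pool it only hands the operation over. `ToyService` and its methods are hypothetical names used for illustration and are unrelated to the Coherence API.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ToyService:
    """Toy model of a service with an optional worker pool (illustrative only)."""

    def __init__(self, pool_size=0):
        # pool_size == 0 models "no thread pool configured".
        self.pool = (ThreadPoolExecutor(max_workers=pool_size,
                                        thread_name_prefix="worker")
                     if pool_size > 0 else None)

    def handle(self, operation):
        """Called on the control thread for each request; returns a Future."""
        if self.pool is not None:
            return self.pool.submit(operation)  # control thread stays free
        done = Future()
        try:
            done.set_result(operation())        # runs on the control thread
        except Exception as exc:
            done.set_exception(exc)
        return done

    def close(self):
        if self.pool is not None:
            self.pool.shutdown()
```

Calling `handle` with an operation that reports `threading.current_thread().name` shows the difference: the caller's own thread name without a pool, a `worker`-prefixed name with one.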
  9. Locking and job distribution
    • Update operations require partition locks
    • Reads, including aggregators, are lock-free
    • “Read dirty”: cross-operation visibility
    • Updates are atomic per job
    • Jobs (only if the thread pool is enabled)
      - Key-set-based requests are split: one job per partition
      - Filter-based requests: one multi-partition job
      - Calculate key set, lock partitions, execute job
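The splitting of a key-set request into one job per partition, with updates taking the partition lock and reads taking none, might look roughly like this. It is a single-process sketch with jobs executed sequentially rather than on a thread pool; `PartitionedStore`, `split_into_jobs` and the modulo-based partitioning are assumptions for illustration.

```python
import threading
from collections import defaultdict


def split_into_jobs(keys, partition_of):
    """Group requested keys so that each job touches exactly one partition."""
    jobs = defaultdict(list)
    for key in keys:
        jobs[partition_of(key)].append(key)
    return dict(jobs)


class PartitionedStore:
    """Toy store with one lock per partition (illustrative only)."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self.data = {}
        self.locks = [threading.Lock() for _ in range(partition_count)]

    def partition_of(self, key):
        return hash(key) % self.partition_count

    def put_all(self, entries):
        """Apply updates job by job; returns the number of jobs created."""
        jobs = split_into_jobs(entries, self.partition_of)
        for partition, keys in jobs.items():
            with self.locks[partition]:  # updates require the partition lock
                for key in keys:
                    self.data[key] = entries[key]
        return len(jobs)

    def get(self, key):
        return self.data.get(key)        # reads take no lock
```

Because each job holds only its own partition's lock, a request spanning many partitions produces many independent jobs, which is also why large key-based requests can occupy many worker threads at once.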
  10. Problems of the threading model
    • Event delivery is single-threaded
    • Dispatching a large request may block the control thread, making the service unresponsive
    • No discrimination between tasks
      - A few long-running tasks may saturate the thread pool, making the cache unresponsive
    • Limited scheduling priorities
    • Key-based requests produce more jobs, occupying more threads
      - A single large getAll() request against a DB-backed cache may saturate the thread pools on all nodes for a considerable time
  11. Interaction with backing map
    • The backing map reports content changes through events
      - events received on the thread executing a write operation are added to the transaction change set (change sets are replicated atomically)
      - events received out of band are replicated asynchronously
      - the backup partition copy is passive
  12. Operation scalability
    • Key-based operations: linearly scalable
      - growing the cluster increases operation throughput linearly
    • Indexed queries / aggregation
      - growing the cluster contributes only marginally to throughput
      - more data causes a marginal decrease in throughput
    • Non-indexed queries / aggregation
      - throughput depends on the ratio of data volume to cluster core count
  13. Well-known pitfalls
    • Relying on reference walking
      - Problem: network latency accumulates
      - Hierarchical organization is a typical example
    • Solutions
      - Denormalization
      - Data affinity
      - Indexes
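To make the latency accumulation concrete, the sketch below counts round trips for a hierarchy lookup done by reference walking versus a denormalized layout in which each entry carries its ancestor list. `CountingCache` is a hypothetical stand-in for a remote cache, and the `parent` and `ancestors` fields are assumed for illustration.

```python
class CountingCache:
    """Stand-in for a remote cache that counts network round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # every get() models one network round trip
        return self.data[key]


def path_by_walking(cache, key):
    """Reference walking: the next key is known only after the previous get."""
    path = []
    while key is not None:
        node = cache.get(key)
        path.append(key)
        key = node["parent"]
    return path


def path_denormalized(cache, key):
    """Denormalized layout: the entry already lists its ancestors."""
    return [key] + cache.get(key)["ancestors"]
```

Walking a hierarchy of depth d costs d sequential round trips, so total latency grows with depth; the denormalized variant pays for one round trip at the price of redundant data that must be kept consistent on update.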
  14. Well-known pitfalls
    • Operations that are too fine-grained
      - accumulate network latency
    • Operations that are too bulky
      - block the control thread for a long time
      - saturate thread pools
    • Solutions
      - Group operations into batches of limited size
      - Group operations per member
      - Group operations per partition
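Grouping operations into limited-size batches can be sketched as a small wrapper around any bulk read. The `get_all` callable here is a hypothetical placeholder for whatever bulk operation the cache client offers; the batch size is something to tune per deployment, and no specific value is implied.

```python
from itertools import islice


def in_batches(keys, batch_size):
    """Yield successive lists of at most batch_size keys."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def get_all_batched(get_all, keys, batch_size):
    """Issue one bulk request per batch and merge the partial results."""
    result = {}
    for batch in in_batches(keys, batch_size):
        result.update(get_all(batch))
    return result
```

Each batch is large enough to amortize network latency over many keys, yet small enough that no single request holds the control thread or the worker pool for long. The same wrapper could group keys by member or by partition before batching.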
  15. Well-known pitfalls
    • Abusing grid-side (in-place) processing
      - CPU on storage nodes is a limited resource
      - grid-side processing may require more total serialization effort
    • Solution: account for all factors when choosing
      - Retrieving data as-is requires no marshaling on the grid side
      - Network bandwidth is rarely a limitation
      - Grid node CPUs are a shared and limited resource