Coherence - Architecture key notes

aragozin

May 30, 2012

Transcript

  1. Oracle Coherence
    Architecture key notes
    Alexey Ragozin
    [email protected]
    May 2012

  2. Presentation overview
    Topics
    • Network model
    • Threading model
    • Cache operations scalability
    • A few well-known pitfalls
    * Use of a partitioned cache is assumed unless stated otherwise

  3. Network overview
    Inter-member transport in a Coherence cluster
    • Message-based protocol (TCMP)
    • In-order, guaranteed delivery between members
    • NACKs for low-latency communication
    • Can work over UDP, TCP, SSL (+ Oracle Exabus)
    • Limited use of multicast (deprecated)
    Cluster housekeeping protocol
    • TCP Ring – a fast way to detect killed processes
    • Witness protocol for disconnecting members
    • Communication pausing

  4. Network overview
    Coherence*Extend
    • TCP or SSL transport
    • Each remote service has one connection at a time
     A single TCP connection is shared by all users of the service
     The number of connections can be increased by using multiple services
    • Request pass-through
     if the Extend connection and the cluster use the same serialization format
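    The client-side API is the same as for cluster members; only the cache
    configuration differs. A minimal sketch, assuming "example" is mapped to a
    remote-cache-scheme in a hypothetical client-cache-config.xml:

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class ExtendClient {
            public static void main(String[] args) {
                // Hypothetical config file wiring "example" to a remote-cache-scheme
                System.setProperty("tangosol.coherence.cacheconfig", "client-cache-config.xml");
                NamedCache cache = CacheFactory.getCache("example");
                cache.put("k", "v"); // rides the single shared Extend TCP connection
                System.out.println(cache.get("k"));
                CacheFactory.shutdown();
            }
        }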

  5. Network pipeline
    [Diagram: a simple cache GET request traversing the network pipeline.
    Client side: client thread (API, serialization) → cache service →
    packet publisher → packet speaker → OS; storage side: OS → packet
    listener → packet receiver → service thread → worker thread
    (deserialization) → cache service; the response returns through the
    same stages in reverse.]
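    For reference, a minimal sketch of the call that enters this pipeline; the
    cache name is illustrative:

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class GetExample {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("example"); // hypothetical name
                // The client thread serializes the key and hands the request to the
                // cache service; packet publisher/speaker push it onto the wire, and
                // the call blocks until the storage member's response arrives.
                Object value = cache.get("some-key");
                System.out.println(value);
                CacheFactory.shutdown();
            }
        }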

  6. Network pipeline
    • Heavy thread usage
     More cores are better
     Starving on CPU means more context switches
    • Network IO is effectively single-threaded
     Multiple nodes per server may be required to fully utilize the network
    • Each service has a single control thread

  7. Data distribution
    [Diagram: partitions #0–#5 distributed across Member 1 and Member 2;
    Cache A and Cache B share the same partition distribution, each cache
    having its own backing map on every member.]

  8. Data distribution
     The same partition distribution is used for all caches in the same service
    • can be exploited for collocating data across caches (see the key
    association sketch below)
     Balancing is done by partition count
     Single backing map per cache
    • per node (default)
    • per partition (can be configured)
     Partition backups are stored in a separate backing map
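    Collocation (data affinity) is usually expressed through key association. A
    minimal sketch, where OrderLineKey is a hypothetical key class and
    KeyAssociation is the Coherence interface for this purpose:

        import java.io.Serializable;
        import com.tangosol.net.cache.KeyAssociation;

        // Keys reporting the same associated key are assigned to the same
        // partition, so an order and its lines always live on the same member.
        public class OrderLineKey implements KeyAssociation, Serializable {
            private final String orderId; // parent order id (hypothetical)
            private final int lineNo;

            public OrderLineKey(String orderId, int lineNo) {
                this.orderId = orderId;
                this.lineNo = lineNo;
            }

            public Object getAssociatedKey() {
                return orderId; // collocate with the parent order's partition
            }

            @Override public boolean equals(Object o) {
                return o instanceof OrderLineKey
                        && orderId.equals(((OrderLineKey) o).orderId)
                        && lineNo == ((OrderLineKey) o).lineNo;
            }

            @Override public int hashCode() {
                return 31 * orderId.hashCode() + lineNo;
            }
        }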

  9. Threading overview
     Control thread – one per service
    • receives network messages
    • performs cache operations if no thread pool is configured
     Thread pool – optional, size is configured
    • deserializes request data (if needed)
    • performs the operation (aggregators, processors, backing map access)
    • serializes result data (if needed)
     Event thread – one per service
    • calls map listeners
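    Because a single event thread calls every map listener, listeners should
    return quickly. A minimal sketch (the cache name is illustrative):

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.util.AbstractMapListener;
        import com.tangosol.util.MapEvent;

        public class ListenerExample {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("example"); // hypothetical name
                cache.addMapListener(new AbstractMapListener() {
                    @Override
                    public void entryUpdated(MapEvent evt) {
                        // Runs on the service's single event thread: keep it short
                        // and hand heavy work to an application executor instead.
                        System.out.println("updated: " + evt.getKey());
                    }
                });
            }
        }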

  10. Locking and job distribution
    • Update operations require partition locks
    • Reads, including aggregators, are lock-free
    • "Dirty read" semantics – cross-operation visibility
    • Updates are atomic per job
    • Jobs (only if a thread pool is enabled)
     Key-set-based requests are split into one job per partition (sketch below)
     Filter-based requests become a single multi-partition job
     Calculate key set, lock partitions, execute job
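    A sketch of a key-set-based update; IncrementProcessor is a hypothetical
    processor, and each of its invocations runs atomically under the partition
    lock on a thread-pool worker:

        import java.io.Serializable;
        import com.tangosol.util.InvocableMap;
        import com.tangosol.util.processor.AbstractProcessor;

        public class IncrementProcessor extends AbstractProcessor implements Serializable {
            @Override
            public Object process(InvocableMap.Entry entry) {
                Integer v = (Integer) entry.getValue();
                entry.setValue(v == null ? 1 : v + 1); // atomic within this job
                return entry.getValue();
            }
        }

        // Usage: the service groups the keys by partition and runs one job each.
        // cache.invokeAll(keys, new IncrementProcessor());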

  11. Problems of threading model
    • Event delivery is single-threaded
    • Dispatching a large request may block the control
    thread, making the service unresponsive
    • No discrimination between tasks
     A few long-running tasks may saturate the thread pool, making the
    cache unresponsive
    • Limited scheduling priorities (see the PriorityTask sketch below)
    • Key-based requests produce more jobs,
    occupying more threads
     A single large getAll() request against a DB-backed cache may saturate
    the thread pools on all nodes for a considerable time
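    What little scheduling control exists is exposed through the PriorityTask
    interface. A hedged sketch, assuming a hypothetical long-running processor:

        import com.tangosol.net.PriorityTask;
        import com.tangosol.util.InvocableMap;
        import com.tangosol.util.processor.AbstractProcessor;

        // Bounding execution time keeps one long-running task from holding a
        // worker thread indefinitely (real processors must also be serializable).
        public class ExpensiveProcessor extends AbstractProcessor implements PriorityTask {
            @Override
            public Object process(InvocableMap.Entry entry) {
                // ... long-running work against the entry ...
                return null;
            }
            public int getSchedulingPriority() { return PriorityTask.SCHEDULE_STANDARD; }
            public long getExecutionTimeoutMillis() { return 30000; } // cancel after 30 s
            public long getRequestTimeoutMillis() { return 60000; }
            public void runCanceled(boolean fAbandoned) { /* cleanup on cancellation */ }
        }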

  12. Interaction with backing map
     The backing map signals content changes to the service via
    events
    • events received on the thread executing the write
    operation are added to the transaction change set
    (change sets are replicated atomically)
    • events received out of band are replicated
    asynchronously
    • the backup partition copy is passive

  13. Operation scalability
    • Key-based operations – linearly scalable
     growing the cluster linearly increases operation throughput
    • Indexed queries / aggregation (sketch below)
     growing the cluster contributes only marginally to throughput
     more data – marginal decrease of throughput
    • Non-indexed queries / aggregation
     throughput is proportional to cluster core count and inversely
    proportional to data volume
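    A sketch contrasting indexed and non-indexed queries; the cache name,
    extractor target, and filter value are illustrative:

        import java.util.Set;
        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.util.extractor.ReflectionExtractor;
        import com.tangosol.util.filter.EqualsFilter;

        public class QueryExample {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("trades"); // hypothetical cache
                // With an index the filter is evaluated against index contents;
                // without it, the same call deserializes and scans every entry.
                cache.addIndex(new ReflectionExtractor("getSymbol"), false, null);
                Set results = cache.entrySet(new EqualsFilter("getSymbol", "ORCL"));
                System.out.println(results.size());
            }
        }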

  14. Well known pitfalls
     Relying on reference walking
    • Problem – accumulation of network latency
    • Hierarchical data organization is a typical example
     Solutions
    • Denormalization
    • Data affinity
    • Indexes

  15. Well known pitfalls
     Too fine-grained operations
    • accumulate network latency
     Too bulky operations
    • block the control thread for a long time
    • saturate thread pools
     Solutions
    • grouping operations into limited-size batches (sketch below)
    • grouping operations per member
    • grouping operations per partition
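    A minimal batching sketch; BATCH_SIZE is an arbitrary bound to tune per
    workload:

        import java.util.ArrayList;
        import java.util.Collection;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import com.tangosol.net.NamedCache;

        public class BatchedGet {
            static final int BATCH_SIZE = 500; // arbitrary, tune per workload

            // Splitting one huge getAll() into bounded batches keeps a single
            // request from monopolizing service threads across the cluster.
            public static Map getAllBatched(NamedCache cache, Collection keys) {
                Map result = new HashMap();
                List batch = new ArrayList(BATCH_SIZE);
                for (Object key : keys) {
                    batch.add(key);
                    if (batch.size() == BATCH_SIZE) {
                        result.putAll(cache.getAll(batch));
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    result.putAll(cache.getAll(batch));
                }
                return result;
            }
        }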

  16. Well known pitfalls
     Abusing grid-side (in-place) processing
    • CPU on storage nodes is a limited resource
    • grid-side processing may require more total serialization
    effort
     Solution: account for all factors when choosing
    • as-is data retrieval requires no marshaling on the grid side
    • network bandwidth is rarely the limitation
    • grid node CPUs are a shared and limited resource
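    For comparison, a grid-side aggregation sketch: it returns a single number
    instead of shipping entries to the client, trading grid CPU for network
    bandwidth (the cache name is illustrative):

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.util.aggregator.Count;
        import com.tangosol.util.filter.AlwaysFilter;

        public class AggregateExample {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("trades"); // hypothetical cache
                // Count runs on the storage members and ships back one integer
                // instead of every matching entry.
                Integer n = (Integer) cache.aggregate(AlwaysFilter.INSTANCE, new Count());
                System.out.println("entries: " + n);
            }
        }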

  17. Thank you
    Alexey Ragozin
    [email protected]
    http://blog.ragozin.info
    - my articles
    http://code.google.com/p/gridkit
    - my open source code
