Slide 1

Oracle Coherence Architecture
key notes

Alexey Ragozin
[email protected]
May 2012

Slide 2

Presentation overview

Topics
• Network model
• Threading model
• Cache operations scalability
• A few well-known pitfalls

* Usage of a partitioned cache is assumed unless stated otherwise

Slide 3

Network overview

Inter-member transport in a Coherence cluster
• Message-based protocol TCMP
• In-order, guaranteed delivery between members
• NACKs for low-latency communications
• Can work over UDP, TCP, SSL (+ Oracle ExoBus)
• Limited use of multicast (deprecated)

Cluster housekeeping protocol
• TCP Ring – fast way to detect killed processes
• Witness protocol for disconnecting members
• Communication pausing

Slide 4

Network overview

Coherence*Extend
• TCP or SSL transport
• Each remote service has one connection at a time
  - a single TCP connection is shared between all users of the service
  - the number of connections can be increased by using multiple remote services
• Request pass-through
  - possible if the Extend connection and the cluster use the same serialization
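On the client side, an Extend connection belongs to the remote cache service rather than to individual caches, so all caches mapped to one remote service share its single TCP connection. A minimal Java sketch, assuming hypothetical cache names ("orders", "customers") that are mapped to the same remote-cache-scheme in the client cache configuration:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    public class ExtendClientExample {
        public static void main(String[] args) {
            // Both caches are assumed to be mapped to the same remote cache service
            // in the client cache configuration, so all of their requests share that
            // service's single TCP connection to the proxy.
            NamedCache orders    = CacheFactory.getCache("orders");    // hypothetical name
            NamedCache customers = CacheFactory.getCache("customers"); // hypothetical name

            customers.put(42L, "ACME Corp");
            System.out.println(orders.size() + " orders, customer: " + customers.get(42L));
        }
    }

Using several remote cache services (each with its own Extend connection) is one way to spread traffic over more TCP connections when a single connection becomes the bottleneck.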

Slide 5

Network pipeline

[Diagram: path of a simple cache GET request through the network pipeline. Client side: client thread (serialization) → packet publisher → packet speaker → OS. Storage side: OS → packet listener → packet receiver → service thread / worker thread (deserialization) → cache service, with the response travelling back through the same stages.]

Slide 6

Network pipeline
• Heavy thread usage
  - more cores are better
  - starving on CPU means more context switches
• Network IO is effectively single-threaded
  - multiple nodes per server may be required to fully utilize the network
• Each service has a single control thread

Slide 7

Data distribution

[Diagram: partitions #0–#5 of Cache A and Cache B spread across Member 1 and Member 2, with a separate backing map for each cache on each member]

Slide 8

Data distribution
• The same partition distribution is used for all caches in the same service
  - can be exploited for collocating data across caches (see the sketch below)
• Balancing is done by partition count
• Single backing map per cache
  - per node (default)
  - per partition (can be configured)
• Partition backups are stored in a separate backing map
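One common way to exploit the shared partition distribution is key association, which keeps related entries of different caches in the same partition. A minimal Java sketch using the standard KeyAssociation interface; the OrderKey class, its fields and the customer key it refers to are hypothetical:

    import com.tangosol.net.cache.KeyAssociation;

    // Orders keyed by OrderKey land in the partition that owns the associated
    // customer id, because both caches run in the same partitioned service and
    // therefore share the partition distribution.
    public class OrderKey implements KeyAssociation, java.io.Serializable {

        private final long orderId;
        private final long customerId;

        public OrderKey(long orderId, long customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }

        // Coherence routes this key to the partition owning the returned value,
        // i.e. the customer id used as the key in the customers cache.
        public Object getAssociatedKey() {
            return customerId;
        }

        // equals() and hashCode() are omitted for brevity; real cache keys must implement them.
    }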

Slide 9

Threading overview
• Control thread – one per service
  - receives network messages
  - performs cache operations if no thread pool is configured
• Thread pool – optional, size is configured
  - deserializes request data (if needed)
  - performs operations (aggregators, processors, backing map access)
  - serializes result data (if needed)
• Event thread – one per service
  - calls map listeners (see the listener sketch below)
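Because map listeners are called on the service's single event thread, listener callbacks should return quickly and hand heavy work off to another thread. A minimal Java sketch using the standard MapListener API; the cache name and the println handlers are illustrative only:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.MapEvent;
    import com.tangosol.util.MapListener;

    public class ListenerExample {
        public static void main(String[] args) {
            NamedCache quotes = CacheFactory.getCache("quotes"); // hypothetical cache name

            quotes.addMapListener(new MapListener() {
                // All three callbacks run on the single event dispatch thread of the
                // service, so they must return quickly.
                public void entryInserted(MapEvent evt) { System.out.println("inserted: " + evt.getKey()); }
                public void entryUpdated(MapEvent evt)  { System.out.println("updated: "  + evt.getKey()); }
                public void entryDeleted(MapEvent evt)  { System.out.println("deleted: "  + evt.getKey()); }
            });
        }
    }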

Slide 10

Locking and job distribution
• Update operations require partition locks
• Reads, including aggregators, are lock-free
  - "dirty reads" – cross-operation visibility
• Updates are atomic per job
• Jobs (only if a thread pool is enabled), see the sketch below
  - key-set-based requests are split into one job per partition
  - filter-based requests produce a single multi-partition job
  - calculate key set, lock partitions, execute job
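The split between key-set-based and filter-based requests is visible directly in the InvocableMap API. A minimal Java sketch using a Count aggregator; the cache name, keys and the getStatus property are illustrative assumptions:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.Filter;
    import com.tangosol.util.aggregator.Count;
    import com.tangosol.util.extractor.ReflectionExtractor;
    import com.tangosol.util.filter.EqualsFilter;

    public class JobDistributionExample {
        public static void main(String[] args) {
            NamedCache orders = CacheFactory.getCache("orders"); // hypothetical cache name

            // Key-set-based request: the key set is split by owning member and,
            // on the storage side, into one job per partition.
            Set keys = new HashSet(Arrays.asList(1L, 2L, 3L));
            Object countByKeys = orders.aggregate(keys, new Count());

            // Filter-based request: each storage member executes a single job
            // covering all of its partitions matched by the filter.
            Filter filter = new EqualsFilter(new ReflectionExtractor("getStatus"), "NEW");
            Object countByFilter = orders.aggregate(filter, new Count());

            System.out.println(countByKeys + " / " + countByFilter);
        }
    }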

Slide 11

Problems of the threading model
• Event delivery is single-threaded
• Dispatching of a large request may block the control thread, making the service unresponsive
• No discrimination between tasks
  - a few long-running tasks may saturate the thread pool, making the cache unresponsive
• Limited scheduling priorities
• Key-based requests produce more jobs, occupying more threads
  - a single large getAll() request against a DB-backed cache may saturate all thread pools on all nodes for a considerable time

Slide 12

Interaction with backing map
• The backing map notifies about content changes via events
  - events received on the thread executing the write operation are added to the transaction change set (change sets are replicated atomically)
  - events received out of band are replicated asynchronously
  - the backup partition copy is passive

Slide 13

Operation scalability
• Key-based operations – linearly scalable
  - growing the cluster linearly increases operation throughput
• Indexed queries / aggregation (see the index sketch below)
  - growing the cluster contributes only marginally to throughput
  - more data – marginal decrease of throughput
• Non-indexed queries / aggregation
  - throughput is proportional to cluster core count and inversely proportional to data volume
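Indexes are what move a query from the non-indexed bucket above into the indexed one. A minimal Java sketch using the standard QueryMap.addIndex call; the cache name and the getSymbol property are illustrative assumptions:

    import java.util.Set;

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.extractor.ReflectionExtractor;
    import com.tangosol.util.filter.EqualsFilter;

    public class IndexExample {
        public static void main(String[] args) {
            NamedCache trades = CacheFactory.getCache("trades"); // hypothetical cache name

            // Build an (unordered) index over the extracted "symbol" property once;
            // filter queries on the same extractor can then avoid a full
            // deserializing scan of every entry.
            ReflectionExtractor bySymbol = new ReflectionExtractor("getSymbol");
            trades.addIndex(bySymbol, /* fOrdered */ false, /* comparator */ null);

            Set entries = trades.entrySet(new EqualsFilter(bySymbol, "ORCL"));
            System.out.println("matched " + entries.size() + " trades");
        }
    }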

Slide 14

Well known pitfalls  Relying on reference walking • Problem - network latency accumulation • Hierarchical organization – is typical example  Solutions • Denormalization • Data affinity • Indexes

Slide 15

Well known pitfalls  Too fine grained operations • accumulating network latency  Too bulky operations • blocking control thread for long • Saturation thread pools  Solutions • Grouping operations in limited size batches • Grouping operations per member • Grouping operations per partitions

Slide 16

Well known pitfalls  Abusing grid-side (inplace) processing • CPU on storage nodes is limited resources • grid-side processing may require more total serialization efforts  Solution, account all factors choosing • As is data retrieval requires no marshaling on grid side • Network bandwidth is rarely a limitation • Grid nodes CPUs are shared and limited resource

Slide 17

Thank you

Alexey Ragozin
[email protected]

http://blog.ragozin.info – my articles
http://code.google.com/p/gridkit – my open source code