Agenda
Distributed systems - But why?
Elasticsearch
Data
Search
Analytics
Slide 4
The need for distributed systems
Exceeding single system limits (CPU, Memory, Storage)
Load sharing
Parallelization (shorter response times)
Reliability (SPOF)
Price
Slide 12
Boundaries increase complexity
Core
Computer
LAN
WAN
Internet
Slide 13
Boundaries increase complexity
🗣 Communication
✍ Coordination
Error handling
Slide 14
Fallacies of Distributed Computing
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
Slide 16
Consensus
Achieving a common state among participants
Byzantine Failures
Trust
Crash
Quorum vs. strictness
Slide 17
Consensus goals
Cluster Membership
Data writes
Security (e.g. Bitcoin)
Finding a leader (Paxos, Raft)
Slide 18
Elasticsearch introduction
Slide 19
Elasticsearch
Speed. Scale. Relevance.
HTTP based JSON interface
Scales to many nodes
Fast responses
Ranked results (BM25, recency, popularity)
Resiliency
Flexibility (index time vs. query time)
Based on Apache Lucene
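A minimal sketch of the HTTP/JSON interface, assuming a local test cluster on http://localhost:9200 with security disabled; the index name "talks" and the document fields are made up for illustration.

    import requests

    ES = "http://localhost:9200"

    # Index a document; the "talks" index is created on the fly with a dynamic mapping.
    # refresh=true makes the document immediately searchable for this demo.
    requests.put(f"{ES}/talks/_doc/1?refresh=true",
                 json={"title": "Distributed systems - but why?"})

    # Full-text search over the same HTTP interface; hits come back ranked by BM25 score.
    resp = requests.post(f"{ES}/talks/_search",
                         json={"query": {"match": {"title": "distributed"}}})
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["title"])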
Slide 20
Use Cases
E-Commerce, E-Procurement, Patents, Dating
Maps: Geo based search
Observability: Logs, Metrics, APM, Uptime
Enterprise Search: Site & App Search, Workplace Search
Security: SIEM, Endpoint Security
Slide 21
Master
Slide 22
Master node
Pings other nodes
Decides data placement
Removes nodes from the cluster
Not needed for reading/writing
Updates the cluster state and distributes to all nodes
Re-election on failure
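The elected master is easy to inspect over the same HTTP interface; a small sketch using the cat APIs against the local test cluster assumed earlier.

    import requests

    ES = "http://localhost:9200"
    print(requests.get(f"{ES}/_cat/master?v=true").text)  # id, host, ip, name of the elected master
    print(requests.get(f"{ES}/_cat/nodes?v=true").text)   # all nodes; the master is marked with *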
Slide 23
Node startup
Slide 24
Elects itself
Slide 25
Node join
Slide 26
Node join
Slide 27
Master node is not reachable
Slide 28
Re-election among the remaining nodes
Slide 29
Cluster State
Nodes
Data (Shards on nodes)
Mapping (DDL)
Updated based on events (node join/leave, index creation, mapping update)
Sent as diff due to size
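The cluster state itself can be fetched in parts over HTTP; a sketch asking only for the master, the node list and the shard routing table (same local test cluster assumption as above).

    import requests

    ES = "http://localhost:9200"
    state = requests.get(f"{ES}/_cluster/state/master_node,nodes,routing_table").json()
    print(state["master_node"])         # node id of the elected master
    print(list(state["nodes"].keys()))  # ids of all nodes in the cluster state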
Slide 30
Data distribution
Shard: Unit of work, a self-contained inverted index
Index: A logical grouping of shards
Primary shard: Partitioning of data in an index (write scalability)
Replica shard: Copy of a primary shard (read scalability)
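A sketch of how this is configured per index at creation time; the index name "orders" is made up for illustration.

    import requests

    ES = "http://localhost:9200"
    requests.put(f"{ES}/orders", json={
        "settings": {
            "number_of_shards": 3,    # primary shards: fixed at index creation time
            "number_of_replicas": 1,  # replicas per primary: can be changed later
        }
    })
    print(requests.get(f"{ES}/_cat/shards/orders?v=true").text)  # where each shard copy lives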
Slide 31
Primary shards
Slide 32
Primary shards per index
Slide 33
Redistribution
Slide 34
Redistribution
Slide 35
Replicas
Slide 37
Distributed search
Two-phase approach: query, then fetch
Query phase: query all shards, each shard returns its local top-k hits
The coordinating node sorts all shard results and builds the real top-k from them
Fetch phase: fetch document data only for the real top-k (top-k fetches instead of shard_count * top-k), as sketched below
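A toy sketch of the query-then-fetch idea in plain Python (not Elasticsearch internals), just to make the two phases concrete; shard_hits and stored_docs are made-up stand-ins for shard responses and stored documents.

    import heapq

    def query_phase(shard_hits, k):
        # shard_hits: one list per shard of (score, doc_id) pairs, each already a local top-k
        per_shard = [sorted(hits, reverse=True) for hits in shard_hits]
        merged = heapq.merge(*per_shard, reverse=True)  # coordinating node merges by score
        return [doc_id for _score, doc_id in list(merged)[:k]]

    def fetch_phase(doc_ids, stored_docs):
        # only k documents are fetched, instead of shard_count * k
        return [stored_docs[doc_id] for doc_id in doc_ids]

    shard_results = [[(3.4, "a1"), (1.2, "a7")], [(2.8, "b3"), (2.5, "b9")]]
    print(query_phase(shard_results, k=2))   # ['a1', 'b3']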
Slide 44
Adaptive Replica Selection
Which shards are the best to select?
Each node keeps track of:
Response time of prior requests
Previous search durations
Search threadpool queue size
Less loaded nodes will receive more queries
More info: Blog post, C3 paper
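A deliberately simplified sketch of the idea (the actual ranking formula from the C3 paper and the Elasticsearch implementation is more involved); the field names below are invented for illustration.

    def pick_replica(replicas):
        # replicas: shard copies with moving averages of response time and service time
        # plus the current search threadpool queue size, as tracked per node
        def rank(copy):
            return copy["avg_response_ms"] + copy["avg_service_ms"] * copy["queue_size"]
        return min(replicas, key=rank)  # the least loaded / fastest copy wins

    print(pick_replica([
        {"node": "node-1", "avg_response_ms": 12.0, "avg_service_ms": 4.0, "queue_size": 3},
        {"node": "node-2", "avg_response_ms": 55.0, "avg_service_ms": 9.0, "queue_size": 12},
    ])["node"])   # node-1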
Slide 45
Searching faster by searching less
Optimization applies to all shards in the query phase
Skip non-competitive hits for top-k retrieval
Trades accuracy of hit counts for speed
More info: Blog post
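This is why hit totals are often reported as a lower bound; a sketch using track_total_hits against the hypothetical "talks" index from earlier (10000 is also the default bound).

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/talks/_search", json={
        "track_total_hits": 10000,   # stop counting exactly once this many hits are seen
        "query": {"match": {"title": "elasticsearch"}},
        "size": 10,
    })
    # e.g. {"value": 10000, "relation": "gte"} when more documents match
    print(resp.json()["hits"]["total"])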
Slide 46
Example: elasticsearch OR kibana OR logstash
At some point there is a minimal score required to make it into the top-k documents
If one of the search terms cannot score above that minimal score, it can be skipped
The query can effectively be reduced to elasticsearch OR kibana for finding matches, thus skipping all documents that contain only logstash
Result: major speed-up
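A toy illustration of that skipping idea (in the spirit of block-max WAND, not actual Lucene code): once the current top-k defines a minimal competitive score, terms whose best possible score stays below it need not be evaluated.

    def competitive_terms(max_term_scores, min_competitive_score):
        # keep only terms that could still lift a document into the top-k
        return [term for term, best in max_term_scores.items()
                if best >= min_competitive_score]

    max_term_scores = {"elasticsearch": 3.2, "kibana": 2.9, "logstash": 1.1}
    print(competitive_terms(max_term_scores, 1.5))
    # ['elasticsearch', 'kibana'] -> documents matching only "logstash" are skipped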
Slide 47
Lucene Nightly Benchmarks
Slide 48
Many more optimizations
Skip lists (searching delta-encoded postings lists)
Two-phase iterations (approximation & verification)
Integer compression
Data structures like BKD tree for numbers, FSTs for completion
Index sorting
Slide 49
Aggregations
Aggregations run on top of the result set of a query
Slice, dice and combine data to get insights
Show me the total sales value by quarter for each salesperson (sketched below)
The average response time per URL endpoint per day
The number of products within each category
The biggest order of each month in the last year
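The first example above as a hedged sketch: the index name "sales" and the fields "salesperson" (keyword), "order_date" (date) and "value" (numeric) are assumptions, not part of the talk.

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/sales/_search", json={
        "size": 0,   # only the aggregation result is needed, not individual hits
        "aggs": {
            "per_salesperson": {
                "terms": {"field": "salesperson"},
                "aggs": {
                    "per_quarter": {
                        "date_histogram": {"field": "order_date",
                                           "calendar_interval": "quarter"},
                        "aggs": {"total_sales": {"sum": {"field": "value"}}},
                    }
                },
            }
        },
    })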
Slide 50
Distributed Aggregations
Some calculations require all of your data in one place
Unique values in my dataset - how does this work across shards without sending the whole data set to a single node?
Solution: Be less accurate, sometimes be probabilistic!
Slide 51
terms Aggregation
Count all the categories from returned products
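A sketch of such a terms aggregation, assuming a "products" index with a keyword field "category" and a text field "name" (both made up).

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/products/_search", json={
        "size": 0,
        "query": {"match": {"name": "notebook"}},                  # the result set
        "aggs": {"categories": {"terms": {"field": "category"}}},  # counted per category
    })
    for bucket in resp.json()["aggregations"]["categories"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])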
Slide 52
terms Aggregation
Slide 53
Counts
Slide 54
Counts
Slide 55
Counts
Request more than the top-n buckets from each shard: shard_size defaults to size * 1.5 + 10
This reduces the problem but does not eliminate it
The response provides doc_count_error_upper_bound (see the sketch below)
Possible solution: Add more roundtrips?
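A sketch of tuning that trade-off on the hypothetical "products" index: shard_size asks every shard for more buckets than the final size, and the response reports the worst-case counting error.

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/products/_search", json={
        "size": 0,
        "aggs": {"categories": {"terms": {
            "field": "category",
            "size": 10,         # buckets returned to the caller
            "shard_size": 100,  # buckets collected per shard (default: size * 1.5 + 10)
        }}},
    })
    agg = resp.json()["aggregations"]["categories"]
    print(agg["doc_count_error_upper_bound"])  # how far off any doc_count may be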
Slide 56
Probabilistic Data Structures
Membership check: Bloom, Cuckoo filters
Frequencies in an event stream: Count-min sketch
Cardinality: LogLog algorithms
Quantiles: HDR, T-Digest
Slide 57
How many distinct elements are across my whole dataset?
cardinality Aggregation
Slide 58
cardinality
Result: 40-65?!
Slide 59
cardinality Aggregation
Solution: HyperLogLog++
Mergeable data structure
Approximate
Trades memory for accuracy
Fixed memory usage based on configured precision_threshold
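A sketch on the same hypothetical "products" index; the count is approximate (HyperLogLog++) and memory usage is bounded by precision_threshold.

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/products/_search", json={
        "size": 0,
        "aggs": {"distinct_categories": {"cardinality": {
            "field": "category",
            "precision_threshold": 1000,   # counts up to roughly this value stay close to exact
        }}},
    })
    print(resp.json()["aggregations"]["distinct_categories"]["value"])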
Slide 60
percentiles Aggregation
A naive implementation (a sorted array) is not mergeable across shards and scales with the number of documents in a shard
T-Digest uses a clustering approach that keeps memory usage bounded by falling back to approximation beyond a certain size
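A sketch, assuming a "logs" index with a numeric field "response_time_ms" (both made up); the returned percentiles are T-Digest approximations.

    import requests

    ES = "http://localhost:9200"
    resp = requests.post(f"{ES}/logs/_search", json={
        "size": 0,
        "aggs": {"latency": {"percentiles": {
            "field": "response_time_ms",
            "percents": [50, 95, 99],
        }}},
    })
    print(resp.json()["aggregations"]["latency"]["values"])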
Slide 61
Summary
Tradeoffs
Behaviour
Algorithms
Data Structures
Every distributed system is different
Slide 62
Summary - making distributed systems easier to use
For developers
Elasticsearch clients check for cluster nodes in the background (sniffing; see the sketch below)
For operations
ECK, Terraform provider, ecctl
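A sketch with the official Python client; the exact sniffing option names differ between client versions, so treat the parameters below as an assumption to check against the client you use.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(
        "http://localhost:9200",
        sniff_on_start=True,         # discover the other cluster nodes at startup
        sniff_on_node_failure=True,  # re-discover nodes when one stops responding
    )
    print(es.info()["cluster_name"])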