Slide 1

Slide 1 text

Upgrading log-analytics clusters to OpenSearch
Amitai Stern (@amitaistern), Software Engineer and Telemetry Storage Team Lead at Logz.io

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

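Version timeline (figure): Elasticsearch 6.8, 7.0.0, and 7.10, after which the line splits into Elasticsearch 7.11 and 8.7 and OpenSearch 1.0.0 and 2.6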

Slide 4

Slide 4 text

Version timeline (figure), marking January 2021: Elasticsearch 6.8, 7.0.0, 7.10, 7.11, and 8.7; OpenSearch 1.0.0 and 2.6

Slide 5

Slide 5 text

Version timeline (figure), marking January 2021 and July 2021: Elasticsearch 6.8, 7.0.0, 7.10, 7.11, and 8.7; OpenSearch 1.0.0 and 2.6

Slide 6

Slide 6 text

Licensing (figure)
- Apache 2.0-licensed open source: Elasticsearch 6.8, 7.0.0, and 7.10; OpenSearch 1.0.0 and 2.6
- Server Side Public License (SSPL): Elasticsearch 7.11 and 8.7

Slide 7

Slide 7 text

Architecture support (figure): "Supports both AMD64 and ARM64 architectures" and "Supports AMD64 architecture", across versions 6.8, 7.0.0, 7.10, 7.11, 8.7, 1.0.0, and 2.6

Slide 8

Slide 8 text

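Version timeline (figure): Elasticsearch 6.8, 7.0.0, and 7.10, after which the line splits into Elasticsearch 7.11 and 8.7 and OpenSearch 1.0.0 and 2.6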

Slide 9

Slide 9 text

Log Analytics Cluster Architecture
Diagram: Log-Engine (ingest); Application: Kibana, Query-Service, other microservices (search); Amazon S3 (cluster snapshots)

Slide 11

Slide 11 text

Preparing for the upgrade
- Index versions
- Deprecated APIs
- Breaking changes
- Test environment
- Cluster setting thresholds
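For the "Index versions" item, a quick audit can read `index.version.created` from every index. The sketch below assumes the input is the parsed JSON of `GET _all/_settings?filter_path=*.settings.index.version.created`, that version ids use the Elasticsearch encoding (major * 1,000,000 + minor * 10,000 + revision * 100 + 99), and that OpenSearch ids are additionally XOR-ed with a `0x08000000` mask. The cut-off reflects our reading of the OpenSearch compatibility notes (OpenSearch 2.x opens indices created in Elasticsearch 7.x or OpenSearch 1.x, so 6.x-created indices need reindexing first); treat both the mask and the cut-off as assumptions to verify against your target version.

```python
from typing import Dict, List, Tuple

# Assumption: OpenSearch marks its version ids by XOR-ing them with this bit.
OPENSEARCH_MASK = 0x08000000


def decode_version_id(version_id: int) -> Tuple[str, int, int, int]:
    """Decode an index.version.created id into (distribution, major, minor, revision)."""
    distribution = "elasticsearch"
    if version_id & OPENSEARCH_MASK:
        distribution = "opensearch"
        version_id ^= OPENSEARCH_MASK
    major = version_id // 1_000_000
    minor = (version_id // 10_000) % 100
    revision = (version_id // 100) % 100
    return distribution, major, minor, revision


def indices_to_reindex(settings: Dict[str, dict], min_es_major: int = 7) -> List[str]:
    """Return, sorted, the indices created by an Elasticsearch major below `min_es_major`."""
    flagged = []
    for index, body in settings.items():
        # The settings API returns the id as a string, e.g. "6080099" for 6.8.0.
        created = int(body["settings"]["index"]["version"]["created"])
        distribution, major, _, _ = decode_version_id(created)
        if distribution == "elasticsearch" and major < min_es_major:
            flagged.append(index)
    return sorted(flagged)


# Hypothetical sample response: one index created by 6.8.0, one by 7.10.2.
sample = {
    "logs-2023.01.01": {"settings": {"index": {"version": {"created": "6080099"}}}},
    "logs-2023.06.01": {"settings": {"index": {"version": {"created": "7100299"}}}},
}
print(indices_to_reindex(sample))  # ['logs-2023.01.01']
```

Lowering `min_es_major` to 6 models a move to OpenSearch 1.x instead, which (again per our reading) still opens 6.x-created indices.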

Slide 12

Slide 12 text

Common Upgrading Strategies

Slide 13

Slide 13 text

Common Upgrading Strategies: Blue/Green
Diagram: two clusters; read and write traffic
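The Blue/Green diagram only shows two clusters with read and write traffic. One common way to run it for log data, assumed here rather than taken from the talk, is to write every document to both clusters and move reads once the new cluster holds enough history; that is presumably why the summary slide calls it instantly revertable but slow (days/weeks) and double the cluster cost. The toy model below uses hypothetical `blue`/`green` names and in-memory lists in place of real clusters.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlueGreenRouter:
    """Toy router: writes go to both clusters, reads go to the live one (hypothetical model)."""

    live: str = "blue"
    clusters: Dict[str, List[dict]] = field(
        default_factory=lambda: {"blue": [], "green": []}
    )

    def write(self, doc: dict) -> None:
        # Dual-write so the idle cluster builds up the same history.
        for docs in self.clusters.values():
            docs.append(doc)

    def read(self) -> List[dict]:
        # Only the live cluster answers searches.
        return list(self.clusters[self.live])

    def switch(self) -> None:
        # Flipping reads is a single pointer change, so reverting is just as quick.
        self.live = "green" if self.live == "blue" else "blue"


router = BlueGreenRouter()
router.write({"message": "user logged in"})
router.switch()
print(router.live, len(router.read()))  # green 1
```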

Slide 15

Slide 15 text

Common Upgrading Strategies: In-Place
Diagram: one cluster with data nodes, coordinator nodes, and cluster manager nodes
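In-Place upgrades the nodes of the existing cluster one at a time. The sketch below follows the generic Elasticsearch/OpenSearch rolling-upgrade procedure, not a runbook shown in the talk; node names are hypothetical and the steps are returned as data rather than executed. Once upgraded nodes hold primaries, replicas can no longer be placed on older nodes, and nodes cannot be downgraded in place, which matches the summary's "No rolling back".

```python
from typing import List, Optional, Tuple

# (HTTP method or "manual", path or instruction, JSON body)
Step = Tuple[str, str, Optional[dict]]


def node_upgrade_steps(node: str) -> List[Step]:
    """Per-node steps of a conventional rolling upgrade (a sketch, not the talk's runbook)."""
    return [
        # Keep replicas from being rebuilt elsewhere while the node is down.
        ("PUT", "_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": "primaries"}}),
        ("POST", "_flush", None),
        ("manual", f"stop {node}, install the new version, start {node}", None),
        ("GET", "_cat/nodes", None),  # confirm the node rejoined
        # null resets allocation to its default ("all").
        ("PUT", "_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": None}}),
        # May stay yellow while versions are mixed: replicas of shards whose
        # primary sits on an upgraded node cannot be assigned to older nodes.
        ("GET", "_cluster/health?wait_for_status=green&timeout=30m", None),
    ]


def upgrade_order(data: List[str], coordinators: List[str], managers: List[str]) -> List[str]:
    """Node order as listed on the slide: data, then coordinator, then cluster manager nodes."""
    return [*data, *coordinators, *managers]


for node in upgrade_order(["data-1"], ["coord-1"], ["manager-1"]):
    for method, target, _ in node_upgrade_steps(node):
        print(method, target)
```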

Slide 16

Slide 16 text

Balancing Risk, Cost, and Speed

Slide 17

Slide 17 text

The Drain Method
Diagram: data nodes, including 172.22.4.9
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "172.22.4.9"
  }
}
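Draining starts by excluding the old node's IP so the allocator moves its shards elsewhere. The helpers below are a hypothetical sketch: one builds the request body shown on the slide (for one or several IPs), the other counts how many shard copies `GET _cat/shards?format=json` still reports on that IP, so the node can be retired once the count reaches zero.

```python
from typing import Iterable, List


def exclude_ips_body(ips: Iterable[str]) -> dict:
    """Body for PUT _cluster/settings that drains every listed node IP."""
    # Allocation filters take a single comma-separated string.
    return {"persistent": {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}


def shards_left_on(cat_shards: List[dict], ip: str) -> int:
    """Count shard copies still on `ip`, given GET _cat/shards?format=json output.

    Unassigned shards have no IP and are therefore not counted.
    """
    return sum(1 for row in cat_shards if row.get("ip") == ip)


print(exclude_ips_body(["172.22.4.9"]))
```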

Slide 18

Slide 18 text

The Drain Method
Diagram: data nodes, including 172.22.4.9
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "172.22.4.9",
    "indices.recovery.max_bytes_per_sec": "150mb"
  }
}
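`indices.recovery.max_bytes_per_sec` caps recovery traffic per node, so on a draining node it roughly sets how fast its shards can leave. The arithmetic below is a back-of-the-envelope sketch: it assumes the throttle is the only bottleneck, uses a hypothetical 1 TiB of shard data, parses sizes with binary (1024-based) units as Elasticsearch and OpenSearch do, and treats a zero value such as `0mb` as "unthrottled", which is how we understand that setting.

```python
from typing import Optional

# Binary units, matching how byte sizes like "150mb" are parsed by the cluster.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_byte_size(value: str) -> int:
    """Parse '150mb'-style strings into bytes; a bare number is taken as bytes."""
    value = value.strip().lower()
    # Check two-letter suffixes before the bare "b".
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * _UNITS[suffix])
    return int(value)


def drain_hours(node_bytes: int, max_bytes_per_sec: str) -> Optional[float]:
    """Lower bound on the time to move `node_bytes` off one node at the given throttle.

    Returns None for a zero setting, which disables the throttle: disk and
    network then set the pace instead.
    """
    rate = parse_byte_size(max_bytes_per_sec)
    if rate <= 0:
        return None
    return node_bytes / rate / 3600


one_tib = parse_byte_size("1tb")
for setting in ("150mb", "300mb", "0mb"):
    print(setting, drain_hours(one_tib, setting))
```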

Slide 19

Slide 19 text

The Drain Method
Diagram: data nodes, including 172.22.4.9 and 172.33.14.1
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": "172.33.14.1,172.22.4.9"
  }
}
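An include filter restricts allocation to nodes matching at least one listed value, and an exclude filter forbids nodes matching any listed value. The slide only shows the include setting; assuming the exclude on 172.22.4.9 is still in place, the two together leave 172.33.14.1 as the only valid target as far as these filters are concerned. The model below is a simplification (exact IP matches only, while the real filters also accept wildcards, and other allocation deciders such as disk watermarks still apply). Note also that a cluster-level include applies to every shard in the cluster, not only to the drained node's shards.

```python
from typing import Iterable, List, Set


def _values(setting: str) -> Set[str]:
    """Split a comma-separated filter value, ignoring blanks."""
    return {v.strip() for v in setting.split(",") if v.strip()}


def allowed_nodes(node_ips: Iterable[str], include: str = "", exclude: str = "") -> List[str]:
    """Simplified model of cluster-level _ip allocation filters (exact matches only).

    include: a node must match at least one value (an empty filter allows all).
    exclude: a node must match none of the values.
    """
    inc, exc = _values(include), _values(exclude)
    return [ip for ip in node_ips if (not inc or ip in inc) and ip not in exc]


nodes = ["172.22.4.9", "172.33.14.1"]
print(allowed_nodes(nodes,
                    include="172.33.14.1,172.22.4.9",
                    exclude="172.22.4.9"))  # ['172.33.14.1']
```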

Slide 20

Slide 20 text

The Drain Method
Diagram: data nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": "172.33.14.1,172.22.4.9"
  }
}

Slide 21

Slide 21 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": ""
  }
}

Slide 22

Slide 22 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": "",
    "cluster.routing.allocation.exclude._ip": ""
  }
}

Slide 24

Slide 24 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": "",
    "cluster.routing.allocation.exclude._ip": "",
    "indices.recovery.max_bytes_per_sec": "300mb"
  }
}

Slide 25

Slide 25 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": "",
    "cluster.routing.allocation.exclude._ip": "",
    "indices.recovery.max_bytes_per_sec": "0mb"
  }
}
A value of "0mb" disables recovery throttling altogether.

Slide 26

Slide 26 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._ip": null,
    "cluster.routing.allocation.exclude._ip": null
  }
}
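The final cleanup switches from empty strings to `null`. In the cluster settings API, `null` removes a setting so its default applies again, whereas `""` keeps an explicit (empty) value stored in the persistent settings. Because the drain also overrides `indices.recovery.max_bytes_per_sec`, the sketch below can optionally reset it in the same request; that flag is our addition, not something the slide shows.

```python
import json


def clear_drain_settings_body(reset_recovery_throttle: bool = False) -> str:
    """JSON body for PUT _cluster/settings that deletes the drain-time overrides.

    Python None serializes to JSON null, which removes a persistent setting so it
    falls back to its default, instead of storing an explicit empty string.
    """
    persistent = {
        "cluster.routing.allocation.include._ip": None,
        "cluster.routing.allocation.exclude._ip": None,
    }
    if reset_recovery_throttle:
        # Hypothetical extra step: undo the temporary max_bytes_per_sec override too.
        persistent["indices.recovery.max_bytes_per_sec"] = None
    return json.dumps({"persistent": persistent})


print(clear_drain_settings_body())
```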

Slide 27

Slide 27 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes, load balancer, DNS record

Slide 28

Slide 28 text

The Drain Method: Upgrade Process Overview
- New LB
- OpenSearch Coordinator Nodes
Diagram: data nodes, coordinator nodes, cluster manager nodes, two load balancers, DNS record

Slide 29

Slide 29 text

The Drain Method: Upgrade Process Overview
- New LB
- OpenSearch Coordinator Nodes
- Override DNS record
Diagram: data nodes, coordinator nodes, cluster manager nodes, two load balancers, DNS record

Slide 30

Slide 30 text

The Drain Method: Upgrade Process Overview
- New LB
- OpenSearch Coordinator Nodes
- Override DNS record
- Remove old Coordinating Nodes
Diagram: data nodes, coordinator nodes, cluster manager nodes, load balancer

Slide 31

Slide 31 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes

Slide 32

Slide 32 text

The Drain Method: Upgrade Process Overview
- Add 3 more Cluster manager Nodes
Diagram: data nodes, coordinator nodes, cluster manager nodes

Slide 33

Slide 33 text

The Drain Method: Upgrade Process Overview
- Add 3 more Cluster manager Nodes
- Remove the old ones one at a time (elected one last)
Diagram: data nodes, coordinator nodes, cluster manager nodes
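Removing the elected node forces an election, so leaving it for last presumably limits the process to a single re-election, held when only the new nodes remain eligible. A minimal ordering helper follows; the node names are hypothetical, and the elected node can be read from `GET _cat/master` (or `GET _cat/cluster_manager` on OpenSearch 2.x).

```python
from typing import List


def manager_removal_order(old_managers: List[str], elected: str) -> List[str]:
    """Order for removing old cluster manager nodes one at a time, elected one last."""
    if elected not in old_managers:
        # A new node already holds the role, so no old node is special.
        return list(old_managers)
    return [name for name in old_managers if name != elected] + [elected]


print(manager_removal_order(["es-master-1", "es-master-2", "es-master-3"], "es-master-2"))
# ['es-master-1', 'es-master-3', 'es-master-2']
```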

Slide 37

Slide 37 text

The Drain Method: Upgrade Process Overview
- Add 3 more Cluster manager Nodes
- Remove the old ones one at a time (elected one last)
- Await Cluster Manager Node reelection
Diagram: data nodes, coordinator nodes, cluster manager nodes (marked "???" until a new manager is elected)
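Awaiting re-election amounts to polling until one of the new nodes holds the role. The sketch below takes the lookup as a callable (for example, a hypothetical function that parses `GET _cat/master?h=node`) so it stays client-agnostic; the timeout and polling interval are arbitrary defaults, and the elapsed time counts sleeps only, not time spent in the lookup.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_new_manager(
    current_manager: Callable[[], Optional[str]],
    new_managers: Iterable[str],
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `current_manager` until it names one of `new_managers`.

    Raises TimeoutError once roughly `timeout_s` seconds of polling have passed.
    """
    wanted = set(new_managers)
    waited = 0.0
    while waited <= timeout_s:
        name = current_manager()  # None while no manager is elected
        if name in wanted:
            return name
        sleep(poll_s)
        waited += poll_s  # counts sleep time only
    raise TimeoutError("no new cluster manager was elected in time")


# Hypothetical lookups: no manager yet, then an old node, then a new one.
answers = iter([None, "es-master-2", "os-manager-1"])
print(wait_for_new_manager(lambda: next(answers), ["os-manager-1"], sleep=lambda s: None))
```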

Slide 38

Slide 38 text

The Drain Method: Upgrade Process Overview
DONE :)
Diagram: data nodes, coordinator nodes, cluster manager nodes
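Before declaring the upgrade done, it is worth confirming that no node still runs the old version. A minimal check, assuming the parsed output of `GET _cat/nodes?format=json&h=name,version,node.role` and the usual role letters (`m` for cluster-manager-eligible, `d` for data, `-` for coordinating-only), groups any leftover nodes by the three node types on the slide. Node names and versions in the example are hypothetical.

```python
from typing import Dict, List


def role_group(node_role: str) -> str:
    """Map a _cat/nodes node.role value such as 'mdi', 'dir' or '-' to a slide category."""
    if "m" in node_role:
        return "cluster manager"
    if "d" in node_role:
        return "data"
    return "coordinator"


def nodes_not_on(cat_nodes: List[dict], target_prefix: str) -> Dict[str, List[str]]:
    """Group names of nodes whose version does not start with `target_prefix`."""
    leftovers: Dict[str, List[str]] = {}
    for row in cat_nodes:
        if not row["version"].startswith(target_prefix):
            leftovers.setdefault(role_group(row["node.role"]), []).append(row["name"])
    return leftovers


rows = [
    {"name": "os-data-1", "version": "1.3.0", "node.role": "dir"},
    {"name": "es-coord-1", "version": "6.8.23", "node.role": "-"},
]
print(nodes_not_on(rows, "1."))  # {'coordinator': ['es-coord-1']}
```

An empty result means every node reports the target version.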

Slide 39

Slide 39 text

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes

Slide 40

Slide 40 text

The Drain Method: Managing risk
Diagram: Log-Engine (ingest); Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 41

Slide 41 text

The Drain Method: Managing risk
Diagram: Log-Engine and Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 42

Slide 42 text

The Drain Method: Managing risk
Diagram: Log-Engine and Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 43

Slide 43 text

The Drain Method: Managing risk
Diagram: Log-Engine (ingest); Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 44

Slide 44 text

The Drain Method: Managing risk
Diagram: Log-Engine and Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 45

Slide 45 text

The Drain Method: Managing risk
Diagram: Log-Engine and Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 46

Slide 46 text

The Drain Method: Managing risk
Diagram: Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 47

Slide 47 text

The Drain Method: Managing risk
Diagram: Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots); Restore from Snapshot
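"Restore from Snapshot" relies on the cluster snapshots already kept in Amazon S3. Below is a sketch of the restore call, assuming an S3 snapshot repository is already registered on the cluster doing the restore; the repository, snapshot, and index names are hypothetical. `rename_pattern`/`rename_replacement` are optional and only matter if open indices with the same names already exist there. Keep in mind that a snapshot can generally only be restored into a cluster of the same or a newer version than the one that took it.

```python
from typing import Optional, Tuple


def restore_request(
    repository: str,
    snapshot: str,
    indices: str,
    rename_prefix: Optional[str] = None,
) -> Tuple[str, str, dict]:
    """Build (method, path, body) for a snapshot restore; nothing is sent."""
    # Restore only the data indices, not cluster-wide state such as templates.
    body: dict = {"indices": indices, "include_global_state": False}
    if rename_prefix:
        # Restore next to existing indices instead of colliding with them.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = rename_prefix + "$1"
    return "POST", f"_snapshot/{repository}/{snapshot}/_restore", body


print(restore_request("s3-backups", "snap-2023.05.01", "logs-*"))
```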

Slide 48

Slide 48 text

The Drain Method: Managing risk
Diagram: Backup Log-Engine (ingest); Backup cluster; Application: Kibana, Query-Service (search); Amazon S3 (cluster snapshots)

Slide 49

Slide 49 text

Summary
Blue/Green
- Pros: Fully revertable (instantly); Can replace hardware as well
- Cons: Slow upgrade (days/weeks); Complexity grows over time; Double the cluster cost for the duration
In Place
- Pros: Fast (within a few hours); Cheap (0 extra nodes)
- Cons: No rolling back; No hardware change
Drain
- Pros: Fully revertable (within hours); Rather fast (many hours); Can replace hardware as well; Cheaper than Blue/Green
- Cons: Costs more than In Place; Complex upgrade process; Complex rollback

Slide 50

Slide 50 text

Upgrading log-analytics clusters to OpenSearch
Q&A
Amitai Stern (@amitaistern), Software Engineer and Telemetry Storage Team Lead at Logz.io