Upgrading Log-Analytics Clusters to OpenSearch (Amitai Stern, LogzIO) | RTA Summit 2023

Upgrading log-analytics clusters to OpenSearch
Amitai Stern (@amitaistern)
Software Engineer and Telemetry Storage Team Lead at Logz.io

Version timeline (figure, built up over several slides)
Elasticsearch: 6.8, 7.0.0, 7.10, 7.11, 8.7
OpenSearch: 1.0.0, 2.6
Dates marked: January 2021, July 2021
Licensing labels: Apache 2.0-licensed open source; Server Side Public License (SSPL)
Architecture labels: Supports both AMD64 and ARM64 architectures; Supports AMD64 architecture

Log Analytics Cluster Architecture
Diagram: Log-Engine, Application, Kibana, Query-Service, and Other Microservices connect to the cluster (arrows labeled "search" and "ingest"); Amazon S3 holds the cluster snapshots.

Preparing for the upgrade
- Index versions
- Deprecated APIs
- Breaking changes
- Test env
- Cluster Settings thresholds
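
The "Index versions" item can be checked ahead of time with a small script. The sketch below is illustrative only: it assumes the target version can read indices created on or after some major version, and that threshold must be confirmed against the target's own documentation. The helper names and the inventory are hypothetical.

```python
def major_version(version: str) -> int:
    """Return the major component of a dotted version string such as '6.8.23'."""
    return int(version.split(".")[0])


def indices_needing_reindex(created_versions: dict, oldest_readable_major: int) -> list:
    """List indices created before the oldest major version the target can read.

    oldest_readable_major is an assumption supplied by the caller; check it
    against the compatibility notes of the version you are upgrading to.
    """
    return sorted(
        name
        for name, version in created_versions.items()
        if major_version(version) < oldest_readable_major
    )


# Hypothetical inventory: index name -> version the index was created on.
inventory = {"logs-2019": "5.6.16", "logs-2021": "6.8.23", "logs-2023": "7.10.2"}
print(indices_needing_reindex(inventory, oldest_readable_major=6))
```

Indices flagged this way would need to be reindexed or dropped before the upgrade starts.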

Common Upgrading Strategies

Common Upgrading Strategies: Blue/Green
Diagram: two clusters, with arrows labeled "read" and "write".

Common Upgrading Strategies: In-Place
Diagram: one cluster made up of data nodes, coordinator nodes, and cluster manager nodes.

Balancing Risk, Cost, and Speed

The Drain Method
Diagram: data nodes, one labeled 172.22.4.9.

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.exclude._ip": "172.22.4.9"
      }
    }

With a recovery rate limit:

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.exclude._ip": "172.22.4.9",
        "indices.recovery.max_bytes_per_sec": "150mb"
      }
    }

Diagram: data nodes labeled 172.22.4.9 and 172.33.14.1.

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.include._ip": "172.33.14.1,172.22.4.9"
      }
    }

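The exclude request on the drain slides can be generated and its effect checked from a script. This is a minimal sketch, not the speaker's tooling: `drain_settings` and `is_drained` are hypothetical helpers, and the row shape passed to `is_drained` is an assumption modeled on parsed `GET _cat/allocation?format=json` output, where values arrive as strings.

```python
import json
from typing import Optional


def drain_settings(node_ip: str, max_bytes_per_sec: Optional[str] = None) -> dict:
    """Build a PUT _cluster/settings body that excludes one node by IP."""
    persistent = {"cluster.routing.allocation.exclude._ip": node_ip}
    if max_bytes_per_sec is not None:
        # Optional recovery rate limit, e.g. "150mb" as on the slide.
        persistent["indices.recovery.max_bytes_per_sec"] = max_bytes_per_sec
    return {"persistent": persistent}


def is_drained(allocation_rows: list, node_ip: str) -> bool:
    """True when no row for node_ip still reports any shards.

    Assumes each row is a dict with string-valued "ip" and "shards" keys.
    """
    return all(
        int(row.get("shards") or 0) == 0
        for row in allocation_rows
        if row.get("ip") == node_ip
    )


body = drain_settings("172.22.4.9", "150mb")
print(json.dumps(body, indent=2))

# Hypothetical allocation snapshot for two nodes.
sample = [{"ip": "172.22.4.9", "shards": "0"}, {"ip": "172.33.14.1", "shards": "42"}]
print(is_drained(sample, "172.22.4.9"))
```

A node would only be shut down once a check like `is_drained` holds for it.
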
The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes.

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.include._ip": "<Elasticsearch IPs>"
      }
    }

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.include._ip": "<Elasticsearch IPs>",
        "cluster.routing.allocation.exclude._ip": "<OpenSearch IPs>"
      }
    }

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.include._ip": "<OpenSearch IPs>",
        "cluster.routing.allocation.exclude._ip": "<Elasticsearch IPs>"
      }
    }

Recovery rate limits shown on the next two slides, with the settings above unchanged:

    "indices.recovery.max_bytes_per_sec": "300mb"
    "indices.recovery.max_bytes_per_sec": "0mb"

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.include._ip": null,
        "cluster.routing.allocation.exclude._ip": null
      }
    }
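
The sequence of settings bodies on these slides can be written down as data. The sketch below is one reading of the slides rather than the speaker's actual tooling; `phase_settings` is a hypothetical helper and the IP addresses in the usage lines are made up.

```python
import json

INCLUDE = "cluster.routing.allocation.include._ip"
EXCLUDE = "cluster.routing.allocation.exclude._ip"


def phase_settings(elasticsearch_ips: list, opensearch_ips: list) -> list:
    """Return the PUT _cluster/settings bodies for each phase, in slide order."""
    es = ",".join(elasticsearch_ips)
    osr = ",".join(opensearch_ips)
    return [
        # 1. Include only the Elasticsearch data node IPs.
        {"persistent": {INCLUDE: es}},
        # 2. Additionally exclude the OpenSearch data node IPs.
        {"persistent": {INCLUDE: es, EXCLUDE: osr}},
        # 3. Flip the filters: include OpenSearch IPs, exclude Elasticsearch IPs.
        {"persistent": {INCLUDE: osr, EXCLUDE: es}},
        # 4. Reset both filters; None serializes to JSON null.
        {"persistent": {INCLUDE: None, EXCLUDE: None}},
    ]


# Hypothetical addresses, for illustration only.
phases = phase_settings(["10.0.0.1", "10.0.0.2"], ["10.0.1.1"])
for body in phases:
    print(json.dumps(body))
```

Keeping the phases in one list makes it easy to review the whole plan before any request is sent.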

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes, load balancer, DNS record.
- New LB
- OpenSearch Coordinator Nodes
- Override DNS record
- Remove old Coordinating Nodes

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes.
- Add 3 more Cluster manager Nodes
- Remove the old ones one at a time (elected one last)
- Await Cluster Manager Node reelection
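
The "elected one last" rule for removing the old cluster manager nodes can be expressed as a simple ordering. This is a minimal sketch; `removal_order` and the node names are hypothetical, and finding out which node is currently elected is left to the caller.

```python
def removal_order(old_managers: list, elected: str) -> list:
    """Order old cluster manager nodes for removal, with the elected node last."""
    if elected not in old_managers:
        raise ValueError(f"{elected!r} is not one of the old cluster manager nodes")
    # Non-elected nodes keep their original relative order.
    return [node for node in old_managers if node != elected] + [elected]


# Hypothetical node names; "manager-b" is assumed to be the elected node.
print(removal_order(["manager-a", "manager-b", "manager-c"], elected="manager-b"))
```

Removing the elected node last means only one reelection has to be awaited, at the very end.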

The Drain Method: Upgrade Process Overview
Diagram: data nodes, coordinator nodes, cluster manager nodes.
DONE :)

The Drain Method: Managing risk
Diagram (built up over several slides): Log-Engine (ingest) and Application, Kibana, Query-Service (search) connect to the cluster; a Backup Log-Engine (ingest) feeds a Backup cluster; Amazon S3 holds the cluster snapshots; a later slide adds "Restore from Snapshot" next to the Backup cluster.

Summary

Blue/Green
Pros:
- Fully revertible (instantly)
- Can replace hardware as well
Cons:
- Slow upgrade (days/weeks)
- Complexity grows over time
- Double the cluster cost for the duration

In Place
Pros:
- Fast (within a few hours)
- Cheap (0 extra nodes)
Cons:
- No rolling back
- No hardware change

Drain
Pros:
- Fully revertible (within hours)
- Rather fast (many hours)
- Can replace hardware as well
- Cheaper than Blue/Green
Cons:
- Costs more than In Place
- Complex upgrade process
- Complex rollback

Q&A
