Elasticsearch & Marvel

ELASTICSEARCH & MARVEL EKO KURNIAWAN KHANNEDY

EKO KURNIAWAN KHANNEDY
▸ Principal Software Development Engineer at Blibli.
▸ Part of the Research and Development Team at Blibli.
▸ Codes in Scala and Java, and sometimes Ruby (this demo uses Ruby).
▸ https://www.linkedin.com/in/khannedy

AGENDA
▸ What is Elasticsearch?
▸ Cluster
▸ Shard & Replication
▸ Distributed Document Store
▸ Distributed Search
▸ Marvel
▸ DEMO
▸ Setup Elasticsearch
▸ Scale Horizontally
▸ Data In & Data Out
▸ Zero Downtime Migration
▸ Marvel Monitoring Tool

WHAT IS ELASTICSEARCH?

YOU KNOW, FOR SEARCH

CLUSTER

ELASTICSEARCH
▸ Elasticsearch is built to be always available and to scale with your needs. Scale can come from buying bigger servers (vertical scale) or from buying more servers (horizontal scale).
▸ Real scalability comes from horizontal scale: the ability to add more nodes to the cluster and to spread load and reliability between them.
▸ Elasticsearch is distributed by nature; it knows how to manage multiple nodes to provide scale and high availability. This also means that your application doesn’t need to care about it.

ELASTICSEARCH CLUSTER
▸ A node is a running instance of Elasticsearch.
▸ A cluster consists of one or more nodes with the same cluster.name that work together to share their data and workload.
▸ As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.

CLUSTER HEALTH
▸ GREEN: All primary and replica shards are active.
▸ YELLOW: All primary shards are active, but not all replica shards are active.
▸ RED: Not all primary shards are active.
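
The three health colors can be expressed as a small decision function. This is only a sketch of the rule stated on the slide, not how Elasticsearch computes it internally; the shard representation (a dict with "primary" and "active" keys) is a hypothetical shape chosen for illustration.

```python
def cluster_health(shards):
    """Return 'green', 'yellow', or 'red' for a list of shard copies.

    Each shard copy is a dict like {"primary": True, "active": True}.
    This data shape is hypothetical, used only for illustration.
    """
    primaries = [s for s in shards if s["primary"]]
    replicas = [s for s in shards if not s["primary"]]

    if not all(s["active"] for s in primaries):
        return "red"      # at least one primary shard is not active
    if not all(s["active"] for s in replicas):
        return "yellow"   # all primaries are active, some replica is not
    return "green"        # every primary and every replica is active


# One primary that is active and one replica that has not been assigned yet.
single_node = [
    {"primary": True, "active": True},
    {"primary": False, "active": False},
]
print(cluster_health(single_node))  # yellow
```

A cluster with a single node and one replica per shard typically sits in this yellow state, because the replica has no other node to live on.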

AN EMPTY CLUSTER

ADD AN INDEX

A TWO-NODE CLUSTER

SCALE HORIZONTALLY

CLUSTER AFTER KILLING ONE NODE

SHARD & REPLICA

ELASTICSEARCH SHARD & REPLICA
▸ By default, Elasticsearch gives you 5 primary shards and 1 replica per index (this is the default in versions before 7.0; later versions default to 1 primary shard).
▸ You can change the number of replicas at runtime, without downtime.
▸ But you cannot change the number of primary shards of an existing index. To change it, create a new index and migrate the data from the old index to the new one.
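
As a rough sketch of the runtime replica change, the snippet below builds (but does not send) the HTTP request for the index settings endpoint using only the Python standard library. The node address http://localhost:9200 and the index name "blogs" are assumptions for illustration, and the helper name build_replica_update is hypothetical.

```python
import json
import urllib.request


def build_replica_update(base_url, index, replicas):
    """Build a PUT request that sets number_of_replicas on an existing index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local node and a hypothetical index called "blogs".
request = build_replica_update("http://localhost:9200", "blogs", 2)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To apply the change against a running node, send the request:
# urllib.request.urlopen(request)
```

The number of primary shards has no equivalent runtime setting, which is why the slide describes creating a new index and migrating the data instead.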

3 SHARDS WITH NO REPLICA

3 SHARDS WITH 1 REPLICA

3 SHARDS WITH 2 REPLICAS
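
The three layouts above differ only in the replica count, so the arithmetic behind them is easy to sketch. The function below is a hypothetical helper: it counts total shard copies, and it derives the minimum node count for a green cluster from the fact that Elasticsearch does not place two copies of the same shard on one node.

```python
def shard_layout(primaries, replicas):
    """Return shard-copy totals for an index with the given settings."""
    copies_per_shard = 1 + replicas  # the primary plus its replicas
    return {
        "total_shard_copies": primaries * copies_per_shard,
        # Copies of the same shard must live on different nodes,
        # so this many nodes are needed before every copy can be assigned.
        "min_nodes_for_green": copies_per_shard,
    }


for replicas in (0, 1, 2):
    print(replicas, shard_layout(3, replicas))
# 0 {'total_shard_copies': 3, 'min_nodes_for_green': 1}
# 1 {'total_shard_copies': 6, 'min_nodes_for_green': 2}
# 2 {'total_shard_copies': 9, 'min_nodes_for_green': 3}
```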

DISTRIBUTED DOCUMENT STORE

ROUTING A DOCUMENT TO A SHARD
shard = hash(routing) % number_of_primary_shards
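
The routing formula can be tried out in a few lines. This is a minimal sketch: zlib.crc32 is used here only as a stand-in for a stable hash and is not the hash function Elasticsearch uses, and the document IDs are made up.

```python
import zlib


def route_to_shard(routing, number_of_primary_shards):
    """Apply shard = hash(routing) % number_of_primary_shards."""
    # crc32 is an assumption standing in for Elasticsearch's own hash;
    # any stable hash illustrates the same idea.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = ["user-1", "user-2", "user-3", "user-4"]  # hypothetical routing values

with_3_shards = {doc_id: route_to_shard(doc_id, 3) for doc_id in doc_ids}
with_5_shards = {doc_id: route_to_shard(doc_id, 5) for doc_id in doc_ids}

print(with_3_shards)
print(with_5_shards)
```

Because the shard number depends on number_of_primary_shards, changing that value would send the same routing value to a different shard, and existing documents could no longer be found where they were stored. This is the reason the primary shard count is fixed when the index is created.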

CREATING, INDEXING AND DELETING A DOCUMENT

RETRIEVING A DOCUMENT

UPDATING A DOCUMENT

IMMUTABILITY
▸ The inverted index that is written to disk is immutable: it doesn’t change. Ever. This immutability has important benefits.
▸ There is no need for locking. If you never have to update the index, you never have to worry about multiple processes trying to make changes at the same time.

DELETES AND UPDATES
▸ Because segments are immutable, a document cannot be removed from an existing segment, nor can it be updated in place to a newer version.
▸ Every commit point includes a .del file that lists which documents have been deleted.
▸ When a document is deleted, it is actually only marked as deleted in the .del file.
▸ Document updates work in a similar way: when a document is updated, the old version of the document is marked as deleted, and the new version of the document is indexed in a new segment.
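
The mark-as-deleted behavior can be modeled with a toy in-memory structure. This is a simplified sketch, not Lucene's real segment or .del file format; the class ToyIndex and all of its methods are hypothetical.

```python
class ToyIndex:
    """Toy model: immutable segments plus a set of deletion marks."""

    def __init__(self):
        self.segments = []    # each segment is a tuple of (doc_id, source) pairs
        self.deleted = set()  # (segment_number, position) marks, playing the role of .del

    def _mark_deleted(self, doc_id):
        # Existing segments are never rewritten; we only record a mark.
        for seg_no, segment in enumerate(self.segments):
            for pos, (existing_id, _) in enumerate(segment):
                if existing_id == doc_id:
                    self.deleted.add((seg_no, pos))

    def index(self, doc_id, source):
        # An update marks the old version as deleted and writes
        # the new version into a new segment.
        self._mark_deleted(doc_id)
        self.segments.append(((doc_id, source),))

    def delete(self, doc_id):
        self._mark_deleted(doc_id)

    def get(self, doc_id):
        # Return the only version that is not marked as deleted, if any.
        for seg_no, segment in enumerate(self.segments):
            for pos, (existing_id, source) in enumerate(segment):
                if existing_id == doc_id and (seg_no, pos) not in self.deleted:
                    return source
        return None


idx = ToyIndex()
idx.index("1", {"title": "v1"})
idx.index("1", {"title": "v2"})  # update: old version marked, new segment written
print(idx.get("1"))              # {'title': 'v2'}
print(len(idx.segments))         # 2
idx.delete("1")
print(idx.get("1"))              # None
print(len(idx.segments))         # 2
```

After the delete, both segments still exist and still contain their documents; only the deletion marks changed.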

DISTRIBUTED SEARCH

DISTRIBUTED SEARCH
▸ Search requires a more complicated execution model because we don’t know which documents will match the query: they could be on any shard in the cluster.
▸ Finding all matching documents is only half the story. Results from multiple shards must be combined into a single sorted list before the results are returned.
▸ For this reason, search is executed in a two-phase process called “query then fetch”.
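
The two phases can be simulated in plain Python. This is a simplified sketch under stated assumptions: each shard is modeled as a dict of doc_id to score, the scores and documents are made up, and the function names are hypothetical rather than part of any Elasticsearch API.

```python
import heapq


def query_phase(shard_docs, from_, size):
    """Each shard returns only (score, doc_id) pairs for its top from + size hits."""
    scored = ((score, doc_id) for doc_id, score in shard_docs.items())
    return heapq.nlargest(from_ + size, scored)


def search(shards, from_, size, fetch_document):
    # Query phase: the coordinating node collects the local top hits of every shard.
    candidates = []
    for shard_docs in shards:
        candidates.extend(query_phase(shard_docs, from_, size))

    # Merge into one globally sorted list and cut out the requested page.
    candidates.sort(reverse=True)
    page = candidates[from_:from_ + size]

    # Fetch phase: only the documents on the final page are retrieved in full.
    return [fetch_document(doc_id) for _, doc_id in page]


# Three hypothetical shards with made-up relevance scores.
shards = [
    {"a": 0.9, "b": 0.2},
    {"c": 0.7, "d": 0.5},
    {"e": 0.8},
]
store = {doc_id: {"id": doc_id} for shard in shards for doc_id in shard}

print(search(shards, 0, 2, store.get))  # [{'id': 'a'}, {'id': 'e'}]
print(search(shards, 2, 2, store.get))  # [{'id': 'c'}, {'id': 'd'}]
```

Note that the query phase moves only scores and IDs between nodes; full documents are loaded for the final page alone.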

QUERY PHASE

FETCH PHASE

DEEP PAGINATION
▸ Remember that each shard must build a priority queue of length from + size, all of which needs to be passed back to the coordinating node. The coordinating node then needs to sort through number_of_shards * (from + size) documents in order to find the correct size documents.
▸ With big enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth.
▸ For this reason, we strongly advise against deep paging.
▸ As an alternative, we can use the Scan & Scroll API for deep pagination.
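
A quick calculation shows how fast the coordinating node's workload grows, since it has to sort number_of_shards * (from + size) entries. The shard count of 5 and the page size of 10 are assumptions chosen for illustration, and the function name is hypothetical.

```python
def docs_sorted_by_coordinator(number_of_shards, from_, size):
    """Every shard sends from + size entries to the coordinating node."""
    return number_of_shards * (from_ + size)


SHARDS = 5       # assumed number of primary shards
PAGE_SIZE = 10   # assumed results per page

for page in (1, 1000, 10000):
    from_ = (page - 1) * PAGE_SIZE
    print(page, docs_sorted_by_coordinator(SHARDS, from_, PAGE_SIZE))
# 1 50
# 1000 50000
# 10000 500000
```

In every case only 10 of the sorted entries are returned to the user; the rest are discarded.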

MARVEL

WHAT IS MARVEL?
▸ Marvel is a monitoring tool for Elasticsearch.
▸ Marvel can monitor all nodes in an Elasticsearch cluster.
▸ Marvel runs on top of Kibana.

CLUSTER METRICS

INDEX METRICS

NODE METRICS

THANKS

