Elasticsearch & Marvel

Eko Kurniawan Khannedy

July 13, 2016

Transcript

  1. ELASTICSEARCH & MARVEL: EKO KURNIAWAN KHANNEDY
    ▸ Principal Software Development Engineer at Blibli.
    ▸ Part of the Research and Development Team at Blibli.
    ▸ Codes in Scala & Java, and sometimes in Ruby (this demo uses Ruby).
    ▸ https://www.linkedin.com/in/khannedy
  2. AGENDA
    ▸ What is Elasticsearch?
    ▸ Cluster
    ▸ Shard & Replication
    ▸ Distributed Document Store
    ▸ Distributed Search
    ▸ Marvel
    ▸ DEMO
    ▸ Setup Elasticsearch
    ▸ Scale Horizontally
    ▸ Data In & Data Out
    ▸ Zero Downtime Migration
    ▸ Marvel Monitoring Tool
  3. ELASTICSEARCH
    ▸ Elasticsearch is built to be always available and to scale with your needs. Scale can come from buying bigger servers (vertical scale) or from buying more servers (horizontal scale).
    ▸ Real scalability comes from horizontal scale: the ability to add more nodes to the cluster and to spread load and reliability between them.
    ▸ Elasticsearch is distributed by nature; it knows how to manage multiple nodes to provide scale and high availability. This also means that your application doesn’t need to care about it.
  4. ELASTICSEARCH CLUSTER
    ▸ A node is a running instance of Elasticsearch.
    ▸ A cluster consists of one or more nodes with the same cluster.name that work together to share their data and workload.
    ▸ As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.
  5. CLUSTER HEALTH
    ▸ GREEN : All primary and replica shards are active.
    ▸ YELLOW : All primary shards are active, but not all replica shards are active.
    ▸ RED : Not all primary shards are active.
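
The three health colors can be read as a small decision rule. The sketch below is only an illustration of that rule, not how Elasticsearch computes it internally; the `Shard` class and `cluster_health` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    primary: bool  # True for a primary shard, False for a replica shard
    active: bool   # True when the shard is allocated and serving requests


def cluster_health(shards):
    """Return 'green', 'yellow' or 'red' following the three rules on the slide."""
    primaries_active = all(s.active for s in shards if s.primary)
    replicas_active = all(s.active for s in shards if not s.primary)
    if not primaries_active:
        return "red"     # at least one primary shard is missing
    if not replicas_active:
        return "yellow"  # all primaries are up, some replicas are not
    return "green"       # every primary and replica shard is active
```

A single-node cluster with one replica configured is a typical yellow case: the primaries are active, but the replicas have no second node to be allocated to.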
  6. ELASTICSEARCH SHARD & REPLICA
    ▸ By default Elasticsearch will give you 5 primary shards and 1 replica per index.
    ▸ You can change the number of replicas at runtime, without downtime.
    ▸ But we cannot change the number of primary shards of an existing index. If we want a different number of shards, we need to create a new index and migrate the data from the old index to the new one.
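
As a rough sketch of the difference between the two settings, the functions below only build the HTTP requests that would be sent to a cluster; nothing is actually sent. The function names and the index name `blog` are made up for this example, and the request shapes assume the standard create-index and update-settings endpoints.

```python
import json


def create_index_request(index, shards=5, replicas=1):
    """Shard count can only be chosen here, when the index is created."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(body)


def update_replicas_request(index, replicas):
    """Replica count is a dynamic setting and can be changed on a live index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


if __name__ == "__main__":
    # "blog" is a hypothetical index name used only for illustration.
    print(create_index_request("blog"))
    print(update_replicas_request("blog", 2))
```

There is deliberately no counterpart for changing `number_of_shards` on an existing index; that case is handled by creating a new index and reindexing into it.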
  7. ROUTING DOCUMENT TO A SHARD
    shard = hash(routing) % number_of_primary_shards
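
The formula can be tried out with a few lines of code. This is a minimal sketch, assuming the routing value is the document ID (the default) and using `zlib.crc32` purely as a stand-in hash; Elasticsearch uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards"""
    # zlib.crc32 is a stand-in chosen for this sketch, not Elasticsearch's hash.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    doc_ids = [f"doc-{i}" for i in range(100)]  # hypothetical document IDs
    # Count how many documents would land on a different shard
    # if the index had 6 primary shards instead of 5.
    moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
    print(f"{moved} of {len(doc_ids)} documents would be looked up on the wrong shard")
```

Because the shard count is the divisor, changing it would send lookups for existing documents to shards that do not hold them, which is why the number of primary shards is fixed once an index is created.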
  8. IMMUTABILITY
    ▸ The inverted index that is written to disk is immutable: it doesn’t change. Ever. This immutability has important benefits.
    ▸ There is no need for locking. If you never have to update the index, you never have to worry about multiple processes trying to make changes at the same time.
  9. DELETE AND UPDATES
    ▸ Because segments are immutable, a document cannot be removed from an existing segment, nor can it be updated in place to a newer version.
    ▸ Every commit point includes a .del file that lists which documents have been deleted.
    ▸ When a document is deleted, it is actually only marked as deleted in the .del file.
    ▸ Document updates work in a similar way: when a document is updated, the old version of the document is marked as deleted, and the new version of the document is indexed in a new segment.
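
The mark-as-deleted mechanism can be modeled in a few lines. The toy class below is an assumption-laden illustration, not Lucene's actual data structures: `TinyIndex` and its methods are hypothetical, each write creates a one-document segment, and a plain set plays the role of the .del file.

```python
class TinyIndex:
    """Toy model: segments are never modified; deletions are recorded separately."""

    def __init__(self):
        self.segments = []    # each segment is an immutable tuple of (doc_id, body) pairs
        self.deleted = set()  # plays the role of the .del file: (segment_no, position)

    def _mark_deleted(self, doc_id):
        for seg_no, segment in enumerate(self.segments):
            for pos, (existing_id, _) in enumerate(segment):
                if existing_id == doc_id:
                    self.deleted.add((seg_no, pos))

    def index(self, doc_id, body):
        # An update is "mark the old version as deleted, then write a new segment".
        self._mark_deleted(doc_id)
        self.segments.append(((doc_id, body),))

    def delete(self, doc_id):
        # The document stays in its segment; only the deletion marker is added.
        self._mark_deleted(doc_id)

    def live_docs(self):
        # Readers skip every entry that is listed as deleted.
        return {
            doc_id: body
            for seg_no, segment in enumerate(self.segments)
            for pos, (doc_id, body) in enumerate(segment)
            if (seg_no, pos) not in self.deleted
        }
```

Indexing the same ID twice leaves two segments behind, with only the newer version visible through `live_docs()`; the old version still occupies space until segments are merged.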
  10. DISTRIBUTED SEARCH
    ▸ Search requires a more complicated execution model because we don’t know which documents will match the query: they could be on any shard in the cluster.
    ▸ Finding all matching documents is only half the story. Results from multiple shards must be combined into a single sorted list before the results are returned.
    ▸ For this reason, search is executed in a two-phase process called “query then fetch”.
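
A simplified simulation of the two phases is sketched below. It assumes each shard is a dictionary mapping a document ID to a precomputed `(score, body)` pair; the function names, the data layout and the scores are all invented for illustration and do not mirror Elasticsearch internals.

```python
import heapq


def query_phase(shard_docs, from_, size):
    """A shard returns only (score, doc_id) for its own top from_ + size hits."""
    scored = ((score, doc_id) for doc_id, (score, _body) in shard_docs.items())
    return heapq.nlargest(from_ + size, scored)


def search(shards, from_=0, size=10):
    # Query phase: gather lightweight (score, doc_id, shard_no) entries from every shard.
    candidates = []
    for shard_no, shard_docs in enumerate(shards):
        for score, doc_id in query_phase(shard_docs, from_, size):
            candidates.append((score, doc_id, shard_no))
    # The coordinating node merges them into one globally sorted list and picks the page.
    candidates.sort(reverse=True)
    page = candidates[from_:from_ + size]
    # Fetch phase: only now are the full documents requested from the owning shards.
    return [shards[shard_no][doc_id][1] for _score, doc_id, shard_no in page]
```

The point of splitting the work is that the query phase moves only IDs and scores across the network, while full document bodies are fetched for the final page alone.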
  11. DEEP PAGINATION
    ▸ Remember that each shard must build a priority queue of length from + size, all of which needs to be passed back to the coordinating node. The coordinating node then needs to sort through number_of_shards * (from + size) documents in order to find the correct size documents.
    ▸ With big enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory and bandwidth.
    ▸ For this reason, we strongly advise against deep paging.
    ▸ As an alternative, we can use the Scan & Scroll API for deep pagination.
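
To get a feel for the numbers, the cost at the coordinating node can be computed directly from the expression number_of_shards * (from + size). The helper below is a hypothetical function written for this example; it assumes the default of 5 primary shards mentioned earlier in the deck.

```python
def coordinating_node_load(number_of_shards, from_, size):
    """Number of entries the coordinating node has to sort for one page."""
    per_shard = from_ + size             # length of each shard's priority queue
    return number_of_shards * per_shard  # everything is sent to the coordinating node


if __name__ == "__main__":
    print(coordinating_node_load(5, 0, 10))      # first page: 50 entries
    print(coordinating_node_load(5, 10000, 10))  # page starting at 10000: 50050 entries
```

Both requests return only 10 documents, yet the deep page forces the coordinating node to sort about a thousand times more entries and then discard almost all of them.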
  12. WHAT IS MARVEL?
    ▸ Marvel is an Elasticsearch monitoring tool.
    ▸ Marvel can monitor all nodes in an Elasticsearch cluster.
    ▸ Marvel runs on top of Kibana.