
Elasticsearch


A 10-minute talk about how Elasticsearch works. It explains the master node's responsibilities and how things work inside a shard. It gives good insight into how the refresh, flush and optimize operations affect Elasticsearch performance. It also explains how indexing and search work in this distributed database.

Christophe Marchal

October 20, 2016


Transcript

  1. Master node responsibilities • Shard allocation • Acting on nodes joining and leaving • Broadcasting the cluster state to every node
  2. Shard allocation is based on: • Balancing shards across all data nodes • Disk availability on data nodes • Fault tolerance • Physical configuration (VM, rack, availability zone) • Filters for node decommissioning
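As a rough illustration of two of these criteria, balancing and fault tolerance, here is a toy allocator. It is a sketch under simplifying assumptions, not Elasticsearch's allocator: the function `allocate`, the copy labels such as `R1.1`, and the "least-loaded node wins" rule are all hypothetical, and disk, awareness and filter deciders are ignored. The one rule it shares with Elasticsearch is that two copies of the same shard never land on the same node.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator (hypothetical): place every shard copy on the
    least-loaded node that does not already hold a copy of that shard."""
    load = {node: 0 for node in nodes}   # shard copies per node
    assignment = {}                      # copy label -> node
    unassigned = []                      # copies with no legal node
    for shard in range(1, num_primaries + 1):
        holders = set()                  # nodes already holding this shard
        for copy in range(num_replicas + 1):
            label = f"P{shard}" if copy == 0 else f"R{shard}.{copy}"
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                # Fault tolerance: never put two copies of one shard on one node
                unassigned.append(label)
                continue
            # Balancing: prefer the node with the fewest copies (name breaks ties)
            node = min(candidates, key=lambda n: (load[n], n))
            assignment[label] = node
            holders.add(node)
            load[node] += 1
    return assignment, unassigned


assignment, unassigned = allocate(3, 1, ["node-a", "node-b", "node-c"])
print(assignment)
print(unassigned)
```

With a single node, every replica stays unassigned, which matches the familiar "yellow" state of a one-node cluster with replicas configured.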
  3. Zen Discovery 1. Uses unicast 2. Tries to contact the hosts listed in the configuration 3. Retrieves the cluster state 4. Sends a join request to the master
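The discovery steps can be sketched as a function over a simulated network. In Elasticsearch of this era the configured hosts come from the `discovery.zen.ping.unicast.hosts` setting; everything else below is an assumption for illustration: `PingResponse`, `discover`, the fake `network` dictionary, and the rule "prefer the answer with the newest cluster state" are a simplification, not Zen's exact protocol.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class PingResponse:
    node: str                    # node that answered the ping
    master: Optional[str]        # master it currently knows about, if any
    cluster_state_version: int   # version of the cluster state it holds


def discover(unicast_hosts: Iterable[str],
             ping: Callable[[str], Optional[PingResponse]]) -> Optional[Tuple[str, str]]:
    # Step 2: try every configured host; unreachable hosts return None
    responses = [r for r in (ping(h) for h in unicast_hosts) if r is not None]
    # Step 3: keep answers that know a master, preferring the newest cluster state
    known = [r for r in responses if r.master is not None]
    if not known:
        return None  # no master found: a master election would be needed instead
    best = max(known, key=lambda r: r.cluster_state_version)
    # Step 4: send a join request to that master
    return ("join", best.master)


# Hypothetical network: one node knows the master, one does not, one is down.
network = {
    "10.0.0.1:9300": PingResponse("node-1", "node-1", 7),
    "10.0.0.2:9300": PingResponse("node-2", None, 3),
}
print(discover(["10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"], network.get))
```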
  4. Master Election • Any non-client node can be configured to be master-eligible • A minimum number of master-eligible nodes is required before an election, to prevent two masters (split brain) • By default, only master-eligible and data nodes participate
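The "minimum master-eligible nodes" rule is a quorum. The Elasticsearch documentation of this era recommends setting `discovery.zen.minimum_master_nodes` to (N / 2) + 1 for N master-eligible nodes; the helper names below are hypothetical. The loop checks the key property: after any two-way network split, at most one side can reach the quorum, so two masters cannot be elected at once.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect(partition_size: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it sees a quorum."""
    return partition_size >= minimum_master_nodes(master_eligible)


# Split 3 master-eligible nodes into partitions of 2 and 1.
print(minimum_master_nodes(3), can_elect(2, 3), can_elect(1, 3))  # → 2 True False

# For any cluster size and any two-way split, at most one side reaches quorum.
for n in range(1, 10):
    for a in range(n + 1):
        assert not (can_elect(a, n) and can_elect(n - a, n))
```

Two sides of size a and n - a can both reach the quorum only if n ≥ 2 · (⌊n/2⌋ + 1), which is always greater than n.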
  5. Failure detection • The master pings all nodes periodically • All nodes ping the master periodically • Only the master can update the cluster state
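The master's side of failure detection can be sketched as a counter of consecutive missed pings. The classes `FaultDetector` and `Master` are hypothetical, and `retries=3` is an illustrative choice loosely modeled on Zen's ping-retry setting; real timeouts and intervals are left out. The sketch also reflects the last bullet: only the master edits the cluster state and bumps its version before broadcasting it.

```python
class FaultDetector:
    """Marks a node as failed after `retries` consecutive missed pings."""

    def __init__(self, retries: int = 3):
        self.retries = retries
        self.missed = {}  # node -> consecutive missed pings

    def record_ping(self, node: str, ok: bool) -> bool:
        if ok:
            self.missed[node] = 0  # any successful ping resets the counter
            return False
        self.missed[node] = self.missed.get(node, 0) + 1
        return self.missed[node] >= self.retries


class Master:
    """Only the master changes the cluster state, then broadcasts it."""

    def __init__(self, nodes, retries: int = 3):
        self.cluster_state = {"version": 1, "nodes": set(nodes)}
        self.detector = FaultDetector(retries)

    def on_ping_result(self, node: str, ok: bool) -> None:
        failed = self.detector.record_ping(node, ok)
        if failed and node in self.cluster_state["nodes"]:
            self.cluster_state["nodes"].discard(node)
            self.cluster_state["version"] += 1  # new version to broadcast to every node


master = Master(["node-a", "node-b"])
for _ in range(3):
    master.on_ping_result("node-b", ok=False)
print(sorted(master.cluster_state["nodes"]), master.cluster_state["version"])  # → ['node-a'] 2
```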
  6. Distributed (diagram: primary shards P1, P2, P3 and replica shards R1, R2, R3 of Index 1 and Index 2 spread across the nodes)
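How does a document end up in one of these primary shards? Elasticsearch documents the routing formula `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. The sketch below follows that formula, but `zlib.crc32` is only a stand-in: Elasticsearch uses its own hash (Murmur3 in recent versions), so the shard numbers printed here will not match a real cluster. `route` is a hypothetical helper.

```python
import zlib
from typing import Optional


def route(doc_id: str, number_of_primary_shards: int, routing: Optional[str] = None) -> int:
    """shard = hash(_routing) % number_of_primary_shards; _routing defaults to the ID.
    zlib.crc32 is a stand-in for the hash Elasticsearch really uses."""
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % number_of_primary_shards


for doc_id in ["doc-1", "doc-2", "doc-3", "doc-4"]:
    print(doc_id, "-> shard", route(doc_id, 3))

# Documents sharing a custom routing value always land on the same shard.
assert route("doc-1", 3, routing="user-42") == route("doc-9", 3, routing="user-42")
```

Because the modulus is the number of primaries, changing that number would send existing documents to different shards, one reason it cannot simply be changed after the index is created.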
  7. Inside a segment: inverted index (diagram: a table mapping each term to the documents, Doc 1 to Doc 4, that contain it; brown and fox each appear in three documents, quick in two)
  8. Inside a segment: inverted index (same table with Doc 5 added, so fox now appears in four documents; labels: write, READ)
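A minimal sketch of building and querying an inverted index like the one in these tables. The documents are invented for illustration, and the analyzer (lowercasing plus a `\w+` tokenizer) is a crude stand-in for Elasticsearch's analyzers; `build_inverted_index` and `search_all` are hypothetical names.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):  # toy analyzer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, *terms):
    """Documents containing every term: intersect the posting lists."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: "The quick brown fox", 2: "The brown dog", 3: "A quick fox"}
index = build_inverted_index(docs)
print(index["brown"], index["fox"], index["quick"])  # → [1, 2] [1, 3] [1, 3]
print(search_all(index, "quick", "brown"))           # → [1]
```

A search only has to look up each term once instead of scanning every document, which is why the index is built per segment at write time.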
  9. Immutable Segment • A segment holds a partial inverted index • Immutability makes it cacheable • Filters on top of a segment become immutable and cacheable too
  10. Immutable Segment (same points as slide 9) • How do I update my index?
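One answer to the slide's question, sketched under assumptions: since a segment can never be edited, Lucene records deletions separately from the segment and writes the new version of an updated document into a new segment; merges later drop the deleted copies. The classes `Segment` and `Index` and the `deleted` set of `(segment_no, doc_id)` pairs are hypothetical stand-ins for Lucene's per-segment deletion records; `MappingProxyType` is used only to make the segment read-only in Python.

```python
from types import MappingProxyType


class Segment:
    """Immutable once written: documents can never be changed in place."""

    def __init__(self, docs):
        self.docs = MappingProxyType(dict(docs))  # read-only view


class Index:
    def __init__(self):
        self.segments = []
        self.deleted = set()  # (segment_no, doc_id) pairs marked as deleted

    def add_segment(self, docs):
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        # Deleting only marks the document; the segment itself is untouched
        for seg_no, seg in enumerate(self.segments):
            if doc_id in seg.docs:
                self.deleted.add((seg_no, doc_id))

    def update(self, doc_id, new_doc):
        self.delete(doc_id)                  # 1. mark the old version deleted
        self.add_segment({doc_id: new_doc})  # 2. write the new version to a new segment

    def get(self, doc_id):
        # Newest segment first; skip versions marked as deleted
        for seg_no in range(len(self.segments) - 1, -1, -1):
            seg = self.segments[seg_no]
            if doc_id in seg.docs and (seg_no, doc_id) not in self.deleted:
                return seg.docs[doc_id]
        return None


idx = Index()
idx.add_segment({"1": "quick brown fox"})
idx.update("1", "quick red fox")
print(idx.get("1"))                 # → quick red fox
print(idx.segments[0].docs["1"])    # → quick brown fox  (old segment unchanged)
```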
  11. Indexing a document (diagram: segments S1, S2, S3 on disk and in memory, a commit point, the in-memory buffer, and the translog)
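The first step of indexing can be modeled with a toy shard. This is a sketch, not Elasticsearch's engine: the `Shard` class and its fields are hypothetical. A new document is added to the in-memory buffer and recorded in the translog, but searches only read segments, so the document is not searchable yet.

```python
class Shard:
    """Toy shard: new documents go to an in-memory buffer and the translog."""

    def __init__(self):
        self.segments = []   # searchable, immutable segments (tuples of docs)
        self.buffer = []     # in-memory indexing buffer, not searchable yet
        self.translog = []   # append-only log of operations, used for recovery

    def index(self, doc: str) -> None:
        self.buffer.append(doc)               # add to the in-memory buffer
        self.translog.append(("index", doc))  # and record the operation in the translog

    def search(self, term: str) -> list:
        # Only segments are searched; the buffer is invisible to queries
        return [doc for seg in self.segments for doc in seg if term in doc.split()]


shard = Shard()
shard.index("quick brown fox")
print(shard.search("fox"), len(shard.buffer), len(shard.translog))  # → [] 1 1
```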
  12. Refresh (every second by default) (diagram: the in-memory buffer becomes a new segment S4 in memory; S1, S2, S3 and the commit point stay on disk; the translog is kept)
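A refresh can be sketched as turning the buffer into a new immutable segment that searches can see, without any fsync and without touching the translog. In Elasticsearch the interval is controlled by the `index.refresh_interval` setting (1s by default); the `Shard` class below is a hypothetical, self-contained toy.

```python
class Shard:
    def __init__(self):
        self.segments = []   # searchable segments (tuples), in the filesystem cache
        self.buffer = []     # in-memory indexing buffer
        self.translog = []   # operation log, untouched by refresh

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        """Turn the buffer into a new immutable segment; no fsync, translog kept."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, term):
        return [doc for seg in self.segments for doc in seg if term in doc.split()]


shard = Shard()
shard.index("quick brown fox")
print(shard.search("fox"))   # → []
shard.refresh()              # Elasticsearch does this every second by default
print(shard.search("fox"))   # → ['quick brown fox']
print(len(shard.translog))   # → 1
```

This is why Elasticsearch search is described as near real-time: a document becomes visible after the next refresh, not at the moment it is indexed.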
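  13. Flush (diagram: S4 is fsynced to disk next to S1, S2, S3 and a new commit point is written; the 5-second default is the translog fsync interval, while a flush is triggered once the translog reaches a size threshold)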
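The sketch below strings the three steps together to show what a flush buys: after a crash, everything not fsynced is lost, and the translog is what lets the shard rebuild it. It assumes the translog itself is durable and models "fsync" as moving a segment to the `on_disk` list; the class, field names and segment names are hypothetical.

```python
class Shard:
    def __init__(self):
        self.buffer = []        # in-memory indexing buffer
        self.cached = []        # refreshed segments: searchable, not yet fsynced
        self.on_disk = []       # fsynced segments
        self.commit_point = []  # segment names recorded by the last commit
        self.translog = []      # assumed durable (fsynced itself)
        self._next_id = 1

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        if self.buffer:
            self.cached.append((f"S{self._next_id}", tuple(self.buffer)))
            self._next_id += 1
            self.buffer = []

    def flush(self):
        self.refresh()                    # 1. empty the buffer into a segment
        self.on_disk.extend(self.cached)  # 2. "fsync" cached segments to disk
        self.cached = []
        self.commit_point = [name for name, _ in self.on_disk]  # 3. new commit point
        self.translog = []                # 4. committed ops no longer need replaying

    def crash_and_recover(self):
        """Lose everything not fsynced, then replay the translog."""
        ops = list(self.translog)
        self.buffer, self.cached, self.translog = [], [], []
        for doc in ops:
            self.index(doc)
        self.refresh()

    def all_docs(self):
        return [d for _, seg in self.on_disk + self.cached for d in seg]


shard = Shard()
shard.index("doc-1")
shard.index("doc-2")
shard.flush()
print(shard.commit_point, shard.translog)  # → ['S1'] []
shard.index("doc-3")
shard.refresh()                            # doc-3 is searchable but not fsynced
shard.crash_and_recover()
print(shard.all_docs())                    # → ['doc-1', 'doc-2', 'doc-3']
```

Flushing is expensive (fsync plus a new commit point), which is why it runs far less often than refresh; the translog covers the gap in between.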
  14. Search: query phase (diagram: the query is sent to one copy, primary or replica, of every shard, P1/R1 and P2/R2; each shard returns document IDs and scores)
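The query phase is a scatter/gather: the coordinating node sends the query to one copy of every shard, each copy returns its local top hits as IDs and scores, and the coordinator merges them into a global top list. The full documents are retrieved afterwards in a separate fetch phase. In this sketch the scoring is plain term frequency, a stand-in for Lucene's relevance scoring, and `query_shard`, `query_phase` and the sample data are hypothetical.

```python
import heapq
import random


def query_shard(docs, term, size):
    """Shard-local query phase: score matching docs, keep the top `size`.
    Score = term frequency, a stand-in for Lucene's relevance scoring."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in docs.items()]
    return heapq.nlargest(size, [h for h in hits if h[0] > 0])


def query_phase(shards, term, size):
    """Coordinating node: ask one copy of every shard, merge the partial results."""
    partial = []
    for copies in shards.values():
        copy = random.choice(copies)  # primary or any replica: same data
        partial.extend(query_shard(copy, term, size))
    top = heapq.nlargest(size, partial)  # global top `size` by score
    return [(doc_id, score) for score, doc_id in top]  # IDs and scores only


shard_1 = {"a": "fox fox", "b": "lazy dog"}
shard_2 = {"c": "fox", "d": "fox fox fox"}
cluster = {1: [shard_1, shard_1], 2: [shard_2, shard_2]}  # [primary, replica] per shard
print(query_phase(cluster, "fox", 2))  # → [('d', 3), ('a', 2)]
```

Each shard returns at most `size` hits, so the coordinator merges at most `size × number_of_shards` entries regardless of how many documents match.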