Elastic Co
February 01, 2018

Elastic{ON} Tour 2018 Munich : Elasticsearch Deep Dive

Elastic{ON} Tour Munich - February 1, 2018

Cross-cluster search, ingest node, rollover API, shrink API, field collapsing, unified highlighter . . . there's lots to love in Elasticsearch these days. Get up to speed on all things 5.x and see how 6.x will address pain points around scale, upgrading, recovery, and sparse data and disk usage.

Simon Willnauer | Co-Founder and Elasticsearch Tech Lead | Elastic

Transcript

  1. Elasticsearch 5.x Still ^

    • Keyword normalization
    • Unified highlighter
    • Field collapse
    • Multi-word synonyms+proximity
    • Cancellable searches
    • Parallel scroll & reindex
  2. Elasticsearch 5.x Still ^

    • Numeric & date range fields
    • Automatic optimizations for range searches
    • Massive aggregations with partitioning
    • Faster geo-distance sorting
    • Faster geo-ip lookups

    and for logs and for numbers and for geo and ... ^
  3. Ever-increasing Scale

    • More clusters / resource isolation
    • Easier to manage
    • Easier to upgrade
    • Reduce potential outages
    • Need to query across clusters
  4. Tribe Node

    [Diagram: Cluster UK (master nodes, three data nodes); Cluster US (master nodes, three data nodes); a tribe node with node clients t1 and t2 and a merged cluster state; Kibana]
  5. Cross-Cluster Search

    [Diagram: Cluster UK (master nodes, three data nodes, a master/data node); Cluster US (master nodes, three data nodes, a master/data node); Kibana; optional dedicated cross-cluster search cluster]
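
As an illustration of how a query can span the two clusters in the diagram, cross-cluster search addresses a remote index as `alias:index`. The sketch below only builds the request path. The aliases `uk` and `us`, the index name `logs`, and the helper name are hypothetical, and it assumes the remote clusters have already been registered under those aliases on the cluster that receives the search.

```python
def cross_cluster_search_path(targets):
    """Build the path of a search request that may span several clusters.

    `targets` is a list of (cluster_alias, index) pairs. An alias of None
    means the index lives on the cluster that receives the request.
    """
    parts = []
    for alias, index in targets:
        # Remote indices are written as "alias:index", local ones as "index".
        parts.append(index if alias is None else f"{alias}:{index}")
    return "/" + ",".join(parts) + "/_search"


# Hypothetical aliases and index name, for illustration only.
print(cross_cluster_search_path([("uk", "logs"), ("us", "logs")]))
# prints /uk:logs,us:logs/_search
```
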
  6. What are the major pain points?

    • Disk usage
    • Resiliency & recovery
    • Major version upgrades

    © Marie-Lan Nguyen Wikimedia Commons / CC-BY 2.5
  7. What are the major pain points?

    • Disk usage
    • Resiliency & recovery
    • Major version upgrades

    © Marie-Lan Nguyen Wikimedia Commons / CC-BY 2.5
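  8. Doc Values - Sparse Data (5.x)

    Segment 1

    | ID | fname | lname     |
    |----|-------|-----------|
    | 1  | Simon | Willnauer |
    | 2  | Shay  | Banon     |
    | 3  | Tanya | Bragin    |

    Segment 2

    | ID | fname  | lname   | mi   | state | city    |
    |----|--------|---------|------|-------|---------|
    | 4  | Steve  | Kearns  | Null | Null  | Boston  |
    | 5  | George | Burdell | P    | GA    | Null    |
    | 6  | Bill   | Swerski | Null | Null  | Chicago |

    Merged Segment 3

    | Docs | fname  | lname     | mi   | state | city    |
    |------|--------|-----------|------|-------|---------|
    | 1    | Simon  | Willnauer | Null | Null  | Null    |
    | 2    | Shay   | Banon     | Null | Null  | Null    |
    | 3    | Tanya  | Bragin    | Null | Null  | Null    |
    | 4    | Steve  | Kearns    | Null | Null  | Boston  |
    | 5    | George | Burdell   | P    | GA    | Null    |
    | 6    | Bill   | Swerski   | Null | Null  | Chicago |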
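
To make the cost of the Null padding in the merged segment concrete, the sketch below counts cells under a deliberately simplified model: a dense layout reserves one slot per document per field, while a sparse layout stores only the values that exist. This is an illustration of the idea, not Lucene's actual on-disk encoding, and the function names are hypothetical.

```python
FIELDS = ["fname", "lname", "mi", "state", "city"]

# The six documents from the slide, with absent fields simply left out.
DOCS = {
    1: {"fname": "Simon", "lname": "Willnauer"},
    2: {"fname": "Shay", "lname": "Banon"},
    3: {"fname": "Tanya", "lname": "Bragin"},
    4: {"fname": "Steve", "lname": "Kearns", "city": "Boston"},
    5: {"fname": "George", "lname": "Burdell", "mi": "P", "state": "GA"},
    6: {"fname": "Bill", "lname": "Swerski", "city": "Chicago"},
}


def dense_slots(docs, fields):
    """Slots used when every document reserves a cell for every field."""
    return len(docs) * len(fields)


def sparse_slots(docs):
    """Slots used when only values that are present are stored."""
    return sum(len(values) for values in docs.values())


dense = dense_slots(DOCS, FIELDS)
sparse = sparse_slots(DOCS)
print(dense, sparse, dense - sparse)  # prints 30 16 14
```

In this toy model, 14 of the 30 cells in the merged segment hold nothing but padding.
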
  9. Disk Space Improvements

    • Sparse value support coming in 6.0
    • _all field off by default with special query handler
    • More efficient configurations in ingest products
    • Future: Rollups

    © Tony Weman / CC-BY 2.5
  10. What are the major pain points?

    • Disk usage
    • Resiliency & recovery
    • Major version upgrades

    © Marie-Lan Nguyen Wikimedia Commons / CC-BY 2.5
  11. Recovery (5.x)

    [Diagram: a Primary and a Replica, each holding Segment 1, Segment 2, Segment 3]
  12. Recovery (5.x)

    [Diagram: two shard copies, each holding Segment 1, Segment 2, Segment 3; one labelled Offline, the other Primary]
  13. Recovery (5.x)

    [Diagram: the Primary copy, now including a Segment 4; the Offline copy still holding Segment 1, Segment 2, Segment 3]
  14. Recovery (5.x)

    [Diagram: the Primary copy, now including a Segment 4; the Replica copy holding Segment 1, Segment 2, Segment 3; caption: File copy recovery]
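
The file copy recovery in the diagrams above can be pictured with a small model: the returning copy fetches every file that the primary holds and that it either lacks or holds with different content. The sketch below assumes the primary wrote a new segment while the other copy was offline, as the Segment 4 label suggests. The file names, checksums, and function name are hypothetical, and this is not the actual Elasticsearch recovery code.

```python
def files_to_copy(primary_files, replica_files):
    """Return the names of files the replica must fetch from the primary.

    Both arguments map a file name to a checksum. A file is copied when the
    replica does not have it or has it with a different checksum.
    """
    return sorted(
        name
        for name, checksum in primary_files.items()
        if replica_files.get(name) != checksum
    )


# Hypothetical segment files and checksums, for illustration only.
primary = {"segment_1": "a1", "segment_2": "b2", "segment_3": "c3", "segment_4": "d4"}
replica = {"segment_1": "a1", "segment_2": "b2", "segment_3": "c3"}

print(files_to_copy(primary, replica))  # prints ['segment_4']
```

If merges on the primary had rewritten the older segments too, their checksums would differ and they would be copied as well, which is what makes this style of recovery expensive.
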
  15. What are the major pain points?

    • Disk usage
    • Resiliency & recovery
    • Major version upgrades

    © Marie-Lan Nguyen Wikimedia Commons / CC-BY 2.5
  16. Major version upgrades pain

    • Full cluster restart == downtime
    • Client/server version compatibility (Java)
    • Data retention/lifecycle
    • Too many hard-to-resolve breaking changes
    • Little/no warnings of deprecations & breaking changes

    © Famartin Wikimedia Commons / CC-BY 2.5
  17. Major version upgrades pain

    • Full cluster restart == downtime
    • Client/server version compatibility (Java)
    • Data retention/lifecycle
    • Too many hard-to-resolve breaking changes
    • Little/no warnings of deprecations & breaking changes

    © Famartin Wikimedia Commons / CC-BY 2.5
  18. Rolling upgrades

    • Upgrade from 5.latest to 6.x without full cluster restart
    • Some caveats to be aware of:
        • There are still breaking changes! However, many have options backported. Read & test!
        • All nodes must be on 5.latest before upgrading
        • Security+TLS
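
The "all nodes must be 5.latest" caveat lends itself to a pre-flight check. The sketch below takes a mapping of node name to version string and lists the nodes that would block a rolling upgrade. It assumes "5.latest" means the 5.6 line and that versions are plain numeric strings; the node names and the function names are hypothetical, and how the versions are collected from the cluster is left out.

```python
def parse_version(text):
    """Turn a version string such as '5.6.7' into a tuple of integers."""
    return tuple(int(part) for part in text.split(".")[:3])


def nodes_blocking_upgrade(node_versions, required=(5, 6)):
    """Return the names of nodes that are not on the required major.minor.

    `required` defaults to (5, 6) on the assumption that "5.latest" is 5.6.
    """
    return sorted(
        name
        for name, version in node_versions.items()
        if parse_version(version)[:2] != required
    )


# Hypothetical node names and versions, for illustration only.
cluster = {"node-a": "5.6.7", "node-b": "5.4.3", "node-c": "5.6.7"}
print(nodes_blocking_upgrade(cluster))  # prints ['node-b']
```
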
  19. Major version upgrades pain

    • Full cluster restart == downtime
    • Client/server version compatibility (Java)
    • Data retention/lifecycle
    • Too many hard-to-resolve breaking changes
    • Little/no warnings of deprecations & breaking changes

    © Famartin Wikimedia Commons / CC-BY 2.5
  20. Java client

    • All other languages use REST interface
    • Transport client tied to Elasticsearch major version
    • Complicates firewalls & security
  21. Java Low level REST client

    • Released in 5.0
    • JSON strings only
    • Resilient, but not user friendly
    • Low level like other language clients
    • Can sniff nodes and Round-Robin requests
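
The round-robin behaviour in the last bullet above can be sketched independently of the Java client. The Python class below cycles through a fixed list of node URLs so that consecutive requests go to different nodes. The class name and the URLs are hypothetical, and sniffing (refreshing the node list from the cluster) is not modelled.

```python
import itertools


class RoundRobinNodes:
    """Hand out node URLs in rotation so requests spread across the cluster."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the node that should receive the next request."""
        return next(self._cycle)


# Hypothetical node URLs, for illustration only.
selector = RoundRobinNodes(["http://node-1:9200", "http://node-2:9200", "http://node-3:9200"])
print([selector.next_node() for _ in range(4)])
# prints ['http://node-1:9200', 'http://node-2:9200', 'http://node-3:9200', 'http://node-1:9200']
```
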
  22. Java high level REST client

    • Released in 5.6
    • Works across major version upgrade
    • IDE friendly
    • Similar API to Transport Client
    • Based on low-level REST client
    • Supports CRUD & Search
  23. Major version upgrades pain

    • Full cluster restart == downtime
    • Client/server version compatibility (Java)
    • Data retention/lifecycle
    • Too many hard-to-resolve breaking changes
    • Little/no warnings of deprecations & breaking changes

    © Famartin Wikimedia Commons / CC-BY 2.5
  24. Data Retention / Lifecycle: What are Your Options?

    Delete old data
    • Keeps cluster & servers tidy :)
    • Good where there’s a practical retention period :)
    • Still a period that you can’t upgrade :(
    • No in-between / rollups :(

    Reindex
    • Get the latest & greatest features :)
    • Can be time consuming depending on data volumes :(
    • No easy management :(

    Run v1.2.0 and hope your security team doesn’t notice
    • Path of least resistance?
    • Eventually, we stop backporting security fixes :(
    • You never get the latest & greatest :(
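
For the "delete old data" option above, a retention period is easiest to enforce when each index covers one day. The sketch below picks the indices that have aged out. The daily naming pattern `logs-YYYY.MM.DD`, the prefix, the 30-day period, and the function name are all assumptions for illustration and do not come from the slides; deleting the selected indices is left to the caller.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days, prefix="logs-"):
    """Return the daily indices whose date is older than the retention period.

    Names that do not match `prefix` followed by YYYY.MM.DD are ignored.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index in the assumed format
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index names, for illustration only.
names = ["logs-2018.01.01", "logs-2018.01.25", "logs-2018.02.01", "metrics-2017.01.01"]
print(expired_indices(names, today=date(2018, 2, 1), retention_days=30))
# prints ['logs-2018.01.01']
```
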
  25. Cross Major Version Search

    [Diagram: a v5.6.0 cluster (master nodes, two data nodes); a v6.0.0 cluster (master nodes, a data node); Your App; Cross Cluster Client v5.latest]
  26. The latest and greatest in 6.1 and 6.2

    • Shard Splitting
    • Adaptive Replica Selection
    • Paging Aggregations
    • Java 9 Support (6.2)
    • SSO / SAML (6.2)
    • Plugin Extensibility via SPI (6.2)