Elastic{ON} Tour Washington D.C. - Elasticsearch

Elastic Co
October 26, 2017

Cross-cluster search, ingest node, rollover API, shrink API, field collapsing, unified highlighter... there's lots to love in Elasticsearch these days. Get up to speed on 5.x and see how 6.x will address pain points around scale, upgrading, recovery, and sparse data and disk usage.

Dave Erickson | Director of Solutions Architecture | Elastic

Transcript

  1. Elasticsearch 5.x Still ^ • Keyword normalization • Unified highlighter • Field collapse • Multi-word synonyms + proximity • Cancellable searches • Parallel scroll & reindex
  2. Elasticsearch 5.x Still ^ • Numeric & date range fields • Automatic optimizations for range searches • Massive aggregations with partitioning • Faster geo-distance sorting • Faster geo-ip lookups and for logs and for numbers and for geo and ... ^
  3. Ever-increasing Scale • More clusters / resource isolation • Easier to manage • Easier to upgrade • Reduce potential outages • Need to query across clusters
  4. Tribe Node (ES 2.x) [diagram: two clusters, each with master nodes and data nodes, joined by a tribe node that holds a merged cluster state]
  5. Cross-Cluster Search (ES 5.3, Kibana 5.5) [diagram: two clusters, each with master nodes and data nodes, queried through a dedicated cross-cluster search cluster]
  6. What are the major pain points? • Disk usage • Resiliency & recovery • Major version upgrades (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
  8. Doc Values - Sparse Data (5.x)

    Segment 1
      ID  fname   lname
      1   Shane   Connelly
      2   Shay    Banon
      3   Tanya   Bragin

    Segment 2
      ID  fname   lname     mi    state  city
      4   Steve   Kearns    Null  Null   Boston
      5   George  Burdell   P     GA     Null
      6   Bill    Swerski   Null  Null   Chicago

    Merged Segment 3
      ID  fname   lname     mi    state  city
      1   Shane   Connelly  Null  Null   Null
      2   Shay    Banon     Null  Null   Null
      3   Tanya   Bragin    Null  Null   Null
      4   Steve   Kearns    Null  Null   Boston
      5   George  Burdell   P     GA     Null
      6   Bill    Swerski   Null  Null   Chicago
  9. Disk Space Improvements • Sparse value support coming in 6.0 • _all field off by default with special query handler • More efficient configurations in ingest products • Future: Rollups (image © Tony Weman / CC-BY 2.5)
  10. What are the major pain points? • Disk usage • Resiliency & recovery • Major version upgrades (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
  11. Recovery (5.x) [diagram: primary and replica each holding Segments 1, 2 and 3]
  12. Recovery (5.x) [diagram: one copy of Segments 1, 2 and 3 goes offline; the other copy is the primary]
  13. Recovery (5.x) [diagram: while the other copy is offline, the primary's segments change (a Segment 4 appears); the offline copy still holds Segments 1, 2 and 3]
  14. Recovery (5.x) [diagram: the offline copy returns as a replica and is brought back in line with the primary by file copy recovery]
  15. What are the major pain points? • Disk usage • Resiliency & recovery • Major version upgrades (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
  16. Major version upgrades pain • Full cluster restart == downtime • Client/server version compatibility (Java) • Data retention/lifecycle • Too many hard-to-resolve breaking changes • Little/no warnings of deprecations & breaking changes (image © Famartin, Wikimedia Commons / CC-BY 2.5)
  18. Rolling upgrades • Upgrade from 5.latest to 6.latest without a full cluster restart • Some caveats to be aware of: • There are still breaking changes! However, many have options backported. Read & test! • All nodes must be on 5.latest before a rolling upgrade • Security+TLS
  19. Major version upgrades pain • Full cluster restart == downtime • Client/server version compatibility (Java) • Data retention/lifecycle • Too many hard-to-resolve breaking changes • Little/no warnings of deprecations & breaking changes (image © Famartin, Wikimedia Commons / CC-BY 2.5)
  20. Java client • All other languages use REST interface • Transport client tied to Elasticsearch major version • Complicates firewalls & security
  21. Java REST client • Released in 5.0 • JSON strings only • Resilient, but not user-friendly
  22. Java high level REST client • Released in 5.6 • Works across major version upgrade • IDE friendly • Similar API to Transport Client • Based on low-level REST client • Supports CRUD & Search
  23. Major version upgrades pain • Full cluster restart == downtime • Client/server version compatibility (Java) • Data retention/lifecycle • Too many hard-to-resolve breaking changes • Little/no warnings of deprecations & breaking changes (image © Famartin, Wikimedia Commons / CC-BY 2.5)
  24. Data Retention / Lifecycle: What are Your Options?
    Delete old data: • Keeps cluster & servers tidy :) • Good where there's a practical retention period :) • Still a period that you can't upgrade :( • No in-between / rollups :(
    Reindex: • Get the latest & greatest features :) • Can be time-consuming depending on data volumes :( • No easy management :(
    Run v1.2.0 and hope your security team doesn't notice: • Path of least resistance? • Eventually, we stop backporting security fixes :( • You never get the latest & greatest :(
  25. Cross-Cluster Search Multi-Year Architecture [diagram: a cross-cluster search client querying a 2017 cluster (ES 5.latest), a 2018 cluster (ES 6.x ?) and a 2019 cluster (ES 7.x ?)]
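The "Parallel scroll & reindex" bullet in the 5.x feature list refers to sliced scroll. A scroll can be split into independent slices, each covering a disjoint subset of the documents, so several workers can drain one index at once. `_reindex` accepts a `slices` parameter for the same purpose. The sketch below only builds the request bodies, using the `slice` object (`id`, `max`) from the scroll API. The query, field name and helper name are hypothetical, and sending the requests is left out.

```python
import json


def sliced_scroll_bodies(query, max_slices):
    """Return one search body per slice of a sliced (parallel) scroll.

    Each body is meant to be sent by its own worker to
    POST /<index>/_search?scroll=1m; slice ids run from 0 to max_slices - 1.
    """
    if max_slices < 2:
        # A single slice would just be an ordinary scroll.
        raise ValueError("use at least two slices")
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": query}
        for slice_id in range(max_slices)
    ]


# Hypothetical query: scroll over every document mentioning "error".
for body in sliced_scroll_bodies({"match": {"message": "error"}}, max_slices=2):
    print(json.dumps(body))
```

Each worker then keeps calling the scroll endpoint with its own scroll id until its slice is exhausted.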
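The "Massive aggregations with partitioning" bullet refers to partitioned `terms` aggregations. The `include` clause takes `partition` and `num_partitions`; terms are hashed into that many groups, and each request returns the buckets of one group only. Below is a minimal sketch that generates one request body per partition. The field name `account_id`, the aggregation name `by_term` and the helper are hypothetical. `size` must be large enough to hold every term that falls into a single partition.

```python
import json


def partitioned_terms_bodies(field, num_partitions, size=10000):
    """Return one aggregation request per partition of a terms aggregation.

    Running every request in turn covers each distinct term once, without
    building one huge bucket list in a single response.
    """
    return [
        {
            "size": 0,  # only the aggregation is wanted, not the hits
            "aggs": {
                "by_term": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }
        for partition in range(num_partitions)
    ]


# Hypothetical field; each body would go to POST /<index>/_search.
bodies = partitioned_terms_bodies("account_id", num_partitions=20)
print(json.dumps(bodies[0]))
```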
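The cross-cluster search slide involves two pieces of configuration, sketched below. The first registers remote clusters through cluster settings. The second prefixes index names with a cluster alias (`alias:index`) in the search path. As far as we know, `search.remote.<alias>.seeds` is the 5.x spelling of the setting; later releases renamed it. The aliases, seed addresses and index pattern are made up for illustration.

```python
import json


def remote_cluster_settings(seeds_by_alias):
    """Cluster-settings body registering remote clusters for cross-cluster search.

    Assumes the 5.x setting name search.remote.<alias>.seeds; the body would be
    sent with PUT /_cluster/settings.
    """
    return {
        "persistent": {
            f"search.remote.{alias}.seeds": seeds
            for alias, seeds in seeds_by_alias.items()
        }
    }


def cross_cluster_index(pattern, aliases, include_local=False):
    """Index expression that searches `pattern` on each remote cluster alias."""
    parts = [f"{alias}:{pattern}" for alias in aliases]
    if include_local:
        parts.insert(0, pattern)  # an unprefixed name means the local cluster
    return ",".join(parts)


# Hypothetical aliases and seed addresses (seeds point at the transport port).
settings = remote_cluster_settings({
    "cluster_one": ["10.0.0.1:9300"],
    "cluster_two": ["10.0.1.1:9300"],
})
print(json.dumps(settings))
print("/" + cross_cluster_index("logs-*", ["cluster_one", "cluster_two"]) + "/_search")
```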
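The "Doc Values - Sparse Data (5.x)" slide comes down to simple arithmetic. The toy model below is an assumption, not Lucene's actual on-disk encoding. A dense column reserves one slot per document for every field in the segment, while a sparse column stores only the documents that have a value. With the six documents from the slide, merging the two segments grows the dense layout from 21 slots to 30, even though only 16 values exist. That gap is the waste the 6.0 sparse value support targets.

```python
# The six documents from the slide; a missing key means "Null".
SEGMENT_1 = [
    {"id": 1, "fname": "Shane", "lname": "Connelly"},
    {"id": 2, "fname": "Shay", "lname": "Banon"},
    {"id": 3, "fname": "Tanya", "lname": "Bragin"},
]
SEGMENT_2 = [
    {"id": 4, "fname": "Steve", "lname": "Kearns", "city": "Boston"},
    {"id": 5, "fname": "George", "lname": "Burdell", "mi": "P", "state": "GA"},
    {"id": 6, "fname": "Bill", "lname": "Swerski", "city": "Chicago"},
]


def value_fields(docs):
    """All non-id fields that appear anywhere in the segment."""
    return sorted({name for doc in docs for name in doc if name != "id"})


def dense_slots(docs):
    """Dense layout: every doc reserves a slot for every field in the segment."""
    return len(docs) * len(value_fields(docs))


def sparse_slots(docs):
    """Sparse layout: only docs that actually have a value take a slot."""
    fields = value_fields(docs)
    return sum(1 for doc in docs for name in fields if name in doc)


merged = SEGMENT_1 + SEGMENT_2
print(dense_slots(SEGMENT_1) + dense_slots(SEGMENT_2))  # 21 before the merge
print(dense_slots(merged))  # 30 after the merge
print(sparse_slots(merged))  # 16 actual values
```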
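The recovery slides end with file copy recovery. The returning copy is compared with the primary, and any segment file that differs or is missing is copied over. The sketch below is loosely modelled on that idea and compares files by name and checksum only. The file names and checksums are invented, and Elasticsearch's real comparison works on Lucene commit metadata and also takes file length into account. Each copy writes and merges its own segments, so even files with the same name can differ, which is why this kind of recovery can end up copying a lot of data.

```python
def recovery_diff(primary_files, replica_files):
    """Classify the primary's segment files against a returning replica.

    Both arguments map file name -> checksum (hypothetical values).
    Returns (identical, different, missing) lists of file names.
    """
    identical, different, missing = [], [], []
    for name, checksum in sorted(primary_files.items()):
        if name not in replica_files:
            missing.append(name)
        elif replica_files[name] == checksum:
            identical.append(name)
        else:
            different.append(name)
    return identical, different, missing


# Hypothetical state after the replica was offline for a while.
primary = {"segment_1": "a1", "segment_2": "b2", "segment_4": "d4"}
replica = {"segment_1": "a1", "segment_2": "x9", "segment_3": "c3"}

identical, different, missing = recovery_diff(primary, replica)
files_to_copy = different + missing
print(files_to_copy)  # ['segment_2', 'segment_4']
```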
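Below is a sketch of the loop behind the "Rolling upgrades" slide, following the Elasticsearch upgrade documentation of that era. For each node: disable shard allocation, perform a synced flush, then stop, upgrade and restart the node. Re-enable allocation and wait for the cluster to recover before moving on. The sketch also enforces the "all nodes must be on 5.latest" caveat, assuming 5.latest means 5.6.x. Steps are plain tuples rather than real HTTP calls, and the node names and versions are hypothetical.

```python
MIN_ROLLING_SOURCE = (5, 6, 0)  # assumption: "5.latest" means 5.6.x for a 6.x upgrade


def parse_version(text):
    """'5.6.3' -> (5, 6, 3); a suffix such as '-SNAPSHOT' is ignored."""
    core = text.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat '5.6' as '5.6.0'
    return tuple(parts)


def nodes_blocking_rolling_upgrade(node_versions):
    """Return the nodes that must first be brought up to 5.latest."""
    return sorted(
        name for name, version in node_versions.items()
        if parse_version(version) < MIN_ROLLING_SOURCE
    )


def rolling_upgrade_steps(node_names):
    """Ordered (method, path_or_action, body) steps, one node at a time."""
    steps = []
    for node in node_names:
        steps += [
            ("PUT", "/_cluster/settings",
             {"transient": {"cluster.routing.allocation.enable": "none"}}),
            ("POST", "/_flush/synced", None),
            ("MANUAL", f"stop {node}, upgrade it, start it", None),
            ("PUT", "/_cluster/settings",
             {"transient": {"cluster.routing.allocation.enable": "all"}}),
            # Poll until shards are reallocated before touching the next node.
            ("GET", "/_cat/health?v", None),
        ]
    return steps


versions = {"node-1": "5.6.3", "node-2": "5.6.3", "node-3": "5.5.2"}  # hypothetical
print(nodes_blocking_rolling_upgrade(versions))  # ['node-3']
for step in rolling_upgrade_steps(["node-1"]):
    print(step)
```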
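The Java client slides contrast two approaches. The transport client speaks Elasticsearch's own protocol and is tied to a major version. REST is what every other language already uses, and any HTTP library can issue a search over it. This sketch uses only the Python standard library and builds the request without sending it. The host, index name and query are hypothetical. The explicit `Content-Type` header matters because 6.0 rejects request bodies that lack one.

```python
import json
import urllib.request


def search_request(base_url, index, query):
    """Build (but do not send) a _search request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index; urllib.request.urlopen(req) would send it.
req = search_request("http://localhost:9200", "logs-2017.10.26",
                     {"match": {"message": "error"}})
print(req.get_method(), req.full_url)
```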
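The "Delete old data" option on the data retention slide is usually applied to time-based indices. Below is a minimal sketch that picks the indices older than a retention period. It assumes daily index names of the form `logs-YYYY.MM.DD`, a hypothetical naming scheme; each selected index would then be removed with `DELETE /<index>`. Tools such as Elasticsearch Curator automate this kind of selection.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, prefix, retention_days, today):
    """Return names of daily indices strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not part of this time series
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["logs-2017.10.01", "logs-2017.10.19", "logs-2017.10.26",
         "metrics-2017.01.01", "logs-latest"]
print(indices_to_delete(names, "logs-", retention_days=7, today=date(2017, 10, 26)))
```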