What's Evolving in Elasticsearch?

The Elasticsearch team lead and tech lead give an overview of the changes already released in the 5.x series, and a taste of the new features coming in 6.0.

Clinton Gormley | Elasticsearch Team Lead | Elastic
Simon Willnauer | Elasticsearch Tech Lead | Elastic

Elastic Co

March 07, 2017

Transcript

  1. Elastic, 7 March 2017, @clintongormley & @s1m0nw: What’s evolving in Elasticsearch, Clinton Gormley & Simon Willnauer
  2. Elasticsearch 5.0.0, 26 October 2016 • Faster • Friendlier • Smaller • Smarter • Safer
  3. Append-only indexing: throughput with one replica on two nodes, with auto-generated IDs. [Bar chart: K docs/s on an axis from 0 to 30, for v2.4.2, v5.2.1, and master]
  4. What’s new in 5.x?

  5. Mappings

  6. Range Fields & Queries

  7. What’s on at Elasticon tomorrow between 11am and 2pm?

  8. 8

  9. Wednesday 11am - 2pm

  10. Wednesday 11am - 2pm - INTERSECTS

  11. Wednesday 11am - 2pm - CONTAINS

  12. Wednesday 11am - 2pm - WITHIN
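The three relations on slides 10 to 12 can be read as interval predicates between a stored range and the query range. The following is a minimal sketch in plain Python, not Elasticsearch's implementation. It assumes closed intervals expressed as whole hours. The event names, the `time_frame` field name, and the helper names are hypothetical. `range_query` only builds a request body in the shape of a range query with a `relation` parameter.

```python
from typing import NamedTuple


class Range(NamedTuple):
    start: int  # inclusive hour of day
    end: int    # inclusive hour of day


def intersects(doc: Range, query: Range) -> bool:
    """The stored range overlaps the query range at some point."""
    return doc.start <= query.end and query.start <= doc.end


def contains(doc: Range, query: Range) -> bool:
    """The stored range fully covers the query range."""
    return doc.start <= query.start and query.end <= doc.end


def within(doc: Range, query: Range) -> bool:
    """The stored range lies fully inside the query range."""
    return query.start <= doc.start and doc.end <= query.end


RELATIONS = {"INTERSECTS": intersects, "CONTAINS": contains, "WITHIN": within}


def matching(events: dict, query: Range, relation: str) -> list:
    """Names of the events whose range satisfies the relation, sorted."""
    check = RELATIONS[relation]
    return sorted(name for name, rng in events.items() if check(rng, query))


def range_query(field: str, gte, lte, relation: str) -> dict:
    """Request body for a range query on a range field (field name is an assumption)."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte, "relation": relation}}}}


# Hypothetical schedule; the query window is 11am to 2pm as on the slides.
EVENTS = {
    "keynote": Range(9, 10),
    "workshop": Range(10, 12),
    "lunch": Range(12, 13),
    "all_day_ama": Range(9, 17),
}
QUERY = Range(11, 14)
```

With this made-up schedule, INTERSECTS returns every event that touches the window, CONTAINS returns only the event that spans the whole window, and WITHIN returns only the event that fits inside it.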

  13. Keyword Normalizers

  14. { "city": { "type": "string", "index": "analyzed", "fields": { "city.keyword": { "type": "string", "index": "not_analyzed" } } } }

  15. { "city": { "type": "text", "fields": { "city.keyword": { "type": "keyword" } } } }

  16. { "city": { "type": "text", "fields": { "city.keyword": { "type": "keyword" } } } } (text: full text queries, full text analysis)

  17. { "city": { "type": "text", "fields": { "city.keyword": { "type": "keyword" } } } } (keyword: keyword queries, aggregations, sorting)

  18. { "city": { "type": "text", "fields": { "city.keyword": { "type": "keyword" } } } } (keyword: no analysis)
  19. San Francisco, SAN FRANCISCO, san francisco, San franciscO

  20. San Francisco, SAN FRANCISCO, san francisco, San franciscO → Normalizer → san francisco
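Slides 19 and 20 show a normalizer collapsing several spellings of one keyword into a single value. Here is a small sketch of that idea, assuming a lowercase-only normalizer. `lowercase_normalizer` is a plain Python stand-in, not the Lucene filter. The index body attaches a custom normalizer to a `keyword` sub-field. The normalizer name, the mapping type name `doc`, and the sub-field name `keyword` are illustrative choices.

```python
def lowercase_normalizer(value: str) -> str:
    """Stand-in for a normalizer with a single lowercase filter."""
    return value.lower()


VARIANTS = ["San Francisco", "SAN FRANCISCO", "san francisco", "San franciscO"]


def index_body(normalizer_name: str = "lowercase_normalizer") -> dict:
    """Index settings and mappings with a normalized keyword sub-field.

    The names used here are assumptions made for this example.
    """
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    normalizer_name: {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "city": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "normalizer": normalizer_name,
                            }
                        },
                    }
                }
            }
        },
    }


# All four spellings end up as one keyword term.
NORMALIZED = {lowercase_normalizer(v) for v in VARIANTS}
```

Because the same normalization would apply at index and query time, an aggregation on the sub-field would see one bucket instead of four.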
  21. Search & Aggregations

  22. Multi-Word Synonyms

  23. Synonyms: NY, NYC, New York, New York City

  24. Phrase query: “NYC is OLD!”

  25. Synonym Filter: (ny|nyc|new), (is|york), (old|city)

  26. Synonym Filter: (ny|nyc|new), (is|york), (old|city)

  27. Synonym Filter: (ny|nyc|new), (is|york), (old|city). Synonym Graph Filter: (ny | nyc | new york | new york city) is old
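The difference between the two filters on slides 25 to 27 can be illustrated with a toy model. It is a sketch only: real token streams carry position increments and position lengths, which this ignores. `flat_positions` stacks each synonym word onto consecutive positions, as the older filter effectively does, while `graph_phrases` keeps each synonym as its own path. All function names are hypothetical.

```python
from itertools import product

SYNONYMS = [["ny"], ["nyc"], ["new", "york"], ["new", "york", "city"]]
TOKENS = ["nyc", "is", "old"]  # analyzed form of the phrase "NYC is OLD!"


def flat_positions(tokens, index, synonyms):
    """Stack synonym words onto consecutive positions, starting at `index`."""
    positions = [[token] for token in tokens]
    for synonym in synonyms:
        for offset, word in enumerate(synonym):
            slot = index + offset
            if slot == len(positions):
                positions.append([])
            if word not in positions[slot]:
                positions[slot].append(word)
    return positions


def flat_phrases(positions):
    """Every word sequence that the flattened positions admit."""
    return {" ".join(words) for words in product(*positions)}


def graph_phrases(tokens, index, synonyms):
    """One phrase per synonym path, with the surrounding tokens kept intact."""
    return [
        " ".join(tokens[:index] + synonym + tokens[index + 1:])
        for synonym in synonyms
    ]


FLAT = flat_positions(TOKENS, 0, SYNONYMS)
GRAPH = graph_phrases(TOKENS, 0, SYNONYMS)
```

In this model the flattened stream admits nonsense such as "ny york city" and cannot express "new york city is old", whereas the graph form yields exactly the four intended phrases.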
  28. More Search Improvements

  29. Query Optimizations • Smarter query caching • Faster geo, range, and nested queries • Unified highlighter • Field collapsing • Cancellable searches • Partitioned term aggs
  30. Operational Improvements

  31. When your cluster is RED… /_cat/allocation, /_cat/indices, /_cat/nodes, /_cat/recovery, /_cat/shards, /_cluster/health, /_cluster/state, /{index}/_shard_stores, /_cluster/settings, /_nodes/stats, /{index}/_settings, /_nodes

  32. When your cluster is RED… /_cluster/allocation/explain

  33. /_cluster/allocation/explain … "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes", … { "decider" : "filter", "decision" : "NO", "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"non_existent_node\"]" } …

  34. /_cluster/allocation/explain … "unassigned_info" : { "reason" : "NODE_LEFT", "at" : "2017-01-04T18:03:28.464Z", "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]", "last_allocation_status" : "no_valid_shard_copy" }, "can_allocate" : "no_valid_shard_copy", "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster" …

  35. /_cluster/allocation/explain … "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance", "node_allocation_decisions" : [ { "node_id" : "oE3EGFc8QN-Tdi5FFEprIA", "node_name" : "node_t1", "transport_address" : "127.0.0.1:9401", "node_decision" : "worse_balance", "weight_ranking" : 1 } …
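The explain output on slides 33 to 35 is verbose, so a small helper that pulls out only the blocking deciders can be handy. This sketch makes two assumptions. First, the request body carries `index`, `shard`, and `primary`. Second, decider entries sit in a `deciders` list under each entry of `node_allocation_decisions`. The sample response is stitched together from the slide fragments for illustration and is not real cluster output. The index name and function names are hypothetical.

```python
def explain_request(index: str, shard: int, primary: bool) -> dict:
    """Body for GET /_cluster/allocation/explain targeting one shard copy."""
    return {"index": index, "shard": shard, "primary": primary}


def blocking_deciders(response: dict) -> list:
    """Collect (node_name, decider, explanation) for every NO decision.

    Assumes each node entry may carry a "deciders" list.
    """
    blockers = []
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append(
                    (node.get("node_name"), decider["decider"], decider["explanation"])
                )
    return blockers


# Illustrative response assembled from the fragments shown on the slides.
SAMPLE_RESPONSE = {
    "allocate_explanation": (
        "cannot allocate because allocation is not permitted to any of the nodes"
    ),
    "node_allocation_decisions": [
        {
            "node_name": "node_t1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": (
                        "node does not match index setting "
                        "[index.routing.allocation.include] filters "
                        '[_name:"non_existent_node"]'
                    ),
                }
            ],
        }
    ],
}

BLOCKERS = blocking_deciders(SAMPLE_RESPONSE)
```

For the sample above, the helper returns a single entry naming the `filter` decider on `node_t1`, which points straight at the allocation filter setting to fix.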
  36. Java REST Client

  37. Java REST Client - behind the scenes • Came late to the party… • Isn’t nearly as extensive as the Transport Client • Should have been fixed years ago but hindsight is 20/20 • Maintaining a transport protocol based client causes a massive engineering overhead • It’s a “second” entry point into the system • Complicates distinguishing between clients and nodes

  38. Java low-level HTTP client • Released in 5.0.0 • JSON strings only • Resilient, but not user friendly due to the lack of a higher level API

  39. Java high-level HTTP client • IDE friendly • Similar API to Transport Client - easy migration • Based on low-level REST client • Supports CRUD & Search • Previews in 5.5 • Depends on elasticsearch-core
  40. Tribe Node

  41. How the Tribe Node works. [Diagram: Cluster Sales and Cluster R&D, each with master nodes and three data nodes]

  42. How the Tribe Node works. [Diagram adds: Tribe Node] Configuration: tribe.t1.cluster.name: sales, tribe.t2.cluster.name: r_and_d

  43. How the Tribe Node works. [Diagram adds: t1 node client]

  44. How the Tribe Node works. [Diagram adds: t2 node client]

  45. How the Tribe Node works. [Diagram adds: two cluster states]

  46. How the Tribe Node works. [Diagram: two cluster states]

  47. How the Tribe Node works. [Diagram: merged cluster state]

  48. How the Tribe Node works. [Diagram adds: Kibana]
  49. Problems With How the Tribe Node works. [Diagram: Cluster Sales, Cluster R&D, Tribe Node with t1 and t2 node clients, merged cluster state, Kibana]

  50. Problems With How the Tribe Node works: Static configuration (tribe.t1.cluster.name: sales, tribe.t2.cluster.name: r_and_d)

  51. Problems With How the Tribe Node works: Connections to all nodes

  52. Problems With How the Tribe Node works: Frequent cluster state updates

  53. Problems With How the Tribe Node works: Index names must be unique

  54. Problems With How the Tribe Node works: No master node, no index creation

  55. Problems With How the Tribe Node works: Reduce results from many shards
  56. The Tribe Node is Dead

  57. Long Live Cross-Cluster Search!

  58. Minimal viable solution to supersede tribe

  59. Reduces the problem domain to query execution

  60. Cluster related information is reduced to a namespace

  61. How Cross-Cluster search works. [Diagram: Cluster Sales and Cluster R&D, each with master nodes and three data nodes]

  62. How Cross-Cluster search works: Any node can perform cross-cluster search

  63. How Cross-Cluster search works. [Diagram adds: optional dedicated cross-cluster search cluster with two master/data nodes]

  64. How Cross-Cluster search works: Dynamic settings. PUT _cluster/settings { "transient": { "search.remote": { "sales.seeds": "10.0.0.1:9300", "r_and_d.seeds": "10.1.0.1:9300" } } }

  65. How Cross-Cluster search works: No cluster state updates

  66. How Cross-Cluster search works. [Diagram adds: Kibana]

  67. How Cross-Cluster search works: Can create indices

  68. How Cross-Cluster search works. [Diagram: both clusters, optional dedicated cross-cluster search cluster, Kibana]

  69. How Cross-Cluster search works: Few lightweight connections

  70. How Cross-Cluster search works: Index namespacing. GET sales:*,r_and_d:logs*/_search { "query": { … } }

  71. How Cross-Cluster search works: With many shards, Batched Reduce Phase
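The two requests on slides 64 and 70 follow a simple pattern: register each remote cluster under an alias, then prefix index patterns with that alias. The sketch below only builds the request body and the search path, mirroring the shapes shown on the slides. It does not contact a cluster, and the helper names are hypothetical.

```python
def remote_cluster_settings(seeds: dict, persistent: bool = False) -> dict:
    """Body for PUT _cluster/settings registering remote clusters by alias.

    `seeds` maps a cluster alias to one seed address, as on slide 64.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "search.remote": {
                f"{alias}.seeds": address for alias, address in seeds.items()
            }
        }
    }


def search_path(targets: list) -> str:
    """Build a search path from (alias, index_pattern) pairs.

    An empty alias means the pattern refers to the local cluster.
    """
    parts = []
    for alias, pattern in targets:
        parts.append(f"{alias}:{pattern}" if alias else pattern)
    return "/" + ",".join(parts) + "/_search"


SETTINGS = remote_cluster_settings(
    {"sales": "10.0.0.1:9300", "r_and_d": "10.1.0.1:9300"}
)
PATH = search_path([("sales", "*"), ("r_and_d", "logs*")])
```

Since the settings are dynamic, adding or removing a remote cluster would be one more settings update rather than a node restart, which is the contrast with the tribe node's static configuration.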
  72. Cross-Cluster Search v5.3.0

  73. Batched Reduce Phase v5.4.0
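The idea behind a batched reduce can be sketched as follows: instead of holding every shard's top hits until all shards have answered, the coordinating node merges them in batches and keeps only a partial result. This is a conceptual sketch for top hits only, assuming hits are `(score, doc_id)` tuples. It is not Elasticsearch's code, and the function names are hypothetical. In Elasticsearch the batch size is exposed on the search request as `batched_reduce_size`.

```python
import heapq


def reduce_top_hits(results: list, size: int) -> list:
    """Merge several lists of (score, doc_id) hits into one top-`size` list."""
    merged = [hit for hits in results for hit in hits]
    return heapq.nlargest(size, merged)


def batched_reduce(shard_results: list, size: int, batch_size: int):
    """Reduce shard results incrementally.

    Returns the final top hits and the peak number of buffered result lists.
    """
    buffer = []
    peak = 0
    for hits in shard_results:
        buffer.append(hits)
        peak = max(peak, len(buffer))
        if len(buffer) == batch_size:
            # Collapse the whole buffer into a single partial result.
            buffer = [reduce_top_hits(buffer, size)]
    return reduce_top_hits(buffer, size), peak


# Ten hypothetical shards with five hits each.
SHARD_RESULTS = [
    [((shard * 7 + doc * 13) % 101, f"s{shard}d{doc}") for doc in range(5)]
    for shard in range(10)
]

FINAL, PEAK = batched_reduce(SHARD_RESULTS, size=3, batch_size=4)
```

The final hits are the same as a one-shot reduce over all shards, because the top hits of a union are always among the top hits of its parts. The peak buffer size stays bounded by the batch size instead of growing with the shard count.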

  74. v6 and beyond

  75. Doc Values v2.x

  76. Doc Values • Columnar store • Fast access to a field’s value for many documents • Used for aggregations, sorting, scripting, and some queries • Written to disk at index time • Cached in the file-system cache
  77. Doc Values - Dense Values. Segment 1 (Docs | Field 1 | Field 2): 1 | One | A; 2 | Two | B; 3 | Three | C. Segment 2 (Docs | Field 1 | Field 2): 1 | Four | D.

  78. Doc Values - Dense Values. Segment 1 and Segment 2 as on slide 77. Merged Segment 3 (Docs | Field 1 | Field 2): 1 | One | A; 2 | Two | B; 3 | Three | C; 4 | Four | D.

  79. Doc Values - Sparse Values. Segment 1 (Docs | Field 1 | Field 2): 1 | One | A; 2 | Two | B; 3 | Three | C. Segment 2 (Docs | Field 3 | Field 4 | Field 5): 1 | Foo | Bar | Baz.

  80. Doc Values - Sparse Values. Segment 1 and Segment 2 as on slide 79. Merged Segment 3 (Docs | Field 1 | Field 2 | Field 3 | Field 4 | Field 5): 1 | One | A | Null | Null | Null; 2 | Two | B | Null | Null | Null; 3 | Three | C | Null | Null | Null; 4 | Null | Null | Foo | Bar | Baz.
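The cost of the merged segment on slide 80 can be counted directly. This sketch models a dense column as one slot per document per field, and a sparse column as `(doc_id, value)` pairs for documents that actually have the field. It is a simplified model for counting slots, not Lucene's on-disk encoding, and the function names are hypothetical.

```python
SEGMENT_1 = [
    {"Field 1": "One", "Field 2": "A"},
    {"Field 1": "Two", "Field 2": "B"},
    {"Field 1": "Three", "Field 2": "C"},
]
SEGMENT_2 = [{"Field 3": "Foo", "Field 4": "Bar", "Field 5": "Baz"}]


def merge(*segments):
    """Concatenate the documents of several segments."""
    return [doc for segment in segments for doc in segment]


def field_names(docs):
    return sorted({field for doc in docs for field in doc})


def dense_columns(docs):
    """One slot per document for every field; missing values become None."""
    return {f: [doc.get(f) for doc in docs] for f in field_names(docs)}


def sparse_columns(docs):
    """Only (doc_id, value) pairs for documents that have the field."""
    return {
        f: [(doc_id, doc[f]) for doc_id, doc in enumerate(docs, start=1) if f in doc]
        for f in field_names(docs)
    }


def slot_count(columns):
    return sum(len(column) for column in columns.values())


MERGED = merge(SEGMENT_1, SEGMENT_2)
DENSE_SLOTS = slot_count(dense_columns(MERGED))
SPARSE_SLOTS = slot_count(sparse_columns(MERGED))
NULL_SLOTS = DENSE_SLOTS - SPARSE_SLOTS
```

For the four documents on the slide, the dense layout needs 20 slots to hold 9 real values, so 11 slots are padding. The sparse layout stores only the 9 values.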
  81. Sparse Doc Values Lucene 7

  82. Index Sorting Lucene 7

  83. Index sorting • Sort index by e.g. weight, recency, or popularity • Ultra-fast search - can terminate once enough hits found

  84. Index sorting • Sort index by e.g. weight, recency, or popularity • Ultra-fast search - can terminate once enough hits found • Even helps with total count and aggregations • Sort index by low cardinality terms - faster search • Better sparse index compression • Slower indexing, good for static indices
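Early termination on a sorted index can be sketched in a few lines. If documents are stored in the same order the search wants them returned, the first `size` matches are already the best ones and the scan can stop. The data below is made up (`popularity` and `in_stock` are hypothetical fields), and the scan is a plain Python loop rather than Lucene's collector logic.

```python
def top_hits_unsorted(docs, matches, size):
    """Visit every document, then sort the matches by popularity."""
    visited = 0
    hits = []
    for doc in docs:
        visited += 1
        if matches(doc):
            hits.append(doc)
    hits.sort(key=lambda d: d["popularity"], reverse=True)
    return hits[:size], visited


def top_hits_sorted(sorted_docs, matches, size):
    """Scan an index pre-sorted by popularity and stop after `size` matches."""
    visited = 0
    hits = []
    for doc in sorted_docs:
        visited += 1
        if matches(doc):
            hits.append(doc)
            if len(hits) == size:
                break  # everything after this point ranks lower
    return hits, visited


# 100 hypothetical documents with distinct popularity values.
DOCS = [
    {"id": i, "popularity": (i * 37) % 100, "in_stock": i % 2 == 0}
    for i in range(100)
]
SORTED_INDEX = sorted(DOCS, key=lambda d: d["popularity"], reverse=True)


def in_stock(doc):
    return doc["in_stock"]


FULL_HITS, FULL_VISITED = top_hits_unsorted(DOCS, in_stock, 3)
EARLY_HITS, EARLY_VISITED = top_hits_sorted(SORTED_INDEX, in_stock, 3)
```

Both scans return the same three documents, but the sorted scan stops after a handful of documents while the unsorted one has to visit all 100. The sort cost is paid at index time, which fits the slide's note about slower indexing.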
  85. Sequence Numbers v6.0.0

  86. Sequence Numbers • Internal Feature • Every operation gets a sequence number • In 6.0: Fast replica recovery on active indices • Lays groundwork for: • Primary-Replica syncing when Primary fails • Cross Data-Centre Recovery • Changes API
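One way to see why sequence numbers enable fast replica recovery: if a replica knows the highest sequence number up to which it has applied every operation, the primary only needs to send what came after. The sketch below is a conceptual model under that assumption. The operation log layout, the term "checkpoint" as used here, and the function names are illustrative and not Elasticsearch internals.

```python
def operations_to_replay(primary_log, replica_checkpoint):
    """Operations the replica has not yet applied."""
    return [op for op in primary_log if op["seq_no"] > replica_checkpoint]


def recover(replica_docs, primary_log, replica_checkpoint):
    """Bring a replica up to date by replaying only the missing operations.

    Returns the recovered documents and the number of operations replayed.
    """
    docs = dict(replica_docs)
    replayed = operations_to_replay(primary_log, replica_checkpoint)
    for op in replayed:
        if op["type"] == "index":
            docs[op["id"]] = op["source"]
        else:  # "delete"
            docs.pop(op["id"], None)
    return docs, len(replayed)


# Hypothetical operation history on the primary.
PRIMARY_LOG = [
    {"seq_no": 0, "type": "index", "id": "a", "source": {"v": 1}},
    {"seq_no": 1, "type": "index", "id": "b", "source": {"v": 1}},
    {"seq_no": 2, "type": "index", "id": "a", "source": {"v": 2}},
    {"seq_no": 3, "type": "delete", "id": "b"},
    {"seq_no": 4, "type": "index", "id": "c", "source": {"v": 1}},
    {"seq_no": 5, "type": "index", "id": "a", "source": {"v": 3}},
]

# A replica that went offline after applying everything up to seq_no 3.
STALE_REPLICA = {"a": {"v": 2}}
RECOVERED, REPLAYED = recover(STALE_REPLICA, PRIMARY_LOG, replica_checkpoint=3)

# Baseline: rebuilding from scratch replays the whole history.
REBUILT, REBUILT_OPS = recover({}, PRIMARY_LOG, replica_checkpoint=-1)
```

Replaying two operations gives the same documents as replaying all six, which is the difference between an operation-based catch-up and a full copy.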
  87. Upgrading

  88. Rolling Upgrades v6.0.0

  89. Rolling Upgrades • Upgrade from 5.latest to 6.latest, without a full cluster restart • Why now and not earlier? • Testing needs to be ready • The team and the code must be ready • Growing user-base and faster release cycles required less painful upgrades

  90. Rolling Upgrades • What is 5.latest? • It’s the latest release of 5.x that is GA once 6.0.0 goes GA • All 6.x releases will allow upgrading from that 5.x release • There might be subsequent 5.x releases that are also eligible for upgrades to 6.x

  91. Rolling Upgrades • Caveats: • If using security, must have TLS enabled • Reserve the right to require full cluster restart in the future - but only if absolutely necessary • All nodes must be upgraded to 5.latest in order to upgrade • Indices created in 2.x still need to be reindexed before upgrading to 6.x
  93. Cross Major Version Search v6.0.0

  94. Cross Major Version Search. [Diagram labels: Kibana; v5.2.0 cluster with master nodes and two data nodes]

  95. Cross Major Version Search. [Diagram labels: Kibana; v5.2.0 cluster with master nodes and two data nodes; v6.0.0 cluster with master nodes and two data nodes]

  96. Cross Major Version Search. [Diagram labels: v5.2.0 cluster with master nodes and two data nodes; v6.0.0 cluster with master nodes and two data nodes; v5.latest; Kibana]

  97. Cross Major Version Search. [Diagram labels: v5.2.0 cluster with master nodes and two data nodes; v6.0.0 cluster with master nodes and a data node; Cross Cluster Client; v5.latest; Kibana]
  98. Questions?

  99. Other Talks You Should See • “Get the Lay of the Lucene Land”, Adrien Grand - Wednesday • “Consensus and Replication in Elasticsearch”, Boaz Leskes, Jason Tedor, and Yannick Welsch - Wednesday • “Elasticsearch Search Improvements”, Jim Ferenczi, Lee Hinman, Nick Knize - Thursday • “Secure, Fast, and Painless”, Nik Everett - Thursday
  100. More Questions? Visit us at the AMA

  101. www.elastic.co

  102. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co