Elasticsearch Deep Dive - Elastic{ON} Tour Seoul 2017

Elastic Co
December 12, 2017

Cross-cluster search, ingest node, rollover API, shrink API, field collapsing, unified highlighter . . . there's lots to love in Elasticsearch these days. Get up to speed on 5.x and see how 6.x will address pain points around scale, upgrading, recovery, and sparse data and disk usage.

Jongmin Kim | Developer Evangelist | Elastic

Transcript

  1. Jongmin Kim, Developer Evangelist @Elastic. Elasticsearch: Past, Present, and Future
  2. Elasticsearch 5.0 (October 2016)

  3. Elasticsearch 5.0
    • Better at Numbers
    • Safe
    • Simple Things Should Be Simple
  4. Great for Metrics
    • Faster to index
    • Faster to search
    • Smaller on disk
    • Less heap
    • IPv6
  5. Keep Calm and Index On
    • Bootstrap checks
    • Fully sandboxed scripting (Painless)
    • Strict settings
    • Soft limits
    • All-new circuit breakers
  6. ‘Time-series’ not ‘time consuming’
    • Ingest node
    • Rollover API
    • Shrink API
  7. Elasticsearch 5.x: continued feature expansion

  8. Elasticsearch 5.x Still ^
    • Keyword normalization
    • Unified highlighter
    • Field collapse
    • Multi-word synonyms+proximity
    • Cancellable searches
    • Parallel scroll & reindex
  9. Elasticsearch 5.x Still ^
    • Numeric & date range fields
    • Automatic optimizations for range searches
    • Massive aggregations with partitioning
    • Faster geo-distance sorting
    • Faster geo-ip lookups
    and for logs and for numbers and for geo and ... ^
  10. Where to next?

  11. What are the pain points? We have kept improving the weak spots. Even so, some painful areas remain.
  12-13. What are the pain points? (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
    • Ever increasing scale
    • Major version upgrades
    • Slow recovery
    • Sparse data and disk usage
  14. Ever increasing scale
    • More clusters, not bigger clusters
    • Easier to manage
    • Easier to upgrade
    • Reduce potential outages
    • Need to query across clusters
  15. Tribe Node: Yesterday’s solution

  16-23. How the Tribe Node Works (diagram, built up over eight slides)
    • Two clusters, Sales and R&D, each with master nodes and three data nodes
    • A tribe node is configured with one entry per cluster:
        tribe:
          t1:
            cluster.name: sales
          t2:
            cluster.name: r_and_d
    • The tribe node runs a node client per cluster: t1 for Sales, t2 for R&D
    • Each node client receives the cluster state of its cluster
    • The tribe node combines both into a merged cluster state
    • Kibana talks to the tribe node
  24-30. Problems With How the Tribe Node Works (same diagram, one problem highlighted per slide)
    • Static configuration (the tribe: t1 / t2 block with cluster.name: sales and cluster.name: r_and_d)
    • Connections to all nodes
    • Frequent cluster state updates
    • Index names must be unique
    • No master node, no index creation
    • Reduce results from many shards
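The "index names must be unique" problem follows from merging several cluster states into one flat view. A minimal Python sketch of that idea, not the tribe node's actual code: the function name `merge_cluster_states` and the index names are hypothetical, and the "first cluster wins" rule is a simplification chosen for illustration.

```python
def merge_cluster_states(states):
    """Merge {cluster_name: [index, ...]} into one flat {index: cluster_name} view.

    Hypothetical helper: the first cluster that declares an index name keeps it,
    and every later duplicate is reported as a conflict.
    """
    merged = {}
    conflicts = []
    for cluster, indices in states.items():
        for index in indices:
            if index in merged:
                conflicts.append(index)   # same name already taken by another cluster
            else:
                merged[index] = cluster
    return merged, conflicts


# Illustrative index names, not taken from the slides.
merged, conflicts = merge_cluster_states({
    "sales": ["logs", "orders"],
    "r_and_d": ["logs", "experiments"],
})
print(merged)
print(conflicts)
```

In a flat merged view the second `logs` index has nowhere to go, which is the limitation that cluster-prefixed index names avoid.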
  31. Tribe is going away

  32. Welcome to Cross-Cluster Search

  33. Cross-Cluster Search
    • Minimum viable solution to supersede tribe
    • Reduces the problem domain to query execution
    • Cluster information is reduced to a namespace
  34-43. How Cross-Cluster search works (diagram, built up over ten slides)
    • Two clusters, Sales and R&D, each with master nodes and three data nodes
    • Any node can perform cross-cluster search
    • Optional dedicated cross-cluster search cluster (master/data nodes)
    • Dynamic settings:
        PUT _cluster/settings
        {
          "transient": {
            "search.remote": {
              "sales.seeds": "10.0.0.1:9300",
              "r_and_d.seeds": "10.1.0.1:9300"
            }
          }
        }
    • No cluster state updates
    • Kibana talks to the cross-cluster search cluster
    • Can create indices
    • Few lightweight connections
    • Index namespacing:
        GET sales:*,r_and_d:logs*/_search
        {
          "query": { … }
        }
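The dynamic settings body and the namespaced index expression from slides 37 and 43 can be assembled programmatically. The Python sketch below only builds the request body and the path, and sends nothing. The helper names `remote_cluster_settings` and `namespaced_indices` are hypothetical, and the seed addresses are the ones shown on the slide.

```python
import json


def remote_cluster_settings(seeds):
    """Build a PUT _cluster/settings body from {cluster_alias: "host:port"}."""
    remote = {f"{alias}.seeds": address for alias, address in seeds.items()}
    return {"transient": {"search.remote": remote}}


def namespaced_indices(patterns):
    """Join (cluster_alias, index_pattern) pairs into "alias:pattern,alias:pattern"."""
    return ",".join(f"{alias}:{pattern}" for alias, pattern in patterns)


body = remote_cluster_settings({
    "sales": "10.0.0.1:9300",
    "r_and_d": "10.1.0.1:9300",
})
path = namespaced_indices([("sales", "*"), ("r_and_d", "logs*")]) + "/_search"

print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
print("GET", path)
```

The cluster alias in front of the colon is the only thing the searching node needs to know about a remote cluster, which matches the "cluster information is reduced to a namespace" point on slide 33.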
  44. How Cross-Cluster search works (same diagram)
    • With many shards: batched reduce phase
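The idea behind a batched reduce is that the coordinating node folds shard results into a running partial result a few shards at a time, instead of holding every shard response until the end. The following Python sketch shows this for a plain top-k by score. It is an illustration under that assumption, not Elasticsearch's implementation, and `batched_top_k` is a hypothetical name.

```python
import heapq


def batched_top_k(shard_results, k, batch_size):
    """Keep a running top-k, folding in shard results batch_size shards at a time."""
    running = []   # best k scores seen so far, highest first
    pending = []   # shard results waiting for the next partial reduce
    for hits in shard_results:
        pending.append(hits)
        if len(pending) == batch_size:
            # Partial reduce: merge the batch into the running top-k and drop the rest.
            running = heapq.nlargest(k, running + [s for h in pending for s in h])
            pending = []
    # Final reduce over whatever is left in the last, possibly smaller, batch.
    return heapq.nlargest(k, running + [s for h in pending for s in h])


shards = [[0.9, 0.1], [0.8, 0.7], [0.3], [0.95, 0.2], [0.5]]
top = batched_top_k(shards, k=3, batch_size=2)
print(top)
```

Memory on the coordinating node is then bounded by roughly `k` plus one batch of shard results, rather than growing with the total number of shards.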
  45. What are the pain points? (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
    • Ever increasing scale
    • Major version upgrades
    • Slow recovery
    • Sparse data and disk usage
  46. Elasticsearch 6.0

  47. Major version upgrades (image © Famartin, Wikimedia Commons / CC-BY 2.5)
    • Upgrade Lucene
    • Add new features
    • Streamline existing features
    • Breaking changes
    • Remove backwards compatibility cruft
    • Keep codebase maintainable
  48. Major version upgrade pain
    • Too many changes at once
    • Full cluster restart
    • Upgrade Java client at same time as Elasticsearch cluster
    • Data from major_version - 2 no longer readable
  49. Too many changes at once
    • Most features backported to 5.x
    • Deprecation logging
    • Migration assistance API (X-Pack)
  50. Full Cluster Restart (image © Paul Cross / CC-BY 2.5)

  51. Rolling upgrades - upgrade without cluster downtime!
    • Upgrade from 5.latest to 6.latest without full cluster restart
    • 5.latest is the latest GA release of 5.x when 6.0.0 goes GA
    • All 6.x releases will allow upgrading from that 5.x release, unless there is a new 5.x release
  52. Rolling upgrade caveats
    • If using security, must have TLS enabled
    • Reserve the right to require full cluster restart in the future, but only if absolutely necessary
    • All nodes must be upgraded to 5.latest before upgrading
    • Indices created in 2.x still need to be reindexed before upgrading to 6.x
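The caveats on slide 52 read naturally as a pre-flight checklist. Here is a hedged Python sketch of such a check. The function `rolling_upgrade_blockers` and its parameters are hypothetical, the node and index names are made up, and `5.6.4` is used as "5.latest" only because it is the version shown in the demo slides.

```python
def rolling_upgrade_blockers(node_versions, latest_5x, security_enabled,
                             tls_enabled, index_created_majors):
    """Return the reasons a 5.x to 6.x rolling upgrade cannot start yet."""
    blockers = []
    # Every node has to be on 5.latest before the first node moves to 6.x.
    behind = sorted(n for n, v in node_versions.items() if v != latest_5x)
    if behind:
        blockers.append("upgrade to %s first: %s" % (latest_5x, ", ".join(behind)))
    # With security in use, TLS must already be enabled.
    if security_enabled and not tls_enabled:
        blockers.append("enable TLS")
    # Indices created before 5.x have to be reindexed.
    old = sorted(i for i, major in index_created_majors.items() if major < 5)
    if old:
        blockers.append("reindex indices created before 5.x: %s" % ", ".join(old))
    return blockers


not_ready = rolling_upgrade_blockers(
    node_versions={"es-5-1": "5.6.4", "es-5-2": "5.6.4", "es-5-3": "5.2.0"},
    latest_5x="5.6.4",
    security_enabled=True,
    tls_enabled=False,
    index_created_majors={"logs-2016": 2, "logs-2017": 5},
)
ready = rolling_upgrade_blockers(
    node_versions={"es-5-1": "5.6.4", "es-5-2": "5.6.4", "es-5-3": "5.6.4"},
    latest_5x="5.6.4",
    security_enabled=True,
    tls_enabled=True,
    index_created_majors={"logs-2017": 5},
)
print(not_ready)
print(ready)
```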
  53. Data compatibility
    • Any index created in 5.x can be upgraded to 6.x
    • Any index created in 2.x must be reindexed in 5.x or imported with reindex-from-remote
    • How do you reindex a petabyte of data?
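For the reindex-from-remote option, the request is a `POST _reindex` whose `source` carries a `remote.host` entry pointing at the old cluster. The Python sketch below only builds that body. The helper name `reindex_from_remote_body`, the host `http://old-cluster:9200`, and the index names are hypothetical, and the remote host would also have to be allowed through the `reindex.remote.whitelist` setting on the destination cluster.

```python
import json


def reindex_from_remote_body(remote_host, source_index, dest_index):
    """Build a POST _reindex body that pulls source_index from another cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},   # the old cluster, reached over HTTP
            "index": source_index,
        },
        "dest": {"index": dest_index},         # index on the cluster running _reindex
    }


# Hypothetical host and index names, for illustration only.
reindex_body = reindex_from_remote_body("http://old-cluster:9200", "logs-2016", "logs-2016")
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```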
  54-57. Cross Major Version Search (diagram, built up over four slides)
    • A v5.2.0 cluster (master nodes, data nodes) with Kibana
    • A v6.0.0 cluster (master nodes, data nodes) is added
    • A v5.latest node appears next to Kibana
    • That v5.latest node is labelled Cross Cluster Client, with Kibana attached to it
  58. What are the pain points? (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
    • Ever increasing scale
    • Major version upgrades
    • Slow recovery
    • Sparse data and disk usage
  59-75. How is data stored? (animation over 17 slides: in-memory buffer, transaction log, Lucene segments)
    • Documents 1, 2, 3 are added to the in-memory buffer and to the transaction log
    • REFRESH: the buffer is emptied into a Lucene segment (1 2 3); the transaction log still holds 1 2 3
    • Documents 4, 5, 6, 7 arrive in the buffer and the transaction log
    • REFRESH: a second segment (4 5 6 7) is created; the transaction log holds 1 to 7
    • Documents 8, 9 arrive in the buffer and the transaction log
    • FLUSH: segments now cover 1 to 9 and the transaction log is emptied
    • MERGE: segments 1 2 3 and 8 9 are combined, leaving segments 4 5 6 7 and 1 2 3 8 9
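The refresh, flush, and merge steps from the animation can be replayed with a small toy model. This is a simplified Python sketch rather than how Lucene or Elasticsearch store data: the `Shard` class is hypothetical, documents are plain integers, and durability details such as fsync are ignored.

```python
class Shard:
    """Toy model of one shard: in-memory buffer, transaction log, Lucene segments."""

    def __init__(self):
        self.buffer = []     # indexed but not yet searchable
        self.translog = []   # every operation since the last flush
        self.segments = []   # searchable segments, each a list of documents

    def index(self, doc):
        # A new document goes to both the buffer and the transaction log.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new segment; the transaction log is kept.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Write out whatever is buffered, then empty the transaction log.
        self.refresh()
        self.translog.clear()

    def merge(self, i, j):
        # Combine segments i and j into one segment appended at the end.
        merged = self.segments[i] + self.segments[j]
        rest = [s for n, s in enumerate(self.segments) if n not in (i, j)]
        self.segments = rest + [merged]


s = Shard()
for doc in (1, 2, 3):
    s.index(doc)
s.refresh()
for doc in (4, 5, 6, 7):
    s.index(doc)
s.refresh()
translog_before_flush = list(s.translog)
for doc in (8, 9):
    s.index(doc)
s.flush()
s.merge(0, 2)
print(s.segments, s.translog, s.buffer)
```

The final state, segments `4 5 6 7` and `1 2 3 8 9` with an empty buffer and transaction log, is the layout the later replication slides start from.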
  76-83. Data replication (animation: client, primary shard, replica shard)
    • The client sends documents 1 and 2 to the primary shard, which passes them on to the replica shard
    • Primary and replica hold the same documents in differently arranged Lucene segments: primary 4 5 6 7 1 2 3 8 9; replica 1 2 4 7 9 3 5 6 8
  84-86. Replica recovery (animation)
    • Primary segments: 4 5 6 7 1 2 3 8 9; replica segments: 1 2 4 7 9 3 5 6 8
    • The primary's segments are copied to the replica
    • The replica's own segments are dropped, leaving only the copy of the primary's segments
  87-90. Data at rest (animation)
    • SYNCED FLUSH is applied to primary (segments 4 5 6 7 1 2 3 8 9) and replica (segments 1 2 4 7 9 3 5 6 8)
    • Both copies keep their own, differently arranged segments
  91-96. Active indexing (animation)
    • Start: primary segments 4 5 6 7 1 2 3 8 9; replica segments 1 2 4 7 9 3 5 6 8
    • New documents 10 and 11 appear on the replica side
    • The primary's segments become 1 2 3 4 5 6 7 8 9
    • A full set 1 to 11 is copied to the replica
    • The replica's old segments are dropped, leaving 10 11 and the copied 1 to 11
  97-102. Sequence numbers (animation: primary and replica transaction logs)
    • Primary and replica transaction logs both hold operations 1 2 3
    • The primary's log grows to 1 2 3 4 5 while the replica's still holds 1 2 3
    • Later the primary holds 1 to 9 and the replica 1 2 3 4 5 7 8, so 6 and 9 are missing on the replica
  103-104. Trimming the transaction log (animation)
    • Before: primary 1 2 3 4 5 6 7 8 9; replica 1 2 3 4 5 7 8
    • After: primary 5 6 7 8 9; replica 5 7 8
  105. Slow recovery
    • 6.0:
      ‒ Fast replica recovery
      ‒ Configurable transaction log retention period
    • Lays groundwork for:
      ‒ Replica syncing after primary failure
      ‒ Cross-data-centre recovery
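The sequence-number slides suggest how a fast replica recovery can work: find the highest sequence number up to which the replica has everything, then replay the primary's transaction log from there, and fall back to copying segments only if the log has been trimmed too far. The Python sketch below is an illustration under those assumptions. `local_checkpoint` and `ops_to_replay` are hypothetical helpers, and replaying everything above the checkpoint (including operations the replica already has) is a simplification.

```python
def local_checkpoint(seq_nos):
    """Highest n such that every operation 1..n is present."""
    present = set(seq_nos)
    checkpoint = 0
    while checkpoint + 1 in present:
        checkpoint += 1
    return checkpoint


def ops_to_replay(primary_translog, replica_seq_nos):
    """Operations to replay from the primary's retained transaction log.

    Returns None when the log no longer reaches back to the replica's
    checkpoint, meaning a copy of the segment files would be needed instead.
    Assumes the primary's log is non-empty whenever the replica is behind.
    """
    checkpoint = local_checkpoint(replica_seq_nos)
    retained = sorted(primary_translog)
    if retained and retained[0] > checkpoint + 1:
        return None   # the first missing operation has already been trimmed
    return [n for n in retained if n > checkpoint]


replica = [1, 2, 3, 4, 5, 7, 8]                               # slide 102
full_log = ops_to_replay(range(1, 10), replica)               # untrimmed primary log
trimmed_log = ops_to_replay([5, 6, 7, 8, 9], replica)         # slide 104, after trimming
too_far_behind = ops_to_replay([5, 6, 7, 8, 9], [1, 2, 3])    # operation 4 is gone
print(local_checkpoint(replica), full_log, trimmed_log, too_far_behind)
```

This is also why a configurable retention period matters: the longer the log is kept, the further behind a replica can be and still recover by replay.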
  106. What are the pain points? (image © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)
    • Ever increasing scale
    • Major version upgrades
    • Slow recovery
    • Sparse data and disk usage
  107. Sparse data and disk usage (image © Tony Weman / CC-BY 2.5)
    • Doc Values: Columnar store
    • Fast access to a field’s value for many documents
    • Used for aggregations, sorting, scripting, and some queries
    • Written to disk at index time
    • Cached in the file-system cache
  108-109. Doc values - Dense data
    Segment 1
      Docs  Field 1  Field 2
      1     One      A
      2     Two      B
      3     Three    C
    Segment 2
      Docs  Field 1  Field 2
      1     Four     D
    Merged Segment 3
      Docs  Field 1  Field 2
      1     One      A
      2     Two      B
      3     Three    C
      4     Four     D
  110-112. Doc values - Sparse data
    Segment 1
      Docs  Field 1  Field 2
      1     One      A
      2     Two      B
      3     Three    C
    Segment 2
      Docs  Field 3  Field 4  Field 5
      1     Foo      Null     Null
      2     Null     Bar      Null
      3     Null     Null     Baz
    Merged Segment 3
      Docs  Field 1  Field 2  Field 3  Field 4  Field 5
      1     One      A        Null     Null     Null
      2     Two      B        Null     Null     Null
      3     Three    C        Null     Null     Null
      4     Null     Null     Foo      Null     Null
      5     Null     Null     Null     Bar      Null
      6     Null     Null     Null     Null     Baz
    Slide 112 repeats the same tables with the Null cells left blank.
  113. Sparse doc value support (image © Tony Weman / CC-BY 2.5)
    • In 6.0
    • Big disk savings for sparse values - pay for what you use
    • Big file cache savings - more data can be cached
    • Dense queries still more efficient than sparse
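The "pay for what you use" point can be made concrete by counting cells in the merged segment from slide 111. The Python sketch below compares a dense layout, which reserves one cell per document per field, with a sparse layout that keeps only the values that exist. The function names are hypothetical and the count ignores real on-disk encoding, so it illustrates the proportion only.

```python
FIELDS = ["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"]

# Merged Segment 3 from slide 111; None stands for Null.
MERGED_SEGMENT = [
    {"Field 1": "One", "Field 2": "A"},
    {"Field 1": "Two", "Field 2": "B"},
    {"Field 1": "Three", "Field 2": "C"},
    {"Field 3": "Foo"},
    {"Field 4": "Bar"},
    {"Field 5": "Baz"},
]


def dense_cells(docs, fields):
    """Dense layout: one cell per document per field, Null or not."""
    return len(docs) * len(fields)


def sparse_cells(docs, fields):
    """Sparse layout: one cell per value that actually exists."""
    return sum(1 for doc in docs for f in fields if doc.get(f) is not None)


dense = dense_cells(MERGED_SEGMENT, FIELDS)
sparse = sparse_cells(MERGED_SEGMENT, FIELDS)
print(dense, sparse)
```

For this six-document example the dense layout reserves 30 cells while only 9 values exist.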
  114. Upgrade to Elasticsearch 6.0!!

  115. 6.0 Demo

  116-122. Scenario 1 - Rolling Upgrade (diagram, seven steps)
    Monitoring Cluster: monitoring (6.0.0), localhost:5601 (6.0.0)
    es-demo cluster, localhost:5602 (5.6.4):
    • Step 1: es-5-1 (5.6.4), es-5-2 (5.6.4), es-5-3 (5.6.4)
    • Step 2: es-5-1 (5.6.4), es-5-2 (5.6.4); es-5-3 is gone
    • Step 3: es-5-1 (5.6.4), es-5-2 (5.6.4), es-6-3 (6.0.0)
    • Step 4: es-5-1 (5.6.4), es-6-3 (6.0.0); es-5-2 is gone
    • Step 5: es-5-1 (5.6.4), es-6-2 (6.0.0), es-6-3 (6.0.0)
    • Step 6: es-6-2 (6.0.0), es-6-3 (6.0.0); es-5-1 and localhost:5602 are gone
    • Step 7: es-6-1 (6.0.0), es-6-2 (6.0.0), es-6-3 (6.0.0), now with localhost:5603 (6.0.0)
  123. Scenario 2 - Sparse Doc Values (diagram)
    • es-demo-5: es-5-1 (5.6.4), es-5-2 (5.6.4), es-5-3 (5.6.4), localhost:5602 (5.6.4)
    • es-demo-6: es-6-1 (6.0.0), es-6-2 (6.0.0), es-6-3 (6.0.0), localhost:5603 (6.0.0)
    • _reindex between es-demo-5 and es-demo-6
    • Monitoring Cluster: monitoring (6.0.0), localhost:5601 (6.0.0)