Scaling Elasticsearch in production + Shield

Antonio Bonuccelli, Support Engineer
21/05/2015

www.elastic.co Copyright Elastic 2015. Copying, publishing and/or distributing without written permission is strictly prohibited.

Intro
• Antonio Bonuccelli
 - Support engineer
 - Joined September 2014
 - Interested in all things security
• Elastic
 - Founded in 2012
 - Distributed company
 - Mission: getting immediate actionable insight from data
 - Open source: Elasticsearch, Logstash, Kibana
 - Commercial: Marvel, Shield, Watcher
 - More to come

How the story begins
• Just download the latest version and install it on the first available server
• Don’t change any defaults (good!)
• Install Logstash, parse/process/enrich some data and send it to ES
• Let your application talk to ES
   => Start sending indexing and search requests to your single-node cluster
• Install Kibana to visualise data and make management or ops happy
   => Run even more search operations

What happens next
• The number of indices and/or the amount of data grows
• The number of queries per second grows
• Each index comes with a cost (disk space, file handles, memory)

(Index cost)

Index cost - Fielddata
• Fielddata is loaded into memory lazily (by default), the first time a query sorts, aggregates, or applies certain filters (e.g. geolocation) on a field
• Values for all documents are loaded, not just those matching your query, including documents of other types
• The fielddata cache is not a transient cache
• It is expensive to load, so loading is done once; evictions are very costly performance-wise
• indices.fielddata.cache.size is unbounded by default
• Fielddata is usually the major offender for memory consumption in an Elasticsearch cluster

Index cost - Fielddata
• Monitoring fielddata
 - Per node: GET /_nodes/stats/indices/fielddata?fields=*
 - Per index: GET /_stats/fielddata?fields=*
 - Per node/index: GET /_nodes/stats/indices/fielddata?level=indices&fields=*

Index cost - Fielddata
 GET _nodes/stats/indices/fielddata?fields=*&pretty" -s | egrep dst -A1
   "dst_ip" : {
     "memory_size_in_bytes" : 876044
   "dst_port" : {
     "memory_size_in_bytes" : 2981452
   }
 GET _nodes/stats/indices/fielddata?pretty" -s | egrep 'name|memory'
   "cluster_name" : "tony_prod_sec",
   "name" : "node1",
   "memory_size_in_bytes" : 36398680,
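Instead of grepping, the per-field numbers can be summed across nodes from the parsed JSON. The following is a minimal sketch, assuming the response has already been decoded into a Python dict shaped like the node stats output (nodes -> node id -> indices -> fielddata -> fields); the helper name `fielddata_by_field` and the sample node ids are illustrative, not part of any Elasticsearch client.

```python
from collections import defaultdict


def fielddata_by_field(nodes_stats: dict) -> dict:
    """Sum fielddata memory per field across all nodes of a decoded
    /_nodes/stats/indices/fielddata?fields=* response (assumed shape)."""
    totals = defaultdict(int)
    for node in nodes_stats.get("nodes", {}).values():
        fields = node.get("indices", {}).get("fielddata", {}).get("fields", {})
        for field_name, stats in fields.items():
            totals[field_name] += stats.get("memory_size_in_bytes", 0)
    return dict(totals)


# Hypothetical two-node response reusing the per-field sizes shown above.
SAMPLE = {
    "cluster_name": "tony_prod_sec",
    "nodes": {
        "node-id-1": {
            "name": "node1",
            "indices": {"fielddata": {"fields": {
                "dst_ip": {"memory_size_in_bytes": 876044},
                "dst_port": {"memory_size_in_bytes": 2981452},
            }}},
        },
        "node-id-2": {
            "name": "node2",
            "indices": {"fielddata": {"fields": {
                "dst_ip": {"memory_size_in_bytes": 1000},
            }}},
        },
    },
}

if __name__ == "__main__":
    # Print the heaviest fields first.
    for name, size in sorted(fielddata_by_field(SAMPLE).items(),
                             key=lambda item: item[1], reverse=True):
        print(f"{name}: {size} bytes")
```

Sorting by size makes the fields worth moving to doc values stand out first.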

Index cost - Fielddata
[Screenshot: Marvel view]

Index cost - Fielddata - What can I do?
• Use doc values instead (will be the default from 2.0)
• Old data is never evicted while the cache is unbounded (the default); set a limit so that least recently used entries are evicted when new data needs to be loaded, keeping only the most recently accessed data:
   indices.fielddata.cache.size: 40% (default: unbounded)
• The circuit breaker estimates the size of the fielddata needed by the fields in a query before actually loading it into memory:
   indices.breaker.fielddata.limit (default: 60%)
• Keep indices.breaker.fielddata.limit > indices.fielddata.cache.size
• Add more nodes (memory)
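The relationship between the two settings is easy to get backwards, so a small sanity check can help before rolling out a config change. This is a minimal sketch, assuming both settings are expressed as percentages of the heap; the function names are hypothetical helpers, not an Elasticsearch API.

```python
def parse_percent(value: str) -> float:
    """Convert a percentage setting such as '40%' into a fraction (0.4)."""
    value = value.strip()
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage like '40%', got {value!r}")
    return float(value[:-1]) / 100.0


def fielddata_settings_ok(cache_size: str, breaker_limit: str) -> bool:
    """Return True when the circuit breaker limit is strictly above the
    cache size, so the cache can start evicting before the breaker trips."""
    return parse_percent(breaker_limit) > parse_percent(cache_size)


if __name__ == "__main__":
    # Values from the slide: cache at 40%, breaker at its 60% default.
    print(fielddata_settings_ok(cache_size="40%", breaker_limit="60%"))
    # A cache limit above the breaker limit would never be reached.
    print(fielddata_settings_ok(cache_size="70%", breaker_limit="60%"))
```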

Index cost - Filter Cache
• Almost all filters are cached in memory
• Caching can be disabled on a per-filter basis
• Filters don’t score documents, they simply include or exclude. This is done through bitsets => arrays of 1s and 0s that tell Elasticsearch whether a document matches (or not)
• indices.cache.filter.size defaults to 10% (LRU)
• Watch out for constant filter evictions

Index cost - Filter Cache
[Screenshot: Marvel view]

Index cost - Filter Cache - What can I do?
• Choose carefully what you want to cache
   {
     "range" : {
       "timestamp" : {
         "gt" : "2014-01-02 16:15:14"
       },
       "_cache": false
     }
   }
• Tune indices.cache.filter.size
• Add more nodes/memory

Index cost - Segments
• An index is made of shards
• Shards are made of segments
• Segments use file handles (and some memory)
• In general, more shards means more resources in use
 GET _nodes/stats?pretty" -s | egrep -A8 segments
   "segments" : {
     "count" : 367,
     "memory_in_bytes" : 47295014,
     "index_writer_memory_in_bytes" : 164560,
     "index_writer_max_memory_in_bytes" : 446587695,
     "version_map_memory_in_bytes" : 720,
     "fixed_bit_set_memory_in_bytes" : 145080
   },

Index cost - Takeaways
• Do you need all your indices open and searchable? => (use Curator to manage retention and more)
• These considerations apply equally to a one-node cluster and a 75-node cluster
• The suggestions provided make better use of the resources you have, but will not necessarily keep up with your data growth if the hardware stays the same
• You might still need to add nodes/memory

(back to) What happens next
• The number of indices and/or the amount of data grows
• The number of queries per second grows
• Each index comes with a cost (disk space, file handles, memory)
• So do queries, and they can have an impact if poorly written
• As the load grows, you eventually run into performance degradation first
• Then:

What happens next
• Data node node-23 runs a long old-generation GC:
   [2015-03-05 20:14:48,199][WARN ][monitor.jvm ] [node-23] [gc][old][2423316][176712] duration [45.8s], collections [1]/[47.3s], total [20.8s]/[10.1h], memory [28.3gb]->[26.5gb]/[29.8gb], all_pools {[young] [70.6mb]->[188.1mb]/[865.3mb]}{[survivor] [53.6mb]->[0b]/[108.1mb]}{[old] [28.2gb]->[26.3gb]/[28.9gb]}
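Lines like this one can be picked out of a log automatically. The sketch below is a rough illustration, assuming the log format matches the sample above; the regular expression and the helper name `parse_gc_line` are made up for this example and are not part of Elasticsearch.

```python
import re

# Multipliers for the units that appear in the sample line (assumed set).
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
DURATION_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}

GC_PATTERN = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\].*?"
    r"duration \[(?P<dur>[\d.]+)(?P<dur_unit>ms|s|m)\].*?"
    r"memory \[(?P<before>[\d.]+)(?P<before_unit>[kmg]?b)\]->"
    r"\[(?P<after>[\d.]+)(?P<after_unit>[kmg]?b)\]/"
    r"\[(?P<heap>[\d.]+)(?P<heap_unit>[kmg]?b)\]"
)


def parse_gc_line(line: str):
    """Extract generation, pause duration and post-GC heap usage from a
    monitor.jvm log line; return None if the line does not match."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None

    def size(name: str) -> float:
        return float(match.group(name)) * UNIT_BYTES[match.group(name + "_unit")]

    return {
        "generation": match.group("gen"),
        "duration_s": float(match.group("dur")) * DURATION_SECONDS[match.group("dur_unit")],
        "heap_used_after": size("after") / size("heap"),
    }


SAMPLE = (
    "[2015-03-05 20:14:48,199][WARN ][monitor.jvm ] [node-23] "
    "[gc][old][2423316][176712] duration [45.8s], collections [1]/[47.3s], "
    "total [20.8s]/[10.1h], memory [28.3gb]->[26.5gb]/[29.8gb]"
)

if __name__ == "__main__":
    print(parse_gc_line(SAMPLE))
```

In the sample, the heap is still close to full after a 45.8-second collection, which is the pattern to watch for: long pauses that reclaim very little.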

What happens next
• The master node (master-node) tries to reconnect to an unresponsive node 3 times, 30 seconds each, then removes it from the cluster:
   [2015-03-05 20:13:59,199][INFO ][cluster.service ] [master-node] removed {[node-23][-PkRFxhrRsAyyrZMscCMHw][node-23][inet[/192.168.0.23:9300]]{master=true},}, reason: zen-disco-master_failed ([node-23][-PkRFxhrRsAyyrZMscCMHw][node-23][inet[/192.168.0.23:9300]]{master=true})

What happens next
• And what if the currently elected master node is also serving queries, and it’s the one experiencing a long old-generation GC?

Architecting to scale
• You’ve reached the limit of your single-node deployment
• You’re looking to add more nodes as you want to allow more data to be indexed and
• What should you do, and how?

Architecting to scale
[Diagram: a single ES node forming the cluster and performing all functions]
• Your node - functions:
 - Indexing: node.data: true
 - Querying: comes for free
 - Cluster master: node.master: true

Architecting to scale
• It is very common to start with one node performing all the functions
• Each function maps to a role
• Role separation allows for horizontal scaling and growth

Architecting to scale
[Diagram: the functions of a single ES node (cluster master, indexing, querying) mapped to roles]
• Cluster master: node.master: true, node.data: false
• Data node: node.master: false, node.data: true
• Client node: node.master: false, node.data: false

Architecting to scale - Definitions
• Cluster state: “knowledge bundle” -> index mappings, routing table, shard locations...
• Role definitions:
 - (Elected) master node: coordinates the cluster, is the only node able to apply changes to the cluster state, and publishes the updated cluster state to all nodes
 - Data node: performs indexing, can allocate shards locally, knows the cluster state
 - Client node: does not perform indexing or allocate shards locally, knows the cluster state
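The mapping from the two settings to a role can be written down as a tiny lookup. This is only an illustrative sketch: `node_role` is a hypothetical helper, and the role labels are the ones used in these slides.

```python
def node_role(master: bool, data: bool) -> str:
    """Name the role implied by node.master / node.data."""
    if master and data:
        # Both settings default to true: one node performs every function.
        return "general purpose (master-eligible + data)"
    if master:
        return "dedicated master-eligible"
    if data:
        return "data"
    return "client"


if __name__ == "__main__":
    for master in (True, False):
        for data in (True, False):
            print(f"node.master: {master}, node.data: {data} -> {node_role(master, data)}")
```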

Architecting to scale
[Diagram: cluster with one cluster master, one client node and three data nodes]

Architecting to scale
• Separating the roles brings great benefits:
 - The currently elected master is no longer subject to high memory consumption => long old-generation GC => freezing up
 - Data nodes can scale horizontally to a great extent: reads, writes, total cluster heap size
 - Memory used for running expensive queries with sorting/aggregations is offloaded from the data nodes to the client node

Architecting to scale
[Diagram: cluster with one cluster master, one client node and three data nodes] Looks good?
[Diagram: the same cluster, with the cluster master labelled "Dedicated"]

Architecting to scale
[Diagram: the single dedicated cluster master is gone; the remaining nodes are left with "???" and one is "Feeling lonely tonight"]
• No masters available - do nothing, depending on discovery.zen.no_master_block

Architecting to scale
• We have a dedicated master doing nothing other than coordinating the cluster, great!
• However, we now have only one, so we have introduced a single point of failure
• We need high availability

Architecting to scale
[Diagram: a second cluster master is added to the cluster]

Architecting to scale
• Great, we now have high availability
• However, we now have the potential for a cluster “split brain”

Architecting to scale
[Diagram: cluster with two cluster masters]
• discovery.zen.minimum_master_nodes = 1 (default)
• All nodes see at least one master -> split brain

Architecting to scale
• Split brain will leave you with 2 different clusters, most likely containing different data sets
• discovery.zen.minimum_master_nodes -> sets the minimum number of master-eligible nodes a node should "see" in order to win a master election. It must be set to a quorum of your master-eligible nodes -> N/2 + 1 (integer division)
• So what if we set discovery.zen.minimum_master_nodes to N/2 + 1?
   => 2/2 + 1 = 2

Architecting to scale
[Diagram: cluster with two cluster masters]
• discovery.zen.minimum_master_nodes = 2/2 + 1 = 2
• Quorum not reached - do nothing

Architecting to scale
• It is best to have at least 3 master-eligible nodes
• This protects against both split brain and the cluster inoperability seen in the scenario with only 2 masters
• 3 dedicated masters allow growing up to 50+ data nodes
• You *can* run small clusters using general-purpose nodes (master/data/client)
• But you will be exposed to problems should the currently elected master hit long old-generation GC pauses
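The quorum arithmetic behind the three-master recommendation can be sketched in a few lines. The function names below are illustrative only; the formula is the N/2 + 1 rule with integer division.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: N/2 + 1 using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible: quorum {minimum_master_nodes(n)}, "
              f"tolerates {tolerated_master_failures(n)} failure(s)")
```

With 2 master-eligible nodes the quorum is 2, so losing either one stops the cluster; with 3 the quorum is still 2, so one can fail without losing the ability to elect a master.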

Architecting to scale
[Diagram: a third cluster master is added to the cluster]
• discovery.zen.minimum_master_nodes = 3/2 + 1 = 2 (integer division)
• A master that cannot see a quorum: quorum not reached - do nothing
• Replicate data

Architecting to scale
• What about Kibana?
• It is an HTTP REST client!

Architecting to scale
[Diagram: Kibana connecting to a cluster with three cluster masters, one client node and three data nodes]
• The node Kibana sends its requests to:
 - will act as coordinating node for each search
 - will perform intense in-memory sorting/aggregation
 - pointing Kibana anywhere other than the client node voids the benefit of having a dedicated client node
• The client node will load-balance search requests across all nodes

Sharding
• There is no magic formula to calculate the number of shards
• The next slides implicitly refer to *primary* shards within *one* index
• There are considerations to make:
 - A single shard is a fully stand-alone, operable Lucene index
 - Two or more shards sitting on the same host share the hardware resources among themselves
 - Remember “Index cost -> Segments”

Sharding
• More considerations:
 - Shards do have a cost
 - If you have a cluster with 1 data node and create a new index, 10 primaries will probably be excessive
 - If you have 10 servers, then 10 primaries is fine: each shard will use dedicated hardware, which effectively means more concurrent operations and higher throughput
 - Thread pools are tied to the number of CPU cores, not the number of shards (e.g. search => 3x # of cores)
 - Focus on the primary shards/nodes ratio instead

Sharding
• Some tests using:
 - i7 3630QM CPU (2.4 GHz)
 - 16 GB RAM
 - Windows 8 64-bit (though try to use Linux, please!)
 - SSD for the OS, HDD for Elasticsearch
• Source: http://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/

Sharding
• Test - 1 host: indexing speed by # of shards/docs
[Chart: indexing speed by number of shards and documents]

Sharding
• Test - 1 host: query speed by # of shards/docs/users
[Chart: query speed by number of shards, documents and users]

Sharding
• So is the rule to use one primary shard per server?
• No
• However, it is a good starting point for sizing
• At the end of the day it depends heavily on how many shards a single server can handle without performance degrading because of excessive sharding overhead

Sharding
• You have a super powerful server with super fast SSDs? Try more than one shard per node, then
• Another parameter to take into account: in case of shard relocation, how long would it take to transfer one shard from one node to another within your network?
• Rule: only testing with real hardware, real data, real mappings, real queries and a real # of users can find the sweet spot!
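The relocation question is simple arithmetic: shard size divided by the throughput actually available between two nodes. The sketch below uses made-up numbers purely for illustration; the function name is hypothetical, and the effective throughput should come from measurements on your own network, including any recovery throttling configured on the cluster.

```python
def relocation_seconds(shard_size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the time to copy one shard between nodes.

    shard_size_gb: size of the shard on disk, in GB (1 GB = 1024 MB here).
    throughput_mb_per_s: effective node-to-node transfer rate in MB/s.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return shard_size_gb * 1024 / throughput_mb_per_s


if __name__ == "__main__":
    # Hypothetical example: a 50 GB shard at an effective 100 MB/s.
    seconds = relocation_seconds(50, 100)
    print(f"about {seconds / 60:.1f} minutes per shard")
```

Multiplying the result by the number of shards on a node gives a rough idea of how long it takes to drain or repopulate that node.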

Sharding
• The # of primary shards can’t be changed today for an existing index. What if I want to add more nodes?
• If you use time-based indices it is easy: just change the # of primary shards in your new indices
• If you have logical indices:
 - reindex into a new index with different sharding (Logstash, stream2es CLI, reindex API)
 - overallocate to allow for growth
 - wait for Elasticsearch 2.0 (out-of-the-box reindexing)

Shield - Secure your cluster
• User authentication (esusers, LDAP, more to come)
• Fine-grained permissions and ACLs
• Encrypted communications over the HTTP/transport protocols
• Enforce data integrity
• Audit who is doing what
• More

Shield - Secure your cluster
• Get started!
 - bin/plugin -i elasticsearch/license/latest
 - bin/plugin -i elasticsearch/shield/latest
 - bin/elasticsearch
 - bin/shield/esusers useradd es_admin -p password -r admin
 - curl -u es_admin:password -XGET 'http://localhost:9200/'

Shield - Secure your cluster
• Preconfigured roles out of the box for the entire stack in $ES_HOME/config/shield/roles.yml
• <role_name>:
   cluster: <comma-separated cluster privileges>
   indices: <comma-separated index privileges>
  (the * wildcard can be used in index names)

Shield - Secure your cluster
• admin:
   cluster: all
   indices:
     '*': all
• marvel_agent:
   cluster: indices:admin/template/get, indices:admin/template/put
   indices:
     '.marvel-*': indices:data/write/bulk, create_index
• Roles for all components (~/config/shield/roles.yml)

Shield - Secure your cluster
• Cluster action privileges
 - cluster:admin/nodes/restart
 - cluster:admin/nodes/shutdown
 - cluster:admin/repository/delete
 - cluster:admin/repository/get
 - cluster:admin/repository/put
 - cluster:admin/repository/verify
 - cluster:admin/reroute
 - cluster:admin/settings/update
 - cluster:admin/snapshot/create
 - cluster:admin/snapshot/delete
 - cluster:admin/snapshot/get
 - cluster:admin/snapshot/restore
 - cluster:admin/snapshot/status
 - cluster:admin/plugin/license/get
 - cluster:admin/plugin/license/delete
 - cluster:admin/plugin/license/put
 - cluster:admin/indices/scroll/clear_all
 - cluster:admin/analyze
 - cluster:admin/shield/realm/cache/clear
 - cluster:monitor/health
 - cluster:monitor/nodes/hot_threads
 - cluster:monitor/nodes/info
 - cluster:monitor/nodes/stats
 - cluster:monitor/state
 - cluster:monitor/stats
 - cluster:monitor/task
 - indices:admin/template/delete
 - indices:admin/template/get
 - indices:admin/template/put
• Index action privileges
 - indices:admin/aliases
 - indices:admin/aliases/exists
 - indices:admin/aliases/get
 - indices:admin/analyze
 - indices:admin/cache/clear
 - indices:admin/close
 - indices:admin/create
 - indices:admin/delete
 - indices:admin/exists
 - indices:admin/flush
 - indices:admin/get
 - indices:admin/mapping/delete
 - indices:admin/mapping/put
 - indices:admin/mappings/fields/get
 - indices:admin/mappings/get
 - indices:admin/open
 - indices:admin/optimize
 - indices:admin/refresh
 - indices:admin/settings/update
 - indices:admin/shards/search_shards
 - indices:admin/types/exists
 - indices:admin/validate/query
 - indices:admin/warmers/delete
 - indices:admin/warmers/get
 - indices:admin/warmers/put
 - indices:monitor/recovery
 - indices:monitor/segments
 - indices:monitor/settings/get
 - indices:monitor/stats
 - indices:monitor/status

Shield - Secure your cluster
• If basic authentication is not enough
• Get certificates signed by your trusted CA and enable SSL on the HTTP and/or transport protocols
• ./bin/elasticsearch …
   --shield.transport.ssl=true => internode communication and transport clients
   --shield.http.ssl=true => HTTP REST clients, e.g. curl -u user:password https://…
[Diagram: the cluster with Kibana and a Java app connecting to it]

Shield - Secure your cluster
• Auditing
• In elasticsearch.yml -> shield.audit.enabled: true

Shield - Secure your cluster
• Track unintended/malicious use
• curl -XDELETE -u frankie:password -k -s https://localhost:9200/production_data
• Unless the user is mapped to a role with the "indices:admin/delete" permission, this will be rejected:
   {"error":"AuthorizationException[action [indices:admin/delete] is unauthorized for user [frankie]]","status":403}

Shield - Secure your cluster
• In any case, with auditing enabled, the audit log shows:
   [2015-05-19 23:34:19,393] [node1] [transport] [access_denied] origin_type=[rest], origin_address=[/workstation232:36186], principal=[frankie], action=[indices:admin/delete], indices=[production_data]

Shield - Secure your cluster
• IP/hostname filtering, in elasticsearch.yml
• Allow only 192.168.0.1 in 192.168.0.0/24
   shield.transport.filter.allow: "192.168.0.1"
   shield.transport.filter.deny: "192.168.0.0/24"
• Allow only 82.13.121.12 and my.host.com
   shield.transport.filter.allow: [ "82.13.121.12", "my.host.com" ]
   shield.transport.filter.deny: _all
• IPv6 support
   shield.transport.filter.allow: "2001:0db8:1234::/48"
   shield.transport.filter.deny: "1234:0db8:85a3:0000:0000:8a2e:0370:7334"
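The first example only works if allow rules win over deny rules. The sketch below illustrates that evaluation order with Python's standard `ipaddress` module. It is an assumption-laden model, not Shield's implementation: it assumes allow rules are checked first, that an address matching no rule is allowed, and it handles only IP addresses, CIDR ranges and `_all` (hostnames such as my.host.com are out of scope).

```python
import ipaddress


def rule_matches(addr, rule: str) -> bool:
    """True if the address falls under a single rule (IP, CIDR or _all)."""
    if rule == "_all":
        return True
    # A bare address becomes a /32 (or /128) network; a version mismatch
    # between address and network simply evaluates to False.
    return addr in ipaddress.ip_network(rule, strict=False)


def is_allowed(client_ip: str, allow: list, deny: list) -> bool:
    """Model of allow-before-deny filtering (assumed evaluation order)."""
    addr = ipaddress.ip_address(client_ip)
    if any(rule_matches(addr, rule) for rule in allow):
        return True
    if any(rule_matches(addr, rule) for rule in deny):
        return False
    return True  # assumption: no matching rule means the connection is allowed


if __name__ == "__main__":
    allow, deny = ["192.168.0.1"], ["192.168.0.0/24"]
    for ip in ("192.168.0.1", "192.168.0.7", "10.1.2.3"):
        print(ip, "->", "allowed" if is_allowed(ip, allow, deny) else "denied")
```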

Shield - Secure your cluster
• Shield reference
 - Docs: https://www.elastic.co/guide/en/shield/current/index.html
 - Recorded webinar: https://www.elastic.co/webinars/shield-securing-your-data-in-elasticsearch

Questions?

Thank you! www.elastic.co Antonio Bonuccelli  @nellicus
