permission is strictly prohibited

Intro
• Antonio Bonuccelli
  – Support engineer
  – Joined September 2014
  – Interested in all things security
• Elastic
  – Founded in 2012
  – Distributed company
  – Mission: getting immediate actionable insight from data
  – Open-source: Elasticsearch, Logstash, Kibana
  – Commercial: Marvel, Shield, Watcher
  – More to come
How the story begins
• Just download the latest version and install it on the first available server
• Don’t change any defaults (good!)
• Install Logstash, parse/process/enrich some data and send it to ES
• Let your application talk to ES
  => Start throwing indexing and search requests at your single-node cluster
• Install Kibana to visualise data and make management or ops happy
  => Throw even more search operations at it
What happens next
• The number of indexes and/or the amount of data grows
• The number of queries per second grows
• Each index comes with a cost (disk space, file handles, memory)
Index cost - Fielddata
• Fielddata is lazily loaded into memory (by default) the first time queries with sorting, aggregations or certain filters (e.g. geo-location) are run
• Values for all documents are loaded, not just for the ones matching your query, including documents of other types
• The fielddata cache is not a transient cache
• It is expensive to load, so loading is done once; evictions are very costly performance-wise
• indices.fielddata.cache.size is unbounded by default
• Fielddata is usually the major offender for memory consumption in an Elasticsearch cluster
Index cost - Fielddata
• Monitoring fielddata
  – Per node: GET /_nodes/stats/indices/fielddata?fields=*
  – Per index: GET /_stats/fielddata?fields=*
  – Per node/index: GET /_nodes/stats/indices/fielddata?level=indices&fields=*
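A minimal Python sketch of how the per-node response could be summarised. The sample response is invented for illustration and trimmed to the two fields the sketch reads (memory_size_in_bytes, evictions); node ids, names and numbers are hypothetical.

```python
import json

# Hypothetical, trimmed response of GET /_nodes/stats/indices/fielddata?fields=*
SAMPLE = json.loads("""
{
  "nodes": {
    "id-1": {"name": "node-1",
             "indices": {"fielddata": {"memory_size_in_bytes": 734003200, "evictions": 0}}},
    "id-2": {"name": "node-2",
             "indices": {"fielddata": {"memory_size_in_bytes": 104857600, "evictions": 12}}}
  }
}
""")

def fielddata_by_node(stats):
    """Return (node name, fielddata in MB, evictions), largest consumer first."""
    rows = []
    for node in stats["nodes"].values():
        fd = node["indices"]["fielddata"]
        rows.append((node["name"], fd["memory_size_in_bytes"] / 2**20, fd["evictions"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)

for name, mb, evictions in fielddata_by_node(SAMPLE):
    print(f"{name}: {mb:.0f} MB fielddata, {evictions} evictions")
```

Sorting by size puts the heaviest node first; a non-zero eviction count next to it is the thing to look at.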
Index cost - Fielddata - what can I do?
• Use doc values instead (will be the default from 2.0)
• By default old data is never evicted. Set a bound so that the least recently used data is evicted when new data needs to be loaded, keeping only the most recently accessed data:
  indices.fielddata.cache.size: 40% (default: unbounded)
• The circuit breaker estimates the size of the fielddata needed by the fields in a query before actually loading it into memory:
  indices.breaker.fielddata.limit (default: 60%)
• Keep indices.breaker.fielddata.limit > indices.fielddata.cache.size
• Add more nodes (memory)
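A small sketch of the last rule, assuming both settings are given as heap percentages. The helper names are made up for this example; only the setting names and the 60% default come from the slide.

```python
def parse_percent(value):
    """Turn a setting such as '40%' into the fraction 0.40."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    return float(value[:-1]) / 100.0

def fielddata_settings_ok(cache_size, breaker_limit="60%"):
    """True when indices.breaker.fielddata.limit is above
    indices.fielddata.cache.size."""
    return parse_percent(breaker_limit) > parse_percent(cache_size)

print(fielddata_settings_ok("40%"))         # cache bound below the breaker limit
print(fielddata_settings_ok("75%", "60%"))  # breaker would trip before any eviction
```

If the cache bound sits at or above the breaker limit, requests get rejected by the breaker before the LRU eviction has a chance to free anything.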
Index cost - Filter Cache
• Almost all filters are cached in memory
• Caching can be disabled on a per-filter basis
• Filters don’t score documents, they simply include or exclude. This is done through bitsets => arrays of 1s and 0s that tell Elasticsearch whether a document matches or not
• The cache size is set by indices.cache.filter.size, which defaults to 10% (LRU eviction)
• Watch out for constant filter evictions
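The bitset idea can be pictured with a few lines of Python. This is a conceptual illustration only, not how Lucene or Elasticsearch implement it; the documents and field names are invented.

```python
def filter_bitset(docs, predicate):
    """One bit per document: 1 if it matches the filter, else 0."""
    return [1 if predicate(doc) else 0 for doc in docs]

def and_bitsets(a, b):
    """Combine two cached filters: a document must match both."""
    return [x & y for x, y in zip(a, b)]

docs = [
    {"status": "error", "host": "web-1"},
    {"status": "ok", "host": "web-1"},
    {"status": "error", "host": "web-2"},
]
errors = filter_bitset(docs, lambda d: d["status"] == "error")
web1 = filter_bitset(docs, lambda d: d["host"] == "web-1")
print(errors, web1, and_bitsets(errors, web1))
```

Once a bitset is cached, reusing or combining it is cheap, which is why constant evictions hurt: the bitset has to be rebuilt from scratch each time.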
Index cost - Segments
• An index is made of shards
• Shards are made of segments
• Segments use file handles (and some memory)
• In general, more shards means more resources in use

GET _nodes/stats?pretty (output filtered with egrep -A8 segments)

    "segments" : {
      "count" : 367,
      "memory_in_bytes" : 47295014,
      "index_writer_memory_in_bytes" : 164560,
      "index_writer_max_memory_in_bytes" : 446587695,
      "version_map_memory_in_bytes" : 720,
      "fixed_bit_set_memory_in_bytes" : 145080
    },
Index cost - takeaways
• Do you need all your indexes open and searchable? => use Curator to manage retention and more
• These considerations apply equally to a one-node cluster and to a 75-node cluster
• The suggestions provided optimise resource usage with what you have, but will not necessarily scale with your data growth if the hardware stays the same
• You might still need to add nodes/memory
What happens next
• The number of indexes and/or the amount of data grows
• The number of queries per second grows
• Each index comes with a cost (disk space, file handles, memory)
• So do queries, which can have an impact if poorly written
• As the load grows you eventually run into performance degradation first
• then
What happens next
• The master node (master-node) will attempt to reconnect to an unresponsive node 3 times x 30 seconds, then kick it out of the cluster

    [2015-03-05 20:13:59,199][INFO ][cluster.service ] [master-node] removed {[node-23][-PkRFxhrRsAyyrZMscCMHw][node-23][inet[/192.168.0.23:9300]]{master=true},}, reason: zen-disco-master_failed ([node-23][-PkRFxhrRsAyyrZMscCMHw][node-23][inet[/192.168.0.23:9300]]{master=true})
What happens next
• And what if the currently elected master node is also serving queries, and it is the one experiencing long old-generation GC pauses?
Architecting to scale
• You’ve reached the limit of your single-node deployment
• You’re looking to add more nodes because you want to allow more data to be indexed and
• What to do, and how?
Architecting to scale
[Diagram: a single ES node ("your node") and its functions]
• Indexing: node.data: true
• Querying: comes for free
• Cluster master: node.master: true
Architecting to scale
• It is very common to start with one node performing all the functions
• Each function maps to a role
• Role separation allows for horizontal scaling and growth
Architecting to scale - Definitions
• Cluster state: “knowledge bundle” -> index mappings, routing table, shard locations...
• Role definitions:
  – (Elected) master node: coordinates the cluster, is the only node able to apply changes to the cluster state, and publishes the updated cluster state to all nodes
  – Data node: performs indexing, can allocate shards locally, knows the cluster state
  – Client node: does not perform indexing or allocate shards locally, knows the cluster state
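As a sketch, the three roles can be expressed with the node.master and node.data settings in elasticsearch.yml. The true/false combinations below are the usual ones for dedicated nodes in the 1.x series and are stated here as an assumption; the Python wrapper is only a convenient way to print them.

```python
# Assumed setting combinations for dedicated roles (Elasticsearch 1.x style).
ROLES = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
}

def render_yaml(role):
    """Render the elasticsearch.yml lines for a dedicated role."""
    settings = ROLES[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())

for role in ROLES:
    print(f"# dedicated {role} node")
    print(render_yaml(role))
```

A node left on the defaults (both settings true) is the all-in-one node the story started with.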
Architecting to scale
[Diagram: the functions of a single ES node (cluster master, indexing, querying) mapped to roles in a cluster: one cluster master, three data nodes and one client node]
Architecting to scale
• Separating the roles brings great benefits:
  – The currently elected master is no longer subject to high memory consumption => long old GC => freezing up
  – Data nodes can scale horizontally to a great extent: reads, writes, total cluster heap size
  – Memory used for running expensive queries with sorting/aggregations is offloaded from the data nodes onto the client node
Architecting to scale
[Diagram sequence: the same cluster with a single dedicated cluster master, three data nodes and one client node. Looks good? With the master gone ("Feeling lonely tonight") the remaining nodes are left with "???": no masters available, so they do nothing, depending on discovery.zen.no_master_block]
Architecting to scale
• We have a dedicated master doing nothing other than coordinating the cluster, great!
• However, we now have only one, so we have introduced a single point of failure
• We need high availability
Architecting to scale
[Diagram sequence: a second cluster master is added to the cluster alongside the three data nodes and the client node]
Architecting to scale
• Great, we now have high availability
• However, we now have the potential for a cluster “split brain”
Architecting to scale
[Diagram: the cluster with two cluster masters splits into two clusters]
• discovery.zen.minimum_master_nodes = 1 (default)
• All nodes see at least one master -> split brain
Architecting to scale
• Split brain will leave you with 2 different clusters, most likely containing different data sets
• discovery.zen.minimum_master_nodes -> sets the minimum number of master-eligible nodes a node must "see" in order to win a master election. It must be set to a quorum of your master-eligible nodes -> N/2 + 1
• So what if we set discovery.zen.minimum_master_nodes to N/2 + 1? => 2/2 + 1 = 2
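The quorum formula as a few lines of Python, assuming integer division for N/2 (the function name is made up for this example):

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: N/2 + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

Note that 2 and 3 master-eligible nodes give the same quorum of 2, so the third node adds fault tolerance without raising the bar.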
Architecting to scale
[Diagram: the same split with two cluster masters]
• discovery.zen.minimum_master_nodes = 2/2 + 1 = 2
• Quorum not reached - do nothing
Architecting to scale
• It is best to have at least 3 master-eligible nodes
• This protects against both split brain and the cluster inoperability seen in the scenario with only 2 masters
• 3 dedicated masters allow growing to 50+ data nodes
• You *can* run small clusters using general-purpose nodes (master/data/client)
• But you will be exposed to problems should the currently elected master hit long old GC
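The three scenarios (2 masters with the default, 2 masters with a quorum, 3 masters with a quorum) can be walked through with a toy model. It only counts how many master-eligible nodes each side of a network partition can see; side names are arbitrary and nothing here talks to a real cluster.

```python
def sides_electing_master(partition, minimum_master_nodes):
    """Given how many master-eligible nodes each side of a network
    partition can see, return the sides able to elect a master."""
    return [side for side, visible in partition.items()
            if visible >= minimum_master_nodes]

# Two masters, default setting of 1: both sides elect a master (split brain).
print(sides_electing_master({"A": 1, "B": 1}, 1))
# Two masters, quorum of 2: neither side can elect one (cluster inoperable).
print(sides_electing_master({"A": 1, "B": 1}, 2))
# Three masters, quorum of 2: only the majority side carries on.
print(sides_electing_master({"A": 2, "B": 1}, 2))
```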
Architecting to scale
[Diagram sequence: a third cluster master is added; when the cluster splits, one side is left with one master and the other with two]
• discovery.zen.minimum_master_nodes = 3/2 + 1 = 2 (integer division)
• Side with one master: quorum not reached - do nothing
• Side with two masters: replicate data
Architecting to scale
[Diagram sequence: Kibana is added to the cluster of three cluster masters, three data nodes and one client node]
• Kibana connected to a node other than the client node: that node
  – will act as coordinating node for each search
  – will perform intense in-memory sorting/aggregation
  – voiding the benefit of having a dedicated client node
• Kibana connected to the client node: the client node
  – will load-balance search requests across all nodes
Sharding
• There is no magic formula to calculate the number of shards
• The next slides implicitly refer to *primary* shards within *one* index
• There are considerations to make:
  – A single shard is a complete, stand-alone Lucene index
  – Two or more shards sitting on the same host share its hardware resources
  – Remember “Index cost - Segments”
Sharding
• More considerations:
  – Shards do have a cost
  – If you have a cluster with 1 data node and create a new index, 10 primaries will probably be excessive
  – If you have 10 servers, then 10 primaries is fine: each shard uses dedicated hardware, which effectively means more concurrent operations and higher throughput
  – Thread pools are tied to the number of CPU cores, not the number of shards (e.g. search => 3x number of cores)
  – Focus on the primary shards/nodes ratio instead
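A back-of-the-envelope sketch of the two numbers mentioned above. The 3x multiplier is the figure quoted on the slide, not something checked against a particular Elasticsearch version, and the 8-core machine is hypothetical.

```python
def shards_per_node(primaries, data_nodes):
    """Primary shards of one index per data node."""
    return primaries / data_nodes

def search_threads(cpu_cores):
    """Search thread pool size as quoted on the slide: 3 x number of cores."""
    return 3 * cpu_cores

print(shards_per_node(10, 1))   # 10 primaries competing for one machine
print(shards_per_node(10, 10))  # one primary per machine
print(search_threads(8))        # same pool size however many shards the node holds
```

The thread pool does not grow with the shard count, so piling shards onto one node only makes them queue for the same threads.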
Sharding
• Some tests using:
  – i7 3630QM CPU (2.4 GHz)
  – 16 GB RAM
  – Windows 8 64-bit (though please try to use Linux!)
  – SSD for the OS, HDD for Elasticsearch
• Source: http://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/
Sharding
• So is the rule to use one primary shard per server?
• No
• However, it is a good starting point for sizing
• At the end of the day it depends heavily on how many shards a single server can handle without performance degrading because of excessive sharding overhead
Sharding
• You have a super powerful server with super fast SSDs? Then try more than one shard per node
• Another parameter to take into account: in case of shard relocation, how long would it take to transfer one shard from one node to another within your network?
• Rule: only testing with real hardware, real data, real mappings, real queries and a real number of users can find the sweet spot!
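For the relocation question, a rough estimate is simply shard size divided by sustained transfer rate. The shard size and rates below are hypothetical; real transfers also depend on recovery throttling and on whatever else the network and disks are doing.

```python
def relocation_seconds(shard_size_gb, throughput_mb_per_s):
    """Rough time to copy one shard between nodes at a sustained rate."""
    return shard_size_gb * 1024 / throughput_mb_per_s

# Hypothetical figures: a 50 GB shard at 100 MB/s and at a throttled 20 MB/s.
for rate in (100, 20):
    minutes = relocation_seconds(50, rate) / 60
    print(f"{rate} MB/s -> {minutes:.1f} minutes")
```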
Sharding
• The number of primary shards can’t be changed today for an existing index. What if I want to add more nodes?
• If you use time-based indices it is easy: just change the number of primary shards in your new indices
• If you have logical (non time-based) indexes:
  – reindex into a new index with different sharding (Logstash, stream2es CLI, reindex API)
  – overallocate to allow for growth
  – wait for Elasticsearch 2.0 (out-of-the-box reindexing)
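For time-based indices, one way to change the shard count of new indices is an index template. The sketch below builds a template body in the 1.x style (a "template" pattern plus "settings"), which is an assumption about the version in use; the pattern, the shard count and the idea of sending it with PUT /_template/<name> are illustrative.

```python
import json

def shard_template(pattern, primaries, replicas=1):
    """Body of an index template applied to new indices matching `pattern`."""
    return {
        "template": pattern,
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        },
    }

# Hypothetical: daily log indices get 6 primaries from the next index onwards.
body = shard_template("logstash-*", 6)
print(json.dumps(body, indent=2))
```

Existing indices keep their shard count; only indices created after the template is in place pick up the new value.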
Shield - Secure your cluster
• User authentication (esusers, LDAP, more to come)
• Fine-grained permissions and ACLs
• Encrypted communications over the HTTP and transport protocols
• Enforce data integrity
• Audit who is doing what
• More
Shield - Secure your cluster
• Preconfigured roles out of the box for the entire stack in $ES_HOME/config/shield/roles.yml
• <role_name>:
    cluster: <comma separated cluster_privileges>
    indices: <comma separated index privileges>
  (index names can use the * wildcard)
Shield - Secure your cluster
• If basic authentication is not enough
• Get certificates signed by your trusted CA and enable SSL on the HTTP and/or transport protocols
• ./bin/elasticsearch …
    --shield.transport.ssl=true => internode communication and transport clients
    --shield.http.ssl=true => HTTP REST clients, e.g. curl -u user:password https://…
Shield - Secure your cluster
• Track unintended/malicious use
• curl -XDELETE -u frankie:password -k -s https://localhost:9200/production_data
• Unless the user is mapped to a role with the "indices:admin/delete" permission, this will be rejected:

    {"error":"AuthorizationException[action [indices:admin/delete] is unauthorized for user [frankie]]","status":403}
Shield - Secure your cluster
• In any case, with auditing enabled, the attempt shows up in the audit log:

    [2015-05-19 23:34:19,393] [node1] [transport] [access_denied] origin_type=[rest], origin_address=[/workstation232:36186], principal=[frankie], action=[indices:admin/delete], indices=[production_data]
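Audit entries in this key=[value] layout are easy to pick apart. A minimal sketch, assuming every field follows the format of the sample line (the regular expression and function name are this example's own, not part of Shield):

```python
import re

# Audit line in the format shown on the slide (whitespace normalised).
LINE = ("[2015-05-19 23:34:19,393] [node1] [transport] [access_denied] "
        "origin_type=[rest], origin_address=[/workstation232:36186], "
        "principal=[frankie], action=[indices:admin/delete], "
        "indices=[production_data]")

# A word, an equals sign, then anything up to the closing bracket.
FIELD = re.compile(r"(\w+)=\[([^\]]*)\]")

def parse_audit_line(line):
    """Extract the key=[value] pairs of an audit log entry into a dict."""
    return dict(FIELD.findall(line))

entry = parse_audit_line(LINE)
print(entry["principal"], entry["action"], entry["indices"])
```

From here it is a short step to counting access_denied events per principal, or shipping the parsed fields back into Elasticsearch via Logstash.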