Elasticsearch: You Know, for s/Search/Operations/

elasticsearch: you know, for s/search/operations/ OpenWest 2015 Tyler Langlois

www.elastic.co Copyright Elastic 2015 Copying, publishing and/or distributing without written permission is strictly prohibited

Speaker Bio (Introduction)
• Infrastructure Engineering @ Elastic
  ◦ Previous: Qualtrics, Sandia National Laboratories, Blue Coat Systems, BYU
• Background in systems, security, *nix, smattering of different coding experience (scripting, web dev, devops)
• Happy as long as I’m automating things in a terminal
• Permanent mental bindings for vim and zsh
leothrix, tylerjl, tjll.net

What We’ll Cover (Introduction)
• Prelude: Wat is this?
• Why is this useful for Ops?
• How?
  ◦ Architecture (hardware, net, etc.)
  ◦ Security (subnets, REST, etc.)
  ◦ Data in/Data out
• What Could Go Wrong?
• Q&A

Elasticsearch in a nutshell (Prelude: What?)
• Scalable, fault-tolerant search and analytics engine
• Ideal for search, fits other cases excellently as well
• Open source, fast-moving, broad ecosystem
  ◦ kopf, paramedic, marvel, bigdesk, client libraries, etc. etc…
  ◦ Neat JS apps that run in browser and operate locally
• Has given rise to the ELK stack:
  ◦ elasticsearch for storage
  ◦ logstash for log/event processing
  ◦ kibana for visualization

What sets ES apart (Prelude: What?)
What sets ES apart as a search platform?
• Free as in beer and in speech
• Paired with logstash, nearly infinite inputs and outputs (and dead easy to extend)
• Some nice ES-specific features
  ◦ geo mapping, percolator, tribe, etc.
• Flurry of developer interest, lots of tutorials/use cases circulating
• Aside: Different data processing paradigm (pre- vs. post-) & tradeoffs (reminder: ask me about this at the end if you’re interested)

Example Use Cases (Why?)
• ELK log analytics on the cheap (widely adopted, lots of development)
  ◦ Generic logs from OS-level processes (/var/log/)
  ◦ Application logs sent through message broker or other protocol
  ◦ Hardware logs sent to syslog listeners
• Myriads of secondary use cases
  ◦ Network analytics
  ◦ Alerting - percolator
  ◦ SIEM - pipe snort events, etc. into elasticsearch
  ◦ srsly big data - can scale out to multiple clusters with tribe nodes

Operational Benefits (Why?)
• Kibana gives users the power to query directly without bothering ops, who we all know are already angry enough at computers
• Unstructured/schema-less documents (paired with type mappings) mean you can be even more hands-off about data ingress
• Less friction between dev and ops = happiness
• No $/byte charges means power to log everything, forever
• Data lifecycle can be highly customized for graceful retirement & retention
• Native clustering and elasticity means scaling is dead easy
• Ops eye candy: a look at kopf, bigdesk, and paramedic
  ◦ https://github.com/tylerjl/vagrant-elk-box

Architecture (How?)
• ES works equally well PaaS or on-premise
• PaaS/Cloud
  ◦ Remember: discovery.zen.ping.multicast.enabled: false
  ◦ If on EC2, can use the EC2 plugin for host discovery
  ◦ Use application- or OS-level RAID for a speed boost
  ◦ Don’t leave it open (CVE-2014-3120)
• On-premise
  ◦ Good network throughput, fast disks, cores, 30GB RAM
  ◦ Be aware of multicast
• Both:
  ◦ Size appropriately (RAM, disk, cores)
  ◦ Secure appropriately
  ◦ Design appropriately
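
As a minimal sketch of the multicast reminder above: the snippet below renders the two discovery lines one might place in elasticsearch.yml when multicast is unavailable. The setting discovery.zen.ping.unicast.hosts is the usual companion to the multicast setting named on the slide in Elasticsearch 1.x; the helper name and the node addresses are hypothetical and only for illustration.

```python
def render_discovery_settings(unicast_hosts):
    """Return elasticsearch.yml lines that disable multicast and list unicast peers."""
    quoted = ", ".join('"{}"'.format(host) for host in unicast_hosts)
    return "\n".join([
        # Setting named on the slide: turn off multicast discovery.
        "discovery.zen.ping.multicast.enabled: false",
        # Assumption: peers are listed explicitly instead.
        "discovery.zen.ping.unicast.hosts: [{}]".format(quoted),
    ])


# Hypothetical node addresses, for illustration only.
settings_text = render_discovery_settings(["10.0.0.11", "10.0.0.12"])
print(settings_text)
```

On EC2 the explicit host list would typically be replaced by the EC2 discovery plugin mentioned above.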

Other Operational Considerations (How?)
• Shield
  ◦ Commercial plugin (i.e. comes with a support plan)
  ◦ Pretty thoroughly vetted (pentested, been through a few releases)
  ◦ Encryption throughout, RBAC, etc. etc.
• Otherwise…
  ◦ Isolated subnet (avoid random joins)
  ◦ Sit behind proxy to catch actions (nginx?)
  ◦ Be aware of non-encrypted traffic/node chatter
  ◦ Get security req’s up-front so you can design indices/types appropriately
  ◦ Understand ES does not provide for access controls by default

Data in/Data out - Intro (How?)
• The big question, spend time designing here:
• Sources
  ◦ Application? Filesystem? Hardware devices?
• Transit
  ◦ Open internet? Local network? Cloud?
• Storage/Retrieval
  ◦ Access controls? Kibana or something else? What kind of latency/data expiry?

Data in/Data out - Sources (How?)
• ES will guess at datatypes and will do pretty well (schemaless*...)
• How about custom mappings?
  ◦ Dynamic mapping - i.e., tell ES to store every int_* field as integer, etc.
  ◦ Reindex!
• Log buffering/HA
  ◦ Fluentd: use file buffers to avoid loss
  ◦ Logstash: pull from queue while FS buffering in dev
  ◦ Both: rely on an external source for queuing, don’t want ruby being a buffer
• Data formats
  ◦ Use native JSON when possible to simplify life (parsing eats CPU)
  ◦ Grok makes this easier
  ◦ For common formats (syslog, S3 access logs) there’s community stuff available
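
The int_* idea above can be sketched as an index body with a dynamic template, in the mapping style used by Elasticsearch 1.x. This is an illustration, not the speaker's configuration: the type name "logs", the template name "ints_as_integer", and the helper functions are hypothetical, and would_match only approximates the wildcard match locally rather than calling Elasticsearch.

```python
import json
from fnmatch import fnmatchcase

INT_FIELD_PATTERN = "int_*"


def build_index_body(doc_type="logs"):
    """Build an index-creation body with one dynamic template for int_* fields."""
    return {
        "mappings": {
            doc_type: {  # hypothetical type name
                "dynamic_templates": [
                    {
                        "ints_as_integer": {  # hypothetical template name
                            "match": INT_FIELD_PATTERN,
                            "mapping": {"type": "integer"},
                        }
                    }
                ]
            }
        }
    }


def would_match(field_name):
    """Local approximation of the template's wildcard match (not ES itself)."""
    return fnmatchcase(field_name, INT_FIELD_PATTERN)


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
print(would_match("int_status"), would_match("message"))
```

Fields already indexed under a guessed type keep that type, which is why the slide pairs custom mappings with "Reindex!".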

Data in/Data out - Transit (How?)
• Open internet? SSL
  ◦ Fluentd and logstash have this
  ◦ Use some HA designs to avoid loss (i.e. archive all to S3, define multiple log endpoints)
• Enrich the data!
  ◦ GeoIP, timestamp parsing, tagging, etc.
• Log passing
  ◦ For most needs, just use native input/output plugins
  ◦ Possible to use native fluentd/lumberjack protocols
  ◦ For native application calls? Either route stdout to log files or use message broker
• Avoid memory buffering, keep data safe!
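
To make "enrich the data" concrete, here is a rough sketch of two of the enrichments listed above, timestamp parsing and tagging, applied to a single event. It only illustrates the idea and is not how logstash or fluentd implement it; the input field name "time", its format, the UTC assumption, and the enrich helper are all hypothetical. The output fields "@timestamp" and "tags" follow logstash's naming convention.

```python
from datetime import datetime, timezone


def enrich(event, tags):
    """Return a copy of the event with a parsed @timestamp and merged tags."""
    enriched = dict(event)
    # Assumption: the source timestamp is UTC in "YYYY-MM-DD HH:MM:SS" form.
    parsed = datetime.strptime(event["time"], "%Y-%m-%d %H:%M:%S")
    enriched["@timestamp"] = parsed.replace(tzinfo=timezone.utc).isoformat()
    # Merge new tags with any the event already carries, without duplicates.
    enriched["tags"] = sorted(set(event.get("tags", [])) | set(tags))
    return enriched


example = enrich(
    {"time": "2015-05-08 14:30:00", "message": "login ok"},
    ["auth", "prod"],
)
print(example)
```

Doing this in transit means the stored documents already carry a sortable timestamp and tags that can be filtered on later.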

Data in/Data out - Storage/Retrieval (How?)
• Kibana
  ◦ Either talk to local ES node or remote (local is nice for LB, but isn’t free)
  ◦ Basic auth if needed (K4 passthrough)
• Beware cluster-killers
  ◦ Huge time span facets/aggregations on analyzed fields
  ◦ Way too much resident data for cluster size
  ◦ Field lists that grow out of control (personal gripe)
• Devs will find new and creative ways to break it (don’t shoot yourself in the foot)

What Could Go Wrong?
Or, Preparing For the Worst: An Ops Tale

Unresponsiveness (What Could Go Wrong?)
• How to fix
  ◦ See following slides on OOM
  ◦ Decrease shard number - either change defaults or expire data
  ◦ Get some RAID going on, either hardware or application
  ◦ ES analytics (bigdesk, hot threads, caches)
Taking time to tweak usage patterns and data schemas will go a long way. Use doc_values, dynamic mappings.
Most often OOM, which takes us to...

OOM (What Could Go Wrong?)
• How to tell
  ◦ Unresponsive nodes, slow queries
  ◦ Tail the logs and watch it happen
• How to fix
  ◦ ES_HEAP_SIZE to 50% of RAM, max 30GB
  ◦ Make intelligent use of units of scale (shards, indices, etc.)
  ◦ Spend a day reading the guide and tune usage patterns (doc_values, analyzed versus non, decrease field count, etc.)
  ◦ Best practices will do a lot, scale out if there’s not much else to optimize
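
The heap rule of thumb above (50% of RAM, capped at 30GB) works out as follows. This is a small sketch of the arithmetic only; the function names are hypothetical, and rounding down to a whole gigabyte is an assumption made here to produce a tidy ES_HEAP_SIZE value.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=30):
    """Half of the machine's RAM, never more than the cap from the slide."""
    return min(total_ram_gb / 2, cap_gb)


def es_heap_size_value(total_ram_gb):
    """Format the recommendation for ES_HEAP_SIZE, rounded down to whole GB."""
    return "{}g".format(int(recommended_heap_gb(total_ram_gb)))


for ram_gb in (16, 64, 128):
    print(ram_gb, "GB RAM -> ES_HEAP_SIZE =", es_heap_size_value(ram_gb))
```

The other half of RAM is left for the operating system's filesystem cache, which the I/O slide returns to.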

I/O (What Could Go Wrong?)
• How to tell
  ◦ CPU iowait times
• How to fix
  ◦ Keep RAM balance 50/50 for Lucene FS caches
  ◦ RAID!
    ▪ Either hardware or application-level
    ▪ Gets you a cheap stripe, though SSDs will be easier
  ◦ Scale out for parallelized reads

Disk Space (eventually) (What Could Go Wrong?)
• How to tell
  ◦ Full disks?
  ◦ When Elasticsearch stops allocating shards to full nodes
• How to fix
  ◦ Snapshot indices to S3 and delete
  ◦ Good workflow:
    ▪ Optimize rotated indices -> close -> snapshot -> delete
  ◦ ES is space-aware and will try to keep a cluster balanced space-wise
  ◦ Alternatively, just scale out
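
One way to spot the "full nodes" condition above before Elasticsearch reacts to it is to compare a data path's usage against the disk allocation thresholds. The sketch below assumes thresholds of 85% and 90%, which mirror the defaults of cluster.routing.allocation.disk.watermark.low and .high in Elasticsearch 1.x as far as I know; check the values for your version. The function names and the path checked are hypothetical.

```python
import shutil


def disk_used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_disk(used_percent, low=85.0, high=90.0):
    """Compare usage with assumed low/high watermark thresholds."""
    if used_percent > high:
        return "above high watermark"
    if used_percent > low:
        return "above low watermark"
    return "ok"


# Hypothetical data path; point this at the node's data directory.
print(classify_disk(disk_used_percent(".")))
```

A node above the low threshold is a good cue to start the snapshot-and-delete workflow or to scale out.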

Q&A
Questions?

Additional Resources (Information)
• Elasticsearch documentation
  ◦ www.elastic.co/guide
  ◦ Elasticsearch - The Definitive Guide - for in-depth learning
  ◦ Official documentation, API docs, etc.
  ◦ Client library docs (javascript, ruby, python, java, php)
• Get involved in the ES community
  ◦ www.elastic.co/community/meetups
  ◦ SLC Meetup!
• Give feedback at: https://joind.in/talk/view/14000
