Elasticsearch: You Know, for s/Search/Operations/

Elasticsearch is a popular search and analytics engine. It can also serve as a powerful tool for operations teams, providing easy application monitoring, log collection, and self-serve dashboarding and analysis. In this presentation we'll cover some of these use cases, and how operations can provide the most reliable and performant service for stakeholders.

Tyler L

May 08, 2015

Transcript

  1. Speaker Bio (Introduction)

    www.elastic.co. Copyright Elastic 2015. Copying, publishing and/or distributing without written permission is strictly prohibited.

    • Infrastructure Engineering @ Elastic
      ◦ Previous: Qualtrics, Sandia National Laboratories, Blue Coat Systems, BYU
    • Background in systems, security, *nix, smattering of different coding experience (scripting, web dev, devops)
    • Happy as long as I’m automating things in a terminal
    • Permanent mental bindings for vim and zsh

    leothrix tylerjl tjll.net
  2. What We’ll Cover (Introduction)

    • Prelude: Wat is this?
    • Why is this useful for Ops?
    • How?
      ◦ Architecture (hardware, net, etc.)
      ◦ Security (subnets, REST, etc.)
      ◦ Data in/Data out
    • What Could Go Wrong?
    • Q&A
  3. Elasticsearch in a nutshell (Prelude: What?)

    • Scalable, fault-tolerant search and analytics engine
    • Ideal for search, fits other cases excellently as well
    • Open source, fast-moving, broad ecosystem
      ◦ kopf, paramedic, marvel, bigdesk, client libraries, etc.
      ◦ Neat JS apps that run in browser and operate locally
    • Has given rise to the ELK stack:
      ◦ elasticsearch for storage
      ◦ logstash for log/event processing
      ◦ kibana for visualization
  4. What sets ES apart (Prelude: What?)

    What sets ES apart as a search platform?

    • Free as in beer and in speech
    • Paired with logstash, nearly infinite inputs and outputs (and dead easy to extend)
    • Some nice ES-specific features
      ◦ geo mapping, percolator, tribe, etc.
    • Flurry of developer interest, lots of tutorials/use cases circulating
    • Aside: Different data processing paradigm (pre- vs. post-) & tradeoffs (reminder: ask me about this at the end if you’re interested)
  5. Example Use Cases (Why?)

    • ELK log analytics on the cheap (widely adopted, lots of development)
      ◦ Generic logs from OS-level processes (/var/log/)
      ◦ Application logs sent through message broker or other protocol
      ◦ Hardware logs sent to syslog listeners
    • Myriad secondary use cases
      ◦ Network analytics
      ◦ Alerting - percolator
      ◦ SIEM - pipe snort events, etc. into elasticsearch
      ◦ srsly big data - can scale out to multiple clusters with tribe nodes
  6. Operational Benefits (Why?)

    • Kibana gives users the power to query directly without bothering ops, who we all know are already angry enough at computers
    • Unstructured/schema-less documents (paired with type mappings) mean you can be even more hands-off about data ingress
    • Less friction between dev and ops = happiness
    • No per-byte charges means power to log everything, forever
    • Data lifecycle can be highly customized for graceful retirement & retention
    • Native clustering and elasticity means scaling is dead easy
    • Ops eye candy: a look at kopf, bigdesk, and paramedic
      ◦ https://github.com/tylerjl/vagrant-elk-box
  7. Architecture (How?)

    • ES works equally well PaaS or on-premise
    • PaaS/Cloud
      ◦ Remember: discovery.zen.ping.multicast.enabled: false
      ◦ If on EC2, can use the EC2 plugin for host discovery
      ◦ Use application- or OS-level RAID for a speed boost
      ◦ Don’t leave it open (CVE-2014-3120)
    • On-premise
      ◦ Good network throughput, fast disks, cores, 30GB RAM
      ◦ Be aware of multicast
    • Both:
      ◦ Size appropriately (RAM, disk, cores)
      ◦ Secure appropriately
      ◦ Design appropriately
  8. Other Operational Considerations (How?)

    • Shield
      ◦ Commercial plugin (i.e. comes with a support plan)
      ◦ Pretty thoroughly vetted (pentested, been through a few releases)
      ◦ Encryption throughout, RBAC, etc.
    • Otherwise…
      ◦ Isolated subnet (avoid random joins)
      ◦ Sit behind proxy to catch actions (nginx?)
      ◦ Be aware of non-encrypted traffic/node chatter
      ◦ Get security requirements up-front so you can design indices/types appropriately
      ◦ Understand that ES does not provide access controls by default
  9. Data in/Data out - Intro (How?)

    • The big question, spend time designing here:
    • Sources
      ◦ Application? Filesystem? Hardware devices?
    • Transit
      ◦ Open internet? Local network? Cloud?
    • Storage/Retrieval
      ◦ Access controls? Kibana or something else? What kind of latency/data expiry?
  10. Data in/Data out - Sources (How?)

    • ES will guess at datatypes and will do pretty well (schemaless*...)
    • How about custom mappings?
      ◦ Dynamic mapping - i.e., tell ES to store every int_* field as integer, etc.
      ◦ Reindex!
    • Log buffering/HA
      ◦ Fluentd: use file buffers to avoid loss
      ◦ Logstash: pull from queue while FS buffering in dev
      ◦ Both: rely on an external source for queuing; you don’t want Ruby being a buffer
    • Data formats
      ◦ Use native JSON when possible to simplify life (parsing eats CPU)
      ◦ Grok makes this easier
      ◦ For common formats (syslog, S3 access logs) there’s community stuff available
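
The int_* dynamic mapping idea from the Sources slide could be sketched roughly as below. This is an illustration, not the speaker's configuration: the helper name `int_prefix_mapping` and the template name `ints_by_prefix` are made up here, and the `_default_` mapping type assumes a 1.x-era Elasticsearch layout that later releases changed.

```python
import json


def int_prefix_mapping(field_glob="int_*"):
    """Build a mapping body that stores fields matching field_glob as integers.

    Assumption: the "_default_" type applies the rule to every document type,
    as in 1.x-era Elasticsearch. The template name is arbitrary.
    """
    return {
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "ints_by_prefix": {
                            "match": field_glob,
                            "mapping": {"type": "integer"},
                        }
                    }
                ]
            }
        }
    }


body = int_prefix_mapping()
# The JSON below is what would be sent when creating an index.
print(json.dumps(body, indent=2))
```

A body like this only affects fields indexed after it is in place, which is why the slide pairs custom mappings with "Reindex!".
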
  11. Data in/Data out - Transit (How?)

    • Open internet? SSL
      ◦ Fluentd and logstash have this
      ◦ Use some HA designs to avoid loss (i.e. archive all to S3, define multiple log endpoints)
    • Enrich the data!
      ◦ GeoIP, timestamp parsing, tagging, etc.
    • Log passing
      ◦ For most needs, just use native input/output plugins
      ◦ Possible to use native fluentd/lumberjack protocols
      ◦ For native application calls? Either route stdout to log files or use message broker
    • Avoid memory buffering, keep data safe!
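
As a rough illustration of the "enrich the data" step, the sketch below parses a timestamp and adds tags to an event before it is shipped. In practice Logstash or Fluentd filters would do this; the `enrich` function, the `time` input field, and its format are assumptions made for the example. `@timestamp` and `tags` follow the usual Logstash field naming.

```python
from datetime import datetime, timezone


def enrich(event, tags=()):
    """Return a copy of event with a parsed @timestamp and merged tags.

    Assumption: event["time"] looks like "2015-05-08 14:03:22" and is in UTC.
    """
    enriched = dict(event)
    parsed = datetime.strptime(event["time"], "%Y-%m-%d %H:%M:%S")
    enriched["@timestamp"] = parsed.replace(tzinfo=timezone.utc).isoformat()
    # Merge existing and new tags, dropping duplicates.
    enriched["tags"] = sorted(set(event.get("tags", [])) | set(tags))
    return enriched


event = {"time": "2015-05-08 14:03:22", "message": "GET /index.html 200"}
print(enrich(event, tags=["web", "access"]))
```
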
  12. Data in/Data out - Storage/Retrieval (How?)

    • Kibana
      ◦ Either talk to local ES node or remote (local is nice for LB, but isn’t free)
      ◦ Basic auth if needed (K4 passthrough)
    • Beware cluster-killers
      ◦ Huge time span facets/aggregations on analyzed fields
      ◦ Way too much resident data for cluster size
      ◦ Field lists that grow out of control (personal gripe)
    • Devs will find new and creative ways to break it (don’t shoot yourself in the foot)
  13. What Could Go Wrong?

    Or, Preparing For the Worst: An Ops Tale
  14. Unresponsiveness (What Could Go Wrong?)

    • How to fix
      ◦ See following slides on OOM
      ◦ Decrease shard number - either change defaults or expire data
      ◦ Get some RAID going on, either hardware or application
      ◦ ES analytics (bigdesk, hot threads, caches)

    Taking time to tweak usage patterns and data schemas will go a long way. Use doc_values, dynamic mappings.

    The cause is most often OOM, which takes us to...
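
To see why "decrease shard number" helps, a back-of-the-envelope shard count can be useful. The sketch below assumes daily indices and the 1.x-era defaults of 5 primary shards and 1 replica per index. The function name and the 90-day retention figure are illustrative only.

```python
def total_shards(indices, primaries=5, replicas=1):
    """Total shards (primaries plus replica copies) across all indices."""
    return indices * primaries * (1 + replicas)


# 90 daily indices with the assumed defaults.
print(total_shards(90))               # 900
# Same retention with a single primary shard per index.
print(total_shards(90, primaries=1))  # 180
```

Both levers from the slide show up in the formula: changing defaults lowers `primaries`, and expiring data lowers `indices`.
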
  15. OOM (What Could Go Wrong?)

    • How to tell
      ◦ Unresponsive nodes, slow queries
      ◦ Tail the logs and watch it happen
    • How to fix
      ◦ ES_HEAP_SIZE to 50% of RAM, max 30GB
      ◦ Make intelligent use of units of scale (shards, indices, etc.)
      ◦ Spend a day reading the guide and tune usage patterns (doc_values, analyzed versus non, decrease field count, etc.)
      ◦ Best practices will do a lot, scale out if there’s not much else to optimize
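
The heap sizing rule on the OOM slide (50% of RAM, capped at 30GB) can be written as a small calculation. This is a sketch under that rule only; the helper name `heap_setting` and the choice to express the result in megabytes are assumptions for the example.

```python
def heap_setting(ram_mb, cap_gb=30):
    """Return an ES_HEAP_SIZE-style value: half of RAM, capped at cap_gb."""
    heap_mb = min(ram_mb // 2, cap_gb * 1024)
    return "{}m".format(heap_mb)


print(heap_setting(16 * 1024))   # 8192m  (half of 16GB)
print(heap_setting(128 * 1024))  # 30720m (capped at 30GB)
```

The other half of RAM is left to the operating system, which matters for the filesystem caches discussed on the I/O slide.
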
  16. I/O (What Could Go Wrong?)

    • How to tell
      ◦ CPU iowait times
    • How to fix
      ◦ Keep RAM balance 50/50 for Lucene FS caches
      ◦ RAID!
        ▪ Either hardware or application-level
        ▪ Gets you a cheap stripe, though SSDs will be easier
      ◦ Scale out for parallelized reads
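
One way to put a number on "CPU iowait times" on Linux is to compare two samples of the aggregate `cpu` line from /proc/stat. The sketch below works on the line text so it can be tried anywhere; the function names and sample values are invented for illustration, and tools such as iostat or top report the same figure.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    # Use the first eight counters (user, nice, system, idle, iowait, irq,
    # softirq, steal); guest time is already counted inside user time.
    old, new = cpu_times(before)[:8], cpu_times(after)[:8]
    deltas = [b - a for a, b in zip(old, new)]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


# Made-up samples taken a few seconds apart.
first = "cpu  100 0 50 800 50 0 0 0 0 0"
second = "cpu  200 0 100 1500 200 0 0 0 0 0"
print(iowait_fraction(first, second))
```

On a real node the two lines would come from reading the first line of /proc/stat twice with a short sleep in between.
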
  17. Disk Space (eventually) (What Could Go Wrong?)

    • How to tell
      ◦ Full disks?
      ◦ When Elasticsearch stops allocating shards to full nodes
    • How to fix
      ◦ Snapshot indices to S3 and delete
      ◦ Good workflow:
        ▪ Optimize rotated indices -> close -> snapshot -> delete
      ◦ ES is space-aware and will try to keep a cluster balanced space-wise
      ◦ Alternatively, just scale out
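
The retirement workflow on the Disk Space slide needs a way to decide which rotated indices are old enough to retire. The sketch below assumes Logstash-style daily index names (`logstash-YYYY.MM.DD`) and only plans the steps in the order the slide gives. It does not call Elasticsearch, and the function names and retention period are hypothetical.

```python
from datetime import date, datetime, timedelta

# Step order taken from the slide's workflow.
STEPS = ("optimize", "close", "snapshot", "delete")


def expired_indices(names, today, keep_days, prefix="logstash-"):
    """Return daily indices whose date is older than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not a rotated index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def retirement_plan(names, today, keep_days):
    """List (index, step) pairs for every expired index, in workflow order."""
    return [(name, step)
            for name in expired_indices(names, today, keep_days)
            for step in STEPS]


names = ["logstash-2015.05.01", "logstash-2015.05.07", "kibana-int"]
for index, step in retirement_plan(names, date(2015, 5, 8), keep_days=3):
    print(index, step)
```
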
  18. Information (Additional Resources)

    • Elasticsearch documentation
      ◦ www.elastic.co/guide
      ◦ Elasticsearch - The Definitive Guide - for in-depth learning
      ◦ Official documentation, API docs, etc.
      ◦ Client library docs (javascript, ruby, python, java, php)
    • Get involved in the ES community
      ◦ www.elastic.co/community/meetups
      ◦ SLC Meetup!
    • Give feedback at: https://joind.in/talk/view/14000