
5 years of running Elasticsearch in production

At today's "Search Usergroup Berlin" ([1]) I gave this talk about how we operate Elasticsearch in production here at Infopark ([2]).

In this presentation I show our Elasticsearch cluster setup and the lessons we have learned over the years.

This presentation was prepared by Anne Schulz ([3]) and me ([4]). All cat pictures without a source reference are by Anne.

[1] https://www.meetup.com/de-DE/Search-UG-Berlin/events/239101829/
[2] https://infopark.com/
[3] https://twitter.com/AnneMoneSchulz
[4] https://twitter.com/_apepper

Alexander Pepper

May 30, 2017

Transcript

1. Index Size
   • ~8 million documents
   • ~45 GB data
   • ~300 search requests/min
   • ~120 index requests/min
2. Our history with Elasticsearch
   • 2011: started with version 0.17
   • 2014: migrated to 1.x (with new setup, regular maintenance and backups)
   • 2016: migrated to 2.x
3. Cluster Location
   • Amazon Web Services (AWS)
   • Region: eu-west-1 (Ireland)
   • Using AWS Elastic Compute Cloud (EC2)
   • Managed by AWS OpsWorks
   • Not accessible via the internet
4. 3x EC2 Instances
   • r3.xlarge instance type
   • CPU: Intel Xeon 2.5 GHz
   • RAM: 30 GB
   • Hard drive: 80 GB SSD
   • OS: Amazon Linux (based on Red Hat)
5. Cluster Discovery
   • External
   • Private instances inside a Virtual Private Cloud (VPC)
   • AWS Elastic Load Balancer (ELB) - only accessible from the VPC
   • API instances have access to the ELB
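To make the discovery setup a bit more concrete: a minimal sketch of the discovery-related elasticsearch.yml settings such a three-node VPC cluster might use, written from a Chef recipe. The cluster name and private IPs are placeholders, not our actual values, and the snippet assumes Elasticsearch 2.x with unicast discovery.

    # Sketch of a recipe fragment that writes the discovery-related settings.
    # IPs and cluster name are placeholders; in a real OpsWorks setup they would
    # come from the stack configuration instead of being hard-coded.
    es_nodes = %w[10.0.1.10 10.0.1.11 10.0.1.12]

    settings = [
      'cluster.name: production-search',
      'network.host: _site_',  # bind to the private (site-local) address
      "discovery.zen.ping.unicast.hosts: [#{es_nodes.join(', ')}]"
    ]

    file '/etc/elasticsearch/elasticsearch.yml' do
      owner   'elasticsearch'
      group   'elasticsearch'
      mode    '0644'
      content settings.join("\n") + "\n"
    end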
6. VPC Pitfalls
   • Network Address Translation (NAT) instance needed
   • Disable OpsWorks auto healing (for private instances)
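Auto healing is a per-layer OpsWorks setting; here is a one-off sketch of turning it off with the AWS SDK for Ruby. The layer ID is a placeholder and the snippet is illustrative, not part of our cookbooks.

    # Sketch: disable auto healing for the Elasticsearch layer, so OpsWorks does
    # not replace a private instance it merely cannot reach.
    require 'aws-sdk'  # aws-sdk v2

    opsworks = Aws::OpsWorks::Client.new(region: 'eu-west-1')

    opsworks.update_layer(
      layer_id:            'REPLACE-WITH-LAYER-ID',  # placeholder
      enable_auto_healing: false
    )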
7. Installation
   • OpsWorks uses Chef Cookbooks
   • Comparable to Ansible and Puppet
   • Standard Cookbooks from https://supermarket.chef.io
   • Custom Cookbooks
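For the cookbook dependencies, a Berksfile is one way to combine Supermarket cookbooks with custom ones; the cookbook names below are illustrative, not our exact list.

    # Berksfile (sketch): community cookbooks from the Chef Supermarket,
    # custom cookbooks from paths inside our own repository.
    source 'https://supermarket.chef.io'

    cookbook 'java'           # community
    cookbook 'elasticsearch'  # community
    cookbook 'custom-es-backup', path: 'cookbooks/custom-es-backup'  # custom (name is a placeholder)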
8. Packaging
   • On AWS Simple Storage Service (S3):
     • Cookbooks
     • Java
     • Elasticsearch
     • Elasticsearch plugins
9. Cookbooks
   • disable swappiness
   • mount data volume
   • install Java
   • install Elasticsearch (with Monit)
   • install Elasticsearch plugins (Kibana, Marvel, Sense, etc.)
   • install backups
   • install monitoring
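As an illustration of what such recipes look like, a sketch covering the first two bullets (disable swappiness, mount the data volume). The device name and mount point are placeholders, not our exact recipe.

    # Sketch of a custom recipe fragment for two of the steps above.

    # Disable swapping now and persist the setting across reboots
    execute 'disable-swappiness' do
      command 'sysctl -w vm.swappiness=0'
      not_if  'sysctl -n vm.swappiness | grep -qx 0'
    end

    file '/etc/sysctl.d/60-elasticsearch.conf' do
      content "vm.swappiness = 0\n"
      mode    '0644'
    end

    # Mount the SSD data volume where Elasticsearch keeps its indices
    # (device and mount point are placeholders)
    directory '/data/elasticsearch' do
      recursive true
    end

    mount '/data/elasticsearch' do
      device  '/dev/xvdb'
      fstype  'ext4'
      options 'noatime'
      action  [:mount, :enable]
    end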
10. Backup Cronjob
   • Ruby script
   • backup runs only on the master node
   • daily snapshots into a repository on AWS S3
   • 30 days data retention
   • snapshots from the 1st of each month: 365 days data retention
   • data retention via S3 lifecycle rules
   • hourly incremental backups
   • current size per day: 50 GB
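A minimal sketch of what such a cronjob can look like (not our actual script): it checks that it runs on the elected master, makes sure the S3 repository exists and then takes a snapshot. Host, repository and bucket names are placeholders, the hostname comparison assumes node names equal hostnames, and S3 repositories need the cloud-aws plugin on Elasticsearch 2.x.

    #!/usr/bin/env ruby
    # Sketch of a snapshot cronjob. Repository, bucket and host are placeholders.
    require 'net/http'
    require 'json'
    require 'socket'

    ES   = URI('http://localhost:9200')
    REPO = 'daily-s3-backup'

    def request(req)
      Net::HTTP.start(ES.host, ES.port) { |http| http.request(req) }
    end

    # Run only on the elected master (assumes node.name == hostname)
    master = request(Net::HTTP::Get.new('/_cat/master?h=node')).body.strip
    exit 0 unless master == Socket.gethostname

    # Ensure the S3-backed snapshot repository exists (idempotent)
    repo      = Net::HTTP::Put.new("/_snapshot/#{REPO}", 'Content-Type' => 'application/json')
    repo.body = { type: 's3', settings: { bucket: 'example-es-backups' } }.to_json
    request(repo)

    # Snapshots are incremental: only segments not yet in S3 are uploaded
    name = Time.now.utc.strftime('snapshot-%Y-%m-%d-%H')
    res  = request(Net::HTTP::Put.new("/_snapshot/#{REPO}/#{name}?wait_for_completion=true"))
    abort "snapshot failed: #{res.body}" unless res.is_a?(Net::HTTPSuccess)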
11. Restore
   • Ruby script
   • clones the OpsWorks stack
   • starts instances
   • restores the requested backup
   • Current runtime:
     • instance boot: ~7 min
     • restore snapshot: ~22 min
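Once the cloned stack is booted, the restore itself boils down to one call against the snapshot API. A sketch with placeholder repository and snapshot names (the OpsWorks cloning part is omitted):

    #!/usr/bin/env ruby
    # Sketch: restore the requested snapshot on the freshly booted cluster.
    require 'net/http'
    require 'json'

    snapshot = ARGV.fetch(0) { abort 'usage: restore.rb <snapshot-name>' }
    uri      = URI("http://localhost:9200/_snapshot/daily-s3-backup/#{snapshot}/_restore?wait_for_completion=true")

    req      = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
    req.body = { include_global_state: false }.to_json  # restore the data indices only

    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    abort "restore failed: #{res.body}" unless res.is_a?(Net::HTTPSuccess)
    puts "restored #{snapshot}"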
12. Monitoring
   • Pingdom Server Monitoring (formerly known as Scout)
   • CPU
   • Disk space / open files
   • Memory / swap
   • Cluster status
   • Number of nodes
   • Backup ("Say cheese")
   • AWS ELB
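The Elasticsearch-specific checks (cluster status, number of nodes) come down to the cluster health API. A sketch of such a check, assuming three expected nodes and a locally reachable node to query:

    #!/usr/bin/env ruby
    # Sketch of a cluster check: status must be green and all three nodes present.
    require 'net/http'
    require 'json'

    health = JSON.parse(Net::HTTP.get(URI('http://localhost:9200/_cluster/health')))

    problems = []
    problems << "cluster status is #{health['status']}"   unless health['status'] == 'green'
    problems << "only #{health['number_of_nodes']} nodes" unless health['number_of_nodes'] == 3

    abort "CRITICAL: #{problems.join(', ')}" unless problems.empty?
    puts 'OK'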
13. Maintenance
   • Quarterly
   • Check for new versions:
     • OS
     • Cookbooks
     • Java
     • Elasticsearch
     • Plugins (Kibana, Marvel, etc.)
14. Maintenance
   • Check restore
   • Full reindex
   • For another product: snapshot restore + partial reindex
15. Pitfalls
   • Minimum Master Nodes
   • 50% of RAM for Elasticsearch
   • VPC: Network Address Translation (NAT) instance needed
   • Private VPC instances: disable OpsWorks auto healing
   • OpsWorks: start Elasticsearch via Monit
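To make the first two pitfalls concrete for this cluster (three master-eligible nodes, 30 GB RAM), here is a small worked sketch; the resulting values follow from the setup described above, but the snippet itself is illustrative.

    # Sketch: applying the two sizing rules from the pitfalls above.
    master_eligible_nodes = 3
    ram_gb                = 30

    # Quorum to avoid split brain: floor(n / 2) + 1 => 2 for three nodes
    minimum_master_nodes = master_eligible_nodes / 2 + 1

    # Give Elasticsearch at most half of the RAM; the rest is left to the
    # OS page cache that Lucene depends on.
    heap = "#{ram_gb / 2}g"  # "15g", exported as ES_HEAP_SIZE on 1.x/2.x

    puts "discovery.zen.minimum_master_nodes: #{minimum_master_nodes}"  # => 2
    puts "ES_HEAP_SIZE=#{heap}"                                         # => 15g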
16. Picture Sources
   • https://www.flickr.com/photos/sigalrm/31560595165/
   • https://www.flickr.com/photos/selda_eigler/8686009651/
   • https://www.flickr.com/photos/aon/7817771968/
   • https://www.flickr.com/photos/nathanf/2314676429/
   • https://www.flickr.com/photos/renarl/3400468165
   • https://www.flickr.com/photos/aon/6272938468/
   • https://www.flickr.com/photos/muratlivaneli/6104145120
   • https://www.flickr.com/photos/30884177@N08/4107269864/
   • https://www.flickr.com/photos/aon/7817811212/
   • https://www.flickr.com/photos/29278394@N00/4689679306/
   • https://www.flickr.com/photos/pustovit/15867520885/