
From Hackathon to Production: Elasticsearch @ Facebook

This talk was presented at the inaugural Elastic{ON} conference, http://elasticon.com

Session Abstract:

Facebook has been using Elasticsearch for more than three years, growing from simple enterprise search to over 40 tools across multiple clusters, serving 60+ million queries a day and growing. This talk covers the entire Elasticsearch journey, from a hackathon project to a self-service infrastructure used across internal tools and public production sites.

Presented by Peter Vulgaris, Facebook

Elastic Co

March 11, 2015

Transcript

  1. In The Beginning: Google Search Appliance
     ▪ Sits in a rack, takes URLs and scrapes them
     ▪ Basically a black box
     ▪ No structured data other than rendered HTML pages
     ▪ No selective boosting
     ▪ Not really the hacker way
  2. Also In The Beginning: Apache Solr
     ▪ No built-in way to handle querying multiple indices
     ▪ Manual sharding
     ▪ Verbose XML queries :(
     ▪ Built tools to handle what Elasticsearch does out of the box
  3. Why Elasticsearch? It's easy to hack on
     ▪ The same power as Lucene with a simpler REST + JSON interface
     ▪ Quick to get up and running
     ▪ Automatic replication and rebalancing
     ▪ Great community
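To make the REST + JSON point concrete, here is a minimal sketch that uses only Python's standard library. It builds, but does not send, a single search request across two indices by listing them comma-separated in the URL path. This is the multi-index querying that the Solr slide says had no built-in support. The host `http://localhost:9200` and the index names `wiki` and `tasks` are hypothetical.

```python
import json
import urllib.request


def build_search_request(host, indices, query_body):
    """Build (but do not send) a search request spanning several indices.

    Elasticsearch accepts a comma-separated list of index names in the URL
    path, so one request can search across multiple indices.
    """
    path = ",".join(indices) + "/_search"
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/{path}",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index names, for illustration only.
req = build_search_request(
    "http://localhost:9200",
    ["wiki", "tasks"],
    {"query": {"match": {"title": "deploy"}}},
)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/wiki,tasks/_search
# To actually send it: urllib.request.urlopen(req)
```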
  4. 2012 vs. 2015
     2012:
     ▪ 1 cluster (0.18)
     ▪ 1 application (internal search)
     ▪ Tens of thousands of docs
     2015:
     ▪ 7 clusters (that we know of, 1.2+)
     ▪ Dozens of applications
     ▪ 100+ nodes in multiple datacenters
     ▪ ~4 billion documents
     ▪ 4+ TB of data
     ▪ 1500+ QPS (for WWW at least)
     ▪ 1 common deployment infrastructure
     ▪ 2-3 indexing frameworks
  5. Goals for Elasticsearch Products
     ▪ Engineer comes in knowing nothing about search
     ▪ Able to index docs and add search ASAP
     ▪ Path to more advanced usage
  6. Goals for Elasticsearch Infrastructure
     ▪ Spin up a new cluster with nodes in multiple DCs in minutes, with common settings
     ▪ Survive the "storms"
     ▪ Transparent transitions for clients to new clusters
     ▪ Management, logging, alarms, etc.
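The talk does not say which settings Facebook standardized. One common ingredient of "common settings" that also helps a cluster survive network storms is a master quorum that prevents split-brain. The sketch below assumes Elasticsearch 1.x setting names (`discovery.zen.minimum_master_nodes` was removed in 7.x) and uses hypothetical cluster and node names. It renders shared `elasticsearch.yml` lines with the quorum set to a strict majority, floor(n/2) + 1, of the master-eligible nodes.

```python
def common_node_settings(cluster_name, node_name, master_eligible_nodes,
                         is_master=True, is_data=True):
    """Return a dict of shared Elasticsearch 1.x node settings.

    minimum_master_nodes is set to a quorum (a strict majority) of the
    master-eligible nodes, so a network partition cannot elect two masters.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    quorum = master_eligible_nodes // 2 + 1
    return {
        "cluster.name": cluster_name,
        "node.name": node_name,
        "node.master": is_master,
        "node.data": is_data,
        "discovery.zen.minimum_master_nodes": quorum,
    }


def render_yml(settings):
    """Render flat settings as elasticsearch.yml lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical cluster and node names.
print(render_yml(common_node_settings("search-tools", "node-1", 3)))
```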
  7. Getting Started With Elasticsearch: The Old Way
     ▪ "I don't know anything about search, but I'm tasked with adding it to my tool. Help!" -Engineer
     ▪ "Good luck!" -Me
     ▪ Lucene in Action
     ▪ elasticsearch.org
  8. Getting Started With Elasticsearch: The New Way
     ▪ Copy/paste documentation
     ▪ Sandbox environments
     ▪ Sample settings/mappings
     ▪ Indexing framework
     ▪ One config per index or type
     ▪ Scheduling, consistency, live updates, retries, etc.
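Facebook's indexing framework is not shown in the talk. The sketch below illustrates the "one config per index or type" plus "retries" idea. `IndexConfig` is a hypothetical config object. The body follows the newline-delimited `_bulk` format of the 1.x era, which still carried `_type`. `send` is an assumed callable that POSTs to `/_bulk`, and failed sends are retried with exponential backoff.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class IndexConfig:
    """Hypothetical per-index config, in the spirit of 'one config per index or type'."""
    index: str
    doc_type: str
    id_field: str = "id"
    max_retries: int = 3


def build_bulk_body(config, docs):
    """Build a newline-delimited _bulk request body (ES 1.x style, with _type)."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": config.index,
                            "_type": config.doc_type,
                            "_id": str(doc[config.id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def index_with_retries(config, docs, send, sleep=time.sleep):
    """Send one bulk body, retrying on ConnectionError with exponential backoff.

    `send` is an assumed callable that POSTs the body to /_bulk.
    """
    body = build_bulk_body(config, docs)
    for attempt in range(config.max_retries + 1):
        try:
            return send(body)
        except ConnectionError:
            if attempt == config.max_retries:
                raise
            sleep(2 ** attempt)  # waits 1s, 2s, 4s, ...
```

A real `_bulk` response can also report per-item failures even when the HTTP call succeeds. A production framework would need to inspect those items as well, which this sketch does not do.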
  9. Getting Started With Elasticsearch: The New Way, Continued
     ▪ Finding stuff is still a little wild west
     ▪ Google-y query string query
     ▪ Elastica PHP library
     ▪ This tends to be the easier bit
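A "Google-y" `query_string` query lets users type terms, phrases, and field filters into one box. The catch is that unbalanced quotes or stray syntax characters can make the query fail. The sketch below is Python rather than the Elastica PHP library the slide mentions. It builds the request body with field boosts (`title^3`) and `default_operator: AND`. It also offers an optional escape step that backslash-escapes reserved characters. Following the current Elasticsearch docs, it drops `<` and `>`, which cannot be escaped. The field names are hypothetical.

```python
# Characters with special meaning in query_string syntax.
RESERVED = set('\\+-=&|!(){}[]^"~*?:/')


def escape_query_string(text):
    """Backslash-escape reserved characters so raw user input is searched literally.

    '<' and '>' cannot be escaped in query_string syntax, so they are dropped.
    """
    out = []
    for ch in text:
        if ch in "<>":
            continue
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def query_string_search(user_input, fields, size=10, literal=False):
    """Build a 'Google-y' query_string request body over several fields."""
    text = escape_query_string(user_input) if literal else user_input
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": text,
                "fields": fields,           # e.g. ["title^3", "body"] boosts title 3x
                "default_operator": "AND",  # every term must match, like a web search box
            }
        },
    }
```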
  10. Ramping Up With Elasticsearch: Old Way vs. New Way
      ▪ No longer spending entire internships adding indexing and search to products
      ▪ More tools teams have a “cool search guy” (quote is mine)
      ▪ More adoption and spreading to product teams
      ▪ Bottom line: less time learning Elasticsearch and more time searching
  11. Intern Search: Putting It All Together
      ▪ Configs for wiki, dex, tasks, code, employees, etc.
      ▪ CTR, bounce rates and pins
      ▪ Query string query
      ▪ A/B testing
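The slide lists the ranking ingredients without saying how they were combined. The following is one possible sketch, not Facebook's method:

- **CTR:** a `function_score` query nudges scores by a hypothetical per-document `ctr` field. With `log1p` and `boost_mode: sum`, the score becomes the text score plus log10(1 + ctr), so documents with zero CTR keep their text relevance.
- **Pins:** hand-picked document IDs are moved to the top on the client side.
- **A/B testing:** users are assigned deterministically to an experiment bucket by hashing their ID.

It assumes every document carries a numeric `ctr` field.

```python
import hashlib


def ranked_query(user_query, fields, boost_by_ctr):
    """query_string search, optionally nudged by a per-document CTR field.

    Assumes every document carries a numeric `ctr` field (hypothetical name).
    """
    base = {"query_string": {"query": user_query, "fields": fields}}
    if not boost_by_ctr:
        return {"query": base}
    return {
        "query": {
            "function_score": {
                "query": base,
                # adds log10(1 + ctr) to the text score
                "field_value_factor": {"field": "ctr", "modifier": "log1p"},
                "boost_mode": "sum",
            }
        }
    }


def ab_bucket(user_id, experiment, buckets=("control", "ctr_boost")):
    """Deterministically assign a user to an experiment bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


def apply_pins(hits, pinned_ids):
    """Move hand-pinned results (by _id) to the top, in pin order."""
    by_id = {hit["_id"]: hit for hit in hits}
    pinned = [by_id[i] for i in pinned_ids if i in by_id]
    pinned_set = set(pinned_ids)
    rest = [hit for hit in hits if hit["_id"] not in pinned_set]
    return pinned + rest


# Hypothetical user, experiment and field names.
bucket = ab_bucket("user-42", "intern-search-ranking")
body = ranked_query("oncall rotation", ["title^2", "body"], boost_by_ctr=(bucket == "ctr_boost"))
```

Pins here only reorder documents that the query already returned. Pinning documents that did not match would require fetching them separately.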
  12. Single Cluster LOL
      ▪ Just one cluster for all projects
      ▪ Pros:
        ▪ Simpler migrations to new versions of Elasticsearch
        ▪ Easier to debug issues
        ▪ Faster ramp-up for other teams
      ▪ Cons:
        ▪ When the cluster goes down...
  13. Multiple Clusters, AKA Get More Sleep
      ▪ Engineers are dangerous
      ▪ Engineers don't want to worry about usage quotas
      ▪ Move fast and over-index your data
      ▪ Add search, head home and crack open a beer
      ▪ How do we scale this?
  14. Deployment: Tupperware
      ▪ Simple config
      ▪ LXC containers
      ▪ Add/remove nodes
      ▪ Health checks
      ▪ Automatic node replacement
      ▪ Log aggregation
  15. Monitoring: Scuba
      ▪ Free with config
      ▪ Watch CPU, heap, requests by endpoint, etc.
      ▪ Alarms
      ▪ Why not Marvel?
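Scuba is Facebook-internal, and its interface is not described here. The sketch below shows only the kind of heap alarm the slide mentions. It reads the parsed JSON of Elasticsearch's `GET /_nodes/stats/jvm` response, using each node's `jvm.mem.heap_used_percent`, and flags nodes at or above a threshold. The 85% threshold and the node names are assumptions.

```python
def heap_alarms(nodes_stats, threshold_pct=85):
    """Return (node name, heap %) pairs at or above the threshold.

    `nodes_stats` is the parsed JSON of GET /_nodes/stats/jvm.
    """
    alarms = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        pct = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if pct is not None and pct >= threshold_pct:
            alarms.append((node.get("name", node_id), pct))
    return sorted(alarms)


# Hypothetical, trimmed-down stats response.
sample = {"nodes": {
    "a1": {"name": "es-data-1", "jvm": {"mem": {"heap_used_percent": 91}}},
    "b2": {"name": "es-data-2", "jvm": {"mem": {"heap_used_percent": 40}}},
}}
print(heap_alarms(sample))  # [('es-data-1', 91)]
```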
  16. Multiple Datacenters: Redundancy
      ▪ Masters need low latency
      ▪ Still experimenting
      ▪ Data nodes in multiple datacenters
      ▪ Needs SHIELD-ing
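Spreading data nodes across datacenters only adds redundancy if the copies of each shard actually end up in different datacenters. Elasticsearch's shard allocation awareness handles this, keyed on a custom node attribute. In the 1.x style, that attribute is set as `node.<attr>` and listed in `cluster.routing.allocation.awareness.attributes`. The sketch below generates those settings, with forced awareness over all datacenters, and adds a check for shards whose copies all sit in one datacenter. The attribute name `dc` and the shard/node data shapes are hypothetical.

```python
def awareness_settings(dc_name, all_dcs, attr="dc"):
    """Per-node settings for datacenter-aware shard allocation (ES 1.x style).

    `dc` is a hypothetical custom node attribute name. Forced awareness stops
    Elasticsearch from crowding all copies into the surviving datacenter when
    another datacenter is unavailable.
    """
    return {
        f"node.{attr}": dc_name,
        "cluster.routing.allocation.awareness.attributes": attr,
        f"cluster.routing.allocation.awareness.force.{attr}.values": ",".join(all_dcs),
    }


def single_dc_shards(shard_copies, node_dc):
    """List shards whose copies all live in one datacenter.

    shard_copies: {(index, shard_number): [node names holding a copy]}
    node_dc:      {node name: datacenter}
    """
    return sorted(
        shard for shard, nodes in shard_copies.items()
        if len({node_dc[n] for n in nodes}) < 2
    )
```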
  17. Rebuilding Indices, AKA Disaster Recovery
      ▪ Cron job to save settings and mappings with version control
      ▪ Indexer configs can rebuild most core indices
      ▪ Product-specific fallbacks
      ▪ Full-text query on backing DB
      ▪ Watch for shard failures
      ▪ We need snapshots
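The cron job itself is not shown in the talk. A minimal sketch of its core step writes each index's settings and mappings as sorted, indented JSON, one file each, so a version-control diff between runs shows only real changes. `fetch_settings` and `fetch_mappings` are hypothetical callables standing in for `GET /<index>/_settings` and `GET /<index>/_mapping`. Committing the output directory afterwards (for example, from the same cron entry) is left out.

```python
import json
from pathlib import Path


def save_index_metadata(index_names, fetch_settings, fetch_mappings, out_dir):
    """Write each index's settings and mappings as stable, diff-friendly JSON.

    fetch_settings / fetch_mappings are assumed callables wrapping
    GET /<index>/_settings and GET /<index>/_mapping.
    """
    written = []
    for name in index_names:
        index_dir = Path(out_dir) / name
        index_dir.mkdir(parents=True, exist_ok=True)
        for kind, fetch in (("settings", fetch_settings), ("mappings", fetch_mappings)):
            path = index_dir / f"{kind}.json"
            # sort_keys + indent keep version-control diffs minimal between runs
            path.write_text(json.dumps(fetch(name), indent=2, sort_keys=True) + "\n")
            written.append(path)
    return written
```

Some settings, such as creation timestamps and UUIDs, differ between clusters. A real job might strip them before saving so that a rebuilt index does not show spurious diffs.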
  18. Migrations
      ▪ Attempt #1: 0.18 -> 0.19
        ▪ Shut down cluster, back up data, build new version and restart
        ▪ Oops, only a partial data copy
        ▪ Corruptions = complete rebuild
      ▪ Attempt #2: 0.19 -> 0.20
        ▪ Shut down cluster, update nodes and restart
        ▪ Worked great, with about an hour of downtime
  19. Live Migration!
      ▪ Attempt #3: 0.20 -> 0.90
        ▪ Lots more teams using ES now
        ▪ Built a cross-cluster replication mechanism based on elasticsearch-changes-plugin
        ▪ Live migration to new cluster...
        ▪ ...rollback to old cluster when a boosting bug was found
        ▪ Second live migration attempt a success...
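The talk does not describe the event format of elasticsearch-changes-plugin or how the replication mechanism applied events. The sketch below uses a hypothetical `Change` event shape to show the general idea behind live cross-cluster replication. Changes from the old cluster are replayed onto the new one in sequence order. The replicator remembers the last applied sequence number, so re-delivered events are skipped, and replays are safe to repeat. `apply_index` and `apply_delete` are assumed callables that write to the target cluster.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Change:
    """Hypothetical change event; not the plugin's actual format."""
    seq: int
    op: str  # "index" or "delete"
    index: str
    doc_id: str
    source: Optional[dict] = None


class Replicator:
    """Replay ordered change events from an old cluster onto a new one."""

    def __init__(self, apply_index: Callable, apply_delete: Callable):
        self.apply_index = apply_index
        self.apply_delete = apply_delete
        self.last_seq = 0  # highest sequence number applied so far

    def replay(self, changes: Iterable[Change]) -> int:
        applied = 0
        for change in sorted(changes, key=lambda c: c.seq):
            if change.seq <= self.last_seq:
                continue  # already applied, so re-delivered events are harmless
            if change.op == "index":
                self.apply_index(change.index, change.doc_id, change.source)
            elif change.op == "delete":
                self.apply_delete(change.index, change.doc_id)
            else:
                raise ValueError(f"unknown op: {change.op}")
            self.last_seq = change.seq
            applied += 1
        return applied
```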
  20. Migrations Today: Aliases for Clusters
      ▪ Run shadow traffic to new cluster
      ▪ Cluster data in sync
      ▪ Check for exceptions
      ▪ Stats for good measure
      ▪ Flip a switch when we're ready
      ▪ Can flip back too!
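"Aliases for clusters" here is a client-side routing idea, not Elasticsearch's index aliases. The sketch below is hypothetical, with `ClusterAlias` and the callables standing in for real cluster clients. It maps one logical name to a primary cluster and mirrors each search to a shadow cluster. It counts shadow exceptions and hit-count mismatches, and swaps primary and shadow when flipped, so flipping again rolls back.

```python
class ClusterAlias:
    """Route one logical name to a live cluster, optionally mirroring to a shadow.

    `primary` and `shadow` are assumed callables that take a search body
    and return the parsed search response.
    """

    def __init__(self, name, primary, shadow=None):
        self.name = name
        self.primary = primary
        self.shadow = shadow
        self.shadow_errors = 0
        self.shadow_mismatches = 0

    def search(self, body):
        result = self.primary(body)  # clients only ever see the primary's answer
        if self.shadow is not None:
            try:
                shadow_result = self.shadow(body)
                if shadow_result.get("hits", {}).get("total") != result.get("hits", {}).get("total"):
                    self.shadow_mismatches += 1
            except Exception:
                self.shadow_errors += 1  # a failing shadow must never break clients
        return result

    def flip(self):
        """Make the shadow the primary; calling flip() again rolls back."""
        if self.shadow is None:
            raise RuntimeError("no shadow cluster to flip to")
        self.primary, self.shadow = self.shadow, self.primary
```

For simplicity the shadow call runs inline here. A production router would more likely send shadow traffic asynchronously so that it adds no latency.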
  21. Shield: Seamless Security
      ▪ Shadow cluster with 1.4 and Shield
      ▪ On-the-fly HTTP -> HTTPS
      ▪ Let's try it in production!
      ▪ #yolo
      ▪ Didn't read the manual
      ▪ Now it's rock solid
      ▪ Running for weeks now
      ▪ Deploying to all clusters in H1
  22. What's Next? Upgrades
      ▪ Default HTTPS
      ▪ Automated snapshots to GlusterFS
      ▪ Role-based access control
      ▪ Not for security, but for sanity
      ▪ Wild-west cluster
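Automated snapshots to a shared filesystem such as GlusterFS map onto Elasticsearch's snapshot API. First, a repository of type `fs` is registered whose `location` is a mount point that every node can reach. Then a snapshot is created, for example once a day. The sketch below only builds the two requests, using the standard library. The host, the repository name `nightly`, and the mount path are hypothetical. Newer Elasticsearch versions also require the location to be whitelisted via `path.repo` in each node's configuration.

```python
import json
import urllib.request
from datetime import date


def register_fs_repo_request(host, repo, location):
    """PUT /_snapshot/<repo>: register a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location, "compress": True}}
    return urllib.request.Request(
        f"{host}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def daily_snapshot_request(host, repo, day):
    """PUT /_snapshot/<repo>/<name>: start a snapshot named after the date."""
    name = f"snapshot-{day.isoformat()}"  # snapshot names must be lowercase
    return urllib.request.Request(
        f"{host}/_snapshot/{repo}/{name}?wait_for_completion=false",
        method="PUT",
    )


# Hypothetical host, repository name and GlusterFS mount point.
HOST = "http://localhost:9200"
repo_req = register_fs_repo_request(HOST, "nightly", "/mnt/glusterfs/es-snapshots")
snap_req = daily_snapshot_request(HOST, "nightly", date(2015, 3, 11))
print(snap_req.get_method(), snap_req.full_url)
```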
  23. Lessons Learned the Hard Way
      ▪ One cluster is easy to manage. And easy to bring down.
      ▪ Search ranking is hard. So cheat.
      ▪ Make it easy for engineers and they will come.