
Apache Solr to Elasticsearch: The Evolution of Zendesk's Search

Elastic Co
March 18, 2015


Zendesk manages terabytes of customer data in the form of support ticket comments, user data, knowledge base content, and more. In Zendesk’s previous search architecture, based on Apache Solr 3.6 (pre-cloud), manual sharding, operational overhead, the difficulty of schema updates, and the lack of near-real-time indexing were limiting scalability, performance, and development velocity. This talk provides insights into the reasons Zendesk migrated from Apache Solr to Elasticsearch, the journey, and the lessons learned while building the new search and indexing architecture.

Presented by Stefan Will & Sameera Mahajani


Transcript

  1. The Evolution of Search at Zendesk. Stefan Will & Sameera Mahajani
  2. CC-BY-ND 4.0. Use Cases: End users, Agents

  4. 2011: 70 million tickets, 12,000 customers, 8.5 million new tickets/month, 40 million user records
  5. 2015: 1.1 billion tickets, 52,000 customers, 45 million new tickets/month, 670 million user records, 17 million queries/day
  6. Why the heck did we migrate to Elasticsearch?
  7. History: Single Shard. Rails, MySQL, Solr Core; Write Master, R/O Slave, R/O Slave
  8. History: Multiple Shards (diagram of Rails with repeated MySQL / Solr Core pairs)
  9. History: Data Size by Shard (chart)
  10. History: Multiple Clusters (diagram of Rails with many MySQL / Solr Core pairs)
  11. History: Multiple Pods (diagram of three Rails stacks, each with many MySQL / Solr Core pairs). 52k accounts, 45 Solr clusters, 133 Solr servers, 500 MySQL shards, 50 MySQL clusters, 6 pods, 3 data centers
  12. Main Goals
      • Easily deploy new search features and bugfixes
      • Be able to easily implement search support for new products
      • Minimize the amount of effort spent on configuration management and developing automation scripts
  13. Architecture

  14. Architecture: Before. Application in front of three groups, each behind a VIP and each made up of a Master, two Slaves, an Indexer, and three DBs
  15. Architecture: Before. Application in front of three shard groups: S1 (Shards 1 to 3; hosts search1, search2, search3), S2 (Shards 4 to 6; hosts search4, search5, search6), S3 (Shards 7 to 9; hosts search7, search8, search9, search10)
  16. Shard configuration (excerpt, truncated on the slide):
      shard_1:
        port: 8080
        core: shard_1
        host: solr04.pod1.ord.zdsys.com
      # Only used by shard mover
      master_shard_1:
        port: 8080
        core: shard_1
        host: search17.pod1.ord.zdsys.com
      shard_2:
        port: 8080
        core: shard_2
        host: solr04.pod1.ord.zdsys.com
      # Only used by shard mover
      master_shard_2:
        port: 8080
        core: shard_2
        host: search17.pod1.ord.zdsys.com
      shard_3:
        port: 8080
        core: shard_3
  17. Architecture: Now. Application, then a VIP, then three stacks, each with a Search Service, an ES DataNode, an Indexer, and a DB Shard
  18. zendesk_search_service: http://elasticsearch.vip.pod1.ord.zdsys.com:8085

  19. Index Structure
      • Separate indexes per type (tickets, users, groups, etc.)
      • Complex schemas, with a lot of dynamic fields
      • Accounts split into multiple sub-indexes (ticket_1, ticket_2, …)
      • Use aliases to point to current index version
      • *All* queries are filtered by account id
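The last two bullets of slide 19 (aliases and account filtering) can be illustrated with a small sketch. It only builds request bodies as plain dictionaries and talks to no cluster. The field names `account_id` and `description`, and the alias and index names in the usage note, are assumptions for illustration and do not come from the talk. The `filtered` query shown is the Elasticsearch 1.x form that was current in 2015; later versions express the same idea with a `bool` query and a `filter` clause.

```python
def build_account_query(account_id, text):
    """Wrap a full-text match in a term filter on the account id.

    Uses the Elasticsearch 1.x 'filtered' query shape. The field names
    'description' and 'account_id' are hypothetical.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {"description": text}},
                "filter": {"term": {"account_id": account_id}},
            }
        }
    }


def alias_swap_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that repoints an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, `alias_swap_actions("ticket_1", "ticket_1_v11_2", "ticket_1_v12_4")` would describe moving a hypothetical `ticket_1` alias to a newer index version, so that searches keep using the same name across a reindex.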
  21. Diagram: Database Shards (DB1, DB2, DB3); Incremental Indexers (I1, I2, I3); Reindexer (R); Elasticsearch Indexes (ticket_1, ticket_2, ticket_3, ticket_N)
  22. Monitoring
      • Metrics using Datadog
      • Alerting (Icinga + Datadog threshold alerts)
      • Index analytics using Kibana 3 (installed as an Elasticsearch plugin)
  23. Datadog for Operational Monitoring

  24. Kibana for Index Analytics

  25. Kibana for Index Analytics

  26. Recommendations
      • Build with reindexing in mind
      • Load test before ordering hardware
      • Heaps of RAM > SSD
      • Rehearse cluster upgrades, then rehearse them again
      • Get your rolling restart process rock solid
  27. Things we’re looking at
      • Use embedded client nodes in indexers and search service
      • Use Tribe Nodes to federate searches across clusters
      • Explore Shield and Marvel
      • Use Elasticsearch for KNN text classification
  28. Thank You. swill@zendesk.com, smahajani@zendesk.com. Stefan Will, Sameera Mahajani, David Bowen, Jack Li, Bjornar Sandvik, Erin Boyle, Shyam Sundaramurthy. zendesk.com/careers
  29. Appendix

  30. Indexer Pipeline
      • Id Producer: tail database IDs
      • DB Fetch: query DB content for changed IDs
      • Transform: language detection, I18n field expansion
      • Schema Validation: filter out fields incompatible with current schema
      • Index: bulk index to Elasticsearch based on account ID
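The last two stages of the pipeline on slide 30 (schema validation and bulk indexing by account ID) might look roughly like the sketch below. It is not Zendesk's indexer: the schema is reduced to a set of allowed field names, the document shape (`id`, `account_id`) is assumed, and the target index is derived with the modulo rule from slide 32. The output follows the Elasticsearch bulk format of one action line followed by one source line; the `_type` key reflects the 1.x-era API.

```python
import json


def filter_to_schema(doc, allowed_fields):
    """Drop any field that the current schema does not list (hypothetical schema model)."""
    return {key: value for key, value in doc.items() if key in allowed_fields}


def bulk_lines(docs, allowed_fields, num_partitions, doc_type="ticket"):
    """Yield newline-delimited JSON lines in Elasticsearch bulk format.

    Each document is routed to '<doc_type>_<n>', where n follows the
    (account_id % N) + 1 rule. The index naming here is an assumption.
    """
    for doc in docs:
        index_no = (doc["account_id"] % num_partitions) + 1
        action = {
            "index": {
                "_index": f"{doc_type}_{index_no}",
                "_type": doc_type,
                "_id": doc["id"],
            }
        }
        yield json.dumps(action)
        yield json.dumps(filter_to_schema(doc, allowed_fields))
```

Joining the yielded lines with newlines (plus a trailing newline) gives a body suitable for a `_bulk` request.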
  31. Schema Templates. Schemas are versioned (ticket: 01.json, 02.json, …). Callouts on the slide: Macro (the _include(analysis.json) value) and Java Style Comment (the /* … */ block). Excerpt, truncated on the slide:
      {
        /* Ticket Mappings */
        "settings": { "index": { "analysis": "_include(analysis.json)" } },
        "mappings": {
          "ticket": {
            "_id": { "path": "id" },
            "_all": { "enabled": false },
  32. Partitioned Indexes: index#(account_id) = (account_id % N) + 1
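The partitioning rule on slide 32 translates directly into code. A minimal sketch, assuming N is the number of sub-indexes for a type and that account IDs are non-negative integers (the function and parameter names are illustrative):

```python
def index_number(account_id: int, num_partitions: int) -> int:
    """Return the 1-based sub-index number for an account: (account_id % N) + 1."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return (account_id % num_partitions) + 1
```

With N = 4, account 10 maps to sub-index 3 and account 8 maps to sub-index 1, so every document for a given account lands in the same sub-index.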
  33. Index Naming Scheme: ticket_1_v12_4 = type (ticket), index # (1), schema version (v12), build # (4)
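The naming scheme on slide 33 can be formatted and parsed mechanically. The sketch below assumes that type names contain only lowercase letters and underscores and that all three numeric parts are plain decimal integers; the slide shows only the single example `ticket_1_v12_4`, so those rules are guesses, as are the class and function names.

```python
import re
from typing import NamedTuple


class IndexName(NamedTuple):
    doc_type: str        # e.g. "ticket"
    index_no: int        # sub-index number, e.g. 1
    schema_version: int  # e.g. 12 (written as "v12")
    build_no: int        # e.g. 4


# Assumed grammar: <type>_<index #>_v<schema version>_<build #>
_INDEX_NAME = re.compile(
    r"^(?P<doc_type>[a-z_]+)_(?P<index_no>\d+)_v(?P<schema>\d+)_(?P<build>\d+)$"
)


def format_index_name(name: IndexName) -> str:
    """Render the parts back into the '<type>_<n>_v<schema>_<build>' form."""
    return f"{name.doc_type}_{name.index_no}_v{name.schema_version}_{name.build_no}"


def parse_index_name(raw: str) -> IndexName:
    """Split an index name into its four parts, or raise ValueError."""
    match = _INDEX_NAME.match(raw)
    if match is None:
        raise ValueError(f"not a recognised index name: {raw!r}")
    return IndexName(
        match["doc_type"],
        int(match["index_no"]),
        int(match["schema"]),
        int(match["build"]),
    )
```

Parsing `ticket_1_v12_4` yields type `ticket`, index number 1, schema version 12, and build number 4, and formatting that result gives back the original string.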