Apache Solr to Elasticsearch: The Evolution of Zendesk's Search

Elastic Co
March 18, 2015

Zendesk manages terabytes of customer data in the form of support ticket comments, user data, knowledge base content, etc. In Zendesk’s previous search architecture, based on Apache Solr 3.6 (pre-cloud), manual sharding, operational overhead, the difficulty of schema updates, and the lack of near-real-time indexing were limiting scalability, performance, and development velocity. This talk provides insight into why Zendesk migrated from Apache Solr to Elasticsearch, the journey, and the lessons learned while building their new search and indexing architecture.

Presented by Stefan Will & Sameera Mahajani

Transcript

  1. 2011 (CC-BY-ND 4.0)
     • 70 million tickets
     • 12,000 customers
     • 8.5 million new tickets/month
     • 40 million user records
     [Chart: scale from 500M to 2,000M]
  2. 2015
     • 1.1 billion tickets
     • 52,000 customers
     • 45 million new tickets/month
     • 670 million user records
     • 17 million queries/day
     [Chart: scale from 500M to 2,000M]
  3. History: Single Shard
     [Diagram labels: Rails; MySQL; Solr Core; Write Master, R/O Slave, R/O Slave]
  4. History: Multiple Shards
     [Diagram labels: Rails; MySQL and Solr Core, repeated five times]
  5. History: Multiple Clusters
     [Diagram labels: Rails; MySQL and Solr Core, repeated many times]
  6. History: Multiple Pods
     [Diagram labels: three Rails groups, each with MySQL and Solr Core repeated many times]
     • 52k accounts
     • 45 Solr clusters
     • 133 Solr servers
     • 500 MySQL shards
     • 50 MySQL clusters
     • 6 pods
     • 3 data centers
  7. Main Goals
     • Easily deploy new search features and bugfixes
     • Be able to easily implement search support for new products
     • Minimize the amount of effort spent on configuration management and developing automation scripts
  8. Architecture: Before
     [Diagram labels: Application; three groups, each with VIP, Master, Slave, Slave, Indexer, DB, DB, DB]
  9. Architecture: Before
     [Diagram labels: Application, then three groups]
     • S1: search1, search1, search2, search3; Shard 1, Shard 2, Shard 3
     • S2: search4, search4, search5, search6; Shard 4, Shard 5, Shard 6
     • S3: search7, search8, search9, search10; Shard 7, Shard 8, Shard 9
  10. Shard configuration (excerpt)

      shard_1:
        port: 8080
        core: shard_1
        host: solr04.pod1.ord.zdsys.com
      # Only used by shard mover
      master_shard_1:
        port: 8080
        core: shard_1
        host: search17.pod1.ord.zdsys.com
      shard_2:
        port: 8080
        core: shard_2
        host: solr04.pod1.ord.zdsys.com
      # Only used by shard mover
      master_shard_2:
        port: 8080
        core: shard_2
        host: search17.pod1.ord.zdsys.com
      shard_3:
        port: 8080
        core: shard_3
  11. Architecture: Now
      [Diagram labels: Application; VIP; three groups, each with Search Service, ES DataNode, Indexer, DB Shard]
  12. Index Structure
      • Separate indexes per type (tickets, users, groups, etc.)
      • Complex schemas, with a lot of dynamic fields
      • Accounts split into multiple sub-indexes (ticket_1, ticket_2, …)
      • Use aliases to point to current index version
      • *All* queries are filtered by account id
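The last two bullets of slide 12 can be illustrated with a minimal Python sketch that only builds request bodies. The field names `account_id` and `description` are hypothetical, and so is the idea that the alias is named after the sub-index (for example `ticket_1`); the slides do not show either. The query uses the `bool`/`filter` form from later Elasticsearch releases; in the 1.x era of this talk the equivalent was the `filtered` query.

```python
def account_query(account_id, text):
    """Build a search body whose results are always restricted to one account.

    `account_id` and `description` are assumed field names, not taken from
    the slides.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"description": text}}],
                # The filter clause does not affect scoring; it only
                # restricts the result set to the given account.
                "filter": [{"term": {"account_id": account_id}}],
            }
        }
    }


def swap_alias_actions(alias, old_index, new_index):
    """Build a body for the `_aliases` endpoint that repoints an alias.

    Both actions are sent in one request, so the alias moves from the old
    index version to the new one in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Searching through the alias rather than the concrete index name means the application does not need to know which index version is current.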
  13. [Diagram labels]
      • Database Shards: DB1, DB2, DB3
      • Incremental Indexers: I1, I2, I3
      • Elasticsearch Indexes: ticket_1, ticket_2, ticket_3, … ticket_N
      • Reindexer: R
  14. Monitoring
      • Metrics using Datadog
      • Alerting (Icinga + Datadog threshold alerts)
      • Index analytics using Kibana 3 (installed as an Elasticsearch plugin)
  15. Recommendations
      • Build with reindexing in mind
      • Load test before ordering hardware
      • Heaps of RAM > SSD
      • Rehearse cluster upgrades, then rehearse them again
      • Get your rolling restart process rock solid
  16. Things we’re looking at
      • Use embedded client nodes in indexers and search service
      • Use Tribe Nodes to federate searches across clusters
      • Explore Shield and Marvel
      • Use Elasticsearch for KNN text classification
  17. Thank You
      [email protected], [email protected]
      Stefan Will, Sameera Mahajani, David Bowen, Jack Li, Bjornar Sandvik, Erin Boyle, Shyam Sundaramurthy
      zendesk.com/careers
  18. Indexer Pipeline
      • Id Producer: tail database IDs
      • DB Fetch: query DB content for changed IDs
      • Transform: language detection; I18n field expansion
      • Schema Validation: filter out fields incompatible with current schema
      • Index: bulk index to Elasticsearch based on account ID
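The five stages on slide 18 can be sketched as chained Python generators. Everything concrete here is a stand-in chosen for illustration: `FAKE_DB`, `SCHEMA_FIELDS`, the toy `detect_language` rule, and the reading of "I18n field expansion" as copying a field into a language-suffixed field are all assumptions, not Zendesk's implementation.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for a database shard and the current schema.
FAKE_DB = {
    1: {"id": 1, "account_id": 10, "subject": "Hello", "legacy_flag": True},
    2: {"id": 2, "account_id": 10, "subject": "Hola"},
    3: {"id": 3, "account_id": 20, "subject": "Hello again"},
}
SCHEMA_FIELDS = {"id", "account_id", "subject", "language",
                 "subject_en", "subject_es"}


def produce_ids(changed_ids):
    """Id Producer: emit the IDs of changed rows."""
    yield from changed_ids


def fetch(ids):
    """DB Fetch: load the current content for each changed ID."""
    for doc_id in ids:
        if doc_id in FAKE_DB:
            yield dict(FAKE_DB[doc_id])  # copy so later stages can mutate


def detect_language(text):
    """Toy placeholder for a real language detector."""
    return "es" if text.lower().startswith("hola") else "en"


def transform(docs):
    """Transform: tag the language and add a language-specific field."""
    for doc in docs:
        lang = detect_language(doc["subject"])
        doc["language"] = lang
        doc["subject_" + lang] = doc["subject"]
        yield doc


def validate(docs):
    """Schema Validation: drop fields the current schema does not know."""
    for doc in docs:
        yield {k: v for k, v in doc.items() if k in SCHEMA_FIELDS}


def batch_by_account(docs):
    """Index: group documents by account ID, ready for bulk requests."""
    batches = defaultdict(list)
    for doc in docs:
        batches[doc["account_id"]].append(doc)
    return dict(batches)


def run_pipeline(changed_ids):
    return batch_by_account(
        validate(transform(fetch(produce_ids(changed_ids)))))
```

Calling `run_pipeline([1, 2, 3])` returns one batch per account; a real indexer would send each batch to Elasticsearch with a bulk request instead of returning it.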
  19. Schema Templates

      {
        /* Ticket Mappings */
        "settings": {
          "index": { "analysis": "_include(analysis.json)" }
        },
        "mappings": {
          "ticket": {
            "_id": { "path": "id" },
            "_all": { "enabled": false },

      Callouts: Macro ("_include(analysis.json)"); Java-style comment (/* Ticket Mappings */)
      Schemas are versioned:
      • ticket
        • 01.json
        • 02.json
        • …
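One way to turn a template like the one on slide 19 into plain JSON is sketched below in Python. It assumes that `_include(name)` appears as a whole string value and is replaced by the parsed contents of the named fragment; the `fragments` dictionary (file name to text) is a hypothetical stand-in for files on disk. How Zendesk's own tooling expands the macro is not shown in the slides.

```python
import json
import re

# Matches Java-style block comments. This is a naive strip: it would also
# remove "/* ... */" sequences that occur inside JSON string values.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Matches a string value that consists solely of an include macro.
INCLUDE = re.compile(r"^_include\((.+)\)$")


def expand(node, fragments):
    """Recursively replace "_include(name)" strings with parsed fragments."""
    if isinstance(node, dict):
        return {key: expand(value, fragments) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(value, fragments) for value in node]
    if isinstance(node, str):
        match = INCLUDE.match(node)
        if match:
            text = COMMENT.sub("", fragments[match.group(1)])
            # Fragments may themselves contain includes.
            return expand(json.loads(text), fragments)
    return node


def render_template(text, fragments):
    """Strip comments, parse the JSON, and expand include macros."""
    return expand(json.loads(COMMENT.sub("", text)), fragments)
```

A short usage example, with made-up fragment content:

```python
template = '{ /* Ticket Mappings */ "settings": {"index": {"analysis": "_include(analysis.json)"}} }'
fragments = {"analysis.json": '{"analyzer": {"default": {"type": "standard"}}}'}
rendered = render_template(template, fragments)
```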
  20. Index Naming Scheme
      ticket_1_v12_4
      • type: ticket
      • index #: 1
      • schema version: v12
      • build #: 4
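The naming scheme on slide 20 is regular enough to parse mechanically. The Python sketch below splits a name such as `ticket_1_v12_4` into its four parts and rebuilds it. The class and function names are hypothetical, and the pattern assumes the type is a single lowercase word, which the slides do not state.

```python
import re
from typing import NamedTuple


class IndexName(NamedTuple):
    doc_type: str
    index_number: int
    schema_version: int
    build_number: int


# Assumed layout: <type>_<index #>_v<schema version>_<build #>
PATTERN = re.compile(
    r"^(?P<type>[a-z]+)_(?P<index>\d+)_v(?P<schema>\d+)_(?P<build>\d+)$")


def parse_index_name(name):
    """Split an index name into type, index number, schema version, build."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised index name: {name!r}")
    return IndexName(match["type"], int(match["index"]),
                     int(match["schema"]), int(match["build"]))


def format_index_name(parts):
    """Inverse of parse_index_name."""
    return (f"{parts.doc_type}_{parts.index_number}"
            f"_v{parts.schema_version}_{parts.build_number}")
```

Keeping the schema version and build number in the name lets a new index be built alongside the old one before an alias is moved over to it.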