The Evolution of (Elastic)Search at Yelp

Elastic Co
March 18, 2015

There was a time when Yelp was a local search company with one search engine and a single ‘search’ team. Now, Yelp has hundreds of search applications that rely on tens of search clusters run by tens of teams. Whether we use it for real-time full-text search or for offline batch analysis jobs, Elasticsearch remains fundamental to the way Yelp scales search. In this talk, we will tell the story of this transformation, highlighting why we chose Elasticsearch, the infrastructure we have built around it, and some of the applications our engineers have built on our search platform.

Presented by Joseph Lynch & Christopher Tidder, Yelp

Transcript

  1. Joseph Lynch and Chris Tidder: The Evolution of (Elastic)Search at Yelp
  2. Outline
      1. A Bit About Yelp
      2. Early Yelp Search Infrastructure
      3. First Elasticsearch Cluster
      4. Indexing Challenges
      5. Query Challenges
      6. The Yelp Search Platform
      7. The Future of Search at Yelp
      8. Questions
  3. What do people search on Yelp? • Local businesses

  4. What do people search on Yelp? • Reviews

  5. What do people search on Yelp? • Talk, Lists, Events, and more
  6. What do people look for on Yelp? • Recommendations

  7. Yelp’s Elasticsearch Scale
      • Number of production ES clusters: 30+
      • Highest QPS to a real-time ES cluster: 400-800 QPS
      • Most machines in a single ES cluster: 32
  8. Yelp’s Search Scale: How did we get here?

  9. Early Search Infrastructure: “Grab Bag” Search and “Core” Search

  10. Early Search Infrastructure
      “Core” Search
      • Business search
      • Custom sharding
      • Lucene queries
      “Grab Bag” Search
      • Everything except business search
      • Per-use clusters
      • Solr queries
  11. Requirements
      “Core” Search
      • Fast
      • Fast
      • Low-level control
      • Fast
      “Grab Bag” Search
      • Easy to start new search applications
      • Easy for a non-search engineer
  12. Was it Working? “Grab Bag” Search “Core” Search :-) :-(

  13. Early Search Infrastructure Problems
      • Needed a specific skillset
      • Low infrastructure reuse
      • Monitoring was poor
      • Iteration was slow
  14. “I hope you like Java”

  15. “I hope you like Java”: ~3k lines of Java code to define two search endpoints
  16. “I hope you like XML”

  17. “I hope you like XML”: > 1,000 lines of XML to define two search endpoints
  18. Early Search Infrastructure “Grab Bag” Search “Search Platform” ?

  19. What to build this house upon?

  20. What Did We Want in 2012?
      • To create new search applications!
        ◦ Expressive query language
        ◦ Fast iteration speeds
      • Low operational burden
        ◦ Mature technology
        ◦ Monitoring
        ◦ Self-healing
      • Active community
      • Preferably based on Lucene
  21. What Did We Decide in 2012?
      • Wanted to keep search in-house
      • Elasticsearch seemed to be best in show
        ◦ Mature(ish)
        ◦ Automatic recovery
        ◦ Well documented
        ◦ Supported Java and Python
        ◦ Strong community
  22. The First Elasticsearch Clusters “Grab Bag” Search “Search Platform”

  23. The First Elasticsearch Clusters (diagram: yelp-main → Index, Query)

  24. The Indexing Problem: We want Elasticsearch to reflect the current data state.
  25. Old Indexing Method: Make developers do it (diagram: yelp-main, steps 1 and 2)

  26. Problems with Manual Indexing

  27. Our Solution? EI (Elasticindexer)
      • Watch a table change log and make indexing requests based on changes to rows
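
The change-log approach on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Yelp's Elasticindexer: the `RowChange` record, the `to_bulk_actions` helper, and the one-row-to-one-document mapping are all hypothetical. It shows only the core idea that each row change becomes an index or delete request keyed by the row's primary key. The action dict layout (`_op_type`, `_index`, `_id`, `_source`) is the convention accepted by elasticsearch-py's bulk helpers; Elasticsearch versions of that era also required a `_type`, omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class RowChange:
    """One entry from a hypothetical table change log."""

    position: int                   # offset of this entry in the change log
    primary_key: int                # becomes the Elasticsearch document _id
    op: str                         # "insert", "update", or "delete"
    row: Optional[Dict[str, Any]]   # new row contents; None for deletes


def to_bulk_actions(changes: Iterable[RowChange], index: str) -> Iterator[Dict[str, Any]]:
    """Turn row changes into bulk-style action dicts, one per change."""
    for change in changes:
        if change.op == "delete":
            yield {"_op_type": "delete", "_index": index, "_id": change.primary_key}
        else:
            # Inserts and updates both (re)index the whole row as the document.
            yield {
                "_op_type": "index",
                "_index": index,
                "_id": change.primary_key,
                "_source": dict(change.row or {}),
            }


change_log = [
    RowChange(1, 42, "insert", {"name": "Cafe", "city": "SF"}),
    RowChange(2, 42, "update", {"name": "Cafe Blue", "city": "SF"}),
    RowChange(3, 7, "delete", None),
]
actions = list(to_bulk_actions(change_log, index="business"))
```

With a live cluster, a list like `actions` could be handed to `elasticsearch.helpers.bulk(client, actions)`. Keying documents by primary key means replaying the same change twice is harmless: the second write simply overwrites the first.
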
  28. Problems with Elasticindexer v1

  29. Elasticindexer v2: Ditch Gearman, Embrace Cores

  30. Elasticindexer v2 (diagram: Datastore → Events → several Elasticindexer indexers)
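
One common way to spread indexing work across cores is to hash-partition change-log events by primary key. The sketch below assumes that approach; the slides do not say how Elasticindexer v2 distributes work, and `partition_events` and the event tuple shape are hypothetical. In a real deployment each queue would feed a separate worker process.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (change-log position, primary key, operation): a simplified event for this sketch.
Event = Tuple[int, int, str]


def partition_events(events: Iterable[Event], num_workers: int) -> Dict[int, List[Event]]:
    """Assign each event to a worker queue by primary key.

    Every event for a given key goes to the same worker, and each queue keeps
    change-log order, so two updates to one row are never applied out of order.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    queues: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        _position, key, _op = event
        queues[key % num_workers].append(event)
    return dict(queues)


events = [(1, 42, "insert"), (2, 7, "insert"), (3, 42, "update"), (4, 8, "delete")]
queues = partition_events(events, num_workers=2)
```

Partitioning by key rather than round-robin trades perfectly even load for per-row ordering: a row that changes often always lands on the same worker.
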

  31. Elasticindexer v2: Template system

  32. Elasticindexer v2: Partial updates
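
Partial updates can be illustrated by diffing the old and new versions of a row and sending only the changed fields as an Elasticsearch update with a partial `doc`. This sketch assumes the indexer sees both row versions, which the slide does not state; `partial_update_action` is a hypothetical helper, and the action shape follows elasticsearch-py's bulk-helper convention for `update` actions.

```python
from typing import Any, Dict, Optional


def partial_update_action(
    index: str,
    doc_id: int,
    old_row: Dict[str, Any],
    new_row: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    """Build an update action carrying only the fields that changed.

    Returns None when nothing changed, so no request is needed. Dropped columns
    are out of scope: a partial "doc" update merges fields and never removes them.
    """
    changed = {
        key: value
        for key, value in new_row.items()
        if key not in old_row or old_row[key] != value
    }
    if not changed:
        return None
    return {"_op_type": "update", "_index": index, "_id": doc_id, "doc": changed}


old = {"name": "Cafe", "city": "SF", "review_count": 10}
new = {"name": "Cafe", "city": "SF", "review_count": 11}
action = partial_update_action("business", 42, old, new)
```

Elasticsearch still rewrites the whole document internally on an update; the saving is in what the indexer has to build and send, and in not overwriting fields that another source maintains.
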

  33. Queries, Ideally: yelp-main → Elasticsearch

  34. Query Reality (diagram): an ES 0.90.1 cluster (datacenter), ES 1.2.1 clusters (cloud), and “moar clusters” are queried directly by list search, review search, nearby search, ad delivery, ad selection, the nearby service, “moar search”, and “moar services”
  36. Query Challenges
      • Many supported ways to query Elasticsearch
        ◦ 2+ languages
        ◦ 4+ Elasticsearch versions
        ◦ 5+ client libraries
        ◦ 10+ services
      • Performance is hard: need metrics
      • Security and auditing are hard: need control
  37. Query Solution: Apollo (diagram): yelp-main, the nearby service, ad delivery, and “moar search” query through Apollo clients (review client, nearby client, ad client, “moar clients”), which front the ES 0.90.1 (datacenter) cluster, the ES 1.2.1 (cloud) clusters, and “moar clusters”
  38. Composable Clients

  39. Composable Clients (diagram of a request path)
      • Clients: PyESClient (clientlib: PyES), OfficialClient (clientlib: elasticsearch-py), OfficialClientCloud (clientlib: elasticsearch-py)
      • Clusters: Cluster 1 (datacenter, ES 0.90.1), Cluster 2 (datacenter, ES 1.0.1), Cluster 3 (cloud, ES 1.2.1)
      • Composition along the request path: Slow Query Logger, Mux, Tee, with a 50% / 50% traffic split
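
The composable-client idea can be sketched in Python: every component exposes the same `search(index, body)` method, so wrappers such as a slow-query logger, a weighted mux, and a tee can be stacked in any order. All class names here are hypothetical stand-ins modeled on the slide's labels, not Apollo's actual API, and `FakeClusterClient` replaces real PyES or elasticsearch-py clients so the example runs without a cluster. The specific composition at the bottom (a 50/50 mux across two datacenter clusters, teed to a cloud cluster) is an assumption about how the diagram fits together.

```python
import logging
import random
import time
from typing import Any, Dict, List, Optional, Protocol, Tuple

Query = Dict[str, Any]
Result = Dict[str, Any]


class SearchClient(Protocol):
    """The one method every component shares, so components can be stacked."""

    def search(self, index: str, body: Query) -> Result: ...


class FakeClusterClient:
    """Stand-in for a real client; counts calls instead of contacting a cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0

    def search(self, index: str, body: Query) -> Result:
        self.calls += 1
        return {"cluster": self.name, "hits": []}


class SlowQueryLogger:
    """Wraps a client and logs any query slower than `threshold_s` seconds."""

    def __init__(self, inner: SearchClient, threshold_s: float) -> None:
        self.inner = inner
        self.threshold_s = threshold_s

    def search(self, index: str, body: Query) -> Result:
        start = time.monotonic()
        result = self.inner.search(index, body)
        elapsed = time.monotonic() - start
        if elapsed > self.threshold_s:
            logging.warning("slow query on %s (%.3fs): %r", index, elapsed, body)
        return result


class Mux:
    """Routes each query to exactly one client, picked at random by weight."""

    def __init__(self, weighted: List[Tuple[SearchClient, float]],
                 rng: Optional[random.Random] = None) -> None:
        self.clients = [client for client, _ in weighted]
        self.weights = [weight for _, weight in weighted]
        self.rng = rng or random.Random()

    def search(self, index: str, body: Query) -> Result:
        client = self.rng.choices(self.clients, weights=self.weights, k=1)[0]
        return client.search(index, body)


class Tee:
    """Sends each query to a primary and a shadow client; returns only the
    primary's result and never lets a shadow failure reach the caller."""

    def __init__(self, primary: SearchClient, shadow: SearchClient) -> None:
        self.primary = primary
        self.shadow = shadow

    def search(self, index: str, body: Query) -> Result:
        result = self.primary.search(index, body)
        try:
            self.shadow.search(index, body)
        except Exception:
            logging.exception("shadow query failed")
        return result


# One possible composition: log slow queries, split traffic 50/50 across two
# datacenter clusters, and mirror every query to a cloud cluster.
cluster_1 = FakeClusterClient("cluster-1")
cluster_2 = FakeClusterClient("cluster-2")
cluster_3 = FakeClusterClient("cluster-3")
client = SlowQueryLogger(
    Tee(Mux([(cluster_1, 0.5), (cluster_2, 0.5)], rng=random.Random(0)), cluster_3),
    threshold_s=0.5,
)
for _ in range(100):
    client.search("reviews", {"query": {"match_all": {}}})
```

Because the tee discards the shadow's result and swallows its errors, a new cluster or ES version can receive production traffic without affecting responses. A comparison dark launch would additionally diff the two results, and a production tee would likely issue the shadow query asynchronously so it adds no latency.
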
  40. Performance

  41. The Search Platform: What does it take to build a search service today at Yelp?
      1. Clusters (30+)
         a. `clops launch` AWS clusters
         b. Create the index schema, and the indices on the cluster
      2. Indexers (40+)
         a. Create the EI indexer and start tailing
         b. Run a one-time full index of your table(s)
      3. Query clients (20+)
         a. Build and test an Apollo client
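
The indexer steps raise an ordering question: how to combine a one-time full index with change-log tailing without losing changes. The sketch below is one safe variant, assuming the indexer can read the change log's current position; it is not EI's interface, and the callbacks `current_log_position`, `scan_table`, and `send_batch` are hypothetical. It records the position first, backfills the table in batches, and returns the position from which tailing should replay.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield rows in lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bootstrap_index(
    current_log_position: Callable[[], int],
    scan_table: Callable[[], Iterable[Row]],
    send_batch: Callable[[List[Row]], None],
    batch_size: int = 500,
) -> int:
    """Run a one-time full index and return the change-log position to tail from.

    The position is read *before* the scan. Replaying the log from there after the
    backfill re-applies every change made during the scan, so none is lost; the
    rows it touches are simply indexed twice, which is harmless when each row
    always maps to the same document _id.
    """
    start_position = current_log_position()
    for batch in chunked(scan_table(), batch_size):
        send_batch(batch)
    return start_position


table = [{"id": i} for i in range(1, 1201)]
sent: List[List[Row]] = []
tail_from = bootstrap_index(lambda: 1000, lambda: table, sent.append, batch_size=500)
```

This variant defers tailing until the backfill finishes. If tailing ran concurrently with the scan, the backfill could write an older snapshot of a row after the tailer had already indexed a newer version; replaying from the recorded position afterwards avoids that.
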
  42. The Search Platform: What does the Search Platform provide?
      • Cluster: almost push-button launch of a cluster
      • Indexer: serialization, retries, position management, a tunable number of workers, automatic indexing based on data changes, and monitoring
      • Query client: serialization, load balancing, muxing, abstraction from ES versions, metrics, and monitoring
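
The indexer's retry and position-management duties can be sketched together. This is an assumption-laden illustration, not EI's code: `index_with_retries`, `send`, and `save_position` are hypothetical, and retrying only on `ConnectionError` stands in for whatever transient failures a real client raises. The key point is that the saved position advances only after a successful write, so a restarted indexer resumes from the last acknowledged event (at-least-once delivery).

```python
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple

Event = Tuple[int, Dict[str, Any]]  # (change-log position, document)


def index_with_retries(
    events: Sequence[Event],
    send: Callable[[Dict[str, Any]], None],
    save_position: Callable[[int], None],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Index events in order, retrying transient failures with exponential backoff."""
    for position, doc in events:
        for attempt in range(1, max_attempts + 1):
            try:
                send(doc)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up; the saved position still points before this event
                sleep(base_delay_s * 2 ** (attempt - 1))
        save_position(position)  # only reached after a successful send


attempts = {"count": 0}


def flaky_send(doc: Dict[str, Any]) -> None:
    """Hypothetical sender that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("cluster unavailable")


saved_positions: List[int] = []
delays: List[float] = []
index_with_retries(
    [(10, {"id": 1}), (11, {"id": 2})],
    flaky_send,
    saved_positions.append,
    sleep=delays.append,
)
```

Injecting `sleep` keeps the example fast and testable. A real indexer would probably checkpoint per batch rather than per event to avoid writing the position on every document.
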
  43. Future of Elasticsearch at Yelp
      Clusters
      • Terraform-based clusters
      • Fully self-service
      • New versions
      Indexing
      • Row-based replication
      • Kafka event source
      • Storm indexers
      Querying
      • Full query client isolation
      • More metrics
      • Comparison dark launch
  44. Future of Search at Yelp
      • Port business search to Elasticsearch
        ◦ True real-time search
        ◦ Better geosharding
  45. Future of Search at Yelp
      • Open-source our tools and plugins at github.com/Yelp
  46. Twitter: @YelpEngineering Joey: jlynch@yelp.com Chris: cstidder@yelp.com Questions?

  47. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. CC-BY-ND 4.0