Graduating with Honors: How Quizlet is Scaling Elasticsearch to Help Children Learn

This talk was presented at the inaugural Elastic{ON} conference, http://elasticon.com

Session Abstract:

Elasticsearch is powering a generation of learning on Quizlet, a tool used by millions of students to study for classes and assessments. With over 60 million study sets, finding the right content is essential to the user experience, which makes Elasticsearch a key piece of Quizlet's infrastructure. This talk focuses on how Quizlet scaled its cluster for the yearly ramp-up in traffic as students returned to school this past fall, and touches on the benefits of working directly with Elastic’s support team in the process.

Presented by Peter Bakkum, Quizlet

Elastic Co

March 11, 2015

Transcript

  1. Graduating with Honors: How Quizlet is Scaling Elasticsearch to Help Students Learn
    Peter Bakkum - March 11, 2015
  2. CC-BY-ND 4.0 Fall traffic surge

  3. Quizlet

  4. Weekly unique visitors

  6. Peter Bakkum - @pbbakkum - peter@quizlet.com
    • Infrastructure Lead at Quizlet
    • Background in Finance and Database Research
  8. Quizlet's Elasticsearch clusters
    • 2 clusters:
      • Completion: Used for word auto-define.
        • 261 GB - 1,500 RPM (requests / min)
      • Search: Used for finding relevant site content.
        • 110 GB - 14,000 RPM
    • Search cluster:
      • 3 nodes running ES masters and Nginx load distributors.
      • 20 data nodes with 15 GB RAM.
      • 10 shards, 1 replica.
      • Constant index updates.
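The shard and node counts on this slide can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a single index and an even spread of shard copies across data nodes, neither of which the slide states. The `number_of_shards` and `number_of_replicas` keys are the standard Elasticsearch index settings; the helper names are hypothetical.

```python
import json

# Figures from the slide. Everything else here is an illustrative assumption.
PRIMARY_SHARDS = 10
REPLICAS_PER_PRIMARY = 1
DATA_NODES = 20


def index_settings(primaries, replicas):
    """Build an index-creation body using the standard shard/replica settings."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def shard_copies(primaries, replicas):
    """Total shard copies: each primary plus one copy per configured replica."""
    return primaries * (1 + replicas)


def copies_per_node(primaries, replicas, nodes):
    """Average shard copies per data node, assuming an even spread."""
    return shard_copies(primaries, replicas) / nodes


if __name__ == "__main__":
    print(json.dumps(index_settings(PRIMARY_SHARDS, REPLICAS_PER_PRIMARY)))
    print(shard_copies(PRIMARY_SHARDS, REPLICAS_PER_PRIMARY))
    print(copies_per_node(PRIMARY_SHARDS, REPLICAS_PER_PRIMARY, DATA_NODES))
```

Under those assumptions, 10 primaries with 1 replica give 20 shard copies, which works out to one copy per data node on a 20-node cluster.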
  9. Queries
    • Users search for sets, classes, users, and images.
    • Boost on many factors - title, common classes, common schools, geographical distance, bounce rate of target, etc.
    • Groovy scripting to help with boosts.
    • We allow search indexers to hit Elasticsearch.
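To make the boosting idea concrete, here is a rough sketch of how several of these factors could be combined in one request body with Elasticsearch's `function_score` query. It is not Quizlet's actual query: the field names (`title`, `description`, `location`, `bounce_rate`), the weights, the `50km` scale, and the Groovy expression are all hypothetical.

```python
import json


def build_search_body(text, lat, lon):
    """Build a function_score request body (all field names are hypothetical)."""
    return {
        "query": {
            "function_score": {
                # Text relevance, with matches in the title weighted more heavily.
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^3", "description"],
                    }
                },
                "functions": [
                    # Decay the score as geographical distance from the user grows.
                    {
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": "50km",
                            }
                        }
                    },
                    # Groovy script that penalizes targets with a high bounce rate.
                    {
                        "script_score": {
                            "lang": "groovy",
                            "script": "1.0 / (1.0 + doc['bounce_rate'].value)",
                        }
                    },
                ],
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("biology", 37.77, -122.42), indent=2))
```

Multiplying the function scores into the text score is one design choice among several; `sum` or `avg` modes would weight the factors differently.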
  10. Quizlet's Platform
    • Running ~150 machines on Joyent.
    • Using SmartOS, a forked Solaris.
    • Kernel virtualization gives us great CPU performance.
    • Elasticsearch and Lucene are not certified on SmartOS.
    • Sigar is not zone-aware - you must configure Elasticsearch processors manually.
    • docs.joyent.com/jpc/running-elasticsearch-on-joyent-cloud
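Setting the processor count by hand might look like the sketch below, which renders the `processors` line for `elasticsearch.yml`. The core allotment is an input you would have to supply yourself for the zone; the function name and the round-down policy are assumptions, not something the slides specify.

```python
def processors_setting(allotted_cores):
    """Render the elasticsearch.yml `processors` line for a zone.

    `allotted_cores` is the operator-supplied CPU allotment for the zone
    (hypothetical input). It is rounded down, with a floor of one processor.
    """
    cores = max(1, int(allotted_cores))
    return "processors: %d" % cores


if __name__ == "__main__":
    print(processors_setting(4))
```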
  12. Problem 1: How do we duplicate production traffic?
  13. em-proxy
    > em-proxy -l 8080 -r localhost:8081 -d localhost:8082,localhost:8083
    github.com/igrigorik/em-proxy
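In the command above, em-proxy listens on port 8080 (`-l`), relays to and answers from port 8081 (`-r`), and duplicates the traffic to ports 8082 and 8083 (`-d`), discarding their responses. The sketch below illustrates that duplication pattern only. It is not em-proxy's implementation (em-proxy is a Ruby EventMachine proxy), and the backends here are plain Python callables standing in for real servers.

```python
from concurrent.futures import ThreadPoolExecutor


def duplicate_request(request, relay, duplicates, pool):
    """Send `request` to the relay backend and to every duplicate backend.

    Only the relay's response is returned to the caller. Responses and
    errors from the duplicate (shadow) backends are discarded.
    """
    def fire_and_forget(backend):
        try:
            backend(request)
        except Exception:
            pass  # a failing shadow backend must never affect the client

    # Shadow traffic runs in the background while the relay serves the client.
    shadow_futures = [pool.submit(fire_and_forget, b) for b in duplicates]
    response = relay(request)
    return response, shadow_futures


if __name__ == "__main__":
    seen = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        response, futures = duplicate_request(
            "GET /search?q=biology",
            relay=lambda r: "production answered " + r,
            duplicates=[seen.append],
            pool=pool,
        )
        for f in futures:
            f.result()
    print(response, seen)
```

The point of the pattern is that a test cluster sees real production queries while clients only ever depend on the production response.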
  14. Problem 2: How do we simulate elevated load?
  15. Traffic replay
    We pieced this together from the simplest tools we could find:
    • dumpcap and tshark for capturing traffic - no proxy necessary.
    • pcap_tools (modified) for processing the packet capture.
    • httperf (modified) for replaying traffic.
    qreplay: our driver for these tools.
  16. qreplay
    github.com/quizlet/qreplay
    > qreplay capture --capture-time 60 --port 80
    > qreplay replay --host 127.0.0.1 --port 80 --req-sec 50
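Replaying at a fixed rate, as `--req-sec 50` suggests, comes down to pacing requests on a schedule. The sketch below shows one way to do that. It is not how qreplay or httperf work internally; the function names and the injectable `send`, `clock`, and `sleep` parameters are hypothetical, chosen so the pacing logic can be exercised without a network.

```python
import time


def send_offsets(num_requests, req_per_sec):
    """Return the offset in seconds from start at which each request is due."""
    interval = 1.0 / req_per_sec
    return [i * interval for i in range(num_requests)]


def replay(requests, req_per_sec, send, clock=time.monotonic, sleep=time.sleep):
    """Send captured requests at a fixed rate.

    Each request is scheduled against the absolute start time, so a slow
    send does not push every later request back (no cumulative drift).
    """
    start = clock()
    offsets = send_offsets(len(requests), req_per_sec)
    for offset, request in zip(offsets, requests):
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        send(request)


if __name__ == "__main__":
    captured = ["GET /a", "GET /b", "GET /c"]
    replay(captured, req_per_sec=50, send=print)
```

Raising `req_per_sec` above the captured rate is what turns a recording of normal traffic into a simulation of elevated load.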
  17. Problem 3: Optimizing the cluster

  19. • Experimented with GC settings, ultimately moving from G1GC to CMS.
      • Explicitly set number of GC threads.
      • Experimented with niofs vs mmapping, found it didn’t make much difference for us.
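As a rough illustration of the knobs mentioned here, the sketch below assembles HotSpot flags that select CMS and pin the GC thread counts, and renders the `index.store.type` setting for the two store types compared. The flags and the setting name are standard ones. The thread counts are placeholders, not the values Quizlet used, and the helper names are hypothetical.

```python
def cms_gc_flags(parallel_threads, concurrent_threads):
    """Return JVM flags that select CMS and set explicit GC thread counts."""
    return [
        "-XX:+UseConcMarkSweepGC",  # concurrent collector for the old generation
        "-XX:+UseParNewGC",         # parallel young-generation collector paired with CMS
        "-XX:ParallelGCThreads=%d" % parallel_threads,
        "-XX:ConcGCThreads=%d" % concurrent_threads,
    ]


def store_type_setting(store_type):
    """Render the index.store.type line for one of the two stores compared."""
    if store_type not in ("niofs", "mmapfs"):
        raise ValueError("expected 'niofs' or 'mmapfs', got %r" % store_type)
    return "index.store.type: %s" % store_type


if __name__ == "__main__":
    # Placeholder thread counts, for illustration only.
    print(" ".join(cms_gc_flags(parallel_threads=4, concurrent_threads=2)))
    print(store_type_setting("niofs"))
```

Explicit thread counts matter when the JVM cannot see the real CPU allotment, since its defaults are derived from the processor count it detects.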
  20. • Experimented with request queue sizes.
      • Experimented with worker counts. Setting these based on the CPUs available made a real difference.
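Deriving worker counts from the available CPUs could be sketched as follows. The setting names (`processors`, `threadpool.search.size`, `threadpool.search.queue_size`) follow the Elasticsearch 1.x naming in use at the time of the talk. The multiplier and queue size are hypothetical starting points to be tuned under replayed load, not values from the slides.

```python
def threadpool_settings(processors, search_multiplier=3, search_queue=1000):
    """Derive search thread pool settings from the CPUs actually available.

    `search_multiplier` and `search_queue` are hypothetical tuning inputs.
    """
    return {
        "processors": processors,
        "threadpool.search.size": processors * search_multiplier,
        "threadpool.search.queue_size": search_queue,
    }


def to_yaml_lines(settings):
    """Render flat settings as elasticsearch.yml lines, sorted by key."""
    return ["%s: %s" % (key, value) for key, value in sorted(settings.items())]


if __name__ == "__main__":
    for line in to_yaml_lines(threadpool_settings(processors=8)):
        print(line)
```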
  21. • We discovered that our workload is very CPU intensive.
      • Experimented with running Elasticsearch on SmartOS and Linux.
      • The ability to replay traffic was critical to ensuring our cluster could handle traffic peaks.
  23. Thanks for listening! Peter Bakkum - peter@quizlet.com - @pbbakkum
  24. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.