Testing the limits of elasticsearch

A short presentation about how to find the maximum shard size for your elasticsearch instance. For more details on the subject, visit: http://blog.trifork.com/2013/09/26/maximum-shard-size-in-elasticsearch/

Bogdan Dumitrescu

October 02, 2013

Transcript

  1. Have you ever wondered?
    • What’s the maximum number of shards a node can hold?
    • What’s the refresh rate supposed to be?
    • What’s the maximum shard size?
    Well, it depends...
  2. Testing with Wikipedia data
    • Indexing all English Wikipedia content in one shard (~42 GB XML file)
    • Test run on a development machine
      ◦ Intel 4-core machine
      ◦ 16 GB of RAM
      ◦ SSD for the OS (Windows 8 64-bit)
      ◦ HDD for elasticsearch
    • elasticsearch 0.90.5
      ◦ 6 GB of RAM allocated to the JVM
  3. Indexing results - takeaways?
    • elasticsearch can index a lot of data in one shard
    • no apparent slowdown in indexing speed
  4. Querying tests
    • Query text chosen at random from a list of countries and US cities
    • Only returning non-redirect pages (they contain more text)
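The query setup on this slide could be sketched roughly as follows. This is a minimal illustration, not the query used in the talk: the field names `text` and `redirect` and the short term list are assumptions, and the `filtered` query form is the one available in the 0.90.x query DSL.

```python
import json
import random

# Illustrative sample only; the talk used a full list of countries and US cities.
SEARCH_TERMS = ["Netherlands", "Romania", "Chicago", "Boston"]


def build_query(term, text_field="text", redirect_field="redirect"):
    """Build a search body that matches `term` and excludes redirect pages.

    `text_field` and `redirect_field` are assumed field names; adjust them
    to the mapping of the index under test.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {text_field: term}},
                "filter": {"term": {redirect_field: False}},
            }
        }
    }


def random_query(rng=random):
    """Pick a search term at random and wrap it in the query body."""
    return build_query(rng.choice(SEARCH_TERMS))


if __name__ == "__main__":
    print(json.dumps(random_query(), indent=2))
```

Passing a seeded `random.Random` instance as `rng` makes a test run repeatable.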
  5. Querying tests
    • 1 user
      ◦ max - 600ms
      ◦ average - 150ms
    • 25 users
      ◦ max - 28.5s
      ◦ average - 3.5s
    • 500 users
      ◦ max - 103s
      ◦ average - 28.5s
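Maximum and average figures like the ones above can be collected with a small concurrent harness. The sketch below is an assumption about how such a test might be structured, not the tool used in the talk; `run_query` is a hypothetical callable that sends one search request and blocks until the response arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(run_query):
    """Return the wall-clock duration of a single query, in seconds."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def measure(run_query, users, requests_per_user=1):
    """Simulate `users` concurrent users and summarise query latency."""

    def user_session(_):
        # Each simulated user issues its requests one after another.
        return [timed_call(run_query) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        sessions = list(pool.map(user_session, range(users)))

    latencies = [t for session in sessions for t in session]
    return {
        "count": len(latencies),
        "max": max(latencies),
        "avg": sum(latencies) / len(latencies),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real search request.
    print(measure(lambda: time.sleep(0.01), users=25, requests_per_user=4))
```

One thread per simulated user keeps the sketch simple; for several hundred users a dedicated load-testing tool is likely a better fit.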
  6. Maximum shard size
    • Depends on:
      ◦ data structure
      ◦ querying requirements
      ◦ indexing requirements
      ◦ server configuration
  7. Setting limits
    • Number of concurrent users: 200
    • Querying requirements: 1s avg.
    • Indexing requirements: 1 thread
    • Concurrent querying and indexing
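One way to turn these limits into a stopping rule is to record the average query time at increasing shard sizes and keep the largest size that still meets the target. The sketch below assumes that procedure; the slides do not spell it out, and `Limits`, `max_shard_size`, and the measurement values in the usage example are hypothetical placeholders, not results from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """The limits from the slide, expressed as configuration."""
    concurrent_users: int = 200
    max_avg_query_s: float = 1.0
    indexing_threads: int = 1


def max_shard_size(measurements, limits=Limits()):
    """Return the largest shard size (GB) whose average query time is within limits.

    `measurements` is a list of (shard_size_gb, avg_query_s) pairs, each taken
    under the configured load. Scanning stops at the first size that exceeds
    the limit. Returns None if even the smallest size is too slow.
    """
    best = None
    for size_gb, avg_query_s in sorted(measurements):
        if avg_query_s > limits.max_avg_query_s:
            break
        best = size_gb
    return best


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(max_shard_size([(10, 0.4), (20, 0.8), (30, 1.3)]))
```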
  8. Server monitoring - paramedic
    • No history
    • Limited information
    http://karmi.github.io/elasticsearch-paramedic/
  9. Server monitoring - Kibana
    • Idea of Amit Bronner @ Beeld en Geluid
      ◦ https://github.com/abronner/elasticsearch-monitoring
    • Create index with the stats API data
      ◦ http://localhost:9200/_cluster/nodes/stats?all
      ◦ can also query per node
    • Index it every X seconds
      ◦ dashboards to your heart's desire
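The polling idea can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the code from the linked repository: the target index and type (`monitoring/node_stats`), the `node_id` field, and the `@timestamp` field name are all hypothetical choices; only the stats URL comes from the slide.

```python
import json
import time
import urllib.request

STATS_URL = "http://localhost:9200/_cluster/nodes/stats?all"  # from the slide
TARGET_URL = "http://localhost:9200/monitoring/node_stats"    # hypothetical index/type


def fetch_json(url):
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_json(url, doc):
    """POST one document as JSON; elasticsearch assigns the document ID."""
    req = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def stats_to_docs(stats, timestamp):
    """Split a nodes-stats response into one timestamped document per node."""
    docs = []
    for node_id, node in stats.get("nodes", {}).items():
        doc = dict(node)
        doc["node_id"] = node_id        # assumed field name
        doc["@timestamp"] = timestamp   # assumed field name for time-based dashboards
        docs.append(doc)
    return docs


def poll(interval_s, iterations, fetch=fetch_json, post=post_json, sleep=time.sleep):
    """Fetch the stats and index them, `iterations` times, `interval_s` apart."""
    for i in range(iterations):
        now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        for doc in stats_to_docs(fetch(STATS_URL), now):
            post(TARGET_URL, doc)
        if i + 1 < iterations:
            sleep(interval_s)


if __name__ == "__main__":
    poll(interval_s=10, iterations=6)
```

The `fetch`, `post`, and `sleep` parameters exist so the loop can be exercised without a running node.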