
Refactoring a Solr based api application

Held at Apache Lucene Eurocon 2011 in Barcelona

Torsten Bøgh Köster

April 13, 2012

Transcript

  1. Architectural lessons learned from refactoring a Solr based API application.

    Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
  2. Contents: Shopping24 and its API; technical scaling solutions (sharding, caching,

    Solr cores, „elastic“ infrastructure); business requirements as key factor
  3. @tboeghk: software and systems architect; 2 years' experience with Solr;

    3 years' experience with Lucene; team of 7 Java developers; currently at Shopping24
  4. shopping24 internet group

  5. 1 portal became n portals

  6. 30 partner shops became 700

  7. 500k to 7m documents

  8. Index fact time: • 16 GB of data • single-core layout • up to 17 s

    response time • machine size limited • stalled at Solr version 1.4 • API designed for small tools
  9. scaling goal: 15-50m documents

  10. Ask the nerds: „Shard!“ That’ll be fun! „Use spare compute

    cores at Amazon?“ Breathe load into the cloud. „Reduce that index size.“ „Get rid of those long-running queries!“
  11. data sharding ...

  12. ... is highly effective. [Chart: response time (125 ms to 500 ms) against

    concurrent requests (1 to 20), one series each for 1, 2, 3, 4, 6 and 8 shards]
  13. Sharding: size matters. The bigger your index gets, the more

    complex your queries are, and the more concurrent requests you serve, the more sharding you need.
  14. but wait ...

  15. Why do we have such a big index?

  16. 7m documents vs. 2m active products

  17. Fashion product lifecycle meets SEO (image: Bastografie / photocase.com)

  18. Separation of duties! Remove unsearchable data from your index.

  19. Why do we have complex queries?

  20. A Solr index designed for 1 portal

  21. Grown into a multi-portal index

  22. Let “sharding” follow your data ...

  23. ... and build separate cores for every client.

  24. Duplicate data as long as access is fast. (image: andybahn / photocase.com)
  25. Streamline your index provisioning process.

  26. A thousand splendid cores at your fingertips.

  27. Throwing hardware at problems. Automated.

  28. evil traps: latency, $$

  29. Mirror your complete system: solve load balancer problems. (image: froodmat / photocase.com)
  30. I said faster!

  31. use a cache layer like Varnish.

  32. What about those complex queries? Why do we have them?

    And how do we get rid of them?
  33. Lost in encapsulation: Solr API exposed to world.

  34. What’s the key factor?

  35. look at your business requirements

  36. decrease complexity

  37. Questions? Comments? Ideas? Twitter: @tboeghk GitHub: @tboeghk Email: torsten.koester@s24.com Web: http://www.s24.com

    Images: sxc.hu (unless noted otherwise)
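
Slides 11 to 13 recommend data sharding without showing a query. As a rough illustration only, the sketch below builds a Solr distributed-search URL using the standard `shards` request parameter (a comma-separated list of shard addresses without the scheme). The host names, the function name and the default `rows` value are assumptions made for this example, not details from the talk.

```python
from urllib.parse import urlencode


def build_sharded_query_url(coordinator, shards, query, rows=10):
    """Return a select URL that asks one Solr node to fan a query out over shards.

    coordinator: base URL of the node that receives the request (hypothetical).
    shards: shard addresses in Solr's "host:port/path" form, without "http://".
    """
    if not shards:
        raise ValueError("at least one shard address is required")
    params = {
        "q": query,
        "rows": rows,
        # Solr merges the results of all shards listed here.
        "shards": ",".join(shards),
    }
    return f"{coordinator}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = build_sharded_query_url(
        "http://solr1.example:8983/solr",
        ["solr1.example:8983/solr", "solr2.example:8983/solr"],
        "color:red",
    )
    print(url)
```

Adding a shard then only means adding one address to the list, which is what makes it easy to repeat the measurement from slide 12 with different shard counts.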
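
Slides 16 to 18 argue that only searchable data belongs in the index (7m documents against 2m active products). A minimal sketch of that separation, assuming each product record carries a boolean `active` flag; the field name and record layout are hypothetical and not taken from the talk:

```python
def searchable_documents(products):
    """Yield only the products that should be sent to the search index.

    Records without an "active" flag are treated as inactive, so expired
    products stay in the primary data store but never reach the index.
    """
    for product in products:
        if product.get("active", False):
            yield product


if __name__ == "__main__":
    catalogue = [
        {"id": 1, "active": True},
        {"id": 2, "active": False},
        {"id": 3},
    ]
    print(list(searchable_documents(catalogue)))
```

Expired product pages kept for SEO can then be served from another store, without inflating the search index.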
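
Slides 22 and 23 suggest building a separate core for every client. The sketch below shows one way such routing could look: it derives a core name from a portal name and builds the matching multicore select URL (`<base>/<core>/select`). The `portal_` prefix, the naming rule and the example host are assumptions for illustration, not Shopping24's actual scheme.

```python
import re

# Anything that is not a lowercase letter or digit is collapsed to "_".
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


def core_name_for(portal):
    """Map a portal name to a core name (hypothetical naming convention)."""
    slug = _NON_ALNUM.sub("_", portal.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a core name from {portal!r}")
    return f"portal_{slug}"


def select_url(base_url, portal):
    """Return the select handler URL of the core that serves this portal."""
    return f"{base_url.rstrip('/')}/{core_name_for(portal)}/select"


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr/", "Fashion-Shop.de"))
```

Because every portal resolves to its own core, each query only touches that client's data, at the price of the duplicated data mentioned on slide 24.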