declare a solr index
• Infrastructure Upgrade
  – version 2 - single node
  – version 3 - current infrastructure
• Frenzy API
  – Example of product operation
• Content recommendation
  – Architecture
http://elo7.com 2013 3/29
from the University of São Paulo. He has more than 12 years of experience in R&D, deploying cool systems for companies like Red Hat (JBoss), Globo and Locaweb. He is currently focusing his research and interests on machine learning, information retrieval and statistics.
Felipe Besson - B.S. in Information Systems and Master's in Computer Science from the University of São Paulo, Brazil. His research focused on automated testing of web service compositions. Now he is expanding his horizons by working with search, data mining, machine learning and other geek stuff.
queries per second
• from 3,500 to 4,200 users on site per minute
• 15,000 requests per minute on the AppServer
• 160,000 (avg.) bot requests per day
• 1,200,000 indexed products
• 20,000 active sellers
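To put the per-minute and per-day figures above on the same per-second scale as the query rate, a quick conversion (plain averages that ignore traffic peaks):

$$
\frac{15\,000\ \text{requests}}{60\ \text{s}} = 250\ \text{requests/s}, \qquad
\frac{160\,000\ \text{bot requests}}{86\,400\ \text{s}} \approx 1.85\ \text{requests/s}
$$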
from product where text like '%query%'
• Search v0.1 - Sphinx
  – No delta index
  – Poor index/query performance for large-scale datasets
• Search v1.0 - Apache Solr
runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
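Because Solr speaks plain HTTP, any language with a URL builder and a JSON parser can query it. A minimal Python sketch, assuming a hypothetical core named `products` on `localhost:8983`; `q`, `wt`, `rows` and `start` are standard `/select` parameters, and the sample body is made up but follows the usual shape of a Solr JSON reply.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr host and core name; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr/products"

def build_select_url(query, rows=10, start=0):
    """Build a GET URL for Solr's /select handler, asking for a JSON response (wt=json)."""
    params = {"q": query, "wt": "json", "rows": rows, "start": start}
    return f"{SOLR_BASE}/select?{urlencode(params)}"

def parse_response(body):
    """Return (number of hits, list of documents) from a Solr JSON response body."""
    data = json.loads(body)
    return data["response"]["numFound"], data["response"]["docs"]

# Made-up response body with the usual shape of a Solr JSON reply.
SAMPLE_BODY = """{"responseHeader": {"status": 0, "QTime": 3},
 "response": {"numFound": 2, "start": 0,
              "docs": [{"id": "1", "name": "bolsa de crochê"},
                       {"id": "2", "name": "bolsa de couro"}]}}"""
```

For example, `build_select_url("bolsa", rows=5)` yields `http://localhost:8983/solr/products/select?q=bolsa&wt=json&rows=5&start=0`; fetching it with any HTTP client and passing the body to `parse_response` gives the hit count and the documents.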
index. This means that it takes all the documents, splits them into words (terms), and then builds an index entry for each word that lists the documents containing it. Since a lookup is an exact, unordered string match on a term, it can be extremely fast.
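The contrast with the `like '%query%'` approach of Search v0 can be sketched in a few lines of Python. This is a toy, assuming a deliberately simple analyzer (lowercase, split on non-word characters) and AND semantics for multi-term queries; Lucene's real postings also store term frequencies and positions. The helper names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately simple analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Each term costs one dictionary lookup, independent of the number of documents.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def like_scan(docs, query):
    """The SQL LIKE '%query%' approach: examine every document on every query."""
    return {doc_id for doc_id, text in docs.items() if query.lower() in text.lower()}
```

Both functions find the same products for a single word, but `like_scan` touches every document on every query, while `search` only reads the posting sets of the query terms, which is why the index scales to a catalog of this size.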
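in Lucene, a.k.a. the answer to the dollar question: why isn't this product appearing when searching for "bleh"?
Lucene's conceptual scoring formula:
$$\text{score}(q,d) = \text{coord-factor}(q,d) \cdot \text{query-boost}(q) \cdot \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} \cdot \text{doc-len-norm}(d) \cdot \text{score}(d)$$
where A and B are the term-weight vectors of the query q and the document d.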
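A toy Python sketch of the conceptual formula, useful for reasoning about why a product ranks low or not at all. Assumptions, all simplifications: raw term counts as vector weights (real Lucene uses tf-idf weights and precomputed, quantized norms), coord-factor as the fraction of distinct query terms found in the document, and doc-len-norm = 1/sqrt(number of terms), in the spirit of Lucene's classic DefaultSimilarity. The last factor, score(d) on the slide (called doc-boost(d) in Lucene's documentation), is a hypothetical `doc_factor` parameter defaulting to 1.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity A·B / (|A||B|) of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def conceptual_score(query_terms, doc_terms, query_boost=1.0, doc_factor=1.0):
    """Toy version of the conceptual formula, using raw term counts as weights (no idf)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    if not q or not d:
        return 0.0
    # coord-factor: fraction of distinct query terms that appear in the document
    coord = sum(1 for t in q if t in d) / len(q)
    # doc-len-norm: shorter documents get a larger factor (1/sqrt of length)
    doc_len_norm = 1.0 / math.sqrt(len(doc_terms))
    return coord * query_boost * cosine(q, d) * doc_len_norm * doc_factor
```

With query `["bolsa", "azul"]`, a two-word title `["bolsa", "azul"]` scores higher than `["bolsa", "de", "croche"]` because it matches both terms (coord-factor 1 vs. 0.5) and is shorter; a product whose indexed text never contains "bleh" gets coord-factor 0 and therefore a score of 0.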
Scaling issues
• m1.xlarge => m2.2xlarge => c1.xlarge (90% CPU)
• Solr 3.6
• Full indexing with Ruby scripts (takes 3.5 h for a full index)
(20% CPU usage) behind an Amazon ELB
• 1 m1.xlarge Search API (staging, 50% of logged-in users)
• Solr Data Importer (takes 15 min for a full index)
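A full re-index through Solr's DataImportHandler is started with an HTTP command rather than an external script. A minimal Python sketch that builds that URL, assuming a hypothetical `products` core with the handler registered at `/dataimport` (the conventional path in `solrconfig.xml`; check your own configuration). `command`, `clean` and `commit` are DataImportHandler request parameters.

```python
from urllib.parse import urlencode

# Hypothetical core URL. DataImportHandler is typically registered at /dataimport
# in solrconfig.xml; check your own configuration.
CORE_URL = "http://localhost:8983/solr/products"

# Commands accepted by DataImportHandler.
DIH_COMMANDS = {"full-import", "delta-import", "status", "reload-config", "abort"}

def dataimport_url(command="full-import", clean=True, commit=True):
    """Build the URL that asks DataImportHandler to run `command`.

    clean=true wipes the index before a full import; commit=true commits when done.
    """
    if command not in DIH_COMMANDS:
        raise ValueError(f"unsupported DataImportHandler command: {command}")
    params = {"command": command, "clean": str(clean).lower(), "commit": str(commit).lower()}
    return f"{CORE_URL}/dataimport?{urlencode(params)}"
```

The import runs asynchronously inside Solr, so a caller would typically poll `dataimport_url("status")` until it reports completion; `delta-import` with `clean=False` picks up only changed rows between full imports.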
indexing and deleting
• Resources: products, stores, auto-complete suggestions and categories
• Recommendations
Advantages
• Removing search and indexing logic from the marketplace
• Providing a search service to other applications (e.g., mobile)
query term
  – filters: city, min. price and max. price
  – sort: featured, organic, oldest, newest, ...
• output (JSON)
  – metadata (query status, response time and hits)
  – list of products
  – references (previous and next page URLs)
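The input/output contract above maps naturally onto Solr parameters and a small JSON envelope. A minimal Python sketch; the field names (`city`, `price`, `featured`, `created_at`), the sort mapping and the page-URL format are hypothetical assumptions, not the actual Frenzy API or schema. Filters go into `fq` using Solr's range syntax `price:[low TO high]`, with `*` for an open bound.

```python
from urllib.parse import urlencode

# Hypothetical mapping from API sort names to Solr sort clauses; the field
# names (featured, created_at) are assumptions, not the real schema.
SORTS = {
    "featured": "featured desc",
    "organic": "score desc",
    "oldest": "created_at asc",
    "newest": "created_at desc",
}

def to_solr_params(query, city=None, min_price=None, max_price=None, sort="organic"):
    """Translate the API input into Solr query parameters (fq may repeat)."""
    params = [("q", query), ("wt", "json")]
    if city:
        params.append(("fq", f'city:"{city}"'))
    if min_price is not None or max_price is not None:
        low = "*" if min_price is None else min_price
        high = "*" if max_price is None else max_price
        params.append(("fq", f"price:[{low} TO {high}]"))  # inclusive range query
    params.append(("sort", SORTS[sort]))
    return params

def build_output(query, status, qtime_ms, hits, products, base_url, page, per_page):
    """Assemble the JSON-ready response: metadata, products and page references."""
    last_page = max(1, -(-hits // per_page))  # ceiling division

    def page_url(p):
        return f"{base_url}?{urlencode({'q': query, 'page': p})}"

    return {
        "metadata": {"status": status, "response_time_ms": qtime_ms, "hits": hits},
        "products": products,
        "references": {
            "previous": page_url(page - 1) if page > 1 else None,
            "next": page_url(page + 1) if page < last_page else None,
        },
    }
```

Passing the list of pairs to `urlencode` emits repeated `fq` keys, which Solr applies as independent filters. Keeping city and price in `fq` rather than in `q` leaves them out of relevance scoring and lets Solr cache them.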
analyse and take advantage of our users' navigation patterns.
• Every user receives a unique ID
• This ID follows all of the user's interactions with the website
• Whenever a user interacts with a product (views it, adds it to favorites, shares it on social media, adds it to the cart or buys it), we trigger a conversion action.
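The tracking flow above can be sketched as an in-memory event log. This is a minimal illustration, assuming a random UUID as the per-user ID (for example, stored in a cookie) and hypothetical action names for the five interactions listed; a production system would write these events to durable storage rather than a Python list.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The five product interactions listed above; the identifiers are hypothetical.
CONVERSION_ACTIONS = {"view", "favorite", "social_share", "add_to_cart", "buy"}

@dataclass
class Tracker:
    events: list = field(default_factory=list)

    def new_visitor_id(self):
        """Assign a unique, random ID to a new user (e.g. stored in a cookie)."""
        return str(uuid.uuid4())

    def track(self, user_id, product_id, action):
        """Record a conversion action for a user/product pair."""
        if action not in CONVERSION_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.events.append({
            "user_id": user_id,
            "product_id": product_id,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def history(self, user_id):
        """All actions recorded for one user, in the order they happened."""
        return [e for e in self.events if e["user_id"] == user_id]
```

Because every event carries the same user ID, `history(user_id)` reconstructs that user's navigation through products, which is the raw material for the recommendation step.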